Hardware benchmarking for HASH 3 (for non Hardware designers)
1 Hardware benchmarking for HASH 3 (for non Hardware designers). Ingrid Verbauwhede, ingrid.verbauwhede-at-esat.kuleuven.be, K.U.Leuven, COSIC (Computer Security and Industrial Cryptography). With input from: Junfeng Fan, Miroslav Knezevic, Patrick Schaumont. Slides from: own course notes, Rabaey's Digital Integrated Circuits. KULeuven - COSIC, Tenerife, Hash 3, Nov 2009
2 Outline: Goal of hardware design. What is hardware design? What are the different options? What are the different contexts? How to compare hardware designs: benchmarks. Where are we now?
3 HW-SW continuum: when hardware design?
4 When hardware design? Fast; small; low power; security; (analog, RF). HW-SW continuum.
5 HW-SW continuum, from HW through HW-SW to SW: ASIC, FPGA, domain specific, DSP, VLIW, general purpose. Area efficiency and performance per energy unit run from high (HW) to low (SW); programmability runs from low (HW) to high (SW). Example: Intel AES-NI (Westmere).
6 Design parameters. Speed or throughput: Gbits/sec or Mbits/sec/slice; cycles/byte (see D. Bernstein). Area: mm² (gate or transistor count); memory. Power or energy consumption: power (Watts) for cooling or transmission (RFID); energy for battery-operated devices. Security: side-channel resistance, special circuit styles.
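The speed metrics on this slide convert into one another once a clock frequency is fixed. A minimal sketch, assuming a design that sustains a constant cycles/byte figure; the function names and the example numbers (10 cycles/byte, 200 MHz, 500 slices) are illustrative assumptions, not measurements from the talk:

```python
def throughput_bits_per_sec(cycles_per_byte: float, clock_hz: float) -> float:
    """Sustained throughput in bit/s: clock_hz / cycles_per_byte bytes per second, 8 bits each."""
    if cycles_per_byte <= 0 or clock_hz <= 0:
        raise ValueError("cycles_per_byte and clock_hz must be positive")
    return 8.0 * clock_hz / cycles_per_byte


def mbits_per_sec_per_slice(throughput_bps: float, slices: int) -> float:
    """Throughput normalised by FPGA area, in Mbit/s per slice."""
    if slices <= 0:
        raise ValueError("slices must be positive")
    return throughput_bps / 1e6 / slices


# Hypothetical design: 10 cycles/byte at 200 MHz, occupying 500 slices.
example_bps = throughput_bits_per_sec(cycles_per_byte=10, clock_hz=200e6)
example_per_slice = mbits_per_sec_per_slice(example_bps, slices=500)
print(f"{example_bps / 1e6:.0f} Mbit/s, {example_per_slice:.2f} Mbit/s/slice")
```

Cycles/byte is independent of the clock, which is why it suits software comparisons; the per-slice figure only makes sense within one FPGA family.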
7 Power density problem [Figure: power density problem. Author: S. Borkar, Intel]
8 Power, energy. Include picture of Intel cooling issues. Immediate need to add 8 MW to prepare for 2007 installs of new systems. Need total of [...] MW for projected systems by [...]. Numbers are just for computers; add 75% for cooling. Cooling will require [...] tons of chiller capacity. Source: ORNL (Oak Ridge National Lab), US Dept. of Energy
9 Heat and parallelism. Reduce power = reduce WASTE!! Single core: memory M and processor P with capacitance C dissipate power (heat) $P_{mono} = C V^2 f$ (Watt). Four cores, each with memory M/4, processor P/4, and capacitance C/4, each running at f/4: $4 \cdot (C/4) \cdot V^2 \cdot (f/4) = P_{mono}/4$, but since $f \sim V$ it can even be $P_{mono}/4^3$. TREND: MULTI-CORE!!
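The arithmetic behind the multi-core argument can be checked numerically. A minimal sketch, assuming the dynamic power model C·V²·f from the slide and normalised units (C = V = f = 1); the function and variable names are illustrative:

```python
def dynamic_power(capacitance: float, voltage: float, frequency: float) -> float:
    """Dynamic switching power P = C * V^2 * f."""
    return capacitance * voltage ** 2 * frequency


# Normalised single-core reference (assumed values, chosen only for easy reading).
C, V, F = 1.0, 1.0, 1.0
p_mono = dynamic_power(C, V, F)

# Four cores, each with a quarter of the capacitance, each clocked at f/4.
p_quad = 4 * dynamic_power(C / 4, V, F / 4)

# Same, but the supply voltage is also scaled down with the frequency (f ~ V).
p_quad_scaled = 4 * dynamic_power(C / 4, V / 4, F / 4)

print(p_mono / p_quad)         # ratio of single-core to four-core power
print(p_mono / p_quad_scaled)  # ratio when voltage scales with frequency
```

The second ratio is the reason parallelism is listed under power and energy later in the talk: the gain comes from running slower at a lower voltage, not from parallelism itself.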
10 Low energy: battery capacity [Figure: battery capacity, slide from Rabaey]
11 What is hardware design?
12 Skiing down a mountain: translation from spec into RTL (Register Transfer Level, e.g. VHDL, Verilog). Specification (C, C++, block diagram): HASHX. Algorithm transformations: pipelining, unrolling. Memory transformations and optimizations: loop merging, compaction. Multi-precision arithmetic: 40 bit accumulator. Targets: ASIC, FPGA, retargetable coprocessor, DSP processor, DSP-RISC, GPU.
13 From RTL to tape-out or FPGA. Back-end: VHDL, Verilog, synthesis, FPGA. Targets: ASIC, FPGA, retargetable coprocessor, DSP, DSP-RISC, VLIW, extensions, GPU, CPU to RISC. Hardware: Verilog-VHDL, Synopsys synthesis, Cadence place & route, FPGA download. Software: C compilation, assembly optimization. System-on-a-chip, system in package.
14 Context 1: ASIC design. Standard-cell-based design.
15 Semicustom design flow. Flow: design capture (HDL), pre-layout simulation, logic synthesis, floorplanning, placement, routing, circuit extraction, post-layout simulation, tape-out. Abstraction levels: behavioral, structural, physical. Design iteration until timing closure! Technology/library/manufacturer input.
16 Cell-based design (or standard cells). [Figure labels: feedthrough cell, logic cell, routing channel, functional module (RAM, multiplier, ...)] Routing channel requirements are reduced by the presence of more interconnect layers.
17 Standard cell example [Brodersen92]
18 Standard cell, the new generation: cell structure hidden under interconnect layers.
19 The design closure problem: iterative removal of timing violations (white lines). Courtesy Synopsys
20 Synthesis together with physical design. Inputs: RTL, (timing) constraints, macromodules (fixed netlists), technology/library/manufacturer input. Physical synthesis produces a netlist with place-and-route info; place-and-route optimization produces the artwork.
21 Benchmark on gate count?? Gate count (GE) depends on library and tools! Definition of one GATE? Example: PRESENT[20] contains 1,000 GE in 0.35 μm technology: 53,974 μm². PRESENT[20] contains 1,169 GE in 0.25 μm technology: 32,987 μm². PRESENT[20] contains 1,075 GE in 0.18 μm technology: 10,403 μm². Comparison is fair ONLY if the SAME library, SAME tools, and SAME settings are used.
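Why the same design yields different gate counts becomes visible when the three PRESENT results are normalised. A minimal sketch, assuming the usual convention that one GE is the area of the library's two-input NAND gate, so that area divided by GE gives the implied reference-gate area; the data are the three figures quoted on the slide, and the names are illustrative:

```python
# (technology node in um, gate equivalents, silicon area in um^2), from the slide.
present_results = [
    (0.35, 1000, 53974.0),
    (0.25, 1169, 32987.0),
    (0.18, 1075, 10403.0),
]


def area_per_ge(gate_equivalents: int, area_um2: float) -> float:
    """Implied area of one gate equivalent (um^2 per GE) for a given library."""
    if gate_equivalents <= 0:
        raise ValueError("gate_equivalents must be positive")
    return area_um2 / gate_equivalents


implied_ge_area = {
    node: area_per_ge(ge, area) for node, ge, area in present_results
}
for node, um2 in implied_ge_area.items():
    print(f"{node} um: {um2:.2f} um^2 per GE")
```

The reference gate shrinks with the technology, but not in the same proportion as the design, so the GE count moves by more than 15% across libraries even though the RTL is unchanged.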
22 Benchmark on synthesis settings?? The same VHDL design synthesized with different constraints will result in different performance. Benchmark on area-time product?? Note: the 2.7 GHz figure comes from the synthesis report: NOT FEASIBLE in practice! [source: M. Knezevic]
23 Context 2: FPGA design
24 Late-binding implementation. Array-based: pre-diffused (gate arrays) or pre-wired (FPGAs).
25 Look-up table based logic cell [Figure: look-up table with inputs In1, In2 and output Out]
26 LUT-based logic cell [Figure: Xilinx 4000 series logic cell (not the most up to date): inputs C1...C4, D1...D4, F1...F4; logic-function blocks set by control bits; multiplexers controlled by the configuration program. Courtesy Xilinx]
27 RAM-based FPGA: Xilinx XC4000ex. Courtesy Xilinx
28 Xilinx Virtex-II Pro FPGA: IBM PowerPC RISC CPU, synchronous dual-port RAM, Conexant 3.125 Gb serial, XtremeDSP, SelectIO-Ultra, SystemIO & XCITE.
29 Multi-pass place-and-route analysis: GMU SHA-512 on Xilinx Virtex, runs for different placement starting points. [Figure: minimum clock per run, the smaller the better; best and worst differ by ~20%.] [courtesy: Kris Gaj]
30 Dependence of results on requested clock frequency [courtesy: Kris Gaj]
31 Saar Drimer, Ph.D. thesis, Figure 5.2: distribution of the maximum achievable clock frequency for place & route with 100 different PAR seeds. 1 & 2: for 1 or 4 AES instances. 3 & 4: same on a different platform. 5: different speed grade.
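Because one design gives a spread of results across placement seeds, a single number is not enough for a fair report. A minimal sketch of summarising such a sweep; the five clock periods are hypothetical placeholder values (not data from Gaj or Drimer), and the function name is an assumption:

```python
import statistics


def summarize_clock_periods(periods_ns):
    """Return best, median, worst, and relative spread of minimum clock periods."""
    if not periods_ns:
        raise ValueError("need at least one place-and-route result")
    best = min(periods_ns)    # smaller period is better
    worst = max(periods_ns)
    return {
        "best_ns": best,
        "median_ns": statistics.median(periods_ns),
        "worst_ns": worst,
        "spread": (worst - best) / best,  # worst relative to best
    }


# Hypothetical results of five place-and-route runs with different seeds.
hypothetical_periods_ns = [10.0, 10.4, 10.9, 11.3, 12.0]
summary = summarize_clock_periods(hypothetical_periods_ns)
print(summary)
```

Reporting the median and the spread next to the best value lets a reader judge whether a difference between two candidates is larger than the tool noise.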
32 FPGA benchmarks?? Easier than ASIC. Tools are (almost) free (at least at universities). Options: similar to software. Trend getting worse: the FPGA becomes a heterogeneous machine. Report with/without block RAMs; report with/without DSP multipliers; report with/without high-speed IO.
33 FPGA benchmarks?? Area numbers: slices, LUTs, CLBs, ... Xilinx application engineer: "The number of CLBs inside LUTs changes from generation to generation." (Or was it LUTs inside CLBs?) Speed: accurately reported by tools. Power: poorly reported by tools, hard to measure on board.
34 Context 3: HW-SW interface. Dan would call this the API?
35 Intro: SHA3-ZOO. 3 types of hardware reporting, but no interface! [Figure: SHA3 block with memory.] Fully autonomous; fully autonomous with external memory; core functionality. Integration of the hash module??
36 Integration of the hash module: options for HW/SW co-design. Option 1: instruction set extension. SHA3 is tightly coupled: reuse of busses, reuse of registers. Define an instruction, usually as a C intrinsic or pragma. Example: AES-NI of Intel (see Shay's presentation!). Example: build your own extension to an embedded processor, see e.g. Xtensa or Target Compiler Technologies.
37 Option 2: memory mapped. [Figure: main processor, SHA3 coprocessor, local memory.] Memory-mapped coprocessor; loosely coupled; typical for DSP and other embedded processors; no need to change the compiler. Check latency of coprocessor & memory consistency!
38 Option 3: novel forms of co-operation. [Figure: SHA3 core attached to routers.] Custom HW or Network on Chip (NOC); loosely coupled; flexible interconnect; popular for large multicore designs (80 or 100 cores); one of many other cores.
39 Can have different forms in one System-on-chip (SOC). [Figure: SoC block diagram with external memory, CPU (custom dp, I$, D$), memory, memory controller, timer, parallel I/O, local bus, high-speed bus, bridge, peripheral bus, DMA, bus master, UART, and custom HW attached to the buses or through direct I/O.]
40 AES acceleration for SH3-DSP. AES co-processor for a 128-bit key, using GEZEL; it communicates with the SH3-DSP ISS via the memory-mapped interface. KVM runs on the SH3-DSP ISS in the GEZEL-SH co-simulator. Software side: { volatile char *ins = (volatile char *)0x2f000; volatile int *dout = (volatile int *)0x2f004; volatile int *din = (volatile int *)0x2f008; } Memory-mapped interface: address 0x2f000: ins (8 bits); 0x2f004: dout (32 bits); 0x2f008: din (32 bits). Co-processor in the GEZEL simulation kernel: aes_top with aes_encoder; signals load, reset, key, text_in, done, text_out (128 bits). [Ref: Y. Matsuoka et al, CASES04]
41 AES optimization results. Number of clock cycles per AES encryption (key scheduling + block encryption), starting from the Java function call in the user application. KNI overhead limits the overall performance gain. [Table: cycles per interface (Java API I/F, KNI I/F, acceleration I/F, mem-mapped I/F) and total cycles for (a) Java, (b) Java+C, (c) Java+C+GEZEL. Speed-ups shown: (6.8x), (10.4x).] [Ref: Y. Matsuoka et al, CASES04]
42 Context 4: Bandwidth
43 Adapt HW platform to application. Simple example: key schedule for secret key. Two options: (1) on the fly = just-in-time processing: the key schedule feeds the block cipher (BC) directly, typical for hardware; (2) pre-compute and store in memory: key schedule, memory, BC, typical for software.
44 Key schedule on the fly: the cost of fast key context switching in SW. Example for an IPSEC router: one 128-bit key = 1408 bits of round keys (10 rounds + initial key), while half of internet packets are only 64 bytes in length (512 bits). [Figure: context bandwidth (Gbps) versus record size (bytes) for data at 1 Gbps, with curves for ARC4, AES, and 3DES.] [source: J. Goodman]
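The IPSEC numbers translate directly into a memory bandwidth requirement. A minimal sketch, assuming the worst case in which every record uses a different key, so the full set of pre-computed round keys is fetched once per record; the function names are illustrative:

```python
def aes128_round_key_bits(rounds: int = 10, block_bits: int = 128) -> int:
    """Bits of expanded key material: one round key per round plus the initial key."""
    return (rounds + 1) * block_bits


def context_bandwidth_gbps(context_bits: int, record_bytes: int,
                           data_rate_gbps: float = 1.0) -> float:
    """Bandwidth needed to load key context when each record brings a new key."""
    if record_bytes <= 0:
        raise ValueError("record_bytes must be positive")
    return data_rate_gbps * context_bits / (record_bytes * 8)


context_bits = aes128_round_key_bits()
small_packet_gbps = context_bandwidth_gbps(context_bits, record_bytes=64)
print(context_bits, small_packet_gbps)
```

Under this assumption, 64-byte packets at 1 Gbps need more bandwidth for round keys than for the data itself, which is the case for computing the key schedule on the fly in hardware.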
45 Benchmark?? Cost of the HW module (minimum minimorum). Key storage: assume sub-keys are computed on the fly. State storage: does all state need to be alive all the time? Wide pipe versus narrow pipe; windowing? Think context switching. Input block / output block: can I start processing input before the complete input block and/or padding is present? Same for output: can I send output, or do I have to wait for the complete output block?
46 Context 5: gap between application and architecture
47 Match between algorithm & architecture. Close the gap between the application and: dedicated HW (ASIC); programmable HW (FPGA); custom instructions, hand-coded assembly; compiled code; Java on a virtual machine, compiled on a real machine. [Figure axes: power, cost; platforms range from ASIC (fixed) through platform??? to general purpose.]
48 AES, 128-bit key, 128-bit data: throughput and energy numbers.
Platform | Throughput | Power | Figure of merit (Gb/s/W = Gb/J)
0.18 μm CMOS | 3.84 Gbits/sec | 350 mW | 11 (1/1)
FPGA [1] | 1.32 Gbits/sec | 490 mW | 2.7 (1/4)
ASM StrongARM [2] | 31 Mbits/sec | 240 mW | 0.13 (1/85)
ASM Pentium III [3] | 648 Mbits/sec | 41.4 W | (1/800)
C Emb. Sparc [4] | 133 Kbits/sec | 120 mW | (1/10.000)
Java [5] Emb. Sparc | 450 bits/sec | 120 mW | (1/ )
[1] Amphion CS5230 on Virtex2 + Xilinx Virtex2 Power Estimator
[2] Dag Arne Osvik: 544 cycles AES ECB on StrongARM SA-1110
[3] Helger Lipmaa PIII assembly handcoded + Intel Pentium III (1.13 GHz) Datasheet
[4] gcc, MHz Sparc, assumes 0.25 μm CMOS
[5] Java on KVM (Sun J2ME, non-JIT) on MHz Sparc, assumes 0.25 μm CMOS
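The figure of merit on this slide is throughput divided by power, which is the same as bits processed per joule. A minimal sketch that recomputes it from the throughput and power values quoted above; where the slide's own figure is not legible in this transcript, the value printed is computed here, not taken from the source, and the names are illustrative:

```python
# (platform, throughput in bit/s, power in W), as quoted on the slide.
platforms = [
    ("0.18 um CMOS", 3.84e9, 0.350),
    ("FPGA", 1.32e9, 0.490),
    ("ASM StrongARM", 31e6, 0.240),
    ("ASM Pentium III", 648e6, 41.4),
    ("C Emb. Sparc", 133e3, 0.120),
    ("Java Emb. Sparc", 450.0, 0.120),
]


def figure_of_merit(bits_per_sec: float, watts: float) -> float:
    """Energy efficiency in Gb/s/W, equivalently Gb/J."""
    if watts <= 0:
        raise ValueError("watts must be positive")
    return bits_per_sec / 1e9 / watts


merits = {name: figure_of_merit(bps, w) for name, bps, w in platforms}
asic = merits["0.18 um CMOS"]
for name, fom in merits.items():
    # Second column: how many times less efficient than the ASIC.
    print(f"{name:16s} {fom:.3g} Gb/J  (1/{asic / fom:.0f})")
```

The computed ratios land close to the rounded ratios on the slide for the rows where both are available, and they show the spread of several orders of magnitude between dedicated hardware and interpreted software.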
49 Context 6: transformations
50 Data flow graph representation. Illustrate with RIPEMD; indicate loops, operations, and delays. [Figure: RIPEMD data flow graph with nodes A, B, C, D, E, F, delay elements $T_D$, rotations rol(10) and rol(s), and inputs Ki, Xi.]
51 Iteration bound: the maximum over all loops $l$ of $t_l / w_l$, where $t_l$ is the loop calculation time and $w_l$ is the number of algorithmic delays (marked with $T_D$) in the $l$-th loop. [Figure: RIPEMD data flow graph.]
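The iteration bound is the largest ratio of loop calculation time to loop delay count over all loops of the data flow graph. A minimal sketch, assuming the loops have already been enumerated by hand; the three (t_l, w_l) pairs are hypothetical and are not the RIPEMD loops from the figure:

```python
def iteration_bound(loops):
    """Return max over loops of t_l / w_l.

    loops: iterable of (t_l, w_l) pairs, where t_l is the loop calculation
    time and w_l is the number of algorithmic delays in that loop.
    """
    ratios = []
    for t_l, w_l in loops:
        if w_l <= 0:
            raise ValueError("every loop needs at least one delay element")
        ratios.append(t_l / w_l)
    if not ratios:
        raise ValueError("the graph has no loops, so there is no bound")
    return max(ratios)


# Hypothetical graph, times in units of one adder delay:
# loop 1 takes 3 units with 1 delay, loop 2 takes 4 with 2, loop 3 takes 5 with 1.
hypothetical_loops = [(3, 1), (4, 2), (5, 1)]
bound = iteration_bound(hypothetical_loops)
print(bound)
```

No amount of pipelining or retiming can push the time per iteration below this value without changing the algorithm, which makes it a useful yardstick for how close an implementation is to the best possible clock period.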
52 Iteration bound [Figure: RIPEMD data flow graph.]
53 Critical path: the longest path between any two storage elements. It determines the clock frequency! Problem: critical path > iteration bound! [Figure: RIPEMD data flow graph.]
54 Retiming transformation: a transformation technique that changes the locations of unit-delay elements in a circuit without affecting the input/output characteristic. After retiming: critical path = iteration bound! [Figure: retimed RIPEMD data flow graph with inputs Ki+1, Xi+1.]
55 Hardware tricks. For speed: parallelism, pipelining, loop unrolling; on FPGA, block RAM instead of logic. For area: multiplexing; composite field instead of S-box. For power/energy: parallelism, pipelining.
56 Algorithm properties, as they affect HW realization: internal state; block size; initialization cost; iterative, sequential, parallelism.
57 Benchmark efforts: benchmarks on FPGA, ASIC; API efforts; open questions.
58 Stefan Tillich: see his presentation for the context.
59 Brian Baldwin: FPGA: CubeHash, Grostl, Shabal, SIMD, JH, Hamsi and Fugue. Core functionality & compression function. See his presentation for context.
60 Christian Wenzel-Benner: external Benchmarking extension.
61 Miroslav Knezevic: illustration of transformations, applied to Luffa and others. More observations.
62 Patrick Schaumont: API for HW. INIT & GETCONFIG: initialization, type of I/O, etc. IDATA & ODATA: parameter. 16, 32 bit: low-end processors; 64, 128 (256): high-end processors.
63 Kris Gaj: ATHENa. [Figure: flow between designer, user, and ATHENa server. 0: interfaces + testbenches; 1: download scripts and configuration files; 2: HDL + scripts + configuration files into FPGA synthesis and implementation (HDL + FPGA tools); 3: result summary + database entries; 4: database entries; 5: database query; 6: ranking of designs.]
64 ATHENa major features: synthesis, implementation, and timing analysis in batch mode; support for devices and tools of multiple FPGA vendors; generation of results for multiple families of FPGAs of a given vendor; automated choice of a best-matching device within a given family.
65 Open questions: area comparisons; throughput comparisons; power/energy comparisons; sets of environments.
66 Conclusions. Results depend on: ASIC set-up, FPGA set-up, hardware API, bandwidth, transformations. Need: a set of contexts; area and speed, but also POWER and ENERGY!
More informationSD-WAN as a Service Schedule Terms and Conditions & SLA
SD-WAN as a Service Schedule Terms and Conditions & SLA NEUTRONA S SD-WAN AS A SERVICE The following Neutrona s Software Defined WAN as a Service ( SD-WAN ) document is applicable as a Customer Experience
More informationOptimal Integer Delay Budget Assignment on Directed Acyclic Graphs
Optimal Integer Delay Budget Assignment on Directed Acyclic Graphs E. Bozorgzadeh S. Ghiasi A. Takahashi M. Sarrafzadeh Computer Science Department University of California, Los Angeles (UCLA) Los Angeles,
More informationChapter 7 A Multi-Market Approach to Multi-User Allocation
9 Chapter 7 A Multi-Market Approach to Multi-User Allocation A primary limitation of the spot market approach (described in chapter 6) for multi-user allocation is the inability to provide resource guarantees.
More informationCOS 318: Operating Systems. CPU Scheduling. Today s Topics. CPU Scheduler. Preemptive and Non-Preemptive Scheduling
Today s Topics COS 318: Operating Systems u CPU scheduling basics u CPU scheduling algorithms CPU Scheduling Jaswinder Pal Singh Computer Science Department Princeton University (http://www.cs.princeton.edu/courses/cos318/)
More informationA Branch-and-Price method for the Multiple-depot Vehicle and Crew Scheduling Problem
A Branch-and-Price method for the Multiple-depot Vehicle and Crew Scheduling Problem SCIP Workshop 2018, Aachen Markó Horváth Tamás Kis Institute for Computer Science and Control Hungarian Academy of Sciences
More informationBenchmarks Open Questions and DOL Benchmarks
Benchmarks Open Questions and DOL Benchmarks Iuliana Bacivarov ETH Zürich Outline Benchmarks what do we need? what is available? Provided benchmarks in a DOL format Open questions Map2Mpsoc, 29-30 June
More informationActive and Passive Side-Channel Attacks on Delay Based PUF Designs
1 Active and Passive Side-Channel Attacks on Delay Based PUF Designs Georg T. Becker, Raghavan Kumar Abstract Physical Unclonable Functions (PUFs) have emerged as a lightweight alternative to traditional
More informationArchitecture Exploration for Tree-based Option Pricing Models
Architecture Exploration for Tree-based Option Pricing Models MEng Final Year Project Report Qiwei Jin qj04@doc.ic.ac.uk http://www.doc.ic.ac.uk/ qj04/project Supervisor: Prof. Wayne Luk 2nd Marker: Dr.
More informationFAST ACCESS BLOCKCHAIN
WHITEPAPER October 10, 2017 FAST ACCESS BLOCKCHAIN A Highly Scalable Public Blockchain Network Fast Access Blockchain Foundation 68 West Bay Road, Cayman Islands Technical Partner: FA Enterprise System
More informationOutline. GPU for Finance SciFinance SciFinance CUDA Risk Applications Testing. Conclusions. Monte Carlo PDE
Outline GPU for Finance SciFinance SciFinance CUDA Risk Applications Testing Monte Carlo PDE Conclusions 2 Why GPU for Finance? Need for effective portfolio/risk management solutions Accurately measuring,
More informationEnergy-Efficient FPGA Implementation for Binomial Option Pricing Using OpenCL
Energy-Efficient FPGA Implementation for Binomial Option Pricing Using OpenCL Valentin Mena Morales, Pierre-Henri Horrein, Amer Baghdadi, Erik Hochapfel, Sandrine Vaton Institut Mines-Telecom; Telecom
More informationUNITED STATES SECURITIES AND EXCHANGE COMMISSION. Washington, D.C TRANSITION REPORT PURSUANT TO SECTION 13 OR 15(d) OF
UNITED STATES SECURITIES AND EXCHANGE COMMISSION Washington, D.C. 20549 FORM 10-K È ANNUAL REPORT PURSUANT TO SECTION 13 OR 15(d) OF THE SECURITIES EXCHANGE ACT OF 1934 For the fiscal year ended May 31,
More informationBlitzTrader. Next Generation Algorithmic Trading Platform
BlitzTrader Next Generation Algorithmic Trading Platform Introduction TRANSFORM YOUR TRADING IDEAS INTO ACTION... FAST TIME TO THE MARKET BlitzTrader is next generation, most powerful, open and flexible
More informationHardware Accelerators for Financial Mathematics - Methodology, Results and Benchmarking
Hardware Accelerators for Financial Mathematics - Methodology, Results and Benchmarking Christian de Schryver #, Henning Marxen, Daniel Schmidt # # Micrelectronic Systems Design Department, University
More informationThermOS. System Support for Dynamic Thermal Management of Chip Multi-Processors
22nd International Conference on Parallel Architectures and Compilation Techniques (PACT-22), 2013 September 9, 2013 Edinburgh, Scotland, UK ThermOS System Support for Dynamic Thermal Management of Chip
More information2007 Investor Meeting
2007 Investor Meeting December 11 th, 2007 Altera, Stratix, Cyclone, MAX, HardCopy, Arria, HardCopy, Nios, Quartus, Nios, Quartus, and MegaCore and MegaCore are trademarks are trademarks of Altera of Altera
More informationFujitsu s CPF Based Low Power Design Status & Today s Power Format
Fujitsu s CPF Based Low Power Design Status & Today s Power Format 2009/05/20 Fujitsu Microelectronics Ltd. Agenda Fujitsu s Low Power Design History and Results Fujitsu s CPF Low Power Design Flow CPF
More informationBCN1043. By Dr. Mritha Ramalingam. Faculty of Computer Systems & Software Engineering
BCN1043 By Dr. Mritha Ramalingam Faculty of Computer Systems & Software Engineering mritha@ump.edu.my http://ocw.ump.edu.my/ authors Dr. Mohd Nizam Mohmad Kahar (mnizam@ump.edu.my) Jamaludin Sallim (jamal@ump.edu.my)
More informationHedging Strategy Simulation and Backtesting with DSLs, GPUs and the Cloud
Hedging Strategy Simulation and Backtesting with DSLs, GPUs and the Cloud GPU Technology Conference 2013 Aon Benfield Securities, Inc. Annuity Solutions Group (ASG) This document is the confidential property
More informationThe good, the bad and the statistical
The good, the bad and the statistical Noel Menezes Strategic CAD Labs Design and Technology Solutions Intel Corp. Acknowledgements Keith Bowman Yossi Abulafia Steve Burns Mahesh Ketkar Vivek De Jim Tschanz
More informationWESTERNPIPS TRADER 3.9
WESTERNPIPS TRADER 3.9 FIX API HFT Arbitrage Trading Software 2007-2017 - 1 - WESTERNPIPS TRADER 3.9 SOFTWARE ABOUT WESTERNPIPS TRADER 3.9 SOFTWARE THE DAY HAS COME, WHICH YOU ALL WERE WAITING FOR! PERIODICALLY
More informationRules for the Technical Installations of the Trading Systems
Rules for the Technical Installations of the Trading Systems 1. General rules for access to the exchange EDP system (1) The Rules for the Technical Installations govern access to the EDP system of the
More informationPARELLIZATION OF DIJKSTRA S ALGORITHM: COMPARISON OF VARIOUS PRIORITY QUEUES
PARELLIZATION OF DIJKSTRA S ALGORITHM: COMPARISON OF VARIOUS PRIORITY QUEUES WIKTOR JAKUBIUK, KESHAV PURANMALKA 1. Introduction Dijkstra s algorithm solves the single-sourced shorest path problem on a
More informationA Pattern Matching Approach to Map Cognitive Domain Ontologies to the IBM TrueNorth Processor
A Pattern Matching Approach to Map Cognitive Domain Ontologies to the IBM TrueNorth Processor CCAA 217 Nayim Rahman 1, Tanvir Atahary 1, Tarek Taha 1, and Scott A. Douglass 2 1 Electrical and Computer
More informationScaling SGD Batch Size to 32K for ImageNet Training
Scaling SGD Batch Size to 32K for ImageNet Training Yang You Computer Science Division of UC Berkeley youyang@cs.berkeley.edu Yang You (youyang@cs.berkeley.edu) 32K SGD Batch Size CS Division of UC Berkeley
More informationPrintFleet Enterprise 2.2 Security Overview
PrintFleet Enterprise 2.2 Security Overview PrintFleet Inc. is committed to providing software products that are secure for use in all network environments. PrintFleet software products only collect the
More informationTake the lead on user experience, speed to market and upselling.
Take the lead on user experience, speed to market and upselling. Enhance user experience in all distribution channels, from traditional face-to-face to direct online distribution. Available disconnected
More informationEasy Ways to Use EFTPS. For Tax Practitioners, Accountants and. Payroll Companies
4 Easy Ways to Use EFTPS For Tax Practitioners, Accountants and Payroll Companies The Electronic Federal Tax Payment System EFTPS is the easiest way to make federal tax payments, and it offers you and
More informationSPEED UP OF NUMERIC CALCULATIONS USING A GRAPHICS PROCESSING UNIT (GPU)
SPEED UP OF NUMERIC CALCULATIONS USING A GRAPHICS PROCESSING UNIT (GPU) NIKOLA VASILEV, DR. ANATOLIY ANTONOV Eurorisk Systems Ltd. 31, General Kiselov str. BG-9002 Varna, Bulgaria Phone +359 52 612 367
More informationApplication of High Performance Computing in Investment Banks
British Computer Society FiNSG and APSG Public Application of High Performance Computing in Investment Banks Dr. Tony K. Chau Lead Architect, IB CTO, UBS January 8, 2014 Table of contents Section 1 UBS
More informationLTE RF Optimization Training
LTE RF Optimization Training Why should you choose LTE RF Optimization Training: Certified LTE Radio Planning & Optimization LTE RF Optimization Training provides knowledge and skills needed for successful
More information2004 ANNUAL REPORT & PROXY XILINX X X
X X X X 2004 ANNUAL REPORT & PROXY X XILINX X X X X X X ABOUT THE COVER When you want to stay connected, X marks the spot. Digital technology is becoming a central part of our lives. Everything is going
More informationA t S + b r t T B (h i + 1) (t S + t T ) C h i (t S + t T ) + t S + b t T D (h i + n) (t S + t T )
Suppose we have a primary B+-tree index where the leaves contain search keys and RIDs, and the RIDs point to the records in a file that is ordered on the index search key. Assume that the blocks in the
More informationFPGA based acceleration of compute-intensive workloads in finance. Intel Software Developer Conference London, 2017
FPGA based acceleration of compute-intensive workloads in finance Intel Software Developer Conference London, 2017 Trends FPGA architecture High level design flows Finance Library for FPGA 2 Where Intel-FPGAs
More informationBlack-Scholes option pricing. Victor Podlozhnyuk
Black-Scholes option pricing Victor Podlozhnyuk vpodlozhnyuk@nvidia.com Document Change History Version Date Responsible Reason for Change 0.9 007/03/19 Victor Podlozhnyuk Initial release 1.0 007/04/06
More informationAdvanced Verification Management and Coverage Closure Techniques
Advanced Verification Management and Coverage Closure Techniques Nguyen Le, Microsoft Harsh Patel, Mentor Graphics Roger Sabbagh, Mentor Graphics Darron May, Mentor Graphics Josef Derner, Mentor Graphics
More informationUnderstanding the customer s requirements for a software system. Requirements Analysis
Understanding the customer s requirements for a software system Requirements Analysis 1 Announcements Homework 1 Correction in Resume button functionality. Download updated Homework 1 handout from web
More information