COSC 6385 Computer Architecture. Fundamentals

Edgar Gabriel, Spring 2018

Measuring performance (I)

Response time: how long does it take to execute a certain application / a certain amount of work?

Given two platforms X and Y, X is n times faster than Y for a certain application if

\[ \frac{\mathrm{Time}_Y}{\mathrm{Time}_X} = n \tag{1} \]

The performance of X is n times higher than the performance of Y if

\[ n = \frac{\mathrm{Time}_Y}{\mathrm{Time}_X} = \frac{1/\mathrm{Perf}_Y}{1/\mathrm{Perf}_X} = \frac{\mathrm{Perf}_X}{\mathrm{Perf}_Y} \tag{2} \]

Measuring performance (II)

Timing how long an application takes:
- Wall clock time/elapsed time: time to complete a task as seen by the user. Might include operating system overhead or potentially interfering other applications.
- CPU time: does not include time slices introduced by external sources (e.g. running other applications). CPU time can be further divided into
  - User CPU time: CPU time spent in the program
  - System CPU time: CPU time spent in the OS performing tasks requested by the program.

Measuring performance

E.g. using the UNIX time command.

[Figure: output of the UNIX time command, annotated with elapsed time, user CPU time, and system CPU time]
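The distinction between elapsed, user CPU, and system CPU time can also be observed from inside a program. The following is a minimal Python sketch, not part of the original slides; the helper names `measure` and `busy_loop` are made up for illustration, and the measured values depend on the machine and its load.

```python
import os
import time


def measure(work):
    """Run work() once and return (elapsed, user_cpu, system_cpu) in seconds."""
    cpu_before = os.times()          # process CPU times so far
    wall_before = time.perf_counter()
    work()
    wall_after = time.perf_counter()
    cpu_after = os.times()
    elapsed = wall_after - wall_before
    user_cpu = cpu_after.user - cpu_before.user
    system_cpu = cpu_after.system - cpu_before.system
    return elapsed, user_cpu, system_cpu


def busy_loop(n=200_000):
    """Hypothetical CPU-bound workload used only for this illustration."""
    total = 0
    for i in range(n):
        total += i * i
    return total


elapsed, user_cpu, system_cpu = measure(busy_loop)
print(f"elapsed {elapsed:.4f}s  user {user_cpu:.4f}s  system {system_cpu:.4f}s")
```

For a CPU-bound loop like this one, most of the elapsed time should show up as user CPU time; a workload dominated by I/O or sleeping would show a much larger gap between elapsed and CPU time.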

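Amdahl's Law

Describes the performance gains by enhancing one part of the overall system (code, computer):

\[ \mathrm{Speedup} = \frac{\mathrm{Time}_{org}}{\mathrm{Time}_{enh}} = \frac{\mathrm{Perf}_{enh}}{\mathrm{Perf}_{org}} \tag{3} \]

Amdahl's Law depends on two factors:
- Fraction of the execution time affected by the enhancement
- The improvement gained by the enhancement for this fraction

\[ \mathrm{Time}_{enh} = \mathrm{Time}_{org} \left( (1 - \mathrm{Fraction}_{enh}) + \frac{\mathrm{Fraction}_{enh}}{\mathrm{Speedup}_{enh}} \right) \tag{4} \]

\[ \mathrm{Speedup}_{overall} = \frac{\mathrm{Time}_{org}}{\mathrm{Time}_{enh}} = \frac{1}{(1 - \mathrm{Fraction}_{enh}) + \dfrac{\mathrm{Fraction}_{enh}}{\mathrm{Speedup}_{enh}}} \tag{5} \]

Amdahl's Law (III)

\[ \mathrm{Speedup}_{overall} = \frac{1}{(1 - \mathrm{Fraction}_{enh}) + \dfrac{\mathrm{Fraction}_{enh}}{\mathrm{Speedup}_{enh}}} \]

[Figure: overall speedup plotted against the enhanced speedup (0 to 100) for enhanced fractions of 20%, 40%, 60%, and 80%]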
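Formula (5) is short enough to evaluate directly. The sketch below is an illustration rather than part of the original slides; the function name `amdahl_speedup` is an assumption. It also prints the bound 1 / (1 - Fraction_enh), which follows from (5) as the enhanced speedup grows without limit and explains why the curves in such a plot flatten out.

```python
def amdahl_speedup(fraction_enh, speedup_enh):
    """Overall speedup according to formula (5)."""
    if not 0.0 <= fraction_enh <= 1.0:
        raise ValueError("fraction_enh must lie in [0, 1]")
    if speedup_enh <= 0.0:
        raise ValueError("speedup_enh must be positive")
    return 1.0 / ((1.0 - fraction_enh) + fraction_enh / speedup_enh)


# Same enhanced fractions as the curves in the plot: 20%, 40%, 60%, 80%.
for fraction in (0.2, 0.4, 0.6, 0.8):
    at_100 = amdahl_speedup(fraction, 100.0)
    limit = 1.0 / (1.0 - fraction)   # bound for speedup_enh -> infinity
    print(f"fraction {fraction:.0%}: speedup at 100x = {at_100:.2f}, limit = {limit:.2f}")
```

Even with 80% of the execution time enhanced, the overall speedup stays below 5 no matter how large the enhanced speedup becomes.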

Amdahl's Law (IV)

[Figure: speedup according to Amdahl's Law; overall speedup plotted against the fraction enhanced (0 to 1) for enhanced speedups of 2, 4, and 10]

Amdahl's Law - example

Assume a new web-server with a CPU being 10 times faster on computation than the previous web-server. I/O performance is not improved compared to the old machine. The web-server spends 40% of its time in computation and 60% in I/O. How much faster is the new machine overall?

\[ \mathrm{Fraction}_{enh} = 0.4, \qquad \mathrm{Speedup}_{enh} = 10 \]

Using formula (5):

\[ \mathrm{Speedup}_{overall} = \frac{1}{(1 - \mathrm{Fraction}_{enh}) + \dfrac{\mathrm{Fraction}_{enh}}{\mathrm{Speedup}_{enh}}} = \frac{1}{(1 - 0.4) + \dfrac{0.4}{10}} = \frac{1}{0.64} \approx 1.56 \]

Amdahl's Law example (II)

Example: Consider a graphics card
- 50% of its total execution time is spent in floating point operations
- 20% of its total execution time is spent in floating point square root operations (FPSQR).

Option 1: improve the FPSQR operation by a factor of 10.
Option 2: improve all floating point operations by a factor of 1.6.

\[ \mathrm{Speedup}_{FPSQR} = \frac{1}{(1 - 0.2) + \dfrac{0.2}{10}} = \frac{1}{0.82} \approx 1.22 \]

\[ \mathrm{Speedup}_{FP} = \frac{1}{(1 - 0.5) + \dfrac{0.5}{1.6}} = \frac{1}{0.8125} \approx 1.23 \]

Option 2 is slightly faster.

CPU Performance Equation

Micro-processors are based on a clock running at a constant rate.
- Clock cycle time CC_time: length of the discrete time event in ns
- Equivalent measure: clock rate, expressed in MHz or GHz

\[ \mathrm{CPU}_r = \frac{1}{CC_{time}} \]

The CPU time of a program can then be expressed as

\[ \mathrm{CPU}_{time} = no_{cycles} \times CC_{time} \tag{6} \]

or

\[ \mathrm{CPU}_{time} = \frac{no_{cycles}}{\mathrm{CPU}_r} \tag{7} \]
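Formulas (6) and (7) describe the same quantity, once in terms of the clock cycle time and once in terms of the clock rate. The following sketch checks this numerically; the function names and the numbers (3 billion cycles, a 2 GHz clock) are hypothetical values chosen for illustration only.

```python
def cpu_time_from_cycle_time(no_cycles, cc_time_s):
    """Formula (6): CPU time = number of cycles x clock cycle time."""
    return no_cycles * cc_time_s


def cpu_time_from_clock_rate(no_cycles, clock_rate_hz):
    """Formula (7): CPU time = number of cycles / clock rate."""
    return no_cycles / clock_rate_hz


no_cycles = 3_000_000_000          # hypothetical cycle count of a program
clock_rate_hz = 2.0e9              # hypothetical 2 GHz clock
cc_time_s = 1.0 / clock_rate_hz    # 0.5 ns per cycle

t6 = cpu_time_from_cycle_time(no_cycles, cc_time_s)
t7 = cpu_time_from_clock_rate(no_cycles, clock_rate_hz)
print(f"(6): {t6:.3f} s, (7): {t7:.3f} s")
```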

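CPU Performance equation (II)

- CPI: average number of clock cycles per instruction
- IC: number of instructions

\[ \mathrm{CPI} = \frac{no_{cycles}}{IC} \tag{8} \]

Since the CPI is often known (average), the CPU time is

\[ \mathrm{CPU}_{time} = IC \times \mathrm{CPI} \times CC_{time} \tag{9} \]

Expanding formula (6) leads to

\[ \mathrm{CPU}_{time} = \frac{\text{instructions}}{\text{program}} \times \frac{no_{cycles}}{\text{instruction}} \times \frac{\text{time}}{no_{cycles}} \tag{10} \]

CPU performance equation (III)

According to (7), CPU performance depends on
- Clock cycle time: hardware technology
- CPI: organization and instruction set architecture
- Instruction count: ISA and compiler technology

Note: on the last slide we used the average CPI over all instructions occurring in an application. Different instructions can have strongly varying CPIs.

\[ no_{cycles} = \sum_{i=1}^{n} IC_i \times \mathrm{CPI}_i \tag{11} \]

\[ \mathrm{CPU}_{time} = \left( \sum_{i=1}^{n} IC_i \times \mathrm{CPI}_i \right) \times CC_{time} \tag{12} \]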
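Formulas (11) and (12) sum the cycles of each instruction class before converting to time. The sketch below works through one made-up instruction mix; the counts, CPI values, and clock cycle time are assumptions for illustration, not data from the slides. It also checks that the result agrees with formula (9) when the average CPI from (8) is used.

```python
def total_cycles(classes):
    """Formula (11): sum of IC_i * CPI_i over all instruction classes."""
    return sum(ic * cpi for ic, cpi in classes)


def cpu_time(classes, cc_time_s):
    """Formula (12): total cycles times the clock cycle time."""
    return total_cycles(classes) * cc_time_s


# Hypothetical mix: (instruction count, CPI) per instruction class.
classes = [
    (2_000_000, 1.0),    # simple integer instructions
    (500_000, 4.0),      # floating point instructions
    (100_000, 20.0),     # long-latency instructions
]
cc_time_s = 0.5e-9       # hypothetical 0.5 ns clock cycle (2 GHz)

cycles = total_cycles(classes)
time_12 = cpu_time(classes, cc_time_s)

# Cross-check with formulas (8) and (9): average CPI times IC times cycle time.
ic_total = sum(ic for ic, _ in classes)
avg_cpi = cycles / ic_total
time_9 = ic_total * avg_cpi * cc_time_s

print(f"cycles = {cycles:.0f}, average CPI = {avg_cpi:.4f}, CPU time = {time_12 * 1e3:.3f} ms")
```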

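CPU performance equation (IV)

The average CPI for an application can then be calculated as

\[ \mathrm{CPI} = \frac{\sum_{i=1}^{n} IC_i \times \mathrm{CPI}_i}{IC_{total}} = \sum_{i=1}^{n} \frac{IC_i}{IC_{total}} \times \mathrm{CPI}_i \tag{13} \]

IC_i / IC_total: fraction of occurrence of that instruction in a program.

Example (I) (Page 54 in the 6th Edition)

Consider a graphics card, with
- FP operations (including FPSQR): frequency 25%, average CPI 4.0
- FPSQR operations only: frequency 2%, average CPI 20
- all other instructions: average CPI 1.3333333

Design option 1: decrease CPI of FPSQR to 2.
Design option 2: decrease CPI of all FP operations to 2.5.

Using formula (13):

\[ \mathrm{CPI}_{org} = \sum_i \frac{IC_i}{IC_{total}} \times \mathrm{CPI}_i = (4 \times 0.25) + (1.333333 \times 0.75) = 2.0 \]

\[ \mathrm{CPI}_1 = \mathrm{CPI}_{org} - 0.02 \times (20 - 2) = 2.0 - 0.36 = 1.64 \]

\[ \mathrm{CPI}_2 = \sum_i \frac{IC_i}{IC_{total}} \times \mathrm{CPI}_i = (2.5 \times 0.25) + (1.333333 \times 0.75) = 1.625 \]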
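The arithmetic of Example (I) can be reproduced with a few lines of Python. This is a sketch, not part of the original slides; the name `weighted_cpi` is an assumption, and 4/3 is used in place of the rounded 1.3333333. Since instruction count and clock cycle time are unchanged by either design option, formula (9) implies that the speedup is simply the ratio of the CPI values.

```python
def weighted_cpi(mix):
    """Formula (13): sum of frequency_i * CPI_i over all instruction classes."""
    return sum(freq * cpi for freq, cpi in mix)


other = (0.75, 4.0 / 3.0)    # all other instructions, CPI 1.3333333

cpi_org = weighted_cpi([(0.25, 4.0), other])
# Option 1: FPSQR (2% of instructions) drops from CPI 20 to CPI 2.
cpi_opt1 = cpi_org - 0.02 * (20.0 - 2.0)
# Option 2: all FP operations (25% of instructions) drop to CPI 2.5.
cpi_opt2 = weighted_cpi([(0.25, 2.5), other])

speedup_opt1 = cpi_org / cpi_opt1
speedup_opt2 = cpi_org / cpi_opt2

print(f"CPI_org = {cpi_org:.3f}, CPI_1 = {cpi_opt1:.3f}, CPI_2 = {cpi_opt2:.3f}")
print(f"speedup option 1 = {speedup_opt1:.3f}, speedup option 2 = {speedup_opt2:.3f}")
```

The resulting speedups of roughly 1.22 and 1.23 match the values obtained for the same two options with Amdahl's Law.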

Example (II)

Slightly modified compared to the previous section: consider a graphics card, with
- FP operations (excluding FPSQR): frequency 25%, average CPI 4.0
- FPSQR operations: frequency 2%, average CPI 20
- all other instructions: average CPI 1.33

Design option 1: decrease CPI of FPSQR to 2.
Design option 2: decrease CPI of all FP operations to 2.5.

Using formula (13):

\[ \mathrm{CPI}_{org} = \sum_i \frac{IC_i}{IC_{total}} \times \mathrm{CPI}_i = (4 \times 0.25) + (20 \times 0.02) + (1.33 \times 0.73) = 2.3709 \]

\[ \mathrm{CPI}_1 = \sum_i \frac{IC_i}{IC_{total}} \times \mathrm{CPI}_i = (4 \times 0.25) + (2 \times 0.02) + (1.33 \times 0.73) = 2.0109 \]

\[ \mathrm{CPI}_2 = \sum_i \frac{IC_i}{IC_{total}} \times \mathrm{CPI}_i = (2.5 \times 0.25) + (20 \times 0.02) + (1.33 \times 0.73) = 1.9959 \]

Choosing the right programs to test a system

Most systems host a wide variety of different applications. Profiles of certain systems are given by their purpose/function:
- Web server: high I/O requirements, hardly any floating point operations
- A system used for weather forecasting simulations:
  - Very high floating point performance required
  - Lots of main memory
  - Number of processors has to match the problem size calculated in order to deliver at least real-time results

Choosing the right programs to test a system (II)

- Real application: use the target application for the machine in order to evaluate its performance. Best solution if the application is available.
- Modified applications: a real application has been modified in order to measure a certain feature. E.g. remove I/O parts of an application in order to focus on the CPU performance.
- Application kernels: focus on the most time-consuming parts of an application. E.g. extract the matrix-vector multiply of an application, since this uses 80% of the user CPU time.

Choosing the right programs to test a system (III)

- Toy benchmarks: very small code segments which produce a predictable result. E.g. sieve of Eratosthenes, quicksort.
- Synthetic benchmarks: try to match the average frequency of operations and operands for a certain program. The code does not do any useful work.
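To make the notion of a toy benchmark concrete, the following sketch times a sieve of Eratosthenes and verifies its predictable result. It is an illustration only, not a benchmark used in the course; the function names, the problem size, and the choice of taking the best of three runs are assumptions.

```python
import time


def sieve(limit):
    """Return all primes <= limit using the sieve of Eratosthenes."""
    if limit < 2:
        return []
    is_prime = [True] * (limit + 1)
    is_prime[0] = is_prime[1] = False
    for p in range(2, int(limit ** 0.5) + 1):
        if is_prime[p]:
            # Cross out every multiple of p, starting at p*p.
            for multiple in range(p * p, limit + 1, p):
                is_prime[multiple] = False
    return [i for i, flag in enumerate(is_prime) if flag]


def run_toy_benchmark(limit=100_000, repeats=3):
    """Time sieve(limit) several times; return (best CPU time, prime count)."""
    best = float("inf")
    count = 0
    for _ in range(repeats):
        start = time.process_time()      # CPU time of this process
        count = len(sieve(limit))
        best = min(best, time.process_time() - start)
    return best, count


best_time, prime_count = run_toy_benchmark()
print(f"{prime_count} primes up to 100000, best CPU time {best_time:.4f} s")
```

The known prime count (9592 primes up to 100000) is what makes the result easy to verify, but such a small kernel says little about how a real application with I/O and a large memory footprint would behave.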

SPEC Benchmarks

Slide based on a talk and courtesy of Matthias Mueller, RWTH Aachen

What is SPEC?

The Standard Performance Evaluation Corporation (SPEC) is a non-profit corporation formed to establish, maintain and endorse a standardized set of relevant benchmarks that can be applied to the newest generation of high-performance computers. SPEC develops suites of benchmarks and also reviews and publishes submitted results from our member organizations and other benchmark licensees.

For more details see http://www.spec.org

SPEC groups

Open Systems Group (desktop systems, high-end workstations and servers)
- CPU (CPU benchmarks)
- JAVA (Java client and server side benchmarks)
- MAIL (mail server benchmarks)
- SFS (file server benchmarks)
- WEB (web server benchmarks)

High Performance Group (HPC systems)
- OMP (OpenMP benchmark)
- HPC (HPC application benchmark)
- MPI (MPI application benchmark)

Graphics Performance Groups (Graphics)
- Apc (graphics application benchmarks)
- Opc (OpenGL performance benchmarks)

Why do we need benchmarks?

- Identify problems: measure machine properties
- Evolution: verify that we make progress
- Coverage: help vendors to have representative codes
- Increase competition by transparency
- Drive future development
- Relevance: help customers to choose the right computer

SPEC-Benchmarks

- All SPEC benchmarks are publicly available and well known/understood.
- Compilers can introduce special optimizations for these benchmarks, which might be irrelevant for other, real-world applications. Therefore:
  - the user has to provide the precise list of compile flags
  - the user has to provide the performance of the base (non-optimized) run
- Compilers can use statistical information collected during the first execution in order to optimize further runs (cache hit rates, usage of registers).
- Benchmarks are designed such that external influences are kept at a minimum (e.g. input/output).

SPEC CPU2017

- 43 independent programs
- SPECspeed suites always run one copy of each benchmark.
- SPECrate suites run multiple concurrent copies of each benchmark (the tester selects how many).
- 10 integer benchmarks: 5 written in C, 4 in C++, 1 in Fortran
- 14 floating point benchmarks: 6 in Fortran, 5 in C++, 3 in C
- Additional information is available for each benchmark:
  - Author of the benchmark
  - Detailed description
  - Documentation regarding input and output
  - Potential problems and references

SPEC CPU 2017 Integer benchmarks

SPEC CPU 2017 Floating Point Benchmarks

Example for a CINT benchmark

Performance metrics

SPECspeed Integer Metrics:
- SPECspeed2017_int_base (Required Base metric)
- SPECspeed2017_int_peak (Optional Peak metric)

SPECspeed Floating Point Metrics:
- SPECspeed2017_fp_base (Required Base metric)
- SPECspeed2017_fp_peak (Optional Peak metric)

SPECrate Integer Metrics:
- SPECrate2017_int_base (Required Base metric)
- SPECrate2017_int_peak (Optional Peak metric)

SPECrate Floating Point Metrics:
- SPECrate2017_fp_base (Required Base metric)
- SPECrate2017_fp_peak (Optional Peak metric)

Base metrics: all modules of a given suite must be compiled with the same arguments.
Peak metrics: different options for each benchmark are allowed.

Performance metrics (II)

All results are relative to a reference system. The final result is computed using the geometric mean.

Speed:

\[ \mathrm{SPEC}_{int/fp} = \sqrt[n]{\prod_{i=1}^{n} \left( \frac{t_i^{ref}}{t_i^{run}} \times 100 \right)} \]

Rate:

\[ \mathrm{SPEC}_{int/fp} = \sqrt[n]{\prod_{i=1}^{n} \left( \frac{t_i^{ref}}{t_i^{run}} \times 1.16 \times N \right)} \]

with:
- n: number of benchmarks in a suite
- t_i^ref / t_i^run: execution time for benchmark i on the reference/test system
- N: number of simultaneous tasks
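The geometric mean used for the final result can be sketched as follows. This is an illustration of the formulas as stated above, not the official SPEC tooling; the function names and all timing values are hypothetical.

```python
import math


def geometric_mean(values):
    """n-th root of the product of n positive values, computed via logarithms."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("values must be a non-empty list of positive numbers")
    return math.exp(sum(math.log(v) for v in values) / len(values))


def speed_metric(ref_times, run_times):
    """Speed metric: geometric mean of (t_ref / t_run) * 100."""
    return geometric_mean([ref / run * 100.0 for ref, run in zip(ref_times, run_times)])


def rate_metric(ref_times, run_times, n_tasks):
    """Rate metric: geometric mean of (t_ref / t_run) * 1.16 * N."""
    return geometric_mean(
        [ref / run * 1.16 * n_tasks for ref, run in zip(ref_times, run_times)]
    )


# Hypothetical execution times in seconds for a two-benchmark suite.
ref_times = [100.0, 400.0]   # reference system
run_times = [50.0, 100.0]    # system under test

speed = speed_metric(ref_times, run_times)
rate = rate_metric(ref_times, run_times, n_tasks=4)
print(f"speed metric = {speed:.2f}, rate metric = {rate:.2f}")
```

Because the geometric mean multiplies ratios, a constant scaling factor such as 100 or 1.16 × N can be pulled out of the root, so it changes the reported number but not the ranking of two systems.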

Reporting results

SPEC produces a minimal set of representative numbers:
+ Reduces the complexity of understanding correlations
+ Eases comparison of different systems
- Loss of information

Results have to be compliant with the SPEC benchmarking rules in order to be approved as an official SPEC report:
- All components have to be available at least 3 months after the publication (including a runtime environment for C/C++/Fortran applications)
- Usage of SPEC tools for compiling and reporting
- Each individual benchmark has to be executed at least three times
- Verification of the benchmark output
- A maximum of four optimization flags are allowed for the base run (including preprocessor and link directives)
- Disclosure report containing all relevant data has to be available
