Lecture 8: Skew Tolerant Domino Clocking

Computer Systems Laboratory, Stanford University
horowitz@stanford.edu
Copyright 2001 by Mark Horowitz (original slides from David Harris)

Introduction
Domino circuits are becoming ubiquitous in high-speed digital ICs
  Offer 30% (or more) speedup over static CMOS in raw gate delay
  Dual-rail domino is becoming more common because many functions are nonmonotonic and area is less of an issue
Nevertheless, traditional domino pipelines have significant overhead
  A latch is required to hold the result while the next stage evaluates and the previous stage precharges
  Skew budget, no time borrowing, latch delay
Look at several ways to reduce this overhead
  Better latches, self-timing
  Skew-tolerant domino is a powerful new technique
Evaluate performance benefits of skew-tolerant domino

Domino from a System Perspective
Domino doesn't look so attractive in the context of a traditional pipeline
[Figure: traditional domino pipeline clocked by clk and clk_b. Legend: Static = one inverting static gate; Domino = one inverting dynamic gate; Latch = inverting tristate latch]
1. Pay clock skew twice each phase
2. Balancing short phases is hard since there is no time borrowing
3. Latches become a significant fraction of the cycle time

Traditional Domino Performance Evaluation
Let T = cycle time = 20 FO4 delays; t_skew = 2; t_setup = 1 (see note 1)
Filling the cycle exactly is difficult (no time borrowing) -> t_imbalance = 1
T_{phase-logic} = T/2 - t_skew - t_setup - t_imbalance
Baseline design:
  T_{phase-logic} = 20/2 - 2 - 1 - 1 = 6
  40% of the phase is wasted in overhead! Slower than static!
Optimized design:
  Define clock domains and use t_{skew-local} = 1
  Work hard to balance logic between phases: t_imbalance = 0 (optimistic)
  T_{phase-logic} = 20/2 - 1 - 1 - 0 = 8
  Still, 20% of the phase is overhead!
Note 1: In this situation the setup time must be large enough that the output has settled before the clock arrives, since the output might feed a dynamic gate on the next cycle and might not be monotonic.
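The phase budget above is simple enough to check by hand, but a small script makes it easy to try other skew and setup values. The following is a minimal sketch, not part of the original slides; the function and parameter names are chosen here for illustration, and all times are in FO4 delays.

```python
def phase_logic_time(cycle_time, t_skew, t_setup, t_imbalance):
    """Logic time available per phase in a two-phase traditional domino pipeline.

    Implements T_phase_logic = T/2 - t_skew - t_setup - t_imbalance (all in FO4).
    """
    return cycle_time / 2 - t_skew - t_setup - t_imbalance


def overhead_fraction(cycle_time, t_skew, t_setup, t_imbalance):
    """Fraction of each half-cycle phase lost to skew, setup, and imbalance."""
    phase = cycle_time / 2
    logic = phase_logic_time(cycle_time, t_skew, t_setup, t_imbalance)
    return (phase - logic) / phase


# Values taken from the slide: T = 20 FO4.
baseline = phase_logic_time(20, t_skew=2, t_setup=1, t_imbalance=1)
optimized = phase_logic_time(20, t_skew=1, t_setup=1, t_imbalance=0)
print("baseline logic per phase: ", baseline)
print("optimized logic per phase:", optimized)
```

With the slide's numbers this gives 6 and 8 FO4 of logic per phase, that is, 40% and 20% of the 10 FO4 phase lost to overhead.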

Early Enhancements
Good designers have recognized this problem for years.
The largest problem is the hard edges set by the latches. A variety of latches soften this edge:
  Gate outputs are already _q1, so why use another clock? An SR latch will work instead
  Use the monotonic nature of the signal to feed it into a precharged latch stage
[Figure labels: SR Latch; Dual-Monotonic Latch (input from domino); TSPC Latch; clock φ]
Still have a problem if you want to use non-monotonic logic somewhere, since the logic must settle before the earliest clock, while the gate might not evaluate until a late clock
But if you only have monotonic gates...

Skew Tolerant Domino Clocking
If inputs are all dual rail, then as long as the clock arrives before the data,
  the gate will wait and fire when the data arrives
If the next gate fires before the current gate precharges,
  there is no need for a latch
  Like the self-timed pipeline
Can generate these properties using overlapping clocks

Skew-Tolerant Domino Circuits
How much clock skew could we tolerate given N clock phases?
Divide logic into N phases of T/N duration each.
Overlapping clocks eliminate the need for latches
Extra overlap accommodates clock skew and time borrowing
[Figure: domino gates and waveforms for the overlapping clock phases φ1 and φ2]
As with other domino techniques, budget skew on the transition from static to domino

Skew Tolerance
T = t_e + t_p
t_p = t_prech + t_skew;  t_e = T/N + t_skew + t_hold
Hence t_{skew-max} = [T(N-1)/N - t_prech - t_hold] / 2
[Figure: waveforms of φ1 and φ2 and of the skewed copies φ1a, φ1b, φ2a, marking t_e, t_p, and the effective precharge window; t_e must overlap by t_hold]
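The step from the two definitions to the skew bound is a direct substitution. Written out as a worked derivation (using the slide's own symbols, with nothing new assumed), it reads:

```latex
\begin{align*}
T &= t_e + t_p \\
  &= \left(\frac{T}{N} + t_{skew} + t_{hold}\right) + \left(t_{prech} + t_{skew}\right) \\
2\,t_{skew} &= T - \frac{T}{N} - t_{prech} - t_{hold}
             = \frac{T(N-1)}{N} - t_{prech} - t_{hold} \\
t_{skew\text{-}max} &= \frac{1}{2}\left[\frac{T(N-1)}{N} - t_{prech} - t_{hold}\right]
\end{align*}
```

The factor of 1/2 appears because skew is budgeted twice: once in the evaluate window and once in the precharge window.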

Numerical Example
Let t_prech = 4, long enough to:
  precharge the domino gate
  make the subsequent skewed static gate fall below V_t
t_hold is slightly negative for reasonable cell libraries
  the next phase can evaluate before precharge ripples through the static gate
  conservatively bound t_hold at 0

N    t_skew    t_p
2    2         6
3    3.33      7.33
4    4         8
6    4.66      8.66
8    5         9

Sweet spots: N=2 (fewest clocks), N=4 (good tolerance, 50% duty cycle)
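The table can be regenerated from the skew-tolerance formula. The sketch below is illustrative only: the slide does not state the cycle time used for this example, and T = 16 FO4 is an assumption inferred here because it is the value that reproduces the tabulated numbers. Function names are hypothetical. The slide's 4.66 and 8.66 appear to be truncated rather than rounded.

```python
def max_skew(cycle_time, n_phases, t_prech, t_hold=0.0):
    """Maximum tolerable skew: [T(N-1)/N - t_prech - t_hold] / 2 (all in FO4)."""
    budget = cycle_time * (n_phases - 1) / n_phases - t_prech - t_hold
    return budget / 2


def precharge_window(t_prech, t_skew):
    """Precharge period t_p = t_prech + t_skew."""
    return t_prech + t_skew


T_ASSUMED = 16.0  # ASSUMPTION: inferred from the table, not stated on the slide
T_PRECH = 4.0     # from the slide
T_HOLD = 0.0      # conservative bound from the slide

print("N   t_skew   t_p")
for n in (2, 3, 4, 6, 8):
    skew = max_skew(T_ASSUMED, n, T_PRECH, T_HOLD)
    print(f"{n}   {skew:6.2f}   {precharge_window(T_PRECH, skew):5.2f}")
```

Sweeping N this way also shows the diminishing returns behind the "sweet spots" remark: going from 4 to 8 phases adds only one more FO4 of skew tolerance.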

Global & Local Skew
This is good, but we can do better!
Local skew can be more tightly controlled than global skew (~1 FO4)
Require that each phase of logic fit in a local clock domain:
  t_p = t_prech + t_{skew-local};  t_e = T/N + t_{skew-global} + t_hold
Hence t_{skew-global-max} = T(N-1)/N - t_{skew-local} - t_prech - t_hold
When t_{skew-global} gets huge, precharge interferes with the subsequent phase

N    t_{skew-global}    t_p
2    2                  5
3    5.66               5
4    6                  6
6    6                  7.33
8    6                  8

Time Borrowing
We don't need such a large global skew tolerance!
Use some of this time instead to allow time borrowing
  t_borrow = T(N-1)/N - t_{skew-global} - t_{skew-local} - t_prech - t_hold
Intentional borrowing helps balance logic between phases
Opportunistic time borrowing compensates for uncertainties in models, analysis tools, and processing
If actual t_{skew-global} = 2, t_{skew-local} = 1:

N    t_borrow    t_p
2    1           5
3    3.66        5
4    5           5
6    6.33        5
8    7           5
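The time-borrowing budget follows the same pattern as the skew bound and can be tabulated the same way. This is a minimal sketch with hypothetical function names; as before, T = 16 FO4 is an assumption inferred from the tabulated values rather than a number given on the slide, while t_prech = 4 and t_hold = 0 come from the earlier numerical example.

```python
def time_borrow(cycle_time, n_phases, t_skew_global, t_skew_local,
                t_prech, t_hold=0.0):
    """Time available for borrowing (FO4):

    t_borrow = T(N-1)/N - t_skew_global - t_skew_local - t_prech - t_hold
    """
    return (cycle_time * (n_phases - 1) / n_phases
            - t_skew_global - t_skew_local - t_prech - t_hold)


T_ASSUMED = 16.0  # ASSUMPTION: inferred from the table, not stated on the slide

print("N   t_borrow")
for n in (2, 3, 4, 6, 8):
    borrow = time_borrow(T_ASSUMED, n, t_skew_global=2.0, t_skew_local=1.0,
                         t_prech=4.0, t_hold=0.0)
    print(f"{n}   {borrow:6.2f}")
```

In every row the precharge window stays at t_p = t_prech + t_{skew-local} = 5, which matches the constant t_p column in the table.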

Other Design Issues
State is no longer stored in the latch at the end of a phase
  Instead, it is held by the first domino gate in the phase
  Use a full keeper to allow stop-clock operation
[Figure: first φ2 domino gate with a weak full keeper, input from the φ1 block]
All systems with overlapping clocks require min-delay checks
  Domino paths are presumably critical anyway, so few min-delay errors
  4-phase has effectively no min-delay risk
    Overlap of all four phases is at most very small
    A minimum of 8 gates are in the cycle anyway

Skew-Tolerant Performance Evaluation
Evaluate ALU self-bypass of a superscalar microprocessor (like DEC Alpha)
  3-metal 0.6 µm process
  FO4 delay in TT corner = 138 ps
Compare traditional domino to 4-phase skew-tolerant domino
[Figure: two ALU self-bypass datapaths, Traditional and Skew-Tolerant, each showing: x2 Add/Sub, 64-bit Adder, Result Mux, x4 Bypass Mux, wire segments of 1 mm, 1 mm, and 2 mm, To Data Cache, and Other ALU blocks (150 fF)]

Simulation Results
No skew:
  Traditional domino: latency = 13.0 FO4, cycle time = 16.6 FO4
    Cycles are unbalanced; no time borrowing available
  Skew-tolerant domino: latency = 11.9 FO4, cycle time = 11.9 FO4
    Remove latches from critical path, balance pipe stages
1 FO4 local skew:
  Traditional domino: latency = 15.0 FO4, cycle time = 17.6 FO4
    Skew adds to both phases for latency
    Unbalanced second stage already has margin in the cycle time
  Skew-tolerant domino: latency = 11.9 FO4, cycle time = 11.9 FO4
    Skew is tolerated
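To put these results in absolute terms, the FO4 figures can be converted to picoseconds with the 138 ps FO4 delay quoted for the TT corner, and the two styles compared as a cycle-time ratio. This is a small illustrative calculation, not part of the original slides; the helper names are hypothetical.

```python
FO4_PS = 138.0  # FO4 delay in the TT corner, from the evaluation slide


def fo4_to_ps(delay_fo4):
    """Convert a delay expressed in FO4 units to picoseconds."""
    return delay_fo4 * FO4_PS


def cycle_time_ratio(traditional_fo4, skew_tolerant_fo4):
    """How many times longer the traditional cycle is than the skew-tolerant one."""
    return traditional_fo4 / skew_tolerant_fo4


# Cycle times (FO4) taken from the simulation results.
cases = {
    "no skew": (16.6, 11.9),
    "1 FO4 local skew": (17.6, 11.9),
}

for name, (trad, tol) in cases.items():
    print(f"{name}: traditional {fo4_to_ps(trad):.0f} ps, "
          f"skew-tolerant {fo4_to_ps(tol):.0f} ps, "
          f"ratio {cycle_time_ratio(trad, tol):.2f}x")
```

The ratio widens once skew is included, because the skew-tolerant cycle time stays at 11.9 FO4 while the traditional one grows.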

Summary
Skew-tolerant domino offers most of the benefits of self-timed designs while preserving the simplicity of a synchronous methodology.
  Clock generation & distribution becomes the key issue.
  However, control generation and distribution can be just as tough in self-timed designs.
Skew-tolerant domino eliminates most of the overhead found in traditional domino systems:
  Tolerates clock skew
  Removes latches from the critical path
  Allows time borrowing
  Robust
High-performance microprocessor designs have used these ideas, but they don't talk about them.