Introduction to CMOS VLSI Design
Lecture 19: Design for Skew
Credits: David Harris, Harvey Mudd College
(Material taken/adapted from Harris’ lecture notes)

Outline
Clock Distribution
Clock Skew
Skew-Tolerant Static Circuits
Traditional Domino Circuits
Skew-Tolerant Domino Circuits

19: Design for Skew CMOS VLSI Design Slide 2

Clocking
Synchronous systems use a clock to keep operations in sequence
  – Distinguish this from previous or next
  – Determine speed at which machine operates
Clock must be distributed to all the sequencing elements
  – Flip-flops and latches
Also distribute clock to other elements
  – Domino circuits and memories

Clock Distribution
On a small chip, the clock distribution network is just a wire
  – And possibly an inverter for clkb (clock’s complement)
On practical chips, the RC delay of the wire resistance and gate load is very long
  – Variations in this delay cause clock to get to different elements at different times
  – This is called clock skew
Most chips use repeaters to buffer the clock and equalize the delay
  – Reduces but doesn’t eliminate skew

Example
Skew comes from differences in gate and wire delay
  – With right buffer sizing, clk1 and clk2 could ideally arrive at the same time.
  – But power supply noise changes buffer delays
  – clk2 and clk3 will always see RC skew
[Figure: gclk routed to clk1, clk2, and clk3. Labels: wire lengths 3 mm, 3.1 mm, 0.5 mm; loads 1.3 pF, 0.4 pF, 0.4 pF]

Review: Skew Impact
Ideally full cycle is available for work
Skew adds sequencing overhead
Makes hold time worse
[Figure: flip-flops F1 and F2 around combinational logic (Q1 to D2), with setup and hold timing diagrams marking Tc, tpcq, tsetup, tskew, tcd, thold, tccq]
$t_{pd} \le T_c - (t_{pcq} + t_{setup} + t_{skew})$, where the term in parentheses is the sequencing overhead
$t_{cd} \ge t_{hold} - t_{ccq} + t_{skew}$

Cycle Time Trends
Much of CPU performance comes from higher f
  – f is improving faster than simple process shrinks
  – Sequencing overhead is bigger part of cycle
[Figure: four plots for the 80386, 80486, Pentium, and Pentium II / III over 1985 to 2000: clock frequency (MHz), SpecInt95, fanout-of-4 (FO4) inverter delay (ps) versus process and VDD, and FO4 inverter delays per cycle]

Solutions
Reduce clock skew
  – Careful clock distribution network design
  – Plenty of metal wiring resources
Analyze clock skew
  – Only budget actual, not worst case skews
  – Local vs. global skew budgets
Tolerate clock skew
  – Choose circuit structures insensitive to skew

Clock Dist.
Networks
Ad hoc
Grids
H-tree
Hybrid

Clock Grids
Use grid on two or more levels to carry clock
Make wires wide to reduce RC delay
Ensures low skew between nearby points
But possibly large skew across die

Alpha Clock Grids
[Figure: gclk grid plots for the Alpha 21064, Alpha 21164, and Alpha 21264, with the PLL marked]

H-Trees
Fractal structure
  – Gets clock arbitrarily close to any point
  – Matched delay along all paths
Delay variations cause skew
A and B might see big skew
[Figure: H-tree with points A and B marked]

Itanium 2 H-Tree
Four levels of buffering:
  – Primary driver
  – Repeater
  – Second-level clock buffer
  – Gater
Route around obstructions
[Figure: Itanium 2 clock network showing the primary buffer, repeaters, and typical SLCB locations]

Hybrid Networks
Use H-tree to distribute clock to many points
Tie these points together with a grid
Ex: IBM Power4, PowerPC
  – H-tree drives 16-64 sector buffers
  – Buffers drive total of 1024 points
  – All points shorted together with grid

Skew Tolerance
Flip-flops are sensitive to skew because of hard edges
  – Data launches at latest rising edge of clock
  – Must setup before earliest next rising edge of clock
  – Overhead would shrink if we can soften edge
Latches tolerate moderate amounts of skew
  – Data can arrive anytime latch is transparent

Skew: Latches
[Figure: latches L1, L2, L3 (D1/Q1, D2/Q2, D3/Q3) clocked by phases φ1 and φ2, separated by Combinational Logic 1 and Combinational Logic 2]
2-Phase Latches
$t_{pd} \le T_c - 2t_{pdq}$, where $2t_{pdq}$ is the sequencing overhead
$t_{cd1}, t_{cd2} \ge t_{hold} - t_{ccq} - t_{nonoverlap} + t_{skew}$
$t_{borrow} \le \frac{T_c}{2} - (t_{setup} + t_{nonoverlap} + t_{skew})$
Pulsed Latches
$t_{pd} \le T_c - \max(t_{pdq},\ t_{pcq} + t_{setup} - t_{pw} + t_{skew})$, where the max term is the sequencing overhead
$t_{cd} \ge t_{hold} + t_{pw} - t_{ccq} + t_{skew}$
$t_{borrow} \le t_{pw} - (t_{setup} + t_{skew})$

Dynamic Circuit Review
Static circuits are slow because fat pMOS load
input
Dynamic gates use precharge to remove pMOS transistors from the inputs
  – Precharge: φ = 0, output forced high
  – Evaluate: φ = 1, output may pull low
[Figure: static and dynamic gate schematics with inputs A, B, C, D and output Y]

Domino Circuits
Dynamic inputs must monotonically rise during evaluation
  – Place inverting stage between each dynamic gate
  – Dynamic / static pair called domino gate
Domino gates can be safely cascaded
[Figure: domino AND gate built from a dynamic NAND and a static inverter. Labels: A, B, W, X]

Domino Timing
Domino gates are 1.5–2x faster than static CMOS
  – Lower logical effort because of reduced Cin
Challenge is to keep precharge off critical path
Look at clocking schemes for precharge and eval
  – Traditional schemes have severe overhead
  – Skew-tolerant domino hides this overhead

Traditional Domino Ckts
Hide precharge time by ping-ponging between half-cycles
  – One evaluates while other precharges
  – Latches hold results during precharge
[Figure: two half-cycle pipelines of alternating dynamic and static gates, each ending in a latch, clocked by clk and its complement; Tc and tpdq marked]
$t_{pd} = T_c - 2t_{pdq}$

Clock Skew
Skew increases sequencing overhead
  – Traditional domino has hard edges
  – Evaluate at latest rising edge
  – Setup at latch by earliest falling edge
[Figure: the same traditional domino pipeline with skewed clocks; tsetup and tskew marked at the latch]
$t_{pd} = T_c - 2t_{setup} - 2t_{skew}$

Time Borrowing
Logic may not exactly fit half-cycle
  – No flexibility to borrow time to balance logic between half cycles
Traditional domino sequencing overhead is about 25% of cycle time in fast systems!
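As a rough check on that 25% figure, the skewed traditional-domino relation from the Clock Skew slide, $t_{pd} = T_c - 2t_{setup} - 2t_{skew}$, can be worked with numbers. The values below ($T_c$ = 1000 ps, $t_{setup}$ = 75 ps, $t_{skew}$ = 50 ps) are assumptions chosen for illustration, not data from the lecture.

```latex
\begin{align*}
\text{overhead} &= 2t_{setup} + 2t_{skew} = 2(75\,\text{ps}) + 2(50\,\text{ps}) = 250\,\text{ps} \\
t_{pd} &= T_c - \text{overhead} = 1000\,\text{ps} - 250\,\text{ps} = 750\,\text{ps} \\
\frac{\text{overhead}}{T_c} &= \frac{250\,\text{ps}}{1000\,\text{ps}} = 25\%
\end{align*}
```

Under these assumed values, setup time and skew each appear twice per cycle because there is a latch at both half-cycle boundaries, so every picosecond of skew costs two picoseconds of logic time.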
[Figure: traditional domino pipeline with latches at the half-cycle boundaries, clocked by clk and its complement; tsetup and tskew marked]

Relaxing the Timing
Sequencing overhead caused by hard edges
  – Data departs dynamic gate on late rising edge
  – Must setup at latch on early falling edge
Latch functions
  – Prevent glitches on inputs of domino gates
  – Holds results during precharge
Is the latch really necessary?
  – No glitches if inputs come from other domino
  – Can we hold the results in another way?

Skew-Tolerant Domino
Use overlapping clocks to eliminate latches at phase boundaries.
  – Second phase evaluates using results of first
[Figure: dynamic/static gate pairs clocked by overlapping phases φ1 and φ2, with nodes a, b, c, d and their waveforms; no latch at the phase boundary]

Full Keeper
After second phase evaluates, first phase precharges
Input to second phase falls
  – Violates monotonicity?
  – But we no longer need the value
Now the second gate has a floating output
  – Need full keeper to hold it either high or low
[Figure: dynamic gate with weak full keeper transistors. Labels: H, X, f]

Time Borrowing
Overlap can be used to
  – Tolerate clock skew
  – Permit time borrowing
No sequencing overhead
[Figure: two-phase skew-tolerant domino pipeline (Phase 1, Phase 2) of dynamic/static pairs clocked by φ1 and φ2; toverlap, tborrow, and tskew marked]
$t_{pd} = T_c$

Multiple Phases
With more clock phases, each phase overlaps more
  – Permits more skew tolerance and time borrowing
[Figure: four-phase skew-tolerant domino pipeline (Phase 1 through Phase 4) of dynamic/static pairs clocked by φ1 through φ4]

Clock Generation
[Figure: clock generator producing phases φ1 through φ4 from clk and en]

Summary
Clock skew effectively increases setup and hold times in systems with
hard edges
Managing skew
  – Reduce: good clock distribution network
  – Analyze: local vs. global skew
  – Tolerate: use systems with soft edges
Flip-flops and traditional domino are costly
Latches and skew-tolerant domino perform at full speed even with moderate clock skews.
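The comparison in this summary can be sketched numerically. The Python below is a minimal illustration, not part of the lecture: the function names and all parameter values are hypothetical. It applies the lecture's maximum-delay relations for flip-flops, two-phase latches, traditional domino, and skew-tolerant domino. The latch and skew-tolerant domino entries assume the skew is moderate enough to be absorbed by latch transparency or clock overlap.

```python
def sequencing_overhead(t_pcq, t_pdq, t_setup, t_skew):
    """Return per-cycle sequencing overhead (same time unit as the inputs).

    Relations follow the lecture's max-delay constraints. The latch and
    skew-tolerant domino entries assume skew stays within what the
    transparent window or clock overlap can absorb (an assumption here).
    """
    return {
        # Hard edges: clk-to-Q, setup, and skew all come out of the cycle.
        "flip-flop": t_pcq + t_setup + t_skew,
        # Soft edges: only two latch D-to-Q delays are paid per cycle.
        "two-phase latch": 2 * t_pdq,
        # Hard edges at a latch on both half-cycle boundaries.
        "traditional domino": 2 * t_setup + 2 * t_skew,
        # Overlapping clocks, no latches at phase boundaries.
        "skew-tolerant domino": 0.0,
    }


def logic_time(t_c, **params):
    """Time left for logic per cycle: t_pd = Tc - sequencing overhead."""
    overhead = sequencing_overhead(**params)
    return {name: t_c - ovh for name, ovh in overhead.items()}


if __name__ == "__main__":
    # Hypothetical values in picoseconds, chosen only for illustration.
    t_c = 1000.0
    params = dict(t_pcq=60.0, t_pdq=50.0, t_setup=75.0, t_skew=50.0)
    for name, t_pd in logic_time(t_c, **params).items():
        share = 100.0 * (t_c - t_pd) / t_c
        print(f"{name:22s} t_pd = {t_pd:6.1f} ps  overhead = {share:4.1f}%")
```

Skew appears only in the hard-edged entries (flip-flop and traditional domino), which is the sense in which soft-edged schemes tolerate it.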