TSPC Flip-Flop Circuit Design with Three-Independent

TSPC Flip-Flop Circuit Design with
Three-Independent-Gate Silicon Nanowire FETs
Xifan Tang1 , Jian Zhang2 , Pierre-Emmanuel Gaillardon2 , Giovanni De Micheli2
1
School of Computer and Communication Sciences, EPFL, Switzerland
2
Integrated Systems Laboratory, EPFL, Switzerland
Email: [email protected]
Abstract—True Single-Phase Clock (TSPC) Flip-Flops, based
on dynamic logic implementation, are area-saving and highspeed compared to standard static flip-flops. Furthermore, logic
gates can be embedded into TSPC flip-flops which significantly
improves performance. As a promising approach to keep the pace
of Moore’s Law, functionality-enhanced devices with multiple independent gates have drown many recent interests. In particular,
Three-Independent-Gate Silicon Nanowire FETs (TIG SiNWFETs)
can realize the functionality of two serial transistors in a single
device. Therefore, they open new opportunities to compact designs
in both arithmetic and control circuits. In this paper, we propose
TSPC flip-flop implementation with asynchronous set and reset
using the compactness of TIG SiNWFET. Electrical simulations
show that TIG SiNWFET-based TSPC flip-flop improves nearly
20%, 30% and 7% in area, delay and leakage power respectively
as compared to its LSTP FinFET counterpart at 22nm.
I.
S
PGS
CG
PGD
D
(a)
S
S
PGS
CG
S
PGS
CG
CG
PGD
PGD
D
D
I NTRODUCTION
PGD = GND
D
PGS = VDD
(c)
As an essential component of sequential circuits, flipflops have a large impact on the area, speed and power
consumption of modern digital circuits [1]. Flip-flops can be
built using either static or dynamic design [2]. Static flip-flops
are usually based around two inverter-based latches, that have
a large impact on area and delay. However, flip-flops based on
dynamic logic have advantages in high operating speed and
area density compared to static ones [1]. Most flip-flops require
both clock signal and its inversion, which challenges clock
tree synthesis. However, True Single-Phase Clock (TSPC) flipflop [3], a dynamic flip-flop, needs only a single clock signal.
Using single phase clock and dynamic logic do not only lead to
compact design but also faster response. In addition, a unique
feature of TSPC flip-flop is the reduction of the overall setup
time and delay obtained by embedding logic gates inside the
structure [2]. For these reasons, TSPC flip-flop is widely used
in many high-speed applications [4].
In order to extend Moore’s Law, intensive researches on
emerging technologies, such as carbon nanotubes [5], graphene
[6], and SiNWFET [7] recently proposed novel devices that
have better channel controllability and functionality. Among
them, SiNWFET is promising thanks to its controllable ambipolar behavior [9] [7], i.e., the dynamic reconfiguration of
the device polarity from n- to p-type thanks to an extra gate
terminal, and its CMOS compatible fabrication process. Recent
researches [9] [10] present efficiencies of Double-IndependentGate (DIG) and Three-Independent-Gate (TIG) SiNWFETs
in implementing combinational circuits, in terms of area and
delay, due to the design compactness brought by the devices.
However, only limited investigations has been done so far on
sequential elements.
In this paper, we propose to combine the performance
of TSPC flip-flop design with the compactness offered by
978-1-4799-3432-4/14/$31.00 ©2014 IEEE
(b)
1660
Fig. 1. TIG SiNWFET structure (a), symbol (b) and polarity
control (c)
TIG SiNWFETs. A TIG SiNWFET-based TSPC flip-flop with
asynchronous set and reset is therefore presented. Performance
evaluation obtained by electrical simulations demonstrates an
area, delay and leakage power reduction of almost 20 %, 30 %
and 7% respectively. TSPC flip-flop with logic gate embedded
is investigated as well. For AND gate, TIG SiNWFET implementation presents a 21% area gain, a 6% speed up and 45%
leakage reduction.
The rest of the paper is organized as follows. In Section
II, basics of TIG SiNWFET and CMOS TSPC flip-flop are
introduced. In Section III, TIG SiNWFET TSPC flip-flop
circuit design and electrical characterizations are presented.
In Section IV, XOR embedded TIG SiNWFET TSPC flip-flop
is introduced. In Section V, conclusions are drawn.
II.
BACKGROUND
In this section, operation of TIG SiNWFET are briefly
introduced. Then, the classical CMOS design of TSPC flipflop and its logic-gate-embedded feature are reviewed.
A. TIG SiNWFET
As shown in Fig. 1, TIG SiNWFET [9] is a verticallystacked silicon nanowire FET with three independent gate-allaround regions. Two independent polarity gates, Polarity Gate
at Source (PGS) and Polarity Gate at Drain (PGD) respectively
control the tunneling of electrons and holes, while the Control
VDD
VDD
S
S
D
S
CLK
CLK
X
S
CLK
CLK
R
R
Y
Q
Y
A
B
CLK
S
CLK
R
R
CLK
A
S
B
GND
Q
X
Q
D
Qb
CLK
S
S
GND
GND
Fig. 2. CMOS TSPC flip-flop with set and reset [3] [14]
Fig. 3. CMOS AND-gate embedded TSPC schematic
Gate (CG) switches the channel conduction as in conventional
MOSFETs. As shown in Fig. 1, TIG SiNWFET can equivalently realize two serial transistors with a unique device, by
setting PGS to VDD (for two serial n-type transistors) or PGD
to GND (for two serial p-type transistors). In traditional CMOS
design style, replacing two serial transistors by a unique TIG
SiNWFET leads to about 33% area saving though the area
of a TIG SiNWFET is 50% larger than a MOSFET. Note
that two serial transistors frequently take place not only in
combinational gates, such as XOR, NAND, and AOI, but also
in dynamic logic, such as TSPC flip-flop. Hence, by applying
conversion in Fig. 1, area savings are expected. Additionally,
circuits can be speed up as a single TIG SiNWFET presents
less internal capacitance than two serial CMOS transistors.
B. CMOS TSPC Design
CMOS TSPC flip-flop can be built with only 9 transistors,
which is very compact as compared to static version with 22
transistors [2]. A TSPC flip-flops with asynchronous reset and
set requires 6 additional transistors for pulling-up to VDD or
pulling-down to GND at each stage.
As depicted in Fig. 2, CMOS TSPC flip-flop is composed
of three stages.
1) 1st Stage - Precharge Stage, CLK-Low Enable: The
first stage is a precharge stage. The stage is transparent when
CLK=0, i.e. the node X is updated with the value of D only
when CLK=0.
2) 2nd Stage - Latch Stage: The second stage is a latch
stage, storing the value of node X at the rising edge of CLK.
When CLK=0, node Y is precharged to VDD. At the rising
edge of CLK, node Y can be updated to GND if node X=1.
Note that even if node X can be pulled-down when CLK=1,
the latched value at node Y cannot be modified.
rd
3) 3 Stage - Precharge Stage, CLK-High Enable: The
third stage is another precharge stage but is transparent when
CLK=1. The third stage propagates node Y only when CLK=1.
However, based on precharging and discharging, TSPC
flip-flop depends on internal capacitance for storage, which
requires periodical refreshing (on the order of milliseconds
[2]).
In addition, logic gates can be directly inserted into the first
stage of CMOS TSPC flip-flop. Fig. 3 depicts a TSPC flip-flop
with a prior AND function. The setup time of a single TSPC
flip-flop increases but considering a AND gate cascaded by a
standard TSPC flip-flop, the overall setup time decreases [2].
1661
III.
S TANDARD TIG S I NWFET TSPC F LIP - FLOP
In this section, TIG SiNWFET-based TSPC flip-flop design
is proposed. First, we discuss the circuit structure. Second,
the functionality of the TIG SiNWFET TSPC flip-flop is
verified by electrical simulations, and comparison in area and
delay between standard CMOS and TIG SiNWFET designs is
discussed.
A. TSPC Flip-flop Structural Modifications
As introduced in Section II, TIG SiNWFET leads to area
and timing efficiency in realizing two serial in a unique device.
This situation frequently takes place in CMOS TSPC flipflop operations (Fig. 2). In addition, TIG SiNWFET can be
dynamically reconfigured from n- to p-type [9], depending
on the signal applied to its gates. Therefore, it is possible to
dynamically convert a pull-up transistor into a pull-down one
and vice-versa. A combination of these two properties is used
to compact the original TSPC flip-flop design.
Fig. 4 depicts a TIG SiNWFET TSPC flip-flop with asynchronous set and reset operations (High Enabled). The novel
circuit consists of only 8 TIG SiNWFETs and its schematic fits
the layout style in [10], consuming only 3 tiles. By applying
design rules of Fig. 1, all control gates and polarity gates
are fully used. Compared with CMOS design (Fig. 2), the
novel circuit removes the pull-up and pull-down transistors
at first stage and third stage. Transistor N1 (highlighted by
a red rectangle in Fig. 4) plays a dual role, as it replaces
part of pull-up and pull-down branches of the first stage of
Fig. 2. Transistor N2 plays an identical role for the third
stage. In addition to merge serial transistors into single devices,
improvements are done at the two pre-charge stages.
1) Improvements at 1st Stage: Asynchronous set requires
node X to be pulled down to GND at the first stage even
when CLK=0 and D=0. In CMOS design, an additional pulldown transistor, controlled by signal S, is required in order to
avoid a path from VDD to GND (blue dashed line in Fig. 2)
when CLK=1 and D=0. Then, to prevent another path from
VDD to GND (red dashed line at first stage in Fig. 2) when
CLK=0 and D=0, a pull-up transistor, controlled again by
signal S, is added. The transistor N1 in Fig. 4 realizes this
dual functionality. When set is enabled, transistor N1 switches
from n-type to p-type, while its source is set to VDD. Thus,
transistor N1 is only in on state when CLK=0 and X=0 (when
D=0, X=0), which ensures that node Y can be pulled up to
VDD whatever CLK and D are.
VDD
S
CLK
CLK
D
S
R
GND
Y
X
S
S
GND
S
N1
CLK
S
LSTP
TIG
FinFET
SiNWFET
Area(# of transistors)
15
12
Rise (ps)
-110.13
-109.22
Setup Time
Fall (ps)
-79.55
-81.75
Average (ps)
-94.84
-95.5
Rise (ps)
86.03
86.90
Hold Time
Fall (ps)
117.46
114.5
Average (ps)
101.75
100.7
Rise (ps)
22.22
13.42
Clock To Q Delay
Fall (ps)
34.88
25.35
Average (ps)
28.55
19.39
Leakage Power(nW)
0.26
0.24
*SiNWFET area = 1.5 * # of transistors.
*Setup time can be less than zero, which explained in [13].
Benchmark/TSPC
VDD
R
CLK
S
GND
N2
Q
VDD
S
VDD
D
VDD
TABLE I. Comparison between FinFET and TIG SiNWFET
TSPC flip-flops
S
R
Q
GND
Fig. 4. TIG SiNWFET TSPC schematic
2) Improvements at 3rd Stage: Asynchronous set requires
immediate logic-high output when enabled. Therefore, the
third stage should be pulled down once set is enabled even
when CLK=0. In CMOS design, a pull-down transistor is required. However, in TIG SiNWFET design, similarly, transistor
N2 switches from p-type to n-type, while its source is set to
GND when set is enabled. As node Y is pulled up to VDD
when set is enabled, transistor N2 is in on state, thereby pulling
down node Q to GND.
B. Transient Validation
Comparison
-20.00%
0.83%
-2.77%
-0.68%
1.01%
-2.52%
-1.03%
-39.60%
-27.32%
-32.10%
-7.00%
by electrical simulations, are compared between TSPC flipflop designs, implemented both in traditional CMOS (Fig. 2)
and in TIG SiNWFET (Fig. 4), using a technological node of
22nm. Note that static flip-flops are out of the scope of this
paper and therefore are not discussed. For CMOS technology,
we use PTM 22nm LSTP FinFET model [12]. To accurately
measure the minimum setup time/hold time and clock to Q
delay, a binary search approach is used by setting a delay
tolerance corresponding to 10% of the reference delay and a
resolution of 0.01ps [11].
To validate the correct behavior of the cell under asynchronous set and reset, electrical simulations are run and
transient waveforms are shown in Fig. 5 and Fig. 6. A simple
22nm TIG SiNWFET table-based compact model, derived
from [9], is used with HSPICE simulator. Fig. 5 verifies
that the output Q can be pulled up once set is enabled even
during the most challenging case (CLK=0, D=0 and Q=0). For
asynchronous reset, the most challenging case happens when
CLK=0, D=1 and Q=1. From Fig. 6, the output Q is observed
to be correctly pulled down, when set operation is triggered.
Once set/reset signals are de-asserted, the output Q switches
again accordingly to the next clock rising edge.
Table I shows the comparison the two TSPC flip-flop
implementations. Thanks to the compactness properties of TIG
SiNWFETs, an area saving of up to 20% is achievable. Regarding timing performances, TIG SiNWFET TSPC flip-flop
reduces its internal delay by 30% on average. The remarkable
performance gains come from the area reduction given by
TIG SiNWFETm as well as the intrinsic parasitic capacitance
reduction given by a unique device instead of two serial CMOS
transistors.
C. Circuit-level Performance Results
In addition to the realization of more compact pull-up
and pull-down networks, TIG SiNWFETs are also able to
implement the AND logic function natively [9]. This feature
can be efficiently embedded into TIG SiNWFET flip-flop.
To evaluate the performance of our flip-flop design, we
consider fours major metrics: area, setup time, hold time, and
clock to Q delay. In this section, experimental results, obtained
Set=1
CLK=0
D=0
Q: 0=>1
Set=1
S
CLK=1
CLK
D=0
D
Q holds
Q
Fig. 5. TIG SiNWFET TSPC transient simulation for
asynchronous set
IV.
L OGIC G ATES E MBEDDED TIG S I NWFET TSPC
FLIP - FLOP
Reset=1
Reset=0
CLK=0
CLK=1
CLK
D=1
D=1
D
Q: 1=>0
Q holds
Q
Fig. 6. TIG SiNWFET TSPC transient simulation for
asynchronous reset
1662
R
VDD
S
CLK
A
S
CLK R
B
CLK
S
GND
S
X
VDD
A
B
Y
GND
R
S
S
GND
S
V.
Qb
VDD VDD
CLK
S
R
S
note that the timing performance gain is lowered because of
the use of the low-leakage controllability gate of TIG FETs.
S
Q
CLK
GND
Fig. 7. TIG SiNWFET AND-gate embedded TSPC schematic
CLK
Rises
CLK
Rises
CLK
Rises
CLK
Rises
A=0
A=1
A=0
A=1
B=0
Q: 1=>0
In this paper, TSPC flip-flop circuit designs, leveraging
the compactness offered by TIG SiNWFETs, are proposed.
Experimental results, obtained by electrical characterization,
show that TIG SiNWFET implementation improves area, delay
and leakage power by nearly 20%, 30% and 7% respectively
compared to CMOS design. In addition, we show that an AND
gate can be embedded into the structure. This leads to an
area reduction of 21%, a delay reduction of 6% and a leakage
reduction of 45% on average.
ACKNOWLEDGMENT
CLK
This work has been partially supported by the ERC senior
grant NanoSys ERC-2009-AdG-246810.
A
R EFERENCES
[1]
B=0
B=1
B=1
Q=0
Q=0
Q: 0=>1
B
Q
[2]
[3]
Fig. 8. TIG SiNWFET AND-gate embedded TSPC transient
simulation
[4]
[5]
TABLE II. Comparison between LSTP FinFET and TIG
SiNWFET TSPC flip-flops
[6]
LSTP
FinFET
Area(# of transistors)
17
Rise (ps)
-9.21
Setup Time
Fall (ps)
28.14
Average (ps)
9.47
Rise (ps)
19.65
Hold Time
Fall(ps)
16.60
Average (ps)
18.13
Rise (ps)
131.11
Clock To Q Delay
Fall (ps)
121.01
Average (ps)
126.06
Leakage Power(nW)
0.61
*SiNWFET area = 1.5 * # of transistors
Benchmark/TSPC AND
TIG
SiNWFET
13.5
-12.51
20.99
4.24
11.96
17.15
14.56
89.95
145.97
117.96
0.33
C ONCLUSION
Comparison
[7]
-20.59%
-35.83%
-25.41%
-55.20%
-39.13%
3.31%
-19.70%
-31.39%
20.63%
-6.43%
-45.53%
[8]
[9]
[10]
For instance, Fig. 7 shows a TIG SiNWFET flip-flop with
AND gate embedded in the first stage. In the TIG SiNWFET
implementation, the first stage is a clocked AND gate, derived
from [9]. The following stages remain unchanged as compared
to Fig. 4. Note that the clock signal in the first stage is wired to
the low-leakage controllability gate, leading to a larger power
efficiency as a trade-off in internal delay.
Electrical simulations, depicted in Fig. 8, validates the
AND/TSPC flip-flop functionality. Indeed, it shows that the
output Q equals to A AND B at each clock rising edge. Finally,
we compare the performance between CMOS design in Fig. 3
and TIG SiNWFET design in Fig. 7. Table II shows that the
area shrinks by 21%, while the delay reduces by 6% on average
and leakage power drops by 45%. Leakage gain is accounted
for power-efficiency of the TIG AND gate [9]. Nevertheless,
1663
[11]
[12]
[13]
[14]
C. Teh, T. Fujita, H. Hara, and M. Hamada, A 77% energy-saving
22-transistor single-phase-clocking D-flip-flop with adaptive-coupling
configuration in 40nm CMOS, IEEE International Solid-State Circuits
Conference (ISSCC), pp. 338-340, Feb. 2011.
J. M. Rabaey, A. Chandrakasan, and B. Nikolic, Digital Integrated
Circuits, Second Edition, Prentice-Hall Publisher, 2002.
Y. Ji-ren and C. Svensson, New Single-Clock CMOS Latches and FlipFlops Improved Speed and Power Savings, IEEE Journal Solid-State
Circuits, Vol. 32, No. 1, pp. 62-69, Jan. 1997.
S. Nomura, et al., A 9.7mW AAC-Decoding, 620mW H.264 720p 60fps
Decoding, 8-Core Media Processor with Embedded Forward-BodyBiasing and Power-Gating Circuit in 65nm CMOS Technology, ISSCC
Dig. Tech. Papers, pp. 262-264, Feb. 2008.
Y. Lin et al., High-Performance Carbon Nanotube Field-Effect Transistor
with Tunable Polarities, IEEE Transaction Nanotechnology, Vol. 4, No.
5, pp. 481-489, 2005.
N. Harada et al., A polarity-controllable graphene inverter, Applied
Physics Letters, Vol. 96, No. 1, pp. 012102 - 012102-3, 2010.
M. De Marchi et al., Polarity Control in Double-Gate, Gate-All-Around
Vertically Stacked Silicon Nanowire FETs,
International Electron
Devices Meeting (IEDM), San Francisco, California, USA, pp. 841-844,
2012.
H. Iwai, Technology Roadmap for 22nm and Beyond, International
Workshop on Electron Devices and Semiconductor Technology, pp. 1-4,
2009.
J. Zhang, P.-E. Gaillardon, and G. De Micheli, Dual-Threshold-Voltage
Configurable Circuits with Three-Independent-Gate Silicon Nanowire
FETs, IEEE International Symposium on Circuits and Systems (ISCAS),
Beijing, China, pp. 2111-2114, 2013.
S. Bobba, M. De Marchi, Y. Leblebici and G. De Micheli, Physical Synthesis onto Sea-of-Tiles with Double-Gate Silicon Nanowire Transistors,
49th Design Automation Conference (DAC), San Francisco, California,
USA, pp. 42-47, 2012.
Cadence, Encounter Library Characterize User Guide, Product Version
8.1.3, September 2009.
Predictive technology model. [Online]. Available: http://ptm.asu.edu
P. Larson and C. Svensson, Impact of Clock Slope on True Single Phase
Clocked (TSPC) CMOS Circuits, IEEE Journal Solid-State Circuits, Vol.
29, No. 6, pp. 723-726, Jun. 1994.
N. Golinescu, A 2.5 Gb/s CMOS Add - Drop Multiplexer for ATM,
Master Thesis, University of Toronto, 1999.