International Journal of Applied Engineering Research ISSN 0973-4562 Volume 11, Number 13 (2016) pp 7992-7996
© Research India Publications. http://www.ripublication.com
Design and Implementation of Low Power and Efficient Adders - A Review
G.V. Arunraj1 and R. Varatharajan2
1 Research Scholar, Department of ECE, Bharath University, Selaiyur, Chennai, Tamil Nadu, India.
2 Professor and Head, Department of ECE, Sri Lakshmiammal College of Engineering, Tambaram East, Chennai (TN), India.
without the explosion of wires. Although this looks attractive, it
increases the logical depth. Han and Carlson [1987] give a
good overview of prefix addition formulations and present
their own hybrid synthesis of the Ladner-Fischer and Kogge-Stone
graphs. Again, this trades an increase in logical depth for
a reduction in fan-out. Kowalczuk et al. [1991] achieved a
similar compromise by serializing the prefix computation
occurring at the higher fan-out nodes, and Beaumont-Smith &
Burgess [1997] combined this idea with the Han-Carlson
scheme. Knowles [1999] showed that fan-out and wire flux
gains are available without increasing logical depth from the
minimum used in the Ladner-Fischer and Kogge-Stone
structures. Several investigations [Feng Liu et al.,
2009; Ziegler and Stan, 2004; Konstantinos Vitoroulis and Asim
J. Al-Khalili, 2007] compare different
parallel tree adders and show the advantages of these adders
in terms of area, fan-out and wire tracks. There are many carry
select adder approaches available, but most of them use ripple
carry adders [Youngjoon Kim and Lee-Sup Kim, 2001] to
implement the adder. Behnam Amelifard et al. [2005]
suggested a new adder called the carry select adder with sharing
(CSAS), which is area-efficient but has a larger delay. Alioto
et al. [2004] suggested variable block sizing
depending on the MUX delay. Some other papers [Kuldeep
Rawat, 2002; Yan Sun et al., 2008; Ramkumar and Harish M
Kittur, 2011] suggested using an add-one circuit to eliminate the
second adder that the CSA requires for the Cin=1 condition.
Feng Liu et al. [2009] compare different parallel prefix
adders in IBM's EAC adder, which is implemented using a
Kogge-Stone tree.
This paper briefly discusses various adders and their relative
advantages.
Abstract
In present VLSI circuit designs, there is an enormous
increase in power consumption due to the increasing speed
and complexity of the circuits. The demand for
portable equipment such as laptops and cellular phones is
increasing rapidly, and attention has therefore been focused on
power-efficient circuit designs. Adders are the basic building blocks
of complex arithmetic circuits and are widely used in the
Central Processing Unit (CPU), Arithmetic Logic Unit (ALU),
and floating point units, for address generation in
cache or memory access, and in digital signal processing. This
paper discusses various types of adders and their features.
A brief description of various multipliers is also provided.
The categorization of adders with respect to delay time and capacity is
illustrated. From the review, it is noted that
(i) designers should look carefully at the accuracy of their
models, keeping the consequences in view;
(ii) an accurate representation of how a microchip
dissipates power is instrumental to good design; and
(iii) circuit activity is one of the most important factors in
energy dissipation.
INTRODUCTION
The design of high-speed digital adders with efficient area and
power is one of the important areas of research in VLSI
system design. In digital adder circuits, the speed of addition
is limited by the time required for a carry to propagate through
the adder. The CSA (carry select adder) is used in many
arithmetic systems to solve the problem of carry propagation
delay by independently generating multiple carries and then
selecting a carry to generate the final sum [Bedrij, 1962;
Sklansky, 1960]. However, the CSA is not area efficient
because it uses multiple pairs of adders to generate partial sums
and carries for the carry inputs Cin=0 and Cin=1; the final sum
and carry are then selected by multiplexers.
Another approach is to express addition as a prefix computation. Several
examples of such adders have been published and there are
many efficient implementations. The Kogge and Stone [1973]
scheme uses the idempotency property to limit the lateral
logical fan-out at each node to unity, but at the cost of a
dramatic increase in the number of lateral wires at each level.
Ladner and Fischer [1980] introduced the minimum-depth
prefix graph. The longest lateral fanning wires go from a node
to n/2 other nodes. Capacitive fan-out loads become large for
later levels in the graph, as increasing logical fan-out combines
with increasing span of the wires. Buffering inverters must be
added appropriately to support these large loads, and there is a
corresponding increase in the delay. Brent & Kung [1982]
proposed fan-out trees such that the lateral fan-out of each
node is restricted to unity, as for the Kogge-Stone graph, but
THE ADDERS
Addition is the most common arithmetic
operation in microprocessors, digital signal processors and
digital computers in general. It also serves as a building
block for synthesizing all other arithmetic operations. Therefore,
for the efficient implementation of an arithmetic unit,
the binary adder structure is a very critical hardware
unit. With respect to asymptotic delay time and area
complexity, binary adder architectures can be categorized
into four primary classes, as given in Table 1 [Partha Sarathi
Mohanty, 2009]. The results given in the table are the highest-order
terms of the exact formulas, which become very complex for
large operand bit lengths. The first class consists of the
very slow ripple-carry adder with the smallest area. In the
second class, the carry-skip and carry-select adders with multiple
levels have small area requirements and shortened
computation times. From the third class, the carry-look ahead
adder and, from the fourth class, the parallel prefix adder
represent the fastest addition schemes with the largest area
complexities.
signals to avoid waiting for a ripple to determine if the first
group generates a carry or not. From the point of view of carry
propagation and the design of a carry network the actual
operand digits are not important. What matters is whether in a
given position a carry is generated, propagated or annihilated.
In the case of binary addition, the generate, propagate and
annihilate signals [Parhami, 2000; Weste, 1998;
Raahemifar and Ahmadi, 1999] are characterized by the
following logic equations:
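The equations themselves are not reproduced in this copy of the text. A reconstruction following the standard textbook definitions, assuming x_i and y_i denote the operand bits at position i (symbol names chosen here for illustration), would read:

```latex
\begin{align}
g_i &= x_i \, y_i \\
p_i &= x_i \oplus y_i \\
a_i &= \overline{x_i} \; \overline{y_i} = \overline{x_i \lor y_i}
\end{align}
```

Exactly one of g_i, p_i and a_i equals 1 in each position, which is why the carry network needs only these signals and not the operand bits.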
Table 1: Categorization of adders with respect to delay time and capacity
In general, five different types of adders are noted, namely
a pass logic based ripple adder, a mirror
ripple adder, a carry-skip adder, a carry-select adder, and a
Han-Carlson tree adder.
Thus, assuming that the above signals are produced and made
available, the rest of the carry network design can be based on
them and becomes completely independent of the operands or
even the number representation radix. Using the preceding
signals, the carry recurrence can be written as follows:
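The recurrence is not reproduced in this copy. In the standard formulation, assuming g_i and p_i are the generate and propagate signals at position i and c_i is the carry into position i (notation assumed here), it can be written and unrolled as:

```latex
\begin{align}
c_{i+1} &= g_i \lor p_i \, c_i \\
c_{i+1} &= g_i \lor p_i g_{i-1} \lor p_i p_{i-1} g_{i-2} \lor \dots
           \lor p_i p_{i-1} \cdots p_0 \, c_0
\end{align}
```

The first line is the ripple form; the unrolled second line is what a look-ahead network evaluates in parallel.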
Pass Ripple adder
The well-known ripple carry adder architecture is
composed of cascaded full adders, as shown in
Figure 1. The carry-out of one stage is fed directly to the carry-in
of the next stage, so an n-bit parallel adder requires n full
adders. This adder uses elements that are a combination of
NMOS pass logic and standard CMOS logic. It is the
simplest design, in which the carry-out of one bit is simply
connected as the carry-in to the next [Koc, 1996]. It can be
implemented as a combinational circuit using n full adders in
series.
The ripple carry adder (i) is not very efficient when operands with a
large number of bits are used and (ii) has a delay that increases
linearly with bit length. The delay from carry-in to carry-out is more
important than that from A to carry-out or from carry-in to SUM, because the
carry-propagation chain determines the latency of the
whole circuit for a ripple-carry adder.
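The cascading described above can be sketched as a small behavioural model. This is only an illustration: the function names `full_adder` and `ripple_carry_add` and the integer-based bit handling are assumptions made here, not part of the reviewed designs.

```python
def full_adder(a, b, cin):
    """One-bit full adder: returns (sum_bit, carry_out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout


def ripple_carry_add(x, y, n, cin=0):
    """Add two n-bit unsigned integers by chaining n full adders.

    The carry-out of stage i becomes the carry-in of stage i + 1, so the
    loop order mirrors the carry-propagation chain of the hardware.
    Returns (n-bit sum, final carry-out).
    """
    carry = cin
    total = 0
    for i in range(n):
        s, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        total |= s << i
    return total, carry
```

For example, `ripple_carry_add(0b1011, 0b0110, 4)` yields the 4-bit sum together with the carry out of the most significant stage.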
A typical carry-look ahead adder is shown in Fig. 2.
Figure 2: Typical carry-look ahead adder [Parhami, 2000]
The carry-look ahead adder can produce carries faster because the
carry bits are generated in parallel by additional circuitry
whenever the inputs change. This technique uses carry bypass
logic to speed up the carry propagation.
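As a minimal sketch of generating carries in parallel from generate and propagate signals, the following behavioural model evaluates each carry directly from the g and p bits and the initial carry, without waiting on the previous carry. The function name `cla_add` and the default width of 4 bits are assumptions for illustration; a real look-ahead network computes all of these product terms concurrently in hardware.

```python
def cla_add(x, y, n=4, c0=0):
    """Behavioural carry-look ahead addition of two n-bit unsigned integers.

    Each carry is computed from the generate (g) and propagate (p) signals
    and c0 only, using the unrolled form
        c[i+1] = g[i] | p[i]&g[i-1] | ... | p[i]&...&p[0]&c0.
    Returns (n-bit sum, carry-out).
    """
    g = [((x >> i) & 1) & ((y >> i) & 1) for i in range(n)]
    p = [((x >> i) & 1) ^ ((y >> i) & 1) for i in range(n)]

    carries = [c0]
    for i in range(n):
        term = g[i]          # carry generated at position i
        prod = p[i]          # running product p[i] & p[i-1] & ...
        for j in range(i - 1, -1, -1):
            term |= prod & g[j]
            prod &= p[j]
        term |= prod & c0    # incoming carry propagated through all stages
        carries.append(term)

    total = 0
    for i in range(n):
        total |= (p[i] ^ carries[i]) << i
    return total, carries[n]
```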
Carry Skip Adder
The carry skip adder [Parhami, 2000] was invented for
decimal arithmetic operations. The CSKA is an improvement
over the ripple-carry adder. By grouping the ripple cells
together into blocks, it makes the carry signal available earlier to
the blocks further down the carry chain. The primary carry
Ci coming into a block can go out of it unchanged if and only
if Xi XOR Yi = 1 for every bit position in the block, that is, if the
corresponding bits of both operands within the block are
dissimilar.
If Xi = Yi = 1, then the block generates a carry without
waiting for the incoming carry signal, and the generated carry
will be used by blocks beyond this block in the carry chain. If
Figure 1: Ripple carry implementation of Carry Propagate
Adder [Parhami, 2000]
The latency of a k-bit ripple-carry adder can be derived by
considering the worst-case signal propagation path.
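The resulting expression is not reproduced in this copy. A commonly quoted textbook form, assuming T_FA(u -> v) denotes the full-adder delay from input u to output v (notation assumed here), is:

```latex
\begin{equation}
T_{\text{ripple-add}} = T_{FA}(x, y \to c_{out})
  + (k - 2)\, T_{FA}(c_{in} \to c_{out})
  + T_{FA}(c_{in} \to s)
\end{equation}
```

The middle term dominates for large k, which is the linear growth of delay with bit length noted for the ripple-carry adder.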
Carry-look ahead Adder
The carry-look ahead adder [Weinberger and Smith, 1958]
computes group generate signals as well as group propagate
Xi = Yi = 0, then the block does not generate a carry and will
absorb any carry coming into it. The skip signal is generated by
ANDing all pi of a block, and a 2:1 multiplexer uses it to select
between the incoming carry and the generated carry, as shown in
Figure 3.
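The block-level skip logic can be illustrated with a short behavioural sketch. The function name `carry_skip_add` and the fixed block size are assumptions for illustration; the model shows the logical role of the skip multiplexer, not its timing benefit.

```python
def carry_skip_add(x, y, n, block=4, cin=0):
    """Behavioural carry-skip addition of two n-bit unsigned integers.

    Within each block the carry ripples bit by bit. The skip signal is the
    AND of all propagate bits (p = a XOR b) in the block; when it is 1, the
    2:1 multiplexer forwards the block's carry-in, otherwise it forwards
    the carry produced by the ripple chain. Returns (n-bit sum, carry-out).
    """
    total = 0
    carry = cin
    for lo in range(0, n, block):
        hi = min(lo + block, n)
        block_cin = carry
        ripple = block_cin
        skip = 1
        for i in range(lo, hi):
            a = (x >> i) & 1
            b = (y >> i) & 1
            p = a ^ b
            skip &= p
            total |= (p ^ ripple) << i
            ripple = (a & b) | (p & ripple)
        # 2:1 multiplexer controlled by the skip signal
        carry = block_cin if skip else ripple
    return total, carry
```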
Features:
A larger area is required because of the multiplexers.
The delay is lower than that of Ripple Carry Adders (about half
the delay of an RCA).
Hence the Carry Select Adder is generally preferred when
working with a smaller number of bits.
THE MULTIPLIERS
Multiplication is one of the basic functions used in digital
signal processing (DSP). It requires more hardware resources
and processing time than addition and subtraction. In fact,
8.72% of all instructions in a typical processing unit are
multiplications. In computers, a typical central processing unit
devotes a considerable amount of processing time to
implementing arithmetic operations, particularly
multiplication operations. Most high-performance digital
signal processing systems rely on hardware multiplication to
achieve high data throughput. Multiplication is an important
fundamental arithmetic operation. Multiplication-based
operations such as Multiply and Accumulate (MAC) are
currently implemented in many Digital Signal Processing
(DSP) applications such as convolution, Fast Fourier
Transform (FFT) and filtering, and in the arithmetic and logic
units of microprocessors. Since multiplication dominates the
execution time of most DSP algorithms, there is a need for
high-speed multipliers.
Multipliers are of several types [Partha Sarathi
Mohanty, 2009].
Wallace Tree Multipliers
Booth’s Multiplier (radix-2)
Booth’s Multiplier (radix-4)
Analysis of Multipliers
Figure 3: Carry-skip logic multiplexers [Chan et al., 1991]
Carry Select Adder
The carry-select adder falls into the category of conditional
sum adders [Kantabutra, 1993]. A conditional sum adder computes its
outputs conditionally on the incoming carry. In this scheme, blocks of
bits are added in two ways, assuming an incoming carry of 0 or 1, with the
correct outputs selected later as the block's true carry-in
becomes known. With each level of selection, the number of
known output bits doubles, leading to a logarithmic
number of levels and thus logarithmic-time addition. Fig. 4
shows the basic building block of the CSLA.
Figure 4: Basic Building Block of CSLA (http://en.wikipedia.org/wiki/Carry-select_adder)
THE WALLACE TREE MULTIPLIER
The Wallace tree multiplier is considerably faster than a
simple array multiplier because its height is logarithmic in
word size, not linear. However, in addition to the large
number of adders required, the Wallace tree's wiring is much
less regular and more complicated. As a result, Wallace trees
are often avoided by designers when design complexity is a
concern. Wallace tree styles use a log-depth tree
network for reduction. Faster, but irregular, they trade ease of
layout for speed. Wallace tree styles are generally avoided for
low-power applications, since the excess wiring is likely to
consume extra power. While substantially faster than the carry-save
structure for large bit-width multipliers, the Wallace tree
multiplier has the disadvantage of being very irregular, which
complicates the task of coming up with an efficient layout. The
Wallace tree multiplier is a high-speed multiplier. The
summing of the partial product bits in parallel using a tree of
carry-save adders became generally known as the “Wallace
Tree”. A three-step process is generally used to multiply
two numbers:
(i) Formation of bit products.
(ii) Reduction of the bit product matrix into a two-row
matrix by means of carry-save adders.
(iii) Summation of the remaining two rows using a faster
Carry Look Ahead Adder (CLA).
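The three steps can be sketched behaviourally for unsigned operands. This is a simplified, word-level illustration: the names `carry_save_add` and `wallace_multiply` are assumptions, rows are reduced three at a time with 3:2 carry-save adders rather than with a bit-exact Wallace wiring, and the final two-row addition uses Python's `+` in place of a CLA.

```python
def carry_save_add(a, b, c):
    """3:2 compressor on whole rows: returns (sum_row, carry_row)."""
    sum_row = a ^ b ^ c
    carry_row = ((a & b) | (b & c) | (a & c)) << 1
    return sum_row, carry_row


def wallace_multiply(x, y, n):
    """Multiply two n-bit unsigned integers with a carry-save reduction tree."""
    # Step (i): form the partial products (one shifted row per multiplier bit).
    rows = [(x << i) if (y >> i) & 1 else 0 for i in range(n)]

    # Step (ii): reduce groups of three rows to two until two rows remain.
    while len(rows) > 2:
        full = len(rows) - len(rows) % 3
        reduced = []
        for i in range(0, full, 3):
            reduced.extend(carry_save_add(rows[i], rows[i + 1], rows[i + 2]))
        reduced.extend(rows[full:])  # leftover rows pass to the next level
        rows = reduced

    # Step (iii): final carry-propagate addition of the remaining rows.
    return sum(rows)
```

Each pass removes roughly one third of the rows, which is where the logarithmic depth of the reduction comes from.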
In the carry select adder scheme, blocks of bits are added in two
ways: one assuming a carry-in of 0 and the other a carry-in
of 1. This results in two precomputed sum and carry-out
signal pairs (s^0_{i-1:k}, c^0_i; s^1_{i-1:k}, c^1_i); later, as the block's true
carry-in (c_k) becomes known, the correct signal pair is
selected. Generally, multiplexers are used to propagate carries.
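A minimal behavioural sketch of this selection is given below. The names `block_add` and `carry_select_add` and the uniform block size are assumptions for illustration, and each block adder is modelled with integer addition rather than a gate-level ripple carry adder.

```python
def block_add(a, b, cin, width):
    """Model of one block adder: returns (width-bit sum, carry-out)."""
    total = a + b + cin
    return total & ((1 << width) - 1), total >> width


def carry_select_add(x, y, n, block=4, cin=0):
    """Behavioural carry-select addition of two n-bit unsigned integers.

    Every block precomputes (sum, carry-out) for carry-in 0 and carry-in 1;
    the true carry-in then selects one pair, as a multiplexer would.
    Returns (n-bit sum, carry-out).
    """
    result = 0
    carry = cin
    for lo in range(0, n, block):
        width = min(block, n - lo)
        mask = (1 << width) - 1
        a = (x >> lo) & mask
        b = (y >> lo) & mask
        s0, c0 = block_add(a, b, 0, width)   # assumes carry-in = 0
        s1, c1 = block_add(a, b, 1, width)   # assumes carry-in = 1
        s, carry = (s1, c1) if carry else (s0, c0)
        result |= s << lo
    return result, carry
```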
THE BOOTH’S MULTIPLIER
Although Wallace tree multipliers are faster than the
traditional carry-save method, they are very irregular and
hence complicate the layout. Moreover,
when the multiplier width grows beyond 32 bits, a large number of
logic gates and correspondingly more interconnecting
wires are required, which makes the chip design large and slows down
the operating speed. The Booth multiplier can be used in different
modes such as radix-2, radix-4, radix-8, etc.
SUMMARY AND CONCLUDING REMARKS
In digital adders, the speed of addition is limited by the time
required to propagate a carry through the adder. The sum for
each bit position in an elementary adder is generated
sequentially only after the previous bit position has been
summed and a carry propagated into the next position. The
circuit architecture is simple and area-efficient. However, the
computation speed is slow because each full adder can
start operating only once the previous carry-out signal is ready. On
the other hand, Carry Look-ahead Adders (CLAs) are the
fastest adders, but they are the worst from the area point of
view. Carry Select Adders (CSLAs) have been considered a
compromise solution between RCAs and CLAs because they
offer a good trade-off between the compact area of RCAs and
the short delay of CLAs. Reduced-area and high-speed data
path logic systems are among the main areas of research in VLSI
system design. High-speed addition and multiplication have
always been a fundamental requirement of high-performance
processors and systems. There are many types of adder designs
available (Ripple Carry Adder, Carry Look Ahead Adder,
Carry Save Adder, Carry Skip Adder), each of which has its own
advantages and disadvantages. The major speed limitation in
any adder is the production of carries, and many authors
have considered the addition problem. The CSLA was developed to
reduce the carry propagation delay. However, the
regular CSLA is not area efficient because it uses
multiple pairs of Ripple Carry Adders (RCAs) to generate
partial sums and carries for the carry inputs Cin=0 and Cin=1; the final
sum and carry are then selected by multiplexers (mux). The
use of two independent RCAs per block increases the area.
To overcome this problem,
the basic idea is to use n-bit binary to
excess-1 code converters (BEC) to improve the
addition. This logic replaces the RCA for Cin=1.
Using a Binary to Excess-1 Converter (BEC) instead of the RCA in the
regular CSLA is reported to achieve lower area and delay, which speeds up
the addition operation. The main advantage of the BEC logic
comes from its smaller number of logic gates compared with the Full
Adder (FA) structure.
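The role of the BEC can be illustrated with a small behavioural sketch: the Cin=1 result of a block is obtained by adding one to the Cin=0 result, so no second adder is needed. The names `bec` and `csla_block_with_bec` are assumptions for illustration, and the Cin=0 adder is modelled with integer addition.

```python
def bec(bits):
    """Binary to excess-1 converter on a list of bits (LSB first).

    Output bit i is bits[i] XOR (AND of all lower input bits), which equals
    the input value plus one, modulo 2**len(bits).
    """
    out = []
    all_ones_below = 1
    for b in bits:
        out.append(b ^ all_ones_below)
        all_ones_below &= b
    return out


def csla_block_with_bec(a, b, width):
    """Return ((s0, c0), (s1, c1)) for one CSLA block of the given width.

    (s0, c0) comes from a single adder with carry-in 0; (s1, c1) is derived
    from it by a (width + 1)-bit BEC instead of a second adder.
    """
    total0 = a + b                       # adder with carry-in = 0
    bits0 = [(total0 >> i) & 1 for i in range(width + 1)]
    bits1 = bec(bits0)
    total1 = sum(bit << i for i, bit in enumerate(bits1))
    mask = (1 << width) - 1
    return (total0 & mask, total0 >> width), (total1 & mask, total1 >> width)
```

A multiplexer driven by the block's true carry-in would then choose between the two returned pairs.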
Regarding the circuit area complexity in the adder
architectures, the ripple-carry adder (RCA) in the
first class is the most efficient one, but the carry
select adder (CSLA) in the fourth class with highest
complexity is the least efficient one.
Considering the circuit delay time, the Carry Select
Adder (CSLA) is the fastest one for every n-bit
length, and so has the shortest delay. Conversely, the Ripple
Carry Adder (RCA) is the slowest one, due to the
long carry propagation.
To make these choices, designers require information
about the consequences of their decisions, and circuit
designers use models of varying accuracy and speed
to help them foresee the consequences of their
choices. An accurate understanding of the causes and
BOOTH MULTIPLICATION ALGORITHM (radix – 4)
One of the solutions for realizing high-speed multipliers is to
enhance parallelism, which helps in decreasing the number of
subsequent calculation stages. The original version of
Booth's multiplier (radix-2) had two drawbacks:
The number of add/subtract operations is
variable, which is inconvenient when
designing parallel multipliers.
The algorithm becomes inefficient when there are
isolated 1s.
These problems are overcome by using the radix-4 Booth
algorithm, which scans strings of three bits at a time.
The Booth multiplier design considered here
consists of four Modified Booth Encoders (MBE),
four sign extension correctors, four partial product generators
(each comprising a 5:1 multiplexer) and finally a Wallace tree
adder. This Booth multiplier technique increases speed by
reducing the number of partial products by half. Since an 8-bit
Booth multiplier is used, only four
partial products need to be added instead of the eight partial
products generated by a conventional multiplier.
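The halving of the partial products can be illustrated with a behavioural sketch of radix-4 Booth recoding. The names `booth_radix4_digits` and `booth_multiply` are assumptions for illustration; the sketch treats the multiplier as an n-bit two's complement number with n even and does not model the encoder, sign extension corrector or multiplexer hardware.

```python
# Radix-4 Booth digit for the overlapping triplet (y[i+1], y[i], y[i-1]),
# equal to -2*y[i+1] + y[i] + y[i-1].
BOOTH_TABLE = {0b000: 0, 0b001: 1, 0b010: 1, 0b011: 2,
               0b100: -2, 0b101: -1, 0b110: -1, 0b111: 0}


def booth_radix4_digits(y, n=8):
    """Recode an n-bit two's complement multiplier (n even) into n/2 digits.

    Digits are returned least significant first, each in {-2, -1, 0, 1, 2}.
    """
    y_bits = y & ((1 << n) - 1)
    extended = y_bits << 1               # append the implicit y[-1] = 0
    digits = []
    for i in range(0, n, 2):
        triplet = (extended >> i) & 0b111
        digits.append(BOOTH_TABLE[triplet])
    return digits


def booth_multiply(x, y, n=8):
    """Multiply x by the n-bit signed multiplier y using n/2 partial products."""
    digits = booth_radix4_digits(y, n)
    partial_products = [d * x * (4 ** k) for k, d in enumerate(digits)]
    return sum(partial_products)
```

For an 8-bit multiplier the recoding yields four digits, so only four partial products are summed.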
The multiplier operand Y is often recoded into a radix higher
than 2 in order to reduce the number of partial products (PPs). The
most common recoding is radix-4 recoding with the digit set
{-2, -1, 0, 1, 2}. For a series of consecutive 1's, the recoding
algorithm converts them into 0's surrounded by a 1 and a (-1),
which has the potential of reducing switching activity. At the
binary level, there are many design possibilities. The
traditional design objectives are small delay and small area.
The power issues of different designs have not been addressed
well. In this section, we focus on the effects of radix-4
recoding schemes in multipliers and on optimizing their designs for
low power. Here, we give an overview and analysis of several
known recoding schemes and their designs, and compare their
timing and power consumption. Intuitively, radix-4
multipliers could consume less power than their radix-2
counterparts, as recoding reduces the number of PPs to half.
However, the extra recoding logic and the more complex PP
generation logic may present significant overheads. In
addition, recoding introduces extra unbalanced signal
propagation paths because of the additional delay on the paths
from operand Y to the product output. We have shown that
Wallace tree multipliers consumed less power than Booth-recoded
radix-4 multipliers, although the radix-2 scheme had
twice as many PPs as the radix-4 scheme. This leads us to
believe that the design of recoders and PP generators plays an
important role in the overall power consumption of
multipliers.
severity of the power dissipation within a chip might
influence a designer to make certain tradeoffs
between speed and power, but a less accurate
understanding might lead to tradeoffs that are worse
for overall performance. Clearly an accurate
representation of how a microchip dissipates power
is instrumental to good design. Circuit activity is
generally one of the most important factors in energy
dissipation.
REFERENCES
[1] O. J. Bedrij, “Carry-select adder,” IRE Trans.
Electron. Computer, 1962, pp. 340–344.
[2] J. Sklansky, “Conditional-Sum Addition Logic”, IRE
Transactions on Electronic Computers, 1960, vol.
EC-9, pp. 226-231.
[3] P.M. Kogge, H.S. Stone; “A Parallel Algorithm for
the Efficient Solution of a General Class of
Recurrence Equations”, IEEE Trans., Aug. 1973, C-22(8):786-793.
[4] R.E. Ladner, M.J. Fischer; “Parallel Prefix
Computation”, JACM, Oct. 1980, 27(4):831-838.
[5] R.P. Brent, H.T. Kung; “A Regular Layout for
Parallel Adders”, IEEE Trans, C-31(3):260-264,
March 1982.
[6] T. Han, D.A. Carlson; “Fast Area-Efficient VLSI
Adders”, 8th IEEE Symp. Computer Arithmetic,
Como, Italy, May 87, pp. 49-56.
[7] J. Kowalczuk, S. Tudor, D. Mlynek; “A New
Architecture for Automatic Generation of Fast
Pipelined Adders”, ESSCIRC, Milano, Italy, Sept 91,
pp. 101-104.
[8] A. Beaumont-Smith, N. Burgess; “A GaAs 32-bit
Adder”, 13th Symp. Computer Arithmetic, Asilomar,
California, June 1997, pp. 10-17.
[9] S. Knowles, “A family of adders”, Proceedings of
the 14th IEEE Symposium on Computer Arithmetic,
April 14-16, 1999, Adelaide, Australia.
[10] Feng Liu et al., “A Comparative Study of Parallel
Prefix Adders in FPGA Implementation of EAC”,
Proceedings of the 12th Euromicro Conference on
Digital System Design, 2009.
[11] Matthew M. Ziegler and Mircea R. Stan, “A Unified
Design Space for Regular Parallel Prefix Adders”,
Proceedings of the Design, Automation and Test in
Europe Conference and Exhibition, 2004.
[12] Konstantinos Vitoroulis and Asim J. Al-Khalili,
“Performance of Parallel Prefix Adders implemented
with FPGA technology”, IEEE 2007.
[13] Youngjoon Kim and Lee-Sup Kim, “A Low Power
Carry Select Adder with Reduced Area”, 2001 IEEE.
[14] Behnam Amelifard et al., “Closing the Gap between
Carry Select Adder and Ripple Carry Adder”,
Proceedings of the Sixth International Symposium on
Quality Electronic Design (ISQED’05), 2005.
[15] M. Alioto et al., “A Gate Level Strategy to Design
Carry Select Adders”, ISCAS 2004.
[16] Kuldeep Rawat et al., “A Low Power and Reduced
Area Carry Select Adder”, IEEE 2002.
[17] Yan Sun et al., “High-Performance Carry Select
Adder Using Fast All-one Finding Logic”, Second
Asia International Conference on Modelling &
Simulation, IEEE 2008.
[18] B. Ramkumar and Harish M Kittur, “Low-Power and
Area-Efficient Carry Select Adder”, IEEE Trans. on
Very Large Scale Integration Systems, 2011.
[19] Feng Liu et al., “A Comparative Study of Parallel
Prefix Adders in FPGA Implementation of EAC”,
12th Euromicro Conference on Digital System
Design / Architectures, Methods and Tools, IEEE
2009.
[20] Partha Sarathi Mohanty, "Design and
Implementation of Faster And Low Power
Multipliers", B.E. Thesis, National Institute of
Technology Rourkela, 2009.
[21] Koc, C.K., “RSA Hardware Implementation”, RSA
Laboratories, RSA Data Security, Inc., 1996.
[22] B. Parhami, “Computer Arithmetic, Algorithm and
Hardware Design”, Oxford University Press, New
York, pp. 73-137, 2000.
[23] A. Weinberger; J.L. Smith, “A Logic for High-Speed
Addition”, National Bureau of Standards, Circular
591, pp. 3-12, 1958.
[24] Neil H. E. Weste, “Principles of CMOS VLSI
Design”, Addison-Wesley, 1998.
[25] Raahemifar, K. and Ahmadi, M., “Fast carry look
ahead adder”, IEEE Canadian Conference on
Electrical and Computer Engineering, 1999.
[26] Chan, P. K., Schlag, M. D. F., Thomborson, C. D.
and Oklobdzija, V. G., “Delay Optimization of Carry-skip
Adders and Block Carry-lookahead Adders”,
Proceedings, IEEE Symposium on Computer
Arithmetic, 1991.
[27] Kantabutra, V., “Designing Optimum One-Level
Carry-Skip Adders”, IEEE Transactions on
Computers, pp. 759-764, June, 1993.
[28] http://en.wikipedia.org/wiki/Carry-select_adder