IEEE TRANSACTIONS ON COMPUTERS, VOL. C-22, NO. 11, NOVEMBER 1973
Parallel Counters
EARL E. SWARTZLANDER, JR.
Abstract: Multiple-input circuits that count the number of their inputs that are in a given state (normally logic ONE) are called parallel
counters. In this paper three separate types of counters are described,
analyzed, and compared. The first counter consists of a network of
full adders. The second counter uses a combination of full adders and
fast adders (that may be realized with READ-ONLY memories), while the
third type of counter uses quasi-digital (i.e., analog current summing)
techniques to generate an analog signal proportional to the count which
is then digitized.
The delay and complexity of each of the three counters are derived
and compared. The full-adder counter is slower than the full-adder/
fast-adder counter, while the quasi-digital counter appears to be potentially faster than either of the strictly digital counters (although it is
more complex and is prone to problems due to drift, etc.).
Index Terms: Associative processors, carry-shower counters, computer arithmetic units, digital counters, full-adder counters, multipliers,
parallel counters, quasi-digital processing, READ-ONLY memory fast
adders, response counters.
INTRODUCTION
A PARALLEL counter is a device that determines how
many of its inputs are active (e.g., in the logic ONE state).
Alternatively, a parallel counter may be viewed as a multiple-word input adder with 1-bit word length. Counters are useful
in the realization of parallel multipliers [1]-[3], computer
arithmetic units [4], multiple-input adders [5], [6], and
associative processors [7]. In this paper three forms of
counters are presented and examined. The full-adder counter
that has been studied elsewhere [7] is reviewed and an upper
bound is developed for its delay. A modification of the full-adder counter that uses READ-ONLY memories as fast adders is
developed that may exhibit nearly twice the speed of the full-adder counter. Finally a quasi-digital (i.e., partially analog)
counter is described that is potentially faster than either of
the strictly digital counters.
Many methods may be used to implement parallel counters.
Some of the more obvious techniques are: two-level gate network; READ-ONLY memory; full-adder network; full-adder/
fast-adder array; and quasi-digital processor.
To implement an N-input counter by using a two-level gate network requires $2^N - 1$ logic AND gates (each with N inputs) followed by a reduction network that requires M logic OR gates (each with on the order of $2^{N-1}$ inputs), where M is the number of outputs from the counter:

$$M = 1 + \lfloor \log_2 (N) \rfloor \tag{1}$$

where $\lfloor X \rfloor$ denotes the largest integer i such that $i \le X$ (it is equivalent to the functions ENTIER (X), FLOOR (X), and [X] used by other authors). Clearly then the two-level gate network is impractical for most values of N.

Manuscript received July 12, 1972; revised May 21, 1973.
The author was with Hughes Aircraft Company, Culver City, Calif. 90230. He is now with Technology Service Corporation, Santa Monica, Calif. 90404.

The READ-ONLY memory approach requires a memory with
$2^N$ words of M bits each to implement an N-input counter.
This total storage requirement of $M \cdot 2^N$ bits is also impractical
for most values of N.
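
To make this growth concrete, the short sketch below evaluates (1) together with the AND-gate count of the two-level network and the storage of the READ-ONLY memory approach. It is only an illustration; the function names are hypothetical and do not come from the paper.

```python
def output_bits(n):
    """M from (1): 1 + floor(log2(N)), computed exactly with integers."""
    return n.bit_length()


def two_level_and_gates(n):
    """AND gates in the two-level network: 2**N - 1, each with N inputs."""
    return 2 ** n - 1


def rom_bits(n):
    """Storage of the single-memory approach: 2**N words of M bits each."""
    return output_bits(n) * 2 ** n


if __name__ == "__main__":
    print(" N  M  AND gates  ROM bits")
    for n in (3, 7, 15, 31):
        print(f"{n:2d} {output_bits(n):2d} {two_level_and_gates(n):10d} {rom_bits(n):12d}")
```

Both costs grow exponentially with N, which is why the remaining sections consider only the adder-based and quasi-digital realizations.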
The full-adder network, the full-adder/fast-adder array, and
the quasi-digital processor all appear to be of practical interest
and will be analyzed and compared in terms of speed and
simplicity in this paper.
FULL-ADDER NETWORK COUNTERS
Recently Foster and Stockton [7] have described a method
for implementing counters with a network of full adders.
Basically their design procedure involves grouping the counter
inputs into sets of three lines each. A full adder reduces each
set of three lines into a line with weight 1 (the sum output of
the adder) and a line with weight 2. This results in approximately N/3 lines of weight 1 and N/3 lines of weight 2 (the
count is equal to the sum of the products of the weights times
the number of active lines of each weight). Each of these sets
of lines is separated into groups of three and the reduction
continued until only one line of each weight remains. Analysis of this method yields a lower bound on the number of adder delay times, $\delta$, required to determine the count for an N-input counter (the sum and carry outputs of a full adder are each assumed to exhibit the same delay):

$$\delta \ge \lfloor \log_3 (N - 1) \rfloor + \lfloor \log_2 (N) \rfloor. \tag{2}$$

This represents a slight improvement on the lower bound given by Foster and Stockton [7, eq. (3)]. It is obtained by noting that [7, eq. (2)] should be $M = \lceil \log_2 (N + 1) \rceil$, which gives $T \ge \lceil \log_3 (N) \rceil + \lceil \log_2 (N + 1) \rceil - 2$ (equivalent to (2) as stated in this paper).
As shown in [7], at most N full adders are required to implement an N-input counter.
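
The reduction by groups of three can be sketched in a few lines of Python. This is a minimal model and not necessarily the exact procedure of [7]: it assumes that the earliest-arriving lines of each weight are grouped first, that a leftover pair is combined by a half adder with the same unit delay, and that every adder output appears one delay after its latest input. The names `full_adder_counter`, `lower_bound`, and `ilog` are hypothetical.

```python
def ilog(x, base):
    """Largest integer e with base**e <= x (returns 0 for x < base)."""
    e = 0
    while base ** (e + 1) <= x:
        e += 1
    return e


def lower_bound(n):
    """Right-hand side of (2): floor(log3(N - 1)) + floor(log2(N))."""
    return ilog(n - 1, 3) + ilog(n, 2)


def full_adder_counter(bits):
    """Reduce the inputs with adders; return (count, delay in adder delays)."""
    # columns[w] holds the lines of weight 2**w as (value, arrival_time) pairs
    columns = {0: [(b, 0) for b in bits]}
    w = 0
    while w in columns:
        col = columns[w]
        while len(col) > 1:
            col.sort(key=lambda line: line[1])   # earliest lines first
            group, col = col[:3], col[3:]        # 3 lines: full adder, 2: half adder
            total = sum(value for value, _ in group)
            ready = max(time for _, time in group) + 1
            col.append((total & 1, ready))                             # sum, weight 2**w
            columns.setdefault(w + 1, []).append((total >> 1, ready))  # carry, weight 2**(w+1)
        columns[w] = col
        w += 1
    count = sum(col[0][0] << weight for weight, col in columns.items())
    delay = max(col[0][1] for col in columns.values())
    return count, delay


if __name__ == "__main__":
    for n in (3, 7, 15):
        count, delay = full_adder_counter([1] * n)
        print(n, count, delay, lower_bound(n))
```

For the sizes printed here the simulated delay can be compared directly with the bound (2); other grouping orders may give different delays.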
An alternative method for counter realization with a network of full adders that yields an upper bound to the delay
involves synthesizing a large counter with two smaller counters
and a ripple-carry adder. Given two counters each with k inputs, a counter with 2k + 1 inputs is obtained by the use of a
$1 + \lfloor \log_2 (k) \rfloor$ stage adder as shown in Fig. 1.
If the outputs of two 2k + 1 input counters constructed in
this manner are combined with a $2 + \lfloor \log_2 (k) \rfloor$ stage adder, a
4k + 3 input counter results that is constructed from four k-input counters and three ripple-carry adders. Clearly this procedure may be repeated as many times as necessary to realize
counters of arbitrary size.
Fig. 1. Synthesis of a large counter from two smaller counters and an adder.

Fig. 2. Synthesis of a 15-input counter with a full-adder network.

Starting with full adders (i.e., three-input counters), the counter will completely utilize each adder only if it has j outputs and N inputs (where j is an integer):

$$N = 2^j - 1. \tag{3}$$
Counters of sizes corresponding to noninteger values of j in (3)
are implemented with an incomplete counter of the next larger
size [i.e., j = M from (1)].
The delay of this type of counter is easily found since each ripple-carry adder adds two adder delay times to the total delay time for the counter. This may be seen by referring to Fig. 2, which shows a 15-input counter constructed of three levels of adders. The outputs of adders FA1 and FA2 are available after a single adder delay; the output of FA6 depends on the sum signals from FA1 and FA2 and is available after two adder delays; FA5 depends on the carry outputs from FA1, FA2, and FA6 (the latter is available after two adder delays, while the other two are available after a single adder delay), thus its output is available after three adder delays. Clearly at each level of the tree two additional delays are incurred: one because of the extra level of adders and one because the "count" is 1 bit larger, thus necessitating another adder (and its delay). Since the delay of a three-input counter is one adder delay, an N-input counter has a delay bounded by

$$\delta \le 2 \lfloor \log_2 (N) \rfloor - 1. \tag{4}$$
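
The recursive construction and its delay can be checked with a small simulation. The sketch below is illustrative only: it assumes N = 2^j - 1 inputs, unit delay for the sum and carry outputs of every full adder, and all counter inputs available at time zero. The names `full_adder`, `ripple_counter`, and `count_and_delay` are hypothetical.

```python
def full_adder(a, b, c):
    """Each line is (bit, arrival_time); returns the (sum, carry) lines."""
    total = a[0] + b[0] + c[0]
    ready = max(a[1], b[1], c[1]) + 1
    return (total & 1, ready), (total >> 1, ready)


def ripple_counter(bits):
    """Counter for N = 2**j - 1 inputs; returns output lines, LSB first."""
    n = len(bits)
    assert n >= 3 and (n + 1) & n == 0, "this sketch assumes N = 2**j - 1"
    if n == 3:
        s, c = full_adder(*[(b, 0) for b in bits])
        return [s, c]
    k = (n - 1) // 2
    left = ripple_counter(bits[:k])        # first k-input counter
    right = ripple_counter(bits[k:2 * k])  # second k-input counter
    carry = (bits[2 * k], 0)               # remaining input feeds the carry-in
    out = []
    for a, b in zip(left, right):          # ripple-carry adder, one stage per bit
        s, carry = full_adder(a, b, carry)
        out.append(s)
    out.append(carry)                      # final carry is the new MSB
    return out


def count_and_delay(bits):
    lines = ripple_counter(bits)
    count = sum(bit << i for i, (bit, _) in enumerate(lines))
    delay = max(time for _, time in lines)
    return count, delay


if __name__ == "__main__":
    for n in (3, 7, 15, 31):
        print(n, count_and_delay([1] * n), 2 * (n.bit_length() - 1) - 1)
```

Under these assumptions the simulated delay for these sizes equals the right-hand side of (4), since each added level contributes exactly two adder delays.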
The number of adders required to implement counters for
integer values of j in (3) may be found by noting that each
adder has three inputs and two outputs; thus it reduces the
number of signal lines by one. Since the counter has N inputs and j outputs,

$$N_{\text{add}} = N - j \tag{5}$$

where $N_{\text{add}}$ is the number of full adders required to effect the implementation. Counters of sizes corresponding to noninteger values of j in (3) may use some half adders and thus be slightly more complex but still will not require more than N adder modules (i.e., full and half adders).
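
As a quick cross-check of (5), the adder count of the recursive construction can be tallied directly: a counter with $2^j - 1$ inputs consists of two counters with $2^{j-1} - 1$ inputs plus a ripple-carry adder of j - 1 stages. The function name below is hypothetical.

```python
def adders_needed(j):
    """Full adders in the recursively built counter with N = 2**j - 1 inputs."""
    if j == 2:
        return 1  # a three-input counter is a single full adder
    # two half-size counters plus a (j - 1)-stage ripple-carry adder
    return 2 * adders_needed(j - 1) + (j - 1)


if __name__ == "__main__":
    for j in range(2, 8):
        n = 2 ** j - 1
        print(j, n, adders_needed(j), n - j)
```

The last two columns agree for every j listed, as (5) predicts.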
This section has presented a new method for analyzing
counters implemented with arrays of adders that leads to
upper and lower bounds on the counter delay:
$$\lfloor \log_3 (N - 1) \rfloor + \lfloor \log_2 (N) \rfloor \le \delta \le 2 \lfloor \log_2 (N) \rfloor - 1. \tag{6}$$
Attention is now directed to a new way to increase the speed
of parallel counters.
FULL-ADDER/FAST-ADDER ARRAY COUNTERS
Since each level of ripple-carry adders in the previous
counter adds two adder delays to the total counter delay, it is
desirable to reduce the adder delay times. This may be accomplished by using fast adders that are realized with READ-ONLY
memories in place of the ripple-carry adders of the previous
section. If the access delay of the READ-ONLY memory
"adder" is comparable to the delay of a full adder, the counter
will be nearly twice as fast.
The design procedure is similar to that which was used for the full-adder network counter with the exception that fast adders are used. The first step involves combining all but $2^{M-2} - 1$ of the inputs with full adders. Then the full adders are paired off and a two-stage fast adder is used to combine one of the remaining inputs and two of the full-adder outputs into a 3-bit result. This procedure is repeated until a single M - 1 stage fast adder generates the M-bit count. This may be illustrated by considering Fig. 2: FA5 and FA6 are replaced by a 2-bit fast adder, FA7 and FA8 are replaced by a 2-bit fast adder, and FA9, FA10, and FA11 are replaced by a 3-bit fast adder. The delay of the resulting 15-input counter will be

$$\delta_{\text{count}} = \delta_{\text{FA}} + 2 \delta_{\text{ROM}}$$

where $\delta_{\text{count}}$ is the delay of the counter, $\delta_{\text{FA}}$ is the delay of a full adder, and $\delta_{\text{ROM}}$ is the delay of the READ-ONLY memories used as adders. It is assumed that the delay of the READ-ONLY memories is independent of the size of the memory, which leads to the following result for counters of other sizes:

$$\delta_{\text{count}} = \delta_{\text{FA}} + (M - 2) \delta_{\text{ROM}}. \tag{7}$$

For the case where $\delta_{\text{ROM}} = \delta_{\text{FA}}$, (7) reduces to $\delta_{\text{count}} = \delta_{\text{FA}} \lfloor \log_2 (N) \rfloor$, which is roughly half the value for the comparable full-adder counter as given by (6).
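
The comparison between (7) and the upper bound (4) can be tabulated with a few lines of Python. This is a sketch under the paper's assumption that the memory delay does not depend on memory size; the function names and the default unit delays are hypothetical, and N is assumed to be at least 3.

```python
def full_adder_delay_bound(n, d_fa=1.0):
    """Upper bound (4) for the full-adder counter, in units of d_fa."""
    return (2 * (n.bit_length() - 1) - 1) * d_fa


def fast_adder_delay(n, d_fa=1.0, d_rom=1.0):
    """Delay (7): one full-adder level plus M - 2 levels of fast adders."""
    m = n.bit_length()  # M = 1 + floor(log2(N)) from (1)
    return d_fa + (m - 2) * d_rom


if __name__ == "__main__":
    for n in (7, 15, 31, 63):
        print(n, full_adder_delay_bound(n), fast_adder_delay(n))
```

With equal unit delays, the 15-input counter takes 3 delays instead of 5 and the 63-input counter 5 instead of 9, which is the "nearly twice as fast" behavior described above.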
Since a $2^{2K+1}$ by K + 1 bit READ-ONLY memory is required to implement a fast adder for two K-bit numbers, this method is only useful for relatively small adders. The total number of bits of READ-ONLY memory required to implement an N-input counter is $\beta$:

$$\beta = \sum_{K=2}^{M-1} (K + 1) \, 2^{2K+1} \, 2^{M-K-1}$$

where the third term gives the required number of memories of each size. Thus

$$\beta = \sum_{K=2}^{M-1} (K + 1) \, 2^{M+K}. \tag{8}$$

Evaluation of (8) shows this method to be impractical for
M > 6 (i.e., counters with more than 63 inputs). Certainly for
counters with under 32 inputs it is a practical method to
implement a high-speed counter.
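
A direct evaluation of (8) shows how quickly the storage grows. The sketch below computes the sum both in its factored form (bits per memory times number of memories) and in the simplified form of (8); the function names are hypothetical.

```python
def rom_bits_by_size(m):
    """Sum over adder widths K of (bits per memory) * (number of memories)."""
    total = 0
    for k in range(2, m):
        bits_per_memory = (k + 1) * 2 ** (2 * k + 1)  # 2**(2K+1) words of K+1 bits
        memories = 2 ** (m - k - 1)                   # memories of this size
        total += bits_per_memory * memories
    return total


def rom_bits(m):
    """Simplified form (8): sum of (K + 1) * 2**(M + K) for K = 2 .. M - 1."""
    return sum((k + 1) * 2 ** (m + k) for k in range(2, m))


if __name__ == "__main__":
    for m in range(4, 8):
        assert rom_bits(m) == rom_bits_by_size(m)
        print(f"M={m}  N={2 ** m - 1:3d}  READ-ONLY memory bits={rom_bits(m)}")
```

Each additional output bit multiplies the total by roughly five, consistent with the practical limit near M = 6 noted above.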
QUASI-DIGITAL COUNTERS
In the preceding sections fully digital techniques for the
realization of digital counters have been presented. An alternative implementation that uses analog current summing to
generate a voltage proportional to the count appears worthy
of consideration. The use of quasi-digital techniques is not
new (e.g., see [1] and [4]), but until recently the speeds attained have been slower than for comparable digital realizations.
Additional information on this form of counter and details of
the derivation have appeared elsewhere [8].
A circuit for a seven-input quasi-digital counter is shown in
Fig. 3. The numbers in parentheses indicate how many of the
inputs must be in the logic ONE state to generate a logic ONE
on each line. The resistor network to which the inputs are
applied generates a voltage at its node that is proportional to
the fraction of inputs that are logic ONE's. Assuming that a logic ZERO is at ground and a logic ONE is at $V_{\text{ref}}$,

$$V_n = V_{\text{ref}} \, n / N \tag{9}$$

where n is the number of logic ONE inputs and N is the total number of inputs. Note that the time-varying component of $V_n$ has been ignored. The error in $V_n$ due to stray capacitance is

$$e_n = V_{\text{ref}} \exp \{ -tN / (R_y C_{\text{stray}}) \} \tag{10}$$

where t is the time since the inputs were switched, and $C_{\text{stray}}$ is the total stray capacitance at the summing node.
The comparators respond correctly when

$$e_n + e_t < \frac{1}{2N} \tag{11}$$

where $e_t$ is the total accumulated error, which is given by

$$e_t = \frac{\Delta R_x}{R_x} + \frac{\Delta R_y}{R_y} + \frac{V_{\text{offset}}}{V_{\text{ref}}} \tag{12}$$

where the first two terms are the tolerance of the divider resistors and input resistors, respectively, and $V_{\text{offset}}$ is the comparator error band (if the differential inputs of each comparator are separated by at least $V_{\text{offset}}$, the correct comparator output will be generated).

Fig. 3. Seven-input quasi-digital counter.

Combining (10) and (11), with $e_n$ expressed as a fraction of $V_{\text{ref}}$, and solving for $T_{\text{settling}}$, the delay incurred by the resistor network, gives

$$T_{\text{settling}} \approx -\frac{R_y C_{\text{stray}}}{N} \log \left( \frac{1}{2N} - e_t \right). \tag{13}$$

Since the stray capacitance is primarily due to the comparator inputs, it is roughly proportional to the number of counter inputs. It is convenient to assume a relation of the form

$$C_{\text{stray}} \approx N C_{\text{in}}$$

where $C_{\text{in}}$ is the capacitance of a comparator input terminal. Substituting this relationship into (13),

$$T_{\text{settling}} \approx -R_y C_{\text{in}} \log \left( \frac{1}{2N} - e_t \right). \tag{14}$$

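
The steady-state behavior of such a counter can be modeled in a few lines. The sketch below is a behavioral illustration only, not the circuit of Fig. 3: it assumes comparator thresholds placed midway between adjacent levels of (9), lumps all errors into a single additive voltage, and replaces the two levels of output logic by a simple tally of the comparators that are ON. The function name and parameters are hypothetical.

```python
def quasi_digital_count(bits, v_ref=1.0, error=0.0):
    """Behavioral model: summing node, bank of comparators, then a tally."""
    n_inputs = len(bits)
    # (9): node voltage proportional to the fraction of ONE inputs, plus an error term
    v_node = v_ref * sum(bits) / n_inputs + error
    # assumed thresholds: comparator m switches midway between levels m - 1 and m
    thermometer = [v_node > v_ref * (m - 0.5) / n_inputs
                   for m in range(1, n_inputs + 1)]
    # the number of comparators that are ON is the count
    return sum(thermometer)


if __name__ == "__main__":
    inputs = [1, 0, 1, 1, 0, 0, 1]
    print(quasi_digital_count(inputs))                 # error-free case
    print(quasi_digital_count(inputs, error=0.4 / 7))  # error below V_ref / 2N
    print(quasi_digital_count(inputs, error=0.6 / 7))  # error above V_ref / 2N
```

With midway thresholds the result stays correct as long as the total error is below $V_{\text{ref}} / 2N$, which is the margin expressed by (11); a larger error shifts the result by one count.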
Equation (14) is graphed in Fig. 4 to demonstrate the interrelationship between the settling time, the number of inputs,
and the total accumulated error. Note that as the number of
inputs increases, so does the delay, although the increase can
be mitigated somewhat by reducing the total error (i.e., using
more accurate resistors or comparators with a smaller error
band).
The total delay of this counter is the sum of the settling
time from (14), the comparator delay, and two levels of logic
delay as is evident from Fig. 3.
Thus the total delay is

$$T_{\text{count}} = T_{\text{settling}} + T_{\text{comp}} + 2 T_{\text{gate}}. \tag{15}$$

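
Equations (14) and (15) are easy to evaluate numerically. The sketch below uses the value $R_y C_{\text{in}} = 10^{-9}$ s assumed for Fig. 4 and takes the logarithm in (14) as the natural logarithm; the comparator and gate delays are free parameters supplied by the caller, and the function names are hypothetical.

```python
import math


def settling_time(n_inputs, e_t, ry_cin=1e-9):
    """Settling time (14) in seconds for a total accumulated error e_t."""
    margin = 1.0 / (2 * n_inputs) - e_t
    if margin <= 0:
        raise ValueError("e_t must be smaller than 1/(2N) for the counter to settle")
    return -ry_cin * math.log(margin)


def total_delay(n_inputs, e_t, t_comp, t_gate, ry_cin=1e-9):
    """Total delay (15): settling, comparator, and two levels of logic."""
    return settling_time(n_inputs, e_t, ry_cin) + t_comp + 2 * t_gate


if __name__ == "__main__":
    for n in (10, 20, 30, 40, 50):
        ns = settling_time(n, e_t=0.0) * 1e9
        print(f"N={n:2d}  settling time with e_t=0: {ns:.2f} ns")
```

The settling time grows only logarithmically with N when the error is small, but it increases sharply as $e_t$ approaches 1/2N.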
If the particular logic used to implement this circuit permits
wired OR operation, only one level of logic delay is required.
Using emitter-coupled logic facilitates implementation of the comparator function, allows wired OR operation, and permits realization of speeds on the order of

$$T_{\text{count}} \approx T_{\text{settling}} + 5 \text{ ns}.$$

Fig. 4. Settling time versus counter size for quasi-digital counters (assuming: $R_y \cdot C = 10^{-9}$ s).

It appears that counters of moderate size (i.e., 10 < N < 50)
will be faster with this implementation than with either of the
fully digital methods described above. Clearly larger counters
may be built using quasi-digital counters for the first stage and
ripple-carry adders for succeeding stages in exactly the same
manner as described above.
The disadvantage of this
method is that, as with all analog or partly analog networks,
resistors may have to be trimmed to meet designed tolerances,
stray capacitance and inductance must be minimized, and
noise suppression may be difficult.
CONCLUSIONS
In this brief paper certain techniques for the realization of
parallel counters have been presented and analyzed. An upper
bound for the delay of counters built with networks of full
adders has been derived. This answers one of the questions
that was raised by Foster and Stockton [7]. A variation of
the full-adder network counter that uses full adders and fast
adders is developed which exhibits up to twice the speed of
the former scheme. A quasi-digital (i.e., hybrid) approach is
also described that appears to be potentially much faster than
either of the strictly digital approaches, although it is somewhat more complex.
REFERENCES
[1] L. Dadda, "Some schemes for parallel multipliers," Alta Freq.,
vol. 34, pp. 349-356, May 1965.
[2] D. Ferrari and R. Stefanelli, "Some new schemes for parallel
multipliers," Alta Freq., vol. 38, pp. 843-852, Nov. 1969.
[3] A. Habibi and P. A. Wintz, "Fast multipliers," IEEE Trans. Comput. (Short Notes), vol. C-19, pp. 153-157, Feb. 1970.
[4] R. H. S. Riordan and R. R. A. Morton, "The use of analog techniques in binary arithmetic units," IEEE Trans. Electron. Comput.,
vol. EC-14, pp. 29-35, Feb. 1965.
[5] A. Svoboda, "Adder with distributed control," IEEE Trans. Comput., vol. C-19, pp. 749-751, Aug. 1970.
[6] I. T. Ho and T. C. Chen, "Multiple addition by residue threshold
functions," in Dig. 6th Annu. Comput. Soc. Conf. (COMPCON72), San Francisco, Calif., 1972, pp. 283-286.
[7] C. C. Foster and F. D. Stockton, "Counting responders in an
associative memory," IEEE Trans. Comput. (Short Notes), vol.
C-20, pp. 1580-1583, Dec. 1971.
[8] E. E. Swartzlander, Jr., "The quasi serial multiplier," IEEE Trans.
Comput., vol. C-22, pp. 317-321, Apr. 1973.
Earl E. Swartzlander, Jr. (S'64-M'72), for a photograph and biography,
see page 321 of the April 1973 issue of this TRANSACTIONS.