Counting Responders in an Associative Memory

IEEE TRANSACTIONS ON COMPUTERS, DECEMBER 1971
CAXTON C. FOSTER AND FRED D. STOCKTON
Abstract-A method of determining the number of responders to a
search in an associative memory is presented. It is shown that less than
one full adder per memory cell is required and that the maximum delay
in establishing the count is proportional to n, where n is the log to the
base 3 of the number of memory cells.
Index Terms-Associative memory, carry showers, content-addressable memory,
distributed logic, multiple responses, number of responders, response counting, search techniques.
Consider the following problem. We wish to design a
circuit that has N input lines and m output lines. Each input
line may present either a ZERO or a ONE to the circuit. We
wish to have the pattern of ZERO'S and ONE'S on the m output
lines be a weighted binary representation of the number of
input lines that are in the ONE state.
We wish to design this circuit using only "full adders" as
elementary building blocks, and we further wish to minimize
both the number of elements used and the propagation delay
between input and output.
Such a circuit could be used to count "responders" to a
search in a content-addressable memory.
One possible way to design a circuit might be to construct
a counter and a scanner. The scanner examines the input
lines one after another and each time a ONE is detected, it
adds one to the counter. We will reject this scheme on the
grounds that: 1) it does not use full adders; and 2) more important, the time required to establish the count is linearly
proportional to the number of inputs N. We wish something
much less sensitive to N, perhaps proportional to the log of
N. At best we would like to be completely independent of N,
but we have not been able to achieve this.
EARLY ATTEMPTS
Falkoff [1] has suggested that the circuits used to determine whether there are any responders at all (SOME-NONE
network) be expanded to provide three possible output
states: NONE, ONE, and MANY. Fig. 1 shows how this might
be extended to provide NONE, ONE, TWO, ..., N outputs,
where N is the total number of elements in the array. Unfortunately, the number of contacts or gates in the network
increases as the square of N. For a machine of any reasonable size,
this type of counting network soon becomes larger than
the rest of the machine put together.
Kaplan [2] has designed an associative memory that
allows the programmer to find the approximate number of
responders via an analog summation and an A to D counter.
He also provides an exact count by scanning the response
store, one element at a time. This latter method we must
reject since it will require a time that is dependent on the
size of the array.
Fig. 1. Full counting tree for four elements constructed of relays.
TABLE I
Number of Input Lines         Output
That Are ONE              Carry    Sum
         0                  0       0
         1                  0       1
         2                  1       0
         3                  1       1
FULL ADDER
Full adders, which we will use as our elementary building
blocks, are well known. For the sake of completeness we
present a brief description of one. A full adder has three
input lines that are mutually interchangeable and two distinct output
lines, called the "sum" and the "carry" lines. Table I shows the behavior of the element.
Note that the output of the full adder is a weighted binary
representation of the number of ONE'S in the input, so a single
full adder will serve as our circuit for N= 3. We will assume
for our purposes that the full adder has a unit delay between
input and output.
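As a minimal sketch of the element just described (Python and the name full_adder are choices made for this illustration, not part of the original note), the behavior listed in Table I can be written down and printed for all eight input patterns:

```python
from itertools import product

def full_adder(a, b, c):
    """Return (sum, carry) for three one-bit inputs (illustrative helper)."""
    total = a + b + c             # number of inputs that are ONE, 0..3
    return total & 1, total >> 1  # sum is the low bit, carry the high bit

# The outputs depend only on how many inputs are ONE, as in Table I.
for a, b, c in product((0, 1), repeat=3):
    s, carry = full_adder(a, b, c)
    print(a + b + c, carry, s)    # count of ONEs, carry, sum
```

Because the three inputs are interchangeable, the eight patterns collapse to the four rows of the table.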
LOWER BOUND ON THE PROPAGATION DELAY
Let us consider first the establishment of the least significant bit of the output using only full adders. Suppose for a
moment that N is a power of 3. The sum output of a full
adder will be ONE if there was an odd number of ONE'S input
and ZERO otherwise. Let N= 9 and construct the circuit of
Fig. 2. The signals presented to the right-hand full adder
will be ONE or ZERO, depending on oddness of their input
triplets. Since we must add up an odd number of "odd numbers" to get an odd number, the sum output of the fourth
adder will be ONE only if 1, 3, 5, 7, or 9 of the original inputs
were ONE. But the least significant bit of the sum will be ONE
if there was an odd number of inputs. So the final sum output of such a pyramid is indeed the least significant bit of
the counter.
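The parity argument can be checked exhaustively for the nine-input pyramid. The sketch below is ours and assumes N is a power of 3; the names full_adder and lsb_pyramid are hypothetical. It keeps only the sum outputs at each level and counts the levels used.

```python
from itertools import product

def full_adder(a, b, c):
    total = a + b + c
    return total & 1, total >> 1   # (sum, carry)

def lsb_pyramid(bits):
    """Reduce 3**k input bits to one sum bit; return (bit, levels used)."""
    level, depth = list(bits), 0
    while len(level) > 1:
        assert len(level) % 3 == 0, "this sketch assumes N is a power of 3"
        # Keep only the sum outputs; the carries are not used for this bit.
        level = [full_adder(*level[i:i + 3])[0] for i in range(0, len(level), 3)]
        depth += 1
    return level[0], depth

# Exhaustive check for N = 9: two levels, and the output is the parity.
for bits in product((0, 1), repeat=9):
    bit, depth = lsb_pyramid(bits)
    assert bit == sum(bits) % 2 and depth == 2
```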
Fig. 2. Simple adder pyramid with 9 inputs.
At the kth level of a pyramid (working toward the left)
we can handle $3^k$ inputs. At one unit delay per level, the
least significant bit of the count of N input lines can be
established in L unit delays, where
$$L = \lceil \log_3 N \rceil \tag{1}$$
and where $\lceil x \rceil$ represents the smallest integer not less than x.
The carry output of a full adder is ONE if two or three
inputs were ONE. If there were exactly two inputs equal to
ONE, then this carry output represents that "pair." By the
construction of the pyramid any extra input "leftover" after
generating a pair will be passed along to the next level as a
ONE on the sum output of the element. Eventually these
extras will be combined in their turn into possible pairs.
The second least significant bit of the output m of this
circuit should be ZERO if there was an even number of pairs
in the original input and ONE if there was an odd number.
Thus if we "count up" the carry outputs from the pyramid
we constructed for the least significant bit we will know
whether or not the second bit is to be ZERO or ONE. Let the
collection of elements to do this counting be called the
"second bit counter." The value arrived at for the output of
the second bit counter must include a consideration of the
carry output of the rightmost full adder of the least significant bit pyramid. But this carry output is not available until
after L delay times. Since it will take at least one element to
incorporate this carry, the second bit of the output can be
established no earlier than after L + 1 unit delays. If there are
M bits in the output, where
$$M = \lceil \log_2 N \rceil, \tag{2}$$
then a lower bound on the time required to establish the
output will be
$$T = L + M - 2 = \lceil \log_3 N \rceil + \lceil \log_2 N \rceil - 2. \tag{3}$$
The last two of the M output bits are developed at the same time by the
last full adder, and the least significant bit by the original
pyramid. This accounts for the -2 in (3).
Manuscript received September 26, 1969; revised April 27, 1971.
C. C. Foster is with the Department of Computer Science, University of Massachusetts, Amherst, Mass. 01002.
F. D. Stockton is with the Department of Civil Engineering, University of Massachusetts, Amherst, Mass. 01002.
METHOD OF CARRY SHOWERS
An earlier study [4] derived a circuit whose delay was
proportional to the square of M. In this section, we present
a design that gives a delay which is less than 1.2T [T given
by (3)] over the range investigated ($0 < N < 2^{35}$).
Consider once again the pyramid that is used to generate
the least significant bit of the output. Suppose that N is not
a power of 3. Then at some level in the pyramid we will have
one or two outputs from the previous level leftover. We pass
these leftover signals along to the next level and combine
them there if possible; if not, at the next level, or the next.
Eventually we will be down to three, two, or one signal remaining. If there is only one, we are through. If there are
three, we need only one more full adder and we are through.
If there are two signals, we can either use a half adder or
else we can use a full adder and zero out the superfluous
input line.
Fig. 3. Carry shower for N = 31. Dotted arrows indicate no processing.
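The quantities L, M, and T of (1)-(3), and the number of levels the first-bit pyramid uses when leftovers are passed along, can be evaluated with a short sketch. The function names are hypothetical, and the leftover rule coded here is our reading of the paragraph above: pass leftovers on, and use an adder with one zeroed input only when exactly two signals remain.

```python
def ceil_log(n, base):
    """Smallest integer k with base**k >= n, for n >= 1 (integer arithmetic)."""
    k, power = 0, 1
    while power < n:
        power *= base
        k += 1
    return k

def lower_bound(n_inputs):
    """T of (3): L delays for the first bit plus M - 2 further unit delays."""
    return ceil_log(n_inputs, 3) + ceil_log(n_inputs, 2) - 2

def pyramid_levels(n_inputs):
    """Levels used by the first-bit pyramid when leftovers are passed along."""
    signals, levels = n_inputs, 0
    while signals > 1:
        adders, leftover = divmod(signals, 3)
        if adders == 0:
            # Exactly two signals remain: one adder with a zeroed third input.
            adders, leftover = 1, 0
        signals = adders + leftover
        levels += 1
    return levels

print(pyramid_levels(31), ceil_log(31, 3), lower_bound(31))  # 4 4 7
```

For N = 31 the pyramid shrinks as 31, 11, 5, 3, 1, so four levels are used, which equals L, and the bound is T = 4 + 5 - 2 = 7.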
Now consider the second bit counter. After the first level
of the least significant bit pyramid we will have some carry
signals to work with. We shall begin to combine these as
soon as they appear. These will in turn generate some carries
of their own which we will process as early as possible, etc.
The reason for doing this is easy to understand. If we wait
for the last carry from one bit to be established before we
start work on the next bit, we will be cascading pyramids,
each with its consequent delay. This was the method used by
Favor [4]. But if we begin processing as early as possible we
will have, with any luck, reduced substantially the number
of signals left to be combined for bit i by the time we have
finished processing bit i- 1. In Fig. 3 we show the process
for N= 31. Fig. 4 shows the circuit to realize this operation.
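To make the early processing of carries concrete, here is a small sketch of a complete counter for N = 9. The wiring is our own, chosen for illustration, and is not a circuit taken from the figures; count9 and full_adder are hypothetical names. The carries of the first level are combined while the first-bit pyramid is still finishing, and the late carry joins one level later.

```python
from itertools import product

def full_adder(a, b, c):
    total = a + b + c
    return total & 1, total >> 1   # (sum, carry)

def count9(bits):
    """Count the ONEs among nine input bits using six full adders."""
    # Level 1: three adders on the input triplets.
    s1, c1 = full_adder(*bits[0:3])
    s2, c2 = full_adder(*bits[3:6])
    s3, c3 = full_adder(*bits[6:9])
    # Level 2: finish bit 0 and, in parallel, combine the first carries.
    b0, c4 = full_adder(s1, s2, s3)
    t1, d1 = full_adder(c1, c2, c3)
    # Level 3: the late carry c4 joins the second column (one input zeroed).
    b1, d2 = full_adder(t1, c4, 0)
    # Level 4: the last adder produces the two most significant bits together.
    b2, b3 = full_adder(d1, d2, 0)
    return b0 + 2 * b1 + 4 * b2 + 8 * b3

# Exhaustive check over all 512 input patterns.
assert all(count9(bits) == sum(bits) for bits in product((0, 1), repeat=9))
```

This wiring needs four unit delays, which equals the bound (3) for N = 9 (L = 2, M = 4).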
Consider the way in which Fig. 3 is constructed: 31 divided
by 3 gives 10 and 1 left over. We put down 10 to the left and
adding the 10 and the 1 write down 11 under the 31. This
means that we will pass from the rightmost column on the
first level: 1) ten full adder sum signals and one unprocessed
signal to the rightmost column, and 2) ten full adder carry
signals to the column next to the left.
On the second level, 11 divided by 3 gives 3 and 2 over.
We write down 3 to the left and 3+2, or 5, under the 11. We
can proceed in this fashion down the first column until we
generate a 1 in that column. Now examine the second column. The 10 carry outputs we obtained from processing the
original 31 inputs can be put into 3 full adders with 1 signal
left over. So we write down 3 to the left (representing the
carries) and 4 under the 10. The 4 we just wrote down must
be combined with the 3 carries from the previous level of the
column to the right. This gives 7, which in turn produces 2
carries and 3 signals to the next level in this column. The
procedure continues in this fashion until all signals have
been combined. Sometimes only one or two signals are available in a given column (see column 4) for several levels. We
will postpone combining these until we can utilize a full
adder or until it is obvious that no more signals will appear
in this column because all columns to its right have terminated.
Fig. 4. Carry shower circuit for N = 31 constructed of full adders. Solid lines: signals contributing to even-numbered output bits; dotted lines: signals contributing to odd-numbered output bits.
To estimate the goodness of our lower bound, we
wrote a computer program which proceeded in the manner
of Fig. 3 to calculate the actual delays that would be found
in a carry shower counter of the type described here. These
delays were calculated for memory sizes of $2^n - 1$ with n
ranging from 2 to 35. The values so obtained are shown in
Table II together with the theoretical minimum delay calculated from (3). As may be seen, the actual values fall quite
close to the predicted minimum, never exceeding it by more
than 20 percent. The values do not behave smoothly with
increasing n, although of course they do increase monotonically. This must be attributed to the natural "lumpiness"
of integers.
TABLE II
COMPARISON OF THEORETICAL MINIMUM NUMBER OF DELAYS T
WITH ACTUAL NUMBER OF DELAYS A FOR INCREASING
NUMBER OF INPUTS N, GIVEN BY $2^n - 1$
  n    T    A        n    T    A
  1    0    0       19   29   35
  2    1    1       20   31   36
  3    3    3       21   33   37
  4    5    5       22   34   39
  5    7    7       23   36   41
  6    8    9       24   38   43
  7   10   10       25   39   45
  8   12   12       26   41   47
  9   13   14       27   43   49
 10   15   16       28   44   51
 11   16   18       29   46   53
 12   18   20       30   47   55
 13   20   23       31   49   57
 14   21   25       32   51   59
 15   23   27       33   52   61
 16   25   29       34   54   63
 17   26   31       35   56   65
 18   28   33
MAXIMUM NUMBER OF ELEMENTS
To determine the number of full adders required to realize
a carry shower counter we present the following argument.
Let there be N input lines and M output lines, where M is
given by (2).
In the rightmost column we wish to combine N lines and
get one output (the least significant bit of the count). Each
full adder accepts three lines and produces one sum output,
a net decrease of two in the number of signals to be processed. Suppose N is odd. Then we need
$$A = (N - 1)/2 \tag{4}$$
full adders to process the rightmost column. This is easily
proved by induction. It is obviously true for N = 3 where one
full adder does the job. If it is true for some N (odd), then
by combining the single output generated for those N input
lines with two more input lines in one additional adder, it
will be true for N + 2. This completes the proof.
Suppose that N is even. We have shown that (4) holds for
N odd. Let N = φ + 1, where φ is odd. Then the φ signals can
be combined in A = (φ - 1)/2, and one additional adder (with
one of the three inputs set always to zero) will suffice to combine the output of the first φ signals with the one extra. This
will require a total of A + 1 adders:
$$x = A + 1 = 1 + \frac{N - 2}{2} = \frac{N}{2}. \tag{5}$$
Thus it never takes more than N/2 adders to get the first
bit of the count, and since each generates one carry signal,
there will be at most N/2 on the second level, N/4 on the
third, etc. Summing these up, we have
$$N/2 + N/4 + \cdots + N/2^M < N \tag{6}$$
so that a total of fewer than N full adders is required to count the number of
ONE'S present in the input.
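The column procedure of Fig. 3 and the element count argued above can be tried out with a short simulation. This is a sketch under our own reading of the text, not the program used to produce Table II. At every level each column puts as many signals as possible into full adders, passes leftovers on, and pads a pair with a zero input only when every column to its right has terminated. The name carry_shower is hypothetical.

```python
def carry_shower(n_inputs):
    """Return (levels, adders) for the column procedure sketched in Fig. 3."""
    columns = [n_inputs]       # columns[i]: signals of weight 2**i not yet combined
    levels = adders = 0
    while any(s > 1 for s in columns):
        nxt = [0] * (len(columns) + 1)
        right_done = True      # every column to the right holds at most one signal
        for i, s in enumerate(columns):
            used, left = divmod(s, 3)      # full adders used, signals passed on
            if s == 2 and right_done:
                # Nothing more can arrive here: one adder with a zeroed input.
                used, left = 1, 0
            adders += used
            nxt[i] += used + left          # sum outputs stay in this column
            nxt[i + 1] += used             # carry outputs move one column left
            if s > 1:
                right_done = False
        if nxt[-1] == 0:
            nxt.pop()                      # no new most significant column yet
        columns = nxt
        levels += 1
    return levels, adders

# Delays for N = 2**n - 1, n = 1..7.
print([carry_shower(2**n - 1)[0] for n in range(1, 8)])  # [0, 1, 3, 5, 7, 9, 10]
```

Under these assumptions the sketch gives the same delays as column A of Table II for n = 1 through 7, and for N = 31 it uses 26 full adders, fewer than N.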
SUMMARY
We have presented a scheme for determining how many
of a set of N signal lines are in the ONE state. Such a circuit
would be most useful for counting responders in a content-addressable memory. The propagation delay of this circuit
is quite close to the theoretical minimum which is also
derived. Approximately one full adder per input line is
required.
Several interesting avenues remain unexplored. For what
values of N do we utilize completely every full adder? Can
one establish an upper bound on the delay of a carry shower
counter? Is it possible or rather profitable to reuse the elements from early levels in processing at higher levels? Could
elements other than full adders be employed in such circuits?
Such questions may be examined at a later date.
ACKNOWLEDGMENT
As reported in [4], K. E. Batcher of Goodyear Aerospace
Corporation contributed substantially to the original development of this method of counting responders in a content-addressable memory. It is only his modesty which keeps his
name from being among the authors.
REFERENCES
[1] A. D. Falkoff, "Algorithms for parallel search memories," J. Ass. Comput. Mach., vol. 9, pp. 488-551, Oct. 1962.
[2] A. Kaplan, "A search memory subsystem for a general purpose
computer," in 1963 Fall Joint Comput. Conf., AFIPS Conf. Proc.,
vol. 24. Baltimore, Md.: Spartan, 1963, pp. 193-200.
[3] C. C. Foster, "Parallel execution of iterative algorithms," Ph.D.
dissertation, University of Michigan, Ann Arbor, 1965.
[4] J. N. Favor, "A method of obtaining the exact count of responses
using full and half adders," Goodyear Aerospace Corp., Akron,
Ohio, AP-111770, 1964.
Dynamic Resolution of Memory Access Conflicts
J. M. DANIEL, MEMBER, IEEE, AND J. D. IRWIN, SENIOR MEMBER, IEEE
Abstract-This note presents a method for designing and implementing a
dynamic priority assignment memory-access conflict-resolution circuit for
use in multiprocessing systems with K processors. A decision rule which
establishes the priority assignment is given and an example which illustrates the techniques is also discussed.
Index Terms-Conflict resolution, dynamic priority assignment, memory access,
memory protection, multiprocessing.
Manuscript received November 23, 1970.
J. M. Daniel is with the U. S. Navy, RVAW-120, Norfolk, Va.
J. D. Irwin is with the Department of Electrical Engineering, Auburn
University, Auburn, Ala. 36830.
INTRODUCTION
In multiprocessing systems, all processors share a common main memory. Hence memory protection techniques
must be employed as a system constraint which prevents
processor interference. Memory protection can be implemented via hardware, software, or a combination of the
two; however, most software solutions appear to require an
inordinate amount of processor time. Although, in general,
memory protection can be logically subdivided into the
areas of common data protection, private data protection,
and memory-access conflict resolution, only a dynamic
priority arrangement for the latter will be considered here.
DYNAMIC PRIORITY ASSIGNMENT
It will be assumed here that in systems where memory-access conflicts exist, no one processor has priority over
another and any memory module is addressable by any processor on a random basis. The probability of conflict when
K processors are accessing N modules of memory can be
obtained via the equation
$$P_{\text{conflict}} = 1 - \frac{N!}{N^K (N - K)!} \tag{1}$$
which is developed by Mosier [1] for a slightly different
configuration.
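As a worked illustration of (1), the sketch below evaluates the conflict probability numerically. It assumes, as our reading of "on a random basis", that each of the K processors picks one of the N modules independently and uniformly; conflict_probability is a hypothetical name.

```python
from math import perm

def conflict_probability(n_modules, k_processors):
    """P(conflict) = 1 - N! / (N**K * (N - K)!), as in (1)."""
    # perm(N, K) = N! / (N - K)! counts the conflict-free assignments.
    return 1 - perm(n_modules, k_processors) / n_modules ** k_processors

print(conflict_probability(8, 4))  # 0.58984375
```

With 4 processors and 8 modules, 1680 of the 4096 equally likely assignments are conflict free, so a conflict occurs in roughly 59 percent of cycles in which all four processors request access.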
Since a memory module can serve only one processor per
memory cycle, it is necessary that each module have the
ability to grant access to a specific processor and deny it to
all others during every cycle. This arbitration among processors by the storage unit is the memory-access conflict
problem. If there is no fixed priority assignment and access
is on a random basis, dynamic selection methods for granting access to processing units can be used.
There are several circuits proposed by Wood [2] that will
resolve memory-access conflicts on a dynamic basis for the
special case of a cross-point cell processor-memory connection. Gordon [3] has also developed a dynamic priority
assignment circuit. Although Gordon's circuit has a number
of interesting features, its primary disadvantage is that it is
not applicable to systems with more than two processing
units.
DESIGN PROCEDURE
The following fundamental assumptions provide the
framework for the design procedure.
1) All processors access each memory module on a random basis.
2) The circuit to be designed will control access to only
one memory module.
3) Access, whenever possible, will be granted on the basis
of which processor had access last.
4) The system is asynchronous and flip-flop multivibrators
(or equivalent devices) will be used as memory elements. The
circuit will generate a start memory cycle signal at the same
time access is granted to a requesting processor. This signal
provides synchronous operation by allowing safe-race conditions and output clocking.