IEEE TRANSACTIONS ON COMPUTERS, DECEMBER 1971

Counting Responders in an Associative Memory

CAXTON C. FOSTER AND FRED D. STOCKTON II

Abstract-A method of determining the number of responders to a search in an associative memory is presented. It is shown that less than one full adder per memory cell is required and that the maximum delay in establishing the count is proportional to n, where n is the log to the base 2 of the number of memory cells.

Index Terms-Associative memory, carry showers, content-addressable memory, distributed logic, multiple responses, number of responders, response counting, search techniques.

Consider the following problem. We wish to design a circuit that has N input lines and m output lines. Each input line may present either a ZERO or a ONE to the circuit. We wish to have the pattern of ZERO'S and ONE'S on the m output lines be a weighted binary representation of the number of input lines that are in the ONE state. We wish to design this circuit using only "full adders" as elementary building blocks, and we further wish to minimize both the number of elements used and the propagation delay between input and output. Such a circuit could be used to count "responders" to a search in a content-addressable memory.

One possible way to design a circuit might be to construct a counter and a scanner. The scanner examines the input lines one after another and each time a ONE is detected, it adds one to the counter. We will reject this scheme on the grounds that: 1) it does not use full adders; and 2) more important, the time required to establish the count is linearly proportional to the number of inputs N. We wish something much less sensitive to N, perhaps proportional to the log of N. At best we would like to be completely independent of N, but we have not been able to achieve this.
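As a point of reference, the required input/output behavior and the rejected scanner-plus-counter scheme can be sketched in a few lines of Python. This is only a behavioral model under stated assumptions: the function names `count_by_scanning` and `to_output_lines` are hypothetical, and the output is given least significant bit first.

```python
def count_by_scanning(inputs):
    """Scanner-plus-counter scheme: examine the input lines one after another.

    The loop runs once per input line, so the time is linear in N,
    which is the reason this scheme is rejected in the text.
    """
    count = 0
    for bit in inputs:
        if bit == 1:
            count += 1
    return count


def to_output_lines(count, m):
    """Weighted binary representation of count on m output lines (LSB first)."""
    return [(count >> i) & 1 for i in range(m)]


if __name__ == "__main__":
    lines = [1, 0, 1, 1, 0]          # N = 5 input lines, three of them ONE
    print(to_output_lines(count_by_scanning(lines), 3))
```

Any circuit built from full adders should produce the same output pattern as this model for every input pattern; only the delay differs.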
EARLY ATTEMPTS

Falkoff [1] has suggested that the circuits used to determine whether there are any responders at all (SOME-NONE network) be expanded to provide three possible output states: NONE, ONE, and MANY. Fig. 1 shows how this might be extended to provide NONE, ONE, TWO, ..., N outputs, where N is the total number of elements in the array. Unfortunately, the number of contacts or gates in the network increases as the square of N. For any machine of reasonable size, then, this type of counting network soon becomes larger than the rest of the machine put together.

Kaplan [2] has designed an associative memory that allows the programmer to find the approximate number of responders via an analog summation and an A to D converter. He also provides an exact count by scanning the response store, one element at a time. This latter method we must reject since it will require a time that is dependent on the size of the array.

Fig. 1. Full counting tree for four elements constructed of relays.

FULL ADDER

Full adders, which we will use as our elementary building blocks, are well known. For the sake of completeness we present a brief description of one. A full adder has three input lines that are mutually interchangeable and two output lines, called the "sum" and the "carry" line, that are distinct. Table I shows the behavior of the element. Note that the output of the full adder is a weighted binary representation of the number of ONE'S in the input, so a single full adder will serve as our circuit for N = 3. We will assume for our purposes that the full adder has a unit delay between input and output.

TABLE I

Number of Input Lines        Output
That Are ONE              Carry    Sum
0                           0       0
1                           0       1
2                           1       0
3                           1       1

LOWER BOUND ON THE PROPAGATION DELAY

Let us consider first the establishment of the least significant bit of the output using only full adders. Suppose for a moment that N is a power of 3.
The sum output of a full adder will be ONE if there was an odd number of ONE'S input and ZERO otherwise. Let N = 9 and construct the circuit of Fig. 2. The signals presented to the right-hand full adder will be ONE or ZERO, depending on the oddness of their input triplets. Since we must add up an odd number of "odd numbers" to get an odd number, the sum output of the fourth adder will be ONE only if 1, 3, 5, 7, or 9 of the original inputs were ONE. But the least significant bit of the sum will be ONE if there was an odd number of ONE inputs. So the final sum output of such a pyramid is indeed the least significant bit of the counter.

Fig. 2. Simple adder pyramid with 9 inputs.

At the Kth level of a pyramid (working toward the left) we can handle $3^K$ inputs. At one unit delay per level, the least significant bit of the count of N input lines can be established in L unit delays where

$$L = \lceil \log_3 N \rceil \tag{1}$$

where $\lceil x \rceil$ represents the smallest integer not less than x.

The carry output of a full adder is ONE if two or three inputs were ONE. If there were exactly two inputs equal to ONE, then this carry output represents that "pair." By the construction of the pyramid any extra input "leftover" after generating a pair will be passed along to the next level as a ONE on the sum output of the element. Eventually these extras will be combined in their turn into possible pairs. The second least significant bit of the output of this circuit should be ZERO if there was an even number of pairs in the original input and ONE if there was an odd number. Thus if we "count up" the carry outputs from the pyramid we constructed for the least significant bit we will know whether or not the second bit is to be ZERO or ONE. Let the collection of elements to do this counting be called the "second bit counter."

The value arrived at for the output of the second bit counter must include a consideration of the carry output of the rightmost full adder of the least significant bit pyramid. But this carry output is not available until after L delay times. Since it will take at least one element to incorporate this carry, the second bit of the output can be established no earlier than after L + 1 unit delays. If there are M bits in the output, where

$$M = \lceil \log_2 N \rceil, \tag{2}$$

then a lower bound on the time required to establish the output will be

$$T = L + M - 2 = \lceil \log_3 N \rceil + \lceil \log_2 N \rceil - 2. \tag{3}$$

The last two bits of M are developed at the same time by the last full adder and the least significant bit by the original pyramid. This accounts for the -2 in (3).

Manuscript received September 26, 1969; revised April 27, 1971.
C. C. Foster is with the Department of Computer Science, University of Massachusetts, Amherst, Mass. 01002.
F. D. Stockton is with the Department of Civil Engineering, University of Massachusetts, Amherst, Mass. 01002.

METHOD OF CARRY SHOWERS

An earlier study [4] derived a circuit whose delay was proportional to the square of M. In this section, we present a design that gives a delay which is less than 1.2T [T given by (3)] over the range investigated ($0 < N < 2^{35}$).

Consider once again the pyramid that is used to generate the least significant bit of the output. Suppose that N is not a power of 3. Then at some level in the pyramid we will have one or two outputs from the previous level left over. We pass these leftover signals along to the next level and combine them there if possible; if not, at the next level, or the next. Eventually we will be down to three, two, or one signal remaining. If there is only one, we are through. If there are three, we need only one more full adder and we are through. If there are two signals, we can either use a half adder or else we can use a full adder and zero out the superfluous input line.

Fig. 3. Carry shower for N = 31. Dotted arrows indicate no processing.

Now consider the second bit counter.
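The least-significant-bit pyramid and the lower bound of (1)-(3) described above can be checked with a short Python sketch. It is an illustration under stated assumptions, not the authors' program: the names `full_adder`, `lsb_pyramid`, `ceil_log`, and `lower_bound_delay` are hypothetical, and leftover signals are simply passed to the next level as the text describes.

```python
def full_adder(a, b, c):
    """Behavior of Table I: return (sum, carry) for three interchangeable inputs."""
    total = a + b + c
    return total & 1, total >> 1


def lsb_pyramid(inputs):
    """Reduce the inputs three at a time, keeping only the sum outputs.

    Returns (least_significant_bit, levels). Leftover signals are passed
    along to the next level; two remaining signals use a full adder with
    the superfluous input tied to ZERO.
    """
    signals = list(inputs)
    levels = 0
    while len(signals) > 1:
        full = len(signals) - len(signals) % 3   # signals that fill whole adders
        nxt = [full_adder(*signals[i:i + 3])[0] for i in range(0, full, 3)]
        leftover = signals[full:]
        if full == 0:
            # Exactly two signals remain: zero out the third input.
            nxt = [full_adder(leftover[0], leftover[1], 0)[0]]
        else:
            nxt.extend(leftover)
        signals = nxt
        levels += 1
    return signals[0], levels


def ceil_log(n, base):
    """Smallest integer k with base**k >= n, using integer arithmetic only."""
    k, power = 0, 1
    while power < n:
        power *= base
        k += 1
    return k


def lower_bound_delay(n):
    """T = L + M - 2 from (1)-(3)."""
    return ceil_log(n, 3) + ceil_log(n, 2) - 2


if __name__ == "__main__":
    print(lsb_pyramid([1] * 9))      # nine ONE inputs, as in Fig. 2
    print(lower_bound_delay(31))     # N = 31, as in Fig. 3
```

For N a power of 3 the number of levels reported by `lsb_pyramid` equals L in (1); for other N the sketch makes no claim that passing leftovers along achieves that bound.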
After the first level of the least significant bit pyramid we will have some carry signals to work with. We shall begin to combine these as soon as they appear. These will in turn generate some carries of their own which we will process as early as possible, etc. The reason for doing this is easy to understand. If we wait for the last carry from one bit to be established before we start work on the next bit, we will be cascading pyramids, each with its consequent delay. This was the method used by Favor [4]. But if we begin processing as early as possible we will have, with any luck, reduced substantially the number of signals left to be combined for bit i by the time we have finished processing bit i - 1.

In Fig. 3 we show the process for N = 31. Fig. 4 shows the circuit to realize this operation. Consider the way in which Fig. 3 is constructed: 31 divided by 3 gives 10 and 1 left over. We put down 10 to the left and, adding the 10 and the 1, write down 11 under the 31. This means that we will pass from the rightmost column on the first level: 1) ten full adder sum signals and one unprocessed signal to the rightmost column, and 2) ten full adder carry signals to the column next to the left. On the second level, 11 divided by 3 gives 3 and 2 over. We write down 3 to the left and 3 + 2, or 5, under the 11. We can proceed in this fashion down the first column until we generate a 1 in that column.

Now examine the second column. The 10 carry outputs we obtained from processing the original 31 inputs can be put into 3 full adders with 1 signal left over. So we write down 3 to the left (representing the carries) and 4 under the 10. The 4 we just wrote down must be combined with the 3 carries from the previous level of the column to the right. This gives 7, which in turn produces 2 carries and 3 signals to the next level in this column. The procedure continues in this fashion until all signals have been combined. Sometimes only one or two signals are available in a given column (see column 4) for several levels. We will postpone combining these until we can utilize a full adder or until it is obvious that no more signals will appear in this column because all columns to its right have terminated.

Fig. 4. Carry shower circuit for N = 31 constructed of full adders. Solid lines: signals contributing to even-numbered output bits; dotted lines: signals contributing to odd-numbered output bits.

To estimate the goodness of our lower bound, we wrote a computer program which proceeded in the manner of Fig. 3 to calculate the actual delays that would be found in a carry shower counter of the type described here. These delays were calculated for memory sizes of $2^n - 1$ with n ranging from 2 to 35. The values so obtained are shown in Table II together with the theoretical minimum delay calculated from (3). As may be seen, the actual values fall quite close to the predicted minimum, never exceeding it by more than 20 percent. The values do not behave smoothly with increasing n, although of course they do increase monotonically. This must be attributed to the natural "lumpiness" of integers.

TABLE II
COMPARISON OF THEORETICAL MINIMUM NUMBER OF DELAYS T WITH ACTUAL NUMBER OF DELAYS A FOR INCREASING NUMBER OF INPUTS N, GIVEN BY $2^n - 1$

 n    T    A         n    T    A
 1    0    0        19   29   35
 2    1    1        20   31   36
 3    3    3        21   33   37
 4    5    5        22   34   39
 5    7    7        23   36   41
 6    8    9        24   38   43
 7   10   10        25   39   45
 8   12   12        26   41   47
 9   13   14        27   43   49
10   15   16        28   44   51
11   16   18        29   46   53
12   18   20        30   47   55
13   20   23        31   49   57
14   21   25        32   51   59
15   23   27        33   52   61
16   25   29        34   54   63
17   26   31        35   56   65
18   28   33

MAXIMUM NUMBER OF ELEMENTS

To determine the number of full adders required to realize a carry shower counter we present the following argument. Let there be N input lines and M output lines, where M is given by (2).

In the rightmost column we wish to combine N lines and get one output (the least significant bit of the count). Each full adder accepts three lines and produces one sum output, a net decrease of two in the number of signals to be processed. Suppose N is odd. Then we need

$$A = (N - 1)/2 \tag{4}$$

full adders to process the rightmost column. This is easily proved by induction. It is obviously true for N = 3 where one full adder does the job. If it is true for some N (odd), then by combining the single output generated for those N input lines with two more input lines in one additional adder, it will be true for N + 2. This completes the proof.

Suppose that N is even. We have shown that (4) holds for N odd. Let $N = \phi + 1$, where $\phi$ is odd. Then the $\phi$ signals can be combined in $A = (\phi - 1)/2$ adders, and one additional adder (with one of the three inputs set always to zero) will suffice to combine the output of the first $\phi$ signals with the one extra. This will require a total of A + 1 adders:

$$A + 1 = 1 + \frac{N - 2}{2} = \frac{N}{2}. \tag{5}$$

Thus it never takes more than N/2 adders to get the first bit of the count, and since each generates one carry signal, there will be at most N/2 on the second level, N/4 on the third, etc. Summing these up we have

$$N/2 + N/4 + \cdots + N/2^M < N \tag{6}$$

a total of N full adders required to count the number of ONE'S present in the input.

SUMMARY

We have presented a scheme for determining how many of a set of N signal lines are in the ONE state. Such a circuit would be most useful for counting responders in a content-addressable memory. The propagation delay of this circuit is quite close to the theoretical minimum which is also derived. Approximately one full adder per input line is required.

Several interesting avenues remain unexplored. For what values of N do we utilize completely every full adder? Can one establish an upper bound on the delay of a carry shower counter?
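Questions of this kind can be explored by replaying the column bookkeeping of Fig. 3 in software. The following Python sketch is a reconstruction from the prose, not the authors' program: the name `carry_shower` and the list-of-column-counts representation are assumptions, and the postponement rule is modeled by forcing a zero-padded full adder on a pair only when every column to its right holds at most one signal.

```python
def carry_shower(n_inputs):
    """Replay the bookkeeping of Fig. 3 for n_inputs input lines.

    counts[j] is the number of signals waiting in column j (column 0 is the
    rightmost, least significant column). Each pass of the loop is one unit
    delay. Returns (delay, full_adders_used, final_counts).
    """
    counts = [n_inputs]
    delay = 0
    adders = 0
    while any(c > 1 for c in counts):
        new_counts = [0] * (len(counts) + 1)
        right_done = True                 # all columns to the right hold <= 1 signal
        for j, c in enumerate(counts):
            used, leftover = divmod(c, 3)  # full adders used, unprocessed signals
            if c == 2 and right_done:
                # No further carries can arrive in this column: combine the
                # pair in a full adder with one input tied to ZERO.
                used, leftover = 1, 0
            adders += used
            new_counts[j] += used + leftover   # sum outputs stay in the column
            new_counts[j + 1] += used          # carries move one column left
            if c > 1:
                right_done = False
        if new_counts[-1] == 0:
            new_counts.pop()
        counts = new_counts
        delay += 1
    return delay, adders, counts


if __name__ == "__main__":
    for n in range(2, 8):
        size = 2 ** n - 1
        delay, adders, _ = carry_shower(size)
        print(n, size, delay, adders)
```

For N = 31 this bookkeeping passes through the same column totals as the construction of Fig. 3 described in the text (11 and 10 after the first level, 5, 7, and 3 after the second) and finishes after 7 unit delays using 26 full adders, which is below the bound of (6).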
Is it possible or rather profitable to reuse the elements from early levels in processing at higher levels? Could elements other than full adders be employed in such circuits? Such questions may be examined at a later date.

ACKNOWLEDGMENT

As reported in [4], K. E. Batcher of Goodyear Aerospace Corporation contributed substantially to the original development of this method of counting responders in a content-addressable memory. It is only his modesty which keeps his name from being among the authors.

REFERENCES

[1] A. D. Falkoff, "Algorithms for parallel search memories," J. Ass. Comput. Mach., vol. 9, pp. 488-511, Oct. 1962.
[2] A. Kaplan, "A search memory subsystem for a general purpose computer," in 1963 Fall Joint Comput. Conf., AFIPS Conf. Proc., vol. 24. Baltimore, Md.: Spartan, 1963, pp. 193-200.
[3] C. C. Foster, "Parallel execution of iterative algorithms," Ph.D. dissertation, University of Michigan, Ann Arbor, 1965.
[4] J. N. Favor, "A method of obtaining the exact count of responses using full and half adders," Goodyear Aerospace Corp., Akron, Ohio, AP-111770, 1964.

Dynamic Resolution of Memory Access Conflicts

J. M. DANIEL, MEMBER, IEEE, AND J. D. IRWIN, SENIOR MEMBER, IEEE

Abstract-This note presents a method for designing and implementing a dynamic priority assignment memory-access conflict-resolution circuit for use in multiprocessing systems with K processors. A decision rule which establishes the priority assignment is given and an example which illustrates the techniques is also discussed.

Index Terms-Conflict resolution, dynamic priority assignment, memory access, memory protection, multiprocessing.

Manuscript received November 23, 1970.
J. M. Daniel is with the U. S. Navy, RVAW-120, Norfolk, Va.
J. D. Irwin is with the Department of Electrical Engineering, Auburn University, Auburn, Ala. 36830.

INTRODUCTION

In multiprocessing systems, all processors share a common main memory.
Hence memory protection techniques must be employed as a system constraint which prevents processor interference. Memory protection can be implemented via hardware, software, or a combination of the two; however, most software solutions appear to require an inordinate amount of processor time. Although, in general, memory protection can be logically subdivided into the areas of common data protection, private data protection, and memory-access conflict resolution, only a dynamic priority arrangement for the latter will be considered here.

DYNAMIC PRIORITY ASSIGNMENT

It will be assumed here that in systems where memory-access conflicts exist, no one processor has priority over another and any memory module is addressable by any processor on a random basis. The probability of conflict when K processors are accessing N modules of memory can be obtained via the equation

$$P_{\text{conflict}} = 1 - \frac{N!}{N^K (N - K)!} \tag{1}$$

which is developed by Mosier [1] for a slightly different configuration. Since a memory module can serve only one processor per memory cycle, it is necessary that each module have the ability to grant access to a specific processor and deny it to all others during every cycle. This arbitration among processors by the storage unit is the memory-access conflict problem.

If there is no fixed priority assignment and access is on a random basis, dynamic selection methods for granting access to processing units can be used. There are several circuits proposed by Wood [2] that will resolve memory-access conflicts on a dynamic basis for the special case of a cross-point cell processor-memory connection. Gordon [3] has also developed a dynamic priority assignment circuit. Although Gordon's circuit has a number of interesting features, its primary disadvantage is that it is not applicable to systems with more than two processing units.

DESIGN PROCEDURE

The following fundamental assumptions provide the framework for the design procedure.
1) All processors access each memory module on a random basis.
2) The circuit to be designed will control access to only one memory module.
3) Access, whenever possible, will be granted on the basis of which processor had access last.
4) The system is asynchronous and flip-flop multivibrators (or equivalent devices) will be used as memory elements.

The circuit will generate a start memory cycle signal at the same time access is granted to a requesting processor. This signal provides synchronous operation by allowing safe-race conditions and output clocking.
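To give a feel for the magnitude of the conflict probability in (1) of this note, the expression can be evaluated directly. The Python sketch below is only an illustration: the function name `conflict_probability` is hypothetical, and returning 1.0 for K greater than N is an assumption (with more processors than modules at least two must address the same module), since (1) is not defined in that case.

```python
from math import perm


def conflict_probability(k, n):
    """Evaluate P_conflict = 1 - N! / (N**K * (N - K)!) for K processors, N modules."""
    if k > n:
        # Assumption: more processors than modules always produces a conflict.
        return 1.0
    # perm(n, k) equals N! / (N - K)!, the number of conflict-free assignments.
    return 1.0 - perm(n, k) / n ** k


if __name__ == "__main__":
    for k in (2, 3, 4):
        print(k, conflict_probability(k, 8))
```

With two processors and four modules, for example, 12 of the 16 equally likely address pairs are conflict free, so the sketch returns 0.25.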