594 IEEE TRANSACTIONS ON COMPUTERS, VOL. C-28, NO. 9, SEPTEMBER 1979

A Programmable Logic Approach for VLSI

SUHAS S. PATIL AND TERRY A. WELCH, MEMBER, IEEE

Abstract-This paper explores the use of a proposed programmable storage/logic array (SLA) chip as a general purpose universal logic element for digital computers. The SLA is compared to other programmable logic arrays in implementation and utilization, showing how it permits construction of complete digital subsystems on one chip without sacrifice in programmability. When compared with other contending very large-scale integrated (VLSI) technology approaches, such as microprogrammed processors and gate arrays, the SLA offers an attractive combination of cost, performance, and ease of implementation.

Index Terms-Asynchronous circuits, digital integrated circuit design, digital systems design, logic arrays, programmable logic circuits, very large-scale integrated (VLSI) technology.

INTRODUCTION

THE ADVENT of very large-scale integrated (VLSI) circuit technology illustrates the continuing trend toward increased gate counts on logic chips. Due to the complexity and cost of designing such chips, a structured form of logic implementation is desirable. A programmable logic structure would permit use of a single chip type for a variety of functions within a computer, giving improved production volume to overcome the adverse unit costs of special-purpose VLSI chips. The programmable logic array (PLA) promises a possible solution to many of these problems, but existing PLA implementations may not be effective in achieving high logic density on a chip. A universal VLSI chip programmed in a PLA-like form is very desirable if it can implement entire subsystems on a single chip. This paper explores the possibility of using a proposed programmable storage/logic array (SLA) chip for general purpose logic in computer systems. The SLA is a form of PLA which contains flip-flops distributed throughout the array.
We have explored the use of the SLA as a general purpose logic component, and conclude that the SLA is flexible and efficient for both data manipulation and sequential control. Furthermore, its ROM-like programming mechanism promises fast turnaround in design and implementation of a wide range of logic structures. This paper explains the structure of the SLA, describes its implementation, discusses its use in some typical computer logic, and compares it with other logic structures used for processor implementation. Implementation evidence and design examples are presented to support the assertion that the SLA is a viable candidate for becoming a practical general-purpose VLSI part.

Manuscript received February 26, 1979. The work on storage/logic arrays was supported in part by the National Science Foundation under Grants DCR 74-2182 and MCS 78-04853. S. S. Patil was employed by Sperry Univac, Sudbury, MA, at the time this work was carried out. S. S. Patil is with the Department of Computer Science, University of Utah, Salt Lake City, UT 84112. T. A. Welch is with Sperry Research Centre, Sudbury, MA 01776.

Fig. 1. Conventional PLA. (AND array: choice of 1, 0, or no connection per conjunction term; OR array: choice of 1 or no connection.)

Fig. 2. PLA with feedback. (Inputs and flip-flop feedback drive the AND array of conjunction terms; the OR array drives the outputs and flip-flops.)

STRUCTURE OF THE SLA

The traditional PLA is a combinational circuit which produces multiple Boolean outputs from its inputs. It differs from a ROM only in that implicants rather than the minterms of Boolean functions are stored, so that it is not necessary to have a word of storage for every input combination. A PLA is usually implemented in the form of two arrays (see Fig. 1): an AND array which forms selected conjunction terms (implicants) based on input data, and an OR array which combines conjunction terms to form the correct outputs.
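The two-array evaluation just described can be rendered in software. The sketch below is illustrative only: the function name and the example personality (a 2-input XOR and AND realized with three implicants rather than the four minterms a ROM would need) are not from the paper.

```python
# Sketch of a conventional PLA (Fig. 1): each row of the AND array is an
# implicant programmed with 1, 0, or None (no connection) per input; each
# column of the OR array marks which implicants feed that output.

def pla(and_array, or_array, inputs):
    """Evaluate a PLA: form implicant terms, then OR selected terms."""
    terms = [all(inp == want for inp, want in zip(inputs, row) if want is not None)
             for row in and_array]
    return [any(t for t, used in zip(terms, col) if used) for col in or_array]

# Personality for XOR and AND of inputs (a, b): three implicants, not
# the four minterms a ROM-style lookup would require.
AND = [(1, 0),      # a AND (not b)
       (0, 1),      # (not a) AND b
       (1, 1)]      # a AND b
OR = [(True, True, False),   # XOR = a(not b) + (not a)b
      (False, False, True)]  # AND = ab

print(pla(AND, OR, (1, 0)))  # → [True, False]
```

The `None` entry (unused here) is what lets an implicant cover several minterms at once, which is exactly the density advantage over a ROM that the paper notes.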
The array is programmed by selecting (via make/break connections) whether a conjunction line is gated by a 1, 0, or neither on each input, and whether or not an output line responds to a conjunction line. In most computer designs, purely combinational PLA's are difficult to use extensively because of pin limitations. In response to these pin limitations, some PLA's have added flip-flops on the chip, providing internal feedback from the outputs back to the inputs in the style of classical state machines [4], [5] (see Fig. 2). While these PLA's can be used in a much wider range of applications, they potentially suffer from inefficient utilization of the chip area within the AND and OR arrays because only a small fraction of available logic elements on the chip are actually used in typical designs.

Fig. 3. Storage/logic array (SLA). (Row entries: choice of 1, 0, or no connection; column entries: choice of set, reset, or no connection.)

Fig. 4. Associative logic array.

The SLA is a variation on previous PLA's in that the AND array and OR array are folded together so that input lines and output lines are alternated within a single array [1] (see Fig. 3). This has two important effects: 1) substantially more flip-flops can be added without the need for excess input/output routing space and 2) rows of the array (conjunction signals) can be subdivided into multiple independent segments which can represent independent variables over smaller portions of the array. Extending these two concepts further, more flip-flops can be added at intervals along the columns of the array, so that the columns can also be subdivided into segments carrying independent variables with localized access. As a result of these changes the SLA is used in quite a different manner than is the PLA.
Portions of the array can be used for independent tasks, such as, for example, using the upper right-hand corner to build an adder while using the left-hand columns for sequencing control and the lower rows for a register structure. This approach permits a much denser packing of logic into an array and permits execution of more complex functions on a single chip.

The concept of row segmentation was previously developed by Greer [3]. His associative logic array contains internal feedback (see Fig. 4) but lacks explicit flip-flops (local feedback) and column segmentation. Thus, the concepts developed by Greer to achieve higher effective logical density are extended in the SLA to further improve array utilization efficiency.

Viewed as a logical element, the SLA has these characteristics:

1) Each logical column can either act as a stored binary variable, or perform the logical OR function of the row inputs. When the column segment represents a stored variable, the values can be set and reset by the rows.

2) Each row is an implicant or conjunction term over the selected column variables; a column input may either be the column value, its complement, or there may be no connection from the column to that row. These variables are ANDed to form the row value.

3) Columns may be segmented into independent column segments at points provided every Y rows (typical Y = 8). Storage cells (flip-flops) are provided every nY rows (typical nY = 16). Every column segment must contain at least one storage cell.

4) Rows may be segmented at breakpoints provided every X columns (typical X = 4).

Various columns and/or rows are connected to the input and output pins of the chip with suitable interface drivers; this arrangement can vary from design to design and so will not be described in detail here.

AN SLA CIRCUIT

An SLA can be implemented either as a clocked structure, in which the successive actions of the SLA are timed by a master clock, or as an asynchronous structure where the successive actions are determined by the temporal constraints within the SLA. Both forms of SLA are of practical importance depending on the application and the technology of implementation. The asynchronous form of SLA is described in this paper because it has more interesting timing considerations.

A circuit for the SLA is shown in Fig. 5. It is illustrated here in simple transistor form using positive logic notation for ease of understanding, but the concepts are applicable in standard semiconductor technologies with little change. Row-column connections are made by transistors whose collectors are selectively connected in a wired-NOR structure. Storage cells consist of cross-coupled NAND gates with complemented inputs S̄ and R̄. A logical column consists of the four wires S, R, Q, and Q̄. A column can be divided into segments by breaking all four wires at the breakpoints provided at regular intervals along the column. Segments which contain two or more flip-flops operate the flip-flops in parallel for electrical load sharing purposes, so that switching times are not strongly affected by the length of column segments.

Fig. 5. Basic cells of an asynchronous SLA.

The circuit in the storage cell contains breakpoints which permit its use for several functions other than the flip-flop described. The feedback loop can be broken, so that the outputs are simple combinational functions of the inputs. Also, the input signals can be selectively inverted; the normal flip-flop configuration bypasses the input inverters and utilizes the feedback paths.
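Behaviorally, characteristics 1) and 2) above amount to rows that test column values and, when satisfied, fire set/reset actions on column flip-flops. A minimal software sketch of one such cycle follows; the dict-based encoding is an assumption for illustration, since the paper specifies only the behavior.

```python
# Behavioral sketch of SLA rows acting on stored column variables: each
# row carries a conjunction test (column -> required value, i.e. a 1 or 0
# entry; untested columns are simply absent) and a set of actions
# (column -> "S" or "R") applied when the test is satisfied.

def cycle(columns, rows):
    """One evaluate-then-act cycle over all rows."""
    active = [r for r in rows
              if all(columns[c] == v for c, v in r["test"].items())]
    for r in active:                          # every satisfied row fires
        for c, action in r["act"].items():
            columns[c] = 1 if action == "S" else 0
    return columns

# One row sets y1 and resets y2 simultaneously when x = 1 and (y1, y2) = (0, 1):
cols = {"x": 1, "y1": 0, "y2": 1}
rows = [{"test": {"x": 1, "y1": 0, "y2": 1}, "act": {"y1": "S", "y2": "R"}}]
print(cycle(cols, rows))  # → {'x': 1, 'y1': 1, 'y2': 0}
```

Evaluating every test before applying any action mirrors the array's parallel AND plane feeding its set/reset lines, rather than a sequential program scanning rows one at a time.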
The normal "split-column" configuration breaks both feedback paths and uses both input inverters, so that the Q line becomes the positive OR function of the inputs to the S line, and the Q̄ line becomes the positive OR function of the inputs to the R line. Other configurations are possible and useful but are not discussed in this paper.

Rows are single wires which may be divided into smaller segments at breakpoints provided at regular intervals along their length. At each breakpoint a pull-up resistor is provided to feed current into the row wire; long row segments will have several such circuits active in parallel. Since the rows are activated by a NOR function of various Q and Q̄ signals, they are used with the complements of the selected input variables to produce the positive AND function. Likewise, the S̄ and R̄ signals are NOR functions of selected row signals, so the Set and Reset actions are seen as OR functions of row variables. Thus, the SLA implements the same AND-OR functions implemented in conventional PLA's, so the SLA is able to perform all PLA-type functions in conjunction with its storage abilities.

The asynchronous form of SLA provides apparently conventional implementation of asynchronous logic, but the fact that all elements are on the same chip permits use of timing assumptions which are not normally reasonable with asynchronous logic built from discrete elements. Specifically, the uniformity of active device implementation and the uniformity in column loading permit the assumption that the variance in delay times between the fastest and slowest column is small relative to the average delay time of the columns. Under this assumption, a row in the SLA is able to alter the state of several column flip-flops simultaneously without the problems of race conditions.

Fig. 6. SLA timing example. (Q̄1 and Q̄2 must be low simultaneously to activate r2.)
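Under the uniform-delay assumption, the race avoidance of Fig. 6 can be sketched as a simple gate-delay trace. The model below is illustrative bookkeeping, not a circuit simulation; delays are in units of one gate delay, reflecting the fact that in a cross-coupled NAND cell a reset raises Q̄ after one delay while a set lowers Q̄ only after two.

```python
# Gate-delay sketch of the Fig. 6 transition: row r1 sets Y1 and resets Y2
# at once. Row r2 needs Q-bar1 = Q-bar2 = 0, a state the pair never enters.

def qbar_trace(events):
    """Apply timed signal changes and record successive (Q̄1, Q̄2) states."""
    state = {"qbar1": 1, "qbar2": 0}      # Y1, Y2 = 0, 1 before the transition
    trace = [(state["qbar1"], state["qbar2"])]
    for _, wire, value in sorted(events):  # events ordered by delay
        state[wire] = value
        trace.append((state["qbar1"], state["qbar2"]))
    return trace

# Reset raises Q-bar2 after 1 gate delay; set lowers Q-bar1 after 2.
events = [(1, "qbar2", 1), (2, "qbar1", 0)]
trace = qbar_trace(events)
print(trace)                 # → [(1, 0), (1, 1), (0, 1)]
print((0, 0) in trace)       # → False: r2 never gets a chance to fire
```

The trace passes through (1, 1) but never (0, 0), which is the heart of the argument: as long as delay variation stays under one gate delay, the transient state that would falsely activate r2 cannot occur.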
This property of the SLA is illustrated with the help of the timing diagram of Fig. 6, in which row r1 is turned ON when input x is a 1 and columns Y1 and Y2 are in states 0 and 1, respectively. In the ON state, row r1 drives the set input of Y1 and the reset input of Y2 so that the state variables Y1, Y2 switch from 0, 1 to 1, 0. Row r2 implements a conjunction term that responds to the state Y1, Y2 = 1, 1 (when Q̄1 = Q̄2 = 0). In ordinary asynchronous circuits a simultaneous change in Y1 and Y2 could have created a race condition, but here a race is avoided because Q̄2 and Q̄1 first take one gate delay to become 1, 1 and one more gate delay for Q̄2, Q̄1 to reach the final value 1, 0. If the variation in this signal propagation delay between Y1 and Y2 is less than one gate delay, Q̄1 and Q̄2 will never both be in the state 0 at the same time, and row r2 will never get a chance to become active.

In terms of technology, the SLA as described uses modified ROM circuitry. An SLA of 256 rows and 48 columns with column breakpoints every 8 rows and row breakpoints every 4 columns can be implemented with an LSI circuit of the complexity of a 65K ROM. In terms of speed, the basic cycle time (AND-OR-Set/Reset) of an SLA is expected to be of the same order as the basic cycle time of a PLA using internal flip-flops for feedback.

THE SLA NOTATION

The notational conventions used with the SLA are illustrated in Fig. 7(a). The logic is coded in a matrix, with each column representing the inputs and outputs of an internal variable (flip-flop).
Fig. 7. SLA examples. ((a) Notation illustration; (b) a 4-to-1 multiplexer; (c) a negative edge-triggered D flip-flop.)

Fig. 8. A state machine. (State assignment: A = 00, B = 01, C = 10; output z = 1 in state C.)

An entry of 1 or 0 in a row-column intersection indicates that the row is activated only if that column takes on the specified value. An entry of S or R indicates a set or reset action imparted to the flip-flop when the row is activated. The combinations 1R and 0S serve to toggle the flip-flop under the specified conditions. The combinations 0R and 1S are legal but not of any particular value. A row is activated when all of its 1 and 0 inputs are satisfied, at which time all of its specified S and R actions are performed. Row and column segments are demarcated by brackets or parentheses, so that a segment of a row (or column) between a pair of brackets is independent of all other portions of that row (column). Columns which have been split into two independent OR functions are demarcated by finely dashed lines dividing the columns. The OR function uses a + to indicate column activation, rather than S or R, and uses a 1 where the function output is utilized. These conventions are illustrated with examples in Fig. 7(b) and (c). Fig. 7(b) shows a four-to-one multiplexer and Fig. 7(c) shows a negative edge-triggered D flip-flop implemented on an SLA. In the multiplexer example, columns 1 and 2 are used to decode the selector bits, columns 3 and 4 carry the four input signals, and column 5 carries the output signal. In Fig. 7(c), the latched variable a is introduced to detect the negative edge in the clock signal C.
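The entry conventions just listed (1/0 as conditions, S/R as actions, 1R/0S as toggles) can be given a direct software reading. The encoding below is an assumption for illustration; in particular the two lookup tables simply restate the definitions above.

```python
# Sketch of the matrix notation: "1"/"0" are activation conditions, "S"/"R"
# are actions, and "1R"/"0S" combine a condition with the opposite action,
# which is how a toggle is expressed.

COND = {"1": 1, "0": 0, "1R": 1, "0S": 0}   # entries that gate the row
ACT  = {"S": 1, "R": 0, "1R": 0, "0S": 1}   # entries that change a column

def fire(columns, row):
    """Apply one SLA row (a dict column -> entry) to the column state."""
    if all(columns[c] == COND[e] for c, e in row.items() if e in COND):
        for c, e in row.items():
            if e in ACT:
                columns[c] = ACT[e]
    return columns

# A "1R" entry toggles a flip-flop that currently holds 1:
cols = {"a": 1}
fire(cols, {"a": "1R"})
print(cols)  # → {'a': 0}
```

Note that a pure S or R entry appears only in ACT, so it imposes no condition, while 1R and 0S appear in both tables: they gate the row on the current value and then write its complement, which is exactly why they act as toggles.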
When C is high a is set; then when C goes low a is reset as part of the next action and it then blocks any succeeding transitions in that clock cycle.

SLA LOGIC EXAMPLES

Several examples are given here to illustrate the generality and characteristics of SLA programming. Two demonstrations of generality are presented. First, an algorithm for construction of an arbitrary finite-state machine is shown. Second, more practically, the components of a standard processor structure are developed, namely mechanisms for data transfer and arithmetic.

Finite State Machine: A simple Moore-type synchronous sequential circuit, Fig. 8(a), is implemented by the SLA program of Fig. 8(b). This illustrates the basic structure in which state variables, input variables, and output variables are assigned one column each. Each state-input combination which causes a state change or output is assigned one row. That is, each row recognizes its appointed combination of present state and input values, and when activated it adjusts the state variables to the correct next state and signals the proper outputs. For example, the row marked A→B is activated when S1 = S0 = 0 (which is the coding of state A), X = 0 as the input condition, and when Clock = a = 1. This row causes a transition to state B by setting S0 to 1, and produces 1 on the Z output. This form of state machine design is justified on the SLA, despite the use of asynchronous latches, because of the timing assumptions discussed previously. The assumed uniformity in gate delays and the use of two wires Q and Q̄ (double-rail logic) to represent column variables in the SLA eliminates the difficult timing problems which complicate traditional asynchronous logic design. Critical races (or skew) in multiple state variable changes resulting from a single row activation are not a problem, as shown in the timing sequences of Fig. 6.
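The construction just described, one row per state-input pair, can be sketched as a small translator. The 3-state machine below reuses Fig. 8's state assignment (A = 00, B = 01, C = 10), but its full transition table is hypothetical, since the paper spells out only the A→B row; outputs and the clock variable a are omitted for brevity.

```python
# Sketch of the finite-state machine construction: emit one SLA row per
# (state, input) pair, each recognizing the present state bits and
# setting/resetting them to the next state's coding.

STATES = {"A": (0, 0), "B": (0, 1), "C": (1, 0)}   # Fig. 8 state assignment

def make_rows(transitions):
    """Emit one SLA row (condition dict, action dict) per state-input pair."""
    rows = []
    for (state, x), nxt in transitions.items():
        cond = {"S1": STATES[state][0], "S0": STATES[state][1], "X": x}
        acts = {bit: ("S" if v else "R")
                for bit, v in zip(("S1", "S0"), STATES[nxt])}
        rows.append((cond, acts))
    return rows

def run(rows, state, x):
    """Fire the one matching row and report the resulting state name."""
    bits = {"S1": STATES[state][0], "S0": STATES[state][1], "X": x}
    for cond, acts in rows:
        if all(bits[k] == v for k, v in cond.items()):
            for k, a in acts.items():
                bits[k] = 1 if a == "S" else 0
            break                       # one transition per clock (see note)
    return [s for s, v in STATES.items() if v == (bits["S1"], bits["S0"])][0]

trans = {("A", 0): "B", ("A", 1): "A", ("B", 0): "C",     # hypothetical
         ("B", 1): "A", ("C", 0): "A", ("C", 1): "C"}     # transition table
rows = make_rows(trans)
print(run(rows, "A", 0))  # → B, as in the row marked A→B
```

The `break` stands in for the clock variable a of Fig. 7(c): in the actual SLA program it is that latched variable, not program order, that prevents a second row from firing on the freshly written state within the same clock cycle.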
The SLA, therefore, is programmed in almost the same style as a synchronous sequential machine, with arbitrary state transitions allowed and with no extra clock signals necessary to resolve essential hazards. This provides a convenient programming style despite the use of inexpensive latch-type flip-flops. The difference here is that the designer must program in the clock signals explicitly, rather than having them hidden in the flip-flop. A standard way to do this, achieving the effect of edge-triggered flip-flops, is to introduce the extra state variable a, as first shown in Fig. 7(c).

Fig. 9. A two's complement adder.

Fig. 10. Register transfer operations.

Processor Elements: To demonstrate general processing capabilities, the four basic components of a processing unit are examined, namely: 1) arithmetic and other data manipulation functions to process and sense conditions in data words; 2) transfer paths, for movement of data words between registers and through arithmetic operations; 3) microcommand generation, to create the data path control signals as a function of state variables; and 4) state sequence generation, with branching based on sensed conditions in the data paths.
Three examples of SLA construction are shown here to illustrate the ease with which these processor elements are realized.

An example of an arithmetic unit is shown in Fig. 9. This is a 4-bit accumulator with a two's complement adder, which adds the contents of the bus Z to the accumulator A: A + Z → A. The addition is done in two phases, controlled by externally supplied clock signals provided on the bottom two rows. The first phase, using the top two rows, creates a partial sum in A0-A3 and records all carries generated in G0-G3. The second phase uses the middle nine rows to propagate the carries to create the final sum.

For larger adders, this simple carry propagation technique would require an excessive number of rows. Two modifications in the basic strategy can be used: 1) do ripple carry propagation and 2) perform the partial sum and carry generation in parallel over groups of bits and then allow carries to ripple from one group to the next. Both schemes reduce the size of the carry propagation circuitry, by having fewer carry generation points, at the cost of time. The following table summarizes some representative two's complement adder costs, including control signals. This range of adder sizes demonstrates that a large degree of time/space tradeoff is available in SLA arithmetic designs.

Other data manipulation functions normally used in processor structures are easily implemented in SLA logic, using two-level AND/OR logic or arithmetic structures. The bitwise EXCLUSIVE-OR, for example, requires one clock cycle and three rows of logic. The one logic function which is unusually costly to realize, as it is in most logic families, is the parity function.

A second example, in Fig. 10, illustrates basic register transfer mechanisms and control structures, using various timing mechanisms. The upper half of this SLA program demonstrates transfers between a register A and a bus Z, using a pulsed signalling convention.
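The two-phase addition of Fig. 9 can be rendered in software as follows. This is an equivalent rendering, not the circuit: phase 1 is the bitwise partial sum and carry generation, and the loop plays the role of the nine dedicated rows that the SLA uses to propagate all carries in a single array cycle.

```python
# Sketch of the two-phase accumulator add of Fig. 9 (A + Z -> A) for a
# 4-bit word. Phase 1: partial sum A XOR Z and generated carries A AND Z.
# Phase 2: propagate the carries into the final sum.

def two_phase_add(a, z, width=4):
    mask = (1 << width) - 1
    partial = (a ^ z) & mask          # phase 1: A_i := A_i XOR Z_i
    carries = (a & z) & mask          # phase 1: G_i := A_i AND Z_i
    while carries:                    # phase 2: propagate carries
        shifted = (carries << 1) & mask
        partial, carries = (partial ^ shifted) & mask, partial & shifted
    return partial

print(two_phase_add(0b0111, 0b0001))  # → 8 (7 + 1)
print(two_phase_add(0b1111, 0b0001))  # → 0 (two's complement wraparound)
```

Masking to the word width gives the two's complement wraparound for free, and the number of loop iterations corresponds to the carry-propagation depth that the row-count/clock-count tradeoff in the table trades against.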
In this example, the bus and microcommand distribution use unlatched columns, so that data placed on the bus are valid only for the time that the microcommand A → Z is active. The register is loaded by first clearing it to zeroes and then selectively setting the "one" values, as is often done in connections to pulsed buses. The lower half of the program demonstrates the ability to multiplex into register A the contents of some other register B. In this sample, command distribution is done by latched columns, which permits shorter cycle times but requires an extra column per bit slice.

This example illustrates the effectiveness of row and column segmentation. Note that the transfers between bus and register are very economical because single rows can be segmented into multiple small gates. Likewise, the internal columns are segmented for use with different control signals in different sections of the logic. Note also that this bus transfer structure can be appended directly to the adder of the previous example to give an accumulator.

The left-hand columns of this example illustrate a typical form of microcommand signal generation. The state variables of the processor are distributed along full columns, and each microcommand signal is generated by one row which responds to the appropriate state and input conditions. In this system one row is needed for each irreducible combination of state variables which actuates the same command signal, so it is often convenient to code the state space sparsely.
That is, if some command signal is generated by many states, it is desirable to code those states into a single equivalence class on one subset of the state variables, even though this may require using more state variables.

Representative two's complement adder costs (rows of SLA logic, including control signals):

Adder size    2 clocks    3 clocks    4 clocks    16 clocks
4 bits        11 rows      9 rows      8 rows      8 rows
8 bits        19 rows     14 rows     12 rows      8 rows
16 bits       40 rows     23 rows     17 rows      8 rows

Fig. 11. SLA implementation of address generator. (Layout: interface logic, 8 rows, at top and bottom; high-order and low-order index storage, 6 index values, 48 rows each; upper and lower bound comparison, 40 rows each; 16-bit adder, 48 rows; high-order and low-order buses; index, control, and timing lines in 16 columns; four columns per bit slice; data bits b15-b8 in the top half, b7-b0 in the bottom.)

The mechanism used for sequencing through processor states is essentially the same as that shown for finite-state machine control in Fig. 8. It is interesting to note that this sequencing logic can be dispersed throughout the data manipulation areas of the chip.

The third example is more complex, and is intended to show typical chip layout rather than specific programming. This unit is an address generator as found in a typical (not specific) computer. It receives a 16-bit displacement, adds a specified one of six possible index registers, checks the high-order results against upper and lower bounds of eight high-order bits each, and produces the 16-bit result if the bounds checks succeed. A 16-bit bus is used for input and output, as well as to load index and bound values. Additional lines specify the index register number and register store commands. The chip layout, Fig. 11, illustrates a standard technique used for a variety of problems. The array is divided by columns into bit slices, with each data bit given four adjacent columns.
The array is divided down its length (row-wise) into functional units, with separate units for bounds check, index register storage, and final address addition. To conserve columns the 16-bit data word is split so that the high-order eight bits are handled in the top part of the array and the low-order eight bits in the bottom. Each logic unit is then 8 data bits, or 32 SLA columns, wide. Each bit slice contains one column which is a bus, labeled Zi, to connect through the various related logic units. The other three columns are segmented at unit boundaries, and hold variables which are local to the logic units. On the left side of the array 16 columns are used for common control information, such as the five-phase clock, three index specification bits, operation specification bits, and various condition codes.

The logic of this example is complicated (not atypically) by the desire to execute the address generation function in five basic clock cycles (AND-OR-Set/Reset cycles). The timing on the internal data bus is this: In cycle 1 the bus carries the displacement value furnished from outside the chip. In cycles 2 to 3, it carries the index value, and in cycles 4 to 5 it carries the resultant index-plus-displacement sum. The 16-bit adder stores the displacement from the bus in cycle 1, and then adds in the index value in cycles 3 and 4. The bounds comparison units have the most critical timing since the index value is not available to them until the third cycle. The resulting strategy is to subtract the displacement from the bounds in cycles 1 and 2, and then compare these results to the index in cycles 3 and 4. Since this comparison is incomplete without knowledge of a possible carry out of the low-order eight-bit index-plus-displacement sum, the final comparison is completed in cycle 5 when the carry signal is available. The individual units are generally similar to the structures shown in Figs. 9 and 10.
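The bounds-check strategy just described rests on a simple identity: bound - displacement compared with the index gives the same decision as bound compared with index + displacement, which is what lets the bounds units start two cycles early. The sketch below checks that identity with full-width integers; the function name, the single-unit arithmetic, and the omission of the split high/low-order halves (whose carry the hardware folds in during cycle 5) are all simplifications.

```python
# Sketch of the address generator's decision logic: the reference check on
# the full sum must agree with the pre-subtracted-bounds check that the
# hardware performs before the sum is ready.

def address_generate(disp, index, lower, upper, width=16):
    mask = (1 << width) - 1
    result = (disp + index) & mask              # the 16-bit adder's output
    ok = lower <= (disp + index) <= upper       # reference bounds check
    # Bounds units compare the index against pre-subtracted bounds instead:
    ok_pre = (upper - disp) >= index and (lower - disp) <= index
    assert ok == ok_pre                         # same decision, earlier start
    return result if ok else None

print(address_generate(0x0100, 0x0020, 0x0000, 0x7FFF))  # → 288 (0x0120)
print(address_generate(10, 10, 0, 15))                   # → None (out of bounds)
```

Working with unbounded Python integers hides exactly the complication the paper highlights: on 8-bit high-order slices the comparison is incomplete without the low-order carry, hence the extra "near-equal" row and the cycle-5 fixup.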
The bound comparison units have one 8-bit subtractor, which is isomorphic to an 8-bit adder but with altered carry logic. The final comparison unit is isomorphic to an 8-bit adder whose carry circuitry is simplified to provide the carry signal for the high-order bit position only. One extra row is used to recognize the equal or near-equal case when a possible carry-in would change the result, and this signal is sent separately to the final decision logic. The index storage areas consist of two registers stored in each 16 rows using the structure of Fig. 10. Index selection on load and store is done according to index selection bits, contained in the left-hand control columns, which augment the state variables in activating microcommands. The 16-bit main adder sits at the junction between the buses from the high-order and low-order halves of the word, so that the full 16-bit carry circuit is all in one place.

This example illustrates the flexibility with which the areas of the array can be partitioned into various different functional units. It also illustrates the ability for several internal units to operate concurrently to improve performance. It should be pointed out that the design presented is only one point in a possible cost/performance tradeoff curve. More compact designs could have been selected, at the expense of extra cycles in execution time, or vice versa. The SLA size required for this problem (240 rows by 48 columns) has approximately the same number of circuit elements (transistors) as a 65K ROM.

COMPARISON WITH OTHER LOGIC FORMS

The following comparison of the SLA with alternative VLSI logic forms focuses on the fundamental properties of each logic form since implementation details will change with time. Many variations on each form must be anticipated in the future, so the discussion here will concentrate on simple configurations of each logic type, to emphasize basic differences rather than idiosyncrasies of particular implementations.
These comparisons assume the use of reasonably equivalent technology throughout, namely equivalent component speeds and density. Comparisons based on specific implementations are not attempted here because such comparisons are rendered obsolete by rapidly changing semiconductor technology.

Microprogrammed Structures: An important form of future processor design will involve use of standard processor chips with control programs in a separate control memory (e.g., ROM). To achieve reasonable production volumes, to be useful across a range of applications, and to utilize a limited number of connection pins, the processor chip must provide a generalized set of operations which are simple in nature. That is, the operations will seldom provide specialized actions based on particular data patterns. Also, the control memory will generally be a separate chip to permit interchange of memory type (RAM, ROM, EPROM). Thus a microprogrammed structure can be expected to give lower performance than the SLA for three reasons: 1) the data paths in a standard processor will often fail to exactly match the needs of a particular application, requiring extra program steps to extract and combine off-sized data fields, while the SLA can easily be adapted to varying sizes; 2) the storage of control information in a memory chip external to the processor requires extra delay, particularly in test/branch conditions, due to inter-chip delay times, which are much longer than intra-chip delays; and 3) standard processors can generally execute only one action at a time, while the SLA carries out concurrent actions with ease.

The microprogrammed structure will store a higher density of control information in its control memory than the SLA can store, so the SLA will be generally more expensive. Also present technology provides flexibility in erasable memories, so the microprogrammed approach permits easier development and debugging.
This technology advantage can be expected to continue for the foreseeable future. The SLA is expected to provide functional speeds ten times faster than equivalent microprocessors, but at somewhat higher design and production costs. Thus the two logic types will fall into rather distinct application areas.

Gate Arrays: A gate array is a standard set of logic elements on a chip, for which the designer creates a customized interconnection pattern implemented in two layers of metallization. The use of the same set of logic elements for a wide variety of applications tends to force the use of simple general-purpose elements (e.g., two-input NAND gates) with little variety in element types on the chip. In terms of logic speed, the gate array can be expected to give higher performance than the SLA because individual gates will be two to three times faster than SLA elements due to fan-in and fan-out differences. This raw speed will be partially offset by the need to cascade several gate array logic levels when substantial fan-in is needed.

Component costs are affected by effective logic density, which is difficult to predict. The gate array achieves high utilization of its active components, but substantial areas of the chip (60 percent to 80 percent for large arrays) must be left open for interconnection lines. The SLA utilizes its active elements very inefficiently (using 10 percent to 20 percent) but it achieves very high densities of elements on the chip. In comparing design costs, the SLA has a clear advantage because 1) the gate array requires substantial design effort to perform interconnection routing and 2) the gate array requires masks for two metallization layers as opposed to one contact mask for the SLA. The SLA offers a good possibility of being implemented in field programmable versions, while the gate array interconnects are best done by the chip manufacturer.
In general, then, the gate array will probably provide higher performance and somewhat higher logic density, but at the cost of increased design effort and longer implementation turn-around time. The gate array does provide better flexibility in making connections between interior portions of the array and the I/O pins, which is an advantage if one large array is being used for a number of independent functions. However, this is not likely to be a common VLSI design strategy because of pin limitations.

Customized Logic: A VLSI chip specifically designed for a particular application will produce very high performance implementations. This approach, however, involves extended implementation delays since masks for all the semiconductor processing steps are custom made, so chip reimplementation when working out design errors may take months. Further, the relatively low production volumes of customized parts in most computer applications imply higher chip costs. It is difficult at this time to foresee whether the higher speed and density of customized parts can offset high design costs at the production volumes found in most digital applications.

PLA's: In comparing the SLA to conventional PLA's, the fundamental issue is efficiency of chip area utilization. The PLA has separate AND and OR arrays, so there is no opportunity to split rows and columns as in the SLA. The SLA's capability of being partitioned into concurrently operating functional units permits much higher effective logic density, especially in very large arrays and in implementing register structures. VLSI chips could be constructed using several small interconnected PLA's, but such an arrangement would still lack the flexibility in array size and function interconnection that the SLA gives. Partially offsetting this effect is the somewhat lower density of connection points on the SLA.
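The partitioning argument can be sketched with a rough count of programmable intersections. The sizes below are illustrative assumptions chosen to be of the same order as a register-oriented design, and the even four-way split is hypothetical; the counting convention (rows times total columns) follows the comparison style used in the text.

```python
# Rough model of programmable-intersection counts (illustrative sizes).
# A monolithic PLA with I inputs, O outputs, and R product-term rows
# costs about R * (I + O) row-column intersections.  An array split
# into k independent sub-arrays, each with I/k inputs, O/k outputs,
# and R/k rows, needs k * (R/k) * ((I + O)/k): a factor-of-k saving.

def pla_intersections(rows: int, inputs: int, outputs: int) -> int:
    """Intersections in one monolithic PLA array."""
    return rows * (inputs + outputs)

def split_intersections(rows: int, inputs: int, outputs: int, k: int) -> int:
    """Intersections when the array is split into k equal sub-arrays."""
    return k * (rows // k) * ((inputs + outputs) // k)

# Hypothetical design of similar scale to a register-oriented structure:
mono = pla_intersections(480, 200, 200)        # 192000
split = split_intersections(480, 200, 200, 4)  # 48000
assert mono // split == 4
```

Splitting into k concurrently operating sub-arrays divides the intersection count by roughly k, which is the source of the SLA's functional-density advantage in register-oriented logic.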
The distributed flip-flops in the SLA absorb 10 percent to 20 percent of the chip area, while equivalent separate storage on a PLA might take less space than this. The SLA space used for programming row and column splits may take 10 percent to 20 percent of the chip area. The resulting 20 percent to 45 percent lower element density of the SLA, however, is expected to be small compared to the three to five times greater functional density achieved by row and column segmentation in typical register-oriented logic structures. For example, if the address generator of Fig. 11 were implemented in one large PLA array, it would require 200 inputs, 198 outputs, and 482 rows. This requires 192 000 row-column intersections, as compared to 46 000 in the SLA implementation. The PLA is more efficient for straight combinational logic, and so is the preferred logic type in such applications as microcommand decoding in microprocessor chips. However, if one chip type is chosen for implementing all data paths and control logic in a processor or controller, then the SLA will be more efficient.

REFERENCES

[1] S. S. Patil, "An asynchronous logic array," Tech. Memo TM-62, Project MAC, MIT, Cambridge, MA, May 1975.
[2] S. S. Patil, "Micro-control for parallel asynchronous computers," in Proc. 1975 Euromicro, North-Holland Publishing Co.
[3] D. L. Greer, "An associative logic matrix," IEEE J. Solid-State Circuits, vol. SC-11, pp. 679-691, Oct. 1976.
[4] H. Fleisher and L. I. Maissel, "An introduction to array logic," IBM J. Res. Develop., pp. 98-109, Mar. 1975.
[5] J. E. Logue et al., "Hardware implementation of a small system in programmable logic arrays," IBM J. Res. Develop., pp. 110-119, Mar. 1975.

CONCLUSIONS

Our limited set of design experiments indicates that the SLA is a useful general purpose design tool for sequential logic.
It provides a good compromise in speed, logic density, and design cost compared to other VLSI logic forms. The SLA may well be valuable for a considerable range of applications. An important aspect of the SLA is the straightforward translation it provides between formal logic models and working logic circuits. Use of the SLA appears to provide a natural path to well structured and easily designed logic. It appears to provide an excellent opportunity for use of a high-level design language, with automatic translation into reasonably efficient circuitry. The principal design problem which would require human interaction would be in chip layout. Limited experiments to date indicate that a single person doing an SLA circuit design can sometimes achieve better chip area efficiency than a team of logic designers and semiconductor layout people doing a custom design for the same circuit using the same semiconductor technology. This occurs because the design team communicates imperfectly, while the SLA designer has simultaneous perception of both the logical design and the chip layout.

The availability of a device like the SLA fills a troublesome void in computer implementation tools, namely a readily alterable form of logic which can be used to interpret complex data forms. For example, consider basic instruction interpretation, which requires concurrent actions on several short fields in one word (i.e., op code, register selection, index value, etc.). Microprogrammed interpretation of the compacted data fields of instructions is quite slow, but customized logic is too rigid to permit the frequent small specification changes which must be accommodated. Future efforts to transfer more high-level language features into machine language would benefit greatly from a high-performance, easily alterable VLSI device.

Suhas S. Patil received the B.Tech. (Hons.) degree from the Indian Institute of Technology, Kharagpur, India, in 1965, and the Sc.D. degree in electrical engineering from the Massachusetts Institute of Technology, Cambridge, in 1970. He was on the faculty of the Department of Electrical Engineering and Computer Science at M.I.T. from 1970 to 1975. He was a member of Project MAC at M.I.T. from 1966 to 1975 and served as Assistant Director of Project MAC from 1972 to 1974. Since 1975 he has been on the faculty of the Department of Computer Science at the University of Utah, Salt Lake City, where he is an Associate Professor. His current interests include structured design of VLSI and the architecture of parallel processing machines. In the past he has worked on Petri nets, arbiter structures, asynchronous digital systems, and coordination of asynchronous events.

Terry A. Welch (S'59-M'63) received the S.B., S.M., and Ph.D. degrees in electrical engineering from the Massachusetts Institute of Technology, Cambridge, in 1960, 1962, and 1971, respectively. From 1960 to 1967 he worked with Honeywell, Inc. in design of small commercial computers and related equipment. From 1971 to 1976 he was Assistant Professor of Electrical Engineering and of Computer Sciences at the University of Texas at Austin. Since 1976 he has been manager of the Department of Computer Architecture at the Sperry Research Center, Sudbury, MA. His recent research has been in descriptor-based computer architectures and in memory hierarchy analysis. Dr. Welch is a member of the Association for Computing Machinery and for several years was Chairman of the Central Texas Chapter of the IEEE Computer Society.