A Programmable Logic Approach for VLSI

SUHAS S. PATIL AND TERRY A. WELCH, MEMBER, IEEE

IEEE TRANSACTIONS ON COMPUTERS, VOL. C-28, NO. 9, SEPTEMBER 1979
Abstract-This paper explores the use of a proposed programmable storage/logic array (SLA) chip as a general purpose universal
logic element for digital computers. The SLA is compared to other
programmable logic arrays in implementation and utilization, showing how it permits construction of complete digital subsystems on one
chip without sacrifice in programmability. When compared with
other contending very large-scale integrated technology (VLSI)
approaches, such as microprogrammed processors and gate arrays,
the SLA offers an attractive combination of cost, performance, and
ease of implementation.
Index Terms-Asynchronous circuits, digital integrated circuit
design, digital systems design, logic arrays, programmable logic
circuits, very large-scale integrated (VLSI) technology.
INTRODUCTION
THE ADVENT of very large scale integrated (VLSI)
circuit technology illustrates the continuing trend
toward increased gate counts on logic chips. Due to the
complexity and cost of designing such chips, a structured
form of logic implementation is desirable. A programmable
logic structure would permit use of a single chip type for a
variety of functions within a computer, giving improved
production volume to overcome the adverse unit costs of
special-purpose VLSI chips. The programmable logic array
(PLA) promises a possible solution to many of these problems, but existing PLA implementations may not be effective
in achieving high logic density on a chip. A universal VLSI
chip programmed in a PLA-like form is very desirable if it
can implement entire subsystems on a single chip. This
paper explores the possibility of using a proposed programmable storage/logic array (SLA) chip for general purpose
logic in computer systems.
The SLA is a form of PLA which contains flip-flops
distributed throughout the array. We have explored the use
of the SLA as a general purpose logic component, and
conclude that the SLA is flexible and efficient for both data
manipulation and sequential control. Furthermore, its
ROM-like programming mechanism promises fast turnaround in design and implementation of a wide range of
logic structures.
This paper explains the structure of the SLA, describes its
implementation, discusses its use in some typical computer
logic, and compares it with other logic structures used for
processor implementation. Implementation evidence and
design examples are presented to support the assertion that
Fig. 1. Conventional PLA. (AND array: choice of 1, 0, or no connection per input; OR array: choice of 1 or no connection per conjunction term.)
Manuscript received February 26, 1979. The work on storage/logic arrays was supported in part by the National Science Foundation under Grants DCR 74-2182 and MCS 78-04853. S. S. Patil was employed by Sperry Univac, Sudbury, MA at the time this work was carried out.
S. S. Patil is with the Department of Computer Science, University of Utah, Salt Lake City, UT 84112.
T. A. Welch is with Sperry Research Centre, Sudbury, MA 01776.
Fig. 2. PLA with feedback. (Outputs returned to the inputs through on-chip flip-flops.)
the SLA is a viable candidate for becoming a practical
general-purpose VLSI part.
STRUCTURE OF THE SLA
The traditional PLA is a combinational circuit which produces multiple Boolean outputs from its inputs. It differs
from a ROM only in that implicants rather than the
minterms of Boolean functions are stored, so that it is not
necessary to have a word of storage for every input combination. A PLA is usually implemented in the form of two arrays
(see Fig. 1): An AND array which forms selected conjunction
terms (implicants) based on input data, and an OR array
which combines conjunction terms to form the correct
outputs. The array is programmed by selecting (via make-break connections) whether a conjunction line is gated by a
1, 0, or neither on each input, and whether or not an output
line responds to a conjunction line.
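The AND/OR programming just described can be sketched as a toy sum-of-products evaluator; the implicants, outputs, and signal names below are invented for illustration, not taken from the paper:

```python
# Each implicant maps an input to the value that gates it (1 or 0);
# inputs it omits are "no connection".
IMPLICANTS = {
    "t0": {"a": 1, "b": 1},      # a AND b
    "t1": {"b": 0, "c": 1},      # (NOT b) AND c
}
OUTPUTS = {
    "f": ["t0", "t1"],           # f = a·b + b'·c
    "g": ["t0"],                 # g = a·b
}

def pla_eval(inputs):
    # AND array: a conjunction term is true when every connected
    # input carries its programmed value.
    terms = {name: all(inputs[v] == bit for v, bit in cond.items())
             for name, cond in IMPLICANTS.items()}
    # OR array: an output is the OR of its connected terms.
    return {out: any(terms[t] for t in ts) for out, ts in OUTPUTS.items()}

assert pla_eval({"a": 1, "b": 1, "c": 0}) == {"f": True, "g": True}
assert pla_eval({"a": 0, "b": 0, "c": 1}) == {"f": True, "g": False}
```

Because implicants rather than minterms are stored, two terms cover f here, where a ROM realization would dedicate a word to each of the 2^3 input combinations.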
In most computer designs, purely combinational PLA's
are difficult to use extensively because of pin limitations. In
response to these pin limitations, some PLA's have added
flip-flops on the chip, providing internal feedback from the
outputs back to the inputs in the style of classical state machines [4], [5] (see Fig. 2). While these PLA's can be used
in a much wider range of applications, they potentially suffer
from inefficient utilization of the chip area within the AND
and OR arrays because only a small fraction of available logic
elements on the chip are actually used in typical designs.
The SLA is a variation on previous PLA's in that the AND
0018-9340/79/0900-0594$00.75 © 1979 IEEE
Fig. 3. Storage/logic array (SLA). (Row-column intersections: choice of 1, 0, or no connection; flip-flop inputs: choice of set, reset, or no connection.)

Fig. 4. Associative logic array.
array and OR array are folded together so that input lines and
output lines are alternated within a single array [1] (see Fig.
3). This has two important effects: 1) substantially more
flip-flops can be added without the need for excess input-output routing space, and 2) rows of the array (conjunction
signals) can be subdivided into multiple independent segments which can represent independent variables over
smaller portions of the array. Extending these two concepts
further, more flip-flops can be added at intervals along the
columns of the array, so that the columns can also be
subdivided into segments carrying independent variables
with localized access. As a result of these changes the SLA is
used in quite a different manner than is the PLA. Portions of
the array can be used for independent tasks, for example
using the upper right-hand corner to build an
adder while using the left-hand columns for sequencing
control and the lower rows for a register structure. This
approach permits a much denser packing of logic into an
array and permits execution of more complex functions on a single chip.

The concept of row segmentation was previously developed by Greer [3]. His associative logic array contains internal feedback (see Fig. 4) but lacks explicit flip-flops (local feedback) and column segmentation. Thus, the concepts developed by Greer to achieve higher effective logical density are extended in the SLA to further improve array utilization efficiency.

Viewed as a logical element, the SLA has these characteristics:

1) Each logical column can either act as a stored binary variable, or perform the logical OR function of the row inputs. When the column segment represents a stored variable, the values can be set and reset by the rows.

2) Each row is an implicant or conjunction term over the selected column variables; a column input may be the column value, its complement, or there may be no connection from the column to that row. These variables are ANDed to form the row value.

3) Columns may be segmented into independent column segments at points provided every Y rows (typical Y = 8). Storage cells (flip-flops) are provided every nY rows (typical nY = 16). Every column segment must contain at least one storage cell.

4) Rows may be segmented at breakpoints provided every X columns (typical X = 4).

Various columns and/or rows are connected to the input and output pins of the chip with suitable interface drivers; this arrangement can vary from design to design and so will not be described in detail here.

AN SLA CIRCUIT

An SLA can be implemented either as a clocked structure, in which the successive actions of the SLA are timed by a master clock, or as an asynchronous structure in which the successive actions are determined by the temporal constraints within the SLA. Both forms of SLA are of practical importance, depending on the application and the technology of implementation. The asynchronous form of SLA is described in this paper because it has more interesting timing considerations.

A circuit for the SLA is shown in Fig. 5. It is illustrated here in simple transistor form using positive logic notation for ease of understanding, but the concepts are applicable in standard semiconductor technologies with little change. Row-column connections are made by transistors whose collectors are selectively connected in a wired-NOR structure. Storage cells consist of cross-coupled NAND gates with complemented inputs S̄ and R̄. A logical column consists of the four wires S, R, Q, and Q̄. A column can be divided into segments by breaking all four wires at the breakpoints provided at regular intervals along the column. Segments which contain two or more flip-flops operate the flip-flops in parallel for electrical load-sharing purposes, so that switching times are not strongly affected by the length of column segments.

The circuit in the storage cell contains breakpoints which
Fig. 5. Basic cells of an asynchronous SLA. (Storage cell of cross-coupled NAND gates on the S, R, Q, Q̄ column wires; row wires; column and row break points; X = programmable break point.)
permit its use for several functions other than the flip-flop
described. The feedback loop can be broken, so that the
outputs are simple combinational functions of the inputs.
Also, the input signals can be selectively inverted; the
normal flip-flop configuration bypasses the input inverters
and utilizes the feedback paths. The normal "split-column"
configuration breaks both feedback paths and uses both
input inverters, so that the Q line becomes the positive OR
function of the inputs to the S line, and Q line becomes the
positive OR function of the inputs to the R line. Other
configurations are possible and useful but are not discussed
in this paper.
Rows are single wires which may be divided into smaller
segments at breakpoints provided at regular intervals along
their length. At each breakpoint a pull-up resistor is
provided to feed current into the row wire; long row
segments will have several such circuits active in parallel.
Since the rows are activated by a NOR function of various
Q and Q̄ signals, they are used with the complements of the
selected input variables to produce the positive AND function. Likewise, the S̄ and R̄ signals are NOR functions of
selected row signals, so the Set and Reset actions are seen as
OR functions of row variables. Thus, the SLA implements the
same AND-OR functions implemented in conventional
PLA's, so the SLA is able to perform all PLA-type functions
in conjunction with its storage abilities.
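A minimal sketch of this double-rail arithmetic, with invented signal names, shows how wired-NOR plus complemented inputs yields the positive AND and OR described above; it is an illustration of the logic identities, not of the paper's circuit:

```python
# De Morgan view of the SLA planes.  A row wire NORs the complement
# (Q-bar) lines of its selected columns: NOR(~Qa, ~Qb) = Qa AND Qb.
def nor(*xs):
    return 0 if any(xs) else 1

def row_value(qa, qb):
    return nor(1 - qa, 1 - qb)      # positive AND of two column values

assert row_value(1, 1) == 1
assert row_value(1, 0) == 0

# An active-low set line is the NOR of its selected row wires, so the
# Set action fires (S-bar goes low) when ANY of those rows is ON:
def s_bar(r1, r2):
    return nor(r1, r2)

assert s_bar(0, 1) == 0             # some row ON: set asserted
assert s_bar(0, 0) == 1             # no row ON: set idle
```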
The asynchronous form of SLA provides apparently
conventional implementation of asynchronous logic, but the
fact that all elements are on the same chip permits use of
timing assumptions which are not normally reasonable with
asynchronous logic built from discrete elements.
Specifically, the uniformity of active device implementation
and the uniformity in column loading permit the assumption that the variance in delay times between the fastest and
slowest column is small relative to the average delay time of
the columns. Under this assumption, a row in the SLA is
able to alter the state of several column flip-flops simultaneously without the problems of race conditions. This
Fig. 6. SLA timing example. (Waveforms for rows r1, r2 and columns Y1, Y2 with their Q and Q̄ lines; Q̄1 and Q̄2 must be low simultaneously to activate r2.)
property of the SLA is illustrated with the help of the timing
diagram of Fig. 6, in which row r1 is turned ON when input x
is a 1 and columns Y1 and Y2 are in states 0 and 1,
respectively. In the ON state, row r1 drives the set input of Y1
and the reset input of Y2 so that the state variables Y1, Y2
switch from 0, 1 to 1, 0. Row r2 implements a conjunction
term that responds to the state Y1, Y2 = 1, 1 (when
Q̄1 = Q̄2 = 0). In ordinary asynchronous circuits a simultaneous change in Y1 and Y2 could have created a race
condition, but here a race is avoided because the storage cell
passes through the intermediate state Q, Q̄ = 1, 1: it takes one gate delay for Q1,
Q̄1 to become 1, 1 and one more gate delay for Q1, Q̄1 to
reach the final value 1, 0. If the variation in this signal
propagation delay between Y1 and Y2 is less than one gate
delay, Q̄1 and Q̄2 will never both be in the state 0 at the same
time, and row r2 will never get a chance to become active.
In terms of technology, the SLA as described uses
modified ROM circuitry. An SLA of 256 rows and 48
columns with column breakpoints every 8 rows and row
breakpoints every 4 columns can be implemented with an
LSI circuit of the complexity of a 65K ROM. In terms of
speed, the basic cycle time (AND-OR-Set/Reset) of an SLA is
expected to be of the same order as the basic cycle time of a
PLA using internal flip-flops for feedback.
THE SLA NOTATION
The notational conventions used with the SLA are illustrated in Fig. 7(a). The logic is coded in a matrix, with
each column representing the inputs and outputs of an
internal variable (flip-flop). An entry of 1 or 0 in a row-column intersection indicates that the row is activated only
Fig. 7. SLA examples. (a) Notation illustration: row segmentation, column segmentation, split columns. (b) 4-to-1 multiplexer. (c) Negative edge triggered D flip-flop.

Fig. 8. A state machine. (a) State diagram, with state assignment A = 00, B = 01, C = 10 and output z = 1 in state C. (b) SLA program.
if that column takes on the specified value. An entry of S or R
indicates a set or reset action imparted to the flip-flop when
the row is activated. The combinations IR and OS serve to
toggle the flip-flop under the specified conditions. The
combinations OR and IS are legal but not of any particular
value. A row is activated when all of its 1 and 0 inputs are
satisfied, at which time all of its specified S and R actions are
performed. Row and column segments are demarcated by
brackets or parentheses, so that a segment of a row (or
column) between a pair of brackets is independent of all
other portions of that row (column).
Columns which have been split into two independent OR
functions are demarcated by finely dashed lines dividing the
columns. The OR function uses a + to indicate column
activation, rather than S or R, and uses a 1 where the
function output is utilized.
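Read operationally, these conventions amount to a small interpreter. The sketch below models latched columns only, with invented column names and a simplified evaluate-all-rows-then-apply cycle standing in for the AND-OR-Set/Reset action:

```python
# Toy interpreter for SLA row entries on latched columns.
# '1'/'0' are activation conditions; 'S'/'R' are unconditional
# actions on an active row; '1R'/'0S' combine a condition with an
# action and together implement the toggle idiom from the text.
def step(state, rows):
    """Run one AND-OR-Set/Reset cycle over latched columns."""
    sets, resets = set(), set()
    for row in rows:
        active = all(
            state[col] == (1 if e in ("1", "1R") else 0)
            for col, e in row.items() if e in ("1", "0", "1R", "0S"))
        if active:
            for col, e in row.items():
                if e in ("S", "0S"):
                    sets.add(col)
                elif e in ("R", "1R"):
                    resets.add(col)
    for col in sets:
        state[col] = 1
    for col in resets:
        state[col] = 0
    return state

# Two rows that toggle column y on every cycle in which x = 1.
rows = [{"x": "1", "y": "1R"}, {"x": "1", "y": "0S"}]
state = {"x": 1, "y": 0}
step(state, rows)
assert state["y"] == 1
step(state, rows)
assert state["y"] == 0
```

The two rows realize the 1R/0S toggle: whichever row's condition on y is satisfied fires, and its action complements y.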
These conventions are illustrated with examples in Fig.
7(b) and (c). Fig. 7(b) shows a four-to-one multiplexer and
Fig. 7(c) shows a negative edge triggered D flip-flop implemented on an SLA. In the multiplexer example, columns 1
and 2 are used to decode the selector bits, columns 3 and 4
carry the four input signals, and column 5 carries the
output signal. In Fig. 7(c), the latched variable a is introduced to detect the negative edge in the clock signal C.
When C is high a is set; then when C goes low a is reset as
part of the next action and it then blocks any succeeding
transitions in that clock cycle.
SLA LOGIC EXAMPLES

Several examples are given here to illustrate the generality and characteristics of SLA programming. Two demonstrations of generality are presented. First, an algorithm for construction of an arbitrary finite-state machine is shown. Second, more practically, the components of a standard processor structure are developed, namely mechanisms for data transfer and arithmetic.
Finite State Machine: A simple Moore type synchronous
sequential circuit, Fig. 8(a), is implemented by the SLA
program of Fig. 8(b). This illustrates the basic structure in
which state variables, input variables, and output variables
are assigned one column each. Each state-input combination, which causes a state change or output, is assigned one
row. That is, each row recognizes its appointed combination
of present state and input values, and when activated it
adjusts the state variables to the correct next state and
signals the proper outputs. For example, the row marked
A → B is activated when S1 = S0 = 0 (which is the coding of
state A), x = 0 as the input condition, and when Clock =
a = 1. This row causes a transition to state B by setting S0 to
1, and produces 1 on the Z output.
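A behavioral sketch of this row-per-transition style follows. Only the A → B row on x = 0 and the state coding come from the paper; the remaining transitions are invented purely to make the example runnable:

```python
# State coding from Fig. 8: A = 00, B = 01, C = 10 as (S1, S0),
# Moore output z = 1 in state C.
STATES = {"A": (0, 0), "B": (0, 1), "C": (1, 0)}

ROWS = [
    # (present S1, S0, input x) -> (next S1, S0)
    ((0, 0, 0), (0, 1)),   # A -> B on x = 0 (described in the text)
    ((0, 1, 0), (1, 0)),   # B -> C on x = 0 (invented)
    ((1, 0, 0), (0, 0)),   # C -> A on x = 0 (invented)
]

def moore_step(s1, s0, x):
    """Activate the one row matching present state and input, adjust
    the state-variable columns, and emit the Moore output z."""
    for cond, nxt in ROWS:
        if cond == (s1, s0, x):
            s1, s0 = nxt
            break              # no matching row: the state holds
    z = 1 if (s1, s0) == STATES["C"] else 0
    return s1, s0, z

s1, s0 = STATES["A"]
s1, s0, _ = moore_step(s1, s0, 0)      # A -> B
assert (s1, s0) == STATES["B"]
s1, s0, z = moore_step(s1, s0, 0)      # B -> C
assert (s1, s0) == STATES["C"] and z == 1
```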
This form of state machine design is justified on the SLA,
despite the use of asynchronous latches, because of the
timing assumptions discussed previously. The assumed uniformity in gate delays and the use of two wires Q and Q
(double rail logic) to represent column variables in the SLA
eliminates the difficult timing problems which complicate
traditional asynchronous logic design. Critical races (or
skew) in multiple state variable changes resulting from a
single row activation are not a problem, as shown in the
timing sequences of Fig. 6.
The SLA, therefore, is programmed in almost the same
style as a synchronous sequential machine, with arbitrary
state transitions allowed and with no extra clock signals
necessary to resolve essential hazards. This provides a
convenient programming style despite the use of inexpensive
latch type flip-flops. The difference here is that the designer
Fig. 9. A two's complement adder. (Bit slices A0-A3 with carry-generate flip-flops G0-G3, carry propagation circuits, and two externally supplied clock phases; performs A + Z → A.)

Fig. 10. Register transfer operations. (Upper half: pulsed transfers A → Z and Z → A with a clear-A row; lower half: latched transfer B → A; control signals, state variables, and commands on the left-hand columns.)
must program in the clock signals explicitly, rather than
having them hidden in the flip-flop. A standard way to do
this, achieving the effect of edge-triggered flip-flops, is to
introduce the extra state variable a, as first shown in Fig. 7(c).
Processor Elements: To demonstrate general processing
capabilities, the four basic components of a processing unit
are examined, namely: 1) arithmetic and other data manipulation functions to process and sense conditions in data
words; 2) transfer paths, for movement of data words
between registers and through arithmetic operations; 3)
microcommand generation, to create the data path control
signals as a function of state variables; and 4) state sequence
generation, with branching based on sensed conditions in
the data paths. Three examples of SLA construction are
shown here to illustrate the ease with which these processor
elements are realized.
An example of an arithmetic unit is shown in Fig. 9. This is
a 4-bit accumulator with a two's complement adder, which
adds the contents of the bus Z to the accumulator A,
A + Z → A. The addition is done in two phases, controlled
by externally supplied clock signals provided on the bottom
two rows. The first phase, using the top two rows, creates a
partial sum in A0-A3 and records all carries generated in
G0-G3. The second phase uses the middle nine rows to
propagate the carries to create the final sum.
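The two-phase scheme can be sketched in Python as bitwise arithmetic. This is a behavioral model of the phases, not the actual SLA rows, and phase 2 here simply repeats until every carry is absorbed:

```python
def sla_add(A, Z, n=4):
    """Two-phase two's complement add of bus Z into accumulator A,
    in the spirit of the Fig. 9 program (behavioral sketch only)."""
    mask = (1 << n) - 1
    # Phase 1: bitwise partial sum in A, generated carries in G
    # (one carry flip-flop per bit slice).
    G = A & Z                  # carry generated where both bits are 1
    A = (A ^ Z) & mask         # partial sum without carries
    # Phase 2: propagate carries; each pass stands in for one
    # activation of the carry-propagation rows.
    while G:
        carry_in = (G << 1) & mask
        G = A & carry_in
        A = (A ^ carry_in) & mask
    return A

assert sla_add(0b0101, 0b0011) == 0b1000   # 5 + 3 = 8
assert sla_add(0b1111, 0b0001) == 0b0000   # 15 + 1 wraps mod 16
```

Fixing the number of phase-2 passes instead of iterating to completion is exactly the time/space tradeoff the text goes on to tabulate.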
For larger adders, this simple carry propagation
technique would require an excessive number of rows. Two
modifications in the basic strategy can be used: 1) do ripple
carry propagation and 2) perform the partial sum and carry
generation in parallel over groups of bits and then allow
carries to ripple from one group to the next. Both schemes
reduce the size of the carry propagation circuitry, by having
fewer carry generation points, at the cost of time. The
following table summarizes some representative two's complement adder costs, including control signals:

    Adder size    2 clocks    3 clocks    4 clocks    16 clocks
    4 bits        11 rows      9 rows      8 rows      8 rows
    8 bits        19 rows     14 rows     12 rows      8 rows
    16 bits       40 rows     23 rows     17 rows      8 rows

This range of adder sizes demonstrates that a large degree of time/space tradeoff is available in SLA arithmetic designs.

Other data manipulation functions normally used in processor structures are easily implemented in SLA logic, using two-level AND/OR logic or arithmetic structures. The bitwise EXCLUSIVE-OR, for example, requires one clock cycle and three rows of logic. The one logic function which is unusually costly to realize, as it is in most logic families, is the parity function.

A second example, in Fig. 10, illustrates basic register transfer mechanisms and control structures, using various timing mechanisms. The upper half of this SLA program demonstrates transfers between a register A and a bus Z, using a pulsed signalling convention. In this example, the bus and microcommand distribution use unlatched columns, so that data placed on the bus are valid only for the time that the microcommand A → Z is active. The register is loaded by first clearing it to zeroes and then selectively setting the "one" values, as is often done in connections to pulsed buses. The lower half of the program demonstrates the ability to multiplex into register A the contents of some other register B. In this example, command distribution is done by latched columns, which permits shorter cycle times but requires an extra column per bit slice.

This example illustrates the effectiveness of row and column segmentation. Note that the transfers between bus and register are very economical because single rows can be segmented into multiple small gates. Likewise, the internal columns are segmented for use with different control signals in different sections of the logic. Note also that this bus transfer structure can be appended directly to the adder of the previous example to give an accumulator.

The left-hand columns of this example illustrate a typical form of microcommand signal generation. The state variables of the processor are distributed along full columns, and each microcommand signal is generated by one row which responds to the appropriate state and input conditions. In this system one row is needed for each irreducible combination of state variables which actuate the same command signal, so it is often convenient to code the state space sparsely. That is, if some command signal is generated by many states, it is desirable to code those states into a
Fig. 11. SLA implementation of an address generator. (Functional units, top to bottom: interface logic, 8 rows; high-order index storage, 6 index values, 48 rows; upper bound comparison, 40 rows; lower bound comparison, 40 rows; 16-bit adder, 48 rows; low-order index storage, 6 index values, 48 rows; interface logic, 8 rows. Four columns per bit slice b0-b15; 16 columns of index, control, and timing lines at the left; high-order and low-order buses.)
single equivalence class on one subset of the state variables,
even though this may require using more state variables.
The mechanism used for sequencing through processor
states is essentially the same as that shown for finite-state
machine control in Fig. 8. It is interesting to note that this
sequencing logic can be dispersed throughout the data
manipulation areas of the chip.
The third example is more complex, and is intended to
show typical chip layout rather than specific programming.
This unit is an address generator as found in a typical (not
specific) computer. It receives a 16-bit displacement, adds a
specified one of six possible index registers, checks the
high-order results against upper and lower bounds of eight
high-order bits each, and produces the 16-bit result if the
bounds checks succeed. A 16-bit bus is used for input and
output, as well as to load index and bound values. Additional lines specify the index register number and register
store commands.
The chip layout, Fig. 11, illustrates a standard technique
used for a variety of problems. The array is divided by
columns into bit slices, with each data bit given four adjacent
columns. The array is divided down its length (rowwise) into
functional units, with separate units for bounds check, index
register storage, and final address addition. To conserve
columns, the 16-bit data word is split so that the high-order
eight bits are handled in the top part of the array and the
low-order 8 bits in the bottom. Each logic unit is then 8 data
bits, or 32 SLA columns, wide. Each bit slice contains one
column which is a bus, labeled Zi, to connect through the
various related logic units. The other three columns are
segmented at unit boundaries, and hold variables which are
local to the logic units. On the left side of the array 16
columns are used for common control information, such as
the five-phase clock, three index specification bits, operation
specification bits, and various condition codes.
The logic of this example is complicated (not atypically)
by the desire to execute the address generation function in
five basic clock cycles (AND-OR-Set/Reset cycles). The
timing on the internal data bus is this: In cycle 1 the bus
carries the displacement value furnished from outside the
chip. In cycles 2 to 3, it carries the index value, and in cycles 4
to 5 it carries the resultant index-plus-displacement sum. The
16-bit adder stores the displacement from the bus in cycle 1,
and then adds in the index value in cycles 3 and 4. The
bounds comparison units have the most critical timing since
the index value is not available to them until the third cycle.
The resulting strategy is to subtract the displacement from
the bounds in cycles 1 and 2, and then compare these results
to the index in cycles 3 and 4. Since this comparison is
incomplete without knowledge of a possible carry out of the
low-order eight-bit index-plus-displacement sum, the final
comparison is completed in cycle 5 when the carry signal is
available.
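The equivalence behind this subtract-then-compare schedule can be checked with plain integer arithmetic. This is a behavioral sketch with invented function names; it ignores the 8-bit wraparound of the real hardware and skips 16-bit overflow cases:

```python
import random

def bounds_check_direct(index, disp, lower, upper):
    # Reference: form the 16-bit sum, then compare its high-order
    # 8 bits against the bounds.
    hi = ((index + disp) & 0xFFFF) >> 8
    return lower <= hi <= upper

def bounds_check_pipelined(index, disp, lower, upper):
    # Schedule from the text: subtract the displacement from each
    # bound first (cycles 1-2), compare against the index (cycles
    # 3-4), and fold in the low-order carry last (cycle 5).
    carry_low = ((index & 0xFF) + (disp & 0xFF)) >> 8      # 0 or 1
    idx_hi, disp_hi = index >> 8, disp >> 8
    return (lower - disp_hi - carry_low <= idx_hi
            <= upper - disp_hi - carry_low)

# The two checks agree because the high-order byte of the sum is
# exactly idx_hi + disp_hi + carry_low when no 16-bit overflow occurs.
random.seed(1)
for _ in range(2000):
    index, disp = random.randrange(1 << 16), random.randrange(1 << 16)
    lower = random.randrange(1 << 8)
    upper = random.randrange(lower, 1 << 8)
    if index + disp < (1 << 16):       # skip 16-bit overflow cases
        assert bounds_check_direct(index, disp, lower, upper) == \
               bounds_check_pipelined(index, disp, lower, upper)
```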
The individual units are generally similar to the structures
shown in Figs. 9 and 10. The bound comparison units have
one 8-bit subtractor, which is isomorphic to an 8-bit adder
but with altered carry logic. The final comparison unit is
isomorphic to an 8-bit adder whose carry circuitry is
simplified to provide the carry signal for the high order bit
position only. One extra row is used to recognize the equal
or near-equal case when a possible carry-in would change
the result, and this signal is sent separately to the final
decision logic.
The index storage areas consist of two registers stored in
each 16 rows using the structure of Fig. 10. Index selection
on load and store is done according to index selection bits,
contained in the left-hand control columns, which augment
the state variables in activating microcommands. The 16-bit
main adder sits at the junction between the buses from the
high-order and low-order halves of the word, so that the full
16-bit carry circuit is all in one place.
This example illustrates the flexibility with which the
areas of the array can be partitioned into various different
functional units. It also illustrates the ability for several
internal units to operate concurrently to improve performance. It should be pointed out that the design presented is
only one point in a possible cost-performance tradeoff curve.
More compact designs could have been selected, at the
expense of extra cycles in execution time, or vice-versa. The
SLA size required for this problem (240 rows by 48 columns)
has approximately the same number of circuit elements
(transistors) as a 65K ROM.
COMPARISON WITH OTHER LOGIC FORMS
The following comparison of the SLA with alternative
VLSI logic forms focuses on the fundamental properties of
each logic form since implementation details will change
with time. Many variations on each form must be anticipated in the future, so the discussion here will concentrate
on simple configurations of each logic type, to emphasize
basic differences rather than idiosyncrasies of particular
implementations. These comparisons assume the use of
reasonably equivalent technology throughout, namely
equivalent component speeds and density. Comparisons
based on specific implementations are not attempted here
because such comparisons are rendered obsolete by rapidly
changing semiconductor technology.
Microprogrammed Structures: An important form of
future processor design will involve use of standard processor chips with control programs in a separate control
memory (e.g., ROM). To achieve reasonable production
volumes, to be useful across a range of applications, and to
utilize a limited number of connection pins, the processor
chip must provide a generalized set of operations which are
simple in nature. That is, the operations will seldom provide
specialized actions based on particular data patterns. Also,
the control memory will generally be a separate chip to
permit interchange of memory type (RAM, ROM,
EPROM). Thus a microprogrammed structure can be expected to give lower performance than the SLA for three
reasons: 1) the data paths in a standard processor will often
fail to exactly match the needs of a particular application,
requiring extra program steps to extract and combine off-sized data fields, while the SLA can easily be adapted to
varying sizes; 2) the storage of control information in a
memory chip external to the processor requires extra delay,
particularly in test/branch conditions, due to inter-chip
delay times, which are much longer than intra-chip delays;
and 3) standard processors can generally execute only one
action at a time, while the SLA carries out concurrent
actions with ease.
The microprogrammed structure will store a higher
density of control information in its control memory than
the SLA can store, so the SLA will be generally more
expensive. Also, present technology provides flexibility in
erasable memories so the microprogrammed approach
permits easier development and debugging. This technology
advantage can be expected to continue for the foreseeable
future.
The SLA is expected to provide functional speeds ten
times faster than equivalent microprocessors, but at somewhat higher design and production costs. Thus the two logic
types will fall into rather distinct application areas.
Gate Arrays: A gate array is a standard set of logic
elements on a chip, for which the designer creates a customized interconnection pattern implemented in two layers
of metallization. The use of the same set of logic elements for
a wide variety of applications tends to force the use of simple
general-purpose elements (e.g., two-input NAND gates) with
little variety in element types on the chip. In terms of logic
speed, the gate array can be expected to give higher performance than the SLA because individual gates will be two to
three times faster than SLA elements due to fan-in, fan-out
differences. This raw speed will be partially offset by the need
to cascade several gate array logic levels when substantial
fan-in is needed. Component costs are affected by effective
logic density, which is difficult to predict. The gate array
achieves high utilization of its active components, but
substantial areas of the chip (60 percent to 80 percent for
large arrays) must be left open for interconnection lines. The
SLA utilizes its active elements very inefficiently (using 10
percent to 20 percent) but it achieves very high densities of
elements on the chip. In comparing design costs, the SLA
has a clear advantage because 1) the gate array requires
substantial design effort to perform interconnection routing
and 2) the gate array requires masks for two metallization
layers as opposed to one contact mask for the SLA. The SLA
offers a good possibility of being implemented in field
programmable versions, while the gate array interconnects
are best done by the chip manufacturer.
In general, then, the gate array will probably provide
higher performance and somewhat higher logic density, but
at the cost of increased design effort and longer implementation turn-around time. The gate array does provide better
flexibility in making connections between interior portions
of the array and the I/O pins, which is an advantage if one
large array is being used for a number of independent
functions. However, this is not likely to be a common VLSI
design strategy because of pin limitations.
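The speed and density trade-offs above can be put into rough numbers. The sketch below assumes the figures quoted in the text (two-input NAND elements, 60 to 80 percent of a large array reserved for routing); the specific element count is hypothetical, chosen only for illustration:

```python
import math

def nand_tree_levels(fan_in: int) -> int:
    """Cascaded logic levels needed to realize an n-input AND
    from two-input gates: roughly ceil(log2(n))."""
    return math.ceil(math.log2(fan_in))

def effective_gates(total_elements: int, routing_fraction: float) -> int:
    """Usable logic elements on a gate array after leaving a
    fraction of the chip open for interconnection lines."""
    return int(total_elements * (1.0 - routing_fraction))

# A 16-input decode term forces several cascaded gate levels,
# partially offsetting the 2-3x raw speed advantage of a gate.
print(nand_tree_levels(16))          # 4 levels

# With 70 percent of a hypothetical 10 000-element array left
# open for routing, only 3000 elements do useful logic.
print(effective_gates(10_000, 0.7))
```

The same arithmetic applied to the SLA would start from a much higher element count but a 10 to 20 percent active-element utilization, which is why the net density comparison is called difficult to predict.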
Customized Logic: A VLSI chip specifically designed for a
particular application will produce very high performance
implementations. This approach, however, involves extended implementation delays since masks for all the semiconductor processing steps are custom made, so chip
reimplementation when working out design errors may take
months. Further, the relatively low production volumes of
customized parts in most computer applications imply
higher chip costs. It is difficult at this time to foresee if the
higher speed and density of customized parts can offset
high design costs at the production volumes found in most
digital applications.
PLA's: In comparing the SLA to conventional PLA's, the
fundamental issue is efficiency of chip area utilization. The
PLA has separate AND and OR arrays, so there is no
opportunity to split rows and columns as in the SLA. The
SLA's capability of being partitioned into concurrently
operating functional units permits much higher effective
logic density, especially in very large arrays and in implementing register structures. VLSI chips could be constructed using several small interconnected PLA's, but such
an arrangement would still lack the flexibility in array size
and function interconnection that the SLA gives.
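The fixed two-level AND/OR structure at issue here can be sketched as a small simulation. The personality matrices below are hypothetical, chosen only to show the rigid AND-plane/OR-plane split that the SLA's row and column segmentation avoids:

```python
# Each product term is a dict of input index -> required value
# (an input absent from the dict is a "don't care"). The AND
# plane ANDs the listed literals; the OR plane ORs selected
# product terms into each output.

def eval_pla(inputs, and_plane, or_plane):
    products = [all(inputs[i] == v for i, v in term.items())
                for term in and_plane]
    return [any(products[p] for p in out_terms)
            for out_terms in or_plane]

# Hypothetical 3-input, 2-output personality:
#   f0 = x0*x1 + ~x2      f1 = x0*x1
and_plane = [{0: 1, 1: 1}, {2: 0}]
or_plane = [[0, 1], [0]]

print(eval_pla([1, 1, 1], and_plane, or_plane))  # [True, True]
print(eval_pla([0, 0, 1], and_plane, or_plane))  # [False, False]
```

Every row in this model crosses every input and output column whether or not it uses them, which is the source of the area inefficiency discussed next.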
Partially offsetting this effect is the somewhat lower
density of connection points on the SLA. The distributed
flip-flops in the SLA absorb 10 percent to 20 percent of the
chip area, while equivalent separate storage on a PLA might
take less space than this. The SLA space used for programming row and column splits may take 10 percent to 20
percent of the chip area. The resulting 20 percent to 45
percent lower element density of the SLA, however, is
expected to be small compared to the three to five times
greater functional density achieved by row and column
segmentation in typical register-oriented logic structures.
For example, if the
address generator of Fig. 11 were implemented in one large
PLA array, it would require 200 inputs, 198 outputs, and 482
rows. This requires 192 000 row-column intersections, as
compared to 46 000 in the SLA implementation.
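The area figure in this example follows directly from the array dimensions; a minimal check using the numbers quoted in the text:

```python
def pla_intersections(inputs: int, outputs: int, rows: int) -> int:
    """Row-column intersections in one flat PLA: every row
    crosses every input column and every output column."""
    return (inputs + outputs) * rows

# The Fig. 11 address generator realized as a single large PLA:
n = pla_intersections(200, 198, 482)
print(n)           # 191836, i.e., about 192 000
print(n / 46_000)  # roughly four times the SLA implementation
```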
The PLA is more efficient for straight combinational
logic, and so is the preferred logic type in such applications
as microcommand decoding in microprocessor chips.
However, if one chip type is chosen for implementing all
data paths and control logic in a processor or controller,
then the SLA will be more efficient.

CONCLUSIONS

Our limited set of design experiments indicates that the
SLA is a useful general purpose design tool for sequential
logic. It provides a good compromise in speed, logic density,
and design cost compared to other VLSI logic forms. The
SLA may well be valuable for a considerable range of
applications.

An important aspect of the SLA is the straightforward
translation it provides between formal logic models and
working logic circuits. Use of the SLA appears to provide a
natural path to well structured and easily designed logic. It
appears to provide an excellent opportunity for use of a
high-level design language, with automatic translation into
reasonably efficient circuitry. The principal design problem
which would require human interaction would be in chip
layout. Limited experiments to date indicate that a single
person doing an SLA circuit design can sometimes achieve
better chip area efficiency than a team of logic designers and
semiconductor layout people doing a custom design for the
same circuit using the same semiconductor technology. This
occurs because the design team communicates imperfectly,
while the SLA designer has simultaneous perception of both
the logical design and the chip layout.

The availability of a device like the SLA fills a troublesome
void in computer implementation tools, namely a readily
alterable form of logic which can be used to interpret
complex data forms. For example, consider basic instruction
interpretation, which requires concurrent actions on several
short fields in one word (i.e., op code, register selection,
index value, etc.). Microprogrammed interpretation of the
compacted data fields of an instruction is quite slow, but
customized logic is too rigid to permit the frequent small
specification changes which must be accommodated. Future
efforts to transfer more high-level language features into
machine language would benefit greatly from a high
performance, easily alterable VLSI device.

REFERENCES

[1] S. S. Patil, "An asynchronous logic array," Tech. Memo TM-62,
    Project MAC, MIT, Cambridge, MA, May 1975.
[2] S. S. Patil, "Micro-control for parallel asynchronous computers," in
    Proc. 1975 Euromicro, North-Holland Publishing Co., 1975.
[3] D. L. Greer, "An associative logic matrix," IEEE J. Solid-State
    Circuits, vol. SC-11, pp. 679-691, Oct. 1976.
[4] H. Fleisher and L. I. Maissel, "An introduction to array logic," IBM J.
    Res. Develop., pp. 98-109, Mar. 1975.
[5] J. E. Logue et al., "Hardware implementation of a small system in
    programmable logic arrays," IBM J. Res. Develop., pp. 110-119, Mar.
    1975.

Suhas S. Patil received the B.Tech. (Hons.) degree from the Indian Institute
of Technology, Kharagpur, India, in 1965, and the Sc.D. degree in electrical
engineering from the Massachusetts Institute of Technology, Cambridge,
in 1970.
He was on the faculty of the Department of Electrical Engineering and
Computer Science at M.I.T. from 1970 to 1975. He was a member of Project
MAC at M.I.T. from 1966 to 1975 and served as Assistant Director of
Project MAC from 1972 to 1974. Since 1975 he has been on the faculty of
the Department of Computer Science at the University of Utah, Salt Lake
City, where he is an Associate Professor. His current interests include
structured design of VLSI and the architecture of parallel processing
machines. In the past he has worked on Petri nets, arbiter structures,
asynchronous digital systems, and coordination of asynchronous events.

Terry A. Welch (S'59-M'63) received the S.B., S.M., and Ph.D. degrees in
electrical engineering from the Massachusetts Institute of Technology,
Cambridge, in 1960, 1962, and 1971, respectively.
From 1960 to 1967 he worked with Honeywell, Inc. in the design of small
commercial computers and related equipment. From 1971 to 1976 he was
Assistant Professor of Electrical Engineering and of Computer Sciences at
the University of Texas at Austin. Since 1976 he has been manager of the
Department of Computer Architecture at the Sperry Research Center,
Sudbury, MA. His recent research has been in descriptor-based computer
architectures and in memory hierarchy analysis.
Dr. Welch is a member of the Association for Computing Machinery and
for several years was Chairman of the Central Texas Chapter of the IEEE
Computer Society.