IBM System 360 Engineering

206
PROCEEDINGS-FALL JOINT COMPUTER CONFERENCE, 1964
INPUT -OUTPUT CHANNELS
MAIN STORAGE
SELECTOR CHANNEL
MULTIPLEX CHAN.
MODEL MAX. NUMBER
MAXIMUM MAXIMUM DATA RATE
SUBCHANNELS
PER CHAN. (KC)
NUMBER
30
40
50
60
62
70
250
400
800
1300
1300
2
2
3
6
6
6
96
128
256
_ }NOT
_ AVAILABLE
1300
MODEL
CAPACITY
8-BITBYTES
lK "1024
WIDTH BITS
EXCLUDING
PARITY
30
40
50
60
62
70
8-64K
16 - 2S6K
32 - 256K
128 -5 r2K
256 - 512K
256 - 512K
8
16
32
64
64
64
CYCLE
)jS
2.0
2.S
2.0
2.0
1.0
1.0
t
ADDRESSES
INSTRUCTIONS
DATA FLOW
CONTROL
TYPE
30
40
50
60
62
70
READ ONLY STORE
READ ONLY STORE
READ ONLY STORE
READ ONLY STORE
READ ONLY STORE
SLT CIRCUITS
)II
1.0
0.625
0.5
0.25
0.25
INDEXED
ADDRESSES
30
40
50
60
62
70
8
16 (8 ADDER)
32
64
64
64
30
30
30
10
10
5
-
RELATIVE INTERNAL PERFORMANCE
30
40
50
60
62
70
CIRCUIT DELAY
PER LEVEL, ns
CYCLE
MODEL
MODEL
MODEL
WIDTH BITS
EXCLUDING PARITY
FIXED
Fi gure 1. IBM System/360 Architecture.
comes more pronounced when more functions are to be performed.
Therefore,
ROS units showed a method of maintaining full compatibility by allowing complex function even in the smallest
systems.
2. Flexibility., such as the implementation
of compatibility features with other IBM
systems. A unique flexibility is achieved
by being able to add to the control section of the computer, when implemented
by a ROS, without significantly affecting
the remainder of the system. One result is to allow certain System/360
models to operate as another computer,
such as the 1401, by adding to but without redesigning the system. This ROS
concept can be combined with software
VARIABLE
FIELD LENGTH
GENERAL REGISTERS
(16 It 32)
INTERNAL PERFORMANCE
1
3
10
20
30
50
I
I
I
FLOATING
POINT
FLOATING POINT
REGISTERS (4 x 64)
LOCAL STORE
MODEL
TYPE
30
40
SO
60
62
70
MAIN STORE
CORE ARRAY
CORE ARRAY
SLT REGISTERS
SLT REGISTERS
SLT REGISTERS
WIDTH BITS
EXCLUDING
PARITY
8
16
32
64
64
64
CYCLE
)jl
2.0
1.25
0.5
-
to offer almost any performance from
pure simulation (no ROS) to maximum
performance (no software). When this
can be done with performance equivalent to the original system, it greatly
simplifies the programming conversion
problem by offering essentially two
computers in one. Note that we have
various technologies for the read-only
memory control, including card capacitor, balanced capacitor, and the transformer approach. Certain choices were
made for initial implementation, but it
should be made clear that the choices,
especially in the slow speed unit are not
critical and that more than one technique could be used to give the desired
performance.
From the collection of the Computer History Museum (www.computerhistory.org)
IBM SYSTEM 360 ENGINEERING
The concept of architectural compatibility
was carried a step further in the input/output area by the decision to attach the various
units through the same electrical interface.
The major reasons for this decision were the
added flexibility to the IBM customer, and
the greatly reduced number of different
engineering and manufacturing efforts
which would be involved in producing both
the System /360 I/O channels and the many
I/O units.
Other decisions concerned reliability and
maintainability. The primary improvement in
reliability involved the advantages of SLT
over SMS and obtaining more performance
from a given number of components by using
high-speed circuits and storages. An objective
in maintainability was to have hardware and
software not only detect failures, but to localize them to small areas, such as five specific
small cards. A programming system was
created which takes the machine logic, analyzes
it, and automatically produces a set of programs with the proper patterns and expected
results for that specific logic. These fault locat-
207
ing programs (FLT's) are then capable of
being entered to the appropriate computer
(Model 50, 60/62, or 70) and executed with
the assistance of special hardware. This hardware allows setting up the proper patterns in
the various registers, advancing the clock a
controlled number of cycles, logging these registers into main storage, and comparing the
actual versus expected results. The programming system allows for updating these FLTs
with engineering changes, and they offer a
powerful diagnostic tool in localizing failures.
SYSTEM/360 MODEL 30
Model 30, the smallest member of the System/360 line, was designed for the market area
currently served by the IBM 1401,1440, 1460,
and 1620. The design objective was, of course,
complete function and compatibility with other
System/360 models. However, System/360
architecture includes 142 instructions, decimal,
binary, and floating-point arithmetic, complete
interruption facilities, overlapped channels,
storage .protection, and "other features normallyfound in more expensive computers.
This made maintenance of full compatibility,
Figure 2. Solid Logic Technology (SLT).
From the collection of the Computer History Museum (www.computerhistory.org)
208
PROCEEDINGS-F ALL JOINT COMPUTER CONFERENCE, 1964
Figure 2b.
,
•
Figure 2c.
From the collection of the Computer History Museum (www.computerhistory.org)
IBM SYSTEM 360 ENGINEERING
while achieving the required cost and performance objectives, a very difficult task.
Therefore, a number of different hardware
configurations were examined. Data paths
and storages from four to sixteen bits. wide;
table lookup addition and logical operations;
registers built in SLT hardware, in main storage and in a separate high-speed storage, were
studied in detail to arrive at a configuration
that would meet the cost performance objectives. The availability of the 2-p,sec. storage
at an acceptable cost was also established at
this tim:e and became an overwhelming factor
in our; selections.
The, key engineering decisions which established Model 30 characteristics were:
1. Selection of an 8-bit wide, plus parity,
2-,p,sec storage unit, 64K bytes maximum,
2. Use of 8-bit wide, plus parity, data path,
3. Use of 30-nsec SLT circuits,
4. Implementation of "local storage" in
main storage,
5. Use of a 1-p,sec ROS for control,
6. Provide· integrated, time-shared, multiplex and selector channels,
7. Provide for complete processor growth in
in a single frame,
8. Utilize "byte" packaging to facilitate
checking and fault location.
"By-te" packaging means placing all the circuits required for eight data bits plus parity
on one pluggable small card. This concept,
along with micro-program diagnostics, permits fault location to a resolution that averages
five small cards.
A simplified data-flow diagram ,of the Model
30 processor is shown in Figure 3. The fourteen 8-bit registers, with the 8-bit AL U and
8-bit storage, provide all the functions for the
CPU and multiplex channel. Additional registers ,or buffers are required for selector
channe.ls,: direct control, interval' timer, and
storage-protection features.
Figure 4 is a map of local storage, which is
actually an extension of the main storage unit.
The first 256 bytes are utilized by the processor
for general registers, floating-point registers,
interim storage during multiply time shar-
209
ing, and other scratch-pad functions. The
second 256 bytes are used by the multiplex
channel for channel-control words. Additional
control-word storage is available in larger
storage sizes.
The multiplex channel time shares the processor registers by interrupting the microprogram, holding the return address, storing the
registers in local storage, and then at completion of the I/O operation, restoring the registers to their original state. This is analogous
to a macroprogram interrput.
TIMING AND ROS CONTROL
The basic timing is established by the 1-p,sec
read-only storage. The main storage provides
the read-write cycle in 2 p,sec and a read-compute-write cycle in 3 p,sec. Within the 1-p,sec
ROS cycle are four 250-nsec timing pulses.
The read only storage in the Model 30 is a
card capacitor ROS containing a maximum of
8,000 words with 64 bits per word.
The capacitor ROS consists of a matrix of
drive lines an<;l sense lines with capacitors at
the intersections where a one is required, and
no capacitors at those intersections requiring
a zero. The voltage change on a drive wire
will cause capacitive current to flow in those
sense lines which are coupled to that particular drive wire by a c,apacitor. In the card
capacitor store, the 64 sense lines and one plate
of each capacitor are printed on an epoxy glass
board bonded to a sheet of dielectric material.
The drive wires and the other plate of the
capacitors are printed on mylar cards (program cards) the size of a standard IBM card
(see Figure 5).
Each program card is punched with the information pattern specified by the micro-code
and contains 12 ROS words. A capacitor plate
is punched out at an intersection where a zero
is to be stored; thus, an unpunched card will
give all ones and punching a hole gives a zero
at that bit in the word. Microprogram changes
can be made by inserting new program cards.
MICROPROGRAM
A single microprogram instruction can initiate a storage operation, gate operands to the
From the collection of the Computer History Museum (www.computerhistory.org)
210
PROCEEDINGS-FALL JOINT COMPUTER CONFERENCE, 1964
ALU
A
TOW
TOX
TIC
B
B
NEXT ROS
ADDRESS
CN
ROS
BRANCH
CL
ROS
BRANCH
CH
STAT SET
CS
SOURCE "A"
CA
"A" INPUT CTL
CF
SOURCE "B"
CB
"B" INPUT CTL
CG
STORAGE
DATA
BUSS
STORAGE INHIBIT BUSS
CONSTANT
CK~--.
CARRY CTL
CC
TIC ALU CTL
CV
oEST.
"0"
CD
STORAGE
ADDRESS REG.
INH SELECT
CM
LOCAL/MAIN
DEST. CTL.
CU
FROM
U V
X BUSS
WBUSS
Figure 3. System/360 Model 30 Data Flow.
From the collection of the Computer History Museum (www.computerhistory.org)
IBM SYSTEM 360 ENGINEERING
4;: 6
211
INTP STATUS
0
OX
LOCAL
STORAGE
I
2
G.P. REG
3
0
lX
I
2X
2
7
f
I
X
8
9
I x+llx+2
10
II
12
13
14
IS
FLOATING POINT REG. 0
0111213141S1617
FLOATING POINT REG. 2
10SO USE
'3X
3
4X
4
FLOATING POINT REG. 4
SX
S
16 I 17 I 18 I 19 120 121 122 123
6X
6
FLOATING POINT REG. 6
7X
7
24 I 2S I 26 I 27 I 28 I 29 I 30 I 3 I
8X
8
9X
9
lOX
10
8 I 9
o
110 III 112 113 I 14 liS
1 L i S I v lUi
G1 J
I
....
....
....
....
K ADDRESSABLE LOCAL
STORAGE BYTES, FOR
TEMPORARY STORAGE OF
INSTRUCTION COUNTER,
STORAGE PROTECTION
MASK, SELECTOR AND
MULTIPLEX CHANNEL
INFORMATION, CONDITION
REGISTER, ETC.
I
l
T
IIX
II
12X
12
13X
13
14X
14
ISX
IS
OX
1)(
MPX
STORAGE
CPU STORAGE DURING MPLX
SHARE
UNIT CONTROL WORD
I
CPU WORKING STORAGE
0
UNIT CONTROL WORD 16
1
17
2X
2
18
3X
3
19
4X
4
20
SX
S
21
6X
6
22
7X
7
23
8X
8
24
9
2S
9X
In
1:~~I~------------------i-j----r------------------~-~--~
,),c,
Figure 4. Model 30 Local Storage,.
ALU input registers, select the ALU function,
store the result in the destination register, and
determine the next micro-word to read from
read-only storage. Each ROS. word is decoded
to operate the gates and control points in that
system. A brief description of the branch,
function and storage-control fields in each ROS
word follows.
Branch Control
The branch control fields provide the address
of the next ROS word to be executed. A ROS
address is a 13-bit binary number. Normally the branch control group provides
only 8 bits (leaving 5 bits unchanged) of
next address information. Of these 8 bits,
the low-order 2 are called "branch" bits and
the remaining 6 are called "next address" bits.
The 6 "next address" bits are specified directly
in a 6-bit field. The two "branch" bits are
speficied by two 4-bit fields. These two fields
are decoded and used in masking and extracting machine conditions and status conditions
contained in data-flow registers G and S.
Another 4-bit branch control group provides
the function of setting several variables to desired values for later use in microprogram
branching.
From the collection of the Computer History Museum (www.computerhistory.org)
212
PROCEEDINGS-FALL JOINT COMPUTER CONFERENCE, 1964
MYLAR PROGRAM CARD
~:iE CAPACITANCE
EPOXY GLAss
SENSE PLANE
CONNECTION
TABS
Figure 5. Model 30 Card Capacitor Read-Only Storage.
In summary, every ROS word provides a
branchjng ability. The branch can be 4-way,
2-way, or I-way (simple next address). A
partial next address is normally used, but provision is made for obtaining a full 13-bit next
address when required. The total length of
the branch control group is 18 bits, plus 1
parity bit.
Function Control
The function control· group is subdivided
into four fields: source A, source B, operation,
and destination.
1. Source A (CA)
This 4-bit field selects one of the 10- hardware registers to be gated to the A input
of the ALU.
2. (CF)
The 8 data bits from a register can be
presented to the A input "straight" or
they can be presented "crossed." The
term· "crossed" means that the high-
order four bits of the source register
enter the low-order four bits of the ALU,
and the low-order four bits of the source
register enter the high-order four bits of
the ALU. The A input can further be
controlled by presenting all eight bits,
the low-order four only, the high-order
four only, or none, to the ALU.
3. Source B (CB)
This 3-bit field selects one of three registers to be presented to the B input of the
ALU. The B input is the "true/complement" input and has HI/LO controls
but no straight/crossed controls.
4. (CG)
This field controls the gating of the B
input to the ALU. That is, the low-order
four bits only, the high-order four bits
only, all eight bits, or none of the eight
bits of B may be presented to the ALU.
5. Constant Generator (CK)
This field is gated to the B buss, main
core STAR and ROSAR, thus providing
From the collection of the Computer History Museum (www.computerhistory.org)
IBM SYSTEM 360 ENGINEERING
a source for constants, mask configurations, and address constants.
6. Carry (CC)
This 3-bit field controls carry in, AND,
OR, EXCLUSIVE OR functions and permits the setting of carry out into the
carry latch, if desired.
7. True / Complement & Binary / Decimal
Control (CV)
This 2-bit field controls the true/ complement entry of the B input to the ALU,
also whether the operation is decimal or
binary.
8. Destination (CD)
This 4-bit field selects one of the 10 hardware registers to receive the output of
the ALU. A given register may be used
both as a source and as the destination
during a single ROS cycle.
In summary, the Operation group specifies
one of ADD binary, ADD decimal, AND, OR,
or EXCLUSIVE OR. It also specifies true
or complement; 0 or 1 carry input; save or
ignore resulting carry; use true/complement
latch; and use carry latch.
Storage Control.'J
(CM) (CU)
These two fields control core storage operation. Either main storage, local storage, or
MPX (for I/O) storage can be addressed for
storage read-write calls; five values of CM
are used to specify the address register to be
gated to STAR.
An example of the sequence of ROS control,
Figure 6, shows an ADD cycle using a simplified data flow.
Step i-As the routine is entered, the con-
tents of UV are gated to the STAR, a read call
is issued to main storage, and register V is
decremented by 1.
Step 2-The A-field data is regenerated in
storage and the A-field data byte is transferred
from register R to D.
Step 3-The contents of IJ are gated to STAR,
a read call is issued and J (lower -4 bits) are
put in Z.
Step 4-Z is tested for 0 to set up the branch
condition at the next step, the B-field data byte
213
is read out (R) to the adder, as are the Dregister contents (A-field data) and the carry
from a previous cycle. The output (Z) is gated
into R.
Step 5-If the zero test of A in Step 4 is true,-a
write into the B field is performed (the address
is still in STAR), J is decremented by 1 and the
routine is repeated. If the zero test of Z in step
4 is false, then' a branch is made to the write
call and the routine is exited.
As an example of Model 30 microcoding
efficiency, the execute portion of a fixed-point
binary add uses approximately 20 words. However, the add can be combined with 13 additional operation-code executions, such as subtract, AND, OR, EXCLUSIVE OR, etc., using a
total of 45 words. A one-half word multiply
giving a full-word product requires about 95
words. The total floating-point feature, which
utilizes the fixed-point microprograms, requires
approximately 500 words. It should be recognized that the microprogrammer has the choice
of optimizing for minimum words or maximum
performance.
IBM 1401/40/60
FEATURES
COMPATIBILITY
It was a market requirement that Model 30
execute the IBIVI 140.1-40/60. instructions directly. Further, it was desired to provide these
features without disturbing the Model 30 design, which was optimized for System/360 requirements. As a result, these features are
provided by an addition of only four circuit
cards plus extensive microprograms.
The general approach utilizes the following:
1. System/360 input-output devices,
2. Conversion tables in local storage,
3. Microprogram decoding and execution of
the instructions directly.
The internal performance is several times the
1401, based on a typical mix of instructions
found in 1401 programs. For individual instructions, however, the speed ratio varies
widely.
Method
The internal code used in Model 30 for the
compatibility feature is EBCDIC and, further,
Model 30 has a binary-addressed storage. Thus,
From the collection of the Computer History Museum (www.computerhistory.org)
214
PROCEEDINGS-FALL JOINT COMPUTER CONFERENCE, 1964
ALU OUTPUT BUSS (Z)
U, V - ADDR. A FIELD
I J - ADDR. B FIELD
ALU
A INPUT
OP - ADD
END-LOW 4 BITS OF
J-EQUAL 0
BINPUT
SENSE OUT
INHIBIT
CORE
STORAGE
STAR = UV
RD
V = V- 1
I
READ
1/
I
WRITE
I
I
I
WR
D=R
I
~
V
I
I
I
~
~
I
STAR = IJ
RD
Z'" J (L)
R=R+D+C
I
~
I
I
I
I
I
I
V
I
I
I
I
'{I
Figure 6. Model 30 Read-Only Storage ADD Cycle.
a certain amount of translation of character
codes and conversion of numbers from decimal
to binary radix, and back again, takes place
during processing. These conversions and
translations are accomplished by table look-up,
using the tables in local storage. These tables
are read into storage as part of the special load
procedure required prior to execution of a 1401
program. The table used to translate the
EBCDIC to BCD requires 128 characters. It is
used ·during the execution of 1401 instructions
such as bit test, move zone and move numeric,
which depend on the actual bit coding.
A 72-byte table is required to hold the constants used to convert the 1401 decimal ad-
dresses to binary addresses and the converse.
This conversion takes place at execute time,
hence the programs can operate correctly no
matter what methods are used in the 1401 program to generate or modify addresses. Each
conversion requires from 6 to 12 microseconds
depending on the value of the address.
Other functions which utilize tables are op
decode and I/O device address assignment. Additional areas in local storage are used for hardware register back-up, sense switches, system
specification, and working area. While most of
the available 512 bytes of local storage are used,
no main storage is used. Hence, an 8K 1401
program will run on an 8K Model 30.
From the collection of the Computer History Museum (www.computerhistory.org)
IBM SYSTEM 360 ENGINEERING
Each input or output device type has an individual microprogram routine. Most devices are
attached to the Model 30 multiplex channel.
Magnetic tapes are also available on selector
channels. The throughput for the compatibility
feature requires an analysis of the particular
I/O devices used and the particular program
being run. However, in every practical case it
will exceed the system being simulated.
SUMMARY
The Model 30 effort has helped to prove two
points:
1. Small systems, with the extensive function and facility of the largest systems,
are practical.
2. The microprogram control is sufficiently
general that a good system design can be
used to simulate a wide variety of architecture.
SYSTEM/360 MODEL 40
The performance of Model 40 is approximately three times that of Model 30. To attain
this performance at minimum cost, four major
design decisions were important.
1. The adoption of a 16-bit-wide storage of
2.5 p'sec together with a basic 16-bit wide
data flow; implemented in 30-nsee SLT
circuits, provided an optimum configuration.
2. A 0.625 p.,sec read-only storage as a means
of control for all CPU functions was used
because this technique offered significant
advantages in cost, flexibility, and design
freedom, compared with more orthodox
. control systems.
3. The inclusion of a 1.25 p., sec local storage
of one hundred forty-four 21-bit words to
provide general register storage. This
resulted in a considerable reduction in
accesses to main storage.
4. By using the local storage to preserve the
contents of the CPU during channel operations, much of the CPU data flow can
be used for channel functions, thus considerably reducing the cost of channels.
MICROPROGRAMMING
Microcoded programs, physically residing in
permanent form in a transformer read-only
215
storage (TROS), form the heart of the control
section of the Model 40. To reduce the resulting
physical changes associated with changes in the
microprograms, the microprogram design is
automated and debugged before actual physical
implementation by means of an IBM 7090 programming system, the Control Automation System (CAS). CAS is utilized not only by Model
40, but by all the System/360 models using ROS
control.
The basic input to the system is a logic sketch
page produced by the microprogrammer. The
separate micro-instructions are written on this
page in a formal language, TACT. When this
initial writing phase is complete, the page is
transcribed into punched cards. The control
program for the 7090 is directly derived from
the Model 40 control signal specification and
acts as a set of inviolable rules. Within this
framework, and using associated established
microprograms for reference where necessary,
the 7090 simulates the Model 40 and attempts
to run the microprogram using submitted data.
Errors or violations are detected, stop the program, and cause a diagnostic analysis routine
to be entered.
Facilities are available for various printouts
to provide for analysis and subsequent correction.
The debugged microprogram, in the form of
magnetic tape, is submitted to assignment
checking. This operational phase checks the
manual assignment of the absolute address given to each ROS word, and produces listings giving the absolute address and binary bit pattern
of each assigned ROS word. Two decks of cards
are also produced and used in the production
and testing of the read-only storage.
An output of the CAS program is a fullychecked and redrawn version of the original
logical sketch page. If microprograms are subsequently updated, a revised CAS page is automatically printed on receipt of change.
TROS
The finally-debugged microprogram is translated into a series of micro-instructions, held in
read-only storage.
In Model 40, this takes the form of a transformer read-only storage-TROS. TROS is
From the collection of the Computer History Museum (www.computerhistory.org)
216
PROCEEDINGS-FALL JOINT COMPUTER CONFERENCE, 1964
made up of 16 modules, each containing 256
ROS words (micro-instructions) to make a
total capacity of 4096 words. Each module is
made up of 128 tapes, each tape containing two
words. The word tape carries two ladder networks, each of which, after modification, holds
the bit pattern derived from a specific microinstruction.
The tapes are stacked in a module, as shown
in Figure 7, with transformers inserted
through the prepared holes in the tapes. These
54 transformers are in the form of U and I
cores. The I cores carry the sense windings.
Each stage of the ladder network corresponds
to one bit position of the ROS word. Whether
the bit is a 1 or a 0 is determined by breaking
the current path on one side of the ladder with
a punched hole through the printed wiring, so
that the current then either passes through the
core for a 1 or bypasses it for a O.
Signals are taken from the tapes to the sense
amplifiers, the outputs of which are used to set
54 latches.
The total cycle time of the TROS and the
basic cycle time of the CPU are both 625 nsec.
Normally, there are four 625-nsec TROS cycles
for each 2.5-/1 sec main storage cycle.
It is possible, by the inclusion of a feature
which adds additional ROS modules, to simulate
other equipment, such as the 1401 or the 1410.
This enables programs written for these machines to be run on the Model 40. Model 40
implements these features by microprogramming and conventional programming; Model 30
utilizes microprogramming exclusively. For example: the input/output and edit commands
in the 1401 simulated on the Model 40 are executed with System/360 programming, while
most of the remainder of the 1401 instructions
are microcoded.
The primary reason for adopting this approach in the Model 40 is to reduce the added
TROS requirements and the ensuing packaging
problems and costs.
DATA FLOW
Figure 8 is a schematic representation of the
Model 40 data flow.
The data flow may be divided into sections
characterized by their data-handling capabilities:
1. one-byte arithmetic handling (8 bits plus
parity) ,
2. one-byte local storage addressing,
3. two-byte storage addressing and data input/output referencing,
4. one-byte service for the channel data and
one-byte service for flags related to the
channel interface,
5. two-byte data flow to and from local
storage.
One-Byte Data Flow
One-byte arithmetic handling is performed
by the arithmetic and logical unit (ALU). The
ALU is a one-byte-wide adder/subtracter which
operates in either decimal or hexadecimal mode.
It is capable of producing both arithmetical
and logical combinations of the input data
streams and is checked by means of two-wire
logic, where one true and one complement signal is expected on each pair of wires.
Figure 7. System/360 Model 40 Transformer
Read-Only Storage.
Data bytes are fed to the P and Q busses
from the associated registers or from the emit
From the collection of the Computer History Museum (www.computerhistory.org)
217
IBM SYSTEM 360 ENGINEERING
LOCAL STORAGE
field of the current micro-instruction. The data
are then manipulated by the ALU in accordance
with the content of the ROS control word.
Several instructions may have common microprogram subroutines in which the difference
lies only in the ALU function. One of 16 different ALU functions is preset by a 4-bit field
in a ROS control word executed before branching to the common subroutines.
This is a small, high-speed storage which provides registers for fixed and floating-point operations, channel operations, dumping of CPU
working register contents, interrupts, and for
general working areas. Only the fixed and floating-point locations are addressable by the main
program.
Two-Byte Data Flow
Data transfers between local storage, channel
registers, CPU registers and main storage are
carried out in two-byte steps.
The ferrite local store contains 144 locations
allocated as shown in Figure 9. Each location
is 21 bits long. Addressing is completely r~n­
dom and the unit may be split-cycled with reador-write operations in any sequence. A read-
STORE
PROTECT
MAIN STORE
2.5 PSt CYCLE
1881TS
~
BUMP
Q BUS (9 BITS)
1
P BUS (9 BITS)
DIRECT CONTROl
INPUT
REGISTER 0-7
.-J
DIRECT IN
·1
DIRECT OUT
DIRECT CONTROL
OUTPUT
REGISTER 0-7
I
I
Pt O-17
A
REGISTER
Pt O-15
B
REGISTER
ARiTHMETiC
I
I
AND LOGIC UNIT
Pt O-7
I
I
Pt O-17
C
REGISTER
-
Pt O-15
D
REGISTER
I
R BUS (21 BITS)
MPX OUT
SELECT IN
SELECT OUT
1
INCREMENT
SELECTOR
CHANNEL DATA
REGISTER P,O-7
,.
..
MPXIN
I
H
J
P,O-7 P,O-7
LOCAL
STORE
ADDRESS
REGISTER
P,O-7
-
P,O-17
R
REGISTER
READ ONLY
ADDRESS
REGISTER P,O-II
LOCAL
STORE
144 WORDS
21 BITS
READ
ONLY
STORE
P,O-52
Figure 8. Model 40 Data Flow.
From the collection of the Computer History Museum (www.computerhistory.org)
-
CONTROL
218
PROCEEDINGS-FALL JOINT COMPUTER CONFERENCE, 1964
000
064
066
067
WORK
AREA
PROGRAM
STATUS
WORD
071
072
073
009
WORK AREA IN
UNDEFINED STATE
WORK
AREA
AND
LOG OUT
AREA
START 1/0 SWITCH
FLOATING
POINT
REGISTERS
1 ST LEVEL
DUMP AREA
015
016
031
032
SELECTOR CHANNEL 1
UNIT CONTROL
WORD
037
038
SELECTOR
CHANNEL 1
BUFFER
MULTIPLEX CHANNEL
WORD AREA
041
042
FIXED
INTERRUPT BUFFER
043
UNASSIGNED
047
III
048
112
POINT
SELECTOR CHANNEL 2
UNIT CONTROL
WORD
053
054
SELECTOR
CHANNEL 2
BUFFER
UNASSIGNED
056
057
2ND LEVEL
DUMP AREA
063
127
ADDRESSES IN DECIMAL
Figure 9. Model 40 Local Storage.
From the collection of the Computer History Museum (www.computerhistory.org)
REGISTERS
IBM SYSTEM 360 ENGINEERING
or-write cycle requires 625 nsec. A complete
read-and-write cycle therefore takes 1.25 p.. sec.
One example of use of the local store is the
double-dump routine executed under certain
types of channel operation. If the machine is
currently in CPU mode and a multiplex channel
interrupt occurs, all data relevant to the current
CPU operation are dumped in the local storage
first-level dump area. If, subsequently, a selector channel interrupt occurs, all data relative
to the multiplex service are dumped, and the
selector channel is serviced. When all selector
channel operations are complete the multiplex
data are restored and multiplex servicing is
continued. Similarly, when mUltiplex servicing
is complete, CPU data are restored and CPU
operation is resumed.
MAIN STORAGE
The main storage array (2.5-p.. sec cycle, 1.25p.. sec access) of the machine is divided into two
logical sections. These are the true main storage, and a special area of 256-2048 bytes,
called the bump storage. The bump storage is
used to hold channel control words used in multiplex channel operations and is accessible by
the microprogram only.
CHANNELS
Multiplex Channel
The multiplex channel is an extension of the
CPU in the sense that .the regular CPU data
flow and microprogram are used for all data
transfers. This channel, which allows a number
(maximum of 128) of relatively low-speed units
to be operating simultaneously, normally scans
all attached control units continuously. When
a device reaches the point where it needs to
send or receive a byte of data, its control unit
intercepts the first available scanning signal
and transmits the unit address to the CPU. The
CPU data flow is then cleared and retained in
the local store using the dump routine. After
the byte transfer has been completed, the control unit and device disconnect from the channel, permitting scanning of other devices to be
resumed, and the CPU processing to continue.
Selector Channels
Two types of selector channels are available
on Model 40, the A channel and the B channel.
219
They differ in that the CPU interference caused
by I/O operations on the B channel is approximately one third of that caused by the A
channel.
The A channel time-shares the CPU data flow
and microprogram to a high degree. Data bytes
are never transferred directly between main
storage and the interface busses, but move via a
16-byte buffer in the Model 40 local storage.
The transfer between interface and buffer is
conducted serially, by byte, at the rate dictated
by the I/O device. Each transfer causes the
microprogram to hesitate one 625-nsec cycle.
When the 16-byte buffer is half full, the channel
requests the use of the microprogram and CPU
data flow. Up to 12 microprogram cycles are
required to preserve the current contents of the
data-flow registers and to load the control word.
The 8-16 bytes in the buffer are transferred 2
bytes per storage cycle to main storage as a
block, at a rate of 2 bytes every 2.5 microseconds
The B Channel is more conventional; it uses
SLT hardware and does not use the local storage as a buffer. Data is transferred to main
storage as soon as two bytes are accumulated.
Interference is basically constant at 1.25 p'sec
per byte.
SYSTEM MAINTAlNABILITY
One of the more unique features of Model 40
maintenance hardware is the use of the readonly storage as a source of diagnostic routines.
One module of the TROS contains a complete
set of tests to validate the CPU, local and main
storages. These tests are automatically applied
each time the system reset is operated to ensure
an operational machine.
SYSTEM/360 MODEL 50
The performance range of System/360 Model
50 is approximately ten times the Model 30.
A review of the following key engineering
decisions will highlight the distinguishing characteristics and engineering achievements of
Model 50.
1. The 30-nsec family of SLT circuits is
used.
From the collection of the Computer History Museum (www.computerhistory.org)
220
PROCEEDINGS-FALL JOINT COMPUTER CONFERENCE, 1964
2. A small, 0.5 p,sec, local core storage contains the general purpose and floatingpoint registers.
3. A relatively low-cost 2.0 p,sec. main storage is used.
4. The data paths, local storage and main
storage are" each 32 bits wide (plus 4
parity bits) .
5. A 0.5 p,sec read-only storage provides sequence control throughout.
6. Both selector and multiplexor type channels are provided to cover a wide performance range.
7. The CPU and channels utilize common
hardware, yet maintain substantially
overlapped operation.
8. The CPU, channels, and storage are
packaged in a unified structure.
CENTRAL PROCESSING UNIT
The first four decisions are highly interdependent and were reached somewhat simultaneously to produce a system in the right cost and
performance range, utilizing components that
would be available at the right time and that
would fit together in a good physical and speed
relationship. These components were selected
on the basis of good performance per-unit-cost
and a balanced design, rather than for performance alone.
The choice of the 30-nsec family of circuits
allowed an internal clock cycle from register
through adder and back into register of 500
nsec. This family of circuits provides the combination of compact packaging, good fan in-fan
out ratios, and low power dissipation, with a
nominal delay per stage of 30 nsec.
.,,1__-
Figure 10. Solid Logic Technology 30-nsec AOI Circuit.
isters, the main storage, and the local storage
are all 32-bits wide plus 4 parity bits. An
auxiliary 8-bit data path through a logical
processing unit, called the mover, is provided
for processing variable field length information. This path allows a byte to be selected
from each of two working registers under control of two byte counters. The two bytes can
then be combined in a variety of logical functions and returned to one of the working registers. With the mover, decimal operands are
first aligned in the working registers, then
processed arithmetically 32 bits at a time using
the main adder.
A typical 30-nanosecond SLT module is illustrated in Figure 10, together with its circuit
diagram. Such a module has a power dissipation of approximately 30 milliwatts and allows
a fan-out factor ef 5 and a fan-in to the OR of
5. The fan-in to the AND is limited only by
packaging considerations.
The main 2-p,sec storage unit for the Mod 50
is approximately 32" x 14" x 26"; a reduction
by one-third the physical size of the 7090 storage (Figure 12). It is available in capacities
of 64K, 128K, and 256K bytes (8 bits plus
parity). Bump storage which is part of main
storage is used to hold channel control words
for multiplex channel operations. Bump storage area of 1024 to 4096 bytes is accessible only
by microprogram control and not by the problem program. The use of a combined senseinhibit line allowed a three-wire ferrite core
plane which can be machine wired.
Figure 11 illustrates the basic Model 50 data
flow. The main adder path, the working reg-
The local storage contains 64 thirty-six bit
words (32 plus parity) and has a read-write
From the collection of the Computer History Museum (www.computerhistory.org)
IBM SYSTEM 360 ENGINEERING
cycle time of 500 nsee. This ferrite core storage
unit provides working locations for the CPU
and channels, as well as the general and floating point registers. Regeneration from either
the L or R register allows a swap of information between the CPU and local storage on a
single cycle.
221
organizing influence on the design procedure.
It not only forced a centralization of all controls, but· also forced an early definition of all
gate signals thereby allowing the design to proceed independently in several areas of the machine. Additional benefits resulted from the
use of the Control Automation System (CAS).
This system not only provides the necessary
record-keeping and generation of manufacturing information for the read-only storage, but
also provides documentation of the instruction
READ-ONLY. STORAGE
The decision to use a read-only storage for
sequence control produced a great unifying and
--L
L2
=+~
I
L.S. ADDRJ
REG.
i
LOCAL STORAGE
32 PLUS PARITY
+
+
-A.
+
~
~I II
II
+
A~~;..
iii
STO!AGE
~~:.
1
t
I
i
i
I
L REG.
i iI IIIi I~I
Ii i
I
R REG.
-r -
T
MD
i
1M REG.
F
-tT-
L..:l_T
I
I
I
f COUNTER
I I
i
I
.r:-
i
I
i
H REG.
I
FROM
MULTIPLEXOR
CHANNEL
TRUE/
COMPLEMENT
GATE
I
-.i
I
X
STORAGE
DATA REG.
32 PLUS
PARITY
I
+
SHIFTER
MOVER
8 PLUS
PARITY
+
,
I
I
LATCH
-
..L
I
I
I
~-
I
~
~
ADDER
32 PLUS PARITY
MAIN
STORAGE
TO
SELECTOR
CHANNELS
I
1
l---i..
TO
MULTIPLEXOR
CHANNEL
LATCH
I
+
FROM
SELECTOR
CHANNELS
Figure 11. System/360 Model 50 Data Flow.
From the collection of the Computer History Museum (www.computerhistory.org)
222
PROCEEDINGS-FALL JOINT COMPUTER CONFERENCE, 1964
drive lines running vertically and the sense
lines horizontally. Approximately a 5-inch
pound torque is applied to a system of pressure
pads to maintain constant pressure between the
two plates.
Thus, switching on the array driver provides
a change in voltage that is capacitively coupled
from the drive line to the sense line. For impedance-matching purposes in the sensing circuits, a balancing line is used in conjunction
with each drive line. The appropriate location
of the bit tabs determines whether the signal
received by the sense amplifier is a 1 or a O.
Figure 12. Physical Comparison Model 50 and
IBM 7090.
sequences and allows complete simulation of
these sequences before producing hardware.
The balanced capacitor technology is used in
the read-only storage unit of the Model 50. A
bit plate contains the information content of
the storage in the form of tiny tabs attached to
a long electrical drive line (see Figure 13).
These are etched on a glass epoxy plate by a
process similar to that from which printed circuit cards are manufactured. This bit plate is
covered by a sheet of I-mil mylar which forms
the dielectric of the capacitor. The bit-plate tab
forms one plate of this capacitor and a sense
line running orthogonally to the drive line
forms the other plate.
The sense lines are also etched lines. A pair
of these lines form inputs to the base and
emitter of a differential amplifier at one end
and are terminated to ground at the other end.
One sense plate contains 400 parallel sense
lines.
The bit plates lie over the sense plates and
are separated by the I-mil mylar, with the
The read-only storage of the Model 50 contains 2816 words of 90 bits each. It has a cycle
time of 500 nsec and an access time of 200 nsec.
To allow the result of one cycle to immediately
influence the choice of the next, two words are
read from storage simultaneously, and a choice
between them is made on the basis of the result
of the first cycle. The chosen word then controls the second cycle. This technique allows
the same speed to be'maintained as if sequential
logic circuit controls were used.
CHANNELS
A wide range of channel performance is provided in the Model 50 by the inclusion of two
quite different channel designs. Both types of
channels allow a substantial overlap with CPU
operations.
The multiplexor channel provides concurrent
operation of multiple low to medium speed devices. It makes extensive use of CPU hardware
and contains relatively little hardware of its
own. (Example: Local storage and some CPU
registers are used as working locations.) The
control words for each operating device are
held in an extension to main storage called
bump storage. As each byte is handled, the
channel takes control of the CPU and obtains
the required control word from bump storage.
The CPU registers required for the operation
are dumped into local store. The multiplexor
channel can also operate with a single higher
speed device in a "burst" mode. In this mode
the control word is held in local storage to speed
the operation, but bytes are still handled one
at a time.
From the collection of the Computer History Museum (www.computerhistory.org)
223
IBM SYSTEM 360 ENGINEERING
{
TO
TO TERM.
RESISTOR
+6 VOLTS
a
r
TOTERM.
RESISTOR
+6 VOLTS
a
TO TERM.
RES. GND.
a
r
SENSE!
AMP
(A)
TO TERM.
RES.
GND.
-=-
B( + PADS
a
TO TERM.
RES. GND.
a
TO
SENSE
AMP
(B)
:?~E,~~EJL-'
~Mr-. \'"
1
y
TO ARRAY
DRIVER
(X)
TO TERM.
RES. GND.
a
DRIVE
BALANCE
LINE
1
TO ARRAY
DRIVER
(Y)
r~~~~E
TO ARRAY
DRIVER
.ZERO"/lJ------- O-~
BIT
TO ARRAY
DRIVER (X)
DRIVE BALANCE
LINE
(Z)
Figure 13. Model 40 Capacitor Read-Only Storage.
High speed I/O devices use the selector channels (up to three are attachable) which can
handle 8-bit data up to an 800 KC byte rate.
These channels contain sufficient hardware to
assemble a full word of data before requiring
the main storage or CPU facilities. Controlword information is retained in hardware instead of local storage. CPU facilities are used
only when transferring a word to storage or
chaining between I/O commands, resulting in
a much higher maximum data rate than the
multiplexor channel.
An additional selector channel is available
which operates in a lockout fashion a;nd is
capable of operating at a data rate up to 1.3 mc.
MAINTENANCE
Four decisions stand out in the maintenance
area. They are:
From the collection of the Computer History Museum (www.computerhistory.org)
224
PROCEEIHNGS-F ALL JOINT COMPUTER CONFERENCE, 1964
1. The decision to include a hardware Log
In-Log Out system for the execution of
fault location tests (FLT's). An integrated approach (using the read-only
storage to control FLT sequencing and log
paths, and using existing hardware where
feasible) reduced the cost from being excessive to being defendable, and in addition provided better fault resolution than
conventional diagnostics. The FLT's can
provide fault localization to within a few
small cards, on the average, and have the
additional advantage of being automatically produced and updated. The Log
Out system is also used with the error
checking circuits, to provide a complete
snapshot of internal status at time of
error.
2. Use of a Progressiive Scan technique
provides a reasonable means of running
fault-location tests on channel hardware; with resultant improved fault
resolution, thus the entire processor
can be examined with high resolution,
non-functional test.
3. The DIAGNOSE instruction allows the
initiation of any micro instructions,
in any order. This permits integration
of the control of various maintenance
techniques into the read-only storage.
This instruction gives the diagnostic
programmer a power tool.
4. The decision to allow for single-man
servicing. This is made possible by
bringing all controls and indicators
together on a single system control
panel which is usable in two positions;
the normal operating position, and
swung left 1800 so that it faces the principle servicing area (CPU logic, main
storage, and channel hardware). See
Figure 14. The small logic cards, which
are the principle replaceable item, are
easily accessible, with the majority
located on the outer sides of the gates.
Figure 14. Model 50 System Control Panel (Open Position).
From the collection of the Computer History Museum (www.computerhistory.org)
IBM SYSTEM 360 ENGINEERING
SYSTEM/360 MODELS 60/62
Models 60/62 are large-scale processors with
performances that are approximately 20 and 30
times that of Model 30.
Basic design considerations are:
1. Main storage speeds of 2 p,sec and 1 p,sec.
2. High-speed circuit family with nominal
delay of 10 nsec per logical block,
3. Local storage in transistors, with 125nsec access and non-destructive read,
4. Read-only storage control of CPU functions at a 250-nsec cycle,
5. 64-bit-wide storage, plus 8 parity bits;
64-bit data flow, plus 8 parity bits.
Models 60/62 attach stand-alone storage and
channel units compared to the integrated designs of the smaller models in System/360. The
entire instruction set is standard. Model 60 is
equipped with a 2-p,sec main storage packaged
in two separate frames. Address interleaving
between the frames increases the effective
speed of storage. The combined capacity of the
storage frames varies and is available in 128K,
256K, or 512K byte sizes. Input-output control
is provided through a channel capable of handling 8-bit data at a lo3-mc byte rate. Up to 6
channels are attachable and capable of operating simultaneously. Information is passed
between storage and channel on a 72-bit, double-word basis, minimizing interference with
operating programs. High-speed I/O devices,
tapes, disks, and drums are primarily intended
for direct attachment to the channel. Slower
speed units, terminals, peripheral readers,
punches, and printers, attach to smaller System/360 Models which, in turn, may be connected to one of the Model 60 channels in a
multiprocessing arrangement.
Model 62 differs from Model 60 only in the
speed and configuration of the main storage.
The 1-p,sec storage associated with this model
is available in 256K-byte, self-contained units.
A maximum of two of these units is directly
attachable, providing 512K bytes of storage.
The addressing is conventional and is accomplished without interleaving.
TECHNOLOGIES UTILIZED
The foundation of the central processing unit
design is a basic building block or module con-
225
taining four logical diodes with load resistors,
one emitter follower and inverter transistor,
plus three control diodes fabricated ona single
substrate. This device houses the primary circuit used almost exclusively in Models 60/62.
Variations of diode and transistor arrangements exist within the same circuit family and
these different module types equip the logician
with a full array of AND-OR design elements.
The basic block has a nominal delay of 10 nsec.
Signal swings vary from + 1 to +3 volts. The
circuits display good noise-rejection characteristics in that worst-case simultaneous switching is tolerable under maximum loading situations of five-way AND's driving five-way O,R's
into a ten-load output. Circuit speed has purposely been compromised to improve driving
and loading capability.
All the circuitry is contained on one module,
with the exception of the two collector resistors
which are located external to the module in a
resistor pack. With the exception of the components outlined, the circuit configuration is
essentially the same as the 30-nsec circuit block.
The local store is not required to accommodate integrated I/O channels, consequently a
favorable trade-off is achieved by structuring it
in transistor registers as opposed to ferrite
. cores. In addition to reduced cost faster speeds
are obtained, since the store is not destructively
read, so regeneration time is eliminated. The
unit contains twenty-:five 36-bit registers. Sixteen are general purpose· registers, eight are
floating-point registers and one is a working
register used for variable field· length control.
Since one or more registers participate in the
execution of each System/360 instruction, the
accessibility and maneuverability of local storage contents is mandatory for good.internal
performance. Any of the registers can be easily
read or modified in 125 nsec, a nice fit for the
250-nsec machine cycle. All CPU manipulations
occur within the 250-nsec machine cycle.
A balanced capacitor read-only storage device, similar to that used by the Model 50, provides logical control for the processor. The cycle
time of the read-only storage is 250 nsec, with
the output being available 100 nsec after select.
Sixteen bit planes of 176 words each provide
a total of 2,816 words. Each word is 100 bits
wide.
From the collection of the Computer History Museum (www.computerhistory.org)
226
PROCEEDINGS-FALL JOINT COMPUTER CONFERENCE, 1964
To achieve desired CPU speeds, the read-only
storage is supplemented with conventional control hardware. Approximately 500 control lines
are generated in all, of which 400 emanate from
the read-only storage. Conventional control
logic is utilized primarily where sequencing becomes data-dependent and exclusive read-only
storage control would require additional cycles
to be taken.
LOGICAL ORGANIZATION
The size of the CPU-storage interface varies
in width from a byte on Model 30, to half-word
on Model 40, to a full word on Model 50. In the
Models 60/62, it is double-word wide. A reduction in the number of storage cycles required
fo~ program execution is purchased at the price
of hardware. The data path continues double
word throughout. Significant performance im-
provements in the handling of long precision
floating-point arithmetic and variable field
length operations are achieved.
A 64-bit instruction-buffer register is directly fed from storage and initially receives
all instructions (see Figure l5). Four halfwords are loaded and passed, one half-word at
a time, through a I6-bit extender register to a
I6-bit decoding register where instruction execution commences. As the last of the four
half-words enters the extendor register, storage
is signaled to refill the buffer register. The flow
through these three registers neatly overlaps
instruction fetching with execution and adds
substantially to the internal performance of
the CPU.
A parallel adder, 60-bits wide, is the confluence of the data path. Address calculation,
arithmetic, shifts, register-to-register trans-
TX
LOCAL STORE
25 WORDS
~....----..
.....I--+-....
L
ADDR.
BUS
24
SERIAL ADDER
8 BITS
32
24
32
PARALLEL ADDER
60 BITS
LATCH
LATCH
TO AB, T, D, IC, R, E
TO ST
Figure 15. System/360 Models 60/62 Data Flow.
From the collection of the Computer History Museum (www.computerhistory.org)
IBM SYSTEM 360 ENGINEERING
fers, and parity generation and checking are
handled in the adder. It is a high-speed unit
split into four sections, four groups per section,
four bits per group for look-ahead propagation.
Worst-case carries ripple out in 135 nsec. The
entire adder output can be held in a latch
register equipped with gates which can shift
the full sum 4 bits right or left into the latch.
Adder input gates provide for left shifts of one
or two bits. The adder accommodates, in one
pass, the 56-bit fraction arithmetic of long precision, floating-point instructions.
Two 64-bit registers operate in concert with
the parallel adder. Both can be directly loaded
from storage and one provides the store path
to main core. At times they participate in an
action as a 64-bit unit. Frequently they are
treated as separate and independent 32-bit registers (identified logically as A-B, S-T). A
finer subdivision into 8-byte units can also be
effected. Both the A-B and S-T registers have
three-position byte counters controlling byte
movements. By utilizing the 32-bit _working
register in local store, in addition to parallel
adder paths, any combination of register-toregister transfer between, A, B, Sand T can
be made. The base registers in local storage
transmit between the S-T registers. A 24-bit
instruction counter and a 24-bit storage address
register connect to the parallel adder for loading and incrementing.
An 8-bit serial adder draped with extensive
gating is woven into the data flow. It drives a
latching register and is byte fed into the working registers. The many variable field ope:t;ations, including arithmetic, work through the
serial adder as do the logical functions AND,
OR and EXCLUSIVE OR.
To provide maximum flexibility in present
or future high-performance system configurations, physically-independent storage and channel units are employed. One common mu1tip1extype interface serves to permit the attachment
of either the 2360 (2 p.Sec) or the 2362 (1 p.Sec)
storage units to the CPU and 2860 channel.
This interface also provides the mechanism for
the attachment of multiple 2361 large-capacity
storage units. Figure 16 shows that the key to
this interface is the cable that serves as the
conductor for multiple driving circuits feeding
multiple receiving circuits. This permits effici-
227
ent time sharing of the inter-unit data and
address paths under the control of just a few
direct connection signal lines.
SYSTEM/360 MODEL 70
System/360 Model 70 is designed to fill a
marketing need for a very-high-performance
data processing system. It may serve as a direct
functional replacement for any member of the
System/360 family, with which it maintains
complete program compatibility.
The sole constraints on program compatibility are storage size, input/output configuration, and the absence of time-dependent program loops.
The fundamental elements contributing to
Model 70 performance are: high-speed circuitry, interleaved 1-ftsec storage units, and logical
organization.
CIRCUITS AND PACKAGING
To achieve the desired performance, the
Model 70 Engineering team specified a circuit
family which would permit an effective balance
between the basic machine cycle, storage-access
time, and storage-cycle time.
The optimum cycle time was established at
200 nsec. This, in turn, coupled with machinepackaging constraints, dictated a circuit with
a nominal switching time of 5 nsec. The resultant circuit is of the AND-OR INVERT
family. The switching time varies from 4 nsec
and longer as a function of fan-in and loading,
i.e., line length and number of loads. The basic
circuit configuration is similar to the 10-nsec
AOI but differs in component characteristics.
These circuits are packaged in modular form;
the modules are mounted on small cards having
a capacity for either 12 or 24 modules. A substantial number of these small cards are functionally packaged to achieve high density and
relatively short line lengths. With a circuit that
performed well, extreme care had to be exercised, not only in defining the functional cards
but also in card placement. More difficulty was
experienced in controlling delays in transmission than logical delays.
STORAGE
Main storage for the Model 70 consists of two
banks of 1-ftsec storage. Each storage bank has
From the collection of the Computer History Museum (www.computerhistory.org)
228
PROCEEDINGS-F ALL JOINT COMPUTER CONFERENCE, 1964
STORAGE 1
MODEL 60-2360
MODEL 62-2362
~~~
STORAGE 2
MODEL 60-2360
MODEL 62-2362
LCS 2
2361
;
~~~
LCS 4
2361
LCS 3
2361
~~~
r-:
I
'
I
t-------------.. -------------.--------------~---------- ____•______________ J
I
t--------------,------------r- -------- ---r-------- ------~ ------------,-------------,
I:
:
:
:
:
:
:
!
t-.- -r--T- --:-,-"--r-T- -t"--""'T-"~-r-r- -;--1
!
~~~
LCS 1
2361
I
!
~~~
~~~
!
I
!
I
-~--""--+--+---4--""-.~-:---'---~--~
I
:
:
I i :
I
!
I
I
•
•
!
I
•
I
I
~ DR~
•
I
~DR~ ~DR ~ ~DRa
2
--
SIMPLEX CONTROL LINES
I
I
I
~DR~ ~DR~ ~DR~
5
4
3
6
2860 SELECTOR CHANNELS
2860 SELECTOR CHANNELS
CPU
2060
t
~
~
•
7
LEGEND:
- - - STORE BUS
- - - FETCH BUS
-------
}
MULTIPLEX BUSSES
ADDRESS BUS
DR - DRIVER CIRCUIT
REC -RECEIVER CIRCUIT
LCS - LARGE CAPACITY STORAGE
Figure 16. Model 60 Storage-Channel Interface.
a- capacity of 256 K bytes arranged in 32K double words (64 information +8 parity bits).
Data with even double word addresses are
stored in one bank; data with odd double word
addresses are stored in the other. These two
banks a.re independent of each other, having
exclusive drive schemes and data registers.
When accesses are made to sequential storage
addresses, the storage units operate in an interlea ved fashion.
The two units cannot be accessed concurrently but must be offset by at least 400 nsec, a
restriction imposed by the maximum rate of
the storage bus.
LOGICAL ORGANIZATION
The initial design approach was to accentuate the p~rformance of arithmetic operations.
Enlphasis, however, was brought to bear on
those instructions used heavily in the compiling
function, e.g., load, branch, store, and compare.
A data flow schematic is shown in Figure 17.
Instruction buffering is provided via two
double-word registers, each supplied by independent banks of main storage. Instructions
are executed sequentially and as the contents
of each register is exhausted, it is replenished
from storage during which time the contents
of the second register is being used.
Some degree of instruction overlap is
achieved by operand buffering. A sequencing
unit controls the instruction decoding, effective
address generation, and operand fetching, while
the preceding instruction is being executed. The
sequencing unit is not allowed to operate more
than one instruction ahead of the execution
unit and is not allowed to change any addressa-
From the collection of the Computer History Museum (www.computerhistory.org)
IBM SYSTEM 360 ENGINEERING
ble registers while the execution unit is still
operating on the previous instruction. This
eliminates recovery problems in the event that
a branch or interrupt causes the abandonment
of the instruction being worked on by the sequencing unit. Address generation is accomplished through the use of a three-input adder
in one machi!le cycle (200 nsec) ; one input is
from the instruction register, the other two
from the general purpose registers specified by
the instruction.
The heart of the execution unit is the main
adder which is a 64-bit adder-shifter' combination. It has a three-stage full carry lookahead
scheme which permits an add operation in one
clock cycle. The main adder is supplemented
with an 8-bit exponent adder for floating point
operations and an 8-bit decimal adder for decimal and VFL operations.
16
229
A read-only storage mechanism was not included in the Model 70 since it did not lend itself to the machine organization, especially in
the area of required cycle time. As a result, the
control functions were implemented through
conventional logic design.
A further attempt to improve performance
was made through utilization of transistor circuitry, rather than core storage, for the general
purpose and floating-point registers. This approach permitted faster access, eliminated regeneration time between fetches, and permitted
accessing more than one register at a given
time.
In view of the storage interleaving and instruction overlap, a precise estimate of machine performance on a particular program,
loop or sub-routine, must involve careful scrutiny of instruction and data addresses and instruction sequences.
GENERAL
PURPOSE
REGISTER
4
I
II
FLOATING
POINT
REGISTER
I
i
MAIN ADDER
ADDRESSING
ADDER
STORAGE
IN BUS
Figure 17. System/360 Model 70 Data Flow.
From the collection of the Computer History Museum (www.computerhistory.org)
230
PROCEEDINGS-F ALL JOINT COMPUTER CONFERENCE, 1964
CHANNEL
The 2860 Selector Channel is a very high
performance data channel designed to operate
on Models 60, 62 and 70.
LARGE CAPACITY STORAGE
A considerable asset in achieving thruput
improvement is the addition of large capacity
storage (LCS) to the Model 70. This storage
unit, with a read-write cycle of 8 p'sec, may be
attached to the system in increments of 1024K
and 2048K bytes up to a maximum of 8 million
bytes. These units are attached directly to the
storage bus and can be considered an extension
of directly addressable main storage.
The performance of this channel utilizing
30-nsec circuits permits operation at data rates
up to 1.3 megacycles; the rate being measured
by the number of 8-bit bytes that pass, via the
I/O interface, to or from an appropriate input
or output control unit. Any interconnection to
input or output control units is via the I/O interface.
The LCS can also be attached to Models 50,
60 and 62, and has the facility of being shared
between any two of these systems.
,<
ri
r,
The channel is of the selector type, permitting interconnection of multiple I/O control
~SERVIC~EA-------------------~~'
21' O'
~
BOUNDARY
/ /""
I
I
./'"
~--Cl
:
I
/
I
fl:
I
,- -
I
I
M-4
3 AND 4
60~'
!
::
/
17 1 3"
M-4
1 AND 2
I
1
III
!!ii
1/
l
-t
30
I4
(OPTIONAL)
=
31
=
- , IL--_ _-Y
L: -
MCU
-
I
---,
(OPTIONAL)
1 r------+-1
I
I
:
I EXTERNAL
-
L
, \
I \
CABLE
I
\\
,
I
I
I
I
I
:
I
:
:
1
'--1- -
I
I
I
I
II
I
__
I CONNECTORS
I"
I
I
I
I
I
, ' II
I
!
"'"
CPU
1
_____ ,
:
3"
604'
L... ' - ' - ' -
I
I
~
1
I
30"
L_l _______ l ___ J ___ ~L ____ L_~~
L30.1
63'
J. 43.-L-43"-i-43,,J30.J
Figure 18. Model 70 Physical Arrangement.
From the collection of the Computer History Museum (www.computerhistory.org)
IBM SYSTEM 360 ENGINEERING
units to a single channel. A maximum of 6
channels may be used. Up to 8 control units
may be attached to one channel. However, the
channel at any given time operates with one,
and only one, device of the many attached. The
channel is "instructed" by the processor system
to commence an operation. An operation performed by an instruction may involve any sequence or list of commands to that particular
device. After being instructed, the channel independently obtains "commands" and transmits
data to or from processor storage until it completes its operation.
In operating with storage, the channel shares
the buss control unit, used by the processor, to
obtain its storage references. The Models 60,
62 and 70 use a double level of priority. In the
first level, each channel vies with the other
channels attached to that particular CPU for
priority, and then in turn vies with the processor for specific priority.
PHYSICAL ORGANIZATION
. Figure 18 shows the Model 70 physical arrangement. Each gate in the CPU contains 20
large cards and 4 half-size cards for termination of interframe cables. Two of the gates contain the Execution-unit, the other contain the
sequencing-unit. The maintenance-control unit
(MCU) frame contains maintenance circuitry,
231
the bulk power supply, and a small amount of
the CPU circuitry.
Mounting the CPU and the storage units on
a common wall helps performance in that it
allows short, direct cable connections between
them, rather than long under-floor cables. It
also means that all installations will have the
same cable lengths in these paths.
The 2860 Selector Channel frame is a threegate stand alone frame housing three swinging
gates, each capable of containing 20 large cards.
Power supplies are mounted in the internal
column between gates. Since each channel occupies one full gate, up to three channels may
be contained in a given three-gate frame.
BIBLIOGRAPHY
1. AMDAHL G. M., BLAAUW G. A., and BROOKS
F.P. JR., "Architecture of the IBM System/
360, "IBM Journal of Research and Development, Vol. 8, No.2 (April 1964).
2. DAVIS E. M., HARDING W. E., SCHWARTZ
R. $., and CORNING J. J., "Solid Logic Technology: Versatile, High-Performance Microelectronics," IBM Journal of Research and
Development, Vol. 8, No.2 (April 1964).
3. CARTER W. C., MONTGOMERY H. C., PREISS
R. J., and REINHEIMER H. J. JR., "Design
of Serviceability Features for the IBM System/360", IBM Journal of Research and Development, Vol. 8, No.2 (April 1964).
From the collection of the Computer History Museum (www.computerhistory.org)
From the collection of the Computer History Museum (www.computerhistory.org)
UNISIM-A SIMULATION PROGRAM FOR
COMMUNICATIONS NETWORKS
J. H. Weber and L. A. Gimpelson
Bell Telephone Laboratories, Inc.
Holmdel, New Jersey
I. INTRODUCTION
Furthermore, the measured outputs are directly influenced by most or all of the transactions sequenced through the program.
The design and analysis problems associated
with large communications networks are frequently not solvable by analytic means and it
is therefore necessary to turn to simulation
techniques. Even with networks which are
not particularly large the computational difficulties encountered when other than very
restrictive and simple models are to be considered preclude analysis. It has become clear
that the study of network characteristics and
traffic handling procedures must progress beyond the half-dozen switching center problem
to consider networks of dozens of nodes with
hundreds or even thousands of trunks so that
those features unique to these large networks
can be determined and used in the design of
communications systems. Here it is evident
that simulation is the major study tool.
In telephone (and other) traffic simulations,
however, particularly of the network type, the
possible number of alternatives before any call
(demand) is not very great, but the number of
calls which are simultaneously in progress is
quite large, and their interactions are not
predictable. The measure of performance is
typically grade of service (or probability of
blocking), which is normally in the order of
one per cent of the total offered calls. This
measure, furthermore, applies to each traffic
parcel in the network. For example, if there
are 20 nodes in a network, there are 190 two
way traffic items, each with a different demand
rate. If the smallest demand contributes only
1/1000 of the total calls in the network at any
time, then 1000 calls must be processed for
each one from this smallest parcel. A blocking of one per cent should be measured with
reasonable reliability on this smallest parcel,
and even if 1,000,000 calls are processed, only
(1/1000) (1/100) (1,000,000) or ten calls
will be blocked, and therefore contribute directly to the measurement on the smallest parcel. Since many networks must be tested
using different load levels, it is clear that a
primary requirement for a simulator which is
to be used in traffic network studies is that
it be fast. An improvement in speed of 5
II. SIMULATOR REQUIREMENTS
The type of problem to which this simulator
is directed is quite different from many of
the queuing and other problems which are
the primary application for most simulation
programs. Whereas in management simulation and tandenl queuing processes (job
shop simulations, etc.) problems are characterized by a fairly complex sequence of possible alternatives, the number of demands
simultaneously in process is ordinarily not so
great as to tax the capacity of the computer.
233
From the collection of the Computer History Museum (www.computerhistory.org)