Common file organization techniques compared

by NED CHAPIN
InfoSci Inc.
Menlo Park, California
INTRODUCTION
In order to make a comparison of file organization
techniques, concurrence is needed on terminology. To
that end, this introduction offers some definition of
terms. Unfortunately, many of these terms do not
have universally accepted definitions. A general definition of terms can be found elsewhere. 6
In offering definitions of terms, this paper does not
suggest that those who give different definitions are
wrong. On the contrary, the differences in definition
that exist reflect in part imperfect communication
among people in the field, and in part real differences
in the concerns of the people in the field. Hopefully,
papers such as this one will help improve communication. But the differences in concern will continue
to exist, and to spawn both new differences and new
terms.
As used in this paper, the term "file organization"
is not synonymous with file structure, data structure,
data base, data organization, or data management. A
file organization is viewed as a way of putting together the components of a file. "File structure" is
viewed as synonymous with file organization, but is not
used in order to help distinguish it from "data structure." A "data structure" is a more general term than
file organization, since a file is viewed as but one general
organization of data. Some people use the term data
structure to refer only to vertical relationships among
data. "Data organization" is viewed as synonymous
with data structure. A data base is viewed here as a
group of files or alternatively as a controlled aggregation of data which can be regarded as organized into
files.
The term "data management" is used with a variety
of meanings in the field. Sometimes it is narrowly used
to refer to movement and formatting of data to and
from internal storage, and the supporting software.
Sometimes in a broader sense it also refers to the
identification of data and procedures to maintain the
integrity and security of the data. At other times, the
term is used also to refer to file organization. In a very
broad sense, it refers also to the maintenance of files,
the handling of inquiries, and the preparation of
reports.
These definitions raise questions about the definition
of the vertical and horizontal organization of data.
Looking first vertically, this paper views a file as an
arbitrary but usually homogeneous but not exhaustive
aggregation of records. Records are collections of data
all of which share some attribute in common, usually
the name of a thing the data are about. For example, a
record of employee job attendance might contain data
about number of days worked, number of days absent,
the usual work station, the parking lot location, the
home address, the home telephone, the usual days of
the week absent, and the like. When these data are
drawn together and grouped in terms of the identification of the employee (such as by employee identification number), the individual groupings thus formed
are here viewed as records. The components of the
record are data items (usually fields), as diagrammed in
Figure 1.
The definition of a record implies no specific ordering
of the data items. The definition of the file implies
no ordering of the records within the file. By ordering
is meant the application of a collating sequence or
pattern template to data items at a uniform level in
the vertical hierarchy of data. When records are ordered,
From the collection of the Computer History Museum (www.computerhistory.org)
Fall Joint Computer Conference, 1969
Figure 1-Condensed diagram of the vertical hierarchy
of data (labels: OPERATIONS, BASE, FILE, RECORD, ITEM)
the data items used for the ordering are referred
to here collectively as the key. For example, the records
in the attendance file just cited might be ordered using
an ascending numeric collating sequence with the
employee identification numbers serving as the key.
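The ordering just described can be sketched briefly. The following is a minimal illustration, assuming Python and hypothetical field names and values (none of which appear in the paper): the attendance records are placed in an ascending numeric collating sequence with the employee identification number serving as the key.

```python
# Hypothetical attendance records; field names and values are illustrative only.
records = [
    {"employee_id": 3071, "days_worked": 21, "days_absent": 1},
    {"employee_id": 1187, "days_worked": 19, "days_absent": 3},
    {"employee_id": 2250, "days_worked": 22, "days_absent": 0},
]


def order_by_key(recs, key_fields):
    """Return the records in ascending collating sequence on the key.

    The key is the collection of data items named in key_fields.
    The input list is left untouched, since a file implies no ordering.
    """
    return sorted(recs, key=lambda r: tuple(r[f] for f in key_fields))


ordered = order_by_key(records, ["employee_id"])
print([r["employee_id"] for r in ordered])
```

A key made of several data items would simply list more than one name in `key_fields`.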
The horizontal organizations of data reflected in
this paper require definitions of table, tree, string,
and list. A "table" is a series of pairs of data items,
which are the argument and the function. The table by
its form permits the table user to establish by inference a relationship between a particular argument and
its associated function. A telephone book and a statement of tax rates are examples of tables.
Three important tables for the comparison of file
organizations are indexes, directories, and tables of
contents. An "index" has the arguments in a specific
order but the function, which may consist of multiple
data items, may be in any order. By contrast, a "table of
contents" cites the functions in a specific order but
leaves the arguments in any order. "Directories" may
have the arguments or functions or both ordered in
any manner. For this reason, the term directory serves
as a general term covering in practice both indexes
and tables of contents.
A "tree" can be used to represent vertical relationships among data. 4 A tree may also be used for horizontal organization of data, as shown in Figure 2. For
example, data about a firm's operations might be broken
into divisions such as production, sales, engineering,
and the like. These divisions in turn can be broken
into subdivisions. For example, sales might be broken
into territories, and production into the product categories. Engineering might incorporate new product categories currently not in production, as well as those in
production. These categories can in turn be broken
still further. Thus in production they might be broken
by production equipment or in terms of a bill of material. In sales they might be broken down into products
or into salesmen. In summary, the term tree gets its
name from the graphic representation of the processes
of subdividing.
Figure 2-A partial representation of a tree as a
horizontal organization for a file
By contrast, a string organization is viewed as a
series of things, one after the other, where the elements composing the series are similar. Examples of
strings are series of characters, of digits, of names,
or of numbers.
A "list" is viewed as a series of records or data
items each accompanied by one or more pointers to
other elements in the series. These pointers are here
termed "links" and are themselves data items. Some
people prefer the term "chain" to refer to a list.
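A list in this sense can be sketched as follows. This is a minimal illustration, assuming Python; the names `Element`, `storage`, and `HEAD` are hypothetical. Each element carries one link, itself a data item, giving the position of the next element, so that the logical sequence need not match the physical placement.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Element:
    data: str
    link: Optional[int] = None  # position of the next element; None marks the end


# Physical placement (list positions) differs from the logical sequence,
# which is recovered only by following the links.
storage = [
    Element("Baker", link=2),
    Element("Able", link=0),
    Element("Charlie", link=None),
]
HEAD = 1  # position of the first element in the logical sequence


def traverse(store, head):
    """Follow the links from the head and return the data in logical order."""
    out = []
    pos = head
    while pos is not None:
        out.append(store[pos].data)
        pos = store[pos].link
    return out


print(traverse(storage, HEAD))
```

Giving each element a second link (for example, to its predecessor) would yield a double-linked list in the same manner.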
Irrespective of vertical or horizontal aspects of the
file organization, a file may exhibit a simple or a
compound organization. A "simple" organization has
only one major structural pattern. A "compound"
organization has two or more distinct and different
structural patterns which taken together comprise the
file organization.
Classifications
A number of people in the field have proposed
classifications of file organization. A brief review of
some of these will serve as a basis for selecting one
for use in making comparison here.
A team headed by Anthony J. Dowkont has offered
an extensive basis for comparison. 9 In summary, this
basis is: the data definition provided, the facilities
for file creation and maintenance, the retrieval mechanism, the processing procedures, the output characteristics, and the operating environment. This basis of
classification is concerned not with file organization
alone, but also with data management in the broad
sense. Looking at the matter of file creation and maintenance, and of data definition, the classification bases
suggested are performance oriented, rather than
structure or pattern oriented.
Richard G. Canning has suggested classifying file
organization into two general classes based upon type
and upon structure. 3 Within type he proposes recognizing sequential, indexed, and chained files. Within
structure, he proposes recognizing linear, hierarchical,
and involute files. These classifications are more
structure and pattern oriented than those just cited,
but they lack a consistently applied, obvious basis.
Minker and Sable in reviewing data management
systems suggested a basis of classification as user
language, file structure, system processing capability,
and user interface. 13 This again shares the same general
user basis cited previously. Looking more particularly
at the basis identified as file structure, Minker and
Sable suggested classifying on the basis of the implementing storage media (such as tape or disk) and the
variety of field and record lengths permitted. Among
those that permit greater variety and which are disk
based, Minker and Sable suggested a classification of
indexed, tree-ordered, and linked, or chained. These
suggestions share many of the features of those of
Canning as noted earlier.
David Lefkovitz has suggested a classification of
file organization based upon a combination of the hardware and software components utilized to implement
the file. 12 These he viewed from a functional point of
view, particularly with regard to the retrieval process.
Thus a file organization may be classified on the basis
of which software-hardware components it utilizes and
in what way. For example, does it use a directory,
does it use a randomizing or a tree approach? If it
uses a tree approach, does it use a fixed length key or
a variable length key? And so on. Such a basis of classification results in a very large number of possible
classes. In a sense, each non-identical existing file
organization becomes a separate classification.
Ned Chapin has suggested a classification scheme
based fundamentally upon the way of indicating association at a given vertical level within a file. 4 At
one extreme he placed the attributed organization
which provides explicit identification with the data
at some given level. This obviates the necessity for
providing a means of association below this level.
At another extreme, he placed the linked or list organization, where each data element at a given level incorporates a specific indication of association. Two
varieties of this he singled out for particular attention:
the complex ring which is a complex list that forms
closed loops, and the muble or multiple double-linked
list which provides two or more links. At another
extreme, he placed the hierarchical organization, which
provides a tree-like association on a horizontal basis.
Finally, at another extreme, he placed the positional
organization. This provides association in terms of
placement in relation to other data, at a given vertical
level. Thus, field A is always known to precede field
B, and field B is always known to precede field C,
and all three fields are always present in a record.
Hence, values from the third field position have a
known identification and association.
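The contrast between the positional and the attributed organizations can be sketched as follows. This is a minimal illustration, assuming Python; the field widths, the `name=value;` notation, and the sample values are hypothetical and are not taken from the paper. In the positional record, placement alone identifies fields A, B, and C. In the attributed record, each value carries its own explicit identification, so the fields may appear in any order.

```python
# Positional organization: fields A, B, and C are always present and always
# occupy the same character positions, so placement alone identifies them.
LAYOUT = [("A", 0, 4), ("B", 4, 10), ("C", 10, 12)]  # (name, start, end); assumed widths


def read_positional(record: str) -> dict:
    """Identify each value by its position in the record."""
    return {name: record[start:end].strip() for name, start, end in LAYOUT}


# Attributed organization: each value carries explicit identification,
# so order and presence of fields are free to vary.
def read_attributed(record: str) -> dict:
    """Identify each value by the name stored with it."""
    pairs = (item.split("=", 1) for item in record.split(";") if item)
    return {name: value for name, value in pairs}


positional = "1187SMITH 07"
attributed = "C=07;A=1187;B=SMITH"

print(read_positional(positional))
print(read_attributed(attributed))
```

The positional form stores no identifiers and so uses less storage, at the price of requiring every field to be present in its place.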
The Chapin classification utilizes an important
feature of the way people think about data, as its
basis for classification. As such, it avoids the mixed
base problems inherent in the other classification
schemes it reviewed, without the gaps or holes characteristic of the other systems.
This classification approach lends itself to a graphic
representation, as diagrammed in Figure 3. The diagram
uses time as the left to right distance, but not in strict
scale units. 7 The vertices or nodes are the identity of
data. The solid arcs or lines are the sequence of the
active (pointed to) data. Vertically, the diagram has
two parts, an upper or demand (D) part, and a lower
or supply (S) part. A perfect match of the file organization
to the demands upon it occurs when the data (indicated by broken lines) demanded and supplied occur
at the same time.
Figure 3-A graphic representation of associations
showing the ideal pattern for a file organization
Characteristics
The point is well taken that users by and large are
unconcerned with the classification of a particular file
organization technique. They are concerned with the
functional characteristics of the file organization technique in action. Some of these of course are hardware and software dependent. But within those bounds,
they are determined largely by the file organization
itself. Among the common characteristics are the speed
and basis of access, the use of storage capacity, the
ease of maintenance (for insertions, alterations, and
deletions), and the extent of software support available.
The speed and basis of access is fundamentally
affected by the association provided in the file organization because access uses the association for its realization. The hardware, the software, and the association together set the limits. The basis of access may
be by attribute, by value, or by property as has been
pointed out elsewhere. 4
The use of storage capacity reflects two aspects
of file organization, each of which in turn rests upon
the basis of association. One aspect is that compound
organizations commonly use more storage capacity than
do simple ones. Another is that hardware and software
factors also affect the use of storage, given the file
organization.
The procedures, the convenience, and the time required
for maintenance operations, such as insertion,
deletion, and alteration of data in a file, depend obviously upon the hardware and software used. But they
also depend importantly upon the association provided
by the file organization, since maintenance involves
access, but is more than access. Common maintenance practice is not always a corollary of the features
of the file organization.
The extent of software support available is a very
significant determinant of the degree to which people
are willing to use a file organization. Even if it be
theoretically attractive, a file organization unsupported
by software is in practice ignored in favor of anything
that is supported by debugged software.
Common techniques
Techniques covered
The most common file organization techniques are
those proselytized and supported with software by the
computer vendors. These are normally part of the
operating system and are accessible to anyone who
programs in the symbolic language for a particular
computer. Some of them are available to users of
higher level languages such as COBOL and FORTRAN.
Less commonly used are the file organization techniques
supported by software available from the computer vendors but not provided normally as part of
the operating systems. These usually take the form of
"packages" capable of a variety of functions.
A third category are the file organization techniques
available in the software market from independent
suppliers of software. None of these are as common as
those available in the first category, but some are as
common as some in the second category.
For contrast, this paper looks also at the extensions
to COBOL proposed to CODASYL in the area of file
organization techniques.
Vendor supported techniques
Historically the oldest, the most popular, and by
far the most common, is the strict sequential file
organization. The strict sequential is a positional
organized file commonly consisting of ordered records
which are themselves positionally organized. 4,10 As such,
its use of storage is the most economical of all. It is a
simple, not a compound organization.
The strict sequential enjoys a rapid next-record
access by attribute, but a slow random access by attribute, as diagrammed in Figure 4. That is, as long as the
sequence in which access is demanded conforms to the
sequence in which the file was sorted, access is rapid
unless the number of records to be passed over is large.
Unfortunately, access is often desired on some other
key. This requires first a reordering of the file which
involves a time-consuming sorting operation, or an
exhaustive search of the file. Even with this sorting
operation, access by value and by property involve
search.
Figure 4-Diagram of the strict sequential file
organization (panels: CONSECUTIVE, RANDOM)
Maintenance for sequential files is logically straightforward, but slow. It requires typically a complete
passage of the file with a complete copying of it. Each
record must be read and written in order to do maintenance on the file. Because of this, insertions and deletions
are easily accomplished. Alterations are also simple as long as the typical fixed length restrictions on field
sizes are observed. Where variable length fields are permitted, alterations become a little more complex but are
still logically straightforward.
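The complete passage and copying of the file can be sketched as follows. This is a minimal illustration, assuming Python, in-memory lists standing in for the old and new master files, and a hypothetical transaction format of `(key, action, data)`; the paper prescribes none of these. Every record of the old master is read and written, and insertions, deletions, and alterations are applied in passing.

```python
def update_master(old_master, transactions):
    """Copy a key-ordered master file to a new master in one pass.

    old_master:   list of (key, data) pairs in ascending key order.
    transactions: list of (key, action, data) in ascending key order, where
                  action is "insert", "delete", or "alter" (at most one
                  transaction per key in this sketch).
    """
    new_master = []
    t = 0
    for key, data in old_master:
        # Insertions with keys below the current record are written first.
        while t < len(transactions) and transactions[t][0] < key:
            tkey, action, tdata = transactions[t]
            if action == "insert":
                new_master.append((tkey, tdata))
            t += 1
        # A transaction matching the current key deletes or alters the record.
        if t < len(transactions) and transactions[t][0] == key:
            _, action, tdata = transactions[t]
            t += 1
            if action == "delete":
                continue  # the record is simply not copied
            if action == "alter":
                data = tdata
        new_master.append((key, data))
    # Insertions with keys above every existing record go at the end.
    for tkey, action, tdata in transactions[t:]:
        if action == "insert":
            new_master.append((tkey, tdata))
    return new_master


old = [(10, "a"), (20, "b"), (30, "c")]
changes = [(5, "insert", "z"), (20, "delete", None), (30, "alter", "C"), (40, "insert", "d")]
print(update_master(old, changes))
```

The cost is proportional to the size of the whole file regardless of how few transactions there are, which is why the procedure is simple but slow.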
Software support for sequential organization is
extremely good. Its popularity is attested by Table I.
It is the most widely supported of all the file organization
techniques.
The indexed sequential is a compound file organization technique, historically younger than the strict
sequential. 4,10 This too is a positional organization.
The main file is a strict sequential file. With it is a
sequential organized index using the same key. Sometimes indexes to indexes are provided depending upon
the size of the main file and the storage space available.
Random access for the indexed sequential file is
superior in speed to the strict sequential because the
index search requires less time than a search of the
main file. From the index the location of the desired
record can be found and the record then accessed without search. But for a next-record access, the same
procedure usually is required, which slows such access
(see Figure 5). Access by attribute, by value, and by
property follow the same pattern as for the sequential
organized file.
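Random access through the index can be sketched as follows. This is a minimal illustration, assuming Python, a hypothetical block size, and an index holding only the highest key of each block of the main file. The short scan within the located block is a simplification of this sketch; the paper describes the record as being accessed without search once the index has been consulted.

```python
import bisect

BLOCK_SIZE = 3  # records per block of the main file; illustrative value

# Main file: a strict sequential file ordered on the key.
main_file = [(k, f"record-{k}") for k in (3, 8, 12, 17, 21, 25, 30, 34)]
blocks = [main_file[i:i + BLOCK_SIZE] for i in range(0, len(main_file), BLOCK_SIZE)]

# Index: a sequential file on the same key, one entry (the highest key) per block.
index_keys = [block[-1][0] for block in blocks]


def random_access(key):
    """Search the small index, then go to the one block that can hold the key."""
    b = bisect.bisect_left(index_keys, key)
    if b == len(blocks):
        return None  # key is higher than any key in the file
    for k, data in blocks[b]:
        if k == key:
            return data
    return None


print(random_access(21))
print(random_access(13))
```

The index here has three entries against eight records in the main file, which is the source of the speed advantage for random access.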
The use of storage space for the indexed sequential
is larger because of the additional space required for
the indexes. An added inefficiency in the use of storage
space is the typical requirement for overflow areas to
permit insertions in the main files. This overflow may
amount to as much as a third to a half more space for
the main file, although typically this can be held to
about one-tenth more space.
TABLE I-Summary of the file organization techniques
supported by the eight largest computer vendors
Strict        Indexed       Direct or
Sequential    Sequential    Random
IBM           IBM           IBM
RCA           RCA           RCA
CDC           CDC
UNIVAC        UNIVAC        UNIVAC
Burroughs
NCR           NCR           NCR
GE                          GE
Honeywell     Honeywell     Honeywell
Figure 5-Diagram of the indexed sequential file
organization
The maintenance of the indexed sequential file
differs considerably from that for strict sequential.
Maintenance does not require rewriting the entire file;
only those specific records in the file that are altered
are rewritten back into their same places. This saving
in maintenance time can be more than offset by other
factors.
An insertion in an indexed sequential file requires
that adjustments be made to the index and to the
main file. The inserted record typically must be written
in the main area displacing a record into the overflow
area. Links are inserted if more than one such overflow
occurs in a given area. By contrast, deletion is more
simple. The record to be deleted is simply marked for
deletion but is not physically deleted from the file nor
from the indexes. Periodically, the entire file is rewritten in order to eliminate the accumulated deletions,
to pull the insertions into the main sequence, to reapportion the overflow areas, and to clean the index.
In sum, whether or not the maintenance time for an
indexed sequential file exceeds that for a strict sequential file depends upon the volume of insertions and
alterations. For low to moderate volume, the strict
sequential is usually slower over-all. An indexed sequential suffers from the same single-key limitations
as the strict sequential.
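The maintenance pattern described above can be sketched for a single area of the main file. This is a minimal illustration, assuming Python; the class name, the fixed capacity, and the rule of displacing the highest-key record are assumptions of the sketch, and the index adjustments and overflow links are omitted for brevity.

```python
class IndexedSequentialArea:
    """One main (prime) area of fixed capacity with its overflow area."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.prime = []     # [key, data, deleted_flag], kept in key order
        self.overflow = []  # records displaced from the prime area

    def insert(self, key, data):
        """Write the record in the main area, displacing one into overflow if full."""
        self.prime.append([key, data, False])
        self.prime.sort(key=lambda r: r[0])
        if len(self.prime) > self.capacity:
            self.overflow.append(self.prime.pop())  # highest key is displaced

    def delete(self, key):
        """Mark the record for deletion; it is not physically removed."""
        for rec in self.prime + self.overflow:
            if rec[0] == key:
                rec[2] = True

    def reorganize(self):
        """Periodic rewrite: drop marked records, pull overflow into sequence.

        A full rewrite would also reapportion records across areas and
        rebuild the index; this sketch handles one area only.
        """
        live = [r for r in self.prime + self.overflow if not r[2]]
        live.sort(key=lambda r: r[0])
        self.prime, self.overflow = live, []


area = IndexedSequentialArea(capacity=3)
for k in (10, 20, 30):
    area.insert(k, f"record-{k}")
area.insert(15, "record-15")  # displaces key 30 into the overflow area
area.delete(20)               # marked only; still occupies space
area.reorganize()
print([r[0] for r in area.prime])
```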
The software support for indexed sequential generally
is good. The software operates more slowly per random
access than for strict sequential because of the decreased buffering possible.
The direct or random file organization is also a
positional organization. 4,10 It is like strict sequential
in that it is simple, not compound. The direct or
random file organization is a variation of the strict
sequential. It uses a transformation of the key. Whatever the key would be is passed through an algorithm
to calculate a position in storage. Because of the
possible occurrence of multiple records having the same
key, or of closely spaced keys, provision is made in the
algorithm to handle some conditions. One is to place or
find a record when its transformed key is the same as
another transformed key. This can be handled by links
and overflow areas, or by shifting records to maintain
a sequence in order to restrict the search domain. Another is to set up the initial spacing of records in the
file to permit room for the later insertions. The amount
of storage space allocated for this purpose is usually
not less than that allowed for overflow areas in the
case of an indexed sequential file.
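The key transformation and the handling of duplicate transformed keys can be sketched as follows. This is a minimal illustration, assuming Python, a division-remainder transformation, and links into an overflow area; the paper does not name a particular algorithm, and the slot count and record values are hypothetical.

```python
NUM_SLOTS = 7  # number of home positions; illustrative value


def transform(key: int) -> int:
    """Division-remainder transformation of the key (an assumed choice)."""
    return key % NUM_SLOTS


slots = [None] * NUM_SLOTS  # home positions, each holding [key, data, link]
overflow = []               # overflow area, each entry [key, data, link]


def store(key, data):
    """Place a record at its transformed position, or chain it into overflow."""
    pos = transform(key)
    if slots[pos] is None:
        slots[pos] = [key, data, None]
        return
    rec = slots[pos]
    while rec[2] is not None:  # follow the links to the end of the chain
        rec = overflow[rec[2]]
    overflow.append([key, data, None])
    rec[2] = len(overflow) - 1


def fetch(key):
    """Go directly to the transformed position, then follow links if needed."""
    rec = slots[transform(key)]
    while rec is not None:
        if rec[0] == key:
            return rec[1]
        rec = overflow[rec[2]] if rec[2] is not None else None
    return None


store(10, "record-10")
store(17, "record-17")  # same transformed key as 10, so it goes to overflow
store(24, "record-24")  # same again; linked after 17
print(fetch(24))
```

No index is consulted, but records with adjacent keys land in unrelated positions, so a next-record access costs as much as any other access.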
The random access provided by the direct or random
file organization is slightly faster than that for an
indexed sequential organized file, since no index reference is needed. But for next-record access, it is slower
because the transformed key order is not the same as
the ordinary key. Hence, every access is a random
access, as diagrammed in Figure 6. The access basis is
the same as noted earlier for the positional organized
files. Also, only one key can be used, as noted earlier.
The use of storage space for the direct or random
file organization is about as efficient as that for the
indexed sequential, and is less efficient than for the
strict sequential. This is because of the voids that
must be left in the spacing of the records to accommodate inserts, and the use of overflow areas. No space
is needed for an index.
The maintenance for a direct or random organized
file resembles the indexed sequential more than the
strict sequential. This may also extend to alterations
and deletions. For insertions, no index need be adjusted. If the record to be inserted must go into a place
that is already occupied (that is, the transformed key
is a duplicate of an already existing transformed key)
then provision must be made for moving records or for
use of overflow area and links.
The software support for the direct or random file
organization is less troublesome and less burdensome
than that for the indexed sequential. Also, less supporting software is needed to accomplish the job. The
user does not even need to rely upon manufacturer-provided
software but can make do by providing his
own algorithm for key transformations and by using a
strict sequential file organization. Many vendors have
been supplying this software for a longer period of
time than they have supplied indexed sequential software.
Another type of common file organization technique
available from the computer vendors and incorporated
as a normal part of their operating systems is the partitioned file organization. 4,10 This is a hierarchical file
organization. But it is normally not accessible to the
programmer even though it is utilized routinely by the
operating system for its own functions such as program
libraries. Typically, the hierarchical file organizations
are compound because they require directories and
sometimes even hierarchies of directories to maintain
association and provide access. These directories usually include one that is of the table of contents type.
Access by attribute is the most common. The speed
of access depends mostly upon the size and number of
directories used (see Figure 7). Maintenance is usually
done by making deletions by altering only the directories. Insertions are entered in the directories and the
new data placed in any available space. Alterations
are often treated as combined deletions and insertions.
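The directory-based maintenance just described can be sketched as follows. This is a minimal illustration, assuming Python and a single directory; the class and method names are hypothetical. Deletion alters only the directory, insertion places the new data in available space and records it in the directory, and alteration is a deletion followed by an insertion.

```python
class PartitionedFile:
    """A data area plus one directory mapping member names to positions."""

    def __init__(self):
        self.space = []      # data area; new data goes into any available space
        self.directory = {}  # member name -> position in the data area

    def insert(self, name, data):
        self.space.append(data)
        self.directory[name] = len(self.space) - 1

    def delete(self, name):
        del self.directory[name]  # only the directory changes; data stays as dead space

    def alter(self, name, data):
        self.delete(name)
        self.insert(name, data)

    def fetch(self, name):
        return self.space[self.directory[name]]

    def dead_space(self):
        """Count of data-area entries no longer reachable from the directory."""
        return len(self.space) - len(self.directory)

    def rewrite(self):
        """Rewrite the file and recreate the directory to reclaim dead space."""
        live = [(n, self.space[p]) for n, p in self.directory.items()]
        self.space, self.directory = [], {}
        for n, d in live:
            self.insert(n, d)


pf = PartitionedFile()
pf.insert("SORT", "version 1")
pf.insert("MERGE", "version 1")
pf.alter("SORT", "version 2")
pf.delete("MERGE")
print(pf.fetch("SORT"), pf.dead_space())
```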
The software support is usually inadequate to enable
the use of the partitioned file organization by programmers in their own programs. The organization
becomes increasingly uneconomical of storage space as
deletions accumulate. To eliminate them requires rewriting the entire file and recreating the directories,
an operation equivalent to that needed for the indexed
sequential file organization.
Figure 6-Diagram of the direct or random file
organization
Figure 7-Diagram of the partitioned file organization
TABLE II-Summary of selected vendor
augmentations
Strict        Indexed       Direct or     Other
Sequential    Sequential    Random        Techniques
GIS           GIS           GIS           IDS (ring)
FORTE         UNIMS         FORTE         FORTE (list)
MARS          FORTE
              MARS
              UL/I
Vendor augmentation
Computer vendors over the years have made a
number of augmentations and elaborations of the implementation of file organizations just compared. The
best known of these are listed in Table II.
One of these has been IBM's GIS (Generalized Information System). 2 This elaboration provides a number
of features that add greatly to the power and convenience available to the user. Underlying it are the
two positional organized file organizations, the strict
sequential and direct or random. The use of indexed
sequential is optional depending upon the scope of the
GIS implemented. GIS is a free-standing package, not
an extension of COBOL, but GIS can be used with
COBOL.
The access for the GIS is slower because of the
additional software. But that software yields greater
convenience of user access by reducing programming
effort to file and retrieve data. The use of storage
space is but little more extensive, ignoring the space
for the additional software. Maintenance follows the
usual procedures but is more convenient from the user's
point of view because he does not need to write all of
the programs for doing it. The software support is
comprehensive.
The Integrated Data Store (IDS) is available from
General Electric, 1 and is similar to the General Motors
Associative Programming Language. IDS offers a complex ring file organization where the number of links
possible at the record level in the file may be made as
extensive as the user desires. In practice, it is used
most often as an extension of COBOL.
Access by attribute beyond the first access is slightly
facilitated because of the links. Access by property is
much facilitated as a practical matter because of the
links which provide quick reference to the records with
related contents. The use of storage space is greater
than for a strict sequential organization because of the
space occupied by the links. Since in practice, directories are used to locate or serve as pointers to rings, a
little additional storage is also needed for them.
Although insertions, deletions, and alterations are
handled by the software, the procedures are considerably more complicated for IDS than for the positional
organized file. This is because of the need to adjust
the links whenever insertions and deletions are made. If
the insertion cannot be made physically nearby, then
subsequent accesses following the links are slowed.
This maintenance problem compounds as the number
of links to be adjusted increases. The software support
available for IDS is comprehensive and has been extensively tested in use.
The UNIMS (Univac Information Management
System) is available from the Univac Division of
Sperry Rand. It offers a modified indexed sequential
file organization in a package of software, in a similar
manner to that noted earlier for GIS and IDS. It too
can serve as an extension to COBOL.
The access and maintenance for UNIMS are similar
in character to that noted earlier for indexed sequential
files. But to the user the procedures appear easier
because of the assistance provided by the software.
UNIMS uses little more storage space than the indexed sequential noted earlier. The software support
is comprehensive.
The UL/I (User Language/I) from RCA offers a
more convenient language for the handling of access,
maintenance, and reports from files than the usual
programming languages. As such it has similar objectives to GIS noted earlier. UL/I uses a modified
indexed sequential file organization in a way that gives
the appearance of a hierarchical file organization. 11 The
characteristics of this software system were still fluid
at the time of this paper.
FORTE is available from Burroughs Corporation.
It provides unordered (sequential), indexed sequential,
random, and a combination of indexed sequential and
random. Further, it provides list file organization in
two forms, a two-cell list, and a usual double-linked
list (but not a multiple-linked list or ring structure). 4,14
As such it represents an improvement over the FORGE
software which Burroughs has offered. FORTE is designed for use as an extension of COBOL, not as a
free standing software package for file organization
and use.
Another relatively new entry in the field is MARS
from CDC. In giving the user the appearance of a
range of file organizations, it like UL/I relies primarily
upon the strict sequential and indexed sequential file
organizations. Like GIS noted earlier, MARS is a
generalized system providing access, maintenance, and
report capabilities. It does however provide the capability of building an inverted list organization. Its
characteristics were still fluid at the time of preparing
this paper.
Non-vendor augmentation
A number of implementations of file organization
alternatives are available in the software market from
sources other than computer vendors. With IBM's
Summer 1969 announced changes in software policy,
the number of alternatives can be expected to grow still
larger. Only a brief selection is covered here, based
primarily on age and popularity (see Table III).
Two distinct classes of offering are available in the
software market. One uses and elaborates upon the
vendor provided file organization and software support.
Another replaces the vendor provided file organization
and hence also provides its own software. A brief look
at each of the groups will round out the comparison,
since these offerings may soon become more popular
in the market.
In the first group, some of the best known are the
MARK-IV, the FILE EX, SCORE-II, and INQUIRE.
The first two of these use the vendor-provided strict
sequential and indexed sequential file organization techniques. To these they add an important software superstructure for report preparation, data retrieval, and file
maintenance. As such they provide an alternative to
the user for preparing his own programs to accomplish
similar ends, and to the use of the vendor-provided
software.
The SCORE-II also uses the vendor-supported sequential and an indexed sequential file organization. In
addition it also provides tree structure, not directly
but based upon a combination of the strict sequential
and indexed sequential. This adds flexibility to the
package of report preparation, retrieval, and maintenance facilities.
TABLE III-Summary of selected non-vendor
augmentations
Strict        Indexed       Direct or     Other
Sequential    Sequential    Random
MARK-IV       MARK-IV                     DM-5 (hierarchy)
FILE EX       FILE EX                     SCORE-II (tree)
SCORE-II      SCORE-II                    INQUIRE (list)
Differing in its choice of the underlying file organization is INQUIRE. This utilizes the indexed sequential and the direct or random file organizations. But
these are not directly accessible to the programmer.
Rather, INQUIRE combines them to form a modification <;>f an inverted list file structure. * This gives
added power to the file maintenance, retrieval, and
report capabilities of INQUIRE. Access by attribute
and by property is facilitated by the inverted list
organization, but maintenance requires adjustment 'Of
the lists as additional operations. 4
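The trade-off just described, easy access by attribute at the price of extra list adjustments during maintenance, can be illustrated with a minimal sketch. It is not INQUIRE's implementation; the class name, method names, and the use of in-memory Python structures in place of direct-access storage are all assumptions made for illustration.

```python
from collections import defaultdict


class InvertedListFile:
    """Toy inverted list: an unordered main file plus per-key pointer lists.

    Hypothetical illustration only; not the layout of any real package.
    """

    def __init__(self):
        self.main = []                  # main file: records in arrival order
        self.lists = defaultdict(list)  # (attribute, value) -> record addresses

    def add(self, record):
        """Store a record, then adjust every list it belongs to."""
        address = len(self.main)
        self.main.append(record)
        for key in record.items():
            self.lists[key].append(address)
        return address

    def delete(self, address):
        """Remove a record; each of its lists needs an extra adjustment."""
        record = self.main[address]
        if record is None:
            return
        for key in record.items():
            self.lists[key].remove(address)
        self.main[address] = None       # leave a hole so addresses stay stable

    def find(self, **criteria):
        """Return the records that match every attribute=value pair given."""
        address_sets = [set(self.lists.get(key, ())) for key in criteria.items()]
        hits = set.intersection(*address_sets) if address_sets else set()
        return [self.main[a] for a in sorted(hits)]


f = InvertedListFile()
f.add({"dept": "sales", "city": "Oakland"})
f.add({"dept": "sales", "city": "Fresno"})
f.add({"dept": "audit", "city": "Oakland"})
print(f.find(dept="sales", city="Oakland"))
```

Retrieval touches only the pointer lists until the final fetch from the main file, while every addition or deletion must visit one list per attribute of the record, which is the additional maintenance work noted above.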
In the second group, the oldest and most publicized
entry is the DM-5 (Data Manager-5), which has been
described in the literature of the field. 8 DM-5, like the
others, includes the software for retrieval, maintenance,
and report preparation. DM-5 utilizes a hierarchical
file organization of a compound form. Tables are used
at several levels. Both random and next-record access
are handled by use of the tables, and are of about equal
speed for access by attribute. Since the records are
not ordered by a key, but many keys can be used in the
construction of the tables, the single key restriction of
the positional file organization is avoided, with a result
similar to that for the inverted list file organization.
In summary, the non-vendor offerings in the software market typically combine into a single package
both file organization and convenient aids to using it.
The offerings thus far do not attempt to replace the
file organizations supported by the computer vendors.
* The inverted list was developed about 1964 under the leadership of Dr. Jack Minker as a modification of the inverted file.
The inverted file organization was in use in the information
retrieval field in the years 1957-1958. The inverted file is a positional file organization with an ordering determined by multiple
keys. Records in the file reoccur as many times as they may have
keys, which need not be the same from record to record. By
contrast, an inverted list is a list file organization of a compound
form. The main portion of the file need not be and usually is not
in a list form. The key portion of the file is organized as a set of
lists consisting of pointers for each key to records in the main
file. Since, as a practical matter, the links are unnecessary, common practice is to elide them. The result is conceptually equivalent to an inverted file with all records replaced by surrogates
(a common practice now), and with the records drawn into a
subfile of their own with no redundancy. (The inverted list can
also be viewed as resulting from a consolidation of the links in
one direction from a muble chain or multilist file. 4,12) In net
effect in their modern forms, and as a practical matter, an inverted list differs from an inverted file primarily in emphasis and
manner of implementation.
COBOL extensions
The Data Base Task Group proposed last year to
the CODASYL COBOL Committee an extension of
COBOL to incorporate provisions for the complex ring
file organization. 6 Although the discussion devotes considerable attention to the other file organization techniques, the proposal is for the inclusion of only one of
them, the complex ring. In substance, this is very
similar to the IDS noted earlier. The discussion included with the proposal indicates that the ring file organization can be used to simulate or serve as other file
organizations, such as sequential, random, hierarchical
or tree, and inverted file. Although not presented in
the discussion, it can also be used for muble chains
or a multilist file organization.
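As a rough illustration of what a ring with multiple memberships involves, the following sketch links records into named rings, each closed back onto an owner record. It is a minimal sketch under stated assumptions, not the Task Group's proposal or the IDS design: the class, the function names, and the ring names are all hypothetical.

```python
class Record:
    """A record that may participate in several named rings (hypothetical)."""

    def __init__(self, data):
        self.data = data
        self.next = {}  # ring name -> next record in that ring


def start_ring(owner, ring):
    """An empty ring is an owner record that points back to itself."""
    owner.next[ring] = owner


def insert_after(position, member, ring):
    """Link `member` into `ring` immediately after `position`."""
    member.next[ring] = position.next[ring]
    position.next[ring] = member


def members(owner, ring):
    """Follow the links from the owner until the ring closes on the owner."""
    found = []
    current = owner.next[ring]
    while current is not owner:
        found.append(current.data)
        current = current.next[ring]
    return found


dept = Record("Dept 10")
skill = Record("Skill: COBOL")
start_ring(dept, "staff")
start_ring(skill, "holders")

adams, baker = Record("Adams"), Record("Baker")
insert_after(dept, adams, "staff")
insert_after(adams, baker, "staff")
insert_after(skill, baker, "holders")   # Baker belongs to two rings at once

print(members(dept, "staff"))
print(members(skill, "holders"))
```

Walking a single ring in link order behaves like sequential access, and an owner with member rings behaves like one level of a hierarchy, which is the sense in which the ring can stand in for other organizations. Every insertion or removal, however, must rewrite links in each ring the record belongs to.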
One of the major objectives of the Data Base Task
Group was to work toward keeping the description of
data stored with the data itself. This is in effect an
attempt to delay binding time. Since delayed binding
time in general improves the flexibility and power of
the resources available to the programmer, the objective is commendable. Providing linkage among data
can be a definite step in this direction. The question
to be argued is whether or not the ring file organization
is the best choice of means for accomplishing this
objective as well as serving as a worthwhile extension
of COBOL.
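The notion of keeping the description of the data with the data, and so delaying binding time, can be made concrete with a small sketch. The header format (comma-separated `name:width` pairs on the first line) and both function names are assumptions invented for this illustration; nothing here is taken from the Task Group proposal.

```python
def write_file(fields, rows):
    """Return file text: a description line followed by fixed-width records.

    `fields` is a list of (name, width) pairs; this layout is hypothetical.
    """
    header = ",".join(f"{name}:{width}" for name, width in fields)
    body = [
        "".join(str(row[name])[:width].ljust(width) for name, width in fields)
        for row in rows
    ]
    return "\n".join([header] + body)


def read_file(text):
    """Bind field names and widths at read time from the stored description."""
    header, *lines = text.split("\n")
    fields = []
    for item in header.split(","):
        name, width = item.split(":")
        fields.append((name, int(width)))
    records = []
    for line in lines:
        record, start = {}, 0
        for name, width in fields:
            record[name] = line[start:start + width].strip()
            start += width
        records.append(record)
    return records


rows = [{"name": "Adams", "dept": 10}]
narrow = write_file([("name", 8), ("dept", 4)], rows)
wide = write_file([("dept", 6), ("name", 12)], rows)
print(read_file(narrow))
print(read_file(wide))
```

The reading routine contains no field names or widths of its own, so the same program reads both layouts. A program with the record layout compiled in would have to be changed and recompiled when the layout changed, which is the loss of flexibility that early binding entails.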
From the comparisons presented, it can be argued
that replacing a ring file organization by a frankly
compound file organization sans links, would gain more
for COBOL. Examples of candidate file organizations
are the inverted list and the hierarchical. Access for
both is faster and more powerful; maintenance for
both is simpler.
CONCLUSION
Automatic computers during the middle and late
1950's had, by present-day standards, relatively slow
execution times and great restrictions upon the availability of both internal and external storage. The trend
has been toward increasing the availability of larger
and larger amounts of storage capacity, and toward
faster and faster operating speeds.
These changing computer capabilities suggest the
desirability of seriously rethinking the historic preference for positional organized files. This was certainly an
appropriate choice of file organization technique, when
storage capacity was extremely limited and operating
speed was slow. It required the least storage space and
the least direct overhead within the program at the
time of file use. The positional organized file entails a
very heavy cost of additional operating time in order
to reorder (sort) the file. It also involves the time to
rewrite the file periodically as a part of the maintenance of the file, depending for its extent upon the form
of the positional file organization.
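The rewriting cost of a positional file can be seen in a small sketch that keeps records in key order and counts how many existing records must move to admit a new one. The function name and the use of an in-memory list to stand for the file are assumptions for illustration only.

```python
import bisect


def insert_positional(file_records, record):
    """Insert into a key-ordered file; return how many records had to move."""
    position = bisect.bisect_right(file_records, record)
    file_records.insert(position, record)
    # Every record that now follows the new one was shifted down one place.
    return len(file_records) - 1 - position


master = [(10, "Adams"), (30, "Clark")]
print(insert_positional(master, (20, "Baker")))
print(insert_positional(master, (40, "Davis")))
print(master)
```

A record added at the end moves nothing, but one added near the front displaces nearly the whole file, which on sequential media means rewriting the file.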
Now that computers have much more extensive external and internal storage capacity and operate more
rapidly, it appears appropriate to reappraise our continued reliance upon positional file organization techniques. Let us consider briefly the alternatives. The
attributed file organization is still too expensive of
storage space and of machine time for serious attention
in pure form. The list file organizations in general
suffer from costly maintenance. The exception is the
inverted list. The hierarchical file organizations appear
attractive, but like the inverted list, are in practice
compound file organizations.
It is significant that these latter two file organization
techniques are generally not available to computer
users because the supporting software is not generally
available. The software exists, but the form of most
puts it beyond the reach or scope of operations for
most computer users. But this gap is narrower now
than it was. Some vendors such as CDC and Burroughs
have started to move to provide a wider range of file
organization techniques. Independent software firms
are starting to offer a wider variety of alternatives.
But a gap still exists.
REFERENCES
1 C W BACHMAN
Integrated data store
DPMA Quarterly Vol 1 No 2 Jan 1965 10-30
2 J H BRYANT P SEMPLE
GIS and file management
Proc 21st Natl ACM Conf 1966 Thompson Book Co
Washington D C 97-107
3 R G CANNING
Data management: file organization
EDP Analyzer Vol 5 No 12 Dec 1967 14 pages
4 N CHAPIN
A comparison of file organization techniques
Proc 24th ACM Natl Conf 1969 ACM New York 273-283
5 N CHAPIN
Data structures
Automatic Computers N Y Van Nostrand Reinhold Co
in press
6 Data Base Task Group
COBOL extensions to handle data bases
SIGPLAN Notices Vol 3 No 5 April 1968 1-45
7 M E D'IMPERIO
Data structures and their representation in storage
Annual Review of Automatic Programming Vol 5 Oxford
1969 Pergamon Press 1-75
8 P J DIXON S JEROME
DM-1--a generalized data management system
Proc SJCC Vol 30 1967 185-198
9 A J DOWKART et al
A methodology for comparison of generalized data management systems
CFSTI No AD-811-682 March 1967 287 pages
10 IBM CORP
Introduction to IBM System/360 direct access storage
devices and organization methods
IBM Corp 1968 White Plains N Y 70 pages
11 W I LANDAUER
The balanced tree and its utilization in information retrieval
IEEE Trans on Electronic Computers Vol 12 No 6 Dec
1963 863-871
12 D LEFKOVITZ
File structures for on-line systems
Spartan Books 1969 Washington D C 215 pages
13 J MINKER J SABLE
File organization and data management
Annual Review of Information and Technology 1967
John Wiley and Sons Inc N Y 123-160
14 N S PRYWES H J GRAY
Outline for a multilist organized system
ACM Natl Meeting 1959

Fall Joint Computer Conference, 1969
From the collection of the Computer History Museum (www.computerhistory.org)
