On preserving the integrity of data bases
M. V. Wilkes
Computer Laboratory, University of Cambridge
It is not usually possible to reconstruct from original documents a large data base that has been in
existence for some time, but it must be possible to repair it after an error has occurred. This can
only be done if a copy of all information in the data base is kept in some safe place. The paper
discusses the use for this purpose of periodical dumps and of various forms of journal. The important
role played by the data base manager is emphasised.
(Received March 1972)
The subject of filing information in a computer is a large one
and one that is still very imperfectly understood. Some 8 or 9
years ago we saw the development of the first systems in which
multiple-access facilities were combined with central filing. This
was essentially a development in the scientific field and it led to
the delineation and understanding of one important—if
limited—segment of the general area of filing. This area I like
to refer to as that of user-support filing systems. These provide
support to the user in his capacity of user rather than in his
capacity of engineer, physicist, or what have you. They provide
him with filing space in which he can conveniently hold programs, quantities of test data, documentation, and so on.
Drake and Smith (1971) refer to user-support systems as
general purpose systems.
User-support systems derive from that designed by Corbato
for the CTSS at MIT and later examples are to be found in
MULTICS, in the Cambridge multiple-access system, in
GEORGE III, in TENEX, and in other systems. Descriptions
of some user-support filing systems will be found listed in the
bibliography at the end of the paper. A user-support system
caters primarily for character files, although it must also cater
for binary block files; the latter may be thought of as containing
binary information dumped directly from core and, from the
present point of view, present no points of great interest. The
character files are context free in the sense that the system will
treat a FORTRAN program, a love letter, or a death warrant
on the same terms. Their properties may be summarised as
follows:
1. They are sufficiently short for the simplest of serial scan
algorithms to be used for accessing.
2. Lockout and security control can be applied to a file as a
whole.
3. Modifications to the content of the file are made by creating
a copy with the changes incorporated, leaving the original
file as it is. The copy may be filed away under a new name,
in which case the original version remains under its old
name; alternatively, the copy may be filed under the original
name, in which case the original version is implicitly deleted.
In the CTSS and systems directly deriving from it the user is
expected himself to resolve any ambiguities resulting from
the re-use of names, and this does not in practice cause any
difficulty; within the system ambiguity is avoided by having
the date and time of creation associated in the file directory
with the name of the file. In other systems a version number
is automatically assigned by the system when a file is re-used.
4. Since the files rarely extend over many blocks on the disc,
no form of placement control (see below) is found to be
necessary.
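The copy-on-modification behaviour described in point 3 may be sketched in modern terms. The following Python fragment is purely illustrative; the class and method names are invented for the example and are not drawn from CTSS or its successors:

```python
# Illustrative sketch of property 3: every modification files a fresh
# copy, and re-use of a name is disambiguated by a version number
# assigned by the system, so the original version is never overwritten.

class Directory:
    def __init__(self):
        self.entries = {}          # name -> list of versions, oldest first

    def file_away(self, name, content):
        """File a copy under `name`; earlier versions remain on record."""
        self.entries.setdefault(name, []).append(content)

    def read(self, name, version=None):
        """Latest version by default; older versions stay retrievable."""
        versions = self.entries[name]
        return versions[-1] if version is None else versions[version - 1]

    def modify(self, name, change):
        """Create a changed copy and file it under the original name."""
        self.file_away(name, change(self.read(name)))

d = Directory()
d.file_away("prog.f", "PRINT 1")
d.modify("prog.f", lambda text: text + "\nEND")
assert d.read("prog.f") == "PRINT 1\nEND"        # new version current
assert d.read("prog.f", version=1) == "PRINT 1"  # original intact
```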
Quite elaborate security systems have been implemented for
preventing unauthorised access to files in a user-support filing
system, and this has presented no great problem since, as
pointed out above, it is sufficient that the protection should
Volume 15 Number 3
apply to an entire file and not to part of it. Similarly, requirements for integrity—that is, the safeguarding by the system of
information entrusted to it—are not onerous. It is sufficient if
the system safeguards, by making copies on to magnetic tape,
files that the user has designated as being in some way permanent, and if such copies are made within 15 or 20 minutes.
The inherent hardware reliability of modern disc-based filing
systems is such that the risk of occasionally losing a temporary
file, or a file that has not yet been copied, can be comfortably
accepted.
While the implementation of a user-support filing system is no
light task, one at any rate feels that the subject is tolerably well
understood. Few would make such a claim for large filing
systems intended for particular applications, especially business
applications. I shall refer to such filing systems as data bases,
a term which, for convenience, I take to exclude user-support
filing systems. Data bases tend to be large and can be very large
indeed. The standards required for integrity are very high,
almost frighteningly so.
In designing a data base one must have regard to the structure
inherent in the data and to the additional structure introduced
for the purpose of storage; in other words, one must consider
both data structures and storage structures, the latter reflecting
the accessing pattern to which the data will be subjected. At a
higher level, there is the question of the language by and through
which the data base is to be interrogated and updated. All these
subjects have been treated at some length by the CODASYL
Data Base Task Group (1971). They concern the owner of
the data as much as the computer expert and one feels that
workable if not optimum solutions will somehow be found
in individual cases. One cannot, however, sit so lightly to the
question of integrity. Unless the designer of a data base to be
used in an information system has an adequate understanding
of this important subject he is likely to create more problems
than he solves. I shall discuss integrity first from the point of
view of a user-support filing system and later from a more
general point of view.
Integrity in a user-support filing system
There is only one way whereby a high degree of integrity may
be achieved in any filing system and this is by keeping a copy of
the information to be safeguarded in a separate place, or better
still by keeping several copies in several separate places. A
simple application of this principle in the case of a disc-based
filing system is to dump the entire disc on to magnetic tape at
suitable intervals. If corruption of files is then suspected, the
entire disc can be reloaded. The effect, of course, is to put the
clock back to the time of the last dump and all updates made
since then must be repeated and all new files must be re-created.
From the operational point of view, therefore, periodic
dumping by itself provides only a partial solution to the
problem. Moreover, dumping is time consuming, taking several
hours even for a small filing system; if many of the files are
relatively inactive—as they normally are in the case of a user-support filing system—it is obviously wasteful to go on dumping
them at short intervals.
It is more efficient to use a system of incremental dumping.
The entire collection of files is dumped only rarely, but a copy
is made on magnetic tape of every new file, or of every new
version of a file, shortly after it is created. For a discussion
of a system of this type see Fraser (1969). Information is held
in the file directory about where the currently valid copy is to be
found, either on the primary dump tapes or on the incremental
dump tapes. The file directories themselves, together with files
relating to the administration of the filing system, are also
copied at frequent intervals on to the incremental dump tapes.
The incremental dump tapes can, of course, be re-used as soon
as the next complete dump has taken place.
If, as is usual in a user-support filing system, a number of
files are highly active at any particular time, and the rest
inactive, and if the interval between complete dumps is fairly
long, the incremental dump tapes will come to contain a good
deal of information that is no longer valid. In these circumstances, it may be desirable to re-use an incremental dump tape
before a complete new dump has taken place; naturally, any
information on the tape about to be used that still has value
must be re-dumped. This type of procedure is particularly useful if the primary dumping is done on the basis of so much per
day rather than altogether at the end of a longer interval.
Since the file directories contain information not only about
where a given file is to be found on the disc but also about where
copies are to be found in the backup system, it is possible
to design the file master in such a way that a copy is automatically called for if a file on disc should be found to be
corrupted.
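As a hypothetical illustration of the incremental scheme just described, the following Python sketch keeps, for each file, a note of whether the currently valid backup copy lies on the complete-dump "tape" or on the incremental one, so that a copy can be called for automatically if a file on disc is found to be corrupted. All names here are invented for the example:

```python
# Sketch of incremental dumping: a rare complete dump, frequent copying
# of files written since, and a directory noting where the currently
# valid backup copy of each file lives.

class Backup:
    def __init__(self):
        self.primary = {}       # complete-dump "tape"
        self.incremental = {}   # incremental "tape"
        self.location = {}      # name -> which tape holds the valid copy

    def complete_dump(self, disc):
        self.primary = dict(disc)
        self.incremental.clear()    # incremental tapes may now be re-used
        self.location = {name: "primary" for name in disc}

    def incremental_dump(self, name, content):
        self.incremental[name] = content
        self.location[name] = "incremental"

    def restore(self, name):
        """Fetch the currently valid copy, wherever it was dumped."""
        if self.location[name] == "primary":
            return self.primary[name]
        return self.incremental[name]

disc = {"ledger": "v1", "notes": "a"}
b = Backup()
b.complete_dump(disc)
disc["ledger"] = "v2"
b.incremental_dump("ledger", "v2")          # new version copied shortly after
assert b.restore("ledger") == "v2"          # corruption on disc is repairable
assert b.restore("notes") == "a"
```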
Recovery from system failure
There is a close connection between the subject of integrity
and that of establishing effective re-start procedures for use
when a failure of a system has occurred. In fact, the system
just described does provide such a capability. One simply
re-loads the operating system, which will find that all files are
missing and will proceed systematically to re-load them. This,
however, is a very drastic form of re-start since all work in
progress is lost and must be re-initiated. It is, on most occasions,
more drastic than is necessary, since the files on the disc, apart
perhaps from one or two being accessed when the failure
occurred, are in no way corrupted; what has happened is that
the system has lost the information that enables them to be
accessed. Some of this information is in the file directories and
is covered by the dumping system just described; other information is, however, in normal operation held only in core, and
this is lost when a failure occurs. Provision should therefore be
made for a series of checkpoints at which such information can
be recorded at a fixed place on the disc or drum for use if a
re-start should become necessary. A set of re-start procedures
graded according to their severity can then be established. Some
will be invoked automatically and amount from the user's point
of view to no more than a hiccup; others will involve the use of
judgement by the operating staff. The cost of storing re-start
information at frequent intervals is high, and the trade-off
between interruption of service and expenditure of system
resources on the establishment of checkpoints must be weighed
in the light of operational requirements.
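A graded set of restart procedures of this kind might be sketched as follows; the recovery levels and state names are invented for illustration, not taken from any particular system:

```python
# Sketch of checkpointed restart: in-core state is recorded at a fixed
# place at intervals, and a restart falls back through increasingly
# severe recovery levels.

import copy

checkpoint_area = None            # stands in for a fixed place on disc or drum

def take_checkpoint(core_state):
    global checkpoint_area
    checkpoint_area = copy.deepcopy(core_state)

def restart(core_lost=True, checkpoint_intact=True):
    """Return the least drastic recovery action that applies."""
    if not core_lost:
        return "hiccup"                        # automatic; a mere hiccup
    if checkpoint_intact and checkpoint_area is not None:
        return ("resume", checkpoint_area)     # roll back to last checkpoint
    return "reload_all_files"                  # most drastic: reload from dumps

take_checkpoint({"open_files": ["ledger"], "tick": 41})
assert restart(core_lost=False) == "hiccup"
assert restart() == ("resume", {"open_files": ["ledger"], "tick": 41})
assert restart(checkpoint_intact=False) == "reload_all_files"
```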
Integrity in data bases
The difficulty about discussing integrity in general terms is that
the problem is open-ended. It is possible to object to any
proposed solution by pointing to circumstances in which it
would fail. In specific cases, of course, the designer of the data
base can be provided with a specification to work to. Some
judgement is, however, needed in deciding to what extent
possible future developments should be taken into account
and what present costs should be incurred in providing for them.
All that can be done in a general discussion is to survey the
integrity problem and indicate some of the techniques that are
available for its solution. Many of these have already been
mentioned in discussing the special case of user-support filing
systems.
Damage to information stored in a data base can be caused by:
1. Unauthorised access.
2. Erroneous or incomplete update.
3. System malfunction.
1 is a matter of security rather than integrity and will not be
discussed here. Counter measures against 2 and 3 comprise:
1. Prevention (for example, catching errors by operator
procedure before damage has occurred).
2. Recovery, that is on-going action for self-preservation.
For the large data base that has been in existence for some time,
complete regeneration after error, that is, re-assembling the
information from original documents, is not usually possible;
it must, however, be possible to repair the data base after an
error has occurred.
Until recently, the storage medium for a data base was
magnetic tape. Since the use of magnetic tape imposes a
convenient subdivision into reels, and since a reel once written
can be used in a read-only manner, the problem of integrity can
be handled in a straightforward and effective way. When the
information contained on a reel is to be updated, that reel is
mounted on the computer with a switch set to read-only. The
updates have been punched on to cards or recorded on magnetic
tape. These updates are read into the computer along with
information from the original reel and the updated information
is written on to a fresh reel. The original reel can be stored
away with the updates for a new copy of the updated reel to be
prepared if anything happens to the original one. This is
sometimes described as the father/grandfather system. Additional security can be provided by keeping the tapes any distance
back into the past.
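The father/grandfather arrangement can be illustrated with a small Python sketch, with reels modelled as dictionaries; the function names are invented for the example:

```python
# Sketch of the father/grandfather system: each update run reads the
# current generation and writes a fresh reel; older generations are
# retained so that a lost reel can be regenerated from an ancestor plus
# the update batches that followed it.

def run_update(generations, updates):
    """Apply an update batch to the newest reel, keeping its ancestors."""
    newest = dict(generations[-1])          # reels are read-only once written
    newest.update(updates)
    generations.append(newest)

def regenerate(generations, update_batches, back):
    """Rebuild the newest reel from an ancestor `back` generations old."""
    state = dict(generations[-1 - back])
    for batch in update_batches[-back:]:
        state.update(batch)
    return state

reels = [{"acct1": 100}]                    # the grandfather
batches = [{"acct1": 150}, {"acct2": 75}]
for batch in batches:
    run_update(reels, batch)
assert reels[-1] == {"acct1": 150, "acct2": 75}
# If the newest reel is lost, rebuild it from the grandfather + updates.
assert regenerate(reels, batches, back=2) == reels[-1]
```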
The fact that the problem was comparatively simple with
magnetic tape makes us all the less prepared for dealing with
the serious problem that arises when random-access discs are
used as the primary storage medium and advantage is taken of
the possibilities that then exist for instantaneous updating of
the data base. The problem is complicated by the fact that the
need for rapid access with a minimum of searching is leading to
the use of highly complex data structures (as distinct from
storage structures) so that there is a problem of maintaining the
structural consistency of the data base as well as of maintaining
the accuracy of the stored data.
In some circumstances, it is possible to opt out of these
problems by maintaining an old-fashioned file on magnetic tape
in parallel with one on a disc. The former is regarded as the
primary depository for the information. At intervals, for
example, each morning, information from the magnetic tape is
copied on to the disc. During the day the information on the
disc is interrogated from keyboards as required and updates
made on-line by the operators. These updates are also put on to
a magnetic tape reserved for the purpose. During the night, the
magnetic tape file is updated with the updates that have accumulated during the day; the father/grandfather system is used for
integrity. Next morning, the disc is reloaded from the tapes and
so the operation proceeds.
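The daily cycle just described can be sketched as follows. This is an illustrative model, not an account of any particular installation:

```python
# Sketch of the tape-primary scheme: the disc is loaded from the master
# tape each morning, on-line updates go both to the disc and to an
# update tape, and the master is brought up to date overnight.

def morning_load(master_tape):
    return dict(master_tape)                # disc copy of the master

def daytime_update(disc, update_tape, key, value):
    disc[key] = value                       # instantaneous on-line update
    update_tape.append((key, value))        # also logged for the night run

def night_run(master_tape, update_tape):
    new_master = dict(master_tape)          # old reel kept as the father
    for key, value in update_tape:
        new_master[key] = value
    return new_master

master = {"stock": 10}
disc = morning_load(master)
tape = []
daytime_update(disc, tape, "stock", 7)
master = night_run(master, tape)
assert master == disc == {"stock": 7}       # consistent again by morning
```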
The system just described involves the frequent copying of the
whole file on to the disc regardless of the fact that some parts
are inactive. Since the programs used for updating the file on
the disc and the master file on the tape are distinct, there is
The Computer Journal
some possibility of inconsistencies arising; this perhaps is not a
serious matter since the inconsistencies are eliminated as soon
as the disc is reloaded. It would obviously be possible to develop
the system in a variety of ways in order to improve its efficiency
and extend its range of applicability. However, it is probably
better to approach the matter from the opposite point of view
and to regard the disc rather than the tapes as containing the
primary version of the data. This was the point of view adopted
above when discussing the integrity of user-support filing
systems. The magnetic tapes used for backup purposes do not
necessarily contain an image of the information as structured
on the disc. On tape the information will most likely be arranged
in a manner suited to the backup function, whereas on disc
it will be arranged with a view to convenient interrogation and
updating.
An attempt will now be made to classify and assess the
measures that can be taken to monitor operations being performed on the data base so that, if information is lost, either
the system can recover it automatically or the user can be
assisted to do so. These measures are:
(a) Verification of operations with immediate repetition if in
error.
(i) Checks for transient hardware failures (for example,
when writing to magnetic tape).
(ii) Checking of keyed information
(a) against internal consistency (this implies some
redundancy in the keyed information);
(b) against previously recorded information (for
example, a customer's account number against his
name and address);
(c) for reasonableness (for example, an order for a
quantity of a certain commodity may be either
unreasonably large or unreasonably small).
(b) Redundant (error correcting) coding of information in the
data base. For example, row and column sum checks may
be added to data before they are placed in the data base.
On read back, any error is automatically corrected by
hardware or software. The record may be re-written so as
to eliminate the error or, alternatively, the error may be
allowed to remain.
(c) Periodic dumping of the entire data base. If the data base
is very large, this may not be feasible. It is wasteful if large
parts of the data base have very low activity. A dump may
be either physical, in which case an exact image of the
recorded information is transferred to some other medium,
or it may be logical, in which case the dumped records are
arranged in an order determined by the internal structure
of the data. With one exception to be mentioned later,
logical dumps are to be preferred.
(d) The use of journal tapes on which information is dumped
continuously while the system is in operation. Journals
may be classified as follows:
(i) transaction journals
(a) a record of all key strokes made by keyboard
operators, editing being either non-existent or
restricted to the deletion from the journal of errors
that are immediately corrected;
(b) a condensed summary of the transaction recorded
immediately before the update is made.
(ii) record journals on to which copies of records are
dumped. These are
(a) before journals, in which records are dumped before
being updated;
(b) after journals, in which records are dumped after
being updated.
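Measure (b), the row and column sum check, can be made concrete. In the following hypothetical Python sketch a single corrupted value is located at the crossing of the failing row and the failing column and corrected on read back:

```python
# Sketch of measure (b): row and column sum checks are appended to a
# block of data before it enters the data base; on read back a single
# bad value is found where the failing row and column cross.

def add_checks(rows):
    checked = [row + [sum(row)] for row in rows]        # append row sums
    checked.append([sum(col) for col in zip(*checked)]) # append column sums
    return checked

def correct(checked):
    bad_r = [i for i, row in enumerate(checked[:-1]) if sum(row[:-1]) != row[-1]]
    bad_c = [j for j in range(len(checked[0]) - 1)
             if sum(row[j] for row in checked[:-1]) != checked[-1][j]]
    if bad_r and bad_c:                       # single error: fix in place
        i, j = bad_r[0], bad_c[0]
        checked[i][j] = checked[i][-1] - sum(checked[i][:-1]) + checked[i][j]
    return [row[:-1] for row in checked[:-1]] # data without the check sums

block = add_checks([[1, 2], [3, 4]])
block[0][1] = 9                               # corrupt one value
assert correct(block) == [[1, 2], [3, 4]]     # error located and removed
```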
Journal tapes may be kept in duplicate, or even in triplicate,
if this is considered necessary. It should be noted that the effect
of a system failure can be to cause an entry in a journal to be
incorrectly terminated. It is very important that the software
should be so designed that an incorrectly terminated entry—or
for that matter any corrupt entry—should not make the reading
of other records on the same tape impossible. The system for
recovering information from tapes should be so designed that a
complete set of good copies of records that are as recent as
possible should be automatically obtained from the journal
tapes, using duplicates where necessary.
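One way of meeting the requirement that a corrupt or incorrectly terminated entry should not spoil the rest of the tape is to frame each entry with its length and a checksum. The following sketch is an invented illustration of that idea, with a duplicate tape consulted for entries that fail the check:

```python
# Sketch of a journal format in which each entry carries its length and
# a checksum, so a truncated entry is detected and skipped rather than
# making the rest of the tape unreadable; a duplicate tape supplies good
# copies where necessary.

import json
import zlib

def write_entry(tape, record):
    body = json.dumps(record).encode()
    tape.append((len(body), zlib.crc32(body), body))

def read_entries(tape, duplicate=None):
    """Return the good entries, consulting the duplicate for bad ones."""
    out = []
    for i, (length, check, body) in enumerate(tape):
        if len(body) == length and zlib.crc32(body) == check:
            out.append(json.loads(body))
        elif duplicate is not None:           # bad entry: try the duplicate
            length, check, body = duplicate[i]
            if len(body) == length and zlib.crc32(body) == check:
                out.append(json.loads(body))
    return out

tape, dup = [], []
for rec in ({"id": 1, "v": "a"}, {"id": 2, "v": "b"}):
    write_entry(tape, rec)
    write_entry(dup, rec)
tape[1] = (tape[1][0], tape[1][1], tape[1][2][:3])  # a failure truncates entry 2
assert read_entries(tape, dup) == [{"id": 1, "v": "a"}, {"id": 2, "v": "b"}]
```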
(c), (d)(i), and (d)(ii) are partly overlapping and partly
complementary in their function. Whether all are required and
the degree of importance to be attached to each will depend on
the application for which the data base is intended.
In the case of very large data bases, complete dumps (c) will
be rare and may be non-existent. In this case, editing of the
journal tapes (d) is essential in order that outdated information
shall be purged from the system. This editing is an off-line
operation performed at intervals; out-of-date copies of records
are dropped (keeping the last two or three for additional
integrity) and the rest are arranged in logical order. Editing is
facilitated if at the time a record is written on to a journal tape
a separate record is made elsewhere in the system of its identifier. During the editing, new tapes are written and the old ones
can be preserved according to the usual father/grandfather
system.
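The editing operation might be sketched as follows. This is an illustrative fragment, with journal entries modelled as small dictionaries:

```python
# Sketch of off-line journal editing: out-of-date copies of each record
# are dropped (keeping the last few for additional integrity) and the
# survivors are written out in logical order.

def edit_journal(entries, keep=3):
    by_id = {}
    for entry in entries:                     # entries arrive in time order
        by_id.setdefault(entry["id"], []).append(entry)
    edited = []
    for rec_id in sorted(by_id):              # logical (identifier) order
        edited.extend(by_id[rec_id][-keep:])  # last `keep` copies survive
    return edited

journal = [{"id": 2, "v": n} for n in range(5)] + [{"id": 1, "v": 0}]
edited = edit_journal(journal, keep=2)
assert edited == [{"id": 1, "v": 0}, {"id": 2, "v": 3}, {"id": 2, "v": 4}]
```

The separate record of identifiers mentioned above would correspond here to knowing the `id` of each entry without having to parse the whole tape.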
The advantage of using a transaction journal in which key
strokes are recorded is that, if a system failure occurs before a
transaction has been completed, then after the restart the system
can prompt the operator by typing out the last few lines
received. Otherwise, she has to re-type the entire transaction
and confusion can sometimes occur as to whether the transaction has been completed or not. Since the repetition of a
transaction that has in fact been completed could in certain
circumstances give rise to an error in the data base, careful
system design and good operator training are necessary.
The use of a before or after journal can be dispensed with and
reliance placed on periodic dumps, together with a transaction
journal. In the event of a failure, the disc is re-loaded from the
latest dump tape and all transactions since the time of the
dump are repeated. Since, during this operation, the transactions come from magnetic tape instead of from operators'
keyboards, it will usually be found that the work of a whole day
can be repeated in a few minutes. This will not, however, be the
case if scientific style computation using much processor time is
involved, or if the system is under normal circumstances
limited by shortage of channel capacity. The routines used for
repetition are not necessarily the same as the routines used
during operation. In some applications, it is essential that
transactions should be repeated in the exact order in which they
were originally made, and it may be necessary to record in the
transaction journal additional information designed to enable
this requirement to be met.
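Recovery by replay, with a sequence number recorded in the transaction journal so that the original order can be enforced, might look as follows in outline (an invented illustration):

```python
# Sketch of recovery by replay: the disc is reloaded from the latest
# dump and the journalled transactions are re-applied; a sequence number
# recorded with each transaction restores the original order even if the
# entries are read back shuffled (e.g. from duplicate tapes).

def replay(dump, journal):
    state = dict(dump)
    for seq, key, value in sorted(journal):   # original order restored
        state[key] = value
    return state

dump = {"acct": 100}
journal = [(2, "acct", 80), (1, "acct", 120)]  # recovered out of order
assert replay(dump, journal) == {"acct": 80}   # sequence numbers fix it
```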
Placement control
An independent reason for using a periodic dump of the information in a data base is in order that control over the placement
of records on the disc may be secured. This is sometimes necessary in order to make it possible to access contiguous records in
a file more quickly than could be done if they were scattered at
random over the disc. The requirements of placement control
and efficient use of space on the disc are, of course, directly
opposed. A partial solution to the problem can be achieved by
periodically re-writing the disc from the dump tapes with due
regard to optimum placement. As updating proceeds, the
quality of the placement will suffer until it is eventually restored
when the data are once more re-written from the dump.
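The cycle of placement decay and restoration can be sketched as follows; the block numbers and record names are invented for the illustration:

```python
# Sketch of placement control: re-writing the disc from a logical dump
# lays the records of a file in consecutive blocks, so contiguous
# records can be read without long seeks; ordinary updating then
# scatters new records and gradually degrades the placement again.

def rewrite_from_dump(dump):
    """Assign consecutive block numbers to records in logical order."""
    return {key: block for block, key in enumerate(sorted(dump))}

def scatter(placement, key, free_block):
    placement[key] = free_block               # an update lands wherever free

layout = rewrite_from_dump({"r1": None, "r2": None, "r3": None})
assert [layout[k] for k in ("r1", "r2", "r3")] == [0, 1, 2]  # contiguous
scatter(layout, "r2", 97)                     # updating degrades placement
layout = rewrite_from_dump(layout)            # periodic rewrite restores it
assert [layout[k] for k in ("r1", "r2", "r3")] == [0, 1, 2]
```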
Data base manager
Naturally, the designer of a recovery system will endeavour
to arrange that, as far as possible, recovery from damage to
information is automatic or, if not completely automatic, can
be effected by the use of documented procedures available to
the operating staff. Such methods can necessarily take account
only of formal properties of the data, in particular of redundancy either in structure or in information. They can take no
account of the meaning of the data, nor can they use knowledge
about the sources from which it has been compiled. It is essential that a large data base should be in the charge of a data
base manager, who not only understands the way the information is stored but also what it means and how it has been assembled. If formal methods of recovery fail, the data base
manager can perhaps use his knowledge to save the situation.
In less desperate situations, he may be able to use his knowledge to shortcut the cumbersome procedures that would
otherwise be necessary.
Although the system takes in its stride errors detected under
(a) and (b) in the last Section but one, statistics about their
occurrence should be passed to the data base manager in
order that, acting through the maintenance engineers or otherwise, he may have their causes removed as far as that is possible. He may, in addition, wish to copy information from
some storage unit, such as a disc pack, to another similar
unit if he suspects that, owing to equipment deterioration, the
information is at risk. It is in circumstances of this kind that
the making of a physical copy rather than a logical copy can
be justified.
The measures that have been outlined enable recovery to be made if a record should prove unreadable, or if some system
failure should occur in the middle of a transaction and be
detected. There are obvious difficulties, however, if, as the
result of a fault, the console is temporarily out of communication with the system. In such circumstances, the operator has
little way of knowing what has happened and an appeal to a
master log, available to the data base manager, may have to be
made. The manager, if he is wise, will look carefully at what
occurred during periods when consoles were out of
communication.
Another circumstance in which an investigation on the part of
the data base manager is called for is when a record that ha&
been locked out preparatory to being updated remains locked
out for an unreasonable time. The procedure in updating is
first to prepare the update as completely as possible without
accessing the record that is to be updated. A lockout flag is then
set to protect the record from being accessed by any other user.
The information in the record is then read, any necessary
checking done, and the update performed. The updated version
of the record is then inserted in the file and finally the lockout
is cancelled. If this cancellation takes place, then it must be assumed that the update has been correctly made. There should
be a time-out check to give warning if the lockout is not cancelled after a certain time. In these circumstances, anything may
have happened and an investigation on the part of the data base manager is called for.
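The lockout procedure and its time-out check might be sketched as follows. This is an invented illustration; in a real system the flag would be recorded with the record itself rather than in core:

```python
# Sketch of the update procedure just described: prepare the update, set
# a lockout flag, perform the update, cancel the lockout; a time-out
# check reports any record whose lockout has been held too long.

import time

locks = {}                                    # record id -> time lock was set

def update(store, rec_id, new_value):
    if rec_id in locks:
        raise RuntimeError("record locked by another user")
    locks[rec_id] = time.monotonic()          # lockout set before access
    try:
        store[rec_id] = new_value             # read, check and write go here
    finally:
        del locks[rec_id]                     # cancellation: update complete

def overdue_locks(timeout=30.0):
    """Records to report to the data base manager for investigation."""
    now = time.monotonic()
    return [r for r, t in locks.items() if now - t > timeout]

store = {"acct": 100}
update(store, "acct", 120)
assert store["acct"] == 120 and not locks     # lockout duly cancelled
locks["stuck"] = time.monotonic() - 60        # simulate a failed updater
assert overdue_locks() == ["stuck"]           # flagged for investigation
```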
Bibliography
CORBATO, F. J., et al. (1963). The compatible time-sharing system; a programmer's guide, Cambridge, Mass.: M.I.T. Press.
DALEY, R. D., and NEUMANN, P. G. (1965). A general-purpose file system for secondary storage, AFIPS Conference Proc., Vol. 27, pp. 213-229.
ICL (1967). Operating systems George 3 and 4. Technical Publication 4125, ICL, London.
MYER, THEODORE H., and BARNABY, JOHN R. (1971). TENEX executive language; manual for users, Cambridge, Mass.: Bolt, Beranek and
Newman, Inc.
WILKES, M. V. (1972). Time-sharing computer systems, Second Edition, Chapter 8, London: Macdonald; New York: American Elsevier.
References
CODASYL DATA BASE TASK GROUP (1971). April 71 report. Obtainable from BCS or ACM.
DRAKE, R. W., and SMITH, J. L. (1971). Some techniques for file recovery, Australian Computer Journal, Vol. 3, pp. 162-170.
FRASER, A. G. (1969). Integrity of a mass storage filing system, The Computer Journal, Vol. 12, pp. 1-5.
Book review
Advanced Techniques for Strategic Planning, by Ernest C. Miller,
1971; 174 pages. (Bailey Bros. & Swinfen Ltd., £4.15)
This volume (American Management Association Research Study
No. 104) presents the results from a survey carried out during
1968-9 of American companies or institutions that had reported
noteworthy developments in the area of strategic planning. The data
for this report came from (i) 40 responses to a questionnaire;
(ii) 41 interviews in 22 companies (eight of which also responded
under (i)); (iii) a literature survey.
The results are presented in terms of the date of the first use by the
firms of particular techniques (e.g. correlation and regression, simulation, risk analysis, etc.) in this area and |the current level of use
of the technique (numbers of firms). In addition the structures of
particular strategic planning models and examples of the usage of
particular techniques are presented. Much of the text is taken up by
verbatim quotes from managers of corporate or strategic planning
groups.
However, the technical level of the presentation of the material is
at that of the interested layman (or manager) rather than the
practitioner. The book would be very useful as a source of good
background material for an appreciation of the potential benefits,
pitfalls and jargon in the corporate or strategic planning area;
especially for managers on whom such plans are likely to impinge.
It will not assist the corporate manager in his planning methods and
procedures, but it could help to convince his colleagues and superiors of its relevance and usefulness. Nevertheless, such benefits
should be weighed against the high cost (£4.15) of a relatively lightweight volume.
J. R. EATON (London)