Computer Data Processing of Medical Diagnoses in Pathology

DEREK ENLANDER, M.D.
Department of Clinical Pathology and Laboratory Medicine, University of California,
San Francisco, California 94110
ABSTRACT
Enlander, Derek: Computer data processing of medical diagnoses in
pathology. Am J Clin Pathol 63: 538-544, 1975. Modes of insertion of
pathology diagnoses into a computer data storage and retrieval system are
reviewed. The conversion of free-flowing diagnostic sentences into internal
code is considered, and the advantages of coding are discussed from two
aspects: (a) to minimize storage, and (b) to help alleviate difficulties in
retrieval of synonymous terminology. Methods of manually pre-coding
diagnoses into Systemized Nomenclature of Pathology (SNOP) code are
discussed. Data encoding produces a fixed format record which provides
significant economy in data handling.
The potential use of a real-time* visual display unit in data gathering and
automatic coding is presented. (Key words: Coding; Computer; Pathology;
Standard Nomenclature of Pathology.)
Received August 2, 1974; received revised manuscript October 21, 1974; accepted for publication
October 21, 1974.
This work was performed at Stanford University Medical Center, Stanford, California, and at Kaiser
Foundation Research Institute, Oakland, California.
Address reprint requests to Dr. Enlander, University of California Clinical Laboratories, Building 100,
Room 279, San Francisco General Hospital, San Francisco, California 94110.
* Certain terms in this article are "computerese"; it is difficult to avoid using these terms when
describing certain computer processes. A glossary of the terms used is appended at the end of the article.

Physicians rarely express the same diagnosis in exactly the same manner.
Although a consistent syntax of diagnoses has been proposed by Ledley, 4 in most
cases physicians show individuality in their choice of diagnostic terms. The
processing of medical data from free-flowing diagnostic sentences entails problems
that are similar to those encountered in a computer analysis of semantic structure.
Natural language may be translated into predicate calculus formulas that can
be computed by certain algorithms which represent interrelationships among the
various components of a sentence.
The coding of natural sentences has
particular benefits for use in medicine.
The diagnosis can be expressed in a fixed
format, and the coding obviates the problem of the retrieval of synonyms, since
synonyms are designated by the same
code. Because such coding permits several
lines of identification and diagnostic data
to be condensed into a single line, it allows
for economy in terms of both computer
storage and data search.
Arnold Pratt and Milos Pacak 5 developed a semantic method for the automatic processing of medical English that
is related to the Katz-Fodor (K-F) semantic theory. This theory is based on a
dictionary, and a system of rules that
defines the grammatical relationship
among words and phrases of a sentence.
The lexicon base of the Pratt/Pacak system is the Systemized Nomenclature of
Pathology (SNOP), 1 a special-purpose thesaurus created by pathologists to assist
in the organization and retrieval of medical data.
FIG. 1. Numeric hierarchy of SNOP code allows accurate correlation of both general and precise
diagnostic entries. [Diagram: 1st digit, T2--- respiratory system; 2nd digit, T26-- bronchial system;
3rd digit, T264- right lower bronchus; 4th digit, T2645 right lower post. seg. bronchus.]
In SNOP coding, a conceptual semantic unit of the diagnosis is listed in one of
four categories: topography (T), the anatomic site of the disease; morphology
(M), the visible or microscopic change as a result of the disease; etiology (E), the
causative agent (trauma, bacteria, virus, drugs, etc.); and function (F), the signs and
symptoms associated with the disease.
Each of these categories is formed in a four-digit numeric hierarchy. For example,
in the topography (T) category, the first digit defines the gross system, e.g., T2xxx
represents the respiratory tract. The addition of the second digit further defines
the location of the disease, e.g., T26xx represents the bronchial system. The
third digit specifies a particular bronchus, e.g., T264x represents the right lower lobe
bronchus. The fourth digit defines the precise location of the disease, e.g., T2645
represents the right lower lobe, posterior segmental bronchus (Fig. 1).
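The four-digit hierarchy just described can be pictured as a chain of successively longer prefixes. The following Python sketch is an illustration only and is not part of the original system; the function names are hypothetical, and the labels are those quoted in the text for the topography example.

```python
# Labels quoted from the topography example in the text.
RESPIRATORY_EXAMPLE = {
    "T2xxx": "respiratory tract",
    "T26xx": "bronchial system",
    "T264x": "right lower lobe bronchus",
    "T2645": "right lower lobe, posterior segmental bronchus",
}


def snop_levels(code: str) -> list[str]:
    """Return the four hierarchical levels of a SNOP code, general to precise."""
    if len(code) != 5 or code[0] not in "TMEF" or not code[1:].isdigit():
        raise ValueError(f"not a four-digit SNOP code: {code!r}")
    axis, digits = code[0], code[1:]
    # Pad each prefix with 'x' so that every level keeps the same width.
    return [axis + digits[:n] + "x" * (4 - n) for n in range(1, 5)]


def is_covered_by(code: str, general: str) -> bool:
    """True if `general` (e.g. 'T26xx') is an ancestor of, or equal to, `code`."""
    return general in snop_levels(code)


for level in snop_levels("T2645"):
    print(level, "=", RESPIRATORY_EXAMPLE[level])
```

Because every general entry is a prefix of the precise entries beneath it, a search at any level reduces to a comparison of leading digits.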
Diagnoses can be encoded into each of
the remaining three SNOP categories according to a similar hierarchical structure,
although most diagnoses can be coded
adequately on the basis of two entities: the
location or T code, and the morphology
of the lesion or M code.
Lamson and Dimsdale 3 developed an alternative mode of transforming a diagnostic
sentence into code. In their system, every word in a sentence is transformed
into a numeric code. This internal code is unique to the Lamson/Dimsdale system,
and does not relate to other coding systems.
Both the Pratt/Pacak and the Lamson/
Dimsdale modes of approach require
massive computer resources to accomplish
the transformation of a diagnostic sentence. Since such extensive resources are
not available to most users, Enlander and
Durbridge 2 developed a more economic
mode of approach.
The Enlander/Durbridge system establishes a basic rule that each diagnostic
sentence must contain only one diagnosis.
A computer program searches for certain
pre-established key words in the diagnostic sentence according to a hierarchical
structure that is based on the numeric
hierarchy of the four-digit SNOP code.
An example of the key word search,
representing part of the search in the
coding for malignant disease, is shown in
Figure 2.
This encoding procedure formats a
diagnosis within the ten alpha-numeric
digits of the (M) and (T) SNOP codes. For
example, bronchogenic carcinoma of the
left mainstem bronchus is coded as
T2650M8013. A less specific diagnosis of
carcinoma arising from the left bronchus
is coded as T2600M8003. Since both
diagnoses have the same initial code, both
will be retrieved by a two-digit search.
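The effect of search depth on the two codes quoted above can be checked with a short Python sketch. The helper names are hypothetical; the only assumption about the format is the one stated in the text, a five-character T code followed by a five-character M code.

```python
def split_code(entry: str) -> tuple[str, str]:
    """Split a ten-character entry such as 'T2650M8013' into its T and M parts."""
    if len(entry) != 10 or entry[0] != "T" or entry[5] != "M":
        raise ValueError(f"expected a code of the form TddddMdddd: {entry!r}")
    return entry[:5], entry[5:]


def same_at_depth(a: str, b: str, digits: int) -> bool:
    """Compare two entries on the first `digits` digits of both the T and M codes."""
    t_a, m_a = split_code(a)
    t_b, m_b = split_code(b)
    width = 1 + digits  # the category letter plus the leading digits
    return t_a[:width] == t_b[:width] and m_a[:width] == m_b[:width]


specific = "T2650M8013"  # bronchogenic carcinoma, left mainstem bronchus
general = "T2600M8003"   # carcinoma arising from the left bronchus

print(same_at_depth(specific, general, 2))  # compares T26 with T26 and M80 with M80
print(same_at_depth(specific, general, 4))  # compares the full codes
```

At two digits the entries agree, so both are retrieved; at four digits they differ, which is what separates the precise diagnosis from the general one.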
FIG. 2. Part of the flow chart of the key word search in automatic SNOP encoding of a diagnosis.
[Flow chart: successive tests for the word fragments -OMA, -CARC-, ADENO-, INFILT-, CYST-, and MUCIN
lead to M codes for carcinoma, carcinoma in situ, adenocarcinoma, adenoma, and the cystadenomas;
a "no" at the first test passes to the next section of the search.]
The patient identification information inserted into the data base includes the
patient's name, age, and hospital number, the pathology department accession
number, the clinician's name, and the coded diagnosis. In an average entry,
diagnostic coding can reduce these data from as many as ten lines to a single line
(Fig. 3).
FRED BLOGGS 47 YR 220175 8F502 DR.R.SMITH T8073M2400 T9600M1130 T6300M1520 T2800M4174
[Identifiers: name, age, hospital number, pathology number, clinician. Diagnoses: four SNOP code pairs.]
FIG. 3. Example of a typical SNOP code, multi-diagnoses entry.
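A single-line entry of the kind shown in Figure 3 lends itself to simple mechanical parsing. The Python sketch below is an illustration under stated assumptions: the sample line is the Figure 3 entry with the spaces introduced by print reproduction closed up, and the function name and the regular expression are hypothetical rather than part of the original system.

```python
import re

# A coded diagnosis is a T code and an M code, each of four digits, run together.
CODE_PAIR = re.compile(r"\bT(\d{4})M(\d{4})\b")


def parse_entry(line: str) -> tuple[str, list[tuple[str, str]]]:
    """Split a one-line entry into its identifier text and its (T, M) code pairs."""
    pairs = [("T" + t, "M" + m) for t, m in CODE_PAIR.findall(line)]
    first = CODE_PAIR.search(line)
    # Everything before the first coded diagnosis is treated as identifiers.
    identifiers = line[: first.start()].strip() if first else line.strip()
    return identifiers, pairs


entry = ("FRED BLOGGS 47 YR 220175 8F502 DR.R.SMITH "
         "T8073M2400 T9600M1130 T6300M1520 T2800M4174")
identifiers, diagnoses = parse_entry(entry)
print(identifiers)
print(diagnoses)
```

The fixed width of each code pair is what makes the parse unambiguous: no delimiter beyond a space is needed between diagnoses.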
The coding in the data base is searched periodically to provide an index table
which lists the consecutive SNOP codes against their data base address. A search
for a particular SNOP code initially will derive the address or addresses in the
data base from this consecutive matrix.
The diagnosis then can be cross-searched against other diagnoses or patient
identifiers, as necessary (Fig. 4).
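One way to picture such an index table is as a sorted list of (code, address) pairs that is rebuilt from the case file and then searched by leading digits. The Python sketch below is a modern illustration, not the original implementation; the in-memory list standing in for the data base, the use of list positions as addresses, and the function names are all assumptions.

```python
from bisect import bisect_left


def build_index(case_file: list[list[str]]) -> list[tuple[str, int]]:
    """List every SNOP code in consecutive order against its address in the case file."""
    return sorted(
        (code, address)
        for address, codes in enumerate(case_file)
        for code in codes
    )


def find_addresses(index: list[tuple[str, int]], prefix: str) -> list[int]:
    """Return the addresses of all entries whose code begins with `prefix`."""
    start = bisect_left(index, (prefix,))  # first entry not smaller than the prefix
    addresses = []
    for code, address in index[start:]:
        if not code.startswith(prefix):
            break  # codes are consecutive, so no later entry can match
        addresses.append(address)
    return addresses


# Hypothetical case file: each address holds the coded diagnoses of one entry.
case_file = [
    ["T9600M1130"],
    ["T2650M8013"],
    ["T2600M8003", "T6300M1520"],
]
index = build_index(case_file)
print(find_addresses(index, "T26"))    # two-digit topography search
print(find_addresses(index, "T2650"))  # four-digit topography search
```

Because the index is kept in code order, a search touches only the matching run of entries instead of every record; a search on the M code would need a second index keyed on that part of the entry.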
When this mode was applied to 500
diagnostic sentences, 2 the automated key
word search encoded 75% of the sentences. Of the remaining 25%, 20% were
unencoded, and 5% were wrongly encoded. However, all was not lost in the
wrongly-encoded sentences. Because the
encoding and retrieval procedures used
the same encoding routines, errors in
SNOP encoding of the initial diagnosis
were duplicated on the retrieval code, and
the search found a match due to the
consistency of the error.
As an alternative to the automatic encoding of the diagnostic sentence by the
computer, the person making the diagnosis, or a secretary trained to the task,
could code the sentence manually. After
the diagnostic code is obtained, it can be
entered into the data base with the patient's identification. In theory, then, the
retrieval benefits that are obtainable with
automatic encoding routines also will be
obtained with manual encoding. However, the manual coding of diagnoses has
certain drawbacks. The same diagnosis
may be encoded differently by different
coders, depending upon the depth of
their interest in the process.
FIG. 4. Formation of an index matrix from SNOP code to make data search more efficient.
[Diagram labels: SNOP code case insertion; matrix formation; index matrix; case file; search input
SNOP code; output.]
In a survey of manual SNOP encoding by pathology residents at Stanford University
Medical Center, 2 a four-digit search of diagnoses resulted in 76% successful retrieval (Fig. 5).
Of the data submitted, 11% were retrievable on a two-digit search, and 13% of the
data were not retrievable. Further breakdown of the nonretrievable 13% showed
3.5% miscoded, 5% uncoded, and 4.5% ambiguous in two-digit search.
FIG. 5. Results of manual coding by pathology residents. [Chart values: correctly coded, 76%;
ambiguous in 4 digits, 11%; ambiguous in 2 digits, 4.5%; uncoded, 5%; miscoded, 3.5%.]
No matter which mode of coding is used, the difficulty of entering diagnostic
data is always present. Since entering diagnoses by keyboard or keypunch is
tiresome and time-consuming, we considered that entering data through a cathode
ray tube (CRT) terminal might be less tedious for the operator.
In a pilot study at the Medical Methods Research Department, Kaiser Foundation
Research Institute, in Oakland, California, we used a Saunders alpha-numeric
display terminal equipped with a light pen attachment. This terminal was part of a
Saunders Clini-Call experimental system 6 programmed in FOPS. The Central
Processing Unit (CPU) was a Honeywell 516 minicomputer, with a Honeywell 416 as
an I/O controller. When the light pen was pressed against a selected part of the
screen, a signal was created, causing the program to branch to the next appropriate
display (Fig. 6). In this way, when a diagnosis was inserted, the program
branched to select the appropriate displays and, ultimately, the diagnosis was
encoded into SNOP code. The position of the light pen created a branch decision
indicated in the program, and also constituted a locus of reference in a look-up
table. This table translated each of 1,024 points of the CRT "graphic" into SNOP
code. The code was displayed and printed on demand, after insertion of the diagnosis.
FIG. 6. Steps in entry of diagnosis, necrosis in early myocardial infarction, left ventricle, posterior aspect,
as seen on computer video terminal. [Panels: A, list of body systems headed "Pathology examination:
the following systems are abnormal"; B, descriptive choices for the pericardial sac, the serosal surface
of the heart, and the coronary arteries; C, alphanumeric outline of the heart; D, cross-section of the
right and left ventricles; E, the entered diagnosis with its SNOP code.]
FIG. 7. Steps in the entry of the diagnosis "Arteriosclerosis." [Panels: A, coronary arteries appeared
sclerosed or thrombosed; B, alphanumeric outline of the coronary arteries; C, degree of occlusion
(< 25%, 25 to 50%, 51 to 75%, 76 to 99%, total); D, the entered diagnosis with its SNOP code.]
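The look-up step can be sketched in Python as a table indexed by screen position. Everything specific here is an assumption made for illustration: the article gives only the total of 1,024 points, so the 32 by 32 layout is a guess, the region boundaries are invented, and the region labels are placeholders rather than real SNOP codes.

```python
COLS, ROWS = 32, 32  # assumed layout; 32 x 32 gives the 1,024 points mentioned


def point_index(col: int, row: int) -> int:
    """Convert a screen position into a position in the look-up table."""
    if not (0 <= col < COLS and 0 <= row < ROWS):
        raise ValueError(f"position ({col}, {row}) is off the screen")
    return row * COLS + col


def build_lookup(regions):
    """Fill a 1,024-entry table from rectangular regions (col0, row0, col1, row1)."""
    table = [None] * (COLS * ROWS)
    for (col0, row0, col1, row1), code in regions:
        for row in range(row0, row1 + 1):
            for col in range(col0, col1 + 1):
                table[point_index(col, row)] = code
    return table


def pen_hit(table, col: int, row: int):
    """Return the code stored for the point touched by the light pen, if any."""
    return table[point_index(col, row)]


# Placeholder labels, NOT real SNOP codes: a crude cross-section of the ventricles.
HEART_REGIONS = [
    ((0, 4, 15, 27), "T:RIGHT-VENTRICLE"),
    ((16, 4, 31, 27), "T:LEFT-VENTRICLE"),
]
LOOKUP = build_lookup(HEART_REGIONS)
print(pen_hit(LOOKUP, 20, 5))
print(pen_hit(LOOKUP, 0, 0))  # a point outside every region
```

A flat table of this kind trades a small amount of storage for a constant-time translation from pen position to code, which suits a terminal that must respond in real time.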
To illustrate this process, the following diagnosis will be inserted: Necrosis of the
posterior aspect of the left ventricle and arteriosclerosis of the posterior
descending branch of the left coronary artery with 75% occlusion.
- On light-penning the cardiovascular system, the program branches to an
  initial broad description of cardiovascular pathology (Fig. 6A).
- Necrosis of the heart is light-penned on the next display (Fig. 6B).
- The program branches to display an outline of the heart (Fig. 6C).
- Although the terminal does not display true graphics, a crude graphic
  outline can be drawn using alphanumerics. When the anatomic site of
  the disease is light-penned on the outline, the program selects a display
  that shows a cross-sectional view at that level of the right and left
  ventricles. The posterior part of the left ventricle is light-penned as the
  site of the disease process (Fig. 6D).
- The SNOP-coded diagnosis appears as the insertion is completed (Fig. 6E).
To illustrate the process by which severity may be determined, the diagnosis
Coronary Artery Disease will be entered in a similar manner.
- The initial description is light-penned (Fig. 7A), which creates a
  branch to determine the severity of the lesion.
- The site of the lesion is represented in a crude graphic that depicts the
  two coronary arteries (Fig. 7B) and the severity of the lesion (Fig. 7C).
- The SNOP-coded diagnosis appears as the insertion is completed (Fig. 7D).
The standard SNOP code does not include the severity of the lesion in the
four-digit numeral. M5200 depicts arteriosclerosis. An extra SNOP code digit
was used as a suffix to depict the severity of the lesion, with M52001 depicting
25% occlusion; M52002 depicting 50% occlusion; M52003 depicting 75% occlusion;
and M52004 depicting total occlusion.
Each of the diagnoses described above could be inserted and coded in 15 seconds.
In a similar manner, diagnoses of other systems and disease entities can be
coded and inserted into a data base.
Several aspects of this system deserve further refinement; for example, a true
graphic terminal and advanced software could generate a graphic display that
would portray organs in various attitudes, rotations, cross-sections, and
magnifications.
This display-terminal approach to computerized encoding is still in a very early
stage of development. However, even with its present lack of sophistication, it
provides an exciting and effective mode of approach in overcoming some of the
problems of diagnostic entry and encoding.
Acknowledgment. Valuable assistance was provided
by Dr. Timothy Durbridge, Dr. Garth McBride, Dr.
Morris Collen, Ms. Sandra Emerson, and Ms. Susan
Eastwood.
References
1. College of American Pathologists: Systemized
Nomenclature of Pathology. Chicago, College
of American Pathologists, 1965
2. Enlander D, Durbridge TC: Evaluation of
SNOP coding of pathologic data for computer
retrieval. Lab Med 10:400-401, 1969
3. Lamson BG, Dimsdale B: A natural language
information retrieval system. Proc IEEE
54:1636-1637, 1966
4. Ledley RS: Syntax-directed concept analysis in
the reasoning foundations of medical diagnosis. Comp Biol Med 3:89-99, 1973
5. Pratt AW, Pacak M: Identification and transformation of terminal morphemes in medical
English. Meth Info Med 8:84-90, 1969
6. Van Brunt EE, Collen MF, Davis LS: Kaiser
Permanente Hospital computer systems, Hospital Computer Systems. New York, John
Wiley and Sons (in press)
GLOSSARY
Branch Decision. A point in a computer program where various alternatives exist.
Central Processing Unit (CPU). The central
computer, usually the main computer of a
system.
Cathode Ray Tube (CRT). Used in terminal to
display data.
Data Base. Data stored for future use in the
computer system.
Hardware. The machinery of a computer
system enacting the CPU, the I/O, etc.
I/O. Input/Output, various devices to gather
and disperse information.
Light Pen. A manual pointer which is an option
on certain display terminals. A photoelectric
cell senses a point on the CRT and transmits a
signal to the computer.
Real Time. Occurring at the moment of an
event.
Software. A computer completes a task in a
predetermined series of steps. This series of
steps or instructions is formulated by a program of events, detailing the order and nature
of each step. The series of instructions is
known as a computer program or software,
and is entered initially either by paper tape,
computer cards, or keyboard.