Computer Data Processing of Medical Diagnoses in Pathology

DEREK ENLANDER, M.D.

Department of Clinical Pathology and Laboratory Medicine, University of California, San Francisco, California 94110

April 1975

ABSTRACT

Enlander, Derek: Computer data processing of medical diagnoses in pathology. Am J Clin Pathol 63: 538-544, 1975. Modes of insertion of pathology diagnoses into a computer data storage and retrieval system are reviewed. The conversion of free-flowing diagnostic sentences into internal code is considered, and the advantages of coding are discussed from two aspects: (a) to minimize storage, and (b) to help alleviate difficulties in retrieval of synonymous terminology. Methods of manually pre-coding diagnoses into Systemized Nomenclature of Pathology (SNOP) code are discussed. Data encoding produces a fixed format record which provides significant economy in data handling. The potential use of a real-time* visual display unit in data gathering and automatic coding is presented. (Key words: Coding; Computer; Pathology; Standard Nomenclature of Pathology.)

Received August 2, 1974; received revised manuscript October 21, 1974; accepted for publication October 21, 1974.
This work was performed at Stanford University Medical Center, Stanford, California, and at Kaiser Foundation Research Institute, Oakland, California.
Address reprint requests to Dr. Enlander, University of California Clinical Laboratories, Building 100, Room 279, San Francisco General Hospital, San Francisco, California 94110.
* Certain terms in this article are "computerese"; it is difficult to avoid using these terms when describing certain computer processes. A glossary of the terms used is appended at the end of the article.

PHYSICIANS rarely express the same diagnosis in exactly the same manner. Although a consistent syntax of diagnoses has been proposed by Ledley,4 in most cases physicians show individuality in their choice of diagnostic terms. The processing of medical data from free-flowing diagnostic sentences entails problems that are similar to those encountered in a computer analysis of semantic structure. Natural language may be translated into predicate calculus formulas that can be computed by certain algorithms which represent interrelationships among the various components of a sentence.

The coding of natural sentences has particular benefits for use in medicine. The diagnosis can be expressed in a fixed format, and the coding obviates the problem of the retrieval of synonyms, since synonyms are designated by the same code. Because such coding permits several lines of identification and diagnostic data to be condensed into a single line, it allows for economy in terms of both computer storage and data search.

Arnold Pratt and Milos Pacak5 developed a semantic method for the automatic processing of medical English that is related to the Katz-Fodor (K-F) semantic theory. This theory is based on a dictionary and a system of rules that defines the grammatical relationship among words and phrases of a sentence. The lexicon base of the Pratt/Pacak system is the Systemized Nomenclature of Pathology (SNOP),1 a special-purpose thesaurus created by pathologists to assist in the organization and retrieval of medical data. In SNOP coding, a conceptual semantic unit of the diagnosis is listed in one of four categories: topography (T), the anatomic site of the disease; morphology (M), the visible or microscopic change as a result of the disease; etiology (E), the causative agent (trauma, bacteria, virus, drugs, etc.); and function (F), the signs and symptoms associated with the disease.

FIG. 1. Numeric hierarchy of SNOP code allows accurate correlation of both general and precise diagnostic entries. Levels shown: T2--- respiratory system (first digit); T26-- bronchial system (second digit); T264- right lower bronchus (third digit); T2645 right lower posterior segmental bronchus (fourth digit).
Each of these categories is formed in a four-digit numeric hierarchy. For example, in the topography (T) category, the first digit defines the gross system, e.g., T2xxx represents the respiratory tract. The addition of the second digit further defines the location of the disease, e.g., T26xx represents the bronchial system. The third digit specifies a particular bronchus, e.g., T264x represents the right lower lobe bronchus. The fourth digit defines the precise location of the disease, e.g., T2645 represents the right lower lobe, posterior segmental bronchus (Fig. 1). Diagnoses can be encoded into each of the remaining three SNOP categories according to a similar hierarchical structure, although most diagnoses can be coded adequately on the basis of two entities: the location or T code, and the morphology of the lesion or M code.

Lamson and Dimsdale3 developed an alternative mode of transforming a diagnostic sentence into code. In their system, every word in a sentence is transformed into a numeric code. This internal code is unique to the Lamson/Dimsdale system, and does not relate to other coding systems.

Both the Pratt/Pacak and the Lamson/Dimsdale modes of approach require massive computer resources to accomplish the transformation of a diagnostic sentence. Since such extensive resources are not available to most users, Enlander and Durbridge2 developed a more economic mode of approach. The Enlander/Durbridge system establishes a basic rule that each diagnostic sentence must contain only one diagnosis. A computer program searches for certain pre-established key words in the diagnostic sentence according to a hierarchical structure that is based on the numeric hierarchy of the four-digit SNOP code. An example of the key word search, representing part of the search in the coding for malignant disease, is shown in Figure 2.
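The key word search described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the Enlander/Durbridge program itself: the rule table, the function name `encode_morphology`, and the pairing of key word fragments with M codes are hypothetical. The code M8013 for carcinoma is taken from the text; the other codes are read from the legible parts of Figure 2, and their pairing with the key words is assumed. More specific key word combinations are tested first, with a fall back to the more general ones.

```python
from typing import Optional

# Hypothetical rule table: (required key word fragments, SNOP morphology code).
# More specific rules come first so that they win over general ones.
# The pairings are illustrative assumptions, not the published table.
RULES = [
    (("INFILT", "ADENO", "CARC"), "M8143"),  # infiltrating adenocarcinoma
    (("CARC", "IN SITU"), "M8012"),          # carcinoma in situ
    (("CARC",), "M8013"),                    # carcinoma
    (("ADENOMA",), "M8140"),                 # adenoma
]


def encode_morphology(sentence: str) -> Optional[str]:
    """Return the M code of the first rule whose fragments all occur, else None."""
    text = sentence.upper().replace("-", " ")
    for fragments, code in RULES:
        if all(fragment in text for fragment in fragments):
            return code
    return None  # the sentence is left unencoded


print(encode_morphology("Bronchogenic carcinoma of the left mainstem bronchus"))
print(encode_morphology("Chronic inflammation"))
```

A sentence that matches no rule is returned as unencoded rather than forced into a code, which corresponds to the "unencoded" group reported for the automated search.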
This encoding procedure formats a diagnosis within the ten alphanumeric characters of the T and M SNOP codes. For example, bronchogenic carcinoma of the left mainstem bronchus is coded as T2650M8013. A less specific diagnosis of carcinoma arising from the left bronchus is coded as T2600M8003. Since both diagnoses have the same initial code, both will be retrieved by a two-digit search.

FIG. 2. Part of the flow chart of the key word search in automatic SNOP encoding of a diagnosis. The chart tests for key word fragments including -OMA, -CARC-, ADENO-, INFILT-, -CYST-, and MUCIN-, and branches to morphology codes for carcinoma, adenocarcinoma, adenoma, and cystadenoma entries.

The patient identification information inserted into the data base includes the patient's name, age, and hospital number, the pathology department accession number, the clinician's name, and the coded diagnosis. In an average entry, diagnostic coding can reduce these data from as many as ten lines to a single line (Fig. 3).

FIG. 3. Example of a typical SNOP code, multi-diagnoses entry. The single-line record shows the identifiers (name, age, hospital number, pathology number, clinician) followed by the coded diagnoses: FRED BLOGGS 47 YR 220175 8F502 DR.R.SMITH T8073M2400 T9600M1130 T6300M1520 T2800M4174.

The coding in the data base is searched periodically to provide an index table which lists the consecutive SNOP codes against their data base address. A search for a particular SNOP code initially will derive the address or addresses in the data base from this consecutive matrix. The diagnosis then can be cross-searched against other diagnoses or patient identifiers, as necessary (Fig. 4).
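The index table and the hierarchical search can be illustrated with a small Python sketch. It is a toy under stated assumptions: the record addresses, the `build_index` and `search` helpers, and the in-memory dictionary are hypothetical and stand in for whatever file structure the original system used. The codes T2650M8013 and T2600M8003 come from the text, and T9600M1130 appears in Figure 3.

```python
from collections import defaultdict

# Toy data base: address -> ten-character T+M code (addresses are hypothetical).
CASE_FILE = {
    101: "T2650M8013",  # bronchogenic carcinoma, left mainstem bronchus
    102: "T2600M8003",  # carcinoma arising from the left bronchus
    103: "T9600M1130",  # unrelated entry, as seen in Figure 3
}


def build_index(case_file):
    """Index table: each T code and M code listed against its addresses."""
    index = defaultdict(list)
    for address, code in sorted(case_file.items()):
        index[code[:5]].append(address)  # topography part, e.g. "T2650"
        index[code[5:]].append(address)  # morphology part, e.g. "M8013"
    return index


def search(index, axis, digits):
    """Addresses whose code on the given axis starts with the given digits."""
    prefix = axis + digits
    hits = set()
    for code, addresses in index.items():
        if code.startswith(prefix):
            hits.update(addresses)
    return sorted(hits)


index = build_index(CASE_FILE)
# Two-digit search: bronchial system (T26) combined with morphology M80.
two_digit = set(search(index, "T", "26")) & set(search(index, "M", "80"))
# Four-digit search: only the precise entry matches.
four_digit = set(search(index, "T", "2650")) & set(search(index, "M", "8013"))
print(sorted(two_digit), sorted(four_digit))
```

Because the code is a numeric hierarchy, a general query is simply a shorter prefix: the two-digit search retrieves both the precise and the less specific diagnosis, while the four-digit search retrieves only the precise one.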
When this mode was applied to 500 diagnostic sentences,2 the automated key word search encoded 75% of the sentences. The remaining 25% comprised 20% that were left unencoded and 5% that were wrongly encoded. However, all was not lost in the wrongly encoded sentences. Because the encoding and retrieval procedures used the same encoding routines, errors in SNOP encoding of the initial diagnosis were duplicated on the retrieval code, and the search found a match due to the consistency of the error.

FIG. 4. Formation of an index matrix from SNOP code to make data search more efficient. Elements shown: SNOP code case insertion, matrix formation, index matrix, case file, search input SNOP code, and output.

As an alternative to the automatic encoding of the diagnostic sentence by the computer, the person making the diagnosis, or a secretary trained to the task, could code the sentence manually. After the diagnostic code is obtained, it can be entered into the data base with the patient's identification. In theory, then, the retrieval benefits that are obtainable with automatic encoding routines also will be obtained with manual encoding. However, the manual coding of diagnoses has certain drawbacks. The same diagnosis may be encoded differently by different coders, depending upon the depth of their interest in the process.

In a survey of manual SNOP encoding by pathology residents at Stanford University Medical Center,2 a four-digit search of the diagnoses achieved 76% successful retrieval (Fig. 5). Of the data submitted, 11% were retrievable on a two-digit search, and 13% of the data were not retrievable. Further breakdown of the nonretrievable 13% showed 3.5% miscoded, 5% uncoded, and 4.5% ambiguous in two-digit search.

FIG. 5. Results of manual coding by pathology residents. Correctly coded, 76%; ambiguous in 4 digits, 11%; ambiguous in 2 digits, 4.5%; uncoded, 5%; miscoded, 3.5%.

No matter which mode of coding is used, the difficulty of entering diagnostic data is always present. Since entering diagnoses by keyboard or keypunch is tiresome and time-consuming, we considered that entering data through a cathode ray tube (CRT) terminal might be less tedious for the operator. In a pilot study at the Medical Methods Research Department, Kaiser Foundation Research Institute, in Oakland, California, we used a Saunders alpha-numeric display terminal equipped with a light pen attachment. This terminal was part of a Saunders Clini-Call experimental system6 programmed in FOPS. The Central Processing Unit (CPU) was a Honeywell 516 minicomputer, with a Honeywell 416 as an I/O controller.

FIG. 6. Steps in entry of diagnosis, necrosis in early myocardial infarction, left ventricle, posterior aspect, as seen on computer video terminal. Panels show the menu of abnormal systems, the cardiovascular description menu (pericardial sac, serosal surface of the heart, coronary arteries), alphanumeric outlines of the heart and ventricles, and the resulting coded entry.
When the light pen was pressed against a selected part of the screen, a signal was created, causing the program to branch to the next appropriate display (Fig. 6). In this way, when a diagnosis was inserted, the program branched to select the appropriate displays and, ultimately, the diagnosis was encoded into SNOP code. The position of the light pen created a branch decision indicated in the program, and also constituted a locus of reference in a look-up table. This table translated each of 1,024 points of the CRT "graphic" into SNOP code. The code was displayed and printed on demand, after insertion of the diagnosis.

FIG. 7. Steps in the entry of the diagnosis "Arteriosclerosis." Panels show the occlusion menu (<25%, 25 to 50%, 51 to 75%, 76 to 99%, and total occlusion), an alphanumeric outline of the coronary arteries, and the resulting coded entry.

To illustrate this process, the following diagnosis will be inserted: Necrosis of the posterior aspect of the left ventricle and arteriosclerosis of the posterior descending branch of the left coronary artery with 75% occlusion.
- On light-penning the cardiovascular system, the program branches to an initial broad description of cardiovascular pathology (Fig. 6A).
- Necrosis of the heart is light-penned on the next display (Fig. 6B).
- The program branches to display an outline of the heart (Fig. 6C).
- Although the terminal does not display true graphics, a crude graphic outline can be drawn using alphanumerics. When the anatomic site of the disease is light-penned on the outline, the program selects a display that shows a cross-sectional view at that level of the right and left ventricles.
- The posterior part of the left ventricle is light-penned as the site of the disease process (Fig. 6D).
- The SNOP-coded diagnosis appears as the insertion is completed (Fig. 6E).

To illustrate the process by which severity may be determined, the diagnosis Coronary Artery Disease will be entered in a similar manner.
- The initial description is light-penned (Fig. 7A), which creates a branch to determine the severity of the lesion.
- The site of the lesion is represented in a crude graphic that depicts the two coronary arteries (Fig. 7B) and the severity of the lesion (Fig. 7C).
- The SNOP-coded diagnosis appears as the insertion is completed (Fig. 7D).

The standard SNOP code does not include the severity of the lesion in the four-digit numeral. M5200 depicts arteriosclerosis. An extra SNOP code digit was used as a suffix to depict the severity of the lesion: M52001 depicting 25% occlusion; M52002 depicting 50% occlusion; M52003 depicting 75% occlusion; M52004 depicting total occlusion.

Each of the diagnoses described above could be inserted and coded in 15 seconds. In a similar manner, diagnoses of other systems and disease entities can be coded and inserted into a data base. Several aspects of this system deserve further refinement; for example, a true graphic terminal and advanced software could generate a graphic display that would portray organs in various attitudes, rotations, cross-sections, and magnifications.
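The light-pen look-up and the severity suffix described above can be sketched together in Python. Everything structural here is an assumption made for illustration: the 1,024 screen points are treated as a 32 by 32 grid (the text gives only the total), the menu rows and the names `LOOKUP` and `code_for_pen` are hypothetical, and the suffix digits 1 and 4 for the first and last severity levels are inferred from the sequence in which the text lists them. Only M5200 and the suffixes 2 and 3 are legible beyond doubt in the source.

```python
GRID_WIDTH = 32  # assumption: 1,024 points arranged as 32 x 32

ARTERIOSCLEROSIS = "M5200"  # four-digit morphology code given in the text

# Severity suffix digits; 1 and 4 are inferred from the listed sequence.
SEVERITY_SUFFIX = {
    "25% occlusion": "1",
    "50% occlusion": "2",
    "75% occlusion": "3",
    "total occlusion": "4",
}

# Hypothetical screen layout: each menu entry occupies one whole row.
MENU_ROWS = {
    4: "25% occlusion",
    5: "50% occlusion",
    6: "75% occlusion",
    7: "total occlusion",
}

# Look-up table: screen point number -> menu entry under the light pen.
LOOKUP = {
    row * GRID_WIDTH + col: label
    for row, label in MENU_ROWS.items()
    for col in range(GRID_WIDTH)
}


def code_for_pen(row, col):
    """Translate a light-pen position into the suffixed morphology code."""
    label = LOOKUP.get(row * GRID_WIDTH + col)
    if label is None:
        return None  # the pen touched a point with no menu entry
    return ARTERIOSCLEROSIS + SEVERITY_SUFFIX[label]


print(code_for_pen(6, 10))
print(code_for_pen(0, 0))
```

Keeping the screen layout in a table rather than in program logic means a new display only needs new table entries, which is one plausible reading of why the original system used a look-up table for the pen position.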
This display-terminal approach to computerized encoding is still in a very early stage of development. However, even with its present lack of sophistication, it provides an exciting and effective mode of approach in overcoming some of the problems of diagnostic entry and encoding.

Acknowledgment. Valuable assistance was provided by Dr. Timothy Durbridge, Dr. Garth McBride, Dr. Morris Collen, Ms. Sandra Emerson, and Ms. Susan Eastwood.

References

1. College of American Pathologists: Systemized Nomenclature of Pathology. Chicago, College of American Pathologists, 1965
2. Enlander D, Durbridge TC: Evaluation of SNOP coding of pathologic data for computer retrieval. Lab Med 10:400-401, 1969
3. Lamson BG, Dimsdale B: A natural language information retrieval system. Proc IEEE 54:1636-1637, 1966
4. Ledley RS: Syntax-directed concept analysis in the reasoning foundations of medical diagnosis. Comp Biol Med 3:89-99, 1973
5. Pratt AW, Pacak M: Identification and transformation of terminal morphemes in medical English. Meth Info Med 8:84-90, 1969
6. Van Brunt EE, Collen MF, Davis LS: Kaiser Permanente Hospital computer systems, Hospital Computer Systems. New York, John Wiley and Sons (in press)

GLOSSARY

Branch Decision. A point in a computer program where various alternatives exist.
Central Processing Unit (CPU). The central computer, usually the main computer of a system.
Cathode Ray Tube (CRT). Used in terminal to display data.
Data Base. Data stored for future use in the computer system.
Hardware. The machinery of a computer system enacting the CPU, the I/O, etc.
I/O. Input/Output, various devices to gather and disperse information.
Light Pen. A manual pointer which is an option on certain display terminals. A photoelectric cell senses a point on the CRT and transmits a signal to the computer.
Real Time. Occurring at the moment of an event.
Software. A computer completes a task in a predetermined series of steps. This series of steps or instructions is formulated by a program of events, detailing the order and nature of each step. The series of instructions is known as a computer program or software, and is entered initially either by paper tape, computer cards, or keyboard.