Applying Natural Language Generation to Electronic Health Records in an e-Science context

Donia Scott
Centre for Research in Computing
The Open University

Outline
• Background: the CLEF project
• Patient records as data-encoded patient histories
• Role of NLG in CLEF
• Intuitive querying with natural language
• Generating tailored reports from CLEF data

Background: the CLEF project
• CLEF (Clinical E-Science Framework) is an MRC-funded project that aims to provide a repository of well-organised, data-encoded clinical histories
• Aim: to provide the framework for a new type of medical research: in silico experiments
• Partners:
  • NLP: OU, Sheffield
  • Medical informatics: Manchester
  • Electronic Health Records: Royal Marsden Hospital, UCL
  • Privacy/confidentiality: Cambridge

GRID
• Collect clinical information from multiple sites
• Analyse, structure and integrate it
• Make it available, using GRID tools
  • To authorised clinicians and eHealth scientists
  • In a secure and ethical collaborative framework

The CLEF repository
Data from:
• Referral letters
• Review notes
• Lab results
• Nurse notes
• Hospital admission notes
• Hospital discharge notes
• Treatment notes
• Surgery reports
Repository
Chronicle: organised data on individual patients

The CLEF Chronicle
• Representing the story of a patient over time

The story of an illness
[Figure: semantic network of one patient's chronicle. Typed nodes (Human:1382, Pain:5735, Ulcer:1945, Breast:1492, Clinic:4096, Clinic:1024, Clinic:2010, Biopsy:1066, Mass:1666, Cancer:1914, Radio:1812, Chemo:6502) are linked by the relations locus, attends, reason, finding, plans, target, treats and time.]

A typical cancer patient
[Figure: the relational tables that hold a typical cancer patient's record (Problems, Interventions, Investigations, Drugs, Consults, Loci, Relations), with between about 5 and about 600 entries per table. Sample rows show numeric identifiers, event start and end dates, status values (completed, deferred, scheduled) and relations such as HAS_LOCUS, HAS_TARGET, HAS_FINDING, ARRANGED, INDICATED_BY and RECOMMENDED_BY.]

The role of NLG
• An intuitive query interface to provide efficient access to aggregated data-encoded patient histories for:
  • Assisting in diagnosis and treatment
  • Identifying patterns in treatment
  • Selecting subjects for clinical trials
• Generating reports from the data-encoded histories, for clinicians to use at the point of care

Intuitive querying of the CLEF repository

What does the CLEF database provide?
• Evidence from about 20,000 patient records, comprising 3.5 million record components (about 5GB of data), all in the area of cancer
• 162 queriable fields
• Various text-only records (non-queriable)
• Two types of data:
  • Structured
  • Extracted from narratives by IE
• Queriable data is encoded according to various medical terminologies (SNOMED, ICD, UMLS)
• Approximately 19,500 different medical codes are currently used in the database (a relatively small subset of SNOMED and ICD)

Queriable data
Structured data:
• Demographics: age, gender, postal district, ethnic group, occupation
• Laboratory findings:
  • 32 types of haematology findings
  • 51 types of chemistry findings
  • Cytology reports
  • Histopathology reports
• Imaging studies: radiology procedure, site, diagnosis, morphology, topography, report, indication, department
• Treatments:
  • Prescription drugs
  • Chemotherapy protocol
  • IV chemotherapy
  • Radiotherapy
  • Surgical procedures
• Diagnoses:
  • Clinical diagnosis
  • Cause(s) of death
Data extracted from narratives

Query interface requirements
Designed for:
• Casual and moderate users, who are familiar with the semantic domain of the repository but not with its technical implementation
• Typically clinicians or medical researchers
Should:
• Allow the construction of complex queries with nested structures and temporal expressions
• Minimise the risk of ambiguities
• Offer good coverage of the data types in the CLEF database
Should be usable with:
• Minimal training
• No prior knowledge of medical terminologies, formal querying languages or databases

Typical queries
• “How many patients with AML have had a normal count after two cycles of treatment?”
• “How many patients with primary breast cancer have relapsed in the last five years?”
• “What is the median time between first drug treatment for metastatic breast cancer and death?”
• “In breast cancer patients, what is the incidence of lymphoedema of the arm that persists more than two years after primary surgical treatment?”
• “What is the average number of x-rays for patients with prostate cancer?”
• “What is the average time between first treatment for cervical cancer and death for patients aged less than 60 at death compared with those aged over 60?”
• “How many patients between the ages of 40 and 60 when they were first diagnosed with lung cancer had a platelet count higher than 300 but a white cell count lower than 3 before the 4th cycle of any course of chemotherapy they received during treatment?”

Querying alternatives
SQL:
• Not appropriate for the typical CLEF user
• Requires deep knowledge of the database structure and content, and of the medical terminologies used in the database
Graphical interfaces:
• Have to cope with a large number of parameters
• Nested structures and temporal restrictions are difficult to express
Natural language interfaces:
• More natural and more expressive than formal querying languages, but…
• Sensitive to errors in composition, spelling and vocabulary
• Normally understand only a subset of natural language
• Complex queries are difficult to process
• It is difficult to trace the source of errors in the result

The CLEF approach
• Similar to natural language interfaces; however, the user edits the conceptual meaning of a query instead of its surface text
• Allows users to easily construct unambiguous queries
• Guides users towards constructing only correct queries (queries compatible with the content of the database)
• It is semi-database independent but very domain specific
• Based on the Conceptual Authoring (aka WYSIWYM) technique (Power and Scott, 1998)
• The query is presented to the user as an interactive text, and is edited by making selections on various components of the query
• Each selection triggers a text regeneration process, which results in a new feedback text containing the selection the user made

Query editing

Modelling queries
There are four distinct sections of a query:
• A description of the subjects (in terms of demographic information and basic diagnosis)
• A description of the treatments that the subjects received
• A description of laboratory findings
• An outcome section (what we want from the group of patients we have just described)
Each query element can be expressed as a conjunction or disjunction of same-type query elements, e.g.:
• Cancer of the breast and of the lung
• Patients who received chemotherapy and radiotherapy
Some query elements can be temporally related to each other, e.g.:
• Patients who received chemotherapy within 5 months of surgery
• Patients alive 5 years after the diagnosis

Constraining user choices
• At each step, users are only given correct choices
• Choices are context dependent
• Example: Patients diagnosed with [some cancer] in [some body part]
  • User selects [some cancer] => “squamous cell carcinoma”
  • The interface restricts the choices available for [some body part] to those sites where squamous cell carcinoma can develop

Dealing with ambiguities
• Once a query is constructed, there is only one way it can be interpreted: there is no disambiguation task to be performed
• … but users may be misled into constructing a different query from the one they intend

Answer generation
• The answer set consists of an age/gender breakdown of the patients that fulfil the query requirements
• Each additional clinical feature is combined with the age/gender breakdown to provide more detailed information
• Three types of rendering: text, charts, table

Evaluation
Research questions:
• Can the WYSIWYM query formulation method be easily learned by users of CLEF?
• Is it easier to formulate CLEF queries in SQL or with the WYSIWYM query formulation method?
• Are the interactive feedback texts ambiguous?

Evaluation results show that…
• The CLEF Conceptual Authoring query interface works!
• The method is easily acquired.
• Investigation shows that it is much easier to use than current alternatives (viz. SQL).
• The feedback texts tend to be easily understood.
• It is a viable solution to querying the CLEF repository.
However…

Unresolved issues
• Are the queries we currently support really the ones users will want to ask?
• Does the query interface provide sufficient data coverage?
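
The context-dependent restriction described under "Constraining user choices" can be illustrated with a small Python sketch. This is not the CLEF implementation: the compatibility table, its entries, and the names `COMPATIBLE_SITES`, `choices_for` and `feedback_text` are all assumptions made for illustration. It only shows how filling one slot of the feedback text can narrow the options offered for another.

```python
# Hypothetical compatibility table: body sites each cancer type can affect.
# The entries are illustrative only and are not taken from the CLEF database.
COMPATIBLE_SITES = {
    "squamous cell carcinoma": {"skin", "lung", "oesophagus", "cervix"},
    "adenocarcinoma": {"breast", "lung", "colon", "prostate"},
}


def choices_for(slot, selections):
    """Return the options offered for an unfilled slot, given earlier selections."""
    if slot == "cancer":
        site = selections.get("body part")
        # Offer only cancers that can occur at the site already chosen (if any).
        return sorted(
            cancer
            for cancer, sites in COMPATIBLE_SITES.items()
            if site is None or site in sites
        )
    if slot == "body part":
        cancer = selections.get("cancer")
        if cancer is None:
            # Nothing chosen yet: every known site is still possible.
            return sorted(set().union(*COMPATIBLE_SITES.values()))
        return sorted(COMPATIBLE_SITES[cancer])
    raise ValueError(f"unknown slot: {slot}")


def feedback_text(selections):
    """Regenerate the feedback text; unfilled slots appear as [some ...] anchors."""
    cancer = selections.get("cancer", "[some cancer]")
    site = selections.get("body part", "[some body part]")
    return f"Patients diagnosed with {cancer} in {site}"


if __name__ == "__main__":
    query = {}
    print(feedback_text(query))
    query["cancer"] = "squamous cell carcinoma"
    print(feedback_text(query))
    print(choices_for("body part", query))
```

Because the user only ever picks from `choices_for`, a query that is incompatible with the table cannot be composed, which is the property the slide describes.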
Generating reports from the CLEF repository

The context
• We aim to generate reports from the data-encoded Electronic Patient Records
• Our reports are aimed at clinicians for use at the point of care
• Various types of report work on the same input (roughly the same content) but express information from different viewpoints
• We address the problem of conceptual restatement in generating summarised reports

Typical input
[Figure: the relational tables that hold a typical cancer patient's record (Problems, Interventions, Investigations, Drugs, Consults, Loci, Relations), as shown earlier under "A typical cancer patient".]

Why are textual reports needed?
• Clinicians and other health professionals use patient health summaries at the point of care, where time is a critical resource
• Reports provide quick access to an overview of a patient’s medical history
• Typically, an electronic patient record contains around 1000 messages
  • Even structured, this volume of data is very large
  • Access to relevant information about particular patients is difficult
• Textual reports:
  • are easy to read and understand
  • can be customised to the type of information needed
  • provide a quick way of identifying errors in the patient record
  • alleviate the need to know in detail the structure of the underlying database

Why are paraphrases needed?
• Alternative views of the patient record, i.e., reports from various viewpoints:
  • Full chronological reports
  • Summaries of investigations, interventions, treatments
• Same content, different textual representation
• Potted summaries are also important (30-second overview of the patient’s history)

Content selection
• Two notions:
  • Spine events: the main concepts in the summary (depending on the user-defined type of summary)
  • Skeleton events: linked to the spine by various relations
• Basic procedure:
  • Step 1: group linked events into clusters and remove small clusters
    • Typically, a small number of very large clusters and a small number of small clusters
    • Small clusters are assumed not to be related to the main topic of the summary
  • Step 2: identify spine events according to the type of summary (Longitudinal, Investigations, Interventions, Problems)
  • Step 3: identify the skeleton events
    • If (“problem is spine event” and “investigation has_indication problem”) then select investigation (unless already selected)
  • Repeat step 2 a certain number of times (given by a threshold parameter)

Spine of Problem events
[Figure: event graph for the example patient (nodes: pain, mammogram, lump, biopsy, breast, cancer, radiotherapy, radiotherapy cycle, ulcer, hyperbaric oxygenation); panel: Problem.]
The patient identifies pain in the left breast. A lump in the breast is found through a mammogram. A biopsy performed on the breast reveals cancer in the left breast. The patient receives radiotherapy to treat the cancer. Skin ulceration develops in the left breast as a result of radiotherapy, which is treated with hyperbaric oxygenation.

[Figure: the same event graph; panel: Interventions.]
Radiotherapy on the breast is initiated to treat cancer in the breast. A first radiotherapy cycle is performed. The radiotherapy causes skin ulceration. The patient receives hyperbaric oxygenation to treat the ulcer.
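
The three content selection steps can be sketched in Python as follows. This is a simplified reading of the slide, not the CLEF generator: the function name `select_content`, the parameters `min_cluster_size` and `depth`, and the sample event graph are assumptions. Every link is treated as eligible for pulling in skeleton events (the slide's rule names a specific relation), links are assumed to mention only known events, and the thresholded repetition is interpreted here as widening the skeleton one link at a time.

```python
from collections import defaultdict


def select_content(events, links, spine_type, min_cluster_size=3, depth=1):
    """Pick spine and skeleton events for a summary of the given type.

    events: {event_id: event_type}; links: iterable of (id_a, id_b) pairs.
    Returns (spine, skeleton) as two sets of event ids.
    """
    neighbours = defaultdict(set)
    for a, b in links:
        neighbours[a].add(b)
        neighbours[b].add(a)

    # Step 1: cluster linked events (connected components), drop small clusters.
    kept, seen = set(), set()
    for start in events:
        if start in seen:
            continue
        cluster, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node not in cluster:
                cluster.add(node)
                stack.extend(neighbours[node] - cluster)
        seen |= cluster
        if len(cluster) >= min_cluster_size:
            kept |= cluster

    # Step 2: spine events are the retained events of the requested type.
    spine = {e for e in kept if events[e] == spine_type}

    # Step 3: skeleton events are linked to events already selected;
    # the loop repeats `depth` times (the threshold parameter).
    selected = set(spine)
    for _ in range(depth):
        selected |= {n for e in selected for n in neighbours[e]} & kept
    return spine, selected - spine


# Hypothetical event graph loosely modelled on the breast cancer example.
EVENTS = {
    "pain": "problem", "lump": "problem", "cancer": "problem", "ulcer": "problem",
    "mammogram": "investigation", "biopsy": "investigation",
    "radiotherapy": "intervention", "hyperbaric oxygenation": "intervention",
    "flu": "problem", "consult": "consult",  # small unrelated cluster
}
LINKS = [
    ("mammogram", "pain"), ("mammogram", "lump"), ("biopsy", "lump"),
    ("biopsy", "cancer"), ("radiotherapy", "cancer"), ("ulcer", "radiotherapy"),
    ("hyperbaric oxygenation", "ulcer"), ("flu", "consult"),
]

if __name__ == "__main__":
    print(select_content(EVENTS, LINKS, "problem"))
    print(select_content(EVENTS, LINKS, "intervention"))
```

With this sample graph, an Interventions summary takes radiotherapy and hyperbaric oxygenation as its spine and pulls in the cancer and the ulcer as skeleton events, while the two-event "flu" cluster is discarded in step 1.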
[Figure: the same event graph; panel: Investigations.]
A mammogram is performed because of pain in the left breast, which identifies a lump in the breast. A biopsy of the lump identifies cancer in the left breast.

[Figure: the three event graphs shown together; panels: Interventions, Problem, Investigations.]

Discourse structuring
• Mostly given by relations in the EPR
• 19 different types of relations, which can be:
  • Attributive: Problem has_locus Locus
  • Rhetorical: Problem caused_by Intervention
• Attributive relations do not contribute to the discourse structure
• In a first step, events linked through attributive relations are combined:
  Message_Problem + Message_Locus => Message_Problem_Locus
• Messages are grouped according to the type of summary:
  • Longitudinal: events occurring in the same week should be grouped together and further grouped into years
  • Logical: arrange chronologically and then group similar events (e.g., liver panels, screening consults)

Discourse structuring (continued)
• Within each group:
  • link messages by discourse relations inferred from EPR relations: Cause, Result, Sequence
  • assume a List relation if no relation is specified
• Between groups:
  • If all events in one group are linked to events in another group by some EPR relation, link the groups through the corresponding discourse relation
  • Otherwise, assume a List relation

Text structuring: aggregation
• Problems:
  Problem_1:name HAS_LOCUS Locus_1
  Problem_2:name HAS_LOCUS Locus_2
  => Problem_3 HAS_LOCUS {Locus_1, Locus_2}
  Enlargement of the liver + Enlargement of the spleen => Enlargement of the liver and/but not of the spleen
• Investigations:
  Investigation_1:name HAS_INDICATION Problem_1 HAS_LOCUS Locus_1
  Investigation_2:name HAS_INDICATION Problem_2 HAS_LOCUS Locus_2
  => Investigation_3 HAS_INDICATION {Problem_1, Problem_2}
  Examination of the abdomen revealed no enlargement of the liver + Examination of the lymphnodes revealed no lymphadenopathy => Examination revealed no enlargement of the liver and no lymphadenopathy

Text structuring: aggregation (continued)
• Interventions:
  Intervention_1 PART_OF Intervention_0
  Intervention_2 PART_OF Intervention_0
  => {count} Intervention_1
  [ID01]Chemotherapy cycle PART_OF [ID0]Chemotherapy
  [ID02]Chemotherapy cycle PART_OF [ID0]Chemotherapy
  [ID03]Chemotherapy cycle PART_OF [ID0]Chemotherapy
  => 3 chemotherapy cycles
• Ellipsis:
  Examination of the left breast revealed no recurrent cancer in the left breast => Examination of the left breast revealed no recurrent cancer

Text structuring: compaction
Events can be compacted according to domain-specific rules:
• Clinical examination is: examination of the liver, examination of the spleen, examination of the abdomen
  • Clinical examination was normal
  • Clinical examination was normal apart from an enlargement of the spleen
  • Clinical examination revealed enlargement of the spleen
• Liver panel is: bilirubin concentration, ESR concentration, GGT concentration
  • The liver panel was in the normal range (apart from a very high level of GGT)

Maintaining the thread of discourse
• Textual representation should reflect the relative importance of events
• At discourse level: spine concepts are preferably realised in nuclear units and skeleton events in satellite units
• At sentence level: spine events are assigned salient syntactic roles
• Whether an event is on the spine or on the skeleton determines its realisation as a sentence, a main or subordinate clause, or a phrase

Typical output of the NL generator: long chronological report
YEAR 1
Week 0: A mammography screening was scheduled at the clinic.
Week 1: Primary cancer of the right breast; histopathology: invasive tubular adenocarcinoma.
YEAR 2
Week 131: Xray revealed no cancer of the right breast.
YEAR 5 Week 287 Xray revealed no cancer of the right breast. YEAR 8 Week 443 Xray revealed cancer of the right breast. Week 446 Examination (indicated by primary cancer of the right breast) revealed no enlargement of the liver or of the spleen, no recurrent cancer of the right breast and no lymphadenopathy of the right axillary lymphnodes. Testing (indicated by primary cancer of the right breast) revealed no abnormality of the haemoglobin concentration and no abnormality of the leucocyte count. An Xray (indicated by primary cancer of the right breast) was performed. Very high level of the ESR concentration. Very high level of the Creatinine concentration. Very high level of the Alkaline Phosphatase concentration. Very high level of the Bilirubin concentration. Very high level of the GGT concentration. No abnormality of the platelet count. Week 449 An initial treatment planning was completed at the clinic. Excision biopsy revealed no metastatic lymphnode count of the right axilla. Histopathology revealed primary cancer of the right breast. Cancer staging revealed stage1 cancer. Hormone anatagonist therapy was started to treat primary cancer of the right breast. Lumpectomy was performed on the breast to treat primary cancer of the right breast. Primary treatment package was started to treat primary cancer of the right breast. …………………. YEAR 17 Week 893 Xray revealed no cancer of the right breast. Typical output of the NL generator Compact reports Focus on Problems Focus on Interventions In week 0, the patient is diagnosed with primary cancer of the right breast, histopathology: invasive tubular adenocarcinoma. In week 0, the patient is diagnosed with primary cancer of the right breast, histopathology: invasive tubular adenocarcinoma. In weeks 131 and 287 Xray revealed no cancer of the right breast. In week 449, excision biopsy revealed no metastatic lymphnode count of the right axilla. Histopathology revealed primary cancer of the right breast. 
Lumpectomy was performed on the right breast. Hormone anatagonist therapy was started to treat primary cancer of the right breast. In week 446, there was no enlargement of the liver or of the spleen, no recurrent cancer of the right breast and no lymphadenopathy of the right axillary lymphnodes revealed by examination. There was no abnormality of the haemoglobin concentration or of the leucocyte count, no abnormality of the platelet count, very high level of the GGT concentration, of the Bilirubin concentration, of the Alkaline Phosphatase concentration, of the Creatinine concentration or of the ESR concentration. In week 449, excision biopsy revealed no metastatic lymphnode count of the right axilla. Histopathology revealed primary cancer of the right breast. Lumpectomy was performed on the right breast. Hormone anatagonist therapy was initiated to treat primary cancer of the right breast. In weeks 457 to 737, there was no enlargement of the liver or of the spleen, no recurrent cancer of the right breast and no lymphadenopathy of the right axillary lymphnodes. There was no abnormality of the haemoglobin concentration or of the leucocyte count, no abnormality of the platelet count, very high level of the GGT concentration, of the Bilirubin concentration, of the Alkaline Phosphatase concentration, of the Creatinine concentration and of the ESR concentration. In weeks 457 to 893, Xray revealed no cancer of the right breast. Focus on Investigations In week 0, the patient is diagnosed with primary cancer of the right breast, histopathology: invasive tubular adenocarcinoma. In weeks 131 and 287 Xray revealed no cancer of the right breast. In week 446, examinations revealed no enlargement of the liver or of the spleen, no recurrent cancer of the right breast and no lymphadenopathy of the right axillary lymphnodes. 
Testing revealed no abnormality of the haemoglobin concentration or of the leucocyte count, no abnormality of the platelet count, very high level of the GGT concentration, of the Bilirubin concentration, of the Alkaline Phosphatase concentration, of the Creatinine concentration or of the ESR concentration.
In week 449, excision biopsy revealed no metastatic lymphnode count of the right axilla. Histopathology revealed primary cancer of the right breast.
In weeks 457 to 737, examinations revealed no enlargement of the liver or of the spleen, no recurrent cancer of the right breast and no lymphadenopathy of the right axillary lymphnodes. Testing revealed no abnormality of the haemoglobin concentration or of the leucocyte count, no abnormality of the platelet count, very high level of the GGT concentration, of the Bilirubin concentration, of the Alkaline Phosphatase concentration, of the Creatinine concentration and of the ESR concentration.
In weeks 457 to 893, Xray revealed no cancer of the right breast.

Ongoing work on report generation
• Add domain-specific knowledge to improve content selection: some events become important depending on context
• Change the (sub-)domain: test if the generation method is easily portable
• Link NLG to IR to improve IR
• Produce reports for patients

Summary and Conclusions
• CLEF is now entering the integration phase, moving towards testing and deployment
• Major emphases at this point are on privacy and security
• Informing patients is a major thread for future work
• Integrating IE and NLG

Thank You!
Collaborators: Catalina Hallett, Richard Power

Evaluation procedure
Subjects: We tested the performance of 15 subjects. Subjects had a range of expertise in the CLEF domain, from expert (oncologist) to novice (computer scientist), but most subjects had some medical training. Subjects had no previous experience with the CLEF WYSIWYM query interface, but most were aware of its fundamental principles.
Methodology: Subjects were given a set of four fixed queries to formulate using the CLEF WYSIWYM query interface. The queries were expressed in language as different as possible from the language in the query interface. Each subject received the queries in a different order.

Evaluation – data analysis
We recorded:
• the time taken to compose each query
• the number of operations used for constructing a query, which we compared with the optimal number of operations (pre-computed)
We analysed whether performance, as indicated by
• Speed
• Efficiency
improves with training (experience).

Evaluation results: Time to completion
After their first experience of composing a query, subjects’ completion time halved, and then stayed at that level. Subjects’ performance improved dramatically with experience.
[Figure: Time to completion. Time (mins), 0 to 7, against order of query, 1 to 4.]

Evaluation results: Performance over time (performance normalised over complexity)
After just one go with the CLEF interface, subjects are highly proficient in their ability to compose complex queries. By the time they get to their fourth query, subjects’ performance is almost perfect.
[Figure: Operations, (total - optimal) / optimal, 0 to 0.5, against order of query, 1 to 4. Mean: 0.18]
Optimal operation = min # of operations needed to compose the query perfectly. This is a measure of the complexity of the query.
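The normalised measure used in the performance plot, (total - optimal) / optimal, can be sketched in a few lines of Python. This is only an illustration: the function name and the operation counts below are hypothetical and are not data from the evaluation.

```python
def normalised_excess(total_ops: int, optimal_ops: int) -> float:
    """Return (total - optimal) / optimal for one composed query.

    0.0 means the subject used exactly the minimum number of operations;
    0.5 means 50% more operations than the minimum.
    """
    if optimal_ops <= 0:
        raise ValueError("optimal_ops must be a positive operation count")
    return (total_ops - optimal_ops) / optimal_ops


# Hypothetical (total, optimal) counts for one subject's four queries,
# listed in the order the queries were attempted.
attempts = [(15, 10), (12, 10), (11, 10), (10, 10)]

scores = [normalised_excess(total, optimal) for total, optimal in attempts]
mean_score = sum(scores) / len(scores)
```

Dividing by the optimal count is what makes queries of different complexity comparable: five surplus operations matter more on a query that needs ten than on one that needs fifty.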
Evaluation – comparison with SQL
Very small scale experiment
Two subjects:
• with expert knowledge of the structure, organisation and content of the CLEF database
• highly skilled users of SQL
• with minimal experience with WYSIWYM
• given access to the SNOMED and ICD codes required to build the SQL
Each subject composed a query first in the CLEF WYSIWYM Interface and then in SQL

Evaluation – comparison with SQL
Subject 1 – Query 1: WYSIWYM: 2.3 mins; SQL: 8.5 mins (incomplete)
Subject 2 – Query 2: WYSIWYM: 4.5 mins; SQL: 12 mins (incomplete)
[Figure: bar chart of time, 0 to 12, for WYSIWYM and SQL, Subject 1 and Subject 2.]
Even with a slowly reacting interface, the subjects were much faster composing queries in WYSIWYM than in SQL

Are the feedback texts ambiguous to the users?
• Identified 6 types of ambiguity
• 4 examples of each, with forced-choice judgements by 15 subjects
• Random judgements would give a score of 33%
• Results show 84% correct judgements

[Diagram: repository; summarisation; summary patient records for clinicians and medical researchers; summary patient records for patients (linear text, hypertext, animated dialogue)]

Sample report for Clinicians
In the weeks 195 to 196, self examination revealed lump of the right breast. In week 197, self examination revealed lump of the right breast. Excision biopsy revealed metastatic lymphnode count of the right axilla. Histopathology revealed cancer of the right breast. Cancer staging revealed stage2 cancer. Radical mastectomy was performed on the breast to treat the primary cancer. The patient was diagnosed with metastatic lymphnode count of the right axilla; 19 nodes involved out of 24. The patient was diagnosed with metastatic cancer of the right axilla; histopathology: invasive undifferentiated adenocarcinoma. The patient was diagnosed with cancer of the right breast; histopathology: invasive undifferentiated adenocarcinoma. The patient was diagnosed with stage2 cancer; histopathology: invasive undifferentiated adenocarcinoma.
Primary treatment package was initiated to treat primary cancer of the right breast.
…

Sample report for Patients
You had a consultation with your doctor on September 20th 1993. On September 27th you did a self examination and you found that you had a lump in your right breast. A self examination is an examination of the breasts by running your hand over each breast and up under your arms and checking for changes to their size, shape or feel. On October 4th you did another self examination and you found that you still had a lump in your right breast. On October 11th you had a radical mastectomy to treat cancer in your right breast. A radical mastectomy is an operation to remove the breast, along with the lymph glands under the arm and the muscles of the chest wall. Cancer is a tumour that tends to spread, both locally and to other parts of the body.
…

Presenting patient records in hypertext: dividing the text into related units
You had a consultation with your doctor on September 20th 1993.
SEQUENCE On September 27th you did a self examination.
HAS-FINDING you found that you had a lump in your right breast.
SEQUENCE On October 4th you did another self examination.
A self examination is an examination of the breasts by running your hand over each breast and up under your arms and checking for changes to their size, shape or feel.
SEQUENCE On October 11th you had a radical mastectomy.
MOTIVATION to treat cancer in your right breast.
Cancer is a tumour that tends to spread, both locally and to other parts of the body.
A radical mastectomy is an operation to remove the breast, along with the lymph glands under the arm and the muscles of the chest wall.

Presenting patient records in hypertext: giving graphical attributes to the text units
[Figure: the same text units and relations, displayed with graphical attributes.]

Presenting patient records in hypertext: using animation to represent discourse patterns dynamically
You had a consultation with your doctor on September 20th 1993.
On September 27th you did a self examination. You found that you had a lump in your right breast.
A self examination is an examination of the breasts by running your hand over each breast and up under your arms and checking for changes to their size, shape or feel.
On October 4th you did another self examination.
On October 11th you had a radical mastectomy. The radical mastectomy was done to treat cancer in your right breast.
A radical mastectomy is an operation to remove the breast, along with the lymph glands under the arm and the muscles of the chest wall.
Cancer is a tumour that tends to spread, both locally and to other parts of the body.
[Animation frames: the units are revealed one at a time, in this order: the consultation; the first self examination; the finding of a lump; the definition of self examination; the second self examination; the radical mastectomy; the definition of radical mastectomy; the purpose of the mastectomy.]

Monologues/Dialogues
Monologue
• Autonomous agent reads the generated report
• Aims: accessibility, education (not translation)
Dialogue
• Report is generated as a script that 2 agents act out
• Aims: accessibility, vicarious learning
Example (video clip)
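
The text units and discourse relations shown on the hypertext slides (SEQUENCE, HAS-FINDING, MOTIVATION) could be held in a small graph structure such as the one below. This is only a sketch under stated assumptions: the class names, the unit identifiers and the `reveal_order` helper are hypothetical, and which units each relation connects is read off the slide text rather than taken from the CLEF implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class TextUnit:
    uid: str    # hypothetical identifier for the unit
    text: str   # the sentence or clause shown to the patient


@dataclass
class Relation:
    label: str   # e.g. "SEQUENCE", "HAS-FINDING", "MOTIVATION"
    source: str  # uid of the unit the relation starts from
    target: str  # uid of the unit the relation points to


@dataclass
class Discourse:
    units: Dict[str, TextUnit] = field(default_factory=dict)
    relations: List[Relation] = field(default_factory=list)

    def add_unit(self, uid: str, text: str) -> None:
        self.units[uid] = TextUnit(uid, text)

    def relate(self, label: str, source: str, target: str) -> None:
        # Refuse relations that mention a unit that was never added.
        if source not in self.units or target not in self.units:
            raise KeyError("both units must be added before relating them")
        self.relations.append(Relation(label, source, target))

    def reveal_order(self, start: str) -> List[str]:
        """Depth-first walk over the relations, in the order they were added."""
        order: List[str] = []
        seen = set()
        stack = [start]
        while stack:
            uid = stack.pop()
            if uid in seen:
                continue
            seen.add(uid)
            order.append(uid)
            targets = [r.target for r in self.relations if r.source == uid]
            stack.extend(reversed(targets))  # keep insertion order when popping
        return order


doc = Discourse()
doc.add_unit("consult", "You had a consultation with your doctor on September 20th 1993.")
doc.add_unit("exam1", "On September 27th you did a self examination.")
doc.add_unit("lump", "You found that you had a lump in your right breast.")
doc.add_unit("exam2", "On October 4th you did another self examination.")
doc.add_unit("mastectomy", "On October 11th you had a radical mastectomy.")
doc.add_unit("purpose", "to treat cancer in your right breast.")

# Assumed attachment of each relation (see the note above the code).
doc.relate("SEQUENCE", "consult", "exam1")
doc.relate("HAS-FINDING", "exam1", "lump")
doc.relate("SEQUENCE", "exam1", "exam2")
doc.relate("SEQUENCE", "exam2", "mastectomy")
doc.relate("MOTIVATION", "mastectomy", "purpose")

for uid in doc.reveal_order("consult"):
    print(doc.units[uid].text)
```

Keeping the relations separate from the text means the same units can be rendered as linear text, as linked hypertext, or revealed step by step in an animation, simply by walking the graph differently.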