Scott-manchester-seminar`06.pps

Applying Natural Language
Generation to Electronic Health
Records in an e-Science context
Donia Scott
Centre for Research in Computing
The Open University
Outline
Background: the CLEF project
Patient records as data-encoded patient
histories
Role of NLG in CLEF
Intuitive querying with natural language
Generating tailored reports from CLEF
data
Background: the CLEF project
CLEF (Clinical E-Science Framework) is an MRC-funded project that aims to provide a repository of
well-organised, data-encoded clinical histories
Aim: to provide the framework for a new type of
medical research: in silico experiments
Partners:
NLP: OU, Sheffield
Medical informatics: Manchester
Electronic Health Records: Royal Marsden Hospital, UCL
Privacy/confidentiality: Cambridge
GRID
Collect clinical information from
multiple sites
Analyse, structure and integrate it
Make it available, using GRID tools
To authorised clinicians and eHealth scientists
In a secure and ethical collaborative
framework
The CLEF repository
Data from:
• Referral letters
• Review notes
• Lab results
• Nurse notes
• Hospital admission notes
• Hospital discharge notes
• Treatment notes
• Surgery reports
Repository
Chronicle
Organised data on
individual patients
The CLEF Chronicle
Representing the story of a
patient over time
The story of an illness
[Figure: a patient chronicle drawn as a graph. Nodes: Human:1382, Pain:5735, Ulcer:1945, Breast:1492, Clinic:4096, Clinic:1024, Clinic:2010, Biopsy:1066, Radio:1812, Chemo:6502, Mass:1666, Cancer:1914. Edge labels: locus, attends, reason, finding, plans, target, treats, time.]
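The chronicle graph can be read as a set of typed relations between events. The following is a minimal Python sketch of that reading, not the CLEF implementation: the class names `Event` and `Chronicle`, the method names, and the direction chosen for each edge are assumptions made for illustration; only the node labels and relation names come from the slide.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass(frozen=True)
class Event:
    """One node of the chronicle, e.g. a problem, a locus or a clinic visit."""
    kind: str    # e.g. "Pain", "Breast", "Clinic"
    ident: int   # numeric identifier, as in "Pain:5735"


@dataclass
class Chronicle:
    """A patient history as typed, directed relations between events."""
    edges: List[Tuple[Event, str, Event]] = field(default_factory=list)

    def relate(self, source: Event, relation: str, target: Event) -> None:
        self.edges.append((source, relation, target))

    def targets(self, source: Event, relation: str) -> List[Event]:
        """All events reached from `source` over edges labelled `relation`."""
        return [t for s, r, t in self.edges if s == source and r == relation]


human = Event("Human", 1382)
pain = Event("Pain", 5735)
breast = Event("Breast", 1492)
clinic = Event("Clinic", 4096)

chronicle = Chronicle()
chronicle.relate(pain, "locus", breast)      # assumed direction: problem -> locus
chronicle.relate(human, "attends", clinic)   # assumed direction: patient -> visit
chronicle.relate(clinic, "reason", pain)     # assumed direction: visit -> reason

print(chronicle.targets(pain, "locus"))
```

Storing relations as triples keeps the sketch close to a relations table with one row per edge.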
A typical cancer patient
[Figure: database tables holding the record of one patient. Table names: Problems, Interventions, Investigations, Drugs, Consults, Loci, Relations. Count labels shown on the slide: ~200, 15, ~100, ~5, ~10, ~20, ~600. Column names include ID, SimID, Name, Status, EventStartDate, EventEndDate, Outcome, Histology, Laterality, Location, Regime, Grade, Item1Type, Item1ID, Relation, Item2Type and Item2ID. Relation values include HAS_LOCUS, HAS_TARGET, HAS_FINDING, ARRANGED, INDICATED_BY and RECOMMENDED_BY.]
The role of NLG
an intuitive query interface to provide
efficient access to aggregated data-encoded
patient histories for:
Assisting in diagnosis and treatment
Identifying patterns in treatment
Selecting subjects for clinical trials
generating reports from the data-encoded
histories, for clinicians to use at the point of
care.
Intuitive querying of the
CLEF repository
What does the CLEF database
provide
Evidence from about 20,000 patient records,
comprising 3.5 million record components (about 5GB
of data). These are all in the area of cancer.
162 queriable fields
various text-only records (non-queriable)
Two types of data:
Structured
Extracted from narratives by IE
Queriable data is encoded according to various
medical terminologies (SNOMED, ICD, UMLS)
There are approximately 19,500 different medical
codes currently used in the database (a relatively
small subset of SNOMED and ICD)
Queriable data
Structured data:
Demographics:
Age, gender, postal district, ethnic group, occupation
Laboratory findings:
32 types of haematology findings
51 types of chemistry findings
Cytology reports
Histopathology reports
Imaging studies:
Radiology procedure, site, diagnosis, morphology, topography, report, indication,
department
Treatments:
Prescription drugs
Chemotherapy protocol
IV chemotherapy
Radiotherapy
Surgical procedures
Diagnoses
Clinical diagnosis
Cause(s) of death
Data extracted from narratives
Query interface requirements
Designed for:
casual and moderate users, who are familiar with the
semantic domain of the repository but not with its technical
implementation
Typically clinicians or medical researchers
Should be able to:
Allow the construction of complex queries with nested
structures and temporal expressions
Minimise the risk of ambiguities
Offer good coverage of the data types in the CLEF database
Should be used with:
Minimal training
No prior knowledge of medical terminologies, formal querying
languages, databases
Typical queries
“How many patients with AML have had a normal count after two
cycles of treatment?”
“How many patients with primary breast cancer have relapsed in
the last five years?”
“What is the median time between first drug treatment for
metastatic breast cancer and death?”
“In breast cancer patients, what is the incidence of lymphoedema
of the arm that persists more than two years after primary
surgical treatment?”
“What is the average number of x-rays for patients with prostate
cancer?”
“What is the average time between first treatment for cervical
cancer and death for patients aged less than 60 at death
compared with those aged over 60?”
“How many patients between the ages of 40 and 60 when they were
first diagnosed with lung cancer had a platelet count higher
than 300 but a white cell count lower than 3 before the 4th
cycle of any course of chemotherapy they received during
treatment?”
Querying alternatives
SQL:
Not appropriate for the typical CLEF user
Requires deep knowledge of the database structure and
content, medical terminologies used in the database
Graphical interfaces:
Have to cope with large number of parameters
Nested structures and temporal restrictions are
difficult to express
Natural Language interfaces:
More natural and more expressive than formal querying
languages, but…
Sensitive to errors in composition, spelling, vocabulary
Normally understand only a subset of natural language
Complex queries are difficult to process
It is difficult to trace the source of errors in the result
The CLEF approach
Similar to Natural Language interfaces, but the user edits
the conceptual meaning of a query instead of its surface text
Allows users to easily construct non-ambiguous queries
Guides the users towards constructing correct queries only
(queries compatible with the content of the database)
It is semi-database independent but very domain specific
Based on the Conceptual Authoring (aka WYSIWYM) technique
(Power and Scott, 1998)
The query is presented to the user as an interactive text, and it
is edited by making selections on various components of the
query
Each selection triggers a text re-generation process which
results in a new feedback text containing the selection the
user made
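The edit-and-regenerate cycle described above can be sketched as follows. This is an illustrative Python sketch only, assuming a single fixed template; the names `TEMPLATE`, `ANCHORS`, `feedback_text` and `select` are hypothetical, and a real WYSIWYM system generates text from a semantic model rather than from a format string.

```python
from typing import Dict, Optional, Tuple

# Feedback-text template and the anchors shown for slots not yet filled in.
TEMPLATE = "Patients diagnosed with {cancer} in {body_part}"
ANCHORS = {"cancer": "[some cancer]", "body_part": "[some body part]"}


def feedback_text(model: Dict[str, Optional[str]]) -> str:
    """Regenerate the whole feedback text from the conceptual model."""
    shown = {slot: model.get(slot) or anchor for slot, anchor in ANCHORS.items()}
    return TEMPLATE.format(**shown)


def select(model: dict, slot: str, value: str) -> Tuple[dict, str]:
    """Record a user selection, then regenerate the text from the new model."""
    if slot not in ANCHORS:
        raise KeyError(f"unknown slot: {slot}")
    updated = {**model, slot: value}
    return updated, feedback_text(updated)


model, text = select({}, "cancer", "squamous cell carcinoma")
print(text)
```

The user never edits the string itself: every change goes through the model, and the text is always derived from it.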
Query editing
Modelling queries
There are 4 distinct sections of a query:
A description of the subjects (in terms of demographics
information and basic diagnosis)
A description of treatments that the subjects received
A description of laboratory findings
An outcome section (what do we want from the group of
patients we have just described)
Each query element can be expressed as a conjunction or
disjunction of same-type query elements, e.g.,:
Cancer of the breast and of the lung
Patients who received chemotherapy and radiotherapy
Some query elements can be temporally related to each other,
e.g.,:
Patients who received chemotherapy within 5 months of
surgery
Patients alive 5 years after the diagnosis
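One possible way to model these query elements is sketched below in Python. It is a sketch under stated assumptions, not the CLEF data model: the class names `Element`, `Combination` and `TemporalLink`, the section identifiers in `SECTIONS`, and the `describe` helper are all hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple

# Assumed identifiers for the four query sections.
SECTIONS = ("subjects", "treatments", "findings", "outcome")


@dataclass(frozen=True)
class Element:
    section: str
    label: str

    def __post_init__(self):
        if self.section not in SECTIONS:
            raise ValueError(f"unknown section: {self.section}")


@dataclass(frozen=True)
class Combination:
    """Conjunction or disjunction of same-type query elements."""
    operator: str
    parts: Tuple[Element, ...]

    def __post_init__(self):
        if self.operator not in ("and", "or"):
            raise ValueError("operator must be 'and' or 'or'")
        if len({part.section for part in self.parts}) != 1:
            raise ValueError("combined elements must be of the same type")


@dataclass(frozen=True)
class TemporalLink:
    """Temporal relation between two query elements."""
    first: Element
    relation: str
    amount: int
    unit: str
    second: Element


def describe(item) -> str:
    """Render a query element as a short phrase."""
    if isinstance(item, Combination):
        return f" {item.operator} ".join(part.label for part in item.parts)
    if isinstance(item, TemporalLink):
        return (f"{item.first.label} {item.relation} {item.amount} "
                f"{item.unit} of {item.second.label}")
    return item.label


chemo = Element("treatments", "chemotherapy")
radio = Element("treatments", "radiotherapy")
surgery = Element("treatments", "surgery")
print(describe(Combination("and", (chemo, radio))))
print(describe(TemporalLink(chemo, "within", 5, "months", surgery)))
```

Rejecting mixed-type combinations at construction time mirrors the requirement that only same-type elements are conjoined or disjoined.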
Constraining user choices
At each step, users are only given correct
choices
Choices are context dependent
Patients diagnosed with [some cancer] in [some
body part]
User selects [some cancer] => “squamous cell
carcinoma”
The interface restricts the choices available for
[some body part] to those sites where squamous
cell carcinoma can develop
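A minimal Python sketch of context-dependent choices follows. The table `SITES_BY_CANCER` is a small hand-written illustration, not CLEF data and not exhaustive; a real system would presumably derive compatible sites from the medical terminology. The function name `body_part_choices` is hypothetical.

```python
from typing import List, Optional

# Illustrative, non-exhaustive compatibility table (an assumption for this sketch).
SITES_BY_CANCER = {
    "squamous cell carcinoma": {"skin", "lung", "oesophagus", "cervix"},
    "adenocarcinoma": {"breast", "lung", "colon", "prostate"},
}
ALL_SITES = set().union(*SITES_BY_CANCER.values())


def body_part_choices(selected_cancer: Optional[str]) -> List[str]:
    """Menu entries offered for [some body part], given the cancer chosen so far."""
    if selected_cancer is None:
        return sorted(ALL_SITES)          # nothing chosen yet: offer every site
    return sorted(SITES_BY_CANCER[selected_cancer])


print(body_part_choices("squamous cell carcinoma"))
```

Because incompatible sites are never offered, a query that contradicts the table cannot be built in the first place.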
Dealing with ambiguities
Once a query is constructed, there
is only one way it can be interpreted
– there is no disambiguation task to
be performed
… but users may be misled into
constructing a different query than
they intend to
Answer generation
The answer set consists of an age/gender breakdown of the
patients that fulfil the query requirements
Each additional clinical feature is combined with the age/gender
breakdown to provide more detailed information
3 types of rendering:
Text
Charts
Table
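The age/gender breakdown can be sketched as a simple count over the matching patients. The Python sketch below makes several assumptions: ten-year age bands, patient records as dictionaries, and the helper names `age_band`, `breakdown` and `as_table`. The sample records are invented, and only the table rendering is shown.

```python
from collections import Counter


def age_band(age: int, width: int = 10) -> str:
    """Label of the age band containing `age`, e.g. 43 -> "40-49"."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def breakdown(patients, feature=None) -> Counter:
    """Count patients per (age band, gender), optionally split by one more feature."""
    counts = Counter()
    for patient in patients:
        key = (age_band(patient["age"]), patient["gender"])
        if feature is not None:
            key += (patient[feature],)
        counts[key] += 1
    return counts


def as_table(counts: Counter) -> str:
    """Render the counts as one row per group."""
    return "\n".join(
        " | ".join(map(str, key + (n,))) for key, n in sorted(counts.items())
    )


# Invented records, standing in for the patients that satisfy a query.
patients = [
    {"age": 43, "gender": "F", "relapsed": True},
    {"age": 47, "gender": "F", "relapsed": False},
    {"age": 61, "gender": "M", "relapsed": False},
]
print(as_table(breakdown(patients)))
print(as_table(breakdown(patients, "relapsed")))
```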
Evaluation
Research questions:
Can the WYSIWYM query formulation
method be easily learned by users of CLEF?
Is it easier to formulate CLEF queries in
SQL or with the WYSIWYM query
formulation method?
Are the interactive feedback texts
ambiguous?
Evaluation results show that…
The CLEF Conceptual Authoring query interface
works!
The method is easily acquired.
Investigation shows that it is much easier to use
than current alternatives (viz. SQL).
The feedback texts tend to be easily understood
It is a viable solution for querying the CLEF
repository.
However ….
Unresolved issues
Are the queries we currently support
really the ones users will want to ask?
Does the query interface provide
sufficient data coverage?
Generating reports from the
CLEF repository
The context
We aim to generate reports from the data-encoded Electronic Patient Records
Our reports are aimed at clinicians for use at
the point of care
Various types of report work on the same
input (roughly the same content) but express
information from different viewpoints
We address the problem of conceptual
restatement in generating summarised
reports
Typical input
[Figure: the same per-patient database tables shown under "A typical cancer patient" (Problems, Interventions, Investigations, Drugs, Consults, Loci, Relations).]
Why are textual reports
needed?
Clinicians and other health professionals use patient health
summaries at the point of care, where time is a critical resource
Reports provide quick access to an overview of a patient’s
medical history
Typically, an electronic patient record contains around 1000
messages
Even structured, this volume of data is very large
Access to relevant information about particular patients is difficult
Textual reports:
are easy to read and understand
can be customised to the type of information needed
provide a quick way of identifying errors in the patient record
alleviate the need to know in detail the structure of the underlying
database
Why are paraphrases needed?
Alternative views of the patient record, i.e.,
Reports from various viewpoints:
Full chronological reports
Summaries of investigations, interventions,
treatments
Same content, different textual
representation
Potted summaries are also important (30-second overview of a patient’s history)
Content selection
• Two notions:
• Spine events: the main concepts in the summary (depending on the user-defined type of summary)
• Skeleton events: linked to the spine by various relations
• Basic procedure:
• Step 1: group linked events into clusters and remove small clusters
• Typically, a small number of very large clusters and a small number of small clusters
• Small clusters are assumed not to be related to the main topic of the summary
• Step 2: Identify spine events according to the type of summary (Longitudinal, Investigations, Interventions, Problems)
• Step 3: Identify the skeleton events
If (“problem is spine event” and “investigation has_indication problem”) then select investigation (unless already selected)
Repeat step 2 a certain number of times (given by a threshold parameter)
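The procedure above can be sketched in Python as follows. This is a simplified reading, not the CLEF algorithm: clusters are taken to be connected components of the event graph, any event linked to an already selected event is added to the skeleton (rather than applying one rule per relation type), and the parameters `min_cluster` and `depth` stand in for the cluster-size cut-off and the repetition threshold. The sample events, including the small unrelated cluster, are invented.

```python
from collections import defaultdict


def clusters(events, relations):
    """Connected components of the event graph, ignoring edge direction."""
    neighbours = defaultdict(set)
    for a, _rel, b in relations:
        neighbours[a].add(b)
        neighbours[b].add(a)
    seen, result = set(), []
    for start in events:
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node not in component:
                component.add(node)
                stack.extend(neighbours[node] - component)
        seen |= component
        result.append(component)
    return result


def select_content(events, relations, spine_type, min_cluster=3, depth=1):
    """events: id -> type; relations: (id, relation, id). Returns (spine, skeleton)."""
    # Step 1: keep only events that belong to a large enough cluster.
    kept = set()
    for cluster in clusters(events, relations):
        if len(cluster) >= min_cluster:
            kept |= cluster
    # Step 2: spine events are the kept events of the requested type.
    spine = {e for e in kept if events[e] == spine_type}
    # Step 3: add events linked to what is already selected, `depth` times.
    selected = set(spine)
    for _round in range(depth):
        frontier = set()
        for a, _rel, b in relations:
            if a in kept and b in kept:
                if a in selected and b not in selected:
                    frontier.add(b)
                if b in selected and a not in selected:
                    frontier.add(a)
        selected |= frontier
    return spine, selected - spine


events = {
    "pain": "problem", "lump": "problem", "cancer": "problem", "ulcer": "problem",
    "mammogram": "investigation", "biopsy": "investigation",
    "radiotherapy": "intervention",
    "flu jab": "intervention", "clinic visit": "consult",   # small unrelated cluster
}
relations = [
    ("mammogram", "has_indication", "pain"), ("mammogram", "has_finding", "lump"),
    ("biopsy", "has_target", "lump"), ("biopsy", "has_finding", "cancer"),
    ("radiotherapy", "treats", "cancer"), ("ulcer", "caused_by", "radiotherapy"),
    ("flu jab", "arranged", "clinic visit"),
]
spine, skeleton = select_content(events, relations, "problem")
print(sorted(spine), sorted(skeleton))
```

Changing `spine_type` to "investigation" yields a different spine over the same events, which is how the same record can support several summary types.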
Spine of Problem events
[Figure: event graph over pain, mammogram, lump, biopsy, breast, cancer, radiotherapy, radiotherapy cycle, ulcer and hyperbaric oxygenation, with the Problem events as spine.]
The patient identifies pain in the left
breast. A lump in the breast is found
through a mammogram.
A biopsy performed on the breast
reveals cancer in the left breast. The
patient receives radiotherapy to treat
the cancer. Skin ulceration develops in
the left breast as a result of
radiotherapy, which is treated with
hyperbaric oxygenation.
Interventions
[Figure: the same event graph, with the Intervention events as spine.]
Radiotherapy on the breast is
initiated to treat cancer in the
breast. A first radiotherapy cycle
is performed.
The radiotherapy causes skin
ulceration. The patient receives
hyperbaric oxygenation to treat
the ulcer.
Investigations
[Figure: the same event graph, with the Investigation events as spine.]
A mammogram is performed
because of pain in the left breast,
which identifies a lump in the
breast. A biopsy of the lump
identifies cancer in the left
breast.
[Figure: the Problem, Interventions and Investigations graphs shown together.]
Discourse structuring
Mostly given by relations in the EPR
19 different types of relations, which can be:
Attributive: Problem has_locus Locus
Rhetorical: Problem caused_by Intervention
Attributive relations do not contribute to the discourse
structure
In a first step, events linked through attributive
relations are combined:
Message_Problem+Message_Locus =>
Message_Problem_Locus
Messages are grouped according to type of summary:
Longitudinal: events occurring in the same week should
be grouped together and further grouped into years
Logical: arrange chronologically and then group similar
events (e.g., liver panels, screening consults)
Discourse structuring
Within each group:
link messages by discourse relations inferred from
EPR relations: Cause, Result, Sequence
assume a List relation if no relation specified
Between groups:
If all events in one group are linked to events in
another group by some EPR relation, link groups
through the corresponding discourse relation
Otherwise, assume a List relation
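The grouping and linking steps can be sketched in Python as below. The sketch rests on assumptions that are not in the slides: the mapping `DISCOURSE` from EPR relation names to discourse relations is hypothetical, a year is taken to be `week // 52`, and only consecutive messages within a group are linked. The function names are also made up for illustration.

```python
from collections import defaultdict

# Hypothetical mapping from EPR relation names to discourse relations.
# Attributive relations such as has_locus are deliberately absent.
DISCOURSE = {"caused_by": "Cause", "has_finding": "Result", "followed_by": "Sequence"}


def group_by_week(messages):
    """messages: (event_id, week) pairs. Returns {year: {week: [event_id, ...]}}."""
    grouped = defaultdict(lambda: defaultdict(list))
    for event_id, week in sorted(messages, key=lambda m: m[1]):
        grouped[week // 52][week].append(event_id)   # assumed year numbering
    return {year: dict(weeks) for year, weeks in grouped.items()}


def link_within_group(group, relations):
    """Link consecutive messages, falling back to List when no relation applies."""
    by_pair = {}
    for a, relation, b in relations:
        if relation in DISCOURSE:
            by_pair[frozenset((a, b))] = DISCOURSE[relation]
    return [
        (a, by_pair.get(frozenset((a, b)), "List"), b)
        for a, b in zip(group, group[1:])
    ]


messages = [("xray", 131), ("examination", 446), ("testing", 446), ("screening", 0)]
print(group_by_week(messages))

relations = [("ulcer", "caused_by", "radiotherapy"), ("ulcer", "has_locus", "breast")]
print(link_within_group(["radiotherapy", "ulcer", "oxygenation"], relations))
```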
Text structuring
Aggregation
Problems:
Problem_1:name HAS_LOCUS Locus_1
Problem_2:name HAS_LOCUS Locus_2
Problem_3 HAS_LOCUS {Locus_1, Locus_2}
Enlargement of the liver + Enlargement of the spleen
=> Enlargement of the liver and/but not of the spleen
Investigations:
Investigation_1:name HAS_INDICATION Problem_1
HAS_LOCUS Locus_1
Investigation_2:name HAS_INDICATION Problem_2
HAS_LOCUS Locus_2
Investigation_3 HAS_INDICATION
{Problem_1, Problem_2}
Examination of the abdomen revealed no enlargement of the liver
Examination of the lymphnodes revealed no lymphadenopathy
=> Examination revealed no enlargement of the liver and no
lymphadenopathy
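Aggregation of findings that share a name but differ in locus can be sketched as follows. This Python sketch is an illustration under assumptions: findings are given as `(name, locus, present)` triples, and the function names `of_list` and `aggregate_loci` are hypothetical. The wording of the three output patterns follows the examples on the slides.

```python
def of_list(loci, conjunction):
    """Join loci as "of the X <conjunction> of the Y"."""
    return f" {conjunction} ".join(f"of the {locus}" for locus in loci)


def aggregate_loci(findings):
    """findings: (name, locus, present) triples. Merge findings sharing a name."""
    by_name = {}
    for name, locus, present in findings:
        by_name.setdefault(name, []).append((locus, present))
    phrases = []
    for name, loci in by_name.items():
        present = [locus for locus, seen in loci if seen]
        absent = [locus for locus, seen in loci if not seen]
        if present and absent:
            phrases.append(
                f"{name} {of_list(present, 'and')} but not {of_list(absent, 'or')}"
            )
        elif present:
            phrases.append(f"{name} {of_list(present, 'and')}")
        else:
            phrases.append(f"no {name} {of_list(absent, 'or')}")
    return phrases


print(aggregate_loci([("enlargement", "liver", True), ("enlargement", "spleen", False)]))
```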
Text structuring
Aggregation
Interventions
Intervention_1 PART_OF Intervention_0
Intervention_2 PART_OF Intervention_0
{count} Intervention_1
[ID01]Chemotherapy cycle PART_OF [ID0]Chemotherapy
[ID02]Chemotherapy cycle PART_OF [ID0]Chemotherapy
[ID03]Chemotherapy cycle PART_OF [ID0]Chemotherapy
=> 3 chemotherapy cycles
Ellipsis
Examination of the left breast revealed no recurrent cancer
in the left breast =>
Examination of the left breast revealed no recurrent cancer
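Both operations can be sketched in a few lines of Python. The sketch assumes relations are `(child, relation, parent)` triples, pluralises naively by appending "s", and treats ellipsis as dropping a trailing locus phrase that repeats the locus already mentioned; the function names are hypothetical.

```python
from collections import Counter


def aggregate_parts(relations, names):
    """Count same-named interventions that are PART_OF the same parent."""
    counts = Counter()
    for child, relation, parent in relations:
        if relation == "PART_OF":
            counts[(parent, names[child])] += 1
    # Naive pluralisation, adequate only for this illustration.
    return [name if n == 1 else f"{n} {name}s" for (_parent, name), n in counts.items()]


def elide_repeated_locus(locus, finding):
    """Drop a trailing " in the <locus>" that repeats the examined locus."""
    suffix = f" in the {locus}"
    return finding[: -len(suffix)] if finding.endswith(suffix) else finding


def examination_sentence(locus, finding):
    return f"Examination of the {locus} revealed {elide_repeated_locus(locus, finding)}"


names = {cycle: "chemotherapy cycle" for cycle in ("ID01", "ID02", "ID03")}
relations = [(cycle, "PART_OF", "ID0") for cycle in names]
print(aggregate_parts(relations, names))
print(examination_sentence("left breast", "no recurrent cancer in the left breast"))
```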
Text structuring
Events can be compacted according to domain-specific rules:
Clinical examination is: examination of the liver, examination
of the spleen, examination of the abdomen
Clinical examination was normal
Clinical examination was normal apart from an
enlargement of the spleen
Clinical examination revealed enlargement of the spleen
Liver panel is: bilirubin concentration, ESR concentration,
GGT concentration
The liver panel was in the normal range (apart from a very
high level of GGT)
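A domain-specific compaction rule of this kind can be sketched in Python as follows. The panel definition mirrors the liver panel listed above, with test names spelled as in the sample report (bilirubin, GGT); the dictionary `PANELS`, the function `compact_panel` and the convention that each result is either "normal" or a level such as "very high" are assumptions made for this sketch.

```python
# Panel membership as listed on the slide; the data structure is an assumption.
PANELS = {
    "liver panel": ("bilirubin concentration", "ESR concentration", "GGT concentration"),
}


def compact_panel(panel, results):
    """results: test name -> "normal" or an abnormal level such as "very high"."""
    members = PANELS[panel]
    if any(test not in results for test in members):
        return None   # incomplete panel: report the tests individually instead
    abnormal = [(test, results[test]) for test in members if results[test] != "normal"]
    sentence = f"The {panel} was in the normal range"
    if abnormal:
        exceptions = " and ".join(f"a {level} level of {test}" for test, level in abnormal)
        sentence += f" apart from {exceptions}"
    return sentence + "."


results = {
    "bilirubin concentration": "normal",
    "ESR concentration": "normal",
    "GGT concentration": "very high",
}
print(compact_panel("liver panel", results))
```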
Maintaining the thread of
discourse
Textual representation should reflect the relative
importance of events
At discourse level: spine concepts are preferably
realised in nuclear units and skeleton events in
satellite units
At sentence level: spine events are assigned salient
syntactical roles
The status of an event of being on the spine or on the
skeleton determines its realisation as a sentence, a
main or subordinate clause, phrase
Typical output of the NL generator
Long chronological report
Year 1
Week 0
A mammography screening was scheduled at the clinic.
Week 1
Primary cancer of the right breast; histopathology: invasive tubular adenocarcinoma.
YEAR 2
Week 131
Xray revealed no cancer of the right breast.
YEAR 5
Week 287
Xray revealed no cancer of the right breast.
YEAR 8
Week 443
Xray revealed cancer of the right breast.
Week 446
Examination (indicated by primary cancer of the right breast) revealed no enlargement of the liver or of the spleen, no recurrent cancer of the right
breast and no lymphadenopathy of the right axillary lymphnodes.
Testing (indicated by primary cancer of the right breast) revealed no abnormality of the haemoglobin concentration and no abnormality of the
leucocyte count.
An Xray (indicated by primary cancer of the right breast) was performed.
Very high level of the ESR concentration.
Very high level of the Creatinine concentration.
Very high level of the Alkaline Phosphatase concentration.
Very high level of the Bilirubin concentration.
Very high level of the GGT concentration.
No abnormality of the platelet count.
Week 449
An initial treatment planning was completed at the clinic.
Excision biopsy revealed no metastatic lymphnode count of the right axilla.
Histopathology revealed primary cancer of the right breast.
Cancer staging revealed stage1 cancer.
Hormone antagonist therapy was started to treat primary cancer of the right breast.
Lumpectomy was performed on the breast to treat primary cancer of the right breast.
Primary treatment package was started to treat primary cancer of the right breast.
………………….
YEAR 17
Week 893
Xray revealed no cancer of the right breast.
Typical output of the NL generator
Compact reports
Focus on Problems
Focus on Interventions
In week 0, the patient is diagnosed with primary cancer of the right
breast, histopathology: invasive tubular adenocarcinoma.
In week 0, the patient is diagnosed with primary cancer of the
right breast, histopathology: invasive tubular adenocarcinoma.
In weeks 131 and 287 Xray revealed no cancer of the right breast.
In week 449, excision biopsy revealed no metastatic lymphnode
count of the right axilla. Histopathology revealed primary cancer
of the right breast. Lumpectomy was performed on the right
breast. Hormone antagonist therapy was started to treat primary
cancer of the right breast.
In week 446, there was no enlargement of the liver or of the spleen,
no recurrent cancer of the right breast and no lymphadenopathy of
the right axillary lymphnodes revealed by examination. There was no
abnormality of the haemoglobin concentration or of the leucocyte
count, no abnormality of the platelet count, very high level of the
GGT concentration, of the Bilirubin concentration, of the Alkaline
Phosphatase concentration, of the Creatinine concentration or of the
ESR concentration.
In week 449, excision biopsy revealed no metastatic lymphnode count
of the right axilla. Histopathology revealed primary cancer of the
right breast. Lumpectomy was performed on the right breast.
Hormone antagonist therapy was initiated to treat primary cancer of
the right breast.
In weeks 457 to 737, there was no enlargement of the liver or of the
spleen, no recurrent cancer of the right breast and no
lymphadenopathy of the right axillary lymphnodes. There was no
abnormality of the haemoglobin concentration or of the leucocyte
count, no abnormality of the platelet count, very high level of the
GGT concentration, of the Bilirubin concentration, of the Alkaline
Phosphatase concentration, of the Creatinine concentration and of
the ESR concentration.
In weeks 457 to 893, Xray revealed no cancer of the right breast.
Focus on Investigations
In week 0, the patient is diagnosed with primary cancer of the right breast,
histopathology: invasive tubular adenocarcinoma.
In weeks 131 and 287 Xray revealed no cancer of the right breast.
In week 446, examinations revealed no enlargement of the liver or of the spleen, no
recurrent cancer of the right breast and no lymphadenopathy of the right axillary
lymphnodes. Testing revealed no abnormality of the haemoglobin concentration or
of the leucocyte count, no abnormality of the platelet count, very high level of the
GGT concentration, of the Bilirubin concentration, of the Alkaline Phosphatase
concentration, of the Creatinine concentration or of the ESR concentration.
In week 449, excision biopsy revealed no metastatic lymphnode count of the right
axilla. Histopathology revealed primary cancer of the right breast.
In weeks 457 to 737, examinations revealed no enlargement of the liver or of the
spleen, no recurrent cancer of the right breast and no lymphadenopathy of the
right axillary lymphnodes. Testing revealed no abnormality of the haemoglobin
concentration or of the leucocyte count, no abnormality of the platelet count, very
high level of the GGT concentration, of the Bilirubin concentration, of the Alkaline
Phosphatase concentration, of the Creatinine concentration and of the ESR
concentration.
In weeks 457 to 893, Xray revealed no cancer of the right breast.
Ongoing work on report
generation
Add domain-specific knowledge to
improve content selection
Some events become important
depending on context
Change the (sub-)domain
Test if the generation method is easily
portable
Link NLG to IR to improve IR
Produce reports for patients
Summary and Conclusions
CLEF is now entering the integration
phase, moving towards testing and
deployment
Major emphases at this point are on
privacy and security
Informing patients is a major thread for
future work.
Integrating IE and NLG
Thank You!
Collaborators:
Catalina Hallett
Richard Power
Evaluation procedure
Subjects:
We tested the performance of 15 subjects.
Subjects had a range of expertise in the CLEF domain, from expert (oncologist) to novice (computer scientist), but
most subjects had some medical training.
Subjects had no previous experience with the CLEF
WYSIWYM query interface, but most were aware of its
fundamental principles.
Methodology:
Subjects were given a set of four fixed queries to formulate
using the CLEF WYSIWYM query interface.
The queries were expressed in language as different as
possible from the language in the query interface.
Each subject received the queries in a different order.
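The slide states only that each subject saw the queries in a different order; how the orders were assigned is not described. A minimal sketch of one way to do it, assuming hypothetical query labels Q1 to Q4 and a random draw without replacement from the 24 possible orders:

```python
import itertools
import random

QUERIES = ["Q1", "Q2", "Q3", "Q4"]  # hypothetical labels for the four fixed queries
N_SUBJECTS = 15                     # number of subjects reported on the slide


def assign_orders(queries, n_subjects, seed=0):
    """Give each subject a distinct presentation order of the queries."""
    all_orders = list(itertools.permutations(queries))  # 4! = 24 possible orders
    if n_subjects > len(all_orders):
        raise ValueError("not enough distinct orders for every subject")
    rng = random.Random(seed)  # fixed seed so the assignment is reproducible
    return rng.sample(all_orders, n_subjects)  # sampling without replacement


orders = assign_orders(QUERIES, N_SUBJECTS)
for subject, order in enumerate(orders, start=1):
    print(subject, " ".join(order))
```

With four queries there are 24 possible orders, so 15 subjects can each receive a distinct one.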
Evaluation – data analysis
We recorded
the time taken to compose each query.
the number of operations used for constructing a
query and compared it with the optimal number of
operations (pre-computed).
We analysed whether performance, as indicated by
Speed
Efficiency
improves with training (experience).
Evaluation results
Time to completion
After their first experience of composing a query, subjects’ completion time
halved and then stayed at that level.
Subjects’ performance improved dramatically with experience.
[Figure: Time to completion. x-axis: Order of query (1 to 4); y-axis: Time (mins), 0 to 7]
Evaluation results
Performance over time: performance normalised over
complexity
After just one go with the CLEF interface, subjects are highly proficient
in their ability to compose complex queries.
By the time they get to their fourth query, subjects’ performance is
almost perfect.
[Figure: Operations by order of query. x-axis: Order of query (1 to 4); y-axis: Operations, (total - optimal) / optimal, 0 to 0.5. Mean: 0.18]
Optimal operation = min # of operations needed to compose the query perfectly.
This is a measure of the complexity of the query.
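The efficiency measure plotted on this slide, (total - optimal) / optimal, can be sketched as follows. The function name and the operation counts are illustrative assumptions, not data from the study:

```python
def excess_operations(total_ops, optimal_ops):
    """Return (total - optimal) / optimal; 0.0 means a perfectly efficient query."""
    if optimal_ops <= 0:
        raise ValueError("optimal_ops must be positive")
    return (total_ops - optimal_ops) / optimal_ops


# Hypothetical counts for one subject: (operations used, optimal operations)
attempts = [(15, 10), (12, 10), (22, 20), (8, 8)]
scores = [excess_operations(total, optimal) for total, optimal in attempts]
print(scores)
```

Dividing by the optimal count normalises for query complexity, so a subject who needs two extra operations is penalised less on a 20-operation query than on a 10-operation one.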
Evaluation – comparison with
SQL
Very small scale experiment
Two subjects:
with expert knowledge of the structure,
organisation and content of the CLEF database
highly skilled users of SQL
with minimal experience with WYSIWYM
were given access to the SNOMED and ICD codes
required to build the SQL
Each subject composed a query first in the
CLEF WYSIWYM Interface and then in SQL
Evaluation – comparison with
SQL
Subject 1 – Query 1
WYSIWYM: 2.3 mins
SQL: 8.5 mins (incomplete)
Subject 2 – Query 2
WYSIWYM: 4.5 mins
SQL: 12 mins (incomplete)
[Figure: bar chart of WYSIWYM and SQL times for Subject 1 and Subject 2; y-axis 0 to 12]
Even with a slowly reacting interface, the subjects were much faster composing
queries in WYSIWYM than in SQL
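As a worked check on "much faster", the ratios can be computed from the four timings reported above. Because both SQL attempts were left incomplete, the ratios are lower bounds; the variable names are illustrative:

```python
# Minutes taken, as reported on the slide; both SQL attempts were incomplete.
timings = {
    "Subject 1": {"wysiwym": 2.3, "sql": 8.5},
    "Subject 2": {"wysiwym": 4.5, "sql": 12.0},
}

# Ratio of SQL time to WYSIWYM time for each subject (a lower bound).
ratios = {name: t["sql"] / t["wysiwym"] for name, t in timings.items()}

for name, ratio in ratios.items():
    print(f"{name}: SQL took at least {ratio:.1f}x as long as WYSIWYM")
```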
Are the feedback texts
ambiguous to the users?
Identified 6 types of
ambiguity
4 examples of each, with
forced-choice judgements by
15 subjects
Random judgements would give
a score of 33%
Results show 84% correct
judgements
Summarisation of the repository
Summary patient records for clinicians and medical researchers
Summary patient records for patients
Linear text
Hypertext
Animated dialogue
Sample report for Clinicians
In the weeks 195 to 196, self examination revealed lump of the right
breast.
In week 197, self examination revealed lump of the right breast.
Excision biopsy revealed metastatic lymphnode count of the right axilla.
Histopathology revealed cancer of the right breast. Cancer staging
revealed stage2 cancer. Radical mastectomy was performed on the
breast to treat the primary cancer. The patient was diagnosed with
metastatic lymphnode count of the right axilla; 19 nodes involved out of
24. The patient was diagnosed with metastatic cancer of the right
axilla; histopathology: invasive undifferentiated adenocarcinoma. The
patient was diagnosed with cancer of the right breast; histopathology:
invasive undifferentiated adenocarcinoma. The patient was diagnosed
with stage2 cancer; histopathology: invasive undifferentiated
adenocarcinoma. Primary treatment package was initiated to treat
primary cancer of the right breast.
…
Sample report for Patients
You had a consultation with your doctor on September 20th 1993.
On September 27th you did a self examination and you found that you
had a lump in your right breast. A self examination is an examination of
the breasts by running your hand over each breast and up under your
arms and checking for changes to their size, shape or feel.
On October 4th you did another self examination and you found that
you still had a lump in your right breast.
On October 11th you had a radical mastectomy to treat cancer in your
right breast. A radical mastectomy is an operation to remove the breast,
along with the lymph glands under the arm and the muscles of the
chest wall. Cancer is a tumour that tends to spread, both locally and to
other parts of the body.
…
Presenting
patient records
in hypertext:
dividing the
text into
related units
You had a consultation with your
doctor on September 20th 1993.
SEQUENCE
On September 27th you did a
self examination.
HAS-FINDING
you found that you had a
lump in your right breast.
SEQUENCE
On October 4th you did another
self examination.
A self examination is an examination
of the breasts by running your hand
over each breast and up under your
arms and checking for changes to
their size, shape or feel.
SEQUENCE
to treat cancer in your
right breast.
MOTIVATION
On October 11th you had a
radical mastectomy.
Cancer is a tumour that tends
to spread, both locally and to
other parts of the body.
A radical mastectomy is an operation
to remove the breast, along with the
lymph glands under the arm and the
muscles of the chest wall.
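The division of a report into related units can be sketched as a small data structure. This is an illustration only, not the CLEF implementation: the class and function names are hypothetical, and the DEFINITION label for the explanatory units is an assumption, since the slide labels only SEQUENCE, HAS-FINDING and MOTIVATION.

```python
from dataclasses import dataclass, field


@dataclass
class Unit:
    """One text unit, plus (relation name, Unit) pairs that depend on it."""
    text: str
    satellites: list = field(default_factory=list)


def linearise(sequence):
    """Flatten a SEQUENCE of units and their satellites into linear text."""
    parts = []
    for unit in sequence:
        parts.append(unit.text)
        for _relation, satellite in unit.satellites:
            parts.append(satellite.text)
    return " ".join(parts)


consultation = Unit("You had a consultation with your doctor on September 20th 1993.")
examination = Unit(
    "On September 27th you did a self examination.",
    [("HAS-FINDING", Unit("You found that you had a lump in your right breast."))],
)
surgery = Unit(
    "On October 11th you had a radical mastectomy.",
    [
        ("MOTIVATION", Unit("The radical mastectomy was done to treat cancer in your right breast.")),
        # DEFINITION is an assumed label; the slide leaves this link unnamed.
        ("DEFINITION", Unit("Cancer is a tumour that tends to spread, both locally and to other parts of the body.")),
    ],
)

report = linearise([consultation, examination, surgery])
print(report)
```

Keeping the relations explicit means the same units can be rendered as linear text, as here, or given to a hypertext or animated presentation that shows satellites only on demand.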
Presenting
patient records
in hypertext:
giving graphical
attributes to
the text units
Presenting
patient records
in hypertext:
using animation to
represent
discourse patterns
dynamically
You had a consultation with your doctor on September 20th 1993.
On September 27th you did a self examination.
You found that you had a lump in your right breast.
A self examination is an examination of the breasts by running your hand
over each breast and up under your arms and checking for changes to
their size, shape or feel.
On October 4th you did another self examination.
On October 11th you had a radical mastectomy.
A radical mastectomy is an operation to remove the breast, along with the
lymph glands under the arm and the muscles of the chest wall.
The radical mastectomy was done to treat cancer in your right breast.
Cancer is a tumour that tends to spread, both locally and to
other parts of the body.
Monologues/Dialogues
Monologue
Autonomous agent reads
the generated report
Aims: accessibility,
education (not translation)
Dialogue
Report is generated as a
script that 2 agents act out
Aims: accessibility,
vicarious learning
Example (video clip)