cLuepkes-ICICTH2012

Supporting Healthcare Professionals by providing
suggestions based on comparable analysis results of
previous cases
Christian Lüpkes
OFFIS e.V., Escherweg 2, 26121 Oldenburg, Germany
[email protected]
Abstract. Within the EU-funded research project iCARDEA, patient data are
collected from different sources and then integrated into a standardized and
harmonized central data repository. This enables new forms of support for
healthcare professionals, in particular the use of data analysis techniques to generate
statistically valid patterns that provide suggestions for the currently treated patient based on previous cases.
Since patient data are collected over a longer period of time, the classifications
used for documentation change frequently. In the evaluating clinic in
Austria, for example, patient diagnoses are documented with the ICD-10-GM
classification, which is updated every year.
This causes so-called semantic shifts within the data, meaning that the same
code has a (slightly) different interpretation from one version to another. The
codes of a pattern therefore have to be adapted to the classification version currently
in use so that current patient data can be correctly correlated with older patterns.
Therefore, a general approach and an implementation are provided which are capable of showing the evolution of codes as a graph and of indicating the various meanings
of the same code in different versions. This is enabled by storing different classifications together with the officially provided transition rules and using these to
calculate comparable result sets.
With the introduced graph-based approach, users are able to decide which code or
set of codes correctly reflects the meaning of the pattern. In this way, patterns that were derived from historical patient data coded with different classifications can be adapted to the current environment and applied to current patients.
Keywords. Data analysis in healthcare, semantic shift, information support
1 Project Background
The iCARDEA system aims to automate and personalize the follow-up
of cardiac arrhythmia patients who have a CIED implant with computer-interpretable
clinical guideline models, using standard device interfaces
and integrating patient EHRs. One goal is to support the healthcare actor by providing an integrated view of all available patient-related information,
including patterns based on data analysis of historical patient cases. These statistically valid patterns are used for making suggestions to healthcare actors and are also useful for understanding patients' treatments. The patterns carry knowledge from previous treatments and should be displayed to the healthcare actor if the
information of a current patient is similar to the values of a pattern.
2 iCARDEA architecture
Figure 1 gives a brief overview of the overall architecture and the environment in
which iCARDEA provides interoperability services. The details and
functionality of each module are described in
[1,2].
Figure 1 Brief iCARDEA Architecture Overview
The modules have the following functions:
• CIED Telemonitoring: interoperability interface exposing the CIED /
ICD data from reports of different vendors in IEEE 11073 nomenclature and the HL7v2 standard with the IHE IDCO profile.
• EHR and PHR: integrating clinical EHR and external PHR using the
appropriate standards such as CM, XDS and XPHR.
• Adaptive Careplan Engine: operates as a guideline-based clinical
decision support system together with the care plan monitoring tool.
• Data analysis and correlation mechanism: providing suggestions
based on data analysis of medical knowledge bases to help the healthcare
team.
Patient data obtained from the different sources (CIED reports, EHR
and PHR systems) via the interoperability layers are stored in a single
data repository in an integrated and standardized format for immediate
or later use by all iCARDEA components.
This enables healthcare actors to receive information on previous
treatments, represented by patterns, when treating a patient whose parameters are similar. For the generation of patterns, historical patient data
were obtained from the hospital information system of the Austrian
clinic where iCARDEA is evaluated. These data were used because they are
semantically compatible with the target environment. In addition, the data
and the derived analysis results are trusted by the healthcare team that
evaluates iCARDEA.
3 Data analysis of clinical data
Data analysis is a process with the goal of highlighting information,
suggesting conclusions and supporting decision making. It has multiple
approaches and encompasses diverse techniques under a variety of
names, like Data Mining for predictive purposes or Online Analytical
Processing (OLAP) for aggregation and exploration of data [3].
3.1 Medical data sources for analysis
The underlying data for data analysis must be reliable, approved, legally and technically available and potentially useful for the question to be
answered [4].
• Reliable data: the data must originate from a trustworthy environment and must
be based on real cases.
• Approved data: the facts, conclusions and actions represented by the data
must be approved by domain experts so that appropriate conclusions can be derived.
• Available data: no legal regulation may prohibit access to or use of the
data, and the data must be in a computer-interpretable electronic
format.
• Potentially useful data: the patient data must originate from medical
cases similar to those of the envisioned target environment.
Most of these requirements are fulfilled when using historical patient
cases from the clinic where the data analysis results are to be used.
Only availability is problematic, since not all legacy hospital
information systems provide all information in a structured attribute-value
format.
Other possible knowledge bases, such as the MIMIC II dataset from
PhysioNet [5], lack the approval and especially the reliability of
the data items. This lowers trust and raises objections about the
meaning of the analysis results. Therefore, within the iCARDEA project, the medical data of 200 patients with implants treated between
2007 and 2011 were obtained from the evaluation clinic.
3.2 Data preparation for analysis and modelling
The patient data were integrated into a dedicated data analysis database,
known as a data warehouse system. This serves as the data source for all
data analysis tasks [6]. Here, the data were stored and quality assured.
This means that missing data items are treated and that the syntax is harmonized
and checked for validity according to the classifications used, e.g.
ICD-10-GM (International Classification of Diseases, German Modification) for diagnoses.
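The syntax harmonization and validity check described above can be illustrated with a minimal sketch. It is not the iCARDEA implementation: the function names, the regular expression and the small excerpt of valid codes are assumptions made for illustration, and suffix markers that can accompany ICD-10-GM codes are not handled.

```python
import re

# Hypothetical excerpt of valid codes for one classification version;
# a real check would load the complete official code list.
VALID_CODES = {"I10.00", "I25.13", "I50.14", "E11.90", "E78.0"}

# One letter, two digits, then optionally a dot and one or two further digits.
CODE_SHAPE = re.compile(r"^([A-Z]\d{2})(?:\.?(\d{1,2}))?$")


def normalize_code(raw):
    """Return the code in the form 'X00.0', or None if the syntax is unusable."""
    if raw is None:  # missing data item
        return None
    match = CODE_SHAPE.match(raw.strip().upper())
    if match is None:
        return None
    category, subcode = match.groups()
    return f"{category}.{subcode}" if subcode else category


def quality_assure(raw_codes, valid_codes=VALID_CODES):
    """Split raw entries into accepted codes and rejected entries."""
    accepted, rejected = [], []
    for raw in raw_codes:
        code = normalize_code(raw)
        if code is not None and code in valid_codes:
            accepted.append(code)
        else:
            rejected.append(raw)
    return accepted, rejected


accepted, rejected = quality_assure([" i10.00", "E780", None, "I50.14", "XYZ"])
```

In this sketch, rejected entries are only collected; how missing or invalid items are actually treated (correction, imputation or exclusion) is a separate decision that the text does not specify.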
These quality-assured data were then modeled into analysis-specific data
representations. For OLAP, so-called cubes were defined to explore the
patient data, giving clues about the patient distribution, the age at first
visit to the cardiological department and the number of diagnoses. For
association rule data mining, different propositionalized relations
were created to generate hypotheses about the correlation of age, gender and different diagnoses.
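As a rough illustration of what such a cube provides, the following sketch aggregates a few patient rows along freely chosen dimensions. The rows, the dimension names and the decade level for age are invented for this example; an OLAP server would offer the same kind of roll-up over the data warehouse.

```python
from collections import defaultdict

# Hypothetical fact rows, one per patient (all values invented).
FACTS = [
    {"gender": "m", "age_first_visit": 67, "diagnoses": 4},
    {"gender": "f", "age_first_visit": 72, "diagnoses": 2},
    {"gender": "m", "age_first_visit": 58, "diagnoses": 3},
    {"gender": "m", "age_first_visit": 63, "diagnoses": 1},
]


def decade(age):
    """Dimension level (assumed): map an age to the start of its decade."""
    return age // 10 * 10


def roll_up(facts, dimensions):
    """Aggregate patient count and diagnosis count per dimension combination."""
    cells = defaultdict(lambda: {"patients": 0, "diagnoses": 0})
    for row in facts:
        key = tuple(
            decade(row["age_first_visit"]) if dim == "age_decade" else row[dim]
            for dim in dimensions
        )
        cells[key]["patients"] += 1
        cells[key]["diagnoses"] += row["diagnoses"]
    return dict(cells)


# Fine-grained view and a rolled-up view of the same facts.
by_gender_and_age = roll_up(FACTS, ["gender", "age_decade"])
by_gender = roll_up(FACTS, ["gender"])
```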
3.3 Data analysis results and usage
The results from OLAP are charts and navigable Excel sheets that present
statistics of the historical patient cases. They reveal deviations from expected value
distributions and provide knowledge about the patients whose data were used for the data
analysis tasks. This information helps healthcare actors to understand
their patients and to assess and interpret other analysis results,
such as the patterns created by data mining.
The predictive patterns were obtained by association rule mining. Association rule mining identifies sets of patient data items that frequently occur together; its results follow the scheme “antecedent
→ consequent” or, more naturally, “prerequisite → conclusion” [7]. To
help interpret the results, the number of distinct patients fulfilling the
prerequisite as well as the number of patients fulfilling the whole rule and the derived percentage are presented for each rule. Age, gender and the different kinds of diagnoses
were used as data items. To avoid overspecialized patterns, age was binned into
five ranges and the diagnoses, coded in ICD-10-GM, were aggregated to
the three-character categories. The results are shown in Figure 2.
Figure 2 Patterns obtained from association rule mining
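The preparation steps and the figures reported per rule can be sketched as follows. This is a minimal illustration and not the mining procedure used in the project: it evaluates a single given candidate rule instead of enumerating frequent item sets, the patient records are invented, and the bin boundaries are assumptions, since only the number of five ranges is stated.

```python
# Hypothetical patient records (all values invented for illustration).
PATIENTS = [
    {"age": 71, "diagnoses": ["I10.00", "I50.14", "E11.90"]},
    {"age": 64, "diagnoses": ["I10.00", "I25.13"]},
    {"age": 58, "diagnoses": ["I10.90", "I50.13", "E78.0"]},
    {"age": 77, "diagnoses": ["I25.11", "I50.14"]},
]

# Assumed bin boundaries; the text only states that five ranges were used.
AGE_BINS = [(0, 39), (40, 54), (55, 64), (65, 74), (75, 150)]


def age_bin(age):
    """Map an age to the label of its range."""
    for low, high in AGE_BINS:
        if low <= age <= high:
            return f"age:{low}-{high}"
    raise ValueError(f"age out of range: {age}")


def to_items(patient):
    """Propositionalize one patient: age range plus three-character categories."""
    items = {age_bin(patient["age"])}
    items.update(code[:3] for code in patient["diagnoses"])
    return items


def rule_statistics(transactions, antecedent, consequent):
    """Return (patients with prerequisite, patients with whole rule, confidence)."""
    n_antecedent = sum(1 for t in transactions if antecedent <= t)
    n_rule = sum(1 for t in transactions if (antecedent | consequent) <= t)
    confidence = n_rule / n_antecedent if n_antecedent else 0.0
    return n_antecedent, n_rule, confidence


TRANSACTIONS = [to_items(p) for p in PATIENTS]
# Candidate rule "I10 -> I50", evaluated on the invented records above.
stats = rule_statistics(TRANSACTIONS, {"I10"}, {"I50"})
```

For the invented records, three patients fulfil the prerequisite I10 and two of them also have I50, so the derived percentage (confidence) of the rule is two thirds.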
From a technical point of view, it is remarkable that the attribute “gender” is
not included in any of the 50 rules with the highest confidence. The
near absence of age was also unexpected.
Also remarkable is the low diversity of the ICD-10 codes used. Of the
253 diagnosis groups that occur in the data, only the six
groups E11, E78, I10, I25, I47 and I50 appear together. The other 247
ICD groups do not occur significantly often together with other codes. Since I10 to I50
are cardiac diseases, this increased occurrence was expected. E11 and
E78 (diabetes and hyperlipidaemia) appear to be the two most
commonly observed risk diagnoses.
These patterns are stored within the iCARDEA environment at the
evaluation clinic in Austria. When a healthcare actor views the
data of a currently treated patient, provided by the EHR, PHR and
CIED interoperability and integration layer, the data items are checked
for similarity to the prerequisites of the stored patterns. If a pattern is
suitable for the patient, the healthcare actor is prompted to consider the conclusion of the pattern for this patient. In this way, the
healthcare actor receives suggestions based on the information carried over
from the treatment of previous cases, and thus benefits from knowledge about
regularities in earlier treatments.
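The check of current patient data against stored patterns can be sketched as below. The text does not define the similarity measure; this sketch assumes the simplest reading, namely that a pattern is suitable when all items of its prerequisite are present in the patient data. The class, the function name and the confidence values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pattern:
    antecedent: frozenset  # prerequisite items
    consequent: frozenset  # conclusion items
    confidence: float


# Hypothetical stored patterns (confidence values invented).
PATTERNS = [
    Pattern(frozenset({"I10", "E11"}), frozenset({"I25"}), 0.70),
    Pattern(frozenset({"I47"}), frozenset({"I50"}), 0.60),
]


def suggestions(patient_items, patterns):
    """Return patterns whose prerequisite is fully contained in the patient
    data and whose conclusion is not yet documented for the patient,
    ordered by descending confidence."""
    hits = [
        p for p in patterns
        if p.antecedent <= patient_items and not p.consequent <= patient_items
    ]
    return sorted(hits, key=lambda p: p.confidence, reverse=True)


current_patient = {"I10", "E11", "E78"}
result = suggestions(current_patient, PATTERNS)
```

Skipping patterns whose conclusion is already documented is a design choice of this sketch; a system could equally display them as confirmation.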
4 Evolution of classifications causing semantic shift
The problem of different and changing classifications arises in data
analysis tasks because data analysis is intended to make statements about
the meaning of data [8]. To make useful suggestions, the meaning of
data elements within patterns and within current patient data must be comparable.
In data analysis, special metadata, so-called dimensions, are used to describe the meaning of data elements. In the medical field, the taxonomy
of ICD-10 is often used as a dimension.
As stated above, ICD-10-GM is used for clinical documentation in Austrian
healthcare as well as in Germany [9,10]. This classification is updated every year and provided together with transition rules
between the old and new versions. The updates cover the insertion of
newly discovered diseases and the regrouping or deletion of diagnoses.
Since we used patient data from 2007 to 2011, five different ICD-10-GM versions occur within the analyzed data. When the patterns are applied, however, current patient data are coded according to the newest
classification in use at the evaluation clinic. Therefore, the patterns have to be adapted to the version currently used at the clinic.
To deal with this problem, a prototype was developed in
iCARDEA that shows how the coding of a specific disease evolved and also
indicates whether a code is still available. Figure 3 shows the prototype with
two completely different meanings of the ICD-10-GM code C83.3 (a type of
cancer). From 2004 to 2010 the meaning of C83.3 was stable, but
in 2011 it moved to C85.2. Conversely, if the meaning that C83.3 has in 2011 is of interest, the code C83.4 has to be taken as the
reference value for the years 2004 to 2010.
Figure 3 Prototype showing different meanings of ICD-10-GM code C83.3
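The code evolution in this example can be sketched as a graph whose nodes are (year, code) pairs and whose edges are transition rules between consecutive versions. The data structure and function names are assumptions for illustration; the edges contain only what the example states, namely that C83.3 and C83.4 were stable from 2004 to 2010 and changed in 2011.

```python
from collections import defaultdict

FORWARD = defaultdict(set)   # (year, code) -> {(year + 1, successor code), ...}
BACKWARD = defaultdict(set)  # (year, code) -> {(year - 1, predecessor code), ...}


def add_transition(year, old_code, new_code):
    """Register one transition rule from version `year` to version `year + 1`."""
    FORWARD[(year, old_code)].add((year + 1, new_code))
    BACKWARD[(year + 1, new_code)].add((year, old_code))


# Stable years as described in the example (versions 2004 to 2010).
for year in range(2004, 2010):
    add_transition(year, "C83.3", "C83.3")
    add_transition(year, "C83.4", "C83.4")
# Changes with the 2011 version.
add_transition(2010, "C83.3", "C85.2")
add_transition(2010, "C83.4", "C83.3")


def trace(edges, year, code, target_year):
    """Follow the edges from (year, code) until target_year is reached and
    return the set of codes found there (empty if the code has no counterpart)."""
    frontier = {(year, code)}
    while frontier and next(iter(frontier))[0] != target_year:
        frontier = {nxt for node in frontier for nxt in edges.get(node, ())}
    return {c for _, c in frontier}


# Where did the 2007 meaning of C83.3 go, and what did C83.3 (2011) mean before?
meaning_moved_to = trace(FORWARD, 2007, "C83.3", 2011)
reference_in_2007 = trace(BACKWARD, 2011, "C83.3", 2007)
```

A result set with more than one code would correspond to a split or merge of codes, which is the situation in which the user has to decide which code or set of codes reflects the intended meaning.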
For the suggestion system this means: if a pattern has the code C83.3
in its prerequisite, the pattern has to be shown to the healthcare professional when the current patient dataset contains C85.2, because the pattern was
created from patient data recorded before 2011 and the corresponding meaning of
the code is now represented by C85.2.
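A minimal sketch of this adaptation step is given below. It assumes that the transition rules have already been resolved into a direct mapping from the version a pattern was created on to the current version; the mapping contains only the example from the text, and the second prerequisite code and the patient data are invented. Codes with no counterpart or with several candidates are not adapted automatically but returned for a user decision.

```python
# Assumed mapping from the pattern's original version to the current version.
# The single entry is the example from the text.
TO_CURRENT = {"C83.3": {"C85.2"}}


def adapt_prerequisite(codes, mapping):
    """Translate the codes of a prerequisite to the current version.

    Codes without a rule are assumed unchanged. Codes with zero or several
    candidates are returned separately because a user has to decide.
    """
    adapted, needs_review = set(), {}
    for code in codes:
        candidates = mapping.get(code, {code})
        if len(candidates) == 1:
            adapted |= candidates
        else:
            needs_review[code] = candidates
    return adapted, needs_review


# Hypothetical pattern prerequisite created on pre-2011 data.
pattern_prerequisite = {"C83.3", "I10.00"}
adapted, needs_review = adapt_prerequisite(pattern_prerequisite, TO_CURRENT)

# Hypothetical current patient, coded with the 2011 version.
patient_2011 = {"C85.2", "I10.00", "E11.90"}
show_pattern = not needs_review and adapted <= patient_2011
```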
The graph-based approach was chosen because dimensions
can always be represented as graphs, and graphs are easy to understand for
specialists using data analysis tools [11]. In the iCARDEA project,
this tool was first used to harmonize the patient data to one version of
ICD-10-GM and then to adapt the analysis results so that they are comparable with
the target environment.
5 Summary
This paper presented the approach of iCARDEA to provide analysis results of historical patient data as suggestions to healthcare actors when
treating a new, similar patient. The similarity of a patient to previous
cases depends on the compatibility and congruence of the patient
data. Since the classifications used for coding the patient data change
regularly, a graph-based tool for mapping data was presented that visualizes
the evolution of codes. Using this tool, the patterns were updated to
a version compatible with the target environment. This ensures that
patterns representing knowledge carried over from past treatment processes can be used for current patients while keeping
the same meaning.
Acknowledgements
The research leading to these results received funding from the European Community’s 7th Framework Programme (FP7/2007-2013) under
Grant Agreement no. ICT-248240.
References
1. Yang, M; Lüpkes, C; Dogac, A; Yuksel, M; Tunçer, F; Namlı, T; Plössing, M; Ulbts, J;
Eichelberg, M: Interoperability Challenges in the Health Management of Patients with Implantable Defibrillators. Computing in Cardiology 2010, ISSN 0276-6574
2. Laleci, G; Dogac, A; Yuksel, M; Kabak, Y; Arbelo, E; Danmayr, F; Hinterbuchner, L;
Chronaki, C; Eichelberg, M; Lüpkes, C: Personalized Remote Monitoring of the Atrial Fibrillation Patients with Electronic Implant Devices. Journal of Healthcare Engineering
Vol. 2 No. 2, 2011, ISSN 1756-8250
3. Han, J; Kamber, M: Data Mining: Concepts and Techniques, 2nd ed. The Morgan Kaufmann Series in Data Management Systems, Jim Gray (Ed.), Morgan Kaufmann Publishers,
March 2006. ISBN 1-55860-901-6
4. Kimball, R ; Ross, M ; Thornthwaite, W ; Mundy, J ; Becker,B: The Data Warehouse
Lifecycle Toolkit. 2nd. Wiley Publishing, 2008. – ISBN 0470149779
5. Goldberger A, Amaral L, Glass L, Hausdorff J, Ivanov P, Mark R, Mietus J, Moody G,
Peng C, Stanley H. PhysioBank, PhysioToolkit, and PhysioNet: Components of a New Research Resource for Complex Physiologic Signals. Circulation 101(23):e215-e220 , 2000
6. Inmon, W: Building the data warehouse (2nd ed.). New York, NY, USA: John Wiley &
Sons, Inc., 1996. – ISBN 0–471–14161–5
7. Fayyad, U; Piatetsky-Shapiro, G; Smyth, P: From Data Mining to Knowledge
Discovery in Databases. In: AI Magazine, American Association for Artificial Intelligence,
California, USA, 1996
8. Lüpkes, C: Ad-hoc Datentransformationen für Analytische Informationssysteme. Gassler,
W; Zangerle, E; Specht, G (Eds): Proceedings of the 23rd GI-Workshop Grundlagen von
Datenbanken 2011 ISSN 1613-0073.
9. DIMDI - Deutsches Institut für Medizinische Dokumentation und Information: ICD-10-GM Version 2006. Systematisches Verzeichnis. Deutsche Krankenhaus Verlagsgesellschaft, 2005.
10. DIMDI - Deutsches Institut für Medizinische Dokumentation und Information: ICD-10-GM Version 2007. Band I: Systematisches Verzeichnis. Deutsche Krankenhaus Verlagsgesellschaft, 2006.
11. Blaschka, M: FIESTA: A Framework for Schema Evolution in Multidimensional Databases, Technische Universität München, 2000.