Supporting Healthcare Professionals by providing suggestions based on comparable analysis results of previous cases

Christian Lüpkes
OFFIS e.V., Escherweg 2, 26121 Oldenburg, Germany
[email protected]

Abstract. Within the EU-funded research project iCARDEA, patient data are collected from different sources and integrated into a standardized and harmonized central data repository. This enables new forms of support for healthcare professionals, in particular the use of data analysis techniques to generate statistically valid patterns and, based on them, to provide suggestions for the currently treated patient derived from previous cases. Since patient data are collected over a longer period of time, the classifications used for documentation change frequently. At the evaluating clinic in Austria, for example, patient diagnoses are documented using the ICD-10-GM classification, which is updated every year. This causes so-called semantic shifts within the data, meaning that the same code has a (slightly) different interpretation from one version to another. The codes in a pattern have to be adapted to the latest classification in use in order to correctly correlate current patient data with older patterns. Therefore, a general approach and an implementation are provided which are capable of showing the evolution of codes as a graph and of indicating the various meanings of the same code in different versions. This is enabled by storing the different classifications together with the officially provided transition rules and using these to calculate comparable result sets. With the graph-based approach introduced here, users are able to decide which code or set of codes correctly reflects the meaning of the pattern. In this way, patterns that were induced from historical patient data using different classifications can be adapted to the current environment and applied to current patients.

Keywords.
Data analysis in healthcare, semantic shift, information support

1 Project Background

The iCARDEA system aims to automate and personalize the follow-up of cardiac arrhythmia patients who have a CIED implant, using computer-interpretable clinical guideline models, standard device interfaces, and integrated patient EHRs. One goal is to support the healthcare actor by providing an integrated view of all available patient-related information, including patterns based on data analysis of historical patient cases. These statistically valid patterns are used to make suggestions to healthcare actors and also help in understanding patients' treatments. The patterns carry knowledge from previous treatments and should be displayed to the healthcare actor if the data of a current patient are similar to the values of a pattern.

2 iCARDEA architecture

Figure 1 shows a brief overview of the overall architecture and the environment in which iCARDEA provides interoperability services. The details and functionality of each module are described in [1,2].

Figure 1 Brief iCARDEA Architecture Overview

The modules have the following functions:

CIED Telemonitoring: interoperability interface exposing the CIED / ICD data from reports of different vendors in IEEE 11073 nomenclature and the HL7v2 standard with the IHE IDCO profile.

EHR and PHR: integrating clinical EHR and external PHR using the appropriate standards such as CM, XDS and XPHR.

Adaptive Careplan Engine: operates as a guideline-based clinical decision support system together with the care plan monitoring tool.

Data analysis and correlation mechanism: providing suggestions based on data analysis of medical knowledge bases to help the healthcare team.

Patient data obtained from the different sources (CIED reports, EHR and PHR systems) via the interoperability layers are stored in a single data repository in an integrated and standardized format for immediate or later use by all iCARDEA components.
This enables healthcare actors to receive information on previous treatments, represented by patterns, when treating a patient whose parameters are similar. For the generation of patterns, historical patient data were obtained from the hospital information system of the Austrian clinic where iCARDEA is evaluated. These data were used because they are semantically compatible with the target environment. In addition, the data and the analysis results derived from them are trusted by the healthcare team that evaluates iCARDEA.

3 Data analysis of clinical data

Data analysis is a process with the goal of highlighting information, suggesting conclusions and supporting decision making. It has multiple approaches and encompasses diverse techniques under a variety of names, such as Data Mining for predictive purposes or Online Analytical Processing (OLAP) for aggregation and exploration of data [3].

3.1 Medical data sources for analysis

The underlying data for data analysis must be reliable, approved, legally and technically available, and potentially useful for the question to be answered [4].

Reliable data: the data must originate from a trustworthy environment and must be based on real cases.

Approved data: the facts, conclusions and actions represented by the data must be approved by domain experts so that appropriate conclusions can be drawn from them.

Available: no legal regulations may prohibit access to or use of the data, and the data must be in a computer-interpretable electronic format.

Potentially useful: the patient data must originate from medical cases similar to those of the envisioned target environment.

Most of these requirements are fulfilled when using historical patient cases from the clinic where the data analysis results are to be used. Only availability is problematic, since not all legacy hospital information systems provide all information in a structured attribute-value format.
Other possible knowledge bases, such as the MIMIC II dataset from PhysioNet [5], lack the approval and especially the reliability of the data items. This lowers trust and raises objections about the meaning of the analysis results. Therefore, within the iCARDEA project, the medical data of 200 patients with implants treated between 2007 and 2011 were obtained from the evaluation clinic.

3.2 Data preparation for analysis and modelling

The patient data were integrated into a dedicated data analysis database, known as a data warehouse system. This serves as the data source for all data analysis tasks [6]. Here, the data were stored and quality assured. This means that missing data items are treated, and the syntax is harmonized and checked for validity against the classifications in use, e.g. ICD-10-GM (International Classification of Diseases, German Modification) for diagnoses. The quality-assured data were then modeled into analysis-specific data representations. For OLAP, so-called cubes were defined to explore the patient data, giving clues about the patient distribution, the age at first visit to the cardiological department and the number of diagnoses. For association rule data mining, different propositionalized relations were created to generate hypotheses about the correlation of age, gender and different diagnoses.

3.3 Data analysis results and usage

The results from OLAP are charts and navigable Excel sheets that present statistics of the historical patient cases. The information obtained from them is the deviation from expected value distributions and knowledge about the patients whose data were used for the data analysis tasks. This information helps the healthcare actor to understand his patients and to estimate and interpret other analysis results, such as the patterns created by data mining. The predictive patterns were obtained by association rule mining.
Association rule mining means identifying sets of patient data items that frequently occur together. The results follow the scheme "antecedent → consequent" or, more naturally, "prerequisite → conclusion" [7]. To help interpret the results, each rule is presented together with the number of distinct patients fulfilling the prerequisite, the number of patients fulfilling the whole rule, and the percentage derived from these two values. The data items used were age, different kinds of diagnoses, and gender. To prevent overspecialized patterns, the age was binned into five ranges and the diagnoses, coded in ICD-10-GM, were aggregated to the three-character categories. The results are shown in Figure 2.

Figure 2 Patterns obtained from association rule mining

From a technical point of view, it is remarkable that the attribute "gender" is not included in any of the top 50 rules by confidence. The near absence of age was not expected either. Also remarkable is the low diversity of the ICD-10 codes used. Of the 253 diagnoses used at the group level, only the six groups E11, E78, I10, I25, I47, and I50 appear together. None of the other 247 ICD groups is used significantly together with other codes. Since I10 to I50 are cardiac diseases, this increased occurrence was expected. E11 and E78 (diabetes and hyperlipidaemia) appear to be the two most commonly observed risk diagnoses.

These patterns are stored within the iCARDEA environment at the evaluation clinic in Austria. When a healthcare actor views the data of a currently treated patient, provided by the EHR, PHR and CIED interoperability and integration layer, the data items are checked for similarity to the prerequisites of the stored patterns. If a pattern is suitable for the patient, the healthcare actor receives feedback suggesting that the conclusion of the pattern be considered for this patient. In this way, the healthcare actor receives suggestions based on the information inherited from the treatment of previous cases, and therefore benefits from regular knowledge.
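The preparation and matching steps described in this section can be illustrated with a minimal sketch. It is not the iCARDEA implementation: the age bin boundaries, the item naming scheme, all function names, and the toy records are assumptions made for illustration, and "similarity" to a prerequisite is simplified to an exact subset test.

```python
from dataclasses import dataclass

# Hypothetical bin boundaries: the text states five age ranges but not their limits.
AGE_BINS = [(0, 39), (40, 54), (55, 64), (65, 74), (75, 130)]


def age_bin(age: int) -> str:
    """Map an age in years to the label of its bin."""
    for low, high in AGE_BINS:
        if low <= age <= high:
            return f"age:{low}-{high}"
    raise ValueError(f"age outside the defined bins: {age}")


def icd_category(code: str) -> str:
    """Aggregate a code such as 'I25.13' to its three-character category 'I25'."""
    return code.strip().upper()[:3]


def to_items(age: int, gender: str, diagnoses: list[str]) -> frozenset[str]:
    """Turn one patient record into the item set used for mining and matching."""
    items = {age_bin(age), f"gender:{gender}"}
    items.update(f"dx:{icd_category(code)}" for code in diagnoses)
    return frozenset(items)


@dataclass(frozen=True)
class Rule:
    antecedent: frozenset[str]  # prerequisite
    consequent: frozenset[str]  # conclusion


def rule_stats(rule: Rule, patients: list[frozenset[str]]) -> tuple[int, int, float]:
    """Return (patients matching the prerequisite, patients matching the whole rule, confidence)."""
    n_prerequisite = sum(1 for p in patients if rule.antecedent <= p)
    n_rule = sum(1 for p in patients if (rule.antecedent | rule.consequent) <= p)
    confidence = n_rule / n_prerequisite if n_prerequisite else 0.0
    return n_prerequisite, n_rule, confidence


def suggestions(rules: list[Rule], patient: frozenset[str]) -> list[Rule]:
    """Select the rules whose prerequisite is fully contained in the patient's items."""
    return [rule for rule in rules if rule.antecedent <= patient]


# Invented toy records, for illustration only (not data from the evaluation clinic).
history = [
    to_items(70, "m", ["I25.13", "E11.90"]),
    to_items(68, "f", ["I25.9", "E11.20", "I10.00"]),
    to_items(52, "m", ["I25.0"]),
    to_items(80, "f", ["I50.1"]),
]
rule = Rule(frozenset({"dx:I25"}), frozenset({"dx:E11"}))
stats = rule_stats(rule, history)

current = to_items(71, "f", ["I25.11", "I10.90"])
hits = suggestions([rule], current)
```

Here `stats` holds the three figures that accompany each rule in the text (patients fulfilling the prerequisite, patients fulfilling the whole rule, and the derived percentage as a fraction), and `hits` contains the rules whose conclusion would be suggested for the current patient.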
4 Evolution of classifications causing semantic shift

The problem of different and changing classifications arises in data analysis tasks, since data analysis is intended to make statements about the meaning of data [8]. To make useful suggestions, the meaning of data elements within patterns and within current patient data must be comparable. In data analysis, special metadata, so-called dimensions, are used to describe the meaning of data elements. In the medical field, the taxonomy of the ICD-10 is often used as a dimension. As stated above, in Austrian healthcare, as well as in Germany, the ICD-10-GM is used for clinical documentation [9,10]. This classification is updated every year and is provided together with transition rules between the old and the new version. The updates cover the insertion of newly discovered diseases and the regrouping or deletion of diagnoses. Since we used patient data from 2007 to 2011, five different ICD-10-GM versions occur within the analyzed data. When the patterns are applied, however, the current patient data are coded in the newest classification version in use at the evaluation clinic. Therefore, the patterns have to be adapted to the version currently used at the clinic.

In order to deal with this problem, a prototype was developed in iCARDEA that shows the evolution of a specific disease code and also indicates whether the code is still available. Figure 3 shows the prototype with two completely different meanings of the ICD-10-GM code C83.3 (a type of cancer). From 2004 to 2010, C83.3 was stable, but in 2011 it changed to C85.2. If the meaning that C83.3 has in 2011 is of interest, the code C83.4 has to be taken as the reference value for the years 2004 to 2010.
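The code evolution described above can be sketched as a small directed graph over (year, code) pairs. This is a hedged illustration, not the iCARDEA prototype: the class and method names are hypothetical, the two transitions are taken only from the C83.3 example in the text rather than from the official transition tables, and a code without a stored transition is assumed to be unchanged in the next version (deletions are not modelled).

```python
from collections import defaultdict


class CodeEvolution:
    """Transition rules between consecutive yearly versions of a classification."""

    def __init__(self) -> None:
        # (year, code) -> codes carrying the same meaning in year + 1
        self._forward: dict[tuple[int, str], set[str]] = defaultdict(set)
        # (year, code) -> codes carrying the same meaning in year - 1
        self._backward: dict[tuple[int, str], set[str]] = defaultdict(set)

    def add_transition(self, year: int, old_code: str, new_code: str) -> None:
        """Record that old_code in `year` is represented by new_code in `year + 1`."""
        self._forward[(year, old_code)].add(new_code)
        self._backward[(year + 1, new_code)].add(old_code)

    def forward(self, code: str, from_year: int, to_year: int) -> set[str]:
        """Codes in to_year that carry the meaning `code` had in from_year."""
        codes = {code}
        for year in range(from_year, to_year):
            # Assumption: no stored rule means the code is unchanged.
            codes = set().union(*(self._forward.get((year, c), {c}) for c in codes))
        return codes

    def backward(self, code: str, from_year: int, to_year: int) -> set[str]:
        """Codes in the earlier to_year that carry the meaning `code` has in from_year."""
        codes = {code}
        for year in range(from_year, to_year, -1):
            codes = set().union(*(self._backward.get((year, c), {c}) for c in codes))
        return codes


def adapt_codes(evolution: CodeEvolution, codes: set[str],
                from_year: int, to_year: int) -> set[str]:
    """Candidate codes for a pattern prerequisite in the target version."""
    adapted: set[str] = set()
    for code in codes:
        adapted |= evolution.forward(code, from_year, to_year)
    return adapted


# Transitions taken from the C83.3 example in the text (2010 -> 2011).
evolution = CodeEvolution()
evolution.add_transition(2010, "C83.3", "C85.2")
evolution.add_transition(2010, "C83.4", "C83.3")

current_codes = evolution.forward("C83.3", 2007, 2011)
reference_codes = evolution.backward("C83.3", 2011, 2007)
adapted = adapt_codes(evolution, {"C83.3", "I25.1"}, 2007, 2011)
```

Both traversals return sets because a transition may map one code to several successors or predecessors; in that case the result is only a list of candidates, and the user still decides which code or set of codes reflects the meaning of the pattern.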
Figure 3 Prototype showing different meanings of ICD-10-GM code C83.3

For the suggestion system this means that a pattern with the code C83.3 in its prerequisite has to be shown to the healthcare professional if the current patient dataset contains C85.2: the pattern was created from patient data obtained before 2011, and the corresponding meaning of the code is now represented by C85.2. The graph-based approach was chosen because the dimensions themselves can always be represented as graphs, which are easy to understand for specialists using data analysis tools [11]. In the iCARDEA project, this tool was used first to harmonize the patient data to one version of ICD-10-GM and then to adapt the analysis results so that they are comparable with the target environment.

5 Summary

This paper presented the iCARDEA approach of providing analysis results of historical patient data as suggestions to healthcare actors when they treat a new, similar patient. The similarity of a patient to previous cases depends on the compatibility and congruence of the patient data. Since the classifications used for coding the patient data change regularly, a graph-based tool for mapping the data was presented, which visualizes the evolution of codes. Using this tool, the patterns were updated to a version compatible with the target environment. This ensures that the patterns, which represent knowledge inherited from treatment processes in the past, can be applied to current patients while retaining their meaning.

Acknowledgements

The research leading to these results received funding from the European Community's 7th Framework Programme (FP7/2007-2013) under Grant Agreement no. ICT-248240.

References

1. Yang, M; Lüpkes, C; Dogac, A; Yuksel, M; Tunçer, F; Namlı, T; Plössing, M; Ulbts, J; Eichelberg, M: Interoperability Challenges in the Health Management of Patients with Implantable Defibrillators. Computing in Cardiology 2010, ISSN 0276-6574.
2. Laleci, G; Dogac, A; Yuksel, M; Kabak, Y; Arbelo, E; Danmayr, F; Hinterbuchner, L; Chronaki, C; Eichelberg, M; Lüpkes, C: Personalized Remote Monitoring of the Atrial Fibrillation Patients with Electronic Implant Devices. Journal of Healthcare Engineering, Vol. 2, No. 2, 2011, ISSN 1756-8250.
3. Han, J; Kamber, M: Data Mining: Concepts and Techniques, 2nd ed. The Morgan Kaufmann Series in Data Management Systems, Jim Gray (Ed.). Morgan Kaufmann Publishers, March 2006. ISBN 1-55860-901-6.
4. Kimball, R; Ross, M; Thornthwaite, W; Mundy, J; Becker, B: The Data Warehouse Lifecycle Toolkit, 2nd ed. Wiley Publishing, 2008. ISBN 0470149779.
5. Goldberger, A; Amaral, L; Glass, L; Hausdorff, J; Ivanov, P; Mark, R; Mietus, J; Moody, G; Peng, C; Stanley, H: PhysioBank, PhysioToolkit, and PhysioNet: Components of a New Research Resource for Complex Physiologic Signals. Circulation 101(23):e215-e220, 2000.
6. Inmon, W: Building the Data Warehouse, 2nd ed. New York, NY, USA: John Wiley & Sons, Inc., 1996. ISBN 0-471-14161-5.
7. Fayyad, U; Piatetsky-Shapiro, G; Smyth, P: From Data Mining to Knowledge Discovery in Databases. AI Magazine, American Association for Artificial Intelligence, California, USA, 1996.
8. Lüpkes, C: Ad-hoc Datentransformationen für Analytische Informationssysteme. In: Gassler, W; Zangerle, E; Specht, G (Eds): Proceedings of the 23rd GI-Workshop Grundlagen von Datenbanken 2011, ISSN 1613-0073.
9. DIMDI - Deutsches Institut für Medizinische Dokumentation und Information: ICD-10-GM Version 2006. Systematisches Verzeichnis. Deutsche Krankenhaus Verlagsgesellschaft, 2005.
10. DIMDI - Deutsches Institut für Medizinische Dokumentation und Information: ICD-10-GM Version 2007. Band I: Systematisches Verzeichnis. Deutsche Krankenhaus Verlagsgesellschaft, 2006.
11. Blaschka, M: FIESTA: A Framework for Schema Evolution in Multidimensional Databases. Technische Universität München, 2000.