2016 IEEE International Conference on Big Data (Big Data)

A Framework to Predict Outcome for Cancer Patients Using Data from a Nursing EHR

Muhammad K Lodhi∗, Rashid Ansari∗, Yingwei Yao†, Gail M. Keenan†, Diana J. Wilkie† and Ashfaq Khokhar‡
∗ College of Engineering, University of Illinois at Chicago, Chicago, IL, 60607. Email: mlodhi3,[email protected]
† College of Nursing, University of Florida, Gainesville, FL, 32610. Email: y.yao,gkeenan,[email protected]
‡ College of Engineering, Chicago, IL, 60616. Email: [email protected]

Abstract—With the rapid growth of electronic data repositories in diverse application domains, including healthcare, considerable research interest has developed in extracting the hidden knowledge in these repositories. Electronic health record systems (EHRs) are the fastest growing in terms of size and data diversity. In this work, we focus on mining a high-dimensional sparse dataset, using nursing care data as an exemplar. Mining a high-dimensional and sparse dataset is challenging for a number of reasons. Several dimension reduction methods exist; however, they do not work well with contextual datasets. In our study, we have used association mining as a dimension reduction step and for extracting important features from the dataset. Our results show that association mining can be used effectively as a dimension reduction and feature extraction step. Our predictive modeling results show that decision tree models generally have high accuracy, and that their results are easy to interpret, which makes it straightforward to determine the influence of different variables.

Keywords-dimension reduction, association mining, predictive modeling, Electronic Health Records (EHRs).

I. INTRODUCTION

The use of electronic media to capture, process and accumulate information is witnessing extraordinary growth [1], in part due to advances in digital technologies.
This situation has resulted in high-dimensional and sparse datasets in applications such as electronic health records (EHRs), biology, astronomy, medical imaging, video archiving, and web data. Different data mining techniques have been used to extract the knowledge embedded in some of these datasets, albeit with limited success [2], [3].

Existing data mining techniques and models can be applied successfully to traditional transactional datasets. However, a number of challenges arise when high-dimensional data are analyzed with the traditional methods. Firstly, the complexity of most of these algorithms is exponential in the number of dimensions (columns), since there is an exponential number of column combinations. Hence, as the number of dimensions of a dataset increases, so does the search space. These column-enumeration-based algorithms are therefore infeasible for many real-world applications. A few row-enumeration algorithms have been introduced [4], [5], though they work best with dense datasets. Secondly, the predictive power of models decreases as the dimensionality increases [7]. Dimension-reduction algorithms such as Principal Component Analysis [6] are effective and popular. However, their effectiveness is limited by their global linearity: these conventional techniques characterize only linear subspaces within the data and do not consider the context of the different variables and predictors in the dataset.

978-1-4673-9005-7/16/$31.00 ©2016 IEEE

In this work, we analyze a high-dimensional and sparse dataset stemming from nursing care EHRs. These datasets contain thousands of variables (high dimensionality), yet only a few of them are monitored for any individual patient (sparsity). Although nursing data are a vital part of the EHR, they are frequently disregarded [8], even though nurses are the frontline providers of care to patients.
Using data mining methods effectively can be exceptionally useful for providing better, more effective care to patients and thus reducing healthcare costs. The aim of this work is to use EHR data to reveal best palliative care practices and, once an automated system has been developed, to deliver clinical decision support directly to nurses. Accurately timed and controlled palliative care has been shown to decrease healthcare costs significantly and to increase the quality of care provided to chronically ill patients [9]. A lot of money is spent every year on chronically ill patients, but palliative care is underutilized due, in part, to ineffective use of available technology [10]. Applying big data techniques to discover knowledge can help save around $300 million annually [11]. Previously, nursing EHR data have not been amenable to big data methods due to the lack of standardization.

Our emphasis in this study is on patients suffering from cancer. Cancer is one of the leading causes of death worldwide, and treatment costs are high due to the complexity of the disease. Cancer is not a single disease but a multitude of diseases, meaning that a treatment that is effective for one type of cancer may not be as effective for another. According to a study from the World Health Organization, cancer accounted for nearly 13% of all deaths worldwide in 2012. As the baby boomers age, cancer-related deaths are projected to continue rising, reaching 12 million deaths by 2030, as cancer is more prevalent in those of advanced age [12], [13]. Therefore, there is an urgent need to provide cost-effective care, and medical and nursing interventions need to be applied more successfully to hospitalized patients suffering from cancer.

Recently, many studies have used patient-level data to generate useful results as a step towards individualized care.
Predictive models have been built on data of cancer patients [14]–[21]. However, these studies have some limitations. Some of them [15], [16], [18], [19] target only a selected group of patients based on sex or race, whereas others consider only patients belonging to a single hospital unit [14], [20], [21]. Only a few studies, by Caballero et al. [14] and Ghassemi et al. [21], have considered text data for prediction models. None of these works, however, has considered nursing data for predictive modeling, which makes our study unique in this regard.

Against this body of research, our work is innovative and different in several ways. Firstly, none of the previous works has considered the nursing care data of hospitalized patients suffering from cancer. Secondly, we use data mining techniques to help discover hidden knowledge about the best nursing practices associated with better outcomes among such patients. We believe that this is an important innovation for advancing the relevance of EHRs to clinicians because, owing to the unavailability of standardized nursing care data, much is still not known about the care that nurses provide to patients. Standardized data allow the care provided to a patient to be personalized to achieve the desired goals, and will also help in understanding which features of care are important for specific populations. Lastly, although a hierarchical classification structure has been used extensively in other domains such as text mining [23], it has not been used as extensively in the medical domain, especially for predicting cancer outcomes.

In this ongoing work, our aim is to identify models that strongly and significantly predict patient outcomes relevant to palliative care at the shift level and the episode level from different predictors, including patient and nurse characteristics and nursing care.
The goal is to determine the associations between a variety of nursing care interventions or patient characteristics and improved outcomes for cancer patients.

II. DATA DESCRIPTION

Our data were gathered using the HANDS database, a nursing EHR system, deployed in 9 different hospital units for three years (2005-2008) [24]. In the database, the nursing diagnoses are based on NANDA-I [25], the outcomes for each diagnosis are based on the Nursing Outcome Classification (NOC) [26], and the interventions provided to achieve the expected outcomes are based on the Nursing Intervention Classification (NIC) [27]. During the original study, 34,927 patients were recorded with 42,403 unique episodes. An episode is defined as an uninterrupted patient stay in a hospital unit, comprising one or more nurse shifts. Each shift corresponds to a single plan of care (POC) in which nurses documented many attributes (variables), including the patient's diagnoses (medical and NANDA-I), different outcomes (NOCs) for each diagnosis along with their initial and expected scores (between 1 and 5), and the interventions (NICs) provided to achieve the expected outcomes. The POC also incorporates the characteristics of each nurse who provided care during the patient's hospital stay (e.g., experience, education, patient load, continuity, work pattern, etc.). Linked to the POC, HANDS also stores demographic information about the patient (e.g., medical diagnosis, age, gender, and race). The database also contains medical notes in the form of text data.

A. Identification of Cancer Patients

To identify the cancer patients, the data were extracted from the database using SQL queries. For this study, we extracted cancer patients that had a NOC of Knowledge Cancer Management. We also examined the medical notes in the database to determine all the patients suffering from cancer. We extracted a list of nouns from the notes associated with each nurse shift in the database. These lists were reviewed by domain experts, and the final list consists of only those words or phrases that indicate the presence of cancer in the patient. Table I gives a sample of the words that have been used to identify cancer patients. In this work, our focus is on cancer patients suffering from pain, since cancer is a common condition in which addressing pain is a concern for the patient. Only a limited number of patients receive proper pain treatment [22]. Table II gives a brief overview of the characteristics of our final dataset.

Table I
SAMPLE LIST OF WORDS OR PHRASES USED TO IDENTIFY CANCER PATIENTS IN THE DATABASE

Adenocarcinoma, Basal Cell, Breast, Chemomobilization, Duodenectomy, Glioblastoma, hemifacialectomy, laryngeal cancer, Leukemia, Lymphoma, Mass, Merkell Cell, Prostatectomy, Radiation, Radiation Therapy, Rectal Bleeding, Tumor, VATS

Table II
DATASET CHARACTERISTICS

| Full Dataset Characteristics | Value |
|---|---|
| Total number of Patient Admissions | 42,403 |
| Total number of Plan of Cares (POCs) | 400,000 |
| Number of Patients | 34,927 |
| Number of Cancer Patients | 3,685 |
| Number of POCs for Cancer Patients | 40,753 |

III. DIMENSION REDUCTION AND FEATURE EXTRACTION

In this work, our emphasis is on finding patterns of predictors of interest and building predictive models. Our aim is to perform predictive analysis at two different levels of granularity. The initial analysis has been implemented on data at the episode level, whereas the second analysis has been completed at the POC or shift level and is more "fine-grained". The episode-level or "coarse-grained" models help predict the result at the end of patient hospitalization, whereas the fine-grained models, built on the shift-level data, aid in predicting the outcome at the end of each nurse shift, thus providing a trajectory of predicted outcomes over the entire patient hospitalization.
The coarse-grained models are helpful in predicting the outcome at the conclusion of the patient hospitalization, whereas the fine-grained models can be utilized to predict the outcome at the end of each shift, thus giving us the ability to track care provision during hospitalization. We use different nursing and patient characteristics, along with nursing care parameters, to determine whether a NOC is met or not met. The target variable for predictive modeling is "NOC met", which is defined as follows:

\[
\text{NOC met} =
\begin{cases}
\text{Met}, & \text{Final NOC rating} \geq \text{Expected Outcome rating},\\
\text{Not met}, & \text{otherwise.}
\end{cases}
\tag{1}
\]

Our dataset has 747 variables or features, including but not limited to patient and nurse demographics and the nursing diagnoses, outcomes and interventions. Most of these features, however, are null or empty for a given patient record. A typical patient has four different diagnoses, five NOCs and ten different NICs on their care plans, so only about 2% of the entire dataset contains any useful information. This makes our dataset sparse and high-dimensional.

To reduce the dimensions, we do not use typical dimension reduction techniques such as Principal Component Analysis or data manifolds, since these techniques transform the data, assume global linearity, and do not consider the contextual information of the variables. Instead, association mining has been used to reduce the dimensions. Association mining helps identify prevalent patterns, which can then be used as a basis for predictive modeling. We use the Apriori algorithm to discover and extract association rules from the dataset. Only rules having a confidence of at least 55% and a support of at least 10% have been included.
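The support and confidence thresholds above can be illustrated with a minimal Python sketch. It is not the authors' implementation and it is not the full Apriori algorithm: it only scores rules with a single-item antecedent against an outcome item, so no candidate pruning is needed. The function names, the item labels (`age=young`, `noc_met`, `noc_not_met`) and the toy records are hypothetical; only the 10% support and 55% confidence thresholds come from the text.

```python
from typing import FrozenSet, List, Sequence, Tuple

Record = FrozenSet[str]  # one patient record as a set of discretized items


def rule_metrics(records: Sequence[Record], antecedent: str,
                 consequent: str) -> Tuple[float, float]:
    """Return (support, confidence) of the rule antecedent -> consequent."""
    n_total = len(records)
    n_antecedent = sum(1 for r in records if antecedent in r)
    n_both = sum(1 for r in records if antecedent in r and consequent in r)
    support = n_both / n_total if n_total else 0.0
    confidence = n_both / n_antecedent if n_antecedent else 0.0
    return support, confidence


def mine_single_item_rules(records: Sequence[Record],
                           targets: Sequence[str],
                           min_support: float = 0.10,
                           min_confidence: float = 0.55
                           ) -> List[Tuple[str, str, float, float]]:
    """Keep every one-item rule that passes both thresholds."""
    # Candidate antecedents are all items that are not outcome labels.
    items = sorted({item for r in records for item in r} - set(targets))
    rules = []
    for antecedent in items:
        for target in targets:
            support, confidence = rule_metrics(records, antecedent, target)
            if support >= min_support and confidence >= min_confidence:
                rules.append((antecedent, target, support, confidence))
    return rules


if __name__ == "__main__":
    # Hypothetical toy data: ten records with an age item and an outcome item.
    toy = (
        [frozenset({"age=young", "noc_not_met"})] * 3
        + [frozenset({"age=young", "noc_met"})] * 1
        + [frozenset({"age=old", "noc_met"})] * 4
        + [frozenset({"age=old", "noc_not_met"})] * 2
    )
    for rule in mine_single_item_rules(toy, ["noc_met", "noc_not_met"]):
        print(rule)
```

Features whose rules pass both thresholds are the ones that would be carried forward as candidate predictors; features that never appear in a retained rule are the dimensions that this step drops.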
To study the clinical significance of patient and nurse variables such as the patient's age, length of stay (LOS) and nurse experience for the outcome being met or not met, these features have been discretized based on theoretical rationale and the data frequency distribution [28]–[30]. Furthermore, the NANDA-Is and NICs have been clustered into their respective domains and classes [25]–[27], according to the nursing literature, to further reduce the dimensions of the data. Some of the rules generated using association mining are given below:

• Young Patient → Expected Outcome is not met (confidence: 61.2%, support: 11.3%)
• Old Patient → Expected Outcome is met (confidence: 57.5%, support: 20.7%)
• Short Stay → Expected Outcome is not met (confidence: 59.0%, support: 12.1%)
• Medium Stay → Expected Outcome is met (confidence: 56.0%, support: 18.1%)
• First Rating = 5 → Expected Outcome is not met (confidence: 60.3%, support: 29.0%)
• First Rating = 3 or 4 → Expected Outcome is met (confidence: 77.1%, support: 36.2%)
• Expected Rating ≤ 4 → Expected Outcome is met (confidence: 76.5%, support: 37.4%)
• At least one diagnosis from the Infection Class is present → Expected Outcome is met (confidence: 60.4%, support: 40.9%)
• No NANDA from the Infection Class is present → Expected Outcome is met (confidence: 55.8%, support: 16.0%)
• No Intervention from the Patient Education Class is present → Expected Outcome is not met (confidence: 58.6%, support: 12.3%)
• Inexperienced Nurse → Expected Outcome is met (confidence: 56.9%, support: 39.2%)

Using the results from the association mining, we determine a comprehensive list of all the features that we will be using for our predictive modeling experiments. All of the features (primitive as well as derived) are listed in Table III.

Table III
FEATURES USED FOR PREDICTIVE MODELING

| Feature Name | Feature Values |
|---|---|
| Initial & Expected NOC Rating | 1-5 |
| Age | Young (18-49), Middle-aged (50-64), Old (65-84) & Very Old (85+) |
| Length of Stay (LOS) (Derived) | Short (<2 days), Medium (2-5 days) & Long (5+ days) |
| Average Nurse Experience (Derived) | Inexperienced (experience of less than 2 years) & Experienced (experience of 2+ years) |
| NANDA Domains | Activity/Rest, Comfort, Coping/Stress Tolerance, Elimination, Nutrition, Perception, & Safety/Protection |
| NANDA Classes | Cardiovascular/Pulmonary Responses, Cognition, Hydration, Infection, Physical Comfort, Physical Injury, & Pulmonary System |
| NOC Domains | Community Health, Family Health, Functional Health, Health Knowledge & Behavior, Perceived Health, Physiologic Health, & Psychosocial Health |
| NOC Classes | Community Health Protection, Family Caregiver Performance, Growth & Development, Mobility, Self-Care, Health Behavior, Health Knowledge, Risk Control & Safety, Cardiopulmonary, Sensory, Therapeutic Response, Psychological Well-Being, Social Interaction, & Self-Control |
| NIC Domains | Behavioral, Health System, Safety, Physiological: Basic, & Physiological: Complex |
| NIC Classes | Activity & Exercise Management, Cognitive Therapy, Communication Enhancement, Drug Management, Electrolyte and Acid/Base Management, Immobility Management, Information Management, Nutrition Support, Patient Education, Physical Comfort Promotion, Psychological Comfort Promotion, Respiratory Management, Risk Management, Self-Care Facilitation, Skin/Wound Management, & Tissue Perfusion Management |

IV. FRAMEWORK OVERVIEW

In this section, we explain the framework, shown in Figure 1, and the experimental setup that has been used for predictive modeling. A brief overview of the models that have been applied in our experiments is also provided.

Figure 1. Framework for Predictive Analysis of Patient Hospitalization and Analysis of Plan of Cares

A. Problem Definition 1

Given a patient suffering from cancer, our objective is to determine whether that patient meets their expected pain outcome at the end of their hospitalization, irrespective of whether the patient is discharged or dies. Different classification methods have been used for prediction analysis with the features listed. Our aim, therefore, is to determine the outcome at the end of the patient hospitalization (Z) such that:

\[
Z =
\begin{cases}
1, & \text{patient } i \text{ meets the expected outcome at the end of patient hospitalization},\\
0, & \text{otherwise.}
\end{cases}
\tag{2}
\]

Figure 1 depicts the two types of analysis that can be performed on the data. The "Workflow for Episodes Analysis" part of Figure 1 shows the framework for prediction analysis of the episodes. After identifying all the patients suffering from cancer, as described previously, we start our prediction analysis using different classification methods. The process may be repeated iteratively to improve the results. The best predictive models are selected according to the chosen evaluation metrics, and the results can be clinically interpreted. A simplified procedure to determine which patients met or did not meet their expected outcome is given in Algorithm 1.

Algorithm 1: Finding best classification model to find patients that meet their Expected Outcome
Input: set S = {s | s is a patient in the database}.
Select Y ⊂ S such that Y = {y | y is a patient that has cancer and pain, identified through SQL and medical notes}
while (y in Y) do
    Use different classification methods to build a model that predicts whether patient y meets their expected outcome.
end while
Output: Classification model X that gives the best accuracy and AUC results.

B. Problem Definition 2

Given a patient suffering from cancer, our second objective is to determine whether that patient meets their pain outcome at the end of the current nurse shift.
Similar classification methods are used for prediction analysis with the features listed, along with the result of the previous shift. Hence, our objective is to determine the outcome at the end of each nurse shift (Z′) such that:

\[
Z' =
\begin{cases}
1, & \text{patient } i \text{ meets the expected outcome at the end of the current nurse shift},\\
0, & \text{otherwise.}
\end{cases}
\tag{3}
\]

We have developed a dynamic hierarchical framework for prediction analysis of the care plans, shown in the "Workflow of POC Analysis" part of Figure 1. After identifying all the patients suffering from cancer, as described previously, we start our prediction analysis using different classification methods. The first level predicts, based on all the features, whether the patient will meet their outcome at the end of the first nurse shift. Based on this result, in the subsequent steps, we determine whether the patient meets their outcome in the current nurse shift. The process continues until the result of the last care plan is predicted. Since the patient's total number of care plans is not known at the beginning of the hospitalization, the framework does not have a definite number of steps and can thus be considered dynamic and hierarchical. A simplified procedure is given in Algorithm 2.

Algorithm 2: Finding best classification model that determines whether the patient meets their expected outcome at the end of the current nurse shift
Input: set S = {s | s is a patient in the database}.
Select Y ⊂ S such that Y = {y | y is a patient that has cancer and pain, identified through SQL and medical notes}
while (y in Y) do
    For the first care plan or POC of patient y, use different classification methods to build a predictive model that predicts whether patient y meets their expected outcome at the end of the first nurse shift.
    while (shift_number = 2 till last shift) do
        Use the result from the previous iteration, along with other features, to predict whether patient y meets their expected outcome at the end of nurse shift shift_number, utilizing the various classification algorithms.
    end while
end while
Output: Classification model X′ that gives the best accuracy and AUC results.

C. Modeling

After the data extraction and data refinement steps, the next step is building the actual predictive models. To predict whether the patient will meet their expected outcome either at the end of their hospitalization or at the end of the current nurse shift, we built multiple predictive models on our dataset and compared their performance. These models are based on Decision Trees [31], k-NN [32], Support Vector Machines (SVM) [33], Naïve-Bayes [34] and Linear Regression (LR) [32].

D. Experimental Setup and Evaluation Metrics

The performance of the models has been evaluated using 10-fold cross-validation [35]. Different evaluation metrics, such as accuracy, F-measure and Area Under the Curve (AUC), are used for comparing the results of the experiments, since each metric is informative under different circumstances. Wherever practical, we have also performed the χ² test [36] to verify whether our results are statistically significant.

V. RESULTS

A. Predicting outcome at the end of hospitalization (Coarse-Grained Models)

In the first set of experiments, we determine whether the patient will meet their expected outcome at the end of their hospitalization, based on different patient and nurse characteristics, along with the nursing diagnoses and interventions that were applied to achieve the expected rating. The foremost objective is to determine best practices from these different variables to achieve better outcomes. Therefore, the target variable "NOC met" was a binary variable. The results are given in Table IV.
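As an illustration of the evaluation setup described above (10-fold cross-validation scored by accuracy and F-measure), the following Python sketch shows how those quantities can be computed. It is a sketch under stated assumptions, not the authors' tooling: the function names, the interleaved (unshuffled) fold assignment, and the `fit_predict` callback are hypothetical, and AUC is omitted because it needs ranked scores rather than hard labels.

```python
from collections import Counter
from typing import Callable, List, Sequence, Tuple


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that equal the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f_measure(y_true: Sequence[int], y_pred: Sequence[int],
              positive: int = 1) -> float:
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def k_fold_indices(n_samples: int, k: int = 10
                   ) -> List[Tuple[List[int], List[int]]]:
    """Split indices 0..n-1 into k (train, test) pairs, interleaved."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    splits = []
    for fold in range(k):
        test = list(range(fold, n_samples, k))
        held_out = set(test)
        train = [i for i in range(n_samples) if i not in held_out]
        splits.append((train, test))
    return splits


def cross_validated_accuracy(
        features: Sequence, labels: Sequence[int],
        fit_predict: Callable[[Sequence, Sequence[int], Sequence], List[int]],
        k: int = 10) -> float:
    """Mean per-fold accuracy of a model given as a fit-and-predict callback."""
    scores = []
    for train, test in k_fold_indices(len(labels), k):
        predictions = fit_predict([features[i] for i in train],
                                  [labels[i] for i in train],
                                  [features[i] for i in test])
        scores.append(accuracy([labels[i] for i in test], predictions))
    return sum(scores) / len(scores)


def majority_baseline(train_x, train_y, test_x) -> List[int]:
    """Hypothetical stand-in model: always predict the most common label."""
    most_common = Counter(train_y).most_common(1)[0][0]
    return [most_common] * len(test_x)
```

Any classifier can be plugged in through `fit_predict`; the majority-class baseline is included only so the sketch runs end to end and gives a floor against which real models can be compared.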
As observed from the results, the decision tree (shown in Figure 2) has the best prediction accuracy at 80.1%, with an AUC of 0.731 and an F-measure of 0.81. The Naïve-Bayes model also has good accuracy and an AUC comparable to that of the decision tree. The k-NN models fare less well on accuracy, though k-NN (k = 10) has a better AUC than the decision tree and k-NN (k = 5) has almost the same AUC as the decision tree. The SVM results are virtually the same as those of the k-NN models, whereas LR has the worst prediction accuracy, AUC and F-measure of all the models.

We use the z-test to check whether the results of different predictive models are statistically the same or different. We consider a difference significant if the p-value is below 0.05. We compare only the best and the second-best results; if the two are statistically the same, we then compare the best result with the third best, and so on. Under the z-test, the accuracy of the Decision Tree model was significantly different from that of Naïve-Bayes (p-value < 0.001). The AUC results are statistically the same for Decision Tree, Naïve-Bayes, k-NN (k = 5 or 10) and SVM; they were significantly better than that of k-NN (k = 2) (p-value = 0.035). The F-measure of the Decision Tree was also significantly different from that of Naïve-Bayes (p-value < 0.01).

The prediction model given by the decision tree for whether a patient meets their expected outcome is shown in Figure 2. The NOC rating in the first nurse visit is the most important attribute for predicting whether the patient will meet their expected outcome. When the nurse sets the rating at 5, there is a 68.6% probability that the patient will not meet the outcome; conversely, when the nurse sets the first rating at 4 or below, the patients meet their outcome 77.1% of the time. This difference was statistically highly significant (p-value ≤ 0.0005).
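The text does not say which form of the z-test was used to compare models, so the following Python sketch assumes one common choice: a pooled two-proportion z-test on independent samples with a two-sided p-value. The function name and the sample size of 1,000 per model in the usage line are hypothetical; only the 80.1% and 70.8% accuracies come from the reported results.

```python
import math
from typing import Tuple


def two_proportion_z_test(successes_a: int, n_a: int,
                          successes_b: int, n_b: int) -> Tuple[float, float]:
    """Return (z, two-sided p-value) for H0: both proportions are equal."""
    if n_a <= 0 or n_b <= 0:
        raise ValueError("sample sizes must be positive")
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    # Pooled proportion under the null hypothesis of no difference.
    pooled = (successes_a + successes_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if std_err == 0.0:
        # Both samples are all successes or all failures: no difference.
        return 0.0, 1.0
    z = (p_a - p_b) / std_err
    # Two-sided tail area of the standard normal: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


if __name__ == "__main__":
    # Hypothetical counts: 801 vs. 708 correct predictions out of 1,000 each.
    z, p = two_proportion_z_test(801, 1000, 708, 1000)
    print(f"z = {z:.2f}, significant at 0.05: {p < 0.05}")
```

Because models evaluated on the same cross-validation folds are not independent samples, a paired test such as McNemar's may be more appropriate in practice; the independent-sample form is shown only because it is the simplest reading of the text.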
When the rating in the first nurse shift is set at 4 or below, the next most important attribute is the Electrolyte and Acid/Base Management NIC class. Whenever any intervention from this class was applied, the patients did not meet their outcome 55.6% of the time, whereas when no intervention from the Electrolyte and Acid/Base Management class was used, 79.7% of the patients met their outcome. A chi-squared test showed this result to be statistically significant as well (p-value ≤ 0.001).

When the rating was set to 5 by the nurses in the first shift, the next significant feature is the Cognitive Therapy NIC class. Whenever no intervention from the Cognitive Therapy class was administered, the probability of the patient not meeting their outcome was 71.5%. Conversely, 55.2% of the patients met their expected outcome whenever an intervention from this class was present in the patient's care plan. Once more, the finding was statistically significant (p-value = 0.003).

When the patient's POC included an intervention from the Cognitive Therapy class, age was the next key element. Young patients did not fare well: two-thirds of them did not meet their outcome. On the other hand, 59.7% of the middle-aged patients and 54.7% of the older patients met their expected outcomes.

Table IV
PREDICTION RESULTS FOR CANCER COARSE-GRAINED MODELS

| Model | Accuracy (%) | AUC | F-Measure |
|---|---|---|---|
| Decision Tree | 80.1 | 0.731 | 0.81 |
| Naïve-Bayes | 70.8 | 0.728 | 0.74 |
| k-NN (k = 2) | 68.1 | 0.671 | 0.73 |
| k-NN (k = 5) | 66.7 | 0.718 | 0.70 |
| k-NN (k = 10) | 69.6 | 0.744 | 0.72 |
| SVM | 67.1 | 0.732 | 0.68 |
| Linear Regression | 63.1 | 0.638 | 0.63 |

Figure 2. Decision Tree Diagram for Patients suffering from Cancer

B. Fine-grained Analysis of Plan of Cares (POC)

After the earlier experiment at the episode level, we built models at the plan of care (POC), or nurse shift, level. Our main objective for this experiment was to predict whether the patient would meet their expected outcome at the end of each nurse shift within each episode. The predictors for this experiment included the NOC rating in the previous shift (not available for the first shift), the expected NOC rating set by the nurse in the first shift, the age of the patient, the experience of the nurse taking care of the patient in the current shift, and the different NANDA and NIC domains and classes. We excluded the LOS feature from the analysis of the POC-level data. SVM and LR models work only for binomial (two-class) target variables; therefore, we used a one-against-all strategy, under polynomial-by-binomial classification, for both the SVM and LR models. Only accuracy was used as a performance metric for this experiment; the other metrics could not be used because of the polynomial (multi-class) nature of the target variable. The results are given in Table V.

Table V
PREDICTION RESULTS FOR FINE-GRAINED MODELS

| Model Used | Accuracy (%) |
|---|---|
| Decision Tree | 72.8 |
| Naïve-Bayes | 68.2 |
| k-NN (k = 2) | 63.8 |
| k-NN (k = 5) | 69.8 |
| k-NN (k = 10) | 70.2 |
| SVM | 64.4 |
| LR | 62.6 |

The results indicate that the decision tree model (Figure 3) has the best accuracy among all the models, at 72.8%. The k-NN models (k = 5 or 10) and the Naïve-Bayes model also have good accuracy. 2-NN, SVM and LR do not perform as well as the other models. Using the z-test, it was determined that the difference between the accuracy of the decision tree model and that of the k-NN (k = 10) model is statistically significant (p-value < 0.01).

Figure 3 depicts the decision tree model for predicting the NOC rating at the end of each shift. The NOC ratings were coded in the experiment as follows: 1 as worst, 2 as bad, 3 as average, 4 as good, and 5 as best. The decision tree shows that the most important attribute for determining the NOC rating in the current shift is the NOC rating of the previous shift. Other important attributes were the Patient Education NIC class, the Risk Management NIC class, diagnoses from the Comfort domain, interventions from the Tissue Perfusion Management class, and the age of the patient.

Figure 3. Decision Tree Model for Cancer Patients shift level records (Predicting NOC Rating in the current shift)

VI. CONCLUSION

Big data from EHR systems have provided new opportunities for rapidly discovering evidence that can be used to improve patient care. Predictive modeling is one method of examining EHR data that has great promise. It is, however, difficult to employ predictive modeling techniques on high-dimensional and sparse datasets such as those gathered in EHRs. Traditional dimension reduction techniques cannot be used for such datasets because they do not consider the contextual features, which are taken into account when association mining is applied as a dimension-reduction technique. Through association mining, strong correlations are found in the data, which helps eliminate the dimensions that have little to no impact on prediction accuracy.

In this study, we used association mining as a dimension-reduction step in the data transformation stage to determine the relationship, if any, between different nursing and patient features and the outcomes of cancer patients. The results obtained from association mining were used to extract the features needed for predictive modeling. Models were then built using various existing techniques.
Generally high accuracy was achieved in our experiments at the episode level (coarse-grained experiment), with the decision tree giving the best accuracy and a good AUC. Using the decision tree, we also identified a list of features that were vital in predicting whether the outcome was met or not met. Some of the results were statistically significant, and with the help of domain experts we were able to confirm that those results were also clinically significant. In the later experiment, we drilled down to shift-level data to build more fine-grained models in a dynamic hierarchical framework. We predicted the outcome rating at the end of each shift using different features, including the rating from the previous shift, once again obtaining high accuracy, especially from the decision tree model. Using the prediction results at both the episode level and the POC level, we provide a trajectory of predicted outcomes.

The results of the experiments conducted on big data from a nursing EHR system have demonstrated that these systems can be used as a building block in a decision support framework for nursing and the related fields in healthcare. Using innovative and interesting methods to explore the data can help in understanding the full potential of such systems. Combining EHRs with data mining can complement traditional research methods by filling gaps in knowledge. This can be achieved by suggesting novel methods for systematic research. The results achieved using predictive models can be integrated into these systems quite easily, whereas previously it took decades to bring research evidence into common practice. Without using the data from EHRs for prediction, EHRs are only useful for monitoring the progress of the patient. Different data mining techniques, including association mining and predictive modeling, can help hospital management improve the quality of care provided to the general community [37].
These models can potentially help minimize healthcare costs while improving nursing care. Although limitations in the data exist, standardization of different techniques to help patients is an important step toward personalized care.

REFERENCES

[1] M. Hilbert and P. López, The world's technological capacity to store, communicate, and compute information, Science, 332 (2011), pp. 60–65.
[2] Y. Chen, D. Hu, and G. Zhang, Data Mining and Critical Success Factors in Data Mining, Knowledge Enterprise: Intelligent Strategies in Product Design, Manufacturing, and Management, (2006), pp. 281–287.
[3] M. K. Lodhi, R. Ansari, Y. Yao, G. M. Keenan, D. J. Wilkie, and A. A. Khokhar, Predictive Modeling for Comfortable Death Outcome Using Electronic Health Records, 2015 IEEE International Congress on Big Data, 2015, pp. 409–415.
[4] F. Pan, G. Cong, A. Tung, J. Yang, and M. J. Zaki, CARPENTER: Finding Closed Patterns in Long Biological Datasets, Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, (2003), pp. 637–642.
[5] H. Han, J. Liu, D. Shao, and Z. Xin, Mining Frequent Patterns from Very High Dimensional Data: A Top-Down Row Enumeration Approach, Proceedings of the Sixth SIAM International Conference on Data Mining, 124 (2006), pp. 282
[6] I. Jolliffe, Principal component analysis, Wiley Online Library (2002).
[7] G. Hughes, On the Mean Accuracy of Statistical Pattern Recognizers, IEEE Transactions on Information Theory, 14(1), January 1968, pp. 55–63.
[8] E. Harper and J. Sensmeier, Why is big data important to Nurses?, HIMSS (2015), Retrieved June 10, 2016 from: http://www.himss.org/News/NewsDetail.aspx?ItemNumber=43374
[9] D. E. Meier, Increased access to palliative care and hospice services: opportunities to improve value in health care, Milbank Quarterly, 89(3), 2011, pp. 343–380.
[10] B. Fung, How the U.S. Health-Care System Wastes $750 Billion Annually, Retrieved June 12, 2016 from http://www.theatlantic.com/health/archive/2012/09/how-the-us-health-care-system-wastes-750-billion-annually/262106/
[11] J. Manyika, M. Chui, B. Brown, J. Bughin, R. Dobbs, C. Roxburgh, and A. H. Byers, Big data: The next frontier for innovation, competition, and productivity, 2011.
[12] S. H. Landis, T. Murray, S. Bolden, and P. A. Wingo, Cancer statistics, 1998, CA: A Cancer Journal for Clinicians, 48(1), 1998, pp. 6–29.
[13] R. Yancik and M. E. Holmes, NIA/NCI Report of the Cancer Center Workshop (June 13–15, 2001). Exploring the role of cancer centers for integrating aging and cancer research, 2002.
[14] B. Caballero, L. Karla and R. Akella, Dynamically Modeling Patient's Health State from Electronic Medical Records: A Time Series Approach, Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2015, pp. 69–78.
[15] M. H. Gail, L. A. Brinton, D. P. Byar, D. K. Corle, S. B. Green, C. Schairer, and J. J. Mulvihill, Projecting individualized probabilities of developing breast cancer for white females who are being examined annually, Journal of the National Cancer Institute, 81(24), 1989, pp. 1879–1886.
[16] M. H. Gail, J. P. Costantino, D. Pee, M. Bondy, L. Newman, M. Selvan, G. L. Anderson, K. E. Malone, P. A. Marchbanks, W. McCaskill-Stevens and others. Journal of the National Cancer Institute, 99(23), 2007, pp. 1782–1792.
[17] M. Garzotto, G. Hudson, L. Peters, Y. Hsieh, E. Barrera, M. Mori, T. M. Beer, and T. Klein, Predictive modeling for the presence of prostate carcinoma using clinical, laboratory, and ultrasound parameters in patients with prostate specific antigen levels ≤ 10 ng/mL, Cancer, 98(7), 2003, pp. 1417–1422.
[18] R. K. Matsuno, J. P. Costantino, R. G. Ziegler, G. L. Anderson, H. Li, D. Pee, and M. H. Gail, Projecting individualized absolute invasive breast cancer risk in Asian and Pacific Islander American women, Journal of the National Cancer Institute, 2011.
[19] R. Yancik, M. N. Wesley, L. A. Ries, R. J. Havlik, B. K. Edwards, and J. W. Yates, Effect of age and comorbidity in postmenopausal breast cancer patients aged 55 years and older, JAMA, 285(7), 2001, pp. 885–892.
[20] K. Angelo, A. Dalhaug, A. Pawinski, E. Haukland, and C. Nieder, Survival prediction score: a simple but age-dependent method predicting prognosis in patients undergoing palliative radiotherapy, ISRN Oncology, 2014.
[21] M. Ghassemi, T. Naumann, F. Doshi-Velez, N. Brimmer, R. Joshi, A. Rumshisky, and P. Szolovits, Unfolding physiological state: mortality modelling in intensive care units, Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2014, pp. 75–84.
[22] N. S. Murthy and A. Mathew, Cancer epidemiology, prevention and control, Curr Sci, 86(4), 2004, pp. 518–524.
[23] J. Li, S. Fong, Y. Zhuang, and R. Khoury, Hierarchical Classification in Text Mining for Sentiment Analysis, IEEE International Conference on Soft Computing and Machine Intelligence (ISCMI), 2014, pp. 46–51.
[24] G. M. Keenan, E. Yakel, Y. Yao, D. Xu, L. Szalacha, D. Tschannen, Y. Ford, Y. Chen, A. Johnson, K. D. Lopez, and D. J. Wilkie, Maintaining a Consistent Big Picture: Meaningful Use of a Web-based POC EHR System, International Journal of Nursing Knowledge, 23 (2012), pp. 119–113.
[25] North American Nursing Diagnosis Association, NANDA Nursing Diagnoses, North American Nursing Diagnosis Association, (2007).
[26] S. Moorhead, M. Johnson, and M. Maas, Iowa Outcomes Project, Nursing Outcomes Classification (NOC), Mosby, (2004).
[27] G. M. Bulechek, H. K. Butcher, and J. M. Dochterman, Nursing interventions classification (NIC), Mosby, (2008).
[28] K. W. Gronbach, The age curve: How to profit from the coming demographic storm, AMACOM Div American Mgmt Assn, (2008).
[29] US Department of Health and Human Services, Hospital utilization (in non-federal short-stay hospitals), (2012).
[30] P. Benner, From novice to expert, Menlo Park, (1984).
[31] J. R. Quinlan, C4.5: Programs for Machine Learning, Elsevier, (2014).
[32] D. W. Aha, D. Kibler, and M. K. Albert, Instance-based learning algorithms, Machine Learning, 6 (1991), pp. 37–66.
[33] C. Cortes and V. Vapnik, Support-vector networks, Machine Learning, 20 (1995), pp. 273–297.
[34] M. A. Arbib, The Handbook of Brain Theory and Neural Networks, MIT Press, (2003).
[35] I. H. Witten and E. Frank, Data Mining: Practical machine learning tools and techniques, Morgan Kaufmann, (2005).
[36] F. Yates, Contingency tables involving small numbers and the χ² test, Supplement to the Journal of the Royal Statistical Society, 1(2), 1934, pp. 217–235.
[37] P. Desikan, R. Khare, J. Srivastava, R. M. Kaplan, J. Ghosh, and L. Liu, Predictive Modeling in Healthcare: Challenges and Opportunities (2013).