Original Articles

An Evaluation of Patient Safety Event Report Categories Using Unsupervised Topic Modeling

A. Fong (1); R. Ratwani (1, 2)

(1) MedStar Institute for Innovation – National Center for Human Factors in Healthcare, Washington, D.C., USA; (2) Georgetown University School of Medicine, Washington, D.C., USA

Keywords
Patient safety event reports, topic model, latent Dirichlet allocation, general event type, natural language processing, unsupervised learning

Correspondence to:
Allan Fong, MS
MedStar Institute for Innovation – National Center for Human Factors in Healthcare
3007 Tilden St. NW, Suite 7M
Washington, D.C. 20008
USA
E-mail: [email protected]

Summary
Objective: Patient safety event data repositories have the potential to dramatically improve safety if analyzed and leveraged appropriately. These safety event reports often consist of both structured data, such as general event type categories, and unstructured data, such as free text descriptions of the event. Analyzing these data, particularly the rich free text narratives, can be challenging, especially with tens of thousands of reports. To overcome the resource intensive manual review process of the free text descriptions, we demonstrate the effectiveness of an unsupervised natural language processing approach.
Methods: An unsupervised natural language processing technique, called topic modeling, was applied to a large repository of patient safety event data to identify topics, or themes, from the free text descriptions of the data. Entropy measures were used to evaluate and compare these topics to the general event type categories that were originally assigned by the event reporter.
Results: Measures of entropy demonstrated that some topics generated from the unsupervised modeling approach aligned with the clinical general event type categories that were originally selected by the individual entering the report. Importantly, several new latent topics emerged that were not originally identified.
The new topics provide additional insights into the patient safety event data that would not otherwise easily be detected.
Conclusion: The topic modeling approach provides a method to identify topics or themes that may not be immediately apparent and has the potential to allow for automatic reclassification of events that are ambiguously classified by the event reporter.

Methods Inf Med 2015; 54: 338–345
http://dx.doi.org/10.3414/ME15-01-0010
received: January 14, 2015
accepted: February 27, 2015
epub ahead of print: April 2, 2015

1. Introduction

The Institute of Medicine and several state legislatures have recommended the use of patient safety event reporting systems (PSRS) to better understand and improve safety hazards [1, 2]. Numerous healthcare providers have adopted these systems, which provide a framework for healthcare provider staff, including frontline clinicians, nurses, and technicians, to report patient safety events [3]. Reported patient events range from "near misses", where no patient harm occurs, to serious safety events that result in patient harm. If the reported data can be analyzed effectively, reporting systems have the potential to dramatically improve the safety and quality of care by exposing possible weaknesses in the care process [4].

A patient safety event (PSE) report generally consists of both structured and unstructured data elements [5]. Structured data are pre-defined, fixed fields that solicit specific information about the event. For example, there is often a field for "event type" which provides pre-determined categories of different types of patient safety hazards (e.g. medication, falls, lab/specimen, diagnosis/imaging, safety/security, miscellaneous, etc.). Generally, no definition of the event type category is provided on the reporting portal; consequently, the reporter must select a category based on their own knowledge and intuition.
As a result, the miscellaneous category is used frequently as a "catch all" when the reporter is unsure of the specific category for the event being reported. The unstructured data fields generally include a free text field where the reporter can enter a text description of the event. The text descriptions are often a rich data source in that the reporter is not constrained to limited categories or selection options and is able to freely describe the details of the event. Below is an example of a PSE report narrative that was originally classified by the reporter as a Diagnosis/Imaging event:

"... patient had MRI ordered. Pt needed premedication. Pt called for at 1100. At 1130 nurse said she needed to call MD to get meds. At 1145 nurse called to say pt was given meds and would be sent up. At 1225 nurse called to ask why we hadn't gotten the pt. When asked if she called transport she said no. I told her that transport needs to be called or they won't come. At 1245 transport called us back to let us know the pt was on the way. Pt was on the MRI table in the scanner. Before we could start she started yelling and tried to get out of the scanner. It was difficult to get her back on the stretcher to continue the scan. We made several attempts to reach her nurse and decided to send her back downstairs without the MRI. Called the charge nurse to have her let her nurse know since we couldn't reach her. We were later told that the family was very angry saying that the delay caused the medicine to wear off ..."

While this event took place in the context of performing an MRI, it is apparent from the free text description that there are several contributing factors (e.g. medication, communication, etc.). Eliciting this kind of detail in a structured data format would require the reporter to spend a tremendous amount of time inputting a report; yet, analyzing this kind of free text narrative is resource intensive.

PSRS databases can grow to include thousands of case reports, depending on the size of the healthcare provider, and effectively analyzing the report data to make improvements in safety and quality is a significant challenge [6]. While the structured data elements of the reports are conducive to analysis with descriptive statistics, the unstructured text descriptions are particularly difficult to analyze. Without a detailed review of the text descriptions, it is difficult to categorize and quantify the events. Many events span multiple categories (e.g. diagnostic imaging, medication, communication); unlike the structured data fields that constrain the report to a single category, the unstructured descriptions contain details that reflect multiple categories. With databases containing tens of thousands of event reports, more efficient algorithmic techniques are needed to effectively analyze these data.

In the present study we describe the application of an unsupervised natural language processing (NLP) technique, called topic modeling, to better understand and categorize the free text descriptions of PSE reports. The topics that emerge from the unsupervised NLP approach are compared to the general event categories provided by the reporter to determine how well the topics align and to examine which new topics arise. New topics that emerge from this NLP approach represent latent concepts that have the potential to shed light on patterns in the PSE data that may otherwise be difficult to detect.
An algorithmic approach to analyzing the free text narratives of PSE reports has the potential to dramatically reduce the resources required to analyze these data and provide new insights to better identify safety hazards.

2. Background

Natural language processing (NLP) is the research and application of computational methods to study and analyze written or spoken language [7, 8]. Numerous NLP algorithms and models have been developed to analyze and help extract meaning from text in a variety of disciplines and applications, such as understanding social media, political speeches, and physician discharge summaries [9–11]. With the increased use of electronic health records, there has been a growing corpus of text information in healthcare that is ripe for NLP. Researchers have applied NLP to the analysis of clinical documentation and discharge summaries to improve care and workflow processes [8, 12]. For example, NLP has been used to assess clinical conditions, improve clinical decision support, and identify medications [13–15].

2.1 Natural Language Processing to Improve Patient Safety

NLP techniques can be used to better recognize patient safety hazards and to improve the analysis of patient safety data. Researchers have applied NLP to recognize adverse events from clinical documentation and have also applied NLP to large adverse event reporting repositories to classify events [11, 16, 17]. Our focus is on the application of NLP to PSE report data, specifically the free text narrative within each report. NLP techniques have been successfully used to categorize and classify PSE reports, for example, to classify serious safety events or health information technology (IT) events [17–20]. This research demonstrated that the NLP approach can be applied successfully and that unique insights can be gained from the analysis of the free text.
Previous NLP approaches, however, have relied on supervised modeling, which requires a "gold standard" or ground truth from which algorithms can learn and on which they can train in order to perform well. One challenge with supervised learning is that training sets are not always available and often have to be created through a manual review process that is resource intensive, especially for large datasets with free text such as PSE reports. Unsupervised NLP techniques, such as topic modeling, offer another method to understand free text without the resource intensive training datasets required for supervised learning. Topic modeling, such as Latent Dirichlet Allocation (LDA), is a statistical approach to discover or identify "topics" associated with words or phrases [21]. Topic models have several advantages when working with large, complex text datasets where the topic categories might not be clear or easy to discern a priori, and they have been shown to be useful in several applications, such as analyzing text based survey responses, foreign media, and FDA drug labels [22–24]. Topic models, however, have not been used to analyze free text in PSE reports. The application of an unsupervised NLP approach to analyzing PSE reports provides several advantages: it does not require the resource intensive task of creating a training set, it is less influenced by human annotation biases, and it has the potential to identify latent themes in the data that are not immediately apparent.

3. Objectives

We extend previous topic modeling research by utilizing unsupervised LDA to model topics in PSE reports and compare the resulting LDA topics to the general event type (GET) categories originally selected by the event reporters. In this study, we are interested in how reports are distributed across the LDA topics and, in particular, how this distribution relates to the reporter defined GET categories. We hypothesize that while some LDA topics will align well with certain GET categories, several new topics or themes discovered through LDA will provide greater insight into the PSE data. The algorithmic recognition of these "latent" topics may serve to uncover patterns in the data that would otherwise be difficult to detect.

4. Methods

LDA was used to discover topics, or themes, in the free text descriptions of PSE reports. We compared the LDA topics to the default general event type (GET) categories available to event reporters. When submitting a report, reporters have to select the one category (from a list of 21 GET categories) believed to be most applicable to the event; the categories available to reporters are shown in the left column of ▶Table 1. Furthermore, we used entropy measures, which are indicators of information content, to evaluate the report distributions along the LDA topics and the GET categories. Our method for topic modeling and evaluation is summarized in ▶Figure 1.

Figure 1  Outline of topic modeling and evaluation methodology

Table 1  Distribution of patient safety events

General Event Type Category         Percent of total reports
Medication/Fluid                    18%
Lab/Specimen                        15%
Fall                                12%
Miscellaneous                       10%
Blood Bank                          8%
Diagnosis/Treatment                 5%
Patient ID/Documentation/Consent    5%
Surgery/Procedure                   4%
Skin/Tissue                         4%
Lines/Tubes/Drain                   3%
Safety/Security                     3%
Diagnostic Imaging                  3%
Professional Conduct                2%
Equipment/Medical Device            2%
Maternal/Childbirth                 1%
Airway Management                   1%
Infection Prevention                1%
Facilities                          1%
Healthcare IT                       less than 1%
Restraints/Seclusion Injury         less than 1%
Tube/Drain                          less than 1%

4.1 Data Source and Preprocessing

PSE reports were collected over a 16 month period (January 2013 to April 2014) from a large, multi-hospital healthcare system in the mid-Atlantic region of the United States. A total of 29,300 reports were collected during this time period from a common patient safety reporting system (PSRS), RL Solutions (www.rlsolutions.com). Each report has an associated GET category that is selected by the reporter of the event. ▶Table 1 shows the GET categories and the frequency of reports in each GET category. Reports were most commonly categorized by the reporter as "Medication/Fluid", "Lab/Specimen", "Fall", and "Miscellaneous". Reports categorized under "Tube/Drain" were combined with the "Miscellaneous" category because there were only two "Tube/Drain" reports; all other GET categories had at least 100 unique reports. We used these 20 GET categories for the remainder of our analysis.

The free text narrative in each PSE report was extracted for analysis using the LDA approach. To perform the topic model analysis, the data were first preprocessed. Duplicate free text reports and reports with fewer than two words were removed from the data, resulting in 29,131 usable reports. We also removed numbers, punctuation, and extra white space, and tokenized the text into unigrams. Unlike newspaper articles or physician discharge summaries, which are typical candidates for NLP, the text in PSE reports is generally written in a colloquial manner and is considerably less structured than other sources. The nature of the PSE reports makes determining certain features, such as abbreviations, difficult. For example, the "pt" in the phrase "pt arrived in the morning" could mean "patient" or "physical therapist". For the purposes of this study, we did not account for all of the nuanced and context specific differences in abbreviations, synonyms, or misspellings. This will be an important topic to address in future work.

4.2 Topic Modeling

LDA was used to model topics in the PSE reports [21]. LDA assumes a generative process represented by a standard plate model (▶Figure 2). In this model, there are M reports and N words in a report. Each report i has a topic distribution Θi and each latent topic k has a word distribution. Unlike the GET categories, the topics generated through this process are considered latent because they are not directly observable. The LDA approach assumes a Dirichlet prior α on the distribution of topics per report and a Dirichlet prior β on the distribution of words per topic. Intuitively, each word w in report i has a probability of being generated by a latent topic j. We used LDA as implemented in the lda R package [25].

Figure 2  Generative model for LDA

All reports were first assigned to the LDA topic they most align with. Following a method similar to Bisgin et al., topics were ranked by the total number of reports assigned to the topic [24]. To reduce the topic space, we selected the top one-third most populated topics (33 topics) and used these topics to re-categorize all the reports. Thirty-three topics were chosen because it was slightly more than, but similar to, the number of GET categories. At the end of this process, each of the 29,131 reports had two labeled assignments, one from the 33 LDA topics and one from the 20 GET categories.

The dispersion of reports across LDA topics and GET categories was evaluated using measures of entropy [26]. Entropy was used as a proxy of the information content of a topic or category. A topic, or category, with low entropy is considered stable, well understood, and well characterized. A topic with a higher entropy value would be considered more uncertain or unpredictable, suggesting that the topic carries new, additional information. We defined the entropy of the LDA topics and GET categories as:

H(x_t) = −Σ_g p(x_t,g) log2 p(x_t,g)

H(x_t) is the entropy of the reports across LDA topic t, where p(x_t,g) is the probability that a report categorized as LDA topic t has general event type category g. Each probability is scaled by its information content, −log2 p(x_t,g), and these products are summed across all GET categories. For example, suppose there were only two GET categories (G-a, G-b) and we were evaluating two LDA topics or themes (L-a, L-b). Assume L-a was more highly associated with the G-a category (the probability of L-a appearing in G-a was higher than the probability of L-a appearing in G-b) and L-b appeared similarly throughout the G-a and G-b reports. As a result, the entropy of L-a across the GET categories would be much lower than that of L-b. The former LDA topic aligns clearly with the GET categories, while the latter represents a theme that crosses multiple clinical GET categories. Similarly, we calculate the entropy of the reports across the GET categories, H(x_g), by switching the t and g notations. We used this approach to evaluate and compare how the LDA topics and the GET categories categorize the reports.

An LDA topic with low entropy implies the LDA topic can be explained fairly well by a few GET categories: there is little information in the LDA topic that was not already accounted for by the GET category, and little randomness in the GET category associated with that specific LDA topic. On the other hand, an LDA topic with high entropy suggests that the LDA topic is not clearly accounted for by any of the GET categories; it implies a latent topic in the PSE reports that spans many GET categories. The high entropy LDA topics are particularly interesting as they highlight important patterns and trends in the reports that would otherwise be hidden amongst the GET categories. Similarly, GET categories with low entropy imply low randomness across the LDA topics, while GET categories with high entropy suggest that those GET categories do not have highly structured themes. This approach allows us both to understand the latent topics in the PSE reports and to assess how well these topics align with the GET categories.

5. Results

LDA was used to generate 100 topics and each PSE report was initially assigned to its most probable LDA topic. The most populated 33% of LDA topics were then used to re-categorize all the reports, resulting in the LDA topic distribution shown in ▶Figure 3. This process reduced the topic space to the more relevant topics and provided smoothing to the report distribution.

Figure 3  LDA topic distribution results after re-categorization

5.1 LDA Entropy

We first used entropy to evaluate the GET category information content for each LDA topic. The entropy varied greatly (range = 0.6 to 3.5, m = 2.4, std = 1.2) depending on the LDA topic (▶Figure 4). The LDA topics with the lowest entropy were clearly associated with specific general event type (GET) categories such as "Medication/Fluid", "Fall", and "Lab/Specimen" (▶Table 2).

Figure 4  Entropy for LDA topics across GET categories
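For illustration, the topic-space reduction described in the Methods (rank topics by how many reports they best explain, keep the most populated third, and re-assign every report among the kept topics) can be sketched as follows. This is a minimal Python sketch rather than the lda R package used in the study, and the per-report topic probabilities are hypothetical stand-ins for the distributions Θi estimated by LDA.

```python
from collections import Counter

def reduce_and_reassign(report_topic_probs, keep_fraction=1/3):
    """Rank topics by the number of reports for which they are the most
    probable topic, keep the top `keep_fraction` of those topics, and
    re-assign each report to its most probable topic among the kept ones
    (a sketch of the re-categorization procedure described in the Methods)."""
    # Initial assignment: each report goes to its single most probable topic.
    first_choice = [max(p, key=p.get) for p in report_topic_probs]
    ranked = [t for t, _ in Counter(first_choice).most_common()]
    kept = set(ranked[:max(1, int(len(ranked) * keep_fraction))])
    # Re-assignment restricted to the kept topics.
    return [max((t for t in p if t in kept), key=p.get)
            for p in report_topic_probs]

# Hypothetical per-report topic distributions over three topics:
probs = [
    {"t1": 0.7, "t2": 0.2, "t3": 0.1},
    {"t1": 0.6, "t2": 0.3, "t3": 0.1},
    {"t3": 0.5, "t1": 0.4, "t2": 0.1},
]
# With keep_fraction=1/3 only the most populated topic ("t1") survives,
# so every report is re-assigned to "t1".
```

Topics that are never any report's first choice drop out of the ranking automatically, which mirrors how sparsely populated topics are discarded when the topic space is reduced.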
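The per-topic entropy measure defined in the Methods is straightforward to compute from the topic and category assignments. Below is a minimal, self-contained Python sketch (again in place of the lda R package used in the study); the preprocessing helper mirrors the steps described in the data preparation, and the report assignments are made-up stand-ins echoing the two-category G-a/G-b example in the text.

```python
import math
import re
from collections import Counter, defaultdict

def preprocess(text):
    """Lowercase, drop numbers and punctuation, collapse extra white space,
    and tokenize into unigrams (cf. the preprocessing steps described above)."""
    return re.sub(r"[^a-z\s]", " ", text.lower()).split()

def topic_entropy(assignments):
    """Shannon entropy (base 2) of GET categories within each LDA topic:
    H(x_t) = -sum_g p(x_t,g) * log2 p(x_t,g), where `assignments` is a
    list of (lda_topic, get_category) pairs, one pair per report."""
    by_topic = defaultdict(Counter)
    for topic, category in assignments:
        by_topic[topic][category] += 1
    entropies = {}
    for topic, counts in by_topic.items():
        n = sum(counts.values())
        entropies[topic] = -sum(c / n * math.log2(c / n)
                                for c in counts.values())
    return entropies

# Hypothetical assignments: topic "L-a" concentrates in GET category "G-a",
# while "L-b" is spread evenly over "G-a" and "G-b".
reports = ([("L-a", "G-a")] * 9 + [("L-a", "G-b")]
           + [("L-b", "G-a")] * 5 + [("L-b", "G-b")] * 5)
h = topic_entropy(reports)  # h["L-a"] ≈ 0.47 bits; h["L-b"] = 1.0 bit
```

Swapping the pair order to (get_category, lda_topic) yields H(x_g), the entropy of each GET category across the LDA topics.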
The top words for these LDA topics are intuitive and easy to understand as clinically relevant to the assigned GET category. On the other hand, the LDA topics with the highest entropy, 26 and 32, were not clearly identifiable with any single GET category (▶Table 3). The high entropy topic words were more representative of themes relevant to several GET categories and tended to cross the more clinically oriented GET boundaries. For example, words from topic 26 centered around parts of the body, spatial orientation, and position. This body orientation/position "theme" relates to many GETs, such as "Fall", "Surgery/Procedure", and "Skin/Tissue". Furthermore, words from topic 32 centered around the topic of breathing, which highlights the complexity of respiratory events as not just an "Airway Management" GET category but also as related to the diagnosis, treatment, safety, and security of the patient.

5.2 GET Entropy

Lastly, we used entropy to evaluate the LDA topic information content for each of the GET categories. Our results (▶Figure 5) show that the "Airway Management", "Skin/Tissue", "Blood Bank", "Maternal/Childbirth", and "Fall" categories had the lowest entropy across the LDA topics. "Equipment/Medical Device", "Infection Prevention", "Diagnosis/Treatment", "Miscellaneous", and "Healthcare IT" had the highest entropy across the LDA topics. While some of these results were expected, such as high entropy for "Miscellaneous" reports, other results were more surprising. For example, we expected "Fall" reports to have much lower entropy because falls are generally well defined.
However, upon further inspection we noticed that "Fall" reports were distributed amongst several LDA topics, including 2 [i.e., fall, bathroom, head, floor, hit], 1 [i.e., bed, floor, alarm, chair, sit], and 15 [i.e., chair, floor, wheelchair, therapist].

Table 2  Top topic words and associated GET categories for LDA topics with low entropies

LDA Topic ID   H(x_t)   Top words                                Predominately associated GET categories
8              0.655    mg, dose, po, order, daily               Medication/Fluid
2              0.943    fell, bathroom, head, fall, floor        Fall
4              0.944    lab, specimen, urine, results, receive   Lab/Specimen
5              1.084    medication, pharmacy, pyxis, med         Medication/Fluid
9              1.131    enter, done, critical, result, mgdl      Lab/Specimen

Table 3  Top topic words and associated GET categories for LDA topics with large entropies

LDA Topic ID   H(x_t)   Top words                             Predominately associated GET categories
26             3.644    right, left, hand, arm, side          Fall, Miscellaneous, Safety/Security, Skin/Tissue, Surgery/Procedure
32             3.586    respirator, oxygen, sat, air, place   Airway Management, Diagnosis/Treatment, Miscellaneous, Safety/Security

6. Discussion

An unsupervised topic modeling approach was used to identify latent topics in a large dataset of PSE reports, and these topics were compared to the general event type (GET) categories as defined by the event reporters. The analyses demonstrated instances where certain LDA topics aligned with the GET categories, as well as several other instances where the LDA topics spanned multiple GET categories and there was no clear alignment. While previous researchers have utilized supervised modeling approaches to better understand PSE reports, supervised modeling requires an established training and test set for effective model development [17, 18, 20]. The advantage of the unsupervised approach, as demonstrated here, is that no training and test set is required. There are, however, certain drawbacks to the unsupervised approach.
Convergence of unsupervised approaches often takes longer, and the results may not be as accurate as those of supervised approaches. Furthermore, unsupervised topic models do require humans to interpret and evaluate the utility of the outputs. Both supervised and unsupervised approaches have advantages and limitations; utilizing a combination of the approaches will likely yield the most fruitful path for more effectively analyzing PSE data. Natural language processing (NLP) approaches provide several benefits to PSE reporting and, if leveraged appropriately, NLP can both enhance the rate of reporting PSEs and dramatically improve the analysis of these events.

6.1 Reducing the Burden of Event Reporting

One of the primary barriers to event reporting is the time cost of entering a report; for busy provider staff, taking several minutes to fill out an event report is difficult to do and, consequently, reports may not be submitted. Traditional event reporting systems require several different structured data fields to be completed, including selection of the GET. By using an NLP approach, some of the structured data elements may be reduced in favor of a more natural text based narrative from the reporter. This may then reduce the time burden of reporting. Utilizing NLP can provide a balance between structured and free text data elements in the reporting process.

Selecting a category for an event is also difficult for provider staff entering event reports. In many circumstances, the event reporter may not know the GET. For example, if a nurse is about to administer a medication that was ordered by the physician through a computerized physician order entry (CPOE) system and the nurse notices that the medication is the wrong dose, what type of event should the nurse enter? Is this a "Medication" event or a "Healthcare IT" event, or both? Expecting the nurse to determine the event type under these conditions may be unreasonable. The NLP approach offers an alternative to having to select the event category and may also serve to control bias that may be introduced by the reporter selecting the event category. Further, certain events may very well span multiple categories and should be classified as such for a more rigorous analysis of the data. While reporting systems can allow the reporter to select multiple categories, it is more efficient to utilize an algorithmic approach.

6.2 Enhancing the Analysis of Patient Safety Event Reports

There are several ways in which the NLP approach can dramatically improve the analysis of PSE data and reduce the resources required to analyze these data.

6.2.1 Re-categorizing Reports

In the dataset that was analyzed, approximately ten percent of the reports were categorized as miscellaneous. Without an algorithmic approach, these events would require manual review in order to be re-categorized under more meaningful topics. The NLP approach provides a method to more efficiently re-categorize these events. In addition, some event reports may be inappropriately categorized, and the NLP approach can be used to identify those reports that do not fit within the category that was specified by the reporter.

6.2.2 Discovery of New Topics

The high entropy LDA topics represent latent topic areas that are different from the GET categories, and in many cases these LDA topics span multiple GET categories. The LDA topics that do not align with the GET categories provide a novel way to look at the data that may lead to new insights to improve patient safety and better understand underlying causes. For example, LDA topic 26 represents spatial orientation and position, especially as it relates to the patient's body. This theme spans multiple GET categories, including "Surgery/Procedure", "Skin/Tissue", and "Fall". Without this LDA category, one might not readily think about confusion over body orientation as a potential underlying factor that connects seemingly unrelated reports. The LDA topics highlight commonalities that would be more difficult to discover through the GET categories. This approach also allows for the discovery of topics that may fall outside of traditional clinical categories. Topic clusters centered on human factors concepts like communication and teamwork may arise, providing even greater insight into the underlying causes of the PSE reports being entered. Leveraging NLP to reshape the data to make the underlying causes of events more transparent is an area ripe for future research.

Figure 5  Entropy for GET categories across LDA topics

6.3 Challenges and Opportunities

PSE reports are a rich data source that, if utilized appropriately, can be used to identify safety hazards and dramatically improve the delivery of safe care. However, efficiently and effectively analyzing the narratives in PSE reports is going to be critical to achieving the promise that this data source holds. Supervised and unsupervised NLP approaches are showing signs of success, yet there are several challenges to overcome with the PSE narratives, such as domain specificity, nuanced language differences, medical jargon, abbreviations, and colloquial acronyms. Similar to other application areas, specific ontologies centered on patient safety may need to be developed to address these challenges. The unsupervised approach presented here contributes to the foundational work demonstrating the effectiveness of NLP in analyzing PSE data.
No other uses without permission. All rights reserved. 344 345 A. Fong, R. Ratwani: An Evaluation of Patient Safety Event Report Categories Using Unsupervised Topic Modeling Acknowledgment The work presented here was supported by many Safety and Quality staff and we are grateful to them for their support. References 1. Aspden P, Corrigan JW, Erickson SM. Patient Safety Reporting Systems and Applications. In: Patient Safety: Achieving a new standard of care. Washington, D.C.: : National Academy Press 2004. pp 250 –278. 2. Rosenthal J, Booth M. Maxmizing the Use of State Adverse Event Data to Improve Patient Safety. Portlan, ME: 2005. 3. Clarke JR. How a system for reporting medical errors can and cannot improve patient safety. Am Surg 2006; 72: 1088–1091; discussion 1126 –1148. http://www.ncbi.nlm.nih.gov/pubmed/17120952 4. Pronovost P, Morlock LL, Sexton B. Improving the value of patient safety reporting systems. In: Advances in patient safety: New directions and alternative approaches. Vol 1. Assessment. Rockville, MD: Agency for Healthcare Research and Quality; 2008. 5. White J. Adverse Event Reporting and Learning Systems: A Review of the Relevant Literature. The Canadian Patient Safety Institute; 2007. 6. Longo DR, Hewett JE, Ge B, et al. The long road to patient safety: a status report on patient safety systems. JAMA 2005; 294: 2858–2865. http:// www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd= Retrieve&db=PubMed&dopt=Citation&list_uids= 16352793 7. Spyns P. Natural language processing in medicine: an overview. Methods Inf Med 1996; 35: 285–301. Methods on twitter: 8. Nadkarni PM, Ohno-Machado L, Chapman WW. Natural language processing: an introduction. J Am Med Inform Assoc 2011; 18: 544–551. doi:10.1136/amiajnl-2011–000464 9. Choudhury M De, Gamon M, Counts S, et al. Predicting Depression via Social Media. ICWSM 2013; 2: 128–137. http://www.aaai.org/ocs/index. php/ICWSM/ICWSM13/paper/viewFile/6124/ 6351 (accessed Sep 11, 2014). 10. 
Monroe BL, Colaresi MP, Quinn KM. Fightin' Words: lexical feature selection and evaluation for identifying the content of political conflict. Polit Anal 2008; 16: 372–403. doi: 10.1093/pan/mpn018
11. Melton G, Hripcsak G. Automated detection of adverse events using natural language processing of discharge summaries. J Am Med Inform Assoc 2005; 12: 448–457.
12. Chapman WW, Nadkarni PM, Hirschman L, et al. Overcoming barriers to NLP for clinical text: the role of shared tasks and the need for additional creative solutions. J Am Med Inform Assoc 2011; 18: 540–543. doi: 10.1136/amiajnl-2011-000465
13. Wagholikar KB, MacLaughlin KL, Henry MR, et al. Clinical decision support with automated text processing for cervical cancer screening. J Am Med Inform Assoc 2012; 19: 833–839. doi: 10.1136/amiajnl-2012-000820
14. Doan S, Bastarache L, Klimkowski S, et al. Integrating existing natural language processing tools for medication extraction from discharge summaries. J Am Med Inform Assoc 2010; 17: 528–531. doi: 10.1136/jamia.2010.003855
15. Ware H, Mullett CJ, Jagannathan V. Natural language processing framework to assess clinical conditions. J Am Med Inform Assoc 2009; 16: 585–589. doi: 10.1197/jamia.M3091
16. Botsis T, Nguyen MD, Woo EJ, et al. Text mining for the Vaccine Adverse Event Reporting System: medical text classification using informative feature selection. J Am Med Inform Assoc 2011; 18: 631–638. doi: 10.1136/amiajnl-2010-000022
17. Chai KEK, Anthony S, Coiera E, et al. Using statistical text classification to identify health information technology incidents. J Am Med Inform Assoc 2013; 20: 1–6. doi: 10.1136/amiajnl-2012-001409
18. Ong M-S, Magrabi F, Coiera E. Automated identification of extreme-risk events in clinical incident reports. J Am Med Inform Assoc 2012; 19: e110–e118. doi: 10.1136/amiajnl-2011-000562
19. Magrabi F, Ong M-S, Runciman W, et al.
Using FDA reports to inform a classification for health information technology safety problems. J Am Med Inform Assoc 2012; 19: 45–53. doi: 10.1136/amiajnl-2011-000369
20. Ong M-S, Magrabi F, Coiera E. Automated categorisation of clinical incident reports using statistical text classification. Qual Saf Health Care 2010; 19: e55. doi: 10.1136/qshc.2009.036657
21. Blei DM, Ng AY, Jordan MI. Latent Dirichlet allocation. J Mach Learn Res 2003; 3: 993–1022. http://dl.acm.org/citation.cfm?id=944937 (accessed Sep 11, 2014).
22. Roberts ME, Stewart BM, Tingley D, et al. Structural topic models for open-ended survey responses. Am J Pol Sci 2014; 58 (4): 1064–1082. doi: 10.1111/ajps.12103
23. Roberts M, Stewart B, Tingley D, et al. The structural topic model and applied social science. 2013. http://mimno.infosci.cornell.edu/nips2013ws/slides/stm.pdf (accessed Sep 11, 2014).
24. Bisgin H, Liu Z, Fang H, et al. Mining FDA drug labels using an unsupervised learning technique – topic modeling. BMC Bioinformatics 2011; 12 (Suppl 10): S11. doi: 10.1186/1471-2105-12-S10-S11
25. Chang J. Collapsed Gibbs sampling methods for topic models. Version 1.3.2. 2014. Retrieved from http://cran.r-project.org/web/packages/lda/index.html
26. Shannon CE. A mathematical theory of communication. ACM SIGMOBILE Mob Comput Commun Rev 2001; 5: 3–55. http://dl.acm.org/citation.cfm?id=584093 (accessed Sep 11, 2014).