MCORES: a system for noun phrase coreference resolution for clinical records
2012 SHARPn Summit “Secondary Use”
Andreea Bodnari¹, Peter Szolovits¹, Ozlem Uzuner²
¹ MIT, CSAIL, Cambridge, MA, USA
² Department of Information Studies, University at Albany SUNY, Albany, NY, USA
10.16.2012, Rochester, MN

• Medical coreference resolution system (MCORES)
• Experimental results
• Conclusion
Page 2

• Electronic Medical Records (EMRs): large information repositories
• Clinical information requires processing
  ◦ Lower level: sentence parsing, tokenization
  ◦ Higher level: coreference resolution, semantic disambiguation
• Coreference resolution: a fundamental step in text processing
Page 3

• English medical corpus provided by i2b2 National Center for Biomedical Computing
  ◦ De-identified medical discharge summaries
    ▪ Source: PH & BIDMC
    ▪ Content: 230 (PH) + 196 (BIDMC) discharge summaries
  ◦ Annotated concepts and coreference chains
• Concept types
  ◦ Persons
  ◦ Problems
  ◦ Treatments
  ◦ Tests
  ◦ Pronouns
Page 4

[System pipeline diagram. Stages: NP, Instance Creation, Feature Generation, Classification, Clustering, Output]
Page 5

• Markables of the same semantic category are paired together
• MCORES creates positive instances only from neighboring markable pairs in a chain
¹ Instance creation akin to McCarthy and Lehnert
Page 6

                            Persons  Problems  Treatments  Tests  Across all categories
Exact Textual Overlap
  Coreferent                   3347       984         786    206                   5323
  Non-coreferent                100        29          21      7                    157
Partial Textual Overlap
  Coreferent                    337      1353         764    239                   2693
  Non-coreferent                711      1217         557    317                   2802
No Textual Overlap
  Coreferent                   5461       597         329     56                   6443
  Non-coreferent               5403     46056       19328   6709                  77496
Total
  Coreferent                   9145      2934        1879    501                  14459
  Non-coreferent               6214     47302       19906   7033                  80455

Table 3: Distribution of coreferent and non-coreferent instances per semantic category over instances containing exact, partial, and no textual overlap.
Page 7

• Multi-perspective features
  ◦ Antecedent perspective
  ◦ Anaphor perspective
  ◦ Greedy perspective
  ◦ Stingy perspective
• Phrase-level lexical
• Sentence-level lexical
• Syntactic
• Semantic
• Miscellaneous
Page 8

Phrase-level lexical
• Token overlap*
• Normalized token overlap
• Edit-distance
• Normalized edit-distance
Sentence-level lexical
• Sentence-level token overlap*
• Filtered sentence-level token overlap*
• Left and right mention overlap
  ◦ stingy and greedy perspectives only
* multi-perspective feature
Page 9

Syntactic
• Number agreement
• Noun overlap*
• Surname match
Semantic
• UMLS CUI overlap*
• UMLS CUI token overlap*
• UMLS semantic type overlap*
• Anaphor UMLS semantic type
* multi-perspective feature
Page 10

• Token distance
• Mention distance
• All-mention distance
• Sentence distance
• Section match
• Section distance
Page 11

• C4.5 decision tree algorithm
  ◦ Flexible
  ◦ Readable prediction model
• Classifies pairs of markables based on the values of their feature vectors
Page 12

• Classifier makes pairwise predictions only
• Pairwise predictions are clustered into coreference chains
  ◦ Aggressive-merge¹ clustering algorithm
    ▪ A prediction [M1] - [M2] is merged with all preceding pairwise predictions linked to [M1] or [M2]
¹ Aggressive-merge algorithm proposed by McCarthy and Lehnert
Page 13

• Feature set evaluation
• Perspectives evaluation
• Performance evaluation against
  ◦ In-house baseline
  ◦ Third-party systems (RECONCILE_ACL09 & BART)
• Evaluation metric: unweighted averages of Recall, Precision, and F-measure over
  ◦ MUC
  ◦ B³
  ◦ CEAF
  ◦ BLANC
Page 14

Page 15

• MCORES’ advantage comes from linking markables with no token overlap
• Phrase-level sub-MCORES performs similarly to MCORES
• The greedy perspective system is the most favorable single-perspective system
• The multi-perspective system performs as well as or better than the single-perspective systems
• Error analysis
  ◦ MCORES fails to classify misspelled person pairs
  ◦ Medical problems: false positives due to the difference between new and recurring events
  ◦ Treatments: false positives due to medications with different routes of administration
  ◦ Tests: false positives due to the large number of full-overlap instances that did not corefer
Page 16

• Developed a coreference resolution system for the medical domain (MCORES)
• MCORES innovates through a multi-perspective and knowledge-based feature set
• MCORES outperforms third-party systems and an in-house baseline, improving coreference resolution on clinical records
Page 17
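
The clustering step on the aggressive-merge slide (each positive prediction [M1] - [M2] is joined with all earlier predictions linked to [M1] or [M2]) can be read as taking the transitive closure of the positive pairwise links. The following is a minimal sketch under that reading, not the MCORES implementation: the function name `aggressive_merge`, the `(m1, m2, is_coreferent)` input format, and the use of a union-find structure are all assumptions made for illustration.

```python
from collections import defaultdict


def aggressive_merge(predictions):
    """Cluster positive pairwise predictions into coreference chains.

    predictions: iterable of (m1, m2, is_coreferent) triples, where m1 and m2
    are hashable mention identifiers (hypothetical input format).
    Returns a sorted list of chains, each a sorted list of mention ids.
    Mentions that appear only in negative predictions are left out.
    """
    parent = {}

    def find(m):
        # Return the representative of m's chain, registering m if unseen.
        parent.setdefault(m, m)
        while parent[m] != m:
            parent[m] = parent[parent[m]]  # path halving
            m = parent[m]
        return m

    for m1, m2, is_coreferent in predictions:
        if is_coreferent:
            root1, root2 = find(m1), find(m2)
            if root1 != root2:
                # Merge the two chains that contain m1 and m2.
                parent[root2] = root1

    chains = defaultdict(list)
    for m in list(parent):
        chains[find(m)].append(m)
    return sorted(sorted(chain) for chain in chains.values())


predictions = [
    ("M1", "M2", True),
    ("M3", "M4", True),
    ("M2", "M5", False),  # negative prediction: no link is created
    ("M2", "M3", True),   # joins the M1-M2 chain with the M3-M4 chain
    ("M6", "M7", True),
]
print(aggressive_merge(predictions))
# [['M1', 'M2', 'M3', 'M4'], ['M6', 'M7']]
```

Because every positive link merges whole chains, a single false-positive pairwise prediction can fuse two otherwise separate chains, which is one reason pairwise precision matters for this clustering strategy.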