Patient Matching Algorithm Challenge Informational Webinar

Caitlin Ryan, PMP | IRIS Health Solutions LLC, Contract Support to ONC
Adam Culbertson, M.S., M.S. | HIMSS Innovator in Residence, ONC

Agenda
• ONC Overview
• Background on Matching
• About the Challenge
» Eligibility Requirements
» Registration
» Project Submissions
» Winners and Prizes
• Calculating Metrics
• Creating Test Data
• Challenge Q&A

Office of the National Coordinator for Health IT (ONC)
• The Office of the National Coordinator for Health Information Technology (ONC) is at the forefront of the administration's health IT efforts and is a resource to the entire health system, supporting the adoption of health information technology and promoting nationwide health information exchange to improve health care.
• ONC is organizationally located within the Office of the Secretary of the U.S. Department of Health and Human Services (HHS).
• ONC is the principal federal entity charged with coordinating nationwide efforts to implement and use the most advanced health information technology and the electronic exchange of health information.

ONC Challenges Overview
• Challenges are hosted under the statutory authority of Section 105 of the America COMPETES Reauthorization Act of 2010 (Pub. L. No. 111-358).
• ONC Tech Lab - Innovation
» Spotlight areas of high interest to ONC and HHS
» Direct attention to new market opportunities
» Continue work with the start-up community and administer challenge contests
» Increase awareness and uptake of new standards and data

ONC Roadmap
• Connecting Health and Care for the Nation: A Shared Nationwide Interoperability Roadmap
» Released in 2015
» A 10-year vision to achieve an interoperable health IT infrastructure
» Section L, Accurate Individual Data Matching, states that patient matching is a fundamental requirement for achieving interoperability.

Patient Matching Definition
Patient matching: comparing data from multiple sources to identify records that represent the same patient. Also called merge-purge, record linkage, and entity resolution in other fields.
Source: Culbertson, A., Patient Matching A-Z, Wednesday, March 2nd, HIMSS 2016, Las Vegas, NV
Significant Dates in (Patient) Matching
• 1918 - Soundex, US Patent 1261167
• 1946 - Dunn, Record Linkage
• 1959 - Newcombe, Kennedy, & Axford, Automatic Linkage of Vital Records
• 1969 - Fellegi & Sunter, A Theory of Record Linkage
• 2002 - Grannis et al., Analysis of Identifier Performance Using a Deterministic Linkage Algorithm
• 2008 - RAND Health Report, Identity Crisis: An Examination of the Costs and Benefits of a Unique Patient Identifier for the US Health Care System; Campbell, K., et al., A Comparison of Link Plus, The Link King, and a "Basic" Deterministic Algorithm
• 2009 - HIMSS Patient Identity Integrity Toolkit, Patient Key Performance Indicators; HIMSS Patient Identity Integrity white paper; Winkler, Matching and Record Linkage; Grannis et al., Privacy and Security Solutions for Interoperable Health Information Exchange
• 2011 - A Framework for Cross-Organizational Patient Identity Management
• 2014 - Audacious Inquiry and ONC, Patient Identification and Matching Final Report; Joffe et al., A Benchmark Comparison of Deterministic and Probabilistic Methods for Defining Manual Review Datasets in Duplicate Records Reconciliation; Dusetzina, Stacie B., et al., Linking Data for Health Services Research: A Framework and Instructional Guide
• 2015 - Kho, Abel N., et al., Design and Implementation of a Privacy Preserving Electronic Health Record Linkage Tool; HIMSS hires an Innovator in Residence (IIR) focused on Patient Matching
Source: Culbertson, A. & Miller, K., Patient Matching EHR Ailments: Going from Placebo to Cure, Tuesday, March 1st, HIMSS 2016, Las Vegas, NV

The 5 Step Data Match Process
• Data pre-processing: characterizes the data and ensures the elements have the same structure and the content follows the same format
• Indexing: organizes the data to support better pairing (commonly blocking, with the use of a blocking key)
• Comparison: identifies the similarity between two records, producing comparison vectors
• Classification: based on the comparison results, record pairs are classified as matches, non-matches, or potential matches
• Evaluation: compares match results with the known ground truth, or gold standard
Source: Christen, Peter, Data Matching: Concepts and Techniques for Record Linkage, Entity Resolution, and Duplicate Detection, 2012
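To ground the five steps above, here is a minimal sketch in Python. It is not the challenge's pipeline; every function, field name, and threshold is an illustrative assumption.

```python
# Illustrative sketch of the five-step data match process.
# All field names and the 3-of-4 agreement rule are assumptions,
# not the challenge's actual algorithm.

def preprocess(record):
    """Step 1 (pre-processing): give every field the same structure
    and format (here: trimmed, upper-cased strings)."""
    return {field: str(value).strip().upper() for field, value in record.items()}

def blocking_key(record):
    """Step 2 (indexing): derive a cheap blocking key so that only
    plausible pairs are ever compared."""
    return record["LAST_NAME"][:1] + record["DOB"][-4:]  # initial + birth year

def compare(a, b):
    """Step 3 (comparison): build a comparison vector of field-level
    agreements between two records."""
    fields = ("LAST_NAME", "FIRST_NAME", "DOB", "PHONE")
    return [int(a[f] == b[f]) for f in fields]

def classify(vector, threshold=3):
    """Step 4 (classification): a simple deterministic rule that
    declares a match when enough fields agree."""
    return "match" if sum(vector) >= threshold else "non-match"

# Step 5 (evaluation) scores these decisions against a gold standard;
# see the precision/recall sketch later in this deck.
a = preprocess({"LAST_NAME": "Smith", "FIRST_NAME": "John",
                "DOB": "1-1-1990", "PHONE": "202-223-9910"})
b = preprocess({"LAST_NAME": "smith ", "FIRST_NAME": "Johnny",
                "DOB": "1-1-1990", "PHONE": "202-223-9910"})
if blocking_key(a) == blocking_key(b):      # indexing kept this pair
    print(classify(compare(a, b)))          # -> match (3 of 4 fields agree)
```

Real systems replace the exact-equality comparisons with approximate comparators and the fixed threshold with learned or probabilistic weights, but the five stages stay the same.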
Problem
• Patient data matching has been noted as a key barrier to achieving interoperability in the Nation's roadmap for health IT
• Patient matching causes issues for over 50% of health information managers1
• The problem will grow as the volume of health data sharing increases
• Data quality issues make matching more complicated
• There is a lack of knowledge about patient matching algorithm performance, and little adoption of metrics
1) https://ehrintelligence.com/news/patient-matching-issues-hindering-50-of-him-professionals

Data Quality
• Data quality is key
» Garbage in, garbage out
• Data entry errors compound data matching complexity
» Various algorithmic solutions exist to address these, but none are perfect
• Types of errors:
» Missing or incomplete values
» Inaccurate data
» Fat-finger errors
» Out-of-date information
» Transposed names
» Misspelled names

Solution
"If you can't measure it, you can't improve it."
- Peter Drucker

ONC's Patient Matching Algorithm Challenge
The goals of this challenge are to:
1. Bring about greater transparency and data on the performance of existing patient matching algorithms,
2. Spur the adoption of performance metrics by patient data matching algorithm vendors, and
3. Positively impact other aspects of patient matching, such as deduplication and linking to clinical data.
Website: www.patientmatchingchallenge.gov

Eligibility Requirements
There is no age requirement for this challenge. All members of a team must meet the eligibility requirements.
• Shall have registered to participate in the Challenge under the requirements stated by ONC.
• Shall have complied with all the stated requirements of the Challenge.
• Businesses must be incorporated in, and maintain a primary place of business in, the United States; individuals must be citizens or permanent residents of the United States.
• Shall not be an HHS employee.

Eligibility Requirements (cont'd)
• May not be a federal entity or a federal employee acting within the scope of their employment.
• Federal grantees may not use federal funds to develop COMPETES Act challenge applications unless doing so is consistent with the purpose of their grant award.
• Federal contractors may not use federal funds from a contract to develop COMPETES Act challenge applications or to fund efforts in support of a COMPETES Act challenge submission.
• Participants must also agree to indemnify the Federal Government against third-party claims for damages arising from or related to Challenge activities.

Challenge Process
• Register your team
• Contestants will unlock a test data set provided by ONC to run algorithms against
• Run your algorithm
• Submit results for evaluation; they will be scored against an "answer key"
• Receive performance scores and appear on the Challenge leaderboard
• Repeat submissions until you are satisfied with the result, have hit 100 submissions, or the end date has passed

Challenge Process (diagram)
Download the synthetic data set as a CSV file → run your algorithm → submit the linked data/results to the leaderboard → the scoring server (gold standard) returns a score

Registration
• Visit the challenge website and fill in all required fields of the registration form
• Create a username and password (one account per team)
• Enter a team name, which will be used on the leaderboard
» Can be used to keep team identities private
• Acknowledge and agree to all terms and rules of the Challenge

Challenge Dataset
The dataset was synthetically generated by Just Associates using a proprietary software algorithm.
• Based on real-world data in an MPI, with actual data discrepancies reflected in each field
• Known potential duplicate pairs mimic real-world scenarios
• Does not contain PHI
• Approximately 1M patient records
• Available early June; an email will be sent when the data set is made available

Challenge Dataset
• Fields include:
» Enterprise ID, LAST NAME, FIRST NAME, MIDDLE NAME, SUFFIX, DOB, GENDER, SSN, ADDRESS1, ADDRESS2, CITY, STATE, ZIP, PHONE, PHONE2, EMAIL, ALIAS, MOTHERS_MAIDEN_NAME, MRN (SSNs: most are within the 800 range)
» Data format: CSV
» Also available as a FHIR bundle

Challenge Dataset (example)
Input records:
1  John Smith     1-1-1990   202-223-9910  Washington, DC
2  Carol Jones    2-4-1973   230-298-0001  Bethesda, MD
3  Bobby Johnson  3-09-1955  340-345-9234  Arlington, VA
4  Johnny Smith   1-1-1990   202-223-9910  Washington, DC
Scoring server view (potential duplicates grouped together):
1  John Smith     1-1-1990   202-223-9910  Washington, DC
4  Johnny Smith   10-1-1990  202-223-9910  Washington, DC
2  Carol Jones    2-4-1973   230-298-0001  Bethesda, MD
3  Bobby Johnson  3-09-1955  340-345-9234  Arlington, VA
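With roughly 1M records, comparing every pair (on the order of 5 x 10^11 comparisons) is impractical, which is where the indexing step earns its keep. A hedged sketch of reading and blocking the file, assuming the column headers match the field list above and using a hypothetical filename (verify both against the released data set):

```python
import csv
from collections import defaultdict

# Sketch: block the challenge CSV so candidate pairs are generated only
# within a block. Header spellings ("Enterprise ID", "LAST NAME", "DOB")
# follow the field list above but are assumptions until the file ships.
blocks = defaultdict(list)
with open("challenge_patients.csv", newline="") as f:  # hypothetical filename
    for row in csv.DictReader(f):
        key = (row["LAST NAME"][:2].upper(), row["DOB"])  # cheap blocking key
        blocks[key].append(row["Enterprise ID"])

# Candidate pairs come only from within a block; every cross-block pair
# is implicitly treated as a non-match.
candidate_pairs = [(a, b)
                   for ids in blocks.values()
                   for i, a in enumerate(ids)
                   for b in ids[i + 1:]]
print(f"{len(candidate_pairs)} candidate pairs to compare")
```

A key that is too coarse lets in millions of pairs; one that is too strict silently drops true matches whose last name or DOB was mistyped, which is exactly the data quality problem described earlier. Matching systems therefore often run several complementary blocking passes.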
Submission Process
• One dataset will be provided to all participants
• Participants will submit their matches to the ONC scoring server
• The answer key, separate from the dataset provided to participants, will be used to score submissions
• Submission data format
» CSV, e.g.: Enterprise ID 1, Enterprise ID 4, 0.90
» Optionally, a FHIR bundle
• Submit
» The Enterprise ID and the Enterprise ID it is linked to
» Optionally, a confidence score for probabilistic algorithms

Project Submissions
• Teams will submit the results (matched records) of their algorithm tests
» The submission period* will be open for 3 months
» 100 submissions are allowed from each individual/team
» Submissions may be made at any time during the submission period
» The Challenge will open on June 12th at 12:00 p.m. EST
» Submissions will be allowed until 11:59 p.m. on the last day of the submission period
*Submission period dates have not been determined. Once the test data set is available, these dates will be added to the challenge website.

Project Submissions
• Calculation
» Precision
» Recall
• Tradeoffs between precision and recall
» The F-score is the harmonic mean of precision and recall

Returned Results
• Participants will receive:
» F-score
» Precision
» Recall
» Run ID
• Month one will include a beta period
» New matches found will be manually reviewed to determine match status
» Previous submissions will be rescored with the updated answer key, and leaderboards will be updated
» After the beta period, all future submissions will be scored against the updated answer key only

Leader Board Example
(leaderboard screenshots not reproduced)

Winners and Prizes
The total prize purse for this challenge is $75,000. Judging will be based upon the empirical evaluation of the performance of the algorithms.
Highest F-Score:
• 1st - $25,000
• 2nd - $20,000
• 3rd - $15,000
Best in Category ($5,000 per category):
• Precision
• Recall
• Best first F-score run

Best in Category
• Best F-Score
» 1st Place
» 2nd Place
» 3rd Place
• Best 1st Run F-Score: awarded to the contestant/team whose first submission to the scoring server results in the highest F-score
• Precision: best precision with recall >= 90%
• Recall: best recall with precision >= 90%

Metrics for Algorithm Performance
The ideal outcome of any matching exercise is correctly answering one question, hundreds or thousands of times: are these two things the same thing?
» Correctly identifying all the true positives and true negatives while minimizing the number of errors (false positives and false negatives)

Patient Matching Goal
(figure not reproduced)
Source: Culbertson, A., Patient Matching A-Z, Wednesday, March 2nd, HIMSS 2016, Las Vegas, NV

Patient Matching Terminology
• True positive: the two records represent the same patient
• True negative: the two records do not represent the same patient
Source: Culbertson, A., Patient Matching A-Z, Wednesday, March 2nd, HIMSS 2016, Las Vegas, NV

Patient Matching Terminology
• False negative: the algorithm misses a record pair that should be matched
• False positive: the algorithm links two records that do not actually match
Source: Culbertson, A. & Miller, K., Patient Matching EHR Ailments: Going from Placebo to Cure, Tuesday, March 1st, HIMSS 2016, Las Vegas, NV

Evaluation
EHR A    | EHR B    | Truth (Gold Standard) | Algorithm | Match Type
Jonathan | Jonathan | Match                 | Match     | True Positive (good)
Jonathan | Sally    | Non-Match             | Non-Match | True Negative (good)
Jonathan | Sally    | Non-Match             | Match     | False Positive (bad)
Jonathan | Jon      | Match                 | Non-Match | False Negative (bad)
Source: Culbertson, A., Patient Matching A-Z, Wednesday, March 2nd, HIMSS 2016, Las Vegas, NV
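The table above labels each pair; precision, recall, and the F-score (defined on the following slides) summarize those labels into the numbers the scoring server returns. A minimal sketch, with a toy answer key standing in for the one that stays on ONC's server:

```python
# Sketch: score predicted links against an answer key. Pairs are stored
# as frozensets so (a, b) and (b, a) count as the same link. The answer
# key below is a toy stand-in; the real one never leaves ONC's server.

def score(predicted, truth):
    tp = len(predicted & truth)   # true positives: links in both sets
    fp = len(predicted - truth)   # false positives: links we invented
    fn = len(truth - predicted)   # false negatives: links we missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_score = (2 * precision * recall / (precision + recall)
               if precision + recall else 0.0)  # harmonic mean
    return precision, recall, f_score

truth = {frozenset(p) for p in [("1", "4"), ("7", "9")]}
predicted = {frozenset(p) for p in [("4", "1"), ("2", "3")]}
print(score(predicted, truth))    # -> (0.5, 0.5, 0.5)
```

The harmonic mean punishes imbalance: a submission with 0.99 precision but 0.20 recall scores an F of roughly 0.33, which is why the Best in Category prizes pin the opposite metric at 90% or better.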
Evaluation
• Precision = True Positives / (True Positives + False Positives)
• Recall = True Positives / (True Positives + False Negatives)
Source: Culbertson, A., Patient Matching A-Z, Wednesday, March 2nd, HIMSS 2016, Las Vegas, NV

Evaluation
• Calculation
» Precision
» Recall
• Tradeoffs between precision and recall
» The F-score is the harmonic mean of precision and recall

Creating Test Data Sets

Development of Test Data Set
(workflow) Patient database → select potential matches (aka the adjudication pool) → three independent manual reviewers → human-reviewed match decisions (Answer Key == Ground Truth Data Set) → compare algorithm output against the test data set
Source: Culbertson, A. & Miller, K., Patient Matching EHR Ailments: Going from Placebo to Cure, Tuesday, March 1st, HIMSS 2016, Las Vegas, NV

Development of Ground Truth Sets
• Identify a data set that reflects a real-world use case
• Develop potential duplicates
• Human adjudication review and classification
» Match or non-match
• Estimate truth
» Pooled methods using multiple matching methods

Issues in Establishing Ground Truth
• The first step in any evaluation is to determine why the evaluation is being conducted
• Different applications need different truths
» Security applications vs. a patient health record
• What is the cost of missing a match?
» Security: lives are lost
» Health: a patient safety event, missed medications, allergies, etc., even death. But this is the situation today.
• What is the cost of wrongly identifying a match?
» Security: a passenger is inconvenienced or delayed
» Health: a patient safety event, wrong medication or treatment, liability, death
• Criteria for truth must be carefully established and well understood
» E.g., the question posed to annotators must be carefully phrased

Issues in Establishing Ground Truth (cont'd)
• Different applications need different truths
» Credit checks
» Security applications
» Customer support
» De-duplication of mailing lists
• What is the cost of missing a match?
» A new record is entered into the database
» An irritated customer
» Lives are lost
• Criteria for truth must be carefully established and well understood by annotators
» The question posed to annotators must be carefully phrased

Issues in Establishing Ground Truth (cont'd)
• How much time and expertise are available to judge (or discount) false positives?
• The evaluation needs to reflect the real-world use case
• Evaluation results are only as good as the truth on which they are based
» And only as appropriate as the evaluation is to the task that will be performed with the operational system
• Absolute recall is impossible to measure without a completely known test set (i.e., "you don't know what you're missing")
» Estimate it with pooled results

Examples
• Names: B Smith, Bill Smythe, William Smythe, W Smith (the same person?)
• DOB: 10/12/1972, October 11, 1972, December 10, 1972, 12/10/72, October 12, 1927 (the same date?)
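Exact equality would call every pair above a non-match, so matching systems score such variants with approximate comparators (Jaro-Winkler, edit distance, phonetic codes). A sketch using only Python's standard library; difflib's ratio() is merely a stand-in for those comparators, and the cutoff is illustrative:

```python
from difflib import SequenceMatcher

# Sketch: approximate string comparison for the name variants above.
# difflib.SequenceMatcher.ratio() is a simple stand-in for the
# Jaro-Winkler or edit-distance comparators real matchers use.
names = ["B SMITH", "BILL SMYTHE", "WILLIAM SMYTHE", "W SMITH"]
for i, a in enumerate(names):
    for b in names[i + 1:]:
        sim = SequenceMatcher(None, a, b).ratio()
        verdict = "candidate" if sim >= 0.6 else "unlikely"  # illustrative cutoff
        print(f"{a:<15} vs {b:<15} similarity={sim:.2f} -> {verdict}")

# Dates need normalization before comparison: "10/12/1972" and
# "October 12, 1972" should agree once parsed, while "12/10/72" may be
# a day/month transposition and "October 12, 1927" a year typo,
# distinctions a raw string comparator cannot see.
```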
Get Involved
• Webinars on how to participate and a challenge overview
» May 24th
• The Patient Data Matching Algorithm Challenge kicks off in June
• Participant discussion board
• Website: www.patientmatchingchallenge.com

Acknowledgments
Thank you to the following individuals and organizations for their involvement in the planning and development of this challenge:
» Debbie Bucci and the ONC team
» Tom Leary and HIMSS North America
» Greg Downing, HHS Idea Lab
» Jerry and Beth Just and the Just Associates team
» Keith Miller and Andy Gregorowicz, MITRE
» Caitlin Ryan, IRIS Health Solutions
» The Capital Consulting Corporation team

Additional Questions
For additional questions or information, contact:
• Adam Culbertson, [email protected]
• Debbie Bucci, [email protected] (preferred)
• Phone: 202-690-0213

Thank you for your interest!
The ONC Team
@ONC_HealthIT
@HHSONC