Cracking the Code: Residents' interpretations of written assessment comments
Shiphra Ginsburg, MD, MEd, PhD; Cees van der Vleuten, PhD; Kevin Eva, PhD; Lorelei Lingard, PhD

Written comments in assessment
- Can be a rich source of information (vs. scores)
- Can be vague and frustrating to interpret
- Can be used reliably to discriminate among trainees
- Faculty interpret them by (effectively) reading between the lines
- What if residents don't interpret them the same way?
- Ask me for refs if interested

Research Questions
- Can residents rank-order PGY1s based on comments alone as reliably as faculty?
- How do residents make sense of what's written?
- What do residents see as the purpose(s) of ITERs?

Methods
- Same as last presentation ;-)
- 12 PGY2s in Internal Medicine
- Rank-ordered sets of 16 PGY1s' ITER comments from best to worst
- Rankings analyzed using generalizability theory (G-theory)
- Interviewed by a research assistant about decision making and interpretation
- Interviews analyzed using constructivist grounded theory

Reliability

Source of Variance                        Estimated Variance Component    Percentage of Total Variance
PGY1 Resident                             0.054                           56.4
PGY2 Judge nested within PGY1 Resident    0.042                           43.6

- Reliability for a single judge: 0.56
- Reliability based on the average of four judges: 0.84
- Correlation between resident and faculty rankings: r = .90 (p < 0.0001)

Ginsburg S, Eva KW, Regehr G. Do in-training evaluation reports deserve their bad reputations? Acad Med. 2013;88(10):1539-1544.
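A note on where the two reliability figures come from: they follow from the variance components in the table via the standard G-theory coefficient for a judges-nested-within-residents design. This is a sketch of the arithmetic, not notation from the talk; the symbols sigma^2_r (resident variance) and sigma^2_{j:r} (judge-within-resident variance) are my labels for the two table rows.

```latex
% G coefficient for n judges nested within residents:
%   \Phi(n) = \frac{\sigma^2_r}{\sigma^2_r + \sigma^2_{j:r} / n}
\Phi(1) = \frac{0.054}{0.054 + 0.042} = \frac{0.054}{0.096} \approx 0.56
\qquad
\Phi(4) = \frac{0.054}{0.054 + 0.042 / 4} = \frac{0.054}{0.0645} \approx 0.84
```

Both values reproduce the slide's reported reliabilities (0.56 for a single judge, 0.84 for the average of four judges).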
Reading between the lines
- Residents did not take language at face value
- They tried to construct meaning from language cues
- "If they used 'excellent' I felt like it was very obvious what they were trying to say, but there was a little bit of reading between the lines if they said 'good' vs. 'very good' …"
- "I know what they mean but I don't know if they're not saying things … if they're avoiding saying things ... because they don't want to hurt the resident's feelings"

Vague, generic language
- "'Hardworking and enthusiastic', 'good knowledge base': they say that about everyone; it doesn't really say anything."
- "Another common thing people would say? 'Above expectations for this level of training'. I would say three quarters of the pile was above expectations, and so it makes me wonder what is 'expectations'? Are the expectations too low because everyone is exceeding them?"

Why does vague language occur?
Reasons residents raised included time, discomfort, memorability, and the permanency of written records:
- "It's hard to write about weaknesses or stuff people can work on … it instills conflict"
- "Faculty may err on the side of making residents feel good when they should be commenting on how to make them better doctors"
- "Pretty solid residents but nothing really stood out … [revert to] pretty generic statements"
- "Person filling out the ITER may not be the one(s) you worked with the most"
- "There is a culture that … that's just not what's usually put on ITERs…"
- "If they're a staff that really cares about your career, I don't think they would include it there [in writing] but rather tell you verbally"

ITERs in general?
- Staff-dependent variability: "luck of the draw"
- "Some staff like to give everyone 3s, some staff like to give everyone 5s … the residents themselves have been cautioned in interpreting these, so, not to get too bummed out if you get all 3s from a specific staff when you're normally getting 4s and 5s, um, because it's … I think it is very staff dependent"
- "Some people are just more descriptive"
- Value of face-to-face discussions
- Relative value of comments over scores

Discussion/Implications
- Residents' interpretations of ITER comments are very similar to attendings'
- Replicability across multiple groups suggests there is meaning and value in ITER comments
- Residents' unproblematic acceptance of staff variability, and its implications for education

Thank you
- Cees van der Vleuten, Kevin Eva, Lorelei Lingard, Glenn Regehr
- Lisa St. Amant, Meghan Lynch
- NBME Stemmler Fund; MCC Research grant
- Department of Medicine, Mount Sinai Hospital and University of Toronto
- The Wilson Centre