
Relationship Between Dean’s Letter Rankings and Later Evaluations
by Residency Program Directors
Stephen J. Lurie
Department of Family Medicine
University of Rochester School of Medicine and Dentistry
Rochester, New York, USA
David R. Lambert
Department of Medicine
University of Rochester School of Medicine and Dentistry
Rochester, New York, USA
Tana A. Grady-Weliky
Department of Psychiatry
University of Rochester School of Medicine and Dentistry
Rochester, New York, USA
Background: It is not known how well dean’s letter rankings predict later performance in residency.
Purpose: To assess the accuracy of dean’s letter rankings in predicting clinical performance in internship.
Method: Participants were medical students who graduated from the University
of Rochester School of Medicine and Dentistry in the classes of 2003 and 2004.
In their dean’s letter, each student was ranked as either “Outstanding” (upper quartile), “Excellent” (second quartile), “Very good” (lower two quartiles), or “Good” (lowest few percentiles). We compared these dean’s letter rankings against results of
questionnaires sent to program directors 9 months after graduation.
Results: Response rate to the questionnaire was 58.9% (109 of 185 eligible graduates). There were no differences in response rate across the four dean’s letter ranking
categories. Program directors rated students in the top two categories of dean’s letter rankings significantly higher than those in the very good group, and students in all three of these groups were rated significantly higher than those in the good group, F(3, 105) = 13.37, p < .001. Students in the very good group were most variable in their ratings by program directors, with many receiving ratings similar to those of students in the upper two groups. There were no differences by gender or specialty.
Conclusion: Dean’s letter rankings are a significant predictor of later performance
in internship among graduates of our medical school. Students in the bottom half of
the class are the most likely either to underperform or to overperform in internship.
Teaching and Learning in Medicine, 19(3), 251–256. Copyright © 2007 Lawrence Erlbaum Associates, Inc.
This article was previously presented at the Northeast Group on Educational Affairs Spring Meeting, March 3, 2006, and the 12th Annual Ottawa International Conference on Medical Education, May 21, 2006.

Correspondence may be sent to Stephen J. Lurie, University of Rochester School of Medicine and Dentistry, 601 Elmwood Avenue, Box 601, Rochester, NY 14642, USA. E-mail: Stephen [email protected]

The dean’s letter provides a summary of students’ performance in medical school and remains a key piece of information that program directors use to assess candidates for residency. Partially in response to historical findings that such letters may be variable in their quality1–6 and accuracy,7 the Association of American Medical Colleges (AAMC) has stressed that the dean’s letter should be a letter of accurate evaluation rather than of recommendation.8 To further emphasize this point, the AAMC has recently recommended changing the name of the “dean’s letter” to the “Medical Student Performance Evaluation,” which should contain comparative information about how the student performed relative to their classmates.8

In practice, program directors have been found to prefer a ranking system that goes beyond simple pass/fail descriptions.9 It also appears that program directors and dean’s letter writers can agree
about relative ranking of students based on accurate
information.10 Similarly, a study from 1991 found that
undergraduate grades were highly predictive of program directors’ rankings.11
There have been no recent studies, however, of how
well dean’s letter rankings predict performance in residency. It is also unclear whether program directors are
able to interpret evaluative terms such as superior, outstanding, excellent, and very good without an explicit
key that describes the relationship between such terms
and a quantitative assessment of students’ ranking relative to their classmates.4 Similarly, it is not known
how often program directors may disagree with these
rankings, and if so, whether they tend to overestimate
or underestimate students’ anticipated strengths and/or
weaknesses in internship.
We studied these questions in two recent graduating classes from the University of Rochester School of
Medicine and Dentistry, for whom we obtained survey data from program directors regarding graduates’
performance during internship. We assessed agreement
between the dean’s letter ranking and program directors’ global assessments. We also examined for any
systematic differences by specialty. Finally, we examined dean’s letter rankings of students for whom the
program directors felt that the dean’s letter had either
underestimated or overestimated their abilities.
Method
Participants
Participants were 184 medical students who graduated from the University of Rochester School of
Medicine and Dentistry in the classes of 2003 and 2004.
Measures
The final sentence of the dean’s letter described
each student as either an “outstanding” (upper quartile), “excellent” (second quartile), “very good” (lower
two quartiles), or “good” (a small number of students
in the bottom of the lowest quartile) candidate for residency training. Students were assigned to these descriptors based on their grades during the mandatory
longitudinal ambulatory clerkship (completed during
Years 1 and 2) and the six mandatory 3rd-year clinical clerkships (Internal Medicine, Neurology, Obstetrics/Gynecology, Pediatrics, Psychiatry, and Surgery).
Grades were then weighted by the number of weeks
of the clerkship and the grade distribution of the entire
class in the clerkship. For example, a grade of Honors
in a long clerkship that does not give many Honors
grades carries more weight than a similar grade in a
shorter clerkship that gives many Honors grades. Our
dean’s letter also provides a guide to interpreting these
rankings, with approximately 20% in the outstanding
group, 25% in the excellent group, and 50 to 55% in the
very good group. Less than 5% are in the good group.
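The article describes this weighting only qualitatively. As a minimal sketch, the following Python fragment shows one way such a scheme could be implemented; the numeric grade values, the weighting function, and the percentile cutoffs are hypothetical illustrations, not the school’s actual formula.

    # Hypothetical sketch of a clerkship-weighted composite grade and the
    # quartile-based descriptors described above. All values are illustrative.

    def composite_score(grades, weeks, honors_rate):
        """grades: clerkship -> numeric grade (e.g., Honors = 3, High Pass = 2, Pass = 1)
        weeks: clerkship -> clerkship length in weeks
        honors_rate: clerkship -> fraction of the class receiving Honors in that clerkship
        """
        total = 0.0
        weight_sum = 0.0
        for clerkship, grade in grades.items():
            # A long clerkship that awards few Honors grades counts for more.
            weight = weeks[clerkship] * (1.0 - honors_rate[clerkship])
            total += weight * grade
            weight_sum += weight
        return total / weight_sum

    def descriptor(percentile):
        """Map a class-rank percentile (0 = bottom, 100 = top) to the final-sentence descriptor."""
        if percentile >= 75:
            return "outstanding"  # upper quartile
        if percentile >= 50:
            return "excellent"    # second quartile
        if percentile >= 5:
            return "very good"    # lower two quartiles, excluding the bottom few percent
        return "good"             # small group at the bottom of the class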
Within 1 year of graduation we sent a 15-item survey to internship program directors, asking them to rate
the graduates on a number of general clinical, interpersonal, and professional skills (see Figure 1). Items 1
through 12 were scaled on a 4-point scale, whereas
Item 13 was scaled on a 7-point scale. Program directors were also asked (Items 14 and 15) whether the dean’s letter had overstated or understated the graduates’ strengths or weaknesses.

Figure 1. Postgraduation survey.
Statistical Analysis
We first used factor analysis to determine the best
method of grouping the survey items for further analysis. We then used analysis of variance (ANOVA) to compare differences in program directors’ assessments across the
four dean’s letter ranking groups. An ANOVA was also used to compare dean’s letter
ranking groups across specialty. We used the Duncan
multiple range test to assess for post hoc differences
among ranking groups for which there was a significant overall F value. Because of the relatively small
number of students for whom program directors felt the
dean’s letter had under- or overestimated the students’
abilities, we used simple descriptive statistics to assess
relationships between dean’s letter ranking groups and
program directors’ assessments. Chi-square tests were
used to assess for relationships between categorical
variables. All analyses were performed with SAS version 9.1 (SAS Institute, Cary, NC). Our study was determined to be exempt from review by our Institutional Review Board.
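The analyses themselves were run in SAS; for readers who wish to reproduce this kind of analysis, the fragment below is a rough Python analogue. The file name and column names are hypothetical, and Duncan’s multiple range test is only noted in a comment because SciPy has no direct equivalent.

    # Rough Python analogue of the analyses described above (the study used SAS 9.1).
    import pandas as pd
    from scipy import stats

    df = pd.read_csv("pgy1_survey.csv")  # hypothetical file: one row per graduate

    # One-way ANOVA of overall program director score across the four ranking groups.
    groups = [g["overall_score"].dropna() for _, g in df.groupby("dean_letter_rank")]
    f_stat, p_anova = stats.f_oneway(*groups)
    print(f"ANOVA: F = {f_stat:.2f}, p = {p_anova:.3f}")

    # Chi-square test of questionnaire response rate across the ranking groups.
    table = pd.crosstab(df["dean_letter_rank"], df["responded"])
    chi2, p_chi, dof, _ = stats.chi2_contingency(table)
    print(f"Chi-square: chi2 = {chi2:.2f}, df = {dof}, p = {p_chi:.2f}")

    # Post hoc pairwise comparisons would follow a significant F (the paper used
    # Duncan's multiple range test; Tukey's HSD is a common substitute in Python).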
Results
Postgraduation Questionnaire
We received 109 responses from program directors
for the 185 graduates (response rate = 58.9%). There
was no significant difference in response rate by the
four dean’s letter ranking groups, χ2 = 1.79, df =
3, n = 185, p = .69. Factor analysis of the first 13
items initially appeared consistent with a two-factor
solution—the first eigenvalue was 8.9, the second was
1.06, and the third was 0.57. The first two factors collectively accounted for 77% of the covariance between
responses. The results of varimax rotation for the
two-factor solution are displayed in Table 1. Although
the item about “interpersonal skills” appears to reside
on a separate factor, we found that this item had a
very high correlation (0.82) with the mean of the other
items and thus appears to partake of significant shared variance. Furthermore, even if this item were analyzed
separately, as a single item we anticipate that it
would have relatively low reliability. A single 13-item
scale was highly reliable, with a Cronbach’s alpha of
0.94. Application of the Spearman–Brown prophecy
formula reveals that a shorter questionnaire with only
4 items would have a reliability of approximately 0.8. Nonetheless,
for the purpose of further analysis, we combined all
13 items into a single overall quality score.
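Both of these figures can be checked directly: assuming 13 standardized items, the first two eigenvalues account for (8.9 + 1.06)/13 ≈ 77% of the total variance, and the Spearman–Brown prophecy formula applied to an alpha of 0.94 for a scale shortened from 13 items to 4 predicts a reliability of about 0.83, in line with the approximate 0.8 quoted above. A minimal worked example:

    # Quick checks of the figures reported above.

    def spearman_brown(reliability, k):
        """Predicted reliability when a scale's length is scaled by a factor k."""
        return (k * reliability) / (1.0 + (k - 1.0) * reliability)

    # Variance explained by the first two factors (eigenvalues 8.9 and 1.06, 13 items).
    print(round((8.9 + 1.06) / 13, 2))             # ~0.77

    # Reliability of a 4-item version of the 13-item scale (alpha = 0.94).
    print(round(spearman_brown(0.94, 4 / 13), 2))  # ~0.83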
Relationship of Dean’s Letter Ranking Groups
to Program Directors’ Assessment. An ANOVA
revealed significant differences in program directors’
assessments across the four dean’s letter ranking
groups, F(3, 105) = 13.37, p < .001. Comparisons
between groups revealed that the means of students in
the outstanding and excellent groups were statistically
similar but that both were significantly higher than the
mean of the very good group. The mean of students in
the good group was significantly lower than that of the
other three groups. As shown in Figure 2, there was
also a marked trend for increasing variability across
the ranking groups; students in the outstanding group
were less variable in their rankings than students in the
excellent group, who in turn were less variable than
those in the very good group. As a secondary analysis, we also computed factor scores for the three items
that loaded on the second factor in our analysis, which
appears to reflect interpersonal abilities. These results
were virtually identical, F(3, 99) = 8.49, p < .001,
Table 1. Two-Factor Solution to Postgraduation Questionnaire Items

Item                                            Factor 1    Factor 2
Overall impression                                0.92        0.18
Clinical reasoning and patient management         0.89        0.05
Lifelong learning skills                          0.87        0.13
Suitability for career in clinical practice      0.87        0.18
Suitability for career in academic medicine      0.87        0.09
Bedside skills                                    0.86        0.23
Leadership skills                                 0.85        0.06
General fund of knowledge                         0.85        0.02
Teaching skills                                   0.82        0.11
Which description best fits this resident?       0.81        0.16
Personal qualities                                0.72        0.38
Attentiveness to psychosocial issues              0.71        0.44
Interpersonal skills                              0.04        0.96
with the mean of the good group significantly lower than that of the other three groups.

Figure 2. Relationship between rankings by internship directors and dean’s letter groupings.
Disagreements Between Dean’s Letter Ranking
Group and Program Directors’ Assessments. We
asked program directors whether the dean’s letter had
either overestimated or underestimated the student’s
strengths or weaknesses. Among the 104 students
for whom we received responses to the item about
strengths, there were 19 for whom the program directors felt the dean’s letter to have been inaccurate. Of
these 19 students, 14 had been described as very good
in the dean’s letter. Within this group, seven dean’s
letters were said to have overestimated the students’
strengths, while seven were said to have underestimated
them. Of the remaining 5 students, 1 was in the outstanding group and 4 were in the excellent group. For
3 of these 5 students (including the outstanding group
student), the dean’s letter was said to have underestimated their strengths.
There was a similar pattern among the 97 students
for whom we received responses to the item about
weaknesses. Of the 10 students whose weaknesses
were felt to have been inaccurately reported, 9 were in the
very good group; for 3 the dean’s letter was felt to have
overestimated their weaknesses, whereas for the other
6 the dean’s letter was reported to have underestimated
weaknesses.
We then focused on this very good group to further
explore the relationship between program directors’
overall rankings and their perceptions of the accuracy
of the dean’s letter among these students. Program directors assigned significantly lower overall scores to
students for whom the dean’s letter was perceived to
have overestimated their strengths than they did to students for whom the dean’s letter was either accurate
or had underestimated their strengths (group means =
1.9 vs. 3.3 vs. 3.6, respectively), F(2, 48) = 18.52, p <
.001. Findings were similar for weaknesses; program
directors assigned significantly lower overall scores to
students for whom the dean’s letter was perceived to
have underestimated their weaknesses than they did to
students for whom the dean’s letter was either accurate
or had overestimated their weaknesses (group means =
2.17 vs. 3.2 vs. 3.8, respectively), F(2, 44) = 7.55, p =
.002. For 4 of the 7 students for whom the dean’s letter
was reported to have understated their weaknesses, it
was also reported to have overstated their strengths.
Relationship Between Dean’s Letter Ranking
Groups, Program Director Assessments, Specialty,
and Gender. We subdivided graduates’ residency
fields into the seven mutually exclusive categories
of Family Medicine, Internal Medicine, Obstetrics/Gynecology, Pediatrics, Psychiatry, Surgery, and
Other. We found no significant differences between
groups in overall program directors’ rankings, F(6, 99) = 0.99, p = .45. When comparing the seven possible residency choices against the four dean’s letter
ranking categories, however, the number of observations in most of the 28 resulting cells is too small to
draw reliable conclusions. Nonetheless, there did not
appear to be any obvious trends in these results (not
shown).
There were no differences between men and women
in program directors’ rankings, F(1, 108) = 0.26, p =
.62. Similarly, there were no significant differences
in proportions of men and women in the four dean’s
letter ranking categories, χ2 = 3.24, p = .35, df = 3,
n = 185.
Discussion
We find that dean’s letter ranking groups at our institution are significant predictors of program directors’
evaluations of performance during internship. Survey
responses were received from more than 50 different
program directors, thus supporting the validity of the
dean’s letter ranking groups for predicting performance in a
range of clinical settings and specialties. We also found
that overall program directors’ ratings did not differ by
specialty, which would argue against any possible bias
resulting from students in different Medical Student
Performance Evaluation (MSPE) categories differentially entering various specialties.
This general conclusion must be tempered, however, by the finding that many students in the excellent and very good groups received ratings from their program directors similar to those of students in
the outstanding group. As a group, students in these
lower tiers of dean’s letter rankings are more variable
in their program directors’ ratings than are those in
the upper tier. It appears that some of the students in
these lower dean’s letter ranking groups are capable of
displaying the same level of clinical skills as those in
the outstanding group but may need either more time
or a change in environment to do so. Other students in
these lower two groups, however, continue to receive
less favorable evaluations during internship, thus continuing the pattern they displayed in medical school.
By contrast, it was rare for a student who received
a rank of outstanding to receive low scores from their
program directors.
Thus, although a rank of outstanding seems to guarantee that students will perform well during internship, prediction becomes somewhat less certain for
those in the excellent and very good groups. These
conclusions are supported by the fact that dean’s letters for students in the very good group were most
likely to be perceived by program directors as having misrepresented the students’ strengths or weaknesses. By contrast, the few students in the lowest, or
good, group receive consistently low evaluations from
their program directors, thus supporting the validity
of this ranking. In their survey of internal medicine residency program directors about problem residents, Yao
and Wright13 found that the most commonly reported
difficulties were insufficient medical knowledge, poor
clinical judgment, and inefficient use of time. Presumably such difficulties would also be identifiable during
medical school clerkships and reflected both in MSPE
ratings and program directors’ evaluations. Incidentally, although many items on our program director
questionnaire reflect the competencies described by
the Accreditation Council for Graduate Medical Education (ACGME),14 program directors in our study
appeared to view clinical competence along a single
dimension, which was only marginally differentiated
from interpersonal skills. This finding is similar to the
results reported by Silber et al.,15 who found that responses to a global rating form did not differentiate the
6 ACGME competencies but rather separated into the
two factors of medical knowledge and interpersonal
skills. Although it may in principle be possible to predict interns’ levels of skills in the individual ACGME
competencies, this will require development of more
sophisticated assessment tools.
Our study has several limitations. First, it was conducted at a single medical school, and thus the results may not necessarily reflect those at other institutions. Nonetheless, we point out that our dean’s
letter ranking groups were based on clerkship grades
compiled over a range of settings, venues, and evaluators with a standardized formula and thus should
be generalizable to other schools with similar clerkships and grading systems. We also point out that
questionnaires were returned by program directors in
more than 50 different programs and thus reflect a
national sample of these respondents. Second, our response rate was somewhat low at 59%. There were no
systematic differences in response rates for students
in the different dean’s letter ranking categories, however, thus suggesting that responses were not biased
by how well the students had performed in medical
school.
Third, it is possible that program directors were biased by their previous knowledge of the contents of
the dean’s letter. For several reasons, however, we do
not believe that their responses were significantly influenced by this information. First, the instructions made
no mention of the dean’s letter. Rather, we stated that
we were interested in the performance of our graduates during internship. Second, the questions about the
dean’s letter came only at the end of the questionnaire.
Thus, we doubt that program directors were cued to
think about the dean’s letter as they were completing the first part of the questionnaire. Indeed, a few
program directors later complained that they had to
stop and look for our dean’s letter in their files before
completing those items. Finally, our questions asked
specifically about how the graduates had performed in
the course of their internship. We strongly suspect that
program directors responded on the basis of nearly a
year of lived experience with these individuals, rather
than on the basis of a letter that they may not have read
for nearly a year.
In summary, we find that dean’s letter ranking
groups are a significant predictor of program directors’ evaluations during internship across a range of
training programs. Students in the lower half of dean’s
letter rankings are most variable in their performance
during internship, with many receiving very high evaluations from program directors. A few students in the
lower half of the dean’s letter rankings were also perceived as having been overrated in the dean’s letter.
Thus, this appears to represent a heterogeneous group
for whom prediction of later performance is less certain. Further study will be needed to develop more precise predictors of later performance among this group.
References
1. Hunt DD, MacLaren C, Scott C, Marshall SG, Braddock CH,
Sarfaty S. Follow-up study of the characteristics of dean’s letters. Academic Medicine 2001;76:727–33.
2. Leiden LI, Miller GD. National survey of writers of dean’s
letters for residency applications. Journal of Medical Education
1986;61:943–53.
3. Hunt DD, MacLaren CF, Scott CS, Chu J, Leiden LI. Characteristics of dean’s letters in 1981 and 1992. Academic Medicine
1993;68:905–11.
4. Ozuah PO. Variability in deans’ letters. Journal of the American
Medical Association. 2002;288:1061.
5. Toewe II CH, Golay DR. Use of class ranking in deans’ letters.
Academic Medicine 1989;64:690–1.
6. Yager J, Strauss GD, Tardiff K. The quality of deans’ letters from medical schools. Journal of Medical Education
1984;59:471–8.
7. Edmond M, Roberson M, Hasan N. The dishonest dean’s letter:
An analysis of 532 dean’s letters from 99 U.S. medical schools.
Academic Medicine 1999;74:1033–5.
8. Association of American Medical Colleges. A guide to the
preparation of the Medical Student Performance Evaluation. Available at: www.aamc.org/members/gsa/mspeguide.pdf.
Accessed May 21, 2007.
9. Provan JL, Cuttress L. Preferences of program directors for
evaluation of candidates for postgraduate training. Canadian
Medical Association Journal 1995;153:919–23.
10. Hunt DD, MacLaren CF, Carline J. Comparing assessments
of medical students’ potentials as residents made by the residency directors and deans at two schools. Academic Medicine
1991;66:340–4.
11. Blacklow RS, Goepp CE, Hojat M. Class ranking models for
deans’ letters and their psychometric evaluation. Academic
Medicine 1991;66(Suppl):S10–2.
12. Lurie SJ, Nofziger A, Meldrum S, Mooney C, Epstein RE.
Temporal and group-related trends in peer assessment amongst
medical students. Medical Education, in press.
13. Yao DC, Wright SM. National survey of internal medicine residency program directors regarding problem residents. Journal
of the American Medical Association 2000;284:1099–104.
14. Accreditation Council for Graduate Medical Education. Accreditation Council for Graduate Medical Education Outcome
Project. Available at: http://www.acgme.org/outcome/comp/
compMin.asp. Accessed March 9, 2006.
15. Silber CG, Nasca TJ, Paskin DL, Eiger G, Robeson M, Veloski
JJ. Do global rating forms enable program directors to assess the
ACGME competencies? Academic Medicine 2004;79:549–56.
Final revision received on November 13, 2006