Important Elements of Rater Training:

Improving the Accuracy of Raters: Direct Observation Workshop
Approaches to Rater Training:
A. Rater Error Training (RET)
The major goal of this type of training is to improve accuracy by decreasing
common “rater errors,” or rater biases. There are several types of rater biases:
1. Leniency error: The simple tendency to give all residents good ratings. This is the most
common error in training programs.
2. Severity error: The tendency to give all residents low or poor ratings. When was the last
time you saw this?
3. Halo error: The failure to discriminate a resident’s performance across the different
dimensions of competence. For example, a resident is perceived to have outstanding
knowledge and presentation skills, which leads the attending to give high marks in all
dimensions of competence. This is a very common error with rating scales (evaluation
forms): residents tend to get high marks in history-taking and physical examination
skills even though a) the attending rarely, if ever, actually observes the resident
performing these skills, and b) we know from multiple studies that we stink at history-taking and physical examination skills.
4. First impression error: The tendency to provide a rating based on a first impression and
fail to account for subsequent performance.
5. Friendship bias: The tendency to give good ratings because of friendship ties.
RET usually involves exercises designed to get raters to provide greater variability in
their ratings. Participants are given definitions of the common rater errors and then shown
examples of actual ratings that demonstrate each type of error. Discussion and
feedback are usually included as part of the training. You could use ratings from your own
program to illustrate the common rating errors; one simple way to screen your own data is
sketched below.
As noted in the annotated bibliography, RET appears to be modestly effective in
reducing halo and leniency error, but when used alone it may actually decrease rater
accuracy!
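As a rough illustration (not part of the original workshop materials), the hypothetical Python sketch below shows one way a program might screen a single rater’s existing evaluation data for leniency, range restriction, and halo. The function name, the 9-point scale, and the indicator definitions are illustrative assumptions, not validated measures.

```python
# Hypothetical sketch: screening one rater's evaluation data for the rater
# errors described above (leniency, range restriction, halo). These are rough
# descriptive proxies only; the scale and layout are assumptions.
import numpy as np

def rater_error_screen(ratings: np.ndarray, scale_min: int = 1, scale_max: int = 9) -> dict:
    """ratings: array of shape (n_residents, n_dimensions), all from one rater."""
    midpoint = (scale_min + scale_max) / 2
    return {
        # Leniency: average rating sits well above the scale midpoint
        "leniency_index": float(ratings.mean() - midpoint),
        # Range restriction: little spread across all of the rater's ratings
        "overall_sd": float(ratings.std()),
        # Halo: little variation across dimensions within each resident
        "mean_within_resident_sd": float(ratings.std(axis=1).mean()),
    }

# Example: a rater who gives nearly everyone 8s and 9s on a 9-point scale
ratings = np.array([[9, 9, 8, 9],
                    [8, 9, 9, 9],
                    [9, 8, 9, 8]], dtype=float)
print(rater_error_screen(ratings))
```

A high leniency index together with very low spread values would flag a set of ratings worth discussing during RET.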
B. Performance Dimension Training (PDT)
Simply stated, this type of training is designed to teach and familiarize the raters, in this case
your faculty, with the appropriate performance dimensions used in your evaluation system.
Although PDT alone probably does not improve rater accuracy, it is a critical structural
element for all rater training programs. Definitions for each dimension of performance, or
competency, should be reviewed with all evaluators.
The dimensions of performance in residency training are the six general competencies
promoted by the Accreditation Council for Graduate Medical Education (ACGME). In
addition to being given definitions of the competencies, your faculty should have the
opportunity to interact with those definitions to improve their understanding. This can be
accomplished through review of actual resident performance, “paper cases,” or videotape.
C. Frame of Reference Training (FOR)
This type of training specifically targets accuracy in rating. The steps for FOR in a residency
training program would be:
a) Participants are given descriptions of each dimension of competence and are then instructed
to discuss what qualifications they believe are needed for each dimension.
b) Participants are given clinical vignettes describing critical incidents of performance from
unsatisfactory to average to outstanding. (Frame of reference).
c) Participants use the vignettes to provide ratings on a behaviorally anchored rating scale.
d) The session trainer then provides feedback on what the “true” ratings should be along with an
explanation for the rating.
e) The training session wraps up with an important discussion on the discrepancies between the
participants’ ratings and the “true” ratings.
The most difficult aspect of FOR is setting the actual performance standards. As you can see,
FOR is really an extension of PDT: it involves establishing appropriate ratings for various
levels of performance. Hauenstein (Performance Appraisal, 1998) makes one additional
important point about the actual target scores: the goal should be to “produce
reasonable target scores without being overly concerned that the target scores represent truth
in the abstract sense.” In training programs, we should be able to develop definitions and
target ratings for the basic dimensions of clinical competence.
D. Behavioral Observation Training (BOT)
Observation skills are critical to effective and accurate ratings. While RET and FOR are
focused more on the judgmental processes involved in rating, BOT is focused on
improving the detection, perception, and recall of actual performance.
There are two main strategies to improve observation. The first is simply to increase the
number of observations, that is, to increase sampling of actual performance. This improves
recall of performance and, in essence, provides the rater with multiple opportunities to
practice observation. The second strategy is to provide some form of observational aid
that raters can use to record observations. Some call these aids “behavioral diaries.” In
a sense, the mini-CEX form is an immediate “behavioral diary” used to record a rating of an
observation; a simple sketch of such a diary follows.
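The following hypothetical Python sketch (an illustration, not a tool described in the source) shows how a minimal “behavioral diary” might be structured: the rater logs observed encounters by competency dimension and can see at a glance which dimensions still lack direct observation. The dimension names and class design are assumptions.

```python
# Hypothetical sketch of a "behavioral diary": a simple log of observed
# encounters, organized by competency dimension, so a rater can see which
# dimensions still lack direct observation. Dimension names are illustrative.
from collections import defaultdict
from datetime import date
from typing import List, Optional

DIMENSIONS = ["history-taking", "physical examination", "counseling", "clinical judgment"]

class BehavioralDiary:
    def __init__(self) -> None:
        # dimension -> list of (date, brief note) entries
        self.entries = defaultdict(list)

    def record(self, dimension: str, note: str, when: Optional[date] = None) -> None:
        self.entries[dimension].append((when or date.today(), note))

    def unobserved(self) -> List[str]:
        """Dimensions with no recorded direct observations yet."""
        return [d for d in DIMENSIONS if not self.entries.get(d)]

diary = BehavioralDiary()
diary.record("physical examination",
             "mini-CEX: correct BP technique, omitted fundoscopic exam")
print("Dimensions not yet observed:", diary.unobserved())
```

A paper log or the margin of a mini-CEX form serves the same purpose; the point is simply to track what has and has not been observed.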
We believe there is an additional component of BOT in residency training. Observation of
clinical skills requires that the attending appropriately “prepare” for the observation, position
himself or herself correctly to observe a particular skill, minimize interaction with the
resident and patient, and avoid distractions. Preparation means determining what you wish to
accomplish during the actual observation. For example, suppose you plan to perform a
mini-CEX of the physical examination skills of an intern caring for a newly diagnosed
hypertensive patient. What are the appropriate components of a physical exam for a
hypertensive patient? How do you need to position yourself to ensure the intern uses proper
technique? How and when will you confirm physical findings, if you deem that necessary?
This preparation helps to maximize the value of the observation and reinforces the
need for the attending to consider the appropriate definitions of the performance dimension of
interest.
Simple rules for Observation:
1. Correct positioning. As the rater, try to avoid being in the line of sight of either the patient or
the resident. Use the principle of triangulation:
[Diagram: triangulated positioning around the desk; R = resident, P = patient, A = attending. The attending is placed to the side so as not to be in the line of sight of the resident or patient.]
2. Avoid being intrusive. Don’t interject or interrupt if at all possible. Once you interject
yourself into the resident-patient interaction, the visit is permanently altered. However, there
will be times during the visit when you need to interject yourself, for example to correct
misinformation from the resident.
3. Minimize interruptions. Let your staff know you will be with the resident for 5-10 minutes,
avoid taking routine calls, etc.
4. Be prepared. Know before you enter the room what your goals are for the observation
session. For example, if you are observing a physical exam, have the resident present the
history first; then you will know what the key elements of the physical exam should be.
Performance Dimension Training Exercise
The purpose of this exercise is for your group to develop the definitions for a dimension
of clinical competency. The dimension we will focus on today is counseling. Counseling is an
important component of the new ACGME general competency of Patient Care. The ACGME
will be looking for evidence that training programs have developed appropriate methods to
measure the success of the curriculum and the competency of individual residents in the general
category of Patient Care.
Counseling situation: A resident needs to counsel a patient about starting a new medication at the
end of a clinic visit. What criteria will you use to judge the counseling performance of this
resident? In other words, define the essential components the resident should specifically include
in the counseling session with the patient starting a new medication.
With your group:
Define the components of an effective counseling session.
Knowledge: e.g., What should the patient be told? What should the patient be asked?
Skills: e.g., How should the questions be asked? How should the information be presented?
Attitudes: e.g., How should the resident interact with the patient?
Developing a Checklist Form
Counseling Session: Starting a New Therapy
Resident Name: ____________________________________________
Date:___________________
Components for a Checklist
Knowledge:
1.
2.
3.
4.
5.
Other:
Skills:
1.
2.
3.
4.
5.
Other:
Attitudes:
1.
2.
3.
4.
5.
Other:
Overall Rating of Counseling (circle one):
Poor          Marginal          Good          Excellent          Outstanding
Annotated Bibliography
Why Faculty Need to Observe Clinical Skills: Examples from Physical Diagnosis
» Wiener S, Nathanson M. Physical examination. Frequently observed errors. JAMA. 1976; 236:
852-855
This article confirmed what previous authors had demonstrated in the 1960’s: the
physical examination skills of house staff suffered from multiple errors. The authors did not
quantify the number of observed errors by house staff, but they did classify the most common
errors. The five main categories of errors are displayed in the table below:
Category: Technique
1. Poor ordering and organization of the exam
2. Defective or no equipment
3. Improper manual technique or use of instrument
4. Performance of the examination when not appropriate
5. Poor bedside etiquette leading to patient discomfort, embarrassment, or overt hostility

Category: Omission
1. Failure to perform part of the examination

Category: Detection
1. Missing a sign that is present
2. Reporting detection of a sign that is not present
3. Interpreting normal physiological or anatomic variation as abnormal
4. Misidentifying a sign after detection

Category: Interpretation
1. Failure to understand the meaning in pathophysiologic terms of a sign
2. Lack of knowledge of or use of confirming signs
3. Lack of knowledge of the value of a sign in confirming/refuting a diagnosis

Category: Recording
1. Forgetting a finding and not recording it
2. Illegible handwriting, obscure abbreviations, incomplete recording
3. Recording a diagnosis and not the sign detected
Comment: Problems with clinical skills have “plagued” training programs for decades. Despite
the pleas of numerous educators over time, adequate assessment of history-taking, physical
examination, and communication skills remains sub-optimal in most residency programs today.
» Wray NP, Friedland JA. Detection and correction of house staff error in physical diagnosis.
JAMA. 1983; 249: 1035-37.
This study sought to quantify the number of errors committed by house staff.
Disturbingly, residents committed at least one error in 58% of the patients they examined, and
interns committed at least one error in 62% of their patients. The gold standard was an “expert”
faculty member. The majority of errors were errors of omission (72% of all errors).
Comment: Further confirmation of the scope of the problem. This article also highlights the
importance of faculty observation and that through direct observation faculty can correct
deficiencies in “real time.”
» Mangione S, Nieman LZ. Cardiac auscultatory skills of Internal Medicine and Family Practice
trainees: A comparison of diagnostic proficiency. JAMA. 1997; 278: 717-22
Thirty-one training programs with 453 residents and 88 medical students participated in
this trial. All of the participants listened to 12 cardiac sounds taken directly from actual patients,
then completed a multiple choice questionnaire. On average the residents identified only 20% of
the sounds correctly. Internal medicine residents were only slightly better than family practice
residents. Level of training had little to no effect on correct identification.
Comment: This well-done study documented highly deficient auscultatory skills among a large
group of trainees. It does not answer the question of the best teaching method or the role of
direct observation by faculty in teaching auscultatory skills. However, when these results are
considered in the context of the studies by Wiener and Wray, they suggest that faculty can
address and correct problems and errors in technique through direct observation.
Faculty Skill in the Observation of Clinical Skills
» Kroboth FJ, Hanusa BH, Parker S, et al. The inter-rater reliability and internal consistency of a
clinical evaluation exercise. J Gen Intern Med. 1992; 7: 174-9.
This study examined the reliability of the traditional clinical evaluation exercise (CEX).
Each of 32 interns was observed twice with two raters present for each patient interaction.
Faculty completed a standardized rating scale form at the end of the exercise. Overall, inter-rater
agreement for scores was poor on the three main domains of competence assessed: history-taking, physical examination, and cognitive skills.
Comment: This study examined the reliability of performance observations by faculty of interns
during an inpatient CEX. Validity was not assessed in this study. Interestingly, prior experience
of the faculty member with the CEX did not lead to better reliability.
» Noel GL, Herbers JE, Caplow MP, Cooper MS, Pangaro LN, Harvey J. How well do Internal
Medicine faculty members evaluate the clinical skills of residents? Ann Intern Med. 1992; 117:
757-765.
In this study, 203 faculty members participated in a trial designed to assess faculty
rating skills in the CEX. One group, 69 participants, received a brief educational intervention
that consisted of a 15-minute videotape explaining the purpose of the CEX and the need for
detailed observation and feedback. All 203 faculty viewed two CEX case simulations on tape.
For each taped clinical scenario, a resident was trained to omit important aspects of the history
and/or physical examination or to perform them improperly. Enough “errors” were inserted
that the resident should have been rated as marginal.
Accuracy scores were calculated for each attending. The overall accuracy for the cohorts
of faculty ranged from only 32% for those using an open-ended evaluation form to ~ 60% for
faculty given a structured evaluation form. Regarding overall ratings of competence for the two
scenarios, over 50% of the faculty rated each of the two scenarios as satisfactory or superior. Use
of the 15-minute instructional videotape did not improve accuracy.
Comment: This study is important because it is one of the few that investigated the accuracy of
actual observation skills. Despite the use of a structured form for a carefully staged, standardized
videotaped clinical encounter, overall accuracy was at best 60%. More concerning are
the high overall competency ratings given by the majority of the faculty for clinical scenarios
specifically designed to depict marginal performance. This study also helps to highlight the
important distinction between reliability and accuracy/validity. Highly reliable ratings of
competence are essentially useless if they fail to possess reasonable levels of accuracy and
validity.
Although an attempt was made to provide a subgroup of the faculty with some degree of
“rater training,” the educational intervention was brief and not designed to improve observation
skills. Thus, we cannot extrapolate from this study any significant information about rater
training programs other than to note that more research in this area is desperately needed.
» Kalet A, Earp JA, Kowlowitz V. How well do faculty evaluate the interviewing skills of
medical students? J Gen Intern Med. 1992; 7: 499-505.
This study utilized an Objective Structured Clinical Exam (OSCE) with second year
medical students to examine the reliability and accuracy of faculty ratings. Videotapes were
made of 21 of the 159 total encounters for review by a) the original faculty member who observed
the OSCE in person and b) other faculty and outside “experts” for comparison. With regard to
accuracy, the faculty scores after observing the OSCE averaged 80% correct (range 41-100%).
Intra-rater agreement (faculty who observed the OSCE then rated the videotape of the
same encounter) was quite poor (Pearson correlation coefficients 0.12 to 0.54 depending on the
domain of competence rated). Inter-rater agreement between faculty and outside experts was
equally poor, with correlation coefficients ranging from 0.11 to 0.37.
Comment: This study with a group of medical students is consistent with the findings of Noel, et
al. Although accuracy was better (using a checklist for process variables only), reliability was
quite poor. Rater training was not part of the study.
» Marin JA, Reznick RK, Rothman A, Tamblyn RM, Regehr G. Who should rate candidates in
an objective structured clinical examination? Acad Med. 1996; 71: 170-75.
This study compared the ratings of history-taking sessions on an OSCE among
certification candidates for Canada’s qualifying examination. Candidates were evaluated by three
different raters: a physician rater, a standardized patient observer, and a standardized patient
rating from recall; all raters used the same checklists. The gold standard was a panel of 3
“expert” physicians who had rated the candidates using identical checklists. The physician rater
was less likely to deviate from the ratings of the expert panel compared to the standardized
patients.
Comment: This study highlights several points. One, direct observation of clinical skills by
faculty physicians is important and in some instances may be more reliable depending on the
purpose of the evaluation and the gold standard used. It is also important to note that a standard
checklist was used for the ratings. The checklist helped to frame the observation of these raters.
Noel, et al also found that a more detailed, specific rating form improved reliability in their study
of the CEX (see above).
Rater Training: A Few Lessons from Industry
» Murphy KR, Balzer WK. Rater errors and rating accuracy. J Appl Psych. 1989; 74: 619-24.
Using meta-analysis, this study sought to determine the relationship between rating errors
and rating accuracy. Some in the field of performance assessment have argued that the absence
of errors by raters indicates better accuracy in rating performance. The three main rating errors
are halo effect, leniency, and range restriction. Halo error is the failure to discriminate among
the different dimensions of clinical competence, usually because a rating in one domain affects
the ratings of all other domains. Leniency error is the tendency to give everyone good ratings
regardless of actual performance. Range restriction is simply the tendency to use a “restricted”
portion of the scale (e.g., giving a resident all 5s or, more likely in medicine, all 9s on a 9-point
scale; this is a potential example of all three errors!). This study found a weak correlation
between rating errors and rating accuracy.
Comment: This is an important finding, given that other work in the field has found that rater
training specifically focused on reducing rater errors may actually reduce accuracy.
» Murphy KR, Garcia M, Kerkar, Martin C, Balzer WK. Relationship between observational
accuracy and accuracy in evaluating performance. J Appl Psych. 1982; 67: 320-25.
This study examined the relationship between observational accuracy and performance
rating accuracy. The first step in any rating process should involve observation of the
performance dimension of interest (e.g. physical exam skills). Hopefully accurate ratings of
observations lead to better judgments of performance ratings. The study also highlights that there
are four separate components of rating accuracy used by industrial psychologists (definitions
from Murphy, et al):
1. Elevation: accuracy due to the average rating, over all ratees, given by a rater. A rater whose
overall average rating is closer to the true average is more accurate than one whose
average rating is far from the true score.
2. Differential elevation: the component associated with the average rating for each ratee,
across all performance dimensions. A rater with good differential elevation will correctly
rank order ratees on the basis of their overall performance.
3. Stereotype accuracy: component associated with the average rating given to each
performance dimension across all ratees. A rater with good stereotype accuracy will
correctly assess the relative strengths of ratees across multiple performance dimensions (e.g.
clinical judgment vs. interviewing skills vs. knowledge, etc.).
4. Differential accuracy: the component of accuracy that reflects the rater’s sensitivity to ratee
differences in patterns of performance (e.g., differences across settings such as ward versus clinic).
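As referenced above, the hypothetical sketch below shows one common way to operationalize these four components, using a Cronbach-style two-way decomposition of the difference between observed ratings and “true” scores. It is an illustration under stated assumptions, not the exact formulas used by Murphy et al.; lower values indicate greater accuracy.

```python
# Hypothetical sketch of the four accuracy components using a Cronbach-style
# decomposition (an assumption; the exact formulas in Murphy et al. may differ).
# observed and true are (n_ratees x n_dimensions) matrices of ratings.
import numpy as np

def accuracy_components(observed: np.ndarray, true: np.ndarray) -> dict:
    d = observed - true                      # rating minus "true" score
    grand = d.mean()                         # overall level of over/under-rating
    ratee_dev = d.mean(axis=1) - grand       # per-ratee deviation (rank ordering of ratees)
    dim_dev = d.mean(axis=0) - grand         # per-dimension deviation (relative strengths)
    residual = d - grand - ratee_dev[:, None] - dim_dev[None, :]  # ratee x dimension patterns
    return {
        "elevation": float(grand ** 2),
        "differential_elevation": float(np.mean(ratee_dev ** 2)),
        "stereotype_accuracy": float(np.mean(dim_dev ** 2)),
        "differential_accuracy": float(np.mean(residual ** 2)),
    }

# Example: 3 residents rated on 4 dimensions (9-point scale); "true" scores
# might come from an expert panel, as in the studies discussed above.
observed = np.array([[9, 9, 9, 9], [8, 9, 8, 9], [9, 8, 9, 9]], dtype=float)
true     = np.array([[6, 7, 5, 6], [4, 6, 5, 5], [7, 5, 6, 7]], dtype=float)
print(accuracy_components(observed, true))
```

In this example the elevation component dominates, reflecting the rater’s overall leniency relative to the “true” scores.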
This study used videotapes of a teaching encounter as the unit of performance to be
evaluated. The main finding was that frequency ratings (accurate recording of observed
behaviors) and performance evaluations probably involve to some degree different cognitive
processes. Performance evaluation involves, in the words of the authors, “complex, abstract
judgments about the quality of performance”. The other finding was that accuracy in
observation had a modest association with performance rating accuracy. Likewise, “errors”
in observation also were more likely to lead to less accurate performance ratings.
Comment: This study highlights the multiple and complex components of accuracy. This
early study also showed only a modest link between accuracy in observing behaviors and
accuracy in actual performance ratings. The results of this study in essence “foreshadow” the
results seen in the study by Noel, et al of the CEX (see above). One, less accurate
observation of the CEX led to more “inaccuracies” in the overall performance rating, and
two, accurate recording of behaviors is not always “correctly” incorporated into the complex
task of overall performance rating.
» McIntyre RM, Smith DE, Hassett CE. Accuracy of performance ratings as affected by rater
training and perceived purpose of rating. J Appl Psych. 1984; 69: 147-56.
One of the early studies using videotape to train raters with FOR (frame of reference)
training. FOR contains the following elements:
a) Participants are given job descriptions and are then instructed to discuss what qualifications
they believe are needed for the job.
b) Participants are given job (e.g., clinical) vignettes describing critical incidents of
performance from unsatisfactory to average to outstanding. (Frame of reference.)
c) Participants use the vignettes to provide ratings on a behaviorally anchored rating scale.
d) The session trainer then provides feedback on what the “true” ratings should be along with an
explanation for the rating.
e) The training session wraps up with an important discussion of the discrepancies between the
participants’ ratings and the “true” ratings.
This study also examined the impact of the purpose of the rating: ratings meant to provide
feedback for improvement versus ratings used for a hiring decision (“high stakes” evaluation).
Participants who completed the FOR training demonstrated greater accuracy. Interestingly,
the group that received only error avoidance training (see study above) actually did worse
with regard to accuracy, and the purpose of the rating did not make a difference in this study.
Several caveats need to be noted: first, the improvement in ratings was modest at best; second,
the target group was college students rating a videotaped lecture.
Comment: Although we can learn from such studies, we cannot directly extrapolate this type
of training to the more complex task of rating clinical performance across multiple domains.
We clearly need more rigorous research in rater training programs for teaching faculty.
» Hauenstein NMA. Training raters to increase the accuracy of appraisals and the usefulness of
feedback. Pgs. 419-21. In Performance Appraisal, Smither JW, editor. Jossey-Bass, San
Francisco. 1998.
» Woehr DJ, Huffcutt AI. Rater training for performance appraisal: a quantitative review. J
Occupational Organizational Psych 1994; 67: 189-205.
One particular form of rater training is behavioral observation training (BOT). As stated
by Hauenstein, BOT “is designed to improve the detection, perception, and recall of performance
behaviors.” Thus this type of training is particularly pertinent to faculty development in clinical
competence evaluations. BOT has two main components. The first is to encourage the rater to
increase the amount of observations, or “increase the sampling of behaviors.” Along with
enhanced sampling is training to avoid observational errors. The focus is on accurate recall of
performance behavior. As an example, the mini-CEX is in essence an attempt to promote more
direct observation by structuring the purpose and form of the observation.
The second key component is to encourage raters to utilize aids, or “aides-mémoire,”
to document witnessed behaviors. The purpose of this diary or log is to help the rater track those
dimensions of performance actually observed. The rater can periodically assess what dimensions
of performance have NOT been observed and thus make plans to correct these “observational
deficiencies”. Experts suggest that raters define in advance what, how many, and how frequent
specific performance behaviors will be observed. The mini-CEX is one tool that can help raters
with these decisions. BOT training has been found to improve rater accuracy, and higher
numbers of observations also appears to improve accuracy.
Comment: We believe BOT type training is highly relevant to residency training. Lack of direct
observation is already a well-recognized problem, and research is critically needed to define the
optimal approach for BOT in medical training. Existing tools such as the mini-CEX and OSCEs
are well-suited instruments for this task. Most BOT programs described in the psychology
literature did not involve more than several hours of training time. This is a reasonable time
commitment for those faculty serving a key role in the evaluation process in residency programs.
References: Resident Clinical Skills
(Compiled by Richard Hawkins, MD; Director, USUHS Clinical Simulation Center)
Beckman HB, Frankel RM. The Use of Videotape in Internal Medicine Training. J Gen Intern
Med 1994;9:S17-S21.
Burdick WP, Friedman Ben-David M, Swisher L, Becher J, Magee D, McNamara R, Zwanger
M. Reliability of Performance-based Clinical Skill Assessment of Emergency Medicine
Residents. Acad Emerg Med 1996;3:1119-23.
Chalabian J, Garman K, Wallace P, Dunnington G. Clinical Breast Evaluation Skills of House
Officers and Students. Am Surg 1996;6_:840-5
Chalabian J, Dunnington G. Do Our Current Assessments Assure Competency in Clinical Breast
Evaluation Skills? Am J Surg 1998;175:497-502.
Day SC, Grosso LJ, Norcini JJ, Blank LL, Swanson DB, Horne MH. Residents Perception of
Evaluation Procedures Used by Their Training Program. J Gen Intern Med 1990;5:421-6.
Duffy DF. Dialogue: The Core Clinical Skill. Ann Intern Med 1998;128:139-41.
Dupras DM, Li JTC. Use of an Objective Structured Clinical Examination to Determine Clinical
Competence. Acad Med 1995;70:1029-34.
Eggly S, Afonso N, Rojas G, Baker M, Cardozo L, Robertson RS. An Assessment of Residents'
Competence in Delivery of Bad News to Patients. Acad Med 1997;72:397-9.
Elliot DL, Hickam DH. Evaluation of Physical Examination Skills: Reliability of Faculty
Observers and Patient Instructors. JAMA 1987;258:3405-8.
Fletcher RH, Fletcher SW. Has Medicine Outgrown Physical Diagnosis? Ann Intern Med
1992;117:786-7.
Fox RA, Clark CLI, Scotland AD, Dacre JE. A Study of Pre-registration House Officers’
Clinical Skills. Med Educ 2000;34:1007-12.
Hawkins R, Gross R, Gliva-McConvey G, Haley H, Beuttel S, Holmboe E. Use of Standardized
Patients for Teaching and Evaluating the Genitourinary Examination Skills of Internal Medicine
Residents. Teach Learn Med 1998;10:65-8.
Herbers JE, Noel GL, Cooper GS, Harvey J, Pangaro LN, Weaver MJ. How Accurate Are
Faculty Evaluations of Clinical Competence? J Gen Intern Med 1989;4:202-8.
Hilliard RI, Tallett SE. The Use of an Objective Structured Clinical Examination with
Postgraduate Residents in Pediatrics. Arch Pediatr Adolec Med 1998;152:74-8.
Holmboe ES, Hawkins RE. Methods for Evaluating the Clinical Competence of Residents in
Internal Medicine: A Review. Ann Intern Med 1998;129:42-8.
Johnson JE, Carpenter JL. Medical House staff Performance in Physical Examination. Arch
Intern Med 1986;146:937-41.
Johnston BT, Boohan M. Basic Clinical Skills: Don’t Leave Teaching to the Teaching
Hospitals. Med Educ 2000;34:692-9.
Joorabchi B, Devries JM. Evaluation of Clinical Competence: The Gap Between Expectation
and Performance. Pediatrics 1996;97:179-84.
Kalet A, Earp JA, Kowlowitz V. How Well Do Faculty Evaluate the Interviewing Skills of
Medical Students? J Gen Intern Med 1992;7:499-505.
Kern DC, Parrino TA, Korst DR. The Lasting Value of Clinical Skills. JAMA 1985;254:70-6.
Klass D, De Champlain A, Fletcher E, King A, Macmillan M. Development of a Performance-based Test of Clinical Skills for the United States Medical Licensing Examination. Federation
Bulletin 1998;85:177-84.
Kroboth FJ, Kapoor W, Brown FH, Karpf M, Levey GS. A Comparative Trial of the Clinical
Evaluation Exercise. Arch Intern Med 1985;145:1121-3.
Kroboth FJ, Hanusa BH, Parker S, Coulehan JL, Kapoor WN, Brown FH, Karpf M, Levey GS.
The Inter-rater Reliability and Internal Consistency of a Clinical Evaluation Exercise. J Gen
Intern Med 1992;7:174-9.
Lane JL, Gottlieb RP. Structured Clinical Observations: A Method to Teach Clinical Skills with
Limited Time and Financial Resources. Pediatrics 2000;105(4):__
Lee KC, Dunlop D, Dolan NC. Do Clinical Breast Examination Skills Improve during Medical
School? Acad Med 1998;73:1013-9.
Li JTC. Assessment of Basic Examination Skills of Internal Medicine Residents. Acad Med
1994;69:296-9.
Mangione S, Peitzman SJ, Gracely E, Nieman LZ. Creation and Assessment of a Structured
Review Course in Physical Diagnosis for Medicine Residents. J Gen Intern Med 1994;9:213-8
Mangione S, Burdick WP, Peitzman SJ. Physical Diagnosis Skills of Physicians in Training: A
Focused Assessment. Acad Emerg Med 1995;2:622-9.
Mangione S, Nieman LZ. Cardiac Auscultatory Skills of Internal Medicine and Family Practice
Trainees: A Comparison of Diagnostic Proficiency. JAMA 1997;278:717-22.
Mangione S, Peitzman SJ. Revisiting Physical Diagnosis during the Medical Residency: It is
Time for a Logbook – and More. Acad Med 1999;74:467-9.
Mangrulkar RS, Judge RD, Stern DT. A Multimedia CD-ROM Tool to Improve Residents’
Cardiac Auscultation Skills. Acad Med 1999;74:572.
Marin JA, Reznick RK, Rothman A, Tamblyn RM, Regehr G. Who Should Rate Candidates in
an Objective Structured Clinical Examination? Acad Med 1996;71:170-5.
Noel GL, Herbers JE, Caplow MP, Cooper GS, Pangaro LN, Harvey J. How Well Do Internal
Medicine Faculty Members Evaluate the Clinical Skills of Residents? Ann Intern Med
1992;117:757-65.
Peterson MC, Holbrook JH, Hales DV, Smith NL, Staker LV. Contributions of the History,
Physical Examination, and Laboratory Investigation in Making Medical Diagnoses. West J Med
1992;156:163-5.
Petrusa ER, Blackwell TA, Ainsworth MA. Reliability and Validity of an Objective Structured
Clinical Examination for Assessing the Clinical Performance of Residents. Arch Intern Med
1990;150:573-7.
Pfeiffer C, Madray H, Ardolino A, Willms J. The Rise and Fall of Students’ Skill in Obtaining a
Medical History. Med Educ 1998;32:283-8.
Poenaru D, Morales D, Richards A, O’Connor M. Running an Objective Structured Clinical
Examination on a Shoestring Budget. Am J Surg 1997;173:538-41.
Ramsey PG, Curtis R, Paauw DS, Carline JD, Wenrich MD. History-taking and Preventive
Medicine Skills among Primary Care Physicians: An Assessment Using Standardized Patients.
Am J Med 1998;104:152-8.
Remmen R, Derese A, Scherpbier A, Denekens , Hermann I, van der Vleuten C, Van Royen P,
Bossaert L. Can Medical Schools Rely on Clerkships to Train Students in Basic Clinical Skills?
Med Educ 1999;33:600-5.
Sachdeva AK, Loiacono LA, Amiel GE, Blair PG, Friedman M, Roslyn JJ. Variability in the
Clinical Skills of Residents Entering Training Programs in Surgery. Surgery 1995;118:300-9.
Schechter GP, Blank LL, Godwin HA, LaCombe JA, Novack DH, Rosse WF. Refocusing on
History-taking Skills During Internal Medicine Training. Am J Med 1996;101:210-6.
Schwartz RW, Donnelly MB, Sloan DA, Johnson SB, Strodel WE. The Relationship Between
Faculty Ward Evaluations, OSCE and ABSITE as Measures of Surgical Intern Performance.
Sloan DA, Donnelly MB, Johnson SB, Schwartz RW, Strodel WE. Assessing Surgical
Residents' and Medical Students' Interpersonal Skills. J Surg Res 1994;57:613-8.
Sloan DA, Donnelly MB, Schwartz RW, Strodel WE. The Objective Structured Clinical
Examination: The New Gold Standard for Evaluating Postgraduate Clinical Performance. Ann
Surg 1995;222:735-42.
Stillman PL, Swanson DB, Smee S, et al. Assessing Clinical Skills of Residents with
Standardized Patients. Ann Intern Med 1986;105:762-71.
Stillman P, Swanson D, Regan MB. Assessment of Clinical Skills of Residents Utilizing
Standardized Patients: A Follow-up Study and Recommendations for Application. Ann Intern
Med 1991;114:393-401.
Stillman PL, Regan MB, Swanson DB, Case S, McCahan J, Feinblatt J, Smith SR, Willms J,
Nelson DV. An Assessment of the Clinical Skills of Fourth Year Students at Four New England
Medical Schools. Acad Med 1990;65:320-6.
Suchman A, Markakis K, Beckman HB, Frankel R. A Model of Empathic Communication in
the Medical Interview. JAMA 1997;277:678-82.
Todd IK. A Thorough Pulmonary Exam and Other Myths. Acad Med 2000;75:50-1.
Turnbull J, Gray J, MacFadyen J. Improving In-Training Evaluation Programs. J Gen Intern
Med 1998;13:317-23.
Van Thiel J, Kraan HF, van der Vleuten C. Reliability and Feasibility of Measuring Medical
Interviewing Skills: The Revised Maastricht History-taking and Advice Checklist. Med Educ
1991;25:224-9.
Warf BC, Donnelly MB, Schwartz RW, Sloan DA. The Relative Contributions of Interpersonal
and Specific Clinical Skills to the Perception of Global Clinical Competence. J Surg Res
1999;86:17-23.
Wiener S, Nathanson M. Physical Examination. Frequently Observed Errors. JAMA
1976;236:852-5.
Williamson PR, Smith RC, Kern DE, Lipkin M, Barker LR, Hoppe RB, Florek J. The Medical
Interview and Psychosocial Aspects of Medicine. J Gen Intern Med 1992;7:235-42.
Woolliscroft JO, Stross JK, Silva J. Clinical Competence Certification: A Critical Appraisal. J
Med Educ 1984;59:799-805.
Woolliscroft JO, Howell JD, Patel BP, Swanson DB. Resident-Patient Interactions: The
Humanistic Qualities of Internal Medicine Residents Assessed by Patients, Attending Physicians,
Program Supervisors, and Nurses. Acad Med 1994;69:216-24.
Wray NP, Friedland JA. Detection and Correction of House staff Error in Physical Diagnosis.
JAMA 1983;249:1035-7.