Factors that Impact Teaching Evaluations

Kathryn Dimiduk (a)
Executive Summary
Faculty teaching is evaluated in part on student responses to questions 5, 8, and 13 on the end-of-semester
course evaluations (see question text in Appendix A). To determine whether factors other than teaching skill are
associated with the evaluation scores, we developed a mixed-model ANOVA using ten semesters (Fall ’06 through
Spring ’11) of lecture course evaluation data (1,916 courses) for tenure-track faculty. We also used a bivariate
analysis to study pairs of factors. The results show that factors other than teaching skill significantly affect
scores. We found five fixed factors to be significantly associated with course evaluation scores: (1) faculty
gender, (2) class size, (3) course level, (4) whether the course is required, and (5) faculty rank.
The cumulative effect of several negative factors versus several positive factors can be slightly more than a full
point on the 1-5 scale on which the questions are scored. A formula and a table of factor effects are
provided to give reference values for courses with various attributes.
Results Summary
From the ANOVA, holding all other factors constant, we found that:
• Courses taught by female professors receive lower scores on average than those taught by men.
• Small classes receive higher scores on average; the larger the class, the lower the average score.
• Sophomore classes receive the lowest scores on average, while graduate classes are associated with the
highest evaluation scores.
• Required undergraduate courses receive lower scores on average.
• Associate professors receive the highest evaluation scores on average compared with other ranks.
• Time of day, department, and semester were not found to affect scores significantly.
We also discovered (via bivariate analysis) that:
• At the graduate level, the difference in scores by gender is minuscule.
• Women are more than twice as likely as men to teach a sophomore course (17% v. 8%) and less likely to
teach a graduate course (36% v. 43%). This matters because sophomore classes have the lowest
average rating. Figure 1 displays the percentage of the courses taught by each gender that are at
each course level. For example, 5% of the total courses taught by female faculty were at the freshman
level, 17% at the sophomore level, and so on.
Figure 1. Comparison of courses taught by male and female faculty: percentage of each gender's courses at each course level (freshman, sophomore, junior, senior, grad).
Recommendation
Based on these findings, we recommend taking these factors into account when using teaching evaluations,
especially for tenure and promotion cases and for teaching awards. This study looked only at actual scores; it
does not take into account the additional effort required of female professors to achieve a particular score,
which has been reported in the literature (Refs. 1-3).
Detailed Methods and Analysis
This ANOVA study considered the effect of each parameter in isolation as a linear model; it did not consider
higher-order effects, where the combined effect of two parameters could be larger or smaller than the sum of
their individual effects. The model used log-transformed, reflected values to calculate the ANOVA coefficients.
Each coefficient estimates how much that particular parameter would affect the predicted geometric mean score
for a particular question, in isolation from the effects of the other parameters. Because of the log
transformation, the coefficients can be added together to represent a particular type of class and then
back-transformed to give a predicted geometric mean score for that type of class.
We considered pairs of effects in separate bivariate studies. The most significant paired interactions were that
women teach disproportionately more sophomore and fewer graduate courses, as shown in Figure 1, and that the
gender effect on scores nearly disappears at the graduate level. There is some evidence of a higher-order effect
between class size and gender, with large classes affecting female faculty scores more negatively, but we did not
have a sufficient sample of very large, female-taught classes to fully explore this effect. Faculty ethnicity was not
studied because of the small sample of courses taught by minority faculty (Ref. 4 has some ethnicity information).
We found five fixed factors to be significantly associated with course evaluation scores: (1) faculty gender, (2)
class size, (3) course level, (4) whether the course is required, and (5) faculty rank.
Description of ANOVA Coefficients and Back-Transformation Equation
The mixed-model ANOVA used log-transformed, reflected values and considered the following fixed factors: faculty
gender, class size, class level, faculty rank, and whether the course was required for an undergraduate major
(required means absolutely required or one of a selected set of courses of which more than half must be taken). The
model controlled for the following random factors: department, term, and individual faculty. Time of day taught was
not statistically significant. Student-centric data were not available and were therefore not considered as factors.
Table 1 gives the matrix of coefficients determined by the model. To predict the expected median score for a
particular set of class and faculty attributes, use the table coefficients and the transformation in Equation 1:
Equation 1: predicted median rating = 6 – exp(intercept + ∑ relevant coefficients)
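Equation 1 can be unpacked into the steps described above. This is a sketch, assuming the reflection is r = 6 - s for a score s on the 1-5 scale (inferred from the constant 6 in Equation 1, not stated explicitly); the symbols s, r, \beta_0 and \beta_k are introduced here for illustration only.

```latex
\begin{align*}
r &= 6 - s
  && \text{reflect a score } s \in [1, 5] \\
\ln \hat{r} &= \beta_0 + \sum_{k} \beta_k
  && \text{intercept plus the coefficients that apply to the class} \\
\hat{s} &= 6 - \exp\Bigl(\beta_0 + \sum_{k} \beta_k\Bigr)
  && \text{back-transform to the 1--5 scale (Equation 1)}
\end{align*}
```

Because the scores are reflected before the log is taken, a positive coefficient in Table 1 lowers the predicted rating and a negative coefficient raises it.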
Table 1. Summary statistics for the mixed-model ANOVA. Entries are the estimates of the effect of each fixed factor on the course
evaluation, holding all other factors constant, given as coefficient (standard error). (** p-value < 0.01, * p-value < 0.05)
Factor                                  Q5: interest        Q8: faculty effectiveness   Q13: course comparison
Intercept (reference group^ median)     +0.747   (0.023)    +0.745   (0.026)            +0.753   †
Faculty gender: female                  +0.061   (0.031)    +0.117** (0.037)            +0.091** (0.032)
Size: students ≤ 19                     -0.090** (0.013)    -0.056** (0.013)            -0.069** (0.013)
Size: 50 ≤ students ≤ 79                +0.043** (0.016)    +0.044** (0.016)            +0.049** (0.016)
Size: 80 ≤ students ≤ 119               +0.085** (0.022)    +0.082** (0.022)            +0.077** (0.023)
Size: 120 ≤ students                    +0.155** (0.031)    +0.147** (0.030)            +0.128** (0.031)
Required ugrad                          +0.045*  (0.021)    +0.013   (0.022)            +0.016   (0.021)
Course level: freshman                  -0.003   (0.027)    +0.039   (0.028)            +0.011   (0.028)
Course level: sophomore                 +0.077** (0.026)    +0.063*  (0.026)            +0.046   †
Course level: junior                    +0.048*  (0.022)    +0.036   (0.022)            +0.043   †
Course level: grad                      -0.074** (0.015)    -0.045** (0.015)            -0.052** (0.015)
Faculty rank: associate                 -0.063** (0.023)    -0.058*  (0.025)            -0.062** (0.023)
Faculty rank: assistant                 -0.037   †          -0.021   (0.028)            -0.044   †
^ Reference group is defined as follows: senior level, not required, 20-49 enrollment, taught by a male, full professor.
† The standard error for this cell cannot be placed from this copy; five values, (0.024), (0.023), (0.026), (0.022), and (0.025), appear without a clear cell assignment.
Table 2 provides some example results for particular course and faculty attributes. Table 3 provides a visual
representation of the effect of various factors. The back-transformation equation to geometric mean scores is
given in Appendix A. Additional cases can be computed based on Appendix A or by using the website (b):
http://vlsi.cornell.edu/~rajit/ratings/index.php (username: teaching and password: rateprofile).
Table 2. Sample predicted scores for various course types and faculty attributes
Predicted mean rating for
Q8: faculty effectiveness    Sample results
3.89                         Reference class: non-required senior class of 20-49 students taught by a male full professor
3.04                         Lowest predicted type of class: required sophomore class of 120 or more students taught by a female full professor
4.20                         Highest predicted type of class: non-required graduate course of 19 or fewer students taught by a male associate professor
3.28                         Example: required sophomore class of 80-119 students taught by a female assistant professor
3.27                         Example: required freshman-level class of 120 or more students taught by a female associate professor
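The Table 2 values can be reproduced from Equation 1 and the Q8 column of Table 1. The following Python sketch is illustrative only: the function name predicted_rating and the attribute keys (such as "size_ge_120") are labels introduced here and are not part of the study or its website. Attributes of the reference group contribute nothing to the sum and are omitted.

```python
import math

# Q8 (faculty effectiveness) coefficients transcribed from Table 1.
# Key names are hypothetical labels chosen for this sketch.
# Reference-group attributes (senior, not required, 20-49 students,
# male, full professor) have no coefficient and are left out.
Q8_INTERCEPT = 0.745
Q8_COEFFICIENTS = {
    "female": 0.117,
    "size_le_19": -0.056,
    "size_50_79": 0.044,
    "size_80_119": 0.082,
    "size_ge_120": 0.147,
    "required": 0.013,
    "freshman": 0.039,
    "sophomore": 0.063,
    "junior": 0.036,
    "grad": -0.045,
    "associate": -0.058,
    "assistant": -0.021,
}


def predicted_rating(attributes, intercept=Q8_INTERCEPT,
                     coefficients=Q8_COEFFICIENTS):
    """Apply Equation 1: 6 - exp(intercept + sum of relevant coefficients)."""
    linear = intercept + sum(coefficients[name] for name in attributes)
    return 6 - math.exp(linear)


# Three of the class types listed in Table 2.
reference = predicted_rating([])
lowest = predicted_rating(["required", "sophomore", "size_ge_120", "female"])
highest = predicted_rating(["grad", "size_le_19", "associate"])

print(f"reference: {reference:.2f}")
print(f"lowest:    {lowest:.2f}")
print(f"highest:   {highest:.2f}")
print(f"spread:    {highest - lowest:.2f}")
```

With these coefficients the sketch gives 3.89, 3.04, and 4.20, matching Table 2, and a spread of about 1.16 points between the highest and lowest predicted class types.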
Table 3. Visual representation of the effect of various class and faculty attributes on teaching evaluation scores
for questions 5, 8, and 13. The arrows show the effect on the predicted geometric mean scores.
Data from 10 semesters of Cornell Engineering courses taught by tenure-track faculty (1,916 courses).
Acknowledgements
a. Yoanna Ferrara introduced the ANOVA approach to this study and provided valuable assistance in developing and interpreting
the statistical results. Jay Barry of the Statistical Consulting Group at Cornell University developed the final mixed-model
ANOVA using log-transformed values.
b. Rajit Manohar created the website to calculate predicted geometric mean scores based on a set of course attributes.
References
1. K. Andersen and E. D. Miller, “Gender and Student Evaluations of Teaching,” PS: Political Science and Politics, V. 30 (2), June 1997,
pp. 216-219.
2. H. Laube, K. Massoni, J. Sprague, and A. Ferber, “The Impact of Gender on the Evaluation of Teaching: What We Know and What We
Can Do,” NWSA Journal, V. 19 (3), Fall 2007, pp. 87-104.
3. “Student Ratings of Professors are Not Gender Blind,” written in 1994, http://www.awm-math.org/newsletter/199409/basow.html,
retrieved Dec. 12, 2012.
4. Recent literature review: T. Huston, “Research Report: Race and Gender Bias in Student Evaluations of Teaching,”
http://sun.skidmore.union.edu/sunNET/ResourceFiles/Huston_Race_Gender_TeachingEvals.pdf, retrieved Dec. 12, 2012.
Appendix A
Text of end-of-semester teaching evaluation questions used in this study.
Question number    Text
5                  Did the lecturer stimulate your interest in the subject? 1=not at all; 5=stimulated great interest, inspired independent effort
8                  Rate the overall teaching effectiveness of your lecturer compared to others at Cornell. 1=worse than average; 5=much better than average
13                 Overall, how does course compare with other technical courses you've taken at Cornell? 1=poorly, not educational; 5=excellently, extremely educational