Factors that Impact Teaching Evaluations

Kathryn Dimiduk^a

Executive Summary

Faculty teaching is partially evaluated based on student responses to questions 5, 8 and 13 on the end-of-semester course evaluations (see question text in Appendix A). To determine whether factors other than teaching skill are associated with the evaluation scores, we developed a mixed-model ANOVA using ten semesters (Fall ’06 to Spring ’11) of lecture course evaluation data (1,916 courses) for tenure-track faculty. We also used a bivariate analysis to study pairs of factors. The results show that factors other than teaching skill significantly affect scores. We found five fixed factors to be significantly associated with course evaluation scores: (1) faculty gender, (2) class size, (3) course level, (4) whether the course is required, and (5) faculty rank. The cumulative effect of several negative factors versus several positive factors can be slightly more than a full point on the 1-5 scale on which the questions are scored. A formula and a table of factor effects are provided to give reference means for courses with various attributes.

Results Summary

When holding all other factors constant in the ANOVA, we found that:
• Courses taught by female professors on average receive lower scores than those taught by men.
• Small classes receive on average higher scores. The larger the class, the lower the average scores.
• Sophomore classes receive on average the lowest scores, while graduate classes are associated with the highest evaluation scores.
• Required undergraduate courses on average receive lower scores.
• Associate Professors receive on average the highest evaluation scores compared to other ranks.
• Time of day, department, and semester were not found to affect scores significantly.

We also discovered (via bivariate analysis) that:
• At the graduate level, the difference in scores by gender is minuscule.
• Women are more than twice as likely as men to teach a sophomore course (17% v. 8%) and less likely to teach a graduate course (36% v. 43%). This is significant because the sophomore classes have the lowest average rating. See Figure 1, which displays the percent of the courses taught by each gender that are at each course level. For example, 5% of the total courses taught by female faculty were at the freshman level, 17% at the sophomore level, and so on.

[Figure 1: bar chart titled "Comparison of Courses Taught" showing, for male and female faculty, the percent of courses taught at each level (freshman, sophomore, junior, senior, grad).]

Figure 1. Comparison of courses taught by male and female faculty.

Recommendation

Based on these findings, we recommend taking these factors into account when using teaching evaluations, especially for tenure and promotion cases and for teaching awards. This study looked only at actual scores; it does not take into account the additional effort required to achieve a particular score, which has been reported in the literature (Refs. 1-3) for female professors.

Detailed Methods and Analysis

This ANOVA study considered the effect of each parameter in isolation as a linear model, but did not consider higher-order effects, where the combined effect of two parameters could be larger or smaller than the sum of their individual effects. The model used log-transformed reflected values to calculate the ANOVA coefficients. Each coefficient estimates how much that particular parameter would affect the predicted geometric mean score for a particular question, in isolation from the effect of the other parameters. Because of the log transformation, the coefficients can be linearly combined to represent a particular type of class and then back-transformed to give a predicted geometric mean score for that type of class. We considered pairs of effects in separate bivariate studies.
The most significant paired interactions were that women disproportionately teach more sophomore and fewer graduate courses, as shown in Figure 1, and that the gender effect on scores nearly disappears at the graduate level. There is some evidence of a higher-order effect between class size and gender, with large classes affecting female faculty scores more negatively, but we did not have a sufficient sample of very large classes taught by women to fully explore this effect. Faculty ethnicity was not studied due to the small sample size of courses taught by minority faculty (Ref. 4 has some ethnicity information).

Description of ANOVA Coefficients and Back-Transformation Equation

The mixed-model ANOVA used log-transformed reflected values and considered the following fixed factors: faculty gender, class size, class level, faculty rank, and whether the course was required for an undergraduate major (required means absolutely required or one of a selected set of courses of which more than half would need to be taken). The model controlled for the following random factors: department, term, and individual faculty. Time taught was not statistically significant. Student-centric data were not available and therefore were not considered as factors. Table 1 gives the matrix of coefficients determined by the model. To predict the expected median score for a particular set of class and faculty attributes, use the table coefficients and the transformation in Equation 1:

Equation 1: predicted median rating = 6 - exp(intercept + Σ relevant coefficients)

Because the scores are reflected before the log transformation, a positive coefficient lowers the predicted rating and a negative coefficient raises it.

Table 1. Summary statistics for the mixed-model ANOVA. Entries in the table are the estimates of the effect of each fixed factor on the course evaluation, while holding all other factors constant.
(** p-value < 0.01, * p-value < 0.05)

Coefficient (standard error)

Factor                                 Q5: interest        Q8: faculty effectiveness   Q13: course comparison
Intercept (reference group^ median)    +0.747 (0.023)      +0.745 (0.026)              +0.753 (0.024)†
Faculty Gender
  Female                               +0.061 (0.031)      +0.117** (0.037)            +0.091** (0.032)
Size
  students ≤ 19                        -0.090** (0.013)    -0.056** (0.013)            -0.069** (0.013)
  50 ≤ students ≤ 79                   +0.043** (0.016)    +0.044** (0.016)            +0.049** (0.016)
  80 ≤ students ≤ 119                  +0.085** (0.022)    +0.082** (0.022)            +0.077** (0.023)
  120 ≤ students                       +0.155** (0.031)    +0.147** (0.030)            +0.128** (0.031)
Required ugrad                         +0.045* (0.021)     +0.013 (0.022)              +0.016 (0.021)
Course Level
  freshman                             -0.003 (0.027)      +0.039 (0.028)              +0.011 (0.028)
  sophomore                            +0.077** (0.026)    +0.063* (0.026)             +0.046 (0.026)†
  junior                               +0.048* (0.022)     +0.036 (0.022)              +0.043 (0.022)†
  grad                                 -0.074** (0.015)    -0.045** (0.015)            -0.052** (0.015)
Faculty Rank
  Associate                            -0.063** (0.023)    -0.058* (0.025)             -0.062** (0.023)
  Assistant                            -0.037 (0.023)†     -0.021 (0.028)              -0.044 (0.025)†

^ Reference group is defined as follows: senior level, not required, 20-49 enrollment, taught by a male, full professor.
† Value or standard error printed outside the table grid in the source; placement inferred from row order.

Table 2 provides some example results for particular course and faculty attributes. Table 3 provides a visual representation of the effect of various factors. The back-transformation equation to geometric mean scores is given in Equation 1. Additional cases can be computed from Equation 1 or by using the website^b: http://vlsi.cornell.edu/~rajit/ratings/index.php (username: teaching and password: rateprofile).

Table 2.
Sample predicted scores for various course types and faculty attributes

Predicted mean rating for Q8: faculty effectiveness, followed by the sample class:
  3.89   Reference class: non-required, senior class of 20-49 students taught by a male, full professor
  3.04   Lowest predicted type of class: required, sophomore class of 120 or more students taught by a female, full professor
  4.20   Highest predicted type of class: non-required, graduate course of 19 or fewer students taught by a male, associate professor
  3.28   Example: required, sophomore class of 80-119 students taught by a female, assistant professor
  3.27   Example: required, freshman-level class of 120 or more students taught by a female, associate professor

Table 3. Visual representation of the effect of various class and faculty attributes on teaching evaluation scores for questions 5, 8 and 13. The arrows show the effect on the predicted geometric mean scores. Data from 10 semesters of Cornell Engineering courses taught by tenure-track faculty = 1,916 courses.

Acknowledgements

a. Yoanna Ferrara introduced the ANOVA approach to this study and provided valuable assistance in developing and interpreting the statistical results. Jay Barry of the Statistical Consulting Group at Cornell University developed the final mixed-model ANOVA using log-transformed values.
b. Rajit Manohar created the website to calculate predicted geometric mean scores based on a set of course attributes.

References

1. K. Andersen and E. D. Miller, “Gender and Student Evaluations of Teaching”, PS: Political Science and Politics, V. 30 (2), June 1997, pp. 216-219.
2. H. Laube, K. Massoni, J. Sprague, and A. Ferber, “The Impact of Gender on the Evaluation of Teaching: What We Know and What We Can Do”, NWSA Journal, V. 19 (3), Fall 2007, pp. 87-104.
3. “Student Ratings of Professors are Not Gender Blind”, http://www.awm-math.org/newsletter/199409/basow.html, written in 1994, retrieved Dec. 12, 2012.
4. Recent literature review: T. Huston, “Research Report: Race and Gender Bias in Student Evaluations of Teaching”, http://sun.skidmore.union.edu/sunNET/ResourceFiles/Huston_Race_Gender_TeachingEvals.pdf, retrieved Dec. 12, 2012.

Appendix A

Text of end-of-semester teaching evaluation questions used in this study.

Question number   Text
5                 Did the lecturer stimulate your interest in the subject? 1=not at all; 5=stimulated great interest, inspired independent effort
8                 Rate the overall teaching effectiveness of your lecturer compared to others at Cornell. 1=worse than average; 5=much better than average
13                Overall, how does course compare with other technical courses you've taken at Cornell? 1=poorly, not educational; 5=excellently, extremely educational
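
Worked example of Equation 1

Equation 1 can be applied by hand, but a short script makes it easier to check additional cases. The following is a minimal sketch, not the code behind the website: it uses only the Q8 (faculty effectiveness) coefficients as read from Table 1, and the names `Q8_INTERCEPT`, `Q8_COEFFICIENTS` and `predicted_q8` are hypothetical, chosen for illustration. The junior-level value assumes that the second entry listed in that row belongs to the Q8 column. The "required" factor is only meaningful for undergraduate courses.

```python
import math

# Q8 (faculty effectiveness) coefficients as read from Table 1.
# The reference group (senior level, not required, 20-49 students,
# male, full professor) contributes 0.0 for every factor.
Q8_INTERCEPT = 0.745
Q8_COEFFICIENTS = {
    "gender": {"male": 0.0, "female": 0.117},
    "size": {"<=19": -0.056, "20-49": 0.0, "50-79": 0.044,
             "80-119": 0.082, ">=120": 0.147},
    "required": {False: 0.0, True: 0.013},
    # Assumption: the junior Q8 entry is the second value listed in its row.
    "level": {"freshman": 0.039, "sophomore": 0.063, "junior": 0.036,
              "senior": 0.0, "grad": -0.045},
    "rank": {"full": 0.0, "associate": -0.058, "assistant": -0.021},
}


def predicted_q8(gender="male", size="20-49", required=False,
                 level="senior", rank="full"):
    """Apply Equation 1: 6 - exp(intercept + sum of relevant coefficients)."""
    chosen = {"gender": gender, "size": size, "required": required,
              "level": level, "rank": rank}
    linear = Q8_INTERCEPT + sum(
        Q8_COEFFICIENTS[factor][value] for factor, value in chosen.items()
    )
    return 6.0 - math.exp(linear)


if __name__ == "__main__":
    # Reference class from Table 2
    print(f"{predicted_q8():.2f}")  # 3.89
    # Required sophomore class of 120 or more, female full professor
    print(f"{predicted_q8('female', '>=120', True, 'sophomore', 'full'):.2f}")  # 3.04
    # Non-required graduate class of 19 or fewer, male associate professor
    print(f"{predicted_q8('male', '<=19', False, 'grad', 'associate'):.2f}")  # 4.20
```

The three printed values agree with the reference, lowest and highest rows of Table 2. Positive coefficients increase the exponent and therefore lower the predicted rating, which is why the female, large-class and sophomore terms all pull the score down.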