Some Guidelines for Academic Quality Rankings

Higher Education in Europe, Vol. XXVII, No. 4, 2002
MARGUERITE CLARKE
While rankings are a popular method for comparing the relative quality of higher education
institutions, there is much confusion and debate over which indicators to use and how to present
the information in ranked format. This article offers some guidelines in both areas as well as
questions to consider at each stage of the ranking process.
Nonetheless, just as democracy, according to Winston Churchill, is the worst form
of government except for all the others, so quality rankings are the worst device
for comparing the quality of … colleges and universities, except for all the others
(Webster, 1986, p. 6).
Academic quality rankings are a controversial, but enduring, part of the educational
landscape—controversial because not everyone agrees that the quality of a school or a
programme can be quantified (Casper, 1996), and enduring because of the lack of other
publicly attractive methods for comparing institutions (Sanoff, 1998). While the lack of
appealing alternatives has legitimated the use of rankings in the eyes of many (but not all),
there is still a lively debate over the issue of how to rank (Hattendorf, 1993).
This article addresses the “how to” issue by offering some general guidelines for
academic quality rankings. The discussion is presented in three parts and is couched in an
American context. The first section of the paper outlines the conceptual categories that
underpin many ranking efforts. It describes the strengths and limitations of some of the
methods used to collect information on each, and offers some guidelines for the selection
of ranking indicators. The second section examines the popular weight-and-sum approach
to presenting this information in ranked format and explores its limitations for the
evaluation of educational quality.1 The third section draws on the findings of the previous
two and presents a list of things to consider at each stage of the ranking process.
Before proceeding, it is worth defining what an academic quality ranking means.
According to Webster (1986), an academic quality ranking:
[M]ust be arranged according to some criterion or set of criteria which the
compiler(s) of the list believed measured or reflected academic quality[; and] it
must be a list of the best colleges, universities, or departments in a field of study,
in numerical order according to their supposed quality, with each school or
department having its own individual rank, not just lumped together with other
schools into a handful of quality classes, groups, or levels (p. 5).
This definition reveals two characteristics of academic quality rankings that are the key
to the discussion that follows. The first characteristic is that the choice of indicators rests
with those doing the ranking. Consequently, while certain normative views of academic
quality exist, the set of indicators used will vary according to the value system of the person
or group doing the ranking. The second characteristic is that schools or programmes must be placed in order on the basis of their relative performance on these indicators. Thus, when more than one indicator is involved, the information must either be combined in order to produce a single ranked list of schools, or schools must be ranked separately on each indicator. The next section focuses on the first of these ranking characteristics.

1. While this section may seem more of a "how not to" than a "how to" one, the discussion raises important issues that should be kept in mind no matter what ranking approach is used.
CONCEPTUALIZING AND MEASURING ACADEMIC QUALITY
The conceptualizations of quality that underpin most ranking efforts can be organized into
three categories: student achievements, faculty accomplishments, and institutional academic
resources. These categories are portrayed in Table 1 along with some of the methods
traditionally used to collect information on each. The data obtained through these methods
are organized into indicators. For instance, the information obtained through a survey of
reputation is usually recorded in the form of a score for each institution. Together, these
scores form a “reputation” indicator that may be used—either alone, or in conjunction with
other indicators—to rank schools.
As indicated in the third column of Table 1, each indicator has strengths and weaknesses
that should be kept in mind when determining the appropriateness of its use (see also
Hattendorf, 1993). For example, while surveys of reputation can produce rankings with high public credibility, they can also be misleading if the overall reputation of an institution clouds a rater's evaluation of a particular department (e.g., a strong university-wide reputation might lead a rater to give a higher than deserved score to a weak department within that university).
Another strength of surveys of reputation is that they tend to be good at identifying the
strongest programmes in a particular area. For instance, if one wanted to identify the top-ten
graduate schools of business, a survey of reputation might serve the purpose very well.
However, these surveys are not effective when trying to sort out all the programmes in a
particular area since not all schools have well-known reputations. Consequently, using this
approach to rank all 352 accredited business programmes in the United States could be
problematic since raters may have little or no knowledge of many of the programmes they
are being asked to evaluate. The point to keep in mind is that every indicator comes with
strengths and weaknesses that need to be considered before including the given indicator in
a particular ranking effort.
The set of indicators shown in Table 1 does not reflect some of the more recent changes
in higher education (e.g., distance learning and computers in the classroom) that may affect
how quality is conceived and measured. Thus, it is useful, when choosing indicators, to
think in terms of the more general and desirable measurement properties of validity (does
the indicator measure what it purports to measure?), reliability (does it do so in a
consistent/error-free fashion?), and comparability (can it be interpreted in a similar way
across different kinds of programmes or institutions?) (Linn, 1993). Basically, if an
indicator is to be included in a ranking, there should be some evidence (either in the form
of existing data or data collected by those doing the ranking) that shows that it is
appropriate—i.e., valid, reliable, and comparable—for the intended purpose. It is not easy
to establish these properties since there can be conflicting evidence, and opinion, as to the
appropriateness of an indicator for a given purpose. For example, among the indicators
portrayed in Table 1, the validity of the test scores of incoming students as a measure of
institutional quality has been criticized on the grounds that it measures only what students
bring with them and not what the institution does for students (Seaman, 1998). However,
others argue that it is a valid, albeit indirect, indicator of institutional quality since “higher
quality" institutions tend to attract the most talented students, and one way of measuring this attraction is through the scores of incoming students on standardized tests (Morse and Flanigan, 2001).

TABLE 1. Categories of academic quality

Faculty accomplishments
  Surveys of reputation (e.g., ratings of faculty or programme reputation)
    Advantage: They produce results with face validity, i.e., results that most nearly match what the educated public considers the hierarchy of colleges and universities to be.
    Disadvantage: The overall reputation of an institution may influence the assessments by raters of the particular department(s) they are being asked to rank.
  Counts of faculty awards, honours, and prizes
    Advantage: They are useful for ranking the best or the better institutions.
    Disadvantage: They may be years behind or ahead of reality.
  Counts of faculty citations in citation indexes
    Advantage: Useful in assessing the influence and importance of the publications of faculty members, and not just their sheer volume.
    Disadvantage: The citation indexes on which the rankings are based do not distinguish between "good", "neutral", or "bad" citations.

Student achievements
  Distinguished alumni and the achievements of graduates after graduation
    Advantage: While only a small percentage of colleges and universities have faculties that produce much research, almost all of them attempt to prepare their students for rewarding careers in later life.
    Disadvantage: Usually years, if not decades, behind reality.
  The scores of incoming students on standardized tests
    Advantage: The data are easy to obtain and are a measure by which most institutions can be ranked.
    Disadvantage: Based on the academic abilities of students before they enter college and thus fail to consider anything that these institutions do to educate their students once they enroll.

Institutional academic resources
  Compilation of measures of institutional resources, including educational expenditure per student, faculty–student ratios, and library resources
    Advantage: The data are easy to obtain and are a measure on which all institutions can be compared.
    Disadvantage: Offers little or no information about how often and how beneficially students use these resources.

Source: Adapted from Webster (1986).
Another way of thinking about the choice of indicators is in terms of inputs, processes,
and outputs. All else being equal, process (e.g., teaching quality) and output (e.g.,
effectiveness of graduates in the workplace) measures are preferred since they are better
indications of the quality of the instruction, preparation, and resources offered by an
institution. The measures in Table 1 can be grouped into inputs (e.g., the scores of incoming
students on standardized tests and library resources) and outputs (e.g., faculty citations and
awards). The lack of process measures in this table, and more generally, is explained by
their being more difficult to identify and more difficult and costly to measure. However,
when available, they can provide very useful information on aspects such as classroom
environment and teaching effectiveness. For instance, until recently, the rankings of The
Times Higher Education Supplement of British universities included a “teaching quality”
measure that indicated the effectiveness of instruction in different undergraduate departments at these universities (see <http://www.thesis.co.uk/>).
Another guideline to consider when choosing indicators is the objective or subjective
nature of the indicators themselves. Objective indicators are those not dependent on the
person doing the counting. For example, if two people were asked to compute the
student–faculty ratio or the number of books in the library of a particular institution, they
would come up with the same result (if given the same formula and no computational errors
were made). Subjective indicators are those that can vary depending on who is responding.
For example, if two people were asked to rate the overall academic quality of a particular
institution (as in a survey of reputation) using the same set of criteria, they might come up
with two very different scores because their subjective opinion would enter the process. Not
surprisingly, reliability can be more of an issue with subjective indicators.
Other guidelines could be added, but the above list covers some of the more important
ones to keep in mind. In addition to these considerations, standardized procedures should
be used to collect, store, analyze, and present the information. These controls are necessary
because errors at any stage of the process will reduce or eliminate the usefulness of an
indicator as a potential measure of academic quality. The next section addresses the second
characteristic of rankings outlined at the start of this article—i.e., the choice of method for
presenting the information in ranked format.
PRESENTING THE INFORMATION IN RANKED FORMAT
Once the set of indicators has been chosen, a method must be selected for presenting the
information in a ranked format. Several options exist, some of which are discussed in other
articles in this issue of Higher Education in Europe. Instead of re-examining those
approaches, this section explores the popular weight-and-sum approach and discusses its
limited usefulness for representing the relative quality of institutions or programmes (see
Clarke, 2002a; also Clarke, 2002b). In doing so, some general issues are raised that should
be kept in mind when presenting ranked information.
The Weight-and-Sum Approach
The weight-and-sum approach involves assigning a weight to each indicator according to
its perceived importance and then using the weights to combine the indicator information
into an overall score (Scriven, 1991). While the result of this process is one easy-to-digest
number, critics have pointed out several problems, including the fact that the choice of
weights is itself a value judgment and thus can vary depending on who is making the
decision (Camilli and Firestone, 2000). Depending on the number of criteria and their
weights, one dimension may dominate all the others, or several trivial dimensions may
swamp more crucial ones (Evaluation News, 1981). Nonetheless, the approach works quite
well in certain contexts and has been used for years in the area of product evaluations (see
<www.consumerreports.org> for an example).
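To make the mechanics concrete, the following minimal Python sketch applies the weight-and-sum approach to a small set of invented indicator values: each indicator is standardized across schools (as in the US News and World Report methodology described below), multiplied by a weight reflecting its assumed importance, and the weighted values are summed into one overall score per school. The schools, indicator values, and weights are hypothetical and are not drawn from any published ranking.

import numpy as np

def weight_and_sum(indicators: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Standardize each indicator (column) across schools, weight it, and sum.

    indicators : (n_schools, n_indicators) array of raw indicator values
    weights    : (n_indicators,) array reflecting each indicator's assumed importance
    """
    z = (indicators - indicators.mean(axis=0)) / indicators.std(axis=0)
    return z @ weights  # one overall score per school

# Hypothetical example: three schools, two indicators weighted 60/40.
raw = np.array([[650.0, 95.0],
                [600.0, 90.0],
                [700.0, 82.0]])
scores = weight_and_sum(raw, np.array([0.6, 0.4]))
order = np.argsort(-scores)  # schools listed from highest to lowest overall score
print(scores, order)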
Despite the popularity of this method for ranking cars and toasters, there is divided
opinion as to whether it works for the evaluation of educational quality (Hattendorf, 1993;
Scriven, 1991). This lack of consensus arises partly because product ratings are based on easily observable and quantifiable aspects of the product or its performance (e.g., in a car, fuel economy, the warranty, acceleration, and safety features), whereas academic quality ratings tend to be based on institutional components that are harder to observe or to quantify (e.g., reputation and distinguished alumni). Consequently, it is not as easy to reach consensus on which indicators to use and how to measure them when assessing academic quality as it is when assessing car performance.
Even if one is conceptually comfortable with using this approach to rank institutions,
there is little research on the relationships that exist among the various indicators of
academic quality, or on their relative importance in the assessment of institutional or
programme quality. Therefore, it is difficult to know how to weight them in order to come
up with an overall score. This problem is less pronounced for product ratings since there
is a clearer sense of the relative importance of different aspects of product performance as
well as how these relate to each other. In addition, it is easier to validate a formula used
to rank products since there are more easily observable outcome indicators to rely on—e.g.,
do cars that are ranked highest actually perform better/break down less frequently?
Limitations of the Weight-and-Sum Approach for the Evaluation of Educational
Quality
One way to look at the implications of these issues for academic quality rankings is by
examining rankings that use the weight-and-sum approach. The data presented here are
from the weekly magazine US News and World Report. It is one of the most popular
sources of college and graduate school rankings in the United States and has been ranking
schools since 1983.2 While the indicators used for each of the college and graduate school
rankings vary, the methodology is the same: i.e., the chosen indicators are standardized,
weighted, and summed to produce an overall score on which to rank schools or programmes
in each category against their peers (Garrett, 2002; Morse and Flanigan, 2001). No research
is cited as the basis for the indicators and weights used, and there is no indication that the
properties of the indicators or how they are related to each other have been examined.
2. The college rankings can be obtained at <http://www.usnews.com/usnews/edu/college/rankings/rankindex.htm>. The graduate school rankings can be obtained at <http://www.usnews.com/usnews/edu/beyond/bcrank.htm>.

However, the importance of obtaining this type of information, particularly if the indicators are going to be combined, can be seen by looking at Tables 2 and 3. These tables are based on data from the US News and World Report 2001 calendar year rankings of the top-fifty business schools and the top-fifty schools of education in the United States (data tend not to be published for schools below the top-fifty). In each table, the set of indicators used for that ranking is presented across the top and down the side of the table. The numbers in the table represent the strength of the relationships (or correlations) among the various indicators. The magnitude of the correlation can vary between 0 and 1, with 0 being the weakest and 1 the strongest relationship. In addition, the direction of the relationship can be positive or negative. A positive number means that as values for one indicator increase, the values for the other indicator also tend to increase. A negative number means that as values for one indicator increase, the values for the other indicator tend to decrease. The strength of the relationship between any two indicators in Table 2 or 3 can be determined by finding the number at which the column location for one and the row location for the other intersect. For example, in Table 2 the correlation between "Starting salary" and "Average Graduate Management Admissions Test (GMAT) scores" is 0.8. This means that schools with higher average GMAT scores tend to have graduates with higher starting salaries. The correlation between "Average GMAT scores" and "Percent accepted" is -0.69, which means that schools that accept fewer applicants tend to have incoming students with higher average test scores (note that the relationship is not as strong as that between "Starting salary" and "Average GMAT scores").

TABLE 2. Correlations for 2001 business school indicators*

                                    (1)     (2)     (3)     (4)     (5)     (6)     (7)    (8)
(1) Reputation (academic)           1
(2) Reputation (recruiter)          0.90    1
(3) Starting salary                 0.90    0.86    1
(4) Employed at graduation          0.32    0.32    0.52    1
(5) Employed three months later     0.24    0.18    0.43    0.49    1
(6) Average GMAT scores             0.81    0.70    0.80    0.30    0.24    1
(7) Average undergraduate GPA       0.57    0.50    0.49    0.03    0.04    0.66    1
(8) Percent accepted               -0.73   -0.68   -0.68   -0.20   -0.05   -0.69   -0.58   1

* Based on data for the top-fifty schools and rounded to two decimal places.

TABLE 3. Correlations for 2001 education school indicators*

[Correlation matrix among the ten indicators used in the education school ranking: Reputation (academic), Reputation (superintendent), Verbal GRE scores, Quantitative GRE scores, Percent accepted, Ratio of students to faculty, Number of doctoral degrees granted, Proportion in doctoral programmes, Total research, and Research per faculty member.]

* Based on data for the top-fifty schools and rounded to two decimal places.
The most obvious difference between Tables 2 and 3 is in the sizes of the correlations.
Specifically, the correlations among the business indicators tend to be larger than those
among the education indicators, meaning that there are stronger relationships among the
indicators used to assess quality in graduate schools of business. A business school that
performs well on one of these quality indicators is also likely to perform well on the rest.
It is harder to make this prediction with schools of education since the relationships are not
as strong. For instance, a school of education with high average “Quantitative Graduate
Record Examination (GRE) scores” for its entering students may or may not also have high
“Total research” expenditures (this refers to the amount of funded research being conducted
at the school). It is hard to predict, since the relationship between these indicators is quite
weak (i.e., r = 0.09). The reason for the different correlation patterns in Tables 2 and 3 is
not clear. For example, these differences could be the result of variations in what quality
looks like in a school of education versus what it looks like in a school of business, or they
may be due to the differential availability of quality-related information for these school
types.
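The numbers discussed here are ordinary product-moment (Pearson) correlations computed across the fifty ranked schools. As a minimal illustration of how such a correlation matrix is obtained from a school-by-indicator table, the following Python sketch uses invented values for three indicators rather than the published data.

import numpy as np

# Invented indicator values for five schools (rows): average test score,
# starting salary, and percent accepted (columns).
data = np.array([
    [720, 95000, 0.12],
    [700, 90000, 0.18],
    [680, 85000, 0.25],
    [660, 82000, 0.30],
    [640, 78000, 0.45],
])

# np.corrcoef treats each row as a variable, so the table is transposed.
corr = np.corrcoef(data.T)
print(np.round(corr, 2))  # off-diagonal entries are the pairwise correlations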
Either way, these correlations have implications for the weighting of the indicators used
to produce an overall score. For instance, the relative sizes of the weights assigned to each
of the business indicators are less likely to have an impact upon the final ordering of schools
in this ranking since schools tend to perform similarly on each indicator (i.e., if they do well
on one, they also tend to do well on another). Therefore, whether the heaviest weight is
given to, for example, “Reputation (academic)” (the reputation scores given to schools by
academics) or “Starting salary” will have little effect on the final outcome since schools
tend to perform similarly in regard to both indicators (r = 0.9). The same cannot be said for
the education indicators where, depending on which indicator receives the most weight,
there can be a very different rank ordering of schools. For example, giving the largest
weight to “Reputation (superintendent)” (the reputation scores given to schools by superintendents) versus “Total research” (the amount of funded research being conducted at the
school) could produce quite different rankings since schools may not perform similarly on
both (r = 0.16).
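A small simulation, using invented data rather than the US News and World Report figures, makes the point concrete: when two indicators are highly correlated, shifting the weight between them barely disturbs the resulting order of schools, whereas weakly correlated indicators can produce noticeably different orderings.

import numpy as np

rng = np.random.default_rng(0)
n = 50  # a hypothetical "top-fifty" list

def ordering_stability(r: float) -> float:
    """Generate two standardized indicators with correlation r for n schools,
    rank the schools under two different weighting schemes, and return the
    correlation between the two sets of ranks (1.0 = identical ordering)."""
    cov = [[1.0, r], [r, 1.0]]
    x = rng.multivariate_normal([0.0, 0.0], cov, size=n)
    score_a = 0.8 * x[:, 0] + 0.2 * x[:, 1]   # heaviest weight on indicator 1
    score_b = 0.2 * x[:, 0] + 0.8 * x[:, 1]   # heaviest weight on indicator 2
    ranks_a = score_a.argsort().argsort()     # convert scores to ranks
    ranks_b = score_b.argsort().argsort()
    return float(np.corrcoef(ranks_a, ranks_b)[0, 1])

print(ordering_stability(0.90))  # highly correlated indicators: ordering barely moves
print(ordering_stability(0.15))  # weakly correlated indicators: ordering shifts substantially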
These differences also raise questions about the choice of ranking indicators in general.
Should the indicators have high or low correlations? Are high correlations a sign of validity (i.e., that all of the indicators are measuring quality) or of redundancy (i.e., that they are all measuring the same thing and are therefore not all needed)? Are low correlations a sign of invalidity (i.e., that some or all of the indicators are not measuring quality) or of unique information (i.e., that they are all measuring different aspects of academic quality)? In addition, if efficiency is of value,
and if the indicators are highly correlated, should some of the indicators be dispensed with
since doing so would probably not affect the final ordering of schools? For instance, since
“Reputation (academic)” correlates highly with most of the other indicators used to rank
business schools, one could theoretically dispense with the seven other indicators and still
arrive at a very similar ordering of schools.
Another implication of the relationships shown in Tables 2 and 3 is seen in the sensitivity
of each ranking to changes in the ranking formula. US News and World Report, like many
organizations involved in rankings efforts, makes frequent changes in its ranking formulae.
These changes are meant to improve the rankings and may involve the addition or removal
of indicators, changes to the methodology/definition for an indicator, or changes to the
weights used. Any of these changes can have a significant impact on the relative ordering of schools, an impact that is quite separate from any real change in the schools' relative performance on the indicators. The impact is magnified with the weight-and-sum approach,
particularly when the indicators used are not highly correlated, since this method produces
only one score on which to rank schools.
For example, the correlation between the US News and World Report lists of the top-fifty business schools in 1995 and 2001 is 0.88, while the corresponding correlation for the top-fifty schools of education is only 0.78. Thus, the list of the top-fifty business schools in 1995 is quite similar to the list in 2001, but the list of the top-fifty schools of education in 1995 varies somewhat from the list published in 2001. There were
quite a few changes to the formula for the education rankings during this six-year period
(twenty changes in total), far more than to the formula for the business rankings (eight
changes in all).
While this phenomenon is one of the reasons behind the greater movement of schools in
the education rankings, it is not the only one. For instance, the US News and World Report
ranking formula for law schools also experienced a large number of changes during this
time period (fourteen changes in all), but the list of the top-fifty law schools in 1995 is very
similar to that for 2001 (r = 0.92). The reason for this situation is that the indicators used
for the law school rankings tend to be quite highly correlated. Thus, formula changes tend
not to affect the final outcome substantially in terms of the relative ordering of schools.
The weight-and-sum approach attempts to capture in one number an aggregate evaluation
of the worth of an institution relative to others. But this overall score can be misleading for
several reasons. It implies a comprehensive measure of the quality of the colleges or
programmes being ranked even though no currently available data offer sufficient coverage
to accomplish this task (Lombardi et al., 2001). In addition, the indicators used may be
problematic in terms of their validity, reliability, and comparability for the schools being
ranked (Cantor, 1996). All of these issues are magnified when weights are applied to create
an overall score (Casper, 1996). The usefulness of the information is further reduced when
frequent changes to the formula make it difficult to interpret movement in the relative
positions of schools in the rankings from year to year.
One way to reveal the uncertainty behind this overall score is shown in Tables 4 and 5.
The formulae used to produce these tables are given in the explanation that follows them,
and more detail is provided in Clarke (2002a). Basically, the tables show what happens to
the score awarded a school by US News and World Report when slight changes are made
to the indicators and weights being used. The amount of movement in the score awarded
to a school owing to these changes is used to generate a “standard error” band around the
score. This error band can then be used in a t-test in order to assess whether the overall
score of the school is significantly different statistically from that of another.
The results of these school-by-school comparisons for each top-fifty ranking (i.e.,
business and education) are shown in Tables 4 and 5. In each table, schools are ordered by
their US News and World Report overall score across the heading and down the rows (the
overall score is located in parentheses after the name of each school). One must read across the row for a school in order to compare its performance with that of the schools listed in the heading of the chart. The symbols indicate whether the overall score of the school in the row is significantly higher than that of the comparison school in the heading (arrow pointing up), significantly lower than that of the comparison school (arrow pointing down), or not statistically significantly different from it (shaded cell with circle). A blank diagonal represents the comparison of a school against itself.

TABLE 4. Business school ratings in 2001 as per the US News and World Report methodology
[School-by-school significance comparisons for the top-fifty business schools.]

TABLE 5. Education school ratings in 2001 as per the US News and World Report methodology
[School-by-school significance comparisons for the top-fifty schools of education.]
Regarding Tables 4 and 5: The Jackknife Technique
A regression model is substituted for the US News and World Report formula by using the
overall scores for schools in a ranking as the outcome variable and the indicators as the
predictor variables (this process basically replaces one linear model with another). The
jackknife procedure then removes one indicator at a time from the regression model,
recalculating the overall score for each school with the remaining indicators before
replacing the indicator and repeating the procedure. The jackknife standard error for a
school is obtained from these values, using the following formula (Efron and Tibshirani,
1993):
\hat{se}_{\mathrm{jackknife}} = \sqrt{\frac{n-1}{n} \sum_{i=1}^{n} \left( \hat{\theta}_{(i)} - \frac{\sum_{i=1}^{n} \hat{\theta}_{(i)}}{n} \right)^{2}},

where n is the number of regression models to be estimated and \hat{\theta}_{(i)} is the predicted score for a school from the ith regression model with one indicator removed.
This standard error can be used in a t-test to assess whether the score of one school is
significantly different statistically from that of another. In order to control for the increased
probability of Type I error (finding a significant difference when there is none) owing to
the number of comparisons being made, the Bonferroni method for multiple comparisons
is used (see Glass and Hopkins, 1996).
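The following Python sketch illustrates the procedure just described, with invented data standing in for the published scores and indicators, and ordinary least squares standing in for the magazine's formula. It deletes one indicator at a time, re-predicts every school's score, applies the jackknife formula above, and then compares two schools using a Bonferroni-adjusted test (a normal approximation is used for the critical value).

import numpy as np
from scipy.stats import norm

def jackknife_se(indicators: np.ndarray, overall: np.ndarray) -> np.ndarray:
    """Delete-one-indicator jackknife standard error for each school's score.

    indicators : (n_schools, k) matrix of standardized indicator values
    overall    : (n_schools,) vector of published overall scores (the outcome)
    """
    n_schools, k = indicators.shape
    preds = []
    for i in range(k):                                  # drop one indicator at a time
        X = np.delete(indicators, i, axis=1)
        X = np.column_stack([np.ones(n_schools), X])    # intercept plus remaining indicators
        beta, *_ = np.linalg.lstsq(X, overall, rcond=None)
        preds.append(X @ beta)                          # re-predicted score for every school
    preds = np.array(preds)                             # shape (k, n_schools)
    # formula from the text: sqrt((n-1)/n * sum_i (theta_i - mean(theta))^2), with n = k models
    return np.sqrt((k - 1) / k * ((preds - preds.mean(axis=0)) ** 2).sum(axis=0))

def scores_differ(score_a, se_a, score_b, se_b, n_comparisons, alpha=0.05):
    """Bonferroni-adjusted two-sided comparison of two schools' overall scores
    (a normal approximation stands in for the t critical value)."""
    t = abs(score_a - score_b) / np.sqrt(se_a ** 2 + se_b ** 2)
    return t > norm.ppf(1 - alpha / (2 * n_comparisons))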
If there were no errors around the overall scores for schools, Tables 4 and 5 would
consist only of arrows pointing up and down, except for instances in which two schools
have the same overall score and are tied for rank. Such is not the case, as evidenced by the
amount of shaded area in each table. The fact that there is more shaded area in Table 5
means that it is harder to find “real” differences between the overall scores for schools of
education since the scores of these schools can change quite a bit depending on the
indicators and weights being used. The fact that there is less shaded area in Table 4 means
that it is easier to find “real” differences between the overall scores for business schools
since the scores of these schools tend to be fairly stable. Even so, it is evident that there
are far fewer “real” differences between business schools than is implied by the overall
score attributed by the US News and World Report.3
3. While the grouping patterns in Tables 4 and 5 are an artifact of the number of schools included in the analyses (e.g., if data for all 341 business schools surveyed were included, slightly different groupings would result), they nonetheless illustrate the "false precision" of the overall score produced using a weight-and-sum approach.

CONCLUSION
Any effort at creating an academic quality ranking system must grapple with the difficulty of trying to quantify the intangibles of a set of complex teaching, learning, resource, and research phenomena. It is evident that the choice of indicators is a non-arbitrary process that
should be guided by, among other things, a knowledge of the strengths and limitations of
the indicators being considered as well as their validity, reliability, and comparability for
the schools or programmes to be ranked. The choice of a method for presenting this
information in ranked format must be guided by, among other things, an understanding of
the nature of quality in the schools or programmes being ranked as well as the relationships
among the quality indicators that are to be used. In addition to these guidelines, the
following questions are offered as a way to think about the methodological issues raised at
each stage of the ranking process:
What is Being Ranked?
Are institutions, departments, or programmes being ranked? Depending on the answer,
not only will the choice of indicators vary; so too will the unit of measurement.
For instance, if graduate schools of business are being ranked, then the indicators used
should reflect information on graduate schools of business, and not on the institutions in
which they are located. Thus, a student–faculty ratio indicator should be based on
information for the business school and not for the institution as a whole. In addition, it
should be recognized that not all schools in a particular area are similar, and dissimilar schools therefore should not be "lumped" into the same ranking. For example, among schools of education there are schools whose mission is primarily teacher training and others whose primary mission is research. Combining both kinds of schools into the same ranking may
be very misleading in terms of showing their relative quality. Two separate rankings would
be better.
Why is it Being Ranked? Who is the Intended Audience?
The reasons for rankings are many. For example, the purpose of the ranking may be to
inform, to act as a spur for improvement, or to provide benchmarks. The audience for
rankings also varies, and may include students, parents, institutions, or the general public.
Depending on the reason for a ranking and its intended audience, the choice of indicators
and approach will vary. For instance, a college ranking that is meant to help students decide
where to go to school will most likely be different from a ranking that is meant to provide
information to college administrators or to higher education policy-makers.
What Can I Do to Improve the Quality of My Indicators?
In addition to the above guidelines, information utility and reliability can be improved by
using multi-year data. Doing so tends to reduce any anomalies (e.g., spikes or dips) in the
performance of an institution that may throw off its ranking. Using score ranges (e.g.,
percentile ranges) rather than point estimates (e.g., averages) can also provide a better
picture of the spread of performance in a particular institution or programme.
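As a small illustration of both suggestions, the following sketch (with invented figures) shows a three-year moving average dampening a one-year spike in an indicator, and a percentile range conveying the spread of values behind a single-year mean.

import numpy as np

# Invented values of one indicator for a single programme over six years,
# with an anomalous spike in 1999.
years = np.array([1996, 1997, 1998, 1999, 2000, 2001])
values = np.array([4.1, 4.3, 4.2, 7.9, 4.4, 4.5])

# A three-year moving average dampens the one-year anomaly.
moving_avg = np.convolve(values, np.ones(3) / 3, mode="valid")
print(dict(zip(years[2:].tolist(), np.round(moving_avg, 2).tolist())))

# A percentile range gives a better picture of spread than a point estimate;
# here, invented per-student spending figures for one institution.
spending = np.array([8.2, 9.1, 9.5, 10.0, 10.4, 11.2, 12.8, 19.5])
low, high = np.percentile(spending, [25, 75])
print(f"25th-75th percentile: {low:.1f}-{high:.1f} (mean {spending.mean():.1f})")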
Indicator quality may also be adversely affected if institutions manipulate the information
being used. For instance, schools may inflate their application pools in order to look more
selective, thereby improving their performance on a selectivity indicator. Standardized data
collection, processing, and reporting techniques can reduce the occurrence of some of these
problems. However, the bigger issue to be addressed may be how to reduce the pressures
that institutions feel to do well in rankings.
458
M. CLARKE
How Will I Present the Information to My Audience?
The choice of a specific ranking methodology is dependent on the particular context and
goals of the person or group doing the ranking. Nonetheless, the usefulness of the ranking
will be increased if the chosen methodology allows the values and needs of the eventual
user to be incorporated into the outcome. For example, the indicator information could be
made available on a Web site that allows the user to decide on a formula for creating a final
ranked list of schools. In addition to allowing the user to specify the indicators and their
relative importance in the final outcome, there could also be an option that allows the user
to specify a minimum required performance level on some or all of the indicators.
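A hypothetical sketch of how such a site might compute a user-defined ranking behind the scenes (this does not describe any existing service): the user supplies weights and, optionally, minimum acceptable values; schools failing a minimum are dropped before the remaining schools are scored and sorted.

from dataclasses import dataclass

@dataclass
class School:
    name: str
    indicators: dict[str, float]   # standardized indicator values

def custom_ranking(schools: list[School],
                   weights: dict[str, float],
                   minimums: dict[str, float] | None = None) -> list[tuple[str, float]]:
    """Rank schools using user-chosen weights, after dropping any school that
    falls below a user-specified minimum on one or more indicators."""
    minimums = minimums or {}
    eligible = [s for s in schools
                if all(s.indicators.get(k, float("-inf")) >= v for k, v in minimums.items())]
    scored = [(s.name, sum(w * s.indicators.get(k, 0.0) for k, w in weights.items()))
              for s in eligible]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Hypothetical usage: the user cares mostly about teaching quality and insists
# on a minimum research score.
schools = [
    School("A", {"teaching": 0.9, "research": 0.2}),
    School("B", {"teaching": 0.6, "research": 0.8}),
    School("C", {"teaching": 0.4, "research": 1.0}),
]
print(custom_ranking(schools, weights={"teaching": 0.7, "research": 0.3},
                     minimums={"research": 0.5}))

Letting the user set the weights and thresholds makes the value judgment explicit rather than hiding it in a fixed formula.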
How Often Will I Present the Information to My Audience?
Rankings can be as frequent or as infrequent as one wishes. However, since institutions tend
to be stable rather than unstable systems, and since their quality tends to change slowly over
time, it is doubtful that annual rankings offer any educational value to the average
consumer.
How Often Will I Change My Approach?
Rankings need to be responsive to changes in education. Thus, as education changes, so too
should the indicators used to represent it. Distance learning and classroom-based technology are examples of developments that have changed some aspects of how we think about and measure academic quality. Nonetheless, it is useful to maintain a subset of
ranking indicators that are constant across years so that users can gain some sense of the
degree of stability in relative institutional quality over time.
As a final consideration, it should be remembered that transparency is essential to the
success of any ranking system. Thus, openness about how the indicators were chosen and about the approach taken to present the information in ranked format, as well as access to the original data, should always be maintained.
REFERENCES
CAMILLI, G., and FIRESTONE, W. A. “Values and State Ratings: An Examination of the State-by-State
Education Indicators in Quality Counts”, Educational Measurement: Issues and Practice 18 4
(2000): 17–25.
CANTOR, G. “Universities Increasingly Question the Criteria of College Rankings”, The Detroit News
(1 December 1996): 3.
CASPER, G. "Letter to the Editor of US News and World Report" (23 September 1996), retrieved 17 February 1999 from <http://www-portfolio.stanford.edu:8050/documents/president/961206gcfallow.html>.
CLARKE, M. "Quantifying Quality: What Can the US News and World Report Rankings Tell Us about the Quality of Higher Education?", Education Policy Analysis Archives 10 16 (2002a), <http://epaa.asu.edu/epaa/v10n16/>.
CLARKE, M. “News or Noise: An Analysis of US News and World Report’s Ranking Scores”,
Educational Measurement: Issues and Practice 21 4 (2002b): 39–48.
EFRON, B., and TIBSHIRANI, R. J. An Introduction to the Bootstrap. New York: Chapman and Hall,
1993.
Evaluation News 2 1 (February 1981): 85–90.
GARRETT, G. “Our Method Explained”, US News and World Report Best Graduate Schools 2003
Edition. Washington, D.C.: US News and World Report, 2002, pp. 34, 35.
GLASS, G. V., and HOPKINS, K. D. Statistical Methods in Education and Psychology, 3rd edn. Needham
Heights, Massachusetts: Simon & Schuster, 1996.
HATTENDORF, L. C., ed. Educational Rankings Annual. Detroit: Gale Research, 1993.
LINN, R., ed. Educational Measurement, 3rd edn. Washington, D.C.: American Council on Education,
1993.
LOMBARDI, J., CRAIG, D., CAPALDI, E., GATER, D., and MENDON, S. The Top American Research
Universities. Gainesville, Florida: The Center, University of Florida, 2001.
MORSE, R. J., and FLANIGAN, S. M. “How We Rank Schools”, US News and World Report America’s
Best Colleges 2002 Edition. Washington, D.C.: US News and World Report, 2001, pp. 67–70.
SANOFF, A. “Rankings Are Here to Stay: Colleges Can Improve Them”, Chronicle of Higher
Education 45 2 (4 September 1998): A96.
SCRIVEN, M. Evaluation Thesaurus, 4th edn. Newbury Park: Sage, 1991.
SEAMAN, B. “What Makes a Good College”, Time 152 17 (26 October 1998): 1–2.
WEBSTER, D. S. Academic Quality Rankings of American Colleges and Universities. Springfield:
Charles C. Thomas, 1986.