13 Social Choice and Rank Aggregation

You are asked by the head of the committee organizing this year’s end-of-year
party at your university to design a voting protocol with which to allow students
to decide the entertainment and menu for the event. Which of many possible
designs should you use? Separately, a news aggregator platform approaches you,
looking for a new way to use feedback to rank news stories.
Social choice is the problem of how to aggregate the preferences of members of a population in order to select an alternative, or a single rank order on alternatives, where this
outcome is relevant in some way to the population. The election of public officials provides
an everyday example. Based on the votes placed on candidates, a voting rule is used to determine the winner. In the simple case, the plurality rule elects the candidate with the most
votes, but more complicated rules are used when votes are placed on multiple candidates.
Part of what makes social choice interesting is that there is no easy sense of a best voting
rule. Rather, there are many different properties (or axioms) that we might want for a rule,
reflecting concerns about fairness and a desire to respect consensus opinions, and different
rules satisfy different combinations of axioms. In fact, a famous impossibility theorem
(Arrow’s) states that a particular combination of axioms is satisfied only by dictatorial
rules, which simply follow the opinion of the same person all the time.
Computational considerations also play a role. For some rules, determining the outcome is
intractable and cannot be done in worst-case polynomial time in the number of alternatives
(unless P=NP), while other rules are simple to compute. Even simple rules such as plurality
become hard in domains where the set of alternatives has a combinatorial structure; e.g., when an
alternative represents a decision about each of many issues. Just representing preferences
is hard in these domains, because of the large number of alternatives.
We will mainly ignore strategic considerations. Indeed, there’s a general sense in which
the only truthful voting rules are dictatorial. One special case that allows for interesting
truthful rules is when preferences are single peaked, meaning that there’s a way of lining up
alternatives so that every agent’s preferences increase to a best choice and then decrease. We
will also explore whether computational complexity can be helpful in this regard, making
it hard to find a beneficial misreport.
Rank aggregation is a more general setting, allowing the data to come from different
sources and not just reflect votes, and without needing to make a decision for a population
based on their collective viewpoints. Rank aggregation problems occur in many different
domains, for example:
• Sports: Determine an overall rank order for players in a league based on the outcomes
of pairwise match-ups.
• Online ratings: Based on rank orders provided by people about different subsets of
products or services, determine an overall rank order.
In contrast to the axiomatic approach to social choice, where the emphasis is placed on
striking a fair balance between the different, subjective opinions of participants, we adopt a
statistical approach for rank aggregation. The input is viewed as noisy data about a true,
underlying rank order, and the goal is to find an aggregate ranking that best agrees, in a
sense to be made precise, with the data.
In outline, Section 13.1 focuses on social choice, introduces various axiomatic properties and various voting rules, provides characterization results, and discusses computational
and strategic considerations. Section 13.2 introduces the statistical approach to rank aggregation. We consider three different parametric ranking models, each making different
commitments about the way the input data are generated. Section 13.3 concludes with
brief remarks about applications of social choice and rank aggregation.
13.1 Social Choice
In social choice, there is a set $A$ of alternatives and a set $N = \{1, \ldots, n\}$ of agents. Alternatives are denoted $A = \{a, b, c, \ldots\}$, and there are $m \geq 2$ altogether.
Each agent $i \in N$ has a strict preference order $\succ_i \in P$ on alternatives, where $P$ is the
set of all possible strict preference orders. We adopt strict preference orders to keep the
presentation simple. This is not crucial, and the social choice rules discussed in the chapter
can be generalized to handle weak preferences.
Given $n$ agents, $\succ = (\succ_1, \ldots, \succ_n) \in P^n$ is the preference profile, and represents the
preferences of each agent. We consider two variations on the problem of social choice:
• Social choice rule: Given a preference profile $\succ \in P^n$, select an alternative $f(\succ) \in A$.
• Social ranking rule: Given a preference profile $\succ \in P^n$, select a rank order $R(\succ) \in P$.
To be completely general, these rules can be defined to return sets of alternatives, or rank
orders, in the case of ties. But we will be able to avoid this detail for most of our discussion.
Social choice rules, which select an alternative, are appropriate, for example, in the context
of an election where the goal is to elect a single candidate. Social ranking rules, which
select a rank order, are appropriate, for example, in the context of a search committee that
is meeting to form a consensus ranking on job candidates, perhaps to go down the
list and hire the best candidate available.
When the distinction between a choice rule and a ranking rule does not matter we simply
refer to a voting rule. Note that voting rules in which votes are placed on a single candidate
can be modeled by insisting that the choice rule or ranking rule ignores all aspects of
preferences except the top choices.
To get started, let’s consider a setting with two alternatives. The usual social choice rule
on two alternatives is the majority rule:
Definition 13.1 (Majority rule). Given two alternatives, the majority rule selects the alternative that is ranked first by most agents.
In this case, as with most other rules, there can be an issue with handling ties. Ties can
be addressed in a number of ways, including through random tie-breaking or an a priori
fixed order on alternatives or voters. For the theoretical results we generally require ties to
be handled by returning the set of tied alternatives. A similar approach can be taken when
ties occur in social ranking rules.
Figure 13.1: The weighted majority graphs for Examples 13.1 and 13.2 (ignoring negative-weight edges).
Copyright © 2013 D.C. Parkes & S. Seuken. Do not distribute without permission.
For two alternatives it is hard to imagine a more sensible rule than majority, and indeed
all voting rules that we study in this chapter reduce to the majority rule when there are
two alternatives and an odd number of participants. See Exercise 13.2.
For three alternatives, a simple generalization is to select as the winner the alternative
that defeats every other alternative by simple majority. Consider the following example.
Example 13.1. The preference orders of three agents on three alternatives {a, b, c} are:
1: a, b, c
2: b, c, a
3: c, b, a
For example, agent 1’s preference order is $a \succ_1 b \succ_1 c$. Considering the implied pairwise
preferences, alternative b defeats a, b defeats c, and c defeats a. Based on this, alternative b
would be selected, and in addition, this suggests a social rank order of $b \succ_R c \succ_R a$ (adopting
$\succ_R$ to denote the consensus ranking).
But the following example shows a fundamental problem with the approach:
Example 13.2. The preference orders of three agents on three alternatives {a, b, c} are:
1: a, b, c
2: b, c, a
3: c, a, b
There is now a cycle of pairwise comparisons, with alternative b defeating c, c defeating
a and a defeating b, and every alternative is defeated by some other alternative.
Figure 13.1 illustrates the preference profiles in these two examples through a weighted
majority graph (WMG). The vertices correspond to the set of alternatives. For each pair of
alternatives j and k, there is a directed edge from j to k with weight $w_{j,k} = \#(j, k) - \#(k, j)$,
equal to the number of agents who prefer j to k, #(j, k), minus the number who prefer k to
j, #(k, j). For visual ease we normally drop negative-weight edges when drawing a WMG.
Pairwise majority cycles such as the one in Figure 13.1 (b) occur in practice. For example,
one study of the voting patterns in Olympic figure skating contests identified fifteen cycles
in the voting data from twenty-four competitions.
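Computing the WMG from a profile takes one pass over the votes. As an illustrative sketch (not code from the text), the following Python represents a profile as a list of (count, ranking) pairs, each ranking listed from most to least preferred; this representation and the function names are our own assumptions. It builds the weights $w_{j,k}$ and looks for an alternative with a positive-weight edge to every other alternative, reproducing Examples 13.1 and 13.2.

```python
from itertools import combinations

# A profile is a list of (count, ranking) pairs; each ranking is a tuple of
# alternatives from most to least preferred (a representation assumed here).


def weighted_majority_graph(profile):
    """Return w[(j, k)] = #(j, k) - #(k, j) for every ordered pair j != k."""
    alternatives = sorted(profile[0][1])
    w = {(j, k): 0 for j in alternatives for k in alternatives if j != k}
    for count, ranking in profile:
        position = {alt: pos for pos, alt in enumerate(ranking)}
        for j, k in combinations(alternatives, 2):
            # An earlier position means the voter prefers that alternative.
            winner, loser = (j, k) if position[j] < position[k] else (k, j)
            w[(winner, loser)] += count
            w[(loser, winner)] -= count
    return w


def pairwise_majority_winner(profile):
    """Return the alternative with a positive edge to every other one, else None."""
    w = weighted_majority_graph(profile)
    alternatives = sorted(profile[0][1])
    for j in alternatives:
        if all(w[(j, k)] > 0 for k in alternatives if k != j):
            return j
    return None


example_13_1 = [(1, ("a", "b", "c")), (1, ("b", "c", "a")), (1, ("c", "b", "a"))]
example_13_2 = [(1, ("a", "b", "c")), (1, ("b", "c", "a")), (1, ("c", "a", "b"))]

print(pairwise_majority_winner(example_13_1))  # b
print(pairwise_majority_winner(example_13_2))  # None: a majority cycle
```

Because $w_{k,j} = -w_{j,k}$, storing both directions is redundant; the sketch keeps both only to simplify lookups.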
13.1.1 Voting Rules for Multiple Alternatives
In this section, we introduce some examples of voting rules that address the difficulty of
generalizing majority to more than two alternatives. We describe social choice
rules (i.e., selecting an alternative) rather than ranking rules, but simple variations will
generate a consensus rank order. See Exercise 13.2.
The simplest social choice rule for multiple alternatives is the plurality rule:
Definition 13.2 (Plurality). The plurality rule selects the alternative that is ranked first
by the largest number of agents.
Plurality is used in many government elections around the world. The following is a more
elaborate variation:
Definition 13.3 (Plurality-with-elimination). The plurality-with-elimination rule works in
rounds. In each round, it eliminates the alternative that is ranked first by the fewest
agents, and continues with preferences on the remaining alternatives. This repeats until
one alternative has a simple majority (is ranked first in more than half the votes).
Also called instant-runoff voting, or alternative vote, this method is used to elect the
Australian House of Representatives and the President of Ireland. The family of single-transferable vote (STV) procedures generalizes plurality-with-elimination to settings where
multiple alternatives are selected, and is used by the Academy of Motion Picture Arts
and Sciences for nominating candidates for awards and for many elections in the Republic
of Ireland.
Another example of a voting rule is the Borda rule:
Definition 13.4 (Borda). For each vote, the Borda rule on m alternatives assigns a score
of $m-1$ to the top alternative, $m-2$ to the second, ..., down to zero points for the last
alternative. The alternative selected is the one with the maximum total score, summing up
across all votes.
Example 13.3. Suppose there are three alternatives {a, b, c}, and twenty-one agents, with
the following votes:
7 @ a, c, b
7 @ b, c, a
6 @ c, b, a
1 @ a, b, c
Notation 7 @ a, c, b indicates that seven votes are for preference order $a \succ c \succ b$. The
plurality rule selects a (with 8 votes in total), even though a majority of voters (13 of 21)
rank a last. Plurality-with-elimination drops c in the first round, since it has the fewest
first-place votes. This leaves preferences 8 @ a, b and 13 @ b, a, and b is
selected by simple majority in the second round. The Borda rule assigns scores of 16 = 8 · 2,
21 = 7 · 2 + 7 · 1 and 26 = 6 · 2 + 14 · 1 to a, b and c respectively, and selects c. Three rules,
three different outcomes!
This example reveals a shortcoming of plurality, which elects a because the voters who
prefer either b or c to a divide their support across these other alternatives. Figure 13.2 (a)
shows the WMG. Even plurality-with-elimination seems problematic, since it fails to select
alternative c even though it defeats both a and b in pairwise match-ups.
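The three outcomes in Example 13.3 can be checked mechanically. In the sketch below, a profile is a list of (count, ranking) pairs (an assumption of this sketch), and ties are broken by a fixed order on alternatives (their order in the first vote), one of the tie-handling options mentioned earlier. The function names are hypothetical.

```python
# Votes of Example 13.3 as (count, ranking) pairs, most preferred first.
example_13_3 = [
    (7, ("a", "c", "b")),
    (7, ("b", "c", "a")),
    (6, ("c", "b", "a")),
    (1, ("a", "b", "c")),
]


def first_place_counts(profile):
    """Number of votes ranking each alternative first (zero counts included)."""
    tally = {alt: 0 for alt in profile[0][1]}
    for count, ranking in profile:
        tally[ranking[0]] += count
    return tally


def plurality(profile):
    tally = first_place_counts(profile)
    return max(tally, key=tally.get)  # ties go to the earlier-listed alternative


def plurality_with_elimination(profile):
    total = sum(count for count, _ in profile)
    remaining = set(profile[0][1])
    while True:
        # Restrict every vote to the alternatives still in the running.
        restricted = [(c, tuple(a for a in r if a in remaining)) for c, r in profile]
        tally = first_place_counts(restricted)
        leader = max(tally, key=tally.get)
        if 2 * tally[leader] > total:  # ranked first in more than half the votes
            return leader
        remaining.remove(min(tally, key=tally.get))  # fewest first places


def borda_scores(profile):
    m = len(profile[0][1])
    scores = {alt: 0 for alt in profile[0][1]}
    for count, ranking in profile:
        for pos, alt in enumerate(ranking):
            scores[alt] += count * (m - 1 - pos)  # m-1 for first, ..., 0 for last
    return scores


def borda(profile):
    scores = borda_scores(profile)
    return max(scores, key=scores.get)


print(plurality(example_13_3))                   # a
print(plurality_with_elimination(example_13_3))  # b
print(borda(example_13_3))                       # c
```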
Figure 13.2: The weighted majority graphs for Examples 13.3 and 13.4 (ignoring negative-weight edges).
The following example shows that the Borda rule can also suffer from this difficulty:
Example 13.4. Suppose there are three alternatives {a, b, c} and seven agents, with the
following votes:
3 @ a, b, c
2 @ b, c, a
1 @ b, a, c
1 @ c, a, b
See Figure 13.2 (b) for the WMG. In this example, alternative a defeats both b and c
in pairwise match-ups, but Borda assigns scores of 8 = 3 · 2 + 2 · 1, 9 = 3 · 2 + 3 · 1 and
4 = 1 · 2 + 2 · 1 to a, b and c, and selects b as the winner.
There are many other voting rules. We make brief mention of range voting (sometimes
called score voting):
Definition 13.5 (Range). In range voting, each participant can assign a number of points
from a specified range, such as integers 1 to 5, to each alternative, and the alternative
selected is the one with the maximum total score, summing up across all votes.
Approval voting is the special case in which each participant can vote for as many alternatives as they like (equivalent to assigning one point to each of the approved alternatives).
These rules do not fit easily into the formal framework of this chapter, because the inputs
are points or approvals, rather than preference orders. Moreover, strategic considerations
must be addressed when analyzing these rules because there is no direct meaning to “voting
truthfully.” How should points be assigned to candidates, and should all candidates better
than the median candidate be approved, or just the top candidate?
13.1.2 Axiomatic Properties
In this section, we introduce a number of properties that a voting rule might satisfy, with
the aim of developing a principled design framework. The properties are considered axioms,
in that we view them as criteria that most people would consider reasonable and that
voting rules should satisfy.
Many axiomatic properties have been proposed. Describing the axioms for social ranking
rules (rather than choice rules), we start with the following three:
• Unanimity: Given preference profile $\succ$, if every agent $i \in N$ ranks $a \succ_i b$, for alternatives a and b, then $a \succ_R b$ in the social rank order.
• Independence of irrelevant alternatives (IIA): If the relative preference for a and b is unchanged for all agents at preference profiles $\succ$ and $\succ'$, then the social rank order between a and b is unchanged at $\succ$ and $\succ'$.
• Non-dictatorial: There is no agent i for whom the social ranking $R(\succ) = \succ_i$ for every preference profile $\succ \in P^n$.
To illustrate the IIA axiom: it requires that if profiles $\succ$ and $\succ'$ are the same except for agent 1, and agent 1’s preference switches from $a \succ_1 b \succ_1 c$ to $a \succ'_1 c \succ'_1 b$, then the social rank order between a and b, and between a and c, should be the same at $\succ$ and $\succ'$. In fact, there is a severe difficulty with unanimity and IIA:
Theorem 13.1 (Arrow’s impossibility theorem). If there are three or more alternatives,
then any social ranking rule that satisfies unanimity and IIA is dictatorial.
The result can be proved through a similar line of reasoning to the Gibbard-Satterthwaite
impossibility theorem, a version of which we proved in Chapter 9, and which we will see
again later in this chapter.
We see that some apparently reasonable axioms cannot be achieved at the same time.
Moreover, given that there can be little doubt about the importance of the non-dictatorial
and unanimity properties, we should expect that voting rules of practical interest will fail
IIA. Exercise 13.1, for example, develops a failure of IIA for plurality-with-elimination.
Now that we know what is not possible, let’s return to looking for axiomatic support
for the voting rules that have been described so far. For this, we introduce the class of
positional-scoring rules, which includes plurality and Borda:
Definition 13.6 (Positional-scoring). For each vote, a positional-scoring rule on m alternatives assigns a score of $\alpha_j$ for the alternative ranked in jth place by the vote, with
$\alpha_1 \geq \alpha_2 \geq \cdots \geq \alpha_m \geq 0$ and $\alpha_1 > \alpha_m$. The alternative selected is the one with the
maximum total score, summing up across all votes.
For example, plurality is a positional-scoring rule with scores $\langle 1, 0, \ldots, 0 \rangle$. The Borda
rule is a positional-scoring rule with scores $\langle m-1, m-2, \ldots, 0 \rangle$. The scores are defined
by the voting rule, not by the voters. The family of positional-scoring rules is exactly
characterized by six axiomatic properties, described here for social choice rules. In defining
these, remember that in the case of ties we can think of a rule such as positional-scoring
as returning the set of tied alternatives. The axiomatic properties are:
• Neutrality: Given preference profile $\succ$, the social choice is invariant to changing the identity of the alternatives; i.e., relabeling the alternatives in every vote relabels the social choice in the same way.
• Anonymity: Given preference profile $\succ$, the social choice is invariant to changing the identity of the agents.
• Weak monotonicity: If alternative a is selected on preference profile $\succ$, then a is selected on profile $\succ'$ where a is moved up in the preference order in one or more votes.
• Non-constancy: The social choice rule does not select the same alternative for all profiles.
• Consistency: Given two disjoint sets of voters with preference profiles $\succ$ and $\succ'$, respectively, if the sets of alternatives selected by the choice rule on $\succ$ and on $\succ'$ intersect, then the set of alternatives selected after the votes are combined must be this intersection.
• Continuity: If alternative a is selected on preference profile $\succ$ from some set of voters, then, for any second set of voters, the first set can be replicated a finite number of times such that a is selected when the replicated voters are merged with the second set.
Neutrality and anonymity are symmetry properties, and speak to the basic fairness properties of a social choice rule. They preclude using identity of alternatives or agents to
break ties, and necessitate returning the set of tied alternatives unless we are willing to use
randomization. Weak monotonicity says that ranking an alternative higher can only help.
Consistency and continuity rule out different forms of interactions that don’t seem sensible
when sets of votes are combined in different ways.
Theorem 13.2. A social choice rule satisfies anonymity, neutrality, weak monotonicity,
non-constancy, consistency, and continuity if and only if it is a positional-scoring rule.
Exercise 13.2 develops the direction of this result that shows that positional-scoring rules
satisfy these six properties.
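Since plurality and Borda differ only in their score vectors, a single function can implement the whole positional-scoring family. In this hypothetical sketch, `positional_scores` checks the conditions of Definition 13.6 and `positional_winners` returns the set of tied alternatives rather than breaking ties, matching the convention used for the axioms above. The names and the (count, ranking) profile format are our own assumptions; the votes are those of Example 13.4.

```python
def positional_scores(profile, alpha):
    """Total score of each alternative under score vector alpha = (alpha_1, ..., alpha_m)."""
    m = len(profile[0][1])
    if len(alpha) != m:
        raise ValueError("need exactly one score per position")
    if any(x < y for x, y in zip(alpha, alpha[1:])) or alpha[-1] < 0 or alpha[0] <= alpha[-1]:
        raise ValueError("need alpha_1 >= ... >= alpha_m >= 0 and alpha_1 > alpha_m")
    scores = {alt: 0 for alt in profile[0][1]}
    for count, ranking in profile:
        for pos, alt in enumerate(ranking):
            scores[alt] += count * alpha[pos]
    return scores


def positional_winners(profile, alpha):
    """Set of top-scoring alternatives; ties are returned, not broken."""
    scores = positional_scores(profile, alpha)
    best = max(scores.values())
    return {alt for alt, score in scores.items() if score == best}


def plurality_vector(m):
    return (1,) + (0,) * (m - 1)


def borda_vector(m):
    return tuple(range(m - 1, -1, -1))


example_13_4 = [
    (3, ("a", "b", "c")),
    (2, ("b", "c", "a")),
    (1, ("b", "a", "c")),
    (1, ("c", "a", "b")),
]

print(positional_scores(example_13_4, borda_vector(3)))   # {'a': 8, 'b': 9, 'c': 4}
print(positional_winners(example_13_4, borda_vector(3)))  # {'b'}
print(positional_winners(example_13_4, plurality_vector(3)))  # a and b tie with 3 each
```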
13.1.3 The Condorcet Criterion
We now return to thinking about pairwise majorities, and define a property that captures
the idea that a voting rule should choose an alternative that defeats every other alternative
in pairwise elections. This is the Condorcet criterion, named after the French philosopher
who introduced the idea. We first state the property for social choice rules:
• Condorcet criterion (choice): Given preference profile $\succ$, if there is an alternative a such that, for every other alternative b, a strict majority of the voters prefers a to b, then a is selected as the alternative.
Such an alternative, when it exists, is called a Condorcet winner. We can also state the
parallel concept for a social ranking rule:
• Condorcet criterion (ranking): Given preference profile $\succ$, there should be no pair of alternatives j and k that are adjacent in the social rank order $\cdots \succ_R j \succ_R k \succ_R \cdots$ for which a strict majority of voters prefer k over j.
In particular, if there is a Condorcet winner, then this insists that the winner is ranked
in the top position in the social rank order. In general, it precludes local adjacencies in the
social ranking that are inconsistent with the majority ordering.
Examples 13.3 and 13.4 show that plurality, Borda, and also plurality-with-elimination
fail the Condorcet criterion. In fact, no positional-scoring rule has this property:
Theorem 13.3. No positional-scoring rule satisfies the Condorcet criterion on domains
with three or more alternatives.
Proof. We proceed by case analysis on the scores:
(Case 1) $\alpha_2 > \alpha_m$. Consider a preference profile with $2k + 1$ voters (for some $k \geq 1$),
with k + 1 @ a, b, others, and k @ b, others, a, where others indicates the alternatives
$A \setminus \{a, b\}$ in any order. For any choice of k, alternative a is the Condorcet winner because
k + 1 of 2k + 1 voters prefer it to every other alternative. But the scoring rule assigns score
$(k+1)\alpha_1 + k\alpha_m$ to a and $(k+1)\alpha_2 + k\alpha_1$ to b, and if:
$$(k+1)\alpha_2 + k\alpha_1 > (k+1)\alpha_1 + k\alpha_m \iff k(\alpha_2 - \alpha_m) > \alpha_1 - \alpha_2,$$
then alternative b has a greater score than a, and the rule fails the Condorcet criterion. For
any value of $(\alpha_1 - \alpha_2)$, this is satisfied for a large enough k, since $\alpha_2 > \alpha_m$ by assumption.
(Case 2) $\alpha_2 = \alpha_m$. We must have $\alpha_1 > \alpha_m$, and so $\alpha_1 > \alpha_2 = \cdots = \alpha_m$. But the behavior
of any positional-scoring rule with these scores is identical to plurality, and Example 13.3
shows that plurality is not Condorcet consistent.
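The Case 1 construction is concrete enough to run. For a given score vector with $\alpha_2 > \alpha_m$, the sketch below finds the smallest $k \geq 1$ satisfying $k(\alpha_2 - \alpha_m) > \alpha_1 - \alpha_2$, builds the profile from the proof (keeping the remaining alternatives in one fixed order, which the proof allows), and compares the scores of a and b. The helper names and profile format are hypothetical.

```python
def positional_scores(profile, alpha):
    """Total score of each alternative; profile is a list of (count, ranking) pairs."""
    scores = {alt: 0 for alt in profile[0][1]}
    for count, ranking in profile:
        for pos, alt in enumerate(ranking):
            scores[alt] += count * alpha[pos]
    return scores


def smallest_k(alpha):
    """Smallest k >= 1 with k * (alpha_2 - alpha_m) > alpha_1 - alpha_2 (Case 1)."""
    a1, a2, am = alpha[0], alpha[1], alpha[-1]
    if a2 <= am:
        raise ValueError("Case 1 requires alpha_2 > alpha_m")
    k = 1
    while k * (a2 - am) <= a1 - a2:
        k += 1
    return k


def case1_profile(alternatives, k):
    """k + 1 votes a, b, others and k votes b, others, a (others in a fixed order)."""
    a, b, *others = alternatives
    return [(k + 1, (a, b, *others)), (k, (b, *others, a))]


for alpha in [(2, 1, 0), (3, 2, 1, 0), (10, 1, 0)]:
    alternatives = "abcd"[: len(alpha)]
    k = smallest_k(alpha)
    scores = positional_scores(case1_profile(alternatives, k), alpha)
    # a is the Condorcet winner (k + 1 of 2k + 1 voters rank it first),
    # yet b receives the higher score.
    print(alpha, k, scores["a"], scores["b"])
```

For Borda on three alternatives, for instance, k = 2 gives 3 @ a, b, c and 2 @ b, c, a: a is the Condorcet winner, but b scores 7 against a's 6.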
This negative result raises the question as to whether there are useful voting rules that
satisfy the Condorcet criterion. One simple trick would be to first check whether there is a
Condorcet winner, select this alternative if there is, and otherwise follow another rule
such as Borda (this combination is Black’s rule). But we’d prefer a more elegant approach.
Our first Condorcet method comes from selecting, as the winning alternative, the alternative that is “as close as possible” to being a Condorcet winner. For this, we define the
following distance metric on rank orders:
Definition 13.7 (Kendall tau rank distance). The Kendall tau rank distance between two
rank orders is the number of unordered pairs of alternatives on which the two orders disagree.
For example, the Kendall tau distance between rank orders $b \succ_i e \succ_i d \succ_i a \succ_i c$ and
$b \succ'_i a \succ'_i e \succ'_i d \succ'_i c$ is two, since the orders disagree on pairs {a, e} and {a, d} but no other
pairs. The Kemeny rule chooses a consensus ranking that minimizes the total Kendall
tau rank distance to the preference order of each voter.
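Definition 13.7 translates directly into code that compares the relative order of every pair. In this sketch, rank orders are sequences of single-character alternative names (an assumption made for brevity), and `total_kendall_distance`, a hypothetical helper, sums distances over a profile of (count, ranking) pairs; this total is the quantity the Kemeny rule minimizes.

```python
from itertools import combinations


def kendall_tau_distance(order1, order2):
    """Count unordered pairs that the two rank orders place in opposite order."""
    pos1 = {alt: i for i, alt in enumerate(order1)}
    pos2 = {alt: i for i, alt in enumerate(order2)}
    return sum(
        1
        for j, k in combinations(order1, 2)
        if (pos1[j] < pos1[k]) != (pos2[j] < pos2[k])
    )


def total_kendall_distance(ranking, profile):
    """Sum of distances from `ranking` to every vote in a (count, ranking) profile."""
    return sum(count * kendall_tau_distance(ranking, vote) for count, vote in profile)


# The example in the text: b > e > d > a > c versus b > a > e > d > c.
print(kendall_tau_distance("bedac", "baedc"))  # 2
```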
Definition 13.8 (Kemeny). Given preference profile $\succ = (\succ_1, \ldots, \succ_n)$, the Kemeny rule
selects as the social ranking a preference order $\succ_R$ that minimizes the Kendall tau rank
distance between $\succ_R$ and $\succ_i$, summed over all agents $i \in N$. When used as a social choice
rule, the alternative selected is the top alternative in the social rank order $\succ_R$.
Let Pairs denote the set of all unordered pairs of alternatives. For pair of alternatives j
and k, let $n_{\mathrm{agg}}(j, k, \succ')$ and $n_{\mathrm{dis}}(j, k, \succ')$ denote the number of votes that agree and disagree,
respectively, with rank order $\succ' \in P$ on that pair.
By construction, the Kemeny rule minimizes $\sum_{\{j,k\} \in \mathit{Pairs}} n_{\mathrm{dis}}(j, k, \succ')$; i.e., the total pairwise disagreement between the consensus ranking and the preference orders of each voter.
Equivalently, since $n_{\mathrm{agg}}(j, k, \succ') + n_{\mathrm{dis}}(j, k, \succ') = n$ for every pair, the Kemeny rule maximizes $\sum_{\{j,k\} \in \mathit{Pairs}} n_{\mathrm{agg}}(j, k, \succ')$.
Because Kemeny minimizes disagreements and also maximizes agreements, the Kemeny
rank order also maximizes
$$\mathit{score}_K(\succ') = \sum_{\{j,k\} \in \mathit{Pairs}} \big(n_{\mathrm{agg}}(j, k, \succ') - n_{\mathrm{dis}}(j, k, \succ')\big) = \sum_{e=(j,k) \in E \,:\, j \succ' k} w_{j,k}, \tag{13.1}$$
across all rank orders $\succ' \in P$, where $\mathit{score}_K(\succ')$ is the Kemeny score for rank order $\succ'$, $E$ is the set of directed edges of the weighted majority graph, and
$w_{j,k}$ is the weight on edge (j, k) in that graph.
Figure 13.3: (a) The weighted majority graph for Example 13.5. (b) A set of edges that
correspond to a rank order (in this case $a \succ' b \succ' c$).
The use of the WMG to determine the outcome of the Kemeny rule is illustrated in the
following example.
Example 13.5. Suppose there are three alternatives {a, b, c} and sixty agents, with the
following votes:
23 @ a, b, c
17 @ b, c, a
10 @ c, a, b
8 @ c, b, a
2 @ b, a, c
Figure 13.3 (a) illustrates the WMG. There is no Condorcet winner, with each alternative
defeated by another alternative in a pairwise match-up. The Kemeny score for each of the
six possible social rankings $\succ_R \in P$ is:
$a \succ_R b \succ_R c$: 20
$a \succ_R c \succ_R b$: -28
$b \succ_R a \succ_R c$: 8
$b \succ_R c \succ_R a$: 28
$c \succ_R a \succ_R b$: -8
$c \succ_R b \succ_R a$: -20
For example, for $a \succ_R b \succ_R c$ and $a \succ_R c \succ_R b$, the Kemeny scores are $20 = 6 + 24 - 10$
and $-28 = 6 - 24 - 10$, respectively. The Kemeny rule selects social ranking $b \succ_R c \succ_R a$,
and the social choice is alternative b.
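For a handful of alternatives, Example 13.5 can be verified by enumerating all $m!$ rank orders and scoring each with equation (13.1). The sketch below is a brute-force illustration under our own (count, ranking) profile representation, not a practical algorithm; the function names are hypothetical.

```python
from itertools import combinations, permutations

# Votes of Example 13.5 as (count, ranking) pairs (representation assumed here).
example_13_5 = [
    (23, ("a", "b", "c")),
    (17, ("b", "c", "a")),
    (10, ("c", "a", "b")),
    (8, ("c", "b", "a")),
    (2, ("b", "a", "c")),
]


def weighted_majority_graph(profile):
    """w[(j, k)] = number preferring j to k minus number preferring k to j."""
    alternatives = sorted(profile[0][1])
    w = {(j, k): 0 for j in alternatives for k in alternatives if j != k}
    for count, ranking in profile:
        for j, k in combinations(ranking, 2):  # j appears before k, so j is preferred
            w[(j, k)] += count
            w[(k, j)] -= count
    return w


def kemeny_score(order, w):
    """Equation (13.1): total weight of edges (j, k) with j ranked above k."""
    return sum(w[(j, k)] for j, k in combinations(order, 2))


def kemeny_brute_force(profile):
    """Score all m! rank orders; only practical for a handful of alternatives."""
    w = weighted_majority_graph(profile)
    scores = {order: kemeny_score(order, w)
              for order in permutations(sorted(profile[0][1]))}
    best = max(scores.values())
    return [order for order, s in scores.items() if s == best], scores


winners, scores = kemeny_brute_force(example_13_5)
for order, s in scores.items():
    print(" > ".join(order), s)
print("Kemeny ranking:", winners)  # [('b', 'c', 'a')]
```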
Theorem 13.4. The Kemeny rule satisfies the Condorcet criterion.
Proof. Let $\succ_R$ denote the ranking selected by the Kemeny rule. Viewed as a social ranking
rule, suppose for contradiction that there are two alternatives j and k, adjacent in $\succ_R$ with $j \succ_R k$, but
for which k has a strict majority over j, so that $w_{k,j} > 0$. Ranking $\succ'$ that modifies $\succ_R$ by switching the
order of j and k changes the relative order of this pair only, because j and k are adjacent; by (13.1), its Kemeny
score is higher by $w_{k,j} - w_{j,k} = 2w_{k,j} > 0$, contradicting the optimality of $\succ_R$. By the same
argument, if c is a Condorcet winner then it must be the top alternative of Kemeny rank
order $\succ_R$: otherwise c and the alternative ranked immediately above it would form such an
adjacent pair. This completes the proof.
We can precisely characterize the axiomatic properties of the Kemeny rule when used as
a social ranking rule. To be precise, we assume for this purpose that the rule returns the
set of rankings that minimize Kendall tau rank distance when the minimizer is not unique.
A ranking rule satisfies neutrality if the social rank order (or set) does not depend on the
identity of the alternatives. For consistency, we require that whenever the sets of social rank
orders selected for each of two sets of votes intersect, the set of rank orders selected after the
votes are combined into one set is equal to this intersection.
Theorem 13.5. The Kemeny rule is the unique social ranking rule that satisfies neutrality,
consistency, and the Condorcet criterion.
The chapter notes provide a reference for this result.
For this reason, and especially given the Condorcet criterion, the Kemeny rule has strong
theoretical support. But we will see a problem in the next section, which takes a computational viewpoint.
13.1.4 Computational Considerations
In this section, we consider some computational aspects of social choice, and in particular
the complexity of determining the outcome of a rule.
The outcome is easy to compute in most of the rules that we have considered in this
chapter. For example, for the Borda rule, we compute the total score for each alternative
and select the alternative with the maximum score. Computing the outcome in the Kemeny
rule, on the other hand, is intractable:
Theorem 13.6. The problem of determining the social rank order in the Kemeny rule is
NP-hard even with only four votes.
See the chapter notes for a reference to a proof of this result.
Although there is no difficulty with a large number of voters, this intractability presents
a barrier to the adoption of Kemeny in applications with a large number of alternatives.
There is no algorithm for Kemeny that runs in worst-case polynomial time in the number
of alternatives (unless P=NP).
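Intractability in the number of alternatives still leaves room for exact methods when m is small. As an illustration that is not the IP approach developed below, a standard dynamic program over subsets of alternatives computes a Kemeny ranking in $O(2^m m^2)$ time: the best ordering of a subset S is obtained by choosing its top alternative j, which gains $w_{j,k}$ from every other k in S, and then ordering $S \setminus \{j\}$ optimally. The sketch takes WMG weights as input; the function name and dictionary format are assumptions.

```python
def kemeny_dp(alternatives, w):
    """Exact Kemeny ranking by dynamic programming over subsets (bitmasks).

    best[mask] holds (score, order): the highest Kemeny score of any ordering of
    the alternatives in `mask`, and one ordering achieving it.
    """
    m = len(alternatives)
    best = {0: (0, ())}
    for mask in range(1, 1 << m):  # removing a bit gives a smaller, solved mask
        candidates = []
        for i in range(m):
            if mask & (1 << i):
                rest = mask & ~(1 << i)
                # Placing alternative i on top of `rest` earns w[i, k] for each k in rest.
                gain = sum(w[(alternatives[i], alternatives[k])]
                           for k in range(m) if rest & (1 << k))
                score, order = best[rest]
                candidates.append((score + gain, (alternatives[i],) + order))
        best[mask] = max(candidates)  # equal scores resolved by comparing orders
    return best[(1 << m) - 1]


# WMG weights of Example 13.5 (both directions stored; w[k, j] = -w[j, k]).
w = {("a", "b"): 6, ("b", "c"): 24, ("c", "a"): 10}
w.update({(k, j): -v for (j, k), v in list(w.items())})

print(kemeny_dp(("a", "b", "c"), w))  # (28, ('b', 'c', 'a'))
```

The table `best` has $2^m$ entries, so memory, not just time, grows exponentially; this is a sketch for small m only.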
At least for a small enough number of alternatives, the outcome of the Kemeny rule can
be determined in practically reasonable time via an integer program (IP) formulation. For
this, recall from (13.1) that Kemeny selects the rank order $\succ'$ that maximizes the total
weight on edges (j, k) for which $j \succ' k$ in the WMG.
We will formulate an IP in which decision variables correspond to selecting edges. If we
select exactly one edge between every pair of vertices and insist that the subgraph formed by
the selected edges is acyclic, then this defines a unique rank order. Moreover, such a set
of edges exists for any rank order. For example, Figure 13.3 (b) illustrates a valid set of
edges, and corresponds to rank order $a \succ' b \succ' c$.
Example 13.6. Consider the WMG in Figure 13.3 (a). A valid subgraph is illustrated
in Figure 13.3 (b), and consists of edges (a, b), (b, c) and (a, c). There is an edge between
every pair of alternatives, and it is acyclic. The corresponding rank order is $a \succ' b \succ' c$, and
the total weight is $6 + 24 - 10 = 20$. On the other hand, a subgraph that consists of edges
(a, b), (b, c) and (c, a) is invalid because it is not acyclic. In particular, there is no rank order
that corresponds to this subgraph.
Let decision variables $y_{j,k}$, for every $j, k \in A$ with $j \neq k$, indicate whether or not directed
edge (j, k) is selected in the solution. Given this, we have the following IP to compute the
Kemeny rank order from the WMG:
$$
\begin{aligned}
\max_{y} \quad & \sum_{j,k:\, j \neq k} w_{j,k}\, y_{j,k} \\
\text{s.t.} \quad & y_{j,k} + y_{k,j} = 1, && \text{for all } j, k \text{ with } j \neq k,\\
& y_{j,k} + y_{k,l} + y_{l,j} \leq 2, && \text{for all distinct } j, k, l,\\
& y_{j,k} \in \{0, 1\}, && \text{for all } j, k \text{ with } j \neq k.
\end{aligned}
$$
The first group of constraints insists on exactly one edge between every pair of alternatives. The
second group of constraints precludes 3-cycles. For example, they preclude $y_{1,2} + y_{2,3} + y_{3,1} = 3$,
which would correspond to the cycle $1 \to 2 \to 3 \to 1$. Even though we must prevent
cycles of any length, it is sufficient to include constraints to prevent 3-cycles. This is because
any cycle of length greater than three in a graph with an edge between every pair of vertices
must include a 3-cycle. This observation along with the graph-based representation of a
preference order is developed in Exercise 13.3.
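The claim that excluding 3-cycles suffices can be checked by brute force for small m. The sketch below is illustrative only (a feasibility check, not an IP solver): it enumerates every assignment satisfying the first group of constraints for m = 4 alternatives, labeled 0 to 3, and confirms that exactly the assignments that also satisfy the 3-cycle constraints are acyclic, and that they correspond one-to-one with the 4! = 24 rank orders. All function names are hypothetical.

```python
from itertools import combinations, permutations, product
from math import factorial


def tournaments(m):
    """Yield every 0/1 assignment y with y[j, k] + y[k, j] = 1 for all j != k."""
    pairs = list(combinations(range(m), 2))
    for bits in product((0, 1), repeat=len(pairs)):
        y = {}
        for (j, k), bit in zip(pairs, bits):
            y[(j, k)], y[(k, j)] = bit, 1 - bit
        yield y


def violates_3_cycle_constraint(y, m):
    """True if some y[j, k] + y[k, l] + y[l, j] = 3, i.e. the second constraint fails."""
    return any(y[(j, k)] + y[(k, l)] + y[(l, j)] == 3
               for j, k, l in permutations(range(m), 3))


def is_acyclic(y, m):
    """Repeatedly delete a vertex with no selected in-edge; fail if none exists."""
    remaining = set(range(m))
    while remaining:
        sources = [v for v in remaining
                   if not any(y[(u, v)] for u in remaining if u != v)]
        if not sources:
            return False
        remaining.remove(sources[0])
    return True


def rank_order(y, m):
    """In an acyclic tournament, higher out-degree means ranked higher."""
    out_degree = {j: sum(y[(j, k)] for k in range(m) if k != j) for j in range(m)}
    return tuple(sorted(range(m), key=lambda j: -out_degree[j]))


m = 4
feasible = [y for y in tournaments(m) if not violates_3_cycle_constraint(y, m)]
acyclic = [y for y in tournaments(m) if is_acyclic(y, m)]
orders = {rank_order(y, m) for y in feasible}
print(len(feasible), len(acyclic), len(orders), factorial(m))  # 24 24 24 24
```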
Recall from Chapter 12 on combinatorial auctions that integer programs can be solved
reasonably effectively via the branch-and-bound algorithm. In the case of computing the
Kemeny winner, this approach has been shown to scale to problems with up to twenty-five
alternatives (with $25! \approx 1.6 \times 10^{25}$ possible orders) in less than five seconds, but struggles
with some very hard instances on more than forty alternatives. Interestingly, the problem
becomes harder to solve in practice when there is less agreement amongst voters as to the
best preference order.
A Tractable Condorcet Rule. The Schulze rule provides an attractive alternative to Kemeny, in that it is tractable and satisfies the Condorcet criterion. The Schulze rule also
satisfies many additional axiomatic properties, including some that are not satisfied by the
Kemeny rule.
Schulze’s rule is defined in terms of the WMG. Define the strength of a directed path from
alternative j to k as the minimum weight of all edges on the path, which can be negative.
Let S(j, k) denote the strength of the maximum strength path from alternative j to k; i.e.,
the best-path strength.
Definition 13.9 (Schulze). Given preference profile $\succ = (\succ_1, \ldots, \succ_n)$, the Schulze rule
selects as the winner an alternative j for which $S(j, k) \geq S(k, j)$ for all other alternatives
k; i.e., an alternative that is not defeated by any other alternative in best-path strength.
Although it is not obvious why this should be the case, a Schulze winner always exists;
we provide a reference for this result in the chapter notes. Most important for our purposes
are the following two results:
Theorem 13.7. The Schulze rule is Condorcet consistent.
Proof. Let c denote the Condorcet winner. In this case, c is a Schulze winner because (i)
there is a positive-strength path from c to any other alternative a (namely the direct edge
from c to a, whose weight is strictly positive); and (ii) every path from a to c has negative strength because
all in-edges to c have negative weight. Also, no other alternative, say d, can be a Schulze
winner. This is because its best-path strength to c must be negative while the best-path
strength from c to d is positive.
Theorem 13.8. A Schulze winner can be found in polynomial time.
Proof. The first step is to construct the WMG, which takes time $O(nm^2)$ for n votes and m alternatives.
The second step is to solve the all-pairs widest path problem on this graph, which can be
computed in $O(m^3)$ using a variant on the Floyd-Warshall all-pairs shortest path algorithm.
(The width of a path refers to what is described as the strength of a path in the description
of Schulze.) Finally, the winner can be selected in time $O(m^2)$, by comparing S(j, k) to
S(k, j) for every alternative j and all other alternatives k. Altogether, the asymptotic
run-time is $O(nm^2 + m^3)$.
Example 13.7. Consider the profile from Example 13.5, with the WMG in Figure 13.3 (a).
The maximum strength paths between each pair of alternatives have strength:
$S(a, b) = 6$, $S(a, c) = 6$, $S(b, a) = 10$, $S(b, c) = 24$, $S(c, a) = 10$, $S(c, b) = 6$.
For example, for b to a the best path is (b, c) then (c, a) and has strength 10. Based on this,
the (unique) Schulze winner is alternative b because S(b, a) > S(a, b) and S(b, c) > S(c, b).
This is the same outcome as the Kemeny rule on this preference profile.
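The three steps in the proof of Theorem 13.8 can be sketched in Python. This is an illustrative sketch rather than a reference implementation: the helper names `schulze_strengths` and `schulze_winners` are our own, tied winners are all returned rather than broken, and the WMG edge weights (a→b 6, b→c 24, c→a 10, with each reverse edge carrying the negated weight) are an assumed reconstruction of Figure 13.3 (a), chosen because they reproduce the strengths listed in Example 13.7 and agree with the pairwise counts reported in Example 13.14.

```python
def schulze_strengths(alts, weight):
    """Best-path (widest-path) strengths S[(j, k)] by a Floyd-Warshall variant.

    weight[(j, k)] is the WMG edge weight from j to k, given for every
    ordered pair j != k; a path is as strong as its weakest edge.
    """
    S = {(j, k): weight[(j, k)] for j in alts for k in alts if j != k}
    for i in alts:                      # allow i as an intermediate alternative
        for j in alts:
            if j == i:
                continue
            for k in alts:
                if k in (i, j):
                    continue
                S[(j, k)] = max(S[(j, k)], min(S[(j, i)], S[(i, k)]))
    return S


def schulze_winners(alts, weight):
    """All alternatives j with S(j, k) >= S(k, j) for every other k."""
    S = schulze_strengths(alts, weight)
    return [j for j in alts
            if all(S[(j, k)] >= S[(k, j)] for k in alts if k != j)]


ALTS = ["a", "b", "c"]
# Assumed WMG: a->b 6, b->c 24, c->a 10; each reverse edge has the negated weight.
weight = {}
for (j, k), w in {("a", "b"): 6, ("b", "c"): 24, ("c", "a"): 10}.items():
    weight[(j, k)], weight[(k, j)] = w, -w

S = schulze_strengths(ALTS, weight)
print(S[("b", "a")], S[("c", "b")])     # 10 6
print(schulze_winners(ALTS, weight))    # ['b']
```

The triple loop is the $O(m^3)$ step of the proof; building `S` and scanning it for winners are the two $O(m^2)$ steps.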
An example of an axiomatic property that is satisfied by the Schulze rule but not by the
plurality, Borda or Kemeny rules is independence of clones. For this, say that some subset
of alternatives form a set of clones if the alternatives are consecutive to each other in the
preference order of every voter. The property is:
• Independence of clones. If the rule selects alternative a for a given preference profile, then it
continues to select a or a clone of a when one or more sets of clones are introduced.
This robustness property provides protection against manipulation by the party who is
responsible for running a voting procedure, who might otherwise be able to create many
small variations on existing alternatives in order to change the outcome.
Combinatorial Voting. Let’s now consider a setting where the choice to make is about
a large number of interrelated decisions (or issues). For example, the choice might represent decisions about the location, design and research activities to go into a new science
building at a university, or the decisions that go into planning a social event. This combinatorial voting problem introduces new difficulties because the number of alternatives grows
exponentially in the number of issues. Consider the following example:
Example 13.8. Suppose that the social choice problem is to choose a menu for a holiday
party and there are two issues to decide:
• The choice of main course (issue $I_m$), which can be beef (b), fish (f), or vegetarian (v).
• The choice of wine (issue $I_w$), which can be red (re), white (w), or rose (ro).
Figure 13.4: A CP-net for the menu-planning example. (a) The issue graph $I_m \to I_w$ and the CPTs for a CP-net on two issues: CPT($I_m$): $b \succ f \succ v$; CPT($I_w$): if $I_m = b$, $re \succ ro \succ w$; if $I_m = f$, $w \succ ro \succ re$; if $I_m = v$, $ro \succ w \succ re$. (b) The preference graph.
There are nine possible alternatives, representing different pairs of choices, and $A =
\{b, f, v\} \times \{re, w, ro\}$. For example, one alternative is beef and red wine, denoted (b, re).
For $\ell$ issues, each with three choices, the number of alternatives is $3^\ell$. For example, if
we also needed to decide the appetizer and dessert, there would be $3^4 = 81$ alternatives.
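A short sketch of how the alternatives arise as a Cartesian product of per-issue choices, and how fast their number grows. The dictionary `ISSUES` and the helper `count_alternatives` are illustrative names, not taken from the text.

```python
from itertools import product

# Issues of the menu example and their possible choices.
ISSUES = {
    "main course": ["b", "f", "v"],
    "wine": ["re", "w", "ro"],
}

# An alternative fixes one choice for every issue: the Cartesian product.
alternatives = list(product(*ISSUES.values()))
print(len(alternatives))  # 9
print(alternatives[0])    # ('b', 're')


def count_alternatives(num_issues, choices_per_issue=3):
    """Number of alternatives when every issue has the same number of choices."""
    return choices_per_issue ** num_issues


print([count_alternatives(l) for l in range(1, 6)])  # [3, 9, 27, 81, 243]
```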
The computational challenges in combinatorial voting parallel those of combinatorial
auctions (see Chapter 12). In particular, two important questions are:
• Representation: How to design a voting language to allow participants to succinctly
express their preferences?
• Outcome determination: How to design tractable voting rules when the number of
alternatives grows exponentially with the number of issues?
One way to represent preferences on multiple interrelated issues is through a conditional
preference network (CP-net). This is succinct when the voter’s preferences on alternatives
can be decomposed into preferences on issues that only depend on the decisions on a small
number of other issues.
Definition 13.10 (Conditional preference network (CP-net)). A conditional preference
network consists of:
(1) a directed graph on issues, that represents the conditional independence properties between issues, and
(2) a conditional preference table (CPT) for each issue, that defines a preference ordering
for the issue that depends on the values assigned to the parent(s) of the issue but is
otherwise independent of other decisions.
A CP-net induces a directed graph on alternatives, with an edge from alternative a to b
if preference for a over b can be determined directly from one of the CPTs. This is called
a preference graph, and is illustrated in the following example.
Example 13.9. Figure 13.4 (a) shows a CP-net, continuing the earlier menu-planning example. The issue graph indicates an unconditional preference for the main course, whatever
the choice of wine, because issue $I_m$ has no parent. The preference is for beef over
fish over vegetarian. The preference for wine depends on the choice of main course, which
is the parent issue of wine. For example, CPT($I_w$) shows a preference for white wine over
rose over red wine if the main course is fish.
Figure 13.4 (b) shows the corresponding preference graph. Alternative (b, re) is preferred to (b, ro) because of the first entry in CPT($I_w$). This is what we mean by “can be
determined directly” from a CPT. Similarly, the edges in the top, middle and bottom rows
of the graph come from the first, second and third entries in CPT($I_w$), respectively. In addition, alternative (f, w) is preferred to (v, w) because of CPT($I_m$), which states a preference
for fish over vegetarian, for any fixed decision about wine.
The CPT for the main course makes no direct commitment about a preference order
between pairs of alternatives such as (b, ro) and (f, w). This is because the choice of wine
changes. The CPT only specifies a preference order when holding other choices constant.
Indeed, the preference graph need not completely specify a total rank order on alternatives.
For example, the constraints in the graph insist on $(b, ro) \succ (f, w)$ (via $(b, ro) \succ (b, w) \succ (f, w)$), but make no commitment
either way between $(f, re)$ and $(v, ro)$. This is developed more in Exercise 13.4.
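The preference graph can be built mechanically from the CPTs, and the commitments discussed above can then be checked by reachability. In this minimal Python sketch the CPT contents are those of Figure 13.4 (a), while the helper names are hypothetical. An edge is added between every pair of alternatives that differ on a single issue and are ordered by that issue's CPT row, which may include more edges than a drawing of the figure shows.

```python
from itertools import product

MAINS = ["b", "f", "v"]
WINES = ["re", "w", "ro"]

# CPTs of Figure 13.4 (a), each listed from most to least preferred.
CPT_MAIN = ["b", "f", "v"]                      # unconditional (no parent)
CPT_WINE = {"b": ["re", "ro", "w"],             # conditioned on the main course
            "f": ["w", "ro", "re"],
            "v": ["ro", "w", "re"]}


def preference_graph():
    """Edges (x, y): x is preferred to y as read directly off one CPT entry,
    i.e. x and y differ only on a single issue."""
    edges = set()
    for m, w in product(MAINS, WINES):
        for m2 in MAINS:        # vary the main course, wine held fixed
            if CPT_MAIN.index(m) < CPT_MAIN.index(m2):
                edges.add(((m, w), (m2, w)))
        order = CPT_WINE[m]     # vary the wine, main course held fixed
        for w2 in WINES:
            if order.index(w) < order.index(w2):
                edges.add(((m, w), (m, w2)))
    return edges


def entails(edges, x, y):
    """True if y is reachable from x, so the CP-net commits to x > y."""
    stack, seen = [x], {x}
    while stack:
        u = stack.pop()
        for s, t in edges:
            if s == u and t not in seen:
                if t == y:
                    return True
                seen.add(t)
                stack.append(t)
    return False


edges = preference_graph()
print(len(edges))                                   # 18
print(entails(edges, ("b", "ro"), ("f", "w")))      # True
print(entails(edges, ("f", "re"), ("v", "ro")),
      entails(edges, ("v", "ro"), ("f", "re")))     # False False
```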
Still, as long as the preference graph is acyclic, it encodes a partial order. For
example, if its only edges are $a \to b$ and $a \to c$, then it allows the rank orders $a \succ_i b \succ_i c$ and
$a \succ_i c \succ_i b$. These total orders extend the partial order. A CP-net is a valid representation
of the preferences of agent $i$ if $\succ_i$ extends the partial order. We will focus on CP-nets with
acyclic issue graphs, which ensures that the preference graph is acyclic.
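Whether a total order extends the partial order encoded by an acyclic preference graph can be checked edge by edge, since an order that respects every edge also respects everything implied by transitivity. The sketch below (the function name `extends` is hypothetical) enumerates the extensions of the two-edge example in the text.

```python
from itertools import permutations


def extends(order, edges):
    """True if the total order (most preferred first) ranks x above y for
    every edge (x, y) of an acyclic preference graph."""
    position = {alt: i for i, alt in enumerate(order)}
    return all(position[x] < position[y] for x, y in edges)


# The partial order from the text: a is preferred to b and to c.
EDGES = {("a", "b"), ("a", "c")}
linear_extensions = [p for p in permutations("abc") if extends(p, EDGES)]
print(linear_extensions)  # [('a', 'b', 'c'), ('a', 'c', 'b')]
```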
Not all preference orders have a CP-net representation. However, when a CP-net representation exists, it can be much more compact than writing down an explicit, total
order on all alternatives. Consider introducing another issue, say appetizer ($I_a$), into the
example, and making this the parent of $I_m$, so that the issue graph becomes $I_a \to I_m \to I_w$.
The effect of this is that the preference for wine depends on the choice of main course but,
fixing the choice of main course, is independent of the choice of appetizer. In this example, three small CPTs would represent preferences on $3^3 = 27$ alternatives, assuming three
choices for each issue.
Let’s turn now to the design of a voting rule for a combinatorial voting problem. A useful
special case is when every agent’s preferences can be represented with a CP-net with the
same acyclic issue graph. In this case, we can adopt the following voting rule:
Definition 13.11 (Sequential voting rule on multiple issues). Assume that every agent’s
preference order can be represented by a CP-net with the same acyclic issue graph. The
sequential voting rule proceeds as follows:
1. Fix the structure of the issue graph, and receive a CP-net representation from each voter
that uses this issue graph.
2. Sort the issues so that no directed edge in the issue graph goes from a later issue to an
earlier issue in the order. This is called a topological ordering.
3. For the first issue in the topological sort, apply a voting rule to determine the choice for
the issue, considering the unconditional preference expressed by each voter on this issue.
Fix the choice made for this issue.
4. Continue for each issue in turn, applying a voting rule to determine and fix the choice
for the issue, considering the earlier decisions and the conditional preferences expressed
by each voter on this issue. A different voting rule can be adopted for each issue.
Figure 13.5: A preference profile in a combinatorial voting domain on two issues.
5 votes: CPT($I_m$): $b \succ f \succ v$; CPT($I_w$): if $I_m = b$, $re \succ ro \succ w$; if $I_m = f$, $w \succ ro \succ re$; if $I_m = v$, $ro \succ w \succ re$.
4 votes: CPT($I_m$): $f \succ v \succ b$; CPT($I_w$): if $I_m = b$, $w \succ ro \succ re$; if $I_m = f$, $w \succ ro \succ re$; if $I_m = v$, $w \succ ro \succ re$.
3 votes: CPT($I_m$): $v \succ b \succ f$; CPT($I_w$): if $I_m = b$, $ro \succ re \succ w$; if $I_m = f$, $ro \succ w \succ re$; if $I_m = v$, $ro \succ w \succ re$.
We illustrate this sequential voting procedure in the following example.
Example 13.10. Consider the preference profile in Figure 13.5, which provides the CPTs
that go along with the issue graph in Figure 13.4 (a). The topological sort in this two-issue
example is just $I_m$ and then $I_w$. This has the property that the edge (from $I_m$ to $I_w$) in the
issue graph does not go from a later issue in the order to an earlier issue.
Given this, the sequential voting rule will make a decision for the main course and then
the wine. Let’s assume that the plurality rule is used to determine the main course and
Borda for the decision about wine. First, the agents vote on the main course, since every
agent has an unconditional preference for this issue. The plurality winner is beef, with 5, 4
and 3 votes for b, f and v respectively. This choice is fixed. Now the agents vote to choose
the wine given that the main course is beef. The vote profile is 5 votes for $re \succ ro \succ w$, 4 votes for $w \succ ro \succ re$,
and 3 votes for $ro \succ re \succ w$. The Borda winner is rose wine, with Borda scores of 13, 8 and 15 for
re, w and ro respectively (e.g., re scores $5 \cdot 2 + 4 \cdot 0 + 3 \cdot 1 = 13$). The outcome is (b, ro): the menu will be beef with rose wine.
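The sequential rule of Example 13.10 can be sketched as follows. The CPTs are transcribed from Figure 13.5; the topological order ($I_m$ before $I_w$) is hard-coded rather than computed, the helper names are hypothetical, and ties are not handled specially (Python's `max` simply returns the first top-scoring alternative it meets).

```python
# Figure 13.5: (number of voters, CPT(I_m), CPT(I_w) keyed by main course).
PROFILE = [
    (5, ["b", "f", "v"], {"b": ["re", "ro", "w"], "f": ["w", "ro", "re"], "v": ["ro", "w", "re"]}),
    (4, ["f", "v", "b"], {"b": ["w", "ro", "re"], "f": ["w", "ro", "re"], "v": ["w", "ro", "re"]}),
    (3, ["v", "b", "f"], {"b": ["ro", "re", "w"], "f": ["ro", "w", "re"], "v": ["ro", "w", "re"]}),
]


def plurality(weighted_orders):
    """weighted_orders: (count, order) pairs; returns (winner, scores)."""
    scores = {}
    for count, order in weighted_orders:
        scores[order[0]] = scores.get(order[0], 0) + count
    return max(scores, key=scores.get), scores


def borda(weighted_orders):
    """Borda with m-1, ..., 0 points; returns (winner, scores)."""
    scores = {}
    for count, order in weighted_orders:
        m = len(order)
        for pos, alt in enumerate(order):
            scores[alt] = scores.get(alt, 0) + count * (m - 1 - pos)
    return max(scores, key=scores.get), scores


def sequential_vote(profile):
    # Topological order of I_m -> I_w: decide the main course first (plurality),
    main, _ = plurality([(n, cpt_m) for n, cpt_m, _ in profile])
    # then the wine (Borda), using each voter's CPT row for the fixed main course.
    wine, wine_scores = borda([(n, cpt_w[main]) for n, _, cpt_w in profile])
    return (main, wine), wine_scores


outcome, wine_scores = sequential_vote(PROFILE)
print(wine_scores["re"], wine_scores["w"], wine_scores["ro"])  # 13 8 15
print(outcome)                                                 # ('b', 'ro')
```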
The idea behind sequential voting is that issues can be ordered such that every voter has
a well-defined preference on the next issue to be decided given the decisions already made.
This is a strong assumption, but leads to a simple rule when it is satisfied. The chapter
notes provide references to additional methods that have been developed for combinatorial
voting.
13.1.5 Strategic Considerations
In this section, we discuss some of the strategic considerations that come into play in social
choice. In particular, we ask whether a voter can improve the outcome in his favor by
misreporting his preferences.
There are a number of reasons why manipulation is undesirable. First, it can lead to
unfairness, because some participants may be better at finding good manipulations than
others. Second, the time and resources required to determine how to manipulate are wasteful. Third, it becomes difficult to determine whether or not a rule has good properties,
because this depends on the strategies adopted by participants. In particular, strategic
behavior can be extremely harmful, leading to paradoxes such as the selection of alternatives
that are very unpopular according to true preferences (see the chapter notes).
Following the approach of mechanism design, a social choice rule $f$ is strategyproof (or
truthful) if it is a dominant strategy for an agent to report his true preference order,
whatever the reports of others. In particular, fix the reported preferences of others,
$\hat{\succ}_{-i} = (\hat{\succ}_1, \ldots, \hat{\succ}_{i-1}, \hat{\succ}_{i+1}, \ldots, \hat{\succ}_n)$. If $a = f(\succ_i, \hat{\succ}_{-i}) \neq b = f(\hat{\succ}_i, \hat{\succ}_{-i})$ for preferences $\succ_i$ of
agent $i$ and some misreport $\hat{\succ}_i$, then strategyproofness requires $a \succ_i b$: the agent prefers
the alternative selected when reporting his true preferences to the alternative selected under any misreport. The
majority rule on two alternatives provides a simple example of a strategyproof voting rule.
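For two alternatives, strategyproofness can be verified by brute force: a voter's preference is just his favorite alternative, so a violation means that some misreport makes his favorite win when honest reporting does not. The sketch below assumes that ties are broken in favor of a, and includes a deliberately bad "minority" rule as a contrast; both rule functions and `is_strategyproof` are hypothetical illustrations.

```python
from itertools import product


def majority(reports):
    """Two alternatives 'a' and 'b'; each report is a voter's favorite.
    Ties go to 'a' (an assumption made only to keep the rule total)."""
    return "a" if reports.count("a") >= reports.count("b") else "b"


def minority(reports):
    """A deliberately bad rule, used only as a contrast."""
    return "a" if reports.count("a") < reports.count("b") else "b"


def is_strategyproof(rule, n):
    """Brute force over all report profiles of n voters: flipping a voter's
    report must never make his favorite win when honesty does not."""
    for reports in product("ab", repeat=n):
        honest = rule(list(reports))
        for i, favorite in enumerate(reports):
            lie = list(reports)
            lie[i] = "b" if favorite == "a" else "a"
            if honest != favorite and rule(lie) == favorite:
                return False
    return True


print([is_strategyproof(majority, n) for n in (1, 2, 3, 4)])  # [True, True, True, True]
print(is_strategyproof(minority, 3))                          # False
```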
A social choice rule $f$ is onto alternatives $A$ if, for every $a \in A$, there is some preference
profile $\succ \in \mathcal{P}^n$ such that $f(\succ) = a$. The Gibbard-Satterthwaite theorem (Theorem ??)
tells us that if there are at least three alternatives ($|A| \geq 3$), then any social choice rule that
is onto $A$ and strategyproof is dictatorial. This theorem is related to Arrow’s impossibility
result, through a close connection that can be established between IIA and strategyproofness.
The viewpoint looks bleak, then, for the possibility of truthful social choice rules. The
following example shows a useful manipulation of the Borda rule.
Example 13.11. Suppose there are four alternatives {a, b, c, d} and three agents, with the
following preference profile:
$\succ_1$: b, a, c, d
$\succ_2$: b, a, c, d
$\succ_3$: a, b, c, d
The Borda rule selects alternative b, with scores of 7, 8, 3 and 0 for alternatives a, b, c and
d respectively. But agent 3 can deviate, and report preference order:
$\hat{\succ}_3$: a, c, d, b
In this case, the Borda scores become 7 and 6 on a and b respectively, and alternative a
is selected, which is preferred to b under agent 3’s true preferences.
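The manipulation in Example 13.11 can be found by brute force over all $4! = 24$ reports of agent 3. This sketch assumes alphabetical tie-breaking, which the example itself does not need but which makes the rule well defined on every profile; the function names are hypothetical.

```python
from itertools import permutations


def borda_scores(profile):
    """profile: list of rankings, most preferred first."""
    m = len(profile[0])
    scores = {alt: 0 for alt in profile[0]}
    for order in profile:
        for pos, alt in enumerate(order):
            scores[alt] += m - 1 - pos
    return scores


def borda_winner(profile):
    scores = borda_scores(profile)
    # Ties broken alphabetically: an assumption, the text does not fix a rule here.
    return min(scores, key=lambda alt: (-scores[alt], alt))


def beneficial_misreports(profile, i):
    """All reports by agent i that change the winner to one he truly prefers."""
    truth = profile[i]
    honest = borda_winner(profile)
    found = []
    for report in permutations(truth):
        outcome = borda_winner(profile[:i] + [list(report)] + profile[i + 1:])
        if truth.index(outcome) < truth.index(honest):
            found.append(("".join(report), outcome))
    return honest, found


PROFILE = [list("bacd"), list("bacd"), list("abcd")]
scores = borda_scores(PROFILE)
print(scores["a"], scores["b"], scores["c"], scores["d"])   # 7 8 3 0
honest, found = beneficial_misreports(PROFILE, 2)
print(honest)                   # b
print(("acdb", "a") in found)   # True
```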
Even rules like plurality-with-elimination (or STV), which seem to reduce the need for
strategic behavior (since votes for candidates that are generally unpopular are redistributed
to other candidates), are not strategyproof. See Exercise 13.5.
Single-peaked Preferences and the Median Rule. The Gibbard-Satterthwaite theorem
relies on the preference domain including all possible strict orders. One way to escape its
conclusion is to move to domains where some preference orders cannot exist.
A nice example is the domain of single-peaked preferences. Preferences are single peaked
when the alternatives can be lined up (e.g., from left to right), so that each voter’s preference
order has a single local maximum (or “peak”). For any pair of alternatives on the same side
of the peak, the agent prefers the one closer to the peak:
Figure 13.6: Utility functions for three agents in two distinct domains. (a) Single-peaked
preferences. (b) Not single-peaked preferences.
Definition 13.12 (Single-peaked Preferences). A domain has single-peaked preferences if
there is some way to order alternatives such that, for any agent $i$ and any pair of alternatives
$a, b \in A$ for which $a \succ_i b$, either:
• a is the peak (most preferred), or
• a and b are on opposite sides of the peak, or
• a and b are on the same side, and a is closer to the peak than b.
It can be helpful to imagine that each alternative is associated with a position from left
to right, such as the amount of public expenditure on the arts, the time of a meeting, or
the location of a school along a road. For example, the preferences in Figure 13.6 (a) are
single peaked, while the preferences in Figure 13.6 (b) are not single peaked (for any way
of lining them up, see Exercise 13.6.)
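The definition can be checked mechanically for a given left-to-right axis. The sketch below relies on a standard equivalent characterization: a ranking is single peaked on an axis exactly when, for every k, its top k alternatives form a contiguous block of the axis. The function names are hypothetical and the profiles are made up (the utility functions behind Figure 13.6 are not reproduced here); the last profile is a Condorcet cycle, for which no axis works.

```python
from itertools import permutations


def is_single_peaked(ranking, axis):
    """ranking: most preferred first; axis: left-to-right order of alternatives.
    Every top-k set must be a contiguous interval of the axis."""
    pos = {alt: i for i, alt in enumerate(axis)}
    lo = hi = pos[ranking[0]]            # the interval starts at the peak
    for alt in ranking[1:]:
        p = pos[alt]
        if p == lo - 1:
            lo = p                       # extend one step to the left
        elif p == hi + 1:
            hi = p                       # extend one step to the right
        else:
            return False                 # skipped over a closer alternative
    return True


def find_axis(profile):
    """Brute-force search for an axis on which every ranking is single peaked."""
    for axis in permutations(profile[0]):
        if all(is_single_peaked(r, axis) for r in profile):
            return axis
    return None


print(is_single_peaked("bcad", "abcd"))   # True
print(is_single_peaked("bdac", "abcd"))   # False
print(find_axis(["abc", "bac", "cba"]))   # ('a', 'b', 'c')
print(find_axis(["abc", "bca", "cab"]))   # None
```

The search over axes is exponential and only meant for tiny examples.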
Given a preference profile, a median peak is an alternative for which half or fewer of the
voters have peaks to the left and half or fewer of the voters have peaks to the right. For
example, in Figure 13.6 (a), the median is alternative b. Although we don’t see it here, note
that two peaks might be tied for the median.
Definition 13.13 (Median rule). In a domain with single-peaked preferences, the median
rule selects the alternative that is the median of voters’ most preferred alternatives.
As with other voting rules, ties can be broken at random or based on some external
preference such as preferring alternatives to the left.
Theorem 13.9. In a single-peaked preference domain, the median rule is strategyproof,
Pareto optimal, and satisfies the Condorcet criterion. In addition, there is a Condorcet
winner when there is an odd number of voters.
The proof of this theorem is developed in Exercises 13.5 and 13.6. For the Condorcet
criterion, suppose that c is the Condorcet winner and assume, for contradiction, that the
median rule selects $a \neq c$. Suppose without loss of generality that c is to the left of a. By the single-peaked
property, the voters with peak at a or to the right of a prefer a to c, and since a is the median,
half or fewer voters have peaks to the left of a, so at least half of the voters prefer a to c. Therefore it is impossible for a strict majority
to prefer c to a, which is a contradiction.
For example, alternative b (the median choice) is the Condorcet winner in Figure 13.6 (a),
with a strict majority over alternatives a, c and d. In addition, agent 2 can only move the
choice of median to the left through a misreport, which is less preferred.
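The median rule, and the claims of Theorem 13.9, can be exercised by brute force on a small domain. The sketch below enumerates every ranking that is single peaked on the axis a, b, c, d and every profile of three such voters, then checks that no voter gains by reporting a different peak and that the median is always a Condorcet winner. This is a sanity check on 512 profiles, not a proof; the function names are hypothetical, and an even number of voters is resolved by taking the left of the two middle peaks.

```python
from itertools import product


def median_rule(peaks, axis):
    """Median of reported peaks along the axis; with an even number of voters the
    left of the two middle peaks is chosen (one tie-breaking option from the text)."""
    ordered = sorted(peaks, key=axis.index)
    return ordered[(len(ordered) - 1) // 2]


def single_peaked_rankings(axis):
    """All rankings that are single peaked on the axis (there are 2**(m-1))."""
    result = []

    def grow(lo, hi, ranking):
        if lo == 0 and hi == len(axis) - 1:
            result.append(ranking)
            return
        if lo > 0:
            grow(lo - 1, hi, ranking + [axis[lo - 1]])
        if hi < len(axis) - 1:
            grow(lo, hi + 1, ranking + [axis[hi + 1]])

    for peak in range(len(axis)):
        grow(peak, peak, [axis[peak]])
    return result


def manipulable(rankings, axis):
    """Can some voter get an outcome he strictly prefers by reporting another peak?
    (The median rule only looks at peaks, so this covers every misreport.)"""
    peaks = [r[0] for r in rankings]
    honest = median_rule(peaks, axis)
    for i, r in enumerate(rankings):
        for fake in axis:
            outcome = median_rule(peaks[:i] + [fake] + peaks[i + 1:], axis)
            if r.index(outcome) < r.index(honest):
                return True
    return False


def is_condorcet_winner(rankings, x):
    n = len(rankings)
    return all(sum(r.index(x) < r.index(y) for r in rankings) > n / 2
               for y in rankings[0] if y != x)


AXIS = list("abcd")
DOMAIN = single_peaked_rankings(AXIS)
profiles = [list(p) for p in product(DOMAIN, repeat=3)]   # 8**3 = 512 profiles
print(len(DOMAIN))                                         # 8
print(any(manipulable(p, AXIS) for p in profiles))         # False
print(all(is_condorcet_winner(p, median_rule([r[0] for r in p], AXIS))
          for p in profiles))                              # True
```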
Complexity as a Barrier to Manipulation. In place of requiring that a rule is strategyproof,
and completely immune to manipulation, we can ask instead whether or not a voting rule
is resistant to manipulation, in the sense that it is difficult to find a useful manipulation.
This is a rare place where computational complexity can be helpful!
We are interested in hardness-of-manipulation properties for tractable voting rules. Consider the following problem:
Definition 13.14 (FindManipulation). Given a social choice rule, a set of votes, and
a preferred alternative a, the FindManipulation problem is to determine whether there
exists a vote that can be added so that a is the selected alternative.
A simple variation would consider a manipulator who wants to achieve as good an alternative as possible. The analysis of voting rules from the perspective of computation as
a barrier to manipulation becomes quite technical and we provide only a brief discussion,
focusing for the most part on positional-scoring rules.
The first set of results that we mention are negative, in that finding a beneficial manipulation is actually tractable for most rules, including most that we have discussed.
Theorem 13.10. The FindManipulation problem is tractable in positional-scoring rules
(e.g., plurality and Borda) and the Schulze rule, but intractable in plurality-with-elimination.
The hardness result for plurality-with-elimination requires the number of alternatives to be
allowed to grow. Exercise 13.5 develops an approach to find the optimal manipulation
in the Borda rule. We don’t mention the Kemeny rule here: when the intractability
arises because determining the outcome of the voting rule is itself intractable, hardness of
manipulation is uninteresting.
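One simple polynomial-time way to solve FindManipulation for a single manipulator under a positional-scoring rule is sketched below; it is not necessarily the approach developed in Exercise 13.5. The manipulator ranks the preferred alternative first and hands the remaining positions (most points first) to the rivals in increasing order of their current scores. The sketch assumes that the preferred alternative must win outright (a tie counts as failure) and that the point vector is nonincreasing; the function name `find_manipulation` is hypothetical.

```python
def find_manipulation(votes, target, points=None):
    """Return one added vote (a ranking) that makes `target` the unique winner
    under the positional-scoring rule `points`, or None if no such vote exists.
    votes: rankings, most preferred first. points defaults to Borda."""
    alts = list(votes[0])
    m = len(alts)
    if points is None:
        points = [m - 1 - pos for pos in range(m)]        # Borda: m-1, ..., 0
    scores = {a: 0 for a in alts}
    for order in votes:
        for pos, a in enumerate(order):
            scores[a] += points[pos]
    target_total = scores[target] + points[0]             # target ranked first
    # Weakest rivals get the most remaining points. Any other assignment can be
    # turned into this one by swaps that never raise the largest rival total.
    rivals = sorted((a for a in alts if a != target), key=lambda a: scores[a])
    vote = [target] + rivals
    for pos in range(1, m):
        if scores[vote[pos]] + points[pos] >= target_total:
            return None                                    # best vote still fails
    return vote


VOTES = [list("bacd"), list("bacd")]          # agents 1 and 2 of Example 13.11
print(find_manipulation(VOTES, "a"))          # ['a', 'd', 'c', 'b']
print(find_manipulation(VOTES, "c"))          # None
print(find_manipulation(VOTES, "a", points=[1, 0, 0, 0]))  # None (plurality)
```

On the votes of agents 1 and 2 in Example 13.11, the sketch finds a different but equally effective report for agent 3, namely a, d, c, b.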
In the coalitional manipulation problem, multiple manipulators cast votes in order to try
to make alternative a win. Other rules become NP-hard to manipulate in this coalitional
variation on the problem, including the Borda rule. In particular, if we also consider weighted
voting rules, where each voter $i$ (including the manipulators) has a weight $k_i \geq 1$, so that
his vote counts as $k_i$ unweighted votes, then we have the following result:
Theorem 13.11. When the manipulators are weighted, manipulation by a coalition in a
positional-scoring rule is tractable if the scoring rule is plurality, and NP-complete otherwise.
This resistance to manipulation by coalitions in weighted voting settings holds even when the
number of alternatives is fixed and small.
A difficulty with this result is that NP-hardness is a worst-case concept, and does not
preclude the possibility that a fast algorithm can find a useful manipulation in most cases.
We would rather show that it is usually hard to find a manipulation, or almost never the
case that a manipulation is available. The following theorem provides an average-case result,
and makes a clear delineation between small coalitions and large coalitions:
Theorem 13.12. For every positional-scoring rule:
1. When the number of manipulators is small relative to the number of voters $n$ ($n^\alpha$ manipulators,
where $0 < \alpha < 1/2$), and preference orders are sampled IID from any distribution, then
the frequency of manipulability goes to zero in the limit as $n$ goes to infinity;
2. When the number of manipulators is large relative to the number of voters $n$ ($n^\alpha$ manipulators,
where $1/2 < \alpha < 1$), there exist distributions for which preference orders are sampled
IID and the frequency of manipulability goes to one as $n$ goes to infinity, and it is easy
to find a manipulation that succeeds with probability that goes to 1 as $n$ goes to infinity.
The uniform distribution is a typical distribution that allows for manipulability with high
probability in the second part of the theorem. The chapter notes provide some references
to additional discussion on this topic of rules that resist manipulation.
13.2 Rank Aggregation: A Statistical Approach
In this section, we turn to problems of rank aggregation that are not specific to social
choice. In fact, rank aggregation problems arise all over the place, including in the following
settings:
• Sports and competitive games (e.g., ranking teams in a league, or gamers playing as
individuals or part of teams on Xbox Live).
• Ranking product quality based on purchase decisions by consumers that reveal a
preference across a set of products that they had viewed.
• Combining the responses from multiple crowdsourcing workers about the best translation for an article, the “top ten restaurants” lists of different food critics, or the
search results from multiple search engines into one combined set of results.
Given the breadth of possible applications, we are interested in flexibly handling data
that takes different forms, for example pairwise comparisons, ranked lists on some subset of
alternatives, and “top-5” lists.
In adopting a statistical approach, we imagine that the data are generated as noisy
samples according to a ground truth rank order. By taking this viewpoint, the problem
of rank aggregation is formulated as determining the ground truth that best “agrees” with
the data. Some problems of social choice can also be formulated as a search for ground
truth; e.g., selecting the best papers to accept to a conference program based on reviews
from members of the program committee, or judging an international diving competition.
Figure 13.7: Illustrating the Condorcet noise model for two alternatives a and b.
13.2.1 Handling Two Alternatives
We introduce the statistical approach to rank aggregation in a setting with only two alternatives, say a and b. The data are a sequence of observations, where each observation is either
“$a \succ b$” or “$b \succ a$.” For example, the data could represent the result of multiple match-ups
between the same two teams, or the opinions of multiple people about the relative quality
of two restaurants in town. Let’s describe a model for this pairwise comparison data:
Definition 13.15 (Condorcet model). The Condorcet model for pairwise comparisons on
two alternatives, a and b, has two parameters:
(i) an order $\succ_0$, either $a \succ_0 b$ or $b \succ_0 a$, and
(ii) a probability $p > 0.5$.
The model defines a distribution on pairwise comparisons. The probability of comparison
$\succ \in \mathcal{P}$ given parameters $\theta = (\succ_0, p)$, is
$$\Pr_\theta(\succ) = \begin{cases} p, & \text{if } \succ = \succ_0 \\ 1 - p, & \text{otherwise} \end{cases} \qquad (13.2)$$
An observed comparison agrees with the “ground truth” order $\succ_0$ with probability $p$, and
disagrees otherwise. Figure 13.7 illustrates the idea for the case of $p = 0.6$.
Given a description of the data and a model for the data, the next step is to find
parameters that best explain the data. There are different ways to perform this estimation
step. For the most part, we will focus on finding parameters that maximize the probability
of the data. For this, it is convenient to work with the likelihood of parameters $\theta$, written
$L(\theta; D)$, which is any function that is proportional to the probability of the data under parameters
$\theta$. In this way, maximizing likelihood is equivalent to maximizing probability (but allows
constants to be ignored).
Given data $D$, we estimate model parameters by solving the maximum likelihood estimation (MLE) problem. This finds parameters $\theta$ to solve,
$$\max_{\theta} \; L(\theta; D), \qquad (13.3)$$
and maximizes the likelihood (and by proportionality, the probability) of the data. For a
simple example, if the data are real numbers and the model is the normal distribution then
the maximum likelihood estimator for the mean is the sample mean, i.e., the average of the
observed values. See Exercise 13.7.
The data are a sequence of independent observations, $D = (\succ_1, \ldots, \succ_n)$. We have:
$$\Pr_\theta(D) = \Pr_\theta(\succ_1)\,\Pr_\theta(\succ_2) \cdots \Pr_\theta(\succ_n) \qquad (13.4)$$
$$\Pr_\theta(D) = p^{n_{agg}(\succ_0)} (1-p)^{n_{dis}(\succ_0)}, \qquad (13.5)$$
which depends on the number of observations that agree with $\succ_0$, $n_{agg}(\succ_0)$, and the number
that disagree, $n_{dis}(\succ_0)$. Since $n_{agg}(\succ_0) + n_{dis}(\succ_0) = n$, we can write (13.5) as
$(1-p)^n \left(p/(1-p)\right)^{n_{agg}(\succ_0)}$. For any $p > 0.5$ (and so $1 - p < 0.5$), the ratio $p/(1-p)$ exceeds one, so the
maximum likelihood estimator for parameter $\succ_0$ is the order that maximizes the number
of agreements with the data (equivalently, minimizes the number of disagreements).
Now we have data, a model, and a method to estimate the order parameter $\succ_0$. We will
not need to estimate $p$ for the purpose of rank aggregation.
To see this, let’s consider the final step, which is that of inference, and involves working
with the model to determine an aggregate rank order. A reasonable approach is to return
the order that is most probable given the parameters; i.e., the mode of the distribution. For
the Condorcet model, this is just $\succ_0$, since this occurs with probability $p > 0.5$.
Example 13.12. Let’s assume that the data are a sequence of 100 observations, with 60
samples of “$a \succ b$” and 40 samples of “$b \succ a$.” For any probability $p > 0.5$, the Condorcet
model says that,
$$\Pr_\theta(D) = \begin{cases} p^{60}(1-p)^{40}, & \text{if } a \succ_0 b \\ p^{40}(1-p)^{60}, & \text{otherwise} \end{cases} \qquad (13.6)$$
For any $p > 0.5$, the maximum likelihood estimator for order parameter $\succ_0$ is $a \succ_0 b$,
because $p^{60}(1-p)^{40} > p^{40}(1-p)^{60}$ (their ratio is $(p/(1-p))^{20} > 1$). The result of rank aggregation is order $a \succ_R b$, which
is the majority vote.
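The calculation in Example 13.12 can be reproduced numerically. The sketch below (hypothetical function names) compares the Condorcet-model log-likelihood of the two possible orders for several values of $p$; the maximizer is always the majority order, and only the margin depends on $p$.

```python
import math


def log_likelihood(a_first, n_ab, n_ba, p):
    """Condorcet-model log-probability of data with n_ab observations of a > b
    and n_ba of b > a, if the ground truth is a > b (a_first) or b > a."""
    n_agg, n_dis = (n_ab, n_ba) if a_first else (n_ba, n_ab)
    return n_agg * math.log(p) + n_dis * math.log(1 - p)


def mle_order(n_ab, n_ba, p):
    """Maximum likelihood order; exact ties (n_ab == n_ba) are reported as b > a."""
    if log_likelihood(True, n_ab, n_ba, p) > log_likelihood(False, n_ab, n_ba, p):
        return "a>b"
    return "b>a"


for p in (0.55, 0.7, 0.95):
    print(p, mle_order(60, 40, p))   # a>b every time: the majority order
```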
The result of this analysis can be captured in the following theorem:
Theorem 13.13 (Condorcet Jury Theorem). The majority rule on two alternatives is the
maximum likelihood estimator of the true social preference order given identically, independently informed agents with probability p > 0.5 of being correct.
This is named after Condorcet, who first proposed this probabilistic viewpoint on the role
of majority decision rules in social choice, and was interested in developing non-axiomatic
approaches to social choice.
13.2.2 Handling Multiple Alternatives
The general approach that we adopt for statistical rank aggregation is illustrated in Figure 13.8. We start with a description of the data, then define a model, then estimate the
parameters of the model (estimation), and finally reason about the model that has been fit
to the data in order to determine an aggregate rank order (inference).
The choice of model is important. Models vary in regard to the assumptions that are
made about the data (e.g., independence properties) and also lead to estimation problems
with different degrees of tractability. We work with two fundamentally different kinds of
models, each of which makes a different commitment about the data:
Figure 13.8: The general approach to statistical rank aggregation.
• Rank-order models. A rank-order model defines the probability for each possible total
order on the set of alternatives.
• Pairwise-comparison models. A pairwise-comparison model defines, for every pair of
alternatives, the probability for each of the two possible outcomes that can occur when
comparing the pair.
In the first model, the objects that are independently sampled according to the model
are total rank orders. In the second model, the objects that are independently sampled
according to the model are pairwise comparisons.
Rank-order models are appropriate, for example, if the domain is car racing and the
same drivers compete against each other every week, each race generating a rank order on
the drivers. Rank-order models are also appropriate in social choice domains. Pairwise-comparison models are appropriate, for example, if the domain consists of teams in pairwise
match-ups against each other, or food critics asked for their preference between different
dishes, with no requirement that the pairwise responses correspond to a total order and the
possibility of preference cycles.
Continuing, we define two rank-order models and one pairwise-comparison model.
Whereas the estimation problem in the first rank-order model is intractable, it can scale
much more successfully in the second rank-order model.
13.2.3 A First Rank-Order Model: The Mallows Model
In this setting, the data are a sequence of independent observations of total rank orders. The
model that we introduce generalizes the Condorcet model for two alternatives to multiple
alternatives.
Definition 13.16 (Mallows). The Mallows model for the rank order on a set of alternatives
A has two parameters:
(i) a rank order $\succ_0$ on alternatives, and
(ii) a probability $p > 0.5$.
The model defines a distribution on rank orders. The probability of rank order $\succ$
given parameters $\theta = (\succ_0, p)$, is
$$\Pr_\theta(\succ) = \frac{1}{Z_1}\, p^{n_{agg}(\succ, \succ_0)} (1-p)^{n_{dis}(\succ, \succ_0)}, \qquad \succ \in \mathcal{P} \qquad (13.7)$$
where $Z_1$ is a normalization constant that ensures that the probabilities sum to one, and
$n_{agg}(\succ, \succ_0)$ and $n_{dis}(\succ, \succ_0)$ are the number of agreements and disagreements, respectively,
over all unordered pairs between rank orders $\succ$ and $\succ_0$.
One way to interpret the model is that a sequential process generates a rank order. For
every unordered pair of alternatives, j and k, we sample a comparison that agrees with
$\succ_0$ with probability $p$ and disagrees otherwise. If this process results in no cycles then it
generates a rank order. Otherwise, the pairwise comparisons are discarded and the sampling
process repeats. The probability that this process generates rank order $\succ$ is proportional
to $p^{n_{agg}(\succ, \succ_0)} (1-p)^{n_{dis}(\succ, \succ_0)}$, since all pairs must agree with $\succ$.
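The rejection-sampling interpretation above can be simulated directly. This Python sketch (hypothetical names, fixed random seed) draws rank orders by sampling one comparison per unordered pair and discarding cyclic outcomes, then compares the empirical frequencies with the normalized weights $p^{n_{agg}}(1-p)^{n_{dis}}$ for the parameters used in Example 13.13. It relies on the fact that a complete set of pairwise comparisons is acyclic exactly when every alternative has a different number of wins.

```python
import random
from itertools import combinations, permutations


def sample_mallows(truth, p, rng):
    """Rejection sampler from the text: each unordered pair agrees with `truth`
    with probability p; restart whenever the comparisons contain a cycle."""
    while True:
        wins = {alt: 0 for alt in truth}
        for j, k in combinations(truth, 2):      # j precedes k in truth
            wins[j if rng.random() < p else k] += 1
        # Acyclic exactly when all win counts differ (0, ..., m-1); the rank
        # order is then given by decreasing win count.
        if len(set(wins.values())) == len(truth):
            return tuple(sorted(truth, key=lambda alt: -wins[alt]))


def weight(order, truth, p):
    """Unnormalized probability p^n_agg (1-p)^n_dis of `order`."""
    pos = {alt: i for i, alt in enumerate(order)}
    w = 1.0
    for j, k in combinations(truth, 2):
        w *= p if pos[j] < pos[k] else 1 - p
    return w


rng = random.Random(0)
TRUTH, P, N = ("b", "a", "c"), 0.7, 50_000
counts = {}
for _ in range(N):
    order = sample_mallows(TRUTH, P, rng)
    counts[order] = counts.get(order, 0) + 1

Z = sum(weight(o, TRUTH, P) for o in permutations(TRUTH))
for o in permutations(TRUTH):
    # empirical frequency next to the exact Mallows probability
    print(o, round(counts.get(o, 0) / N, 3), round(weight(o, TRUTH, P) / Z, 3))
```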
Example 13.13. Suppose there are three alternatives {a, b, c} and the parameters are $\succ_0$:
b, a, c and $p = 0.7$. The Mallows model defines the following probability distribution on
rank orders:
$$\begin{aligned}
\Pr_\theta(a \succ b \succ c) &= \tfrac{1}{Z_1}(1-0.7)(0.7)(0.7) \\
\Pr_\theta(b \succ a \succ c) &= \tfrac{1}{Z_1}(0.7)(0.7)(0.7) \\
\Pr_\theta(c \succ a \succ b) &= \tfrac{1}{Z_1}(1-0.7)(1-0.7)(1-0.7) \\
\Pr_\theta(a \succ c \succ b) &= \tfrac{1}{Z_1}(1-0.7)(0.7)(1-0.7) \\
\Pr_\theta(b \succ c \succ a) &= \tfrac{1}{Z_1}(0.7)(1-0.7)(0.7) \\
\Pr_\theta(c \succ b \succ a) &= \tfrac{1}{Z_1}(0.7)(1-0.7)(1-0.7)
\end{aligned}$$
For example, the first expression reflects agreement between the order and $\succ_0$ on pairs
(a, c) and (b, c) and disagreement on pair (a, b). Straightforward calculations reveal that the
normalization constant is $Z_1 = 0.79$. Based on this, example probabilities are $\Pr_\theta(a \succ b \succ c) \approx 0.186$ and $\Pr_\theta(b \succ a \succ c) \approx 0.434$.
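The normalization constant and the two probabilities in Example 13.13 can be checked by enumerating all six rank orders. A minimal sketch; the helper name `n_agree` is ours.

```python
from itertools import combinations, permutations


def n_agree(order, truth):
    """Number of unordered pairs on which `order` agrees with `truth`."""
    pos = {alt: i for i, alt in enumerate(order)}
    return sum(pos[j] < pos[k] for j, k in combinations(truth, 2))


TRUTH, P = ("b", "a", "c"), 0.7
N_PAIRS = len(list(combinations(TRUTH, 2)))           # 3 unordered pairs

# Unnormalized Mallows weights p^n_agg (1 - p)^n_dis for all six rank orders.
weights = {o: P ** n_agree(o, TRUTH) * (1 - P) ** (N_PAIRS - n_agree(o, TRUTH))
           for o in permutations(TRUTH)}
Z1 = sum(weights.values())

print(round(Z1, 2))                                   # 0.79
print(round(weights[("a", "b", "c")] / Z1, 3))        # 0.186
print(round(weights[("b", "a", "c")] / Z1, 3))        # 0.434
```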
Now that we have described the data and defined a model, let’s turn to the problem of
estimation. For this, it is convenient to express the probability of a rank order as,
$$\Pr(\succ \mid \theta) = \frac{1}{Z_1} \prod_{(j,k) \in \mathit{Pairs}} p^{y_{jk}(\succ, \succ_0)} (1-p)^{1 - y_{jk}(\succ, \succ_0)}, \qquad (13.8)$$
where $y_{jk}(\succ, \succ_0) \in \{0, 1\}$ indicates whether orders $\succ$ and $\succ_0$ agree on the pairwise comparison of
$j, k \in A$, and $\mathit{Pairs}$ is the set of unordered pairs on alternatives $A$.
The data consist of a sequence $\succ_1, \succ_2, \ldots, \succ_n$ of independent observations, and we have:
$$\Pr_\theta(D) = \prod_{i=1}^{n} \Pr_\theta(\succ_i) = \frac{1}{Z_1^{\,n}} \prod_{(j,k) \in \mathit{Pairs}} p^{n_{agg}(j,k,\succ_0)} (1-p)^{n_{dis}(j,k,\succ_0)}, \qquad (13.9)$$
where $n_{agg}(j, k, \succ_0)$ and $n_{dis}(j, k, \succ_0)$ count agreements and disagreements, just as when
we defined the Kemeny rule. Looking to maximize likelihood (allowing us to drop the
normalization constant, which does not depend on $\succ_0$), and taking the log, we want parameters $\theta$ to maximize:
$$LL(\theta; D) = \sum_{(j,k) \in \mathit{Pairs}} \left( n_{agg}(j, k, \succ_0) \ln(p) + n_{dis}(j, k, \succ_0) \ln(1-p) \right), \qquad (13.10)$$
where $LL(\theta; D)$ denotes the log-likelihood of the data.
Now we see an interesting connection to the Kemeny rule:
Theorem 13.14. The Kemeny rule is the maximum-likelihood estimator for the rank order
parameter $\succ_0$ of the Mallows model.
Proof. For any $p > 0.5$, we have $\ln(p) > \ln(1-p)$, and the log-likelihood is maximized by the
rank order $\succ_0$ that maximizes the number of pairwise agreements, or equivalently minimizes
the number of pairwise disagreements. The Kemeny rule selects a social rank order that
minimizes the total Kendall tau rank distance to each rank order in the preference profile,
and Kendall tau distance is the total number of pairwise disagreements.
The fourth step in the statistical approach is inference, and we can again return as the aggregate rank the most probable rank order given the estimated parameters. We don’t need to estimate p because the mode is $\succ_0$ for any p > 0.5; from (13.7) we see that rank order $\succ = \succ_0$ maximizes the probability because it agrees with $\succ_0$ on all pairs.
In addition to providing a statistical justification for the Kemeny rule, this connection
provides axiomatic support for the Mallows model.
Example 13.14. Suppose the data follow the earlier profile in Example 13.5, with three alternatives and sixty rank orders (see also Figure 13.3). Counting the total number of pairwise agreements and disagreements in the data, for rank order parameter $a \succ_0 b \succ_0 c$ we have,

$$\Pr_\theta(D) = \frac{1}{Z^{60}}\, p^{33}(1-p)^{27}\, p^{42}(1-p)^{18}\, p^{25}(1-p)^{35}.$$

For example, for pair a and b, 33 data points agree with $a \succ_0 b$ and 27 disagree, and for pair b and c, 42 data points agree with $b \succ_0 c$ and 18 disagree.
In comparison, rank order parameter $b \succ_0 c \succ_0 a$ yields,

$$\Pr_\theta(D) = \frac{1}{Z^{60}}\, p^{27}(1-p)^{33}\, p^{42}(1-p)^{18}\, p^{35}(1-p)^{25}.$$

For example, for pair a and b there are now 27 data points that agree with $b \succ_0 a$, and 33 that disagree. We know from the analysis in Example 13.5 that rank order $b \succ_0 c \succ_0 a$ is the maximum likelihood estimator.
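Theorem 13.14 suggests a direct, if exponential-time, way to compute the MLE of $\succ_0$ for small m: count the pairwise agreements with the profile for every candidate order and keep the best. The following Python sketch does this for the profile of Example 13.5; the function names are hypothetical, and the loop over all m! orders is exactly the combinatorial search that makes the problem hard for large m.

```python
from itertools import combinations, permutations

# Preference profile of Example 13.5 as (count, rank order) pairs, best first.
PROFILE = [
    (23, ("a", "b", "c")),
    (17, ("b", "c", "a")),
    (10, ("c", "a", "b")),
    (8, ("c", "b", "a")),
    (2, ("b", "a", "c")),
]


def agreements(candidate, profile):
    """Total pairwise agreements between `candidate` and the profile, summed over pairs."""
    total = 0
    for j, k in combinations(candidate, 2):       # j is ranked above k in candidate
        for count, order in profile:
            if order.index(j) < order.index(k):   # these voters agree on (j, k)
                total += count
    return total


def mallows_mle(profile):
    """Brute-force MLE of the Mallows rank order parameter for any p > 0.5.

    By Theorem 13.14 this is the order with the most pairwise agreements (the
    Kemeny ranking). It enumerates all m! orders, so it only suits small m.
    """
    alternatives = profile[0][1]
    return max(permutations(alternatives), key=lambda c: agreements(c, profile))


print(agreements(("a", "b", "c"), PROFILE))   # 100 = 33 + 42 + 25
print(agreements(("b", "c", "a"), PROFILE))   # 104 = 27 + 42 + 35
print(mallows_mle(PROFILE))                   # ('b', 'c', 'a')
```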
Unfortunately, this connection back to Kemeny also highlights a severe difficulty with
statistical estimation in the Mallows model. Because computing the Kemeny rule is NP-hard, solving the MLE problem is NP-hard as well. This makes the Mallows model unsuitable (at least with exact estimation) for problems with a large number of alternatives.
The difficulty stems from the way the parameter space is set up. In particular, the rank order parameter can be any permutation of the alternatives, which makes the MLE problem combinatorial.
13.2.4 A Pairwise-Comparison Model: The Bradley-Terry Model
One way to think about the Mallows model is that the probability of the pairwise order for every pair is either p or 1 − p, depending on whether it agrees with the ground truth rank order. The Bradley-Terry model relaxes this, allowing the probability when comparing two alternatives to depend in a simple way on the scores of each alternative, with these scores providing a new parameterization.
We can assume that the true ranking is implied by the order of scores (higher being
better), which removes the combinatorial parameter of the model and makes the estimation
problem easier to solve.
13.2 Rank Aggregation: A Statistical Approach
The data are a sequence of independent pairwise comparisons for different pairs of alternatives. Some pairs of alternatives may be completely absent from the data. The model is as follows:
Definition 13.17 (Bradley-Terry). The Bradley-Terry model for pairwise comparisons on a set of alternatives A = {1, ..., m} has m parameters:
• a score $\gamma_j \in \mathbb{R}_+$ for each alternative j.
The parameters are normalized so that $\sum_j \gamma_j = 1$. The model defines, for any pair of alternatives $j, k \in A$, $j \neq k$, the probability that alternative j is preferred to k, given parameters $\theta = (\gamma_1, \ldots, \gamma_m)$, as

$$\Pr_\theta(j \succ k) = \frac{\gamma_j}{\gamma_j + \gamma_k}. \qquad (13.11)$$

The score $\gamma_j$ for alternative j quantifies how successful the alternative is, with alternatives with high scores more likely to defeat other alternatives in pairwise comparisons.
Let’s turn now to the problem of estimation. It is convenient to divide the data into data for each pair of alternatives. For alternatives j and k, data $D_{jk}$ are a sequence (perhaps empty) of $n_{jk}$ pairwise comparisons (where $n_{jj} = 0$ by convention).
Adopting the MLE approach, the probability of data $D_{jk}$, for any pair of alternatives j and k, and given that observations are independent, is

$$\Pr_\theta(D_{jk}) = \left(\frac{\gamma_j}{\gamma_j + \gamma_k}\right)^{\#(j,k)} \left(\frac{\gamma_k}{\gamma_j + \gamma_k}\right)^{\#(k,j)}, \qquad (13.12)$$
where #(j, k) is the number of times j defeats k. Given this, and recognizing that the observations for one pair are independent of those from another pair, we have:

$$\Pr_\theta(D) = \prod_{j=1}^{m} \prod_{k=j}^{m} \Pr_\theta(D_{jk}) = \prod_{j=1}^{m} \prod_{k=1}^{m} \left(\frac{\gamma_j}{\gamma_j + \gamma_k}\right)^{\#(j,k)}, \qquad (13.13)$$

where $\Pr_\theta(D_{jj}) = 1$ and $\#(j,j) = 0$ by convention. Looking to maximize the likelihood, and taking logs, we want parameters θ to maximize:

$$LL(\theta; D) = \sum_{j=1}^{m} \sum_{k=1}^{m} \big( \#(j,k) \ln(\gamma_j) - \#(j,k) \ln(\gamma_j + \gamma_k) \big). \qquad (13.14)$$

Recall that we also require $\sum_j \gamma_j = 1$.
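The log-likelihood (13.14) is straightforward to evaluate from a table of win counts. The Python sketch below assumes the counts are stored in a dict keyed by (winner, loser); the data are the pairwise counts used later in Example 13.15, and the "reported" scores are the estimates given there.

```python
import math


def bt_log_likelihood(scores, wins):
    """Bradley-Terry log-likelihood (13.14).

    scores: dict mapping each alternative j to its score gamma_j > 0
    wins:   dict mapping (j, k) to #(j, k), the number of times j defeated k
    """
    ll = 0.0
    for (j, k), count in wins.items():
        ll += count * (math.log(scores[j]) - math.log(scores[j] + scores[k]))
    return ll


# Pairwise win counts #(j, k) from Example 13.15.
WINS = {("a", "b"): 33, ("b", "a"): 27, ("a", "c"): 25,
        ("c", "a"): 35, ("b", "c"): 42, ("c", "b"): 18}

uniform = {"a": 1 / 3, "b": 1 / 3, "c": 1 / 3}
reported_mle = {"a": 0.315, "b": 0.403, "c": 0.282}
print(bt_log_likelihood(uniform, WINS) < bt_log_likelihood(reported_mle, WINS))  # True
```

Because $\gamma_j/(\gamma_j+\gamma_k)$ is unchanged when all scores are multiplied by the same constant, so is the likelihood; this is why the normalization $\sum_j \gamma_j = 1$ is imposed.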
One question that arises in solving the MLE problem is whether there is a unique maximizer. This question has a nice answer, which we capture through the following property:
Definition 13.18 (All-pairs paths property). Consider a directed graph with an edge from j to k if there is some comparison in the data where j is preferred to k. The all-pairs paths property requires that, for all j and k, there is a directed path from j to k in the graph.
For a sports setting, we can think about this as requiring that each team has a “win path” to every other team (team a has beaten a team that has beaten a team that...). It can be shown that the all-pairs paths property is necessary and sufficient for the existence of a unique maximizer; see the chapter notes.

Algorithm 13.1: The MM algorithm for computing the maximum likelihood estimator of scores in the Bradley-Terry model.
Input: Pairwise comparison data on m alternatives, convergence tolerance ε > 0.
begin
    $n_{jk}$ := number of comparisons between j and k
    #(j) := total number of wins by j against other alternatives
    Initialize t := 1, scores $\gamma^{(1)} := (\frac{1}{m}, \ldots, \frac{1}{m})$
    bool converged := false
    while not converged do
        foreach alternative j ∈ {1, ..., m} do
            $\gamma_j^{(t+1)} := \#(j) \Big( \sum_{k \neq j} \frac{n_{jk}}{\gamma_j^{(t)} + \gamma_k^{(t)}} \Big)^{-1}$
        renormalize scores, so that $\sum_j \gamma_j^{(t+1)} = 1$
        if $\| \gamma^{(t+1)} - \gamma^{(t)} \|_2 < \epsilon$ then
            converged := true
        t := t + 1
Output: Scores $(\gamma_1^{(t)}, \ldots, \gamma_m^{(t)})$
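Checking the all-pairs paths property of Definition 13.18 amounts to checking that the directed "win" graph is strongly connected. A minimal Python sketch, assuming the comparisons are given as (winner, loser) pairs (the function name is ours), runs a depth-first search from every alternative:

```python
from collections import defaultdict


def has_all_pairs_paths(alternatives, comparisons):
    """Check the all-pairs paths property of Definition 13.18.

    comparisons: iterable of (winner, loser) pairs; each adds an edge
    winner -> loser. The property holds iff every alternative can reach every
    other alternative along directed edges, i.e., the graph is strongly connected.
    """
    edges = defaultdict(set)
    for winner, loser in comparisons:
        edges[winner].add(loser)

    def reachable(start):
        """Depth-first search: all vertices reachable from `start`."""
        seen, stack = {start}, [start]
        while stack:
            for nxt in edges[stack.pop()]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen

    everyone = set(alternatives)
    return all(reachable(j) >= everyone for j in alternatives)


# A cycle of wins a > b > c > a satisfies the property ...
print(has_all_pairs_paths("abc", [("a", "b"), ("b", "c"), ("c", "a")]))   # True
# ... but here c never wins, so no directed path leaves c.
print(has_all_pairs_paths("abc", [("a", "b"), ("b", "c"), ("a", "c")]))   # False
```

In the second data set c has no wins, so the likelihood keeps increasing as $\gamma_c$ shrinks toward zero and there is no maximizer with all scores positive.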
This MLE problem for Bradley-Terry is a lot easier to solve in practice than the estimation problem in the Mallows model. The intuition for this is that it adopts a real-valued parameter space (scores) rather than a combinatorial parameter space (rank order $\succ_0$).
Solutions can be obtained through gradient-based optimization methods, such as Newton-Raphson, or other iterative approaches. The MM algorithm is an example of an effective iterative method for solving the problem. From some initial score vector, the MM algorithm iterates through simple update steps until the change from one iteration to the next is within a threshold. See Algorithm 13.1 for the details. When the all-pairs paths property holds, the MM algorithm is guaranteed to converge to the unique maximum likelihood estimator for the scores; see the chapter notes.
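Algorithm 13.1 translates almost line by line into Python. The sketch below is one possible rendering, under the assumption that win counts are stored in a dict keyed by (winner, loser) and that the all-pairs paths property holds (so no denominator is zero); the tolerance and iteration cap are our choices. Run on the counts of Example 13.15, it recovers the scores reported there.

```python
import math


def bradley_terry_mm(alternatives, wins, tol=1e-10, max_iter=10_000):
    """MM iteration for Bradley-Terry scores, following the steps of Algorithm 13.1.

    alternatives: sequence of alternative labels
    wins: dict mapping (j, k) to #(j, k), the number of times j defeated k
    Assumes the all-pairs paths property holds, so that every alternative has
    at least one comparison and the maximizer is unique.
    """
    m = len(alternatives)
    # n_jk: comparisons between j and k; total_wins[j]: #(j), the wins of j.
    n = {(j, k): wins.get((j, k), 0) + wins.get((k, j), 0)
         for j in alternatives for k in alternatives if j != k}
    total_wins = {j: sum(wins.get((j, k), 0) for k in alternatives if k != j)
                  for j in alternatives}
    gamma = {j: 1.0 / m for j in alternatives}         # uniform initial scores
    for _ in range(max_iter):
        new = {}
        for j in alternatives:
            denom = sum(n[j, k] / (gamma[j] + gamma[k])
                        for k in alternatives if k != j)
            new[j] = total_wins[j] / denom
        total = sum(new.values())
        new = {j: g / total for j, g in new.items()}  # renormalize to sum to one
        change = math.sqrt(sum((new[j] - gamma[j]) ** 2 for j in alternatives))
        gamma = new
        if change < tol:                               # Euclidean-norm stopping rule
            break
    return gamma


WINS = {("a", "b"): 33, ("b", "a"): 27, ("a", "c"): 25,
        ("c", "a"): 35, ("b", "c"): 42, ("c", "b"): 18}
scores = bradley_terry_mm(("a", "b", "c"), WINS)
print({j: round(g, 3) for j, g in scores.items()})
# approximately {'a': 0.315, 'b': 0.403, 'c': 0.282}, as in Example 13.15
```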
Now that we have estimated the parameters, we turn to the inference step. Because the
Bradley-Terry model does not define distributions on rank-orders, we cannot return the
mode of the distribution as the aggregate rank order. A sensible approach is to return the alternatives in order of decreasing score. Exercise 13.7 shows that there is a simple frequency-based justification for this approach. The estimated scores are also directly useful in some
applications, in that they provide information about the intensity of preference for each
alternative.
Example 13.15. Let’s consider a pairwise comparison dataset that mirrors the total pairwise comparisons generated by the preference profile in Example 13.5. In particular, the
total count of each pairwise comparison is:

                 wins
               a     b     c
  loses   a    -     27    35
          b    33    -     18
          c    25    42    -

(Each entry is the number of times the column alternative defeated the row alternative; for example, a defeated b 33 times.)
Either by using the iterative MM algorithm, or by direct consideration of the various possible values for the scores $\gamma_a, \gamma_b, \gamma_c$ and (13.14), the maximum likelihood estimators for the scores of the Bradley-Terry model are:

$$\gamma_a = 0.315, \quad \gamma_b = 0.403, \quad \gamma_c = 0.282$$
Just to write out part of the log-likelihood expression, these scores maximize,

$$LL(\theta; D) = \big(33 \ln(\gamma_a) - 33 \ln(\gamma_a + \gamma_b)\big) + \big(25 \ln(\gamma_a) - 25 \ln(\gamma_a + \gamma_c)\big) + \ldots + \big(18 \ln(\gamma_c) - 18 \ln(\gamma_c + \gamma_b)\big) + \ldots,$$

where the three pairs of terms shown are for a defeating b, a defeating c and c defeating b, respectively.
Based on these scores, the inference step would return aggregate rank $b \succ_R a \succ_R c$. This disagrees with the ranking $b \succ_R c \succ_R a$ generated by applying MLE to the Mallows model, although it agrees on the top choice (and thus with the top choice of the Kemeny rule, and also of the Schulze rule). As a social ranking function, it is apparent that Bradley-Terry does not satisfy the Condorcet criterion.
One interpretation for the difference is that the Bradley-Terry model accounts for a actually defeating b (the top-ranked alternative) in its pairwise match-ups, while b soundly defeats c. This leads to better agreement with the data by having $\gamma_a$ be closer to $\gamma_b$ than $\gamma_c$ is to $\gamma_b$, while still having $\gamma_a$ close to $\gamma_c$ in order to explain reasonably well the results of their own pairwise match-ups.
Generalized Method of Moments Estimation. Generally speaking, methods of estimation
vary along multiple dimensions, including in regard to their conceptual simplicity, and
statistical and computational properties.
The generalized method of moments (GMM) is another widely used estimation method.
Compared to MLE, GMM tends to be more tractable and more conceptually
straightforward; in fact, GMM is sometimes used to initialize search algorithms that are
used to solve the MLE problem. Our motivation for introducing GMM is that it leads to
an estimation algorithm for Bradley-Terry with a natural interpretation in terms of a graph
representing the decisiveness with which alternatives defeat each other.
The idea in GMM is to specify one or more moment conditions, which are functions
of model parameters and statistics on the data, and then find parameters that satisfy, or
otherwise approximately satisfy, the conditions. The conditions are specified so that they
hold in expectation for the true model parameters, when the data are generated according
to the model.
For an elementary example, suppose that the data are n samples $x_1, \ldots, x_n$, distributed according to a distribution with unknown mean parameter µ. A simple moment condition is:

$$\Big(\sum_{i=1}^{n} x_i\Big) - n\mu = 0. \qquad (13.15)$$

We can check that the expected value of the function on the left-hand side is zero, by observing that $E[\sum_i x_i] = n\mu$ (here we interpret $x_i$ as a random variable, representing the result of the ith sample). Given the data, the method of moments would estimate parameter $\mu := \frac{1}{n} \sum_{i=1}^{n} x_i$; i.e., the sample mean.
In our setting of pairwise-comparison data, a typical moment condition is:

$$\#(j,k) \cdot \Pr_\theta(k \succ j) - \#(k,j) \cdot \Pr_\theta(j \succ k) = 0, \qquad (13.16)$$
where the statistics #(j, k) and #(k, j) count the number of pairwise wins in the data, as
defined earlier, and the probabilities depend on the score parameters of the Bradley-Terry
model.
The expected value of the function on the left-hand side is zero because $E[\#(j,k)] = n_{jk} \Pr_\theta(j \succ k)$ and $E[\#(k,j)] = n_{jk} \Pr_\theta(k \succ j)$ for $n_{jk}$ comparisons in the data (interpreting the counts as random variables).
Having expressed multiple moment conditions, GMM finds parameters that satisfy all
conditions simultaneously, or otherwise minimize an aggregate measure of the amount of
violation. The chapter notes provide some introductory references to the general approach.
For the particular case of Bradley-Terry, GMM leads to a particularly simple and natural algorithm. For this, we build a weighted pairwise defeat graph. The vertices in the graph are alternatives. First, introduce a directed edge from j to k, for every pair of alternatives, if j is ever defeated by k in the data (note that the convention for edge direction is reversed from elsewhere in the chapter). Let $d_{\max}$ denote the maximum out-degree in this graph. Define the weight on an edge from j to k as:

$$w'_{j,k} = \frac{1}{d_{\max}} \left( \frac{\#(k,j)}{n_{jk}} \right), \qquad (13.17)$$

so that it depends on the fraction of times that alternative k has defeated j in pairwise match-ups. In addition, if $w'_{j,j} = 1 - \sum_{k \neq j} w'_{j,k} > 0$, then introduce a self-edge from j to j with weight $w'_{j,j}$.
Example 13.16. Continuing from Example 13.15, Figure 13.9 illustrates the weighted pairwise defeat graph. We have $d_{\max} = 2$ because every alternative is defeated by two other alternatives. For alternative a, the weights are:

$$w'_{a,b} = \frac{1}{2} \cdot \frac{27}{60}, \qquad w'_{a,c} = \frac{1}{2} \cdot \frac{35}{60}, \qquad w'_{a,a} = 1 - w'_{a,b} - w'_{a,c} = \frac{29}{60}.$$
For example, the weight from a to b depends on the fraction of times that b defeats a in
the pairwise match-ups between a and b.
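The construction of the defeat graph in (13.17) can be sketched in Python as follows. The function name is ours, win counts are assumed to be stored in a dict keyed by (winner, loser), and exact fractions are used so that the weights can be compared directly with Example 13.16.

```python
from fractions import Fraction


def defeat_graph_weights(alternatives, wins):
    """Edge weights w'_{j,k} of the weighted pairwise defeat graph, as in (13.17).

    wins: dict mapping (j, k) to #(j, k), the number of times j defeated k.
    There is an edge j -> k whenever k has defeated j at least once. Weights are
    exact fractions; w[j, j] holds the self-edge weight (zero means no self-edge).
    """
    def n(j, k):                         # n_jk: comparisons between j and k
        return wins.get((j, k), 0) + wins.get((k, j), 0)

    out = {j: [k for k in alternatives if k != j and wins.get((k, j), 0) > 0]
           for j in alternatives}
    d_max = max(len(ks) for ks in out.values())   # maximum out-degree
    w = {}
    for j in alternatives:
        for k in out[j]:
            w[j, k] = Fraction(wins[k, j], d_max * n(j, k))
        w[j, j] = 1 - sum(w[j, k] for k in out[j])
    return w


WINS = {("a", "b"): 33, ("b", "a"): 27, ("a", "c"): 25,
        ("c", "a"): 35, ("b", "c"): 42, ("c", "b"): 18}
w = defeat_graph_weights(("a", "b", "c"), WINS)
print(w["a", "b"], w["a", "c"], w["a", "a"])   # 9/40 7/24 29/60
```

Here 9/40 = (1/2)(27/60) and 7/24 = (1/2)(35/60), matching the weights above; by construction the out-weights of every vertex (including its self-edge) sum to one.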
We now define a random walk on this pairwise defeat graph. Imagine that you observe
pairwise match-ups between alternatives (we’ll call them players) according to the following
process:
Figure 13.9: An example pairwise defeat graph, used for parameter estimation in the
Bradley-Terry model.
• Sample some player (= vertex) uniformly at random.
• At each step, stay at the same player j with probability $w'_{j,j}$, and otherwise move to another player k (= adjacent vertex) in proportion to the fraction of times k has defeated j in the data.
Equivalently, the random walk proceeds by following an out-edge (j, k) from vertex j to some vertex k according to probability $w'_{j,k}$. Note that the walk follows edges to alternatives that tend to win in pairwise match-ups.
Consider following multiple such traces of this random walk, and looking at the distribution on the vertices it lands on after a large number of steps (ultimately, in the limit of very many steps). This distribution, the so-called stationary distribution of the walk, is unique when the all-pairs paths property holds. Moreover, this distribution represents the solution to a set of moment conditions defined on statistics of the data. (See the chapter notes.)
The RankCentrality algorithm, which computes the stationary distribution on this pairwise defeat graph, is summarized in Algorithm 13.2. From a computational perspective,
there is a close connection with the scores for web pages that are computed by the PageRank
algorithm, and the same computational approaches that are used for large scale PageRank
computations can be adopted here (albeit on a different graph); we refer the reader to
Chapter 21.
Example 13.17. Let’s return now to the example, and the pairwise defeat graph (Figure 13.9). By following the power iteration method, which is described in Chapter 21 for
computing the stationary distribution in the context of the PageRank algorithm, we obtain
the following probabilities for visiting each vertex (and thus estimated scores):
$$\gamma_a = 0.324, \quad \gamma_b = 0.399, \quad \gamma_c = 0.277$$
Algorithm 13.2: The RankCentrality algorithm for estimating scores in the Bradley-Terry model.
Input: Pairwise comparison data on m alternatives.
begin
    Construct the pairwise defeat graph G
    Compute the stationary distribution of a random walk on G
Output: Scores $(\gamma_1, \ldots, \gamma_m)$
The estimates are close to the maximum likelihood estimators of the scores, and generate the same aggregate rank of $b \succ_R a \succ_R c$.
To understand what it means for this to be a stationary distribution, observe that the probability of being at vertex a in the next step of the random walk, given that the scores represent the probability of being at each vertex in the current step, is

$$\gamma_a \cdot w'_{a,a} + \gamma_b \cdot w'_{b,a} + \gamma_c \cdot w'_{c,a},$$

which considers the possibility of being at a and staying at a, being at b and moving to a, and being at c and moving to a. For the estimated scores (= probabilities) and the weights in the graph, we can check that this evaluates to $\gamma_a$. This stationary distribution represents a kind of balance between the different probabilistic transitions between vertices.
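Algorithm 13.2 leaves the computation of the stationary distribution abstract. One simple option, in the spirit of the power iteration method mentioned in Example 13.17, is to apply the transition matrix repeatedly to a starting distribution until it stops changing. The Python sketch below (function name and tolerance are our choices) does this for the weights of Example 13.16 and reproduces both the scores of Example 13.17 and the balance condition just described.

```python
def stationary_distribution(P, tol=1e-12, max_iter=100_000):
    """Power iteration for the stationary distribution of a random walk.

    P is a row-stochastic matrix given as a list of rows: P[j][k] is the
    probability of moving from vertex j to vertex k in one step.
    """
    m = len(P)
    pi = [1.0 / m] * m                     # start from the uniform distribution
    for _ in range(max_iter):
        nxt = [sum(pi[j] * P[j][k] for j in range(m)) for k in range(m)]
        if max(abs(x - y) for x, y in zip(nxt, pi)) < tol:
            return nxt
        pi = nxt
    return pi


# Weights w'_{j,k} of the defeat graph in Example 13.16, vertices ordered a, b, c.
# With d_max = 2 and n_jk = 60, each off-diagonal weight is #(k, j) / 120;
# the diagonal holds the self-edge weights.
P = [[58 / 120, 27 / 120, 35 / 120],
     [33 / 120, 69 / 120, 18 / 120],
     [25 / 120, 42 / 120, 53 / 120]]
pi = stationary_distribution(P)
print([round(x, 3) for x in pi])       # [0.324, 0.399, 0.277]
```

The self-edges make the walk aperiodic, which is one reason plain power iteration converges here; for large graphs the PageRank-style methods referenced in Chapter 21 apply.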
Exercise 13.8 includes a computational study that compares the performance of the MLE
and a GMM (specifically, RankCentrality) approach on a real-world data set.
Interpretation of the Bradley-Terry model. By thinking about the random walk used
by the RankCentrality method, we can make some simple observations about when an
alternative will tend to receive a higher score in the Bradley-Terry model:
• it defeats many different alternatives (more in-edges in the graph);
• it defeats a given alternative more frequently (higher weight on in-edges); and
• it defeats stronger alternatives (in this case, the random walk visits the defeated
alternative more often, and thus visits this alternative more often.)
The next section presents a model that generalizes the score-based approach of the
Bradley-Terry model to apply to a distribution on rank orders.
13.2.5 A Second Rank-Based Model: The Plackett-Luce Model
In this section we present a rank-based model that retains the real-valued parameters of the
pairwise-comparison model (making the estimation problem tractable), while allowing for
new flexibility over the Mallows model in handling partial observations about a rank order.
Because this is a rank-based model, the (random) objects that are modeled in the world
are total rank orders. The flexibility comes about because we will also allow the data to
include a partial observation of a rank order.
For example, the data can include the following observation types:
• partial ranks, which are orders on subsets of alternatives, such as the order “b ≻ d ≻ g” (without specifying the rank order for other alternatives);
• top-k lists, such as “a ≻ b ≻ c are my top-3 alternatives”; and
• total rank orders.
To motivate this, we can imagine that an expert is asked for his top-10 restaurants, or a
house buyer is asked to compare some subset of the houses on the market.
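Definition 13.19 (Plackett-Luce). The Plackett-Luce model for the rank order on a set of alternatives A = {1, ..., m} has m parameters:
• a score $\gamma_j \in \mathbb{R}_+$ for each alternative j.
The parameters are normalized so that $\sum_j \gamma_j = 1$. The model defines a distribution on rank orders. Let $\succ[k]$ denote the alternative ranked in kth place by rank order $\succ \in P$. The probability of rank order $\succ \in P$ given parameters $\theta = (\gamma_1, \ldots, \gamma_m)$ is

$$\Pr_\theta(\succ) = \frac{\gamma_{\succ[1]}}{\gamma_{\succ[1]} + \cdots + \gamma_{\succ[m]}} \cdot \frac{\gamma_{\succ[2]}}{\gamma_{\succ[2]} + \cdots + \gamma_{\succ[m]}} \cdots \frac{\gamma_{\succ[m-1]}}{\gamma_{\succ[m-1]} + \gamma_{\succ[m]}} \cdot \frac{\gamma_{\succ[m]}}{\gamma_{\succ[m]}}. \qquad (13.18)$$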
A useful interpretation of the model is that there are balls of different colors in a jar, each color reflecting a different alternative. The number of balls of each color is in proportion to the scores, and there are a large number of balls so that the proportions are accurate. A
sequential process generates a rank order. The first ball that we draw is the top ranked
alternative. In the next step, we keep drawing until we select a new color, and this provides
the second alternative in the rank order. The process continues until a ball of each color
has been selected. We refer to this as a generative interpretation.
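The generative interpretation can be turned into a sampler. The Python sketch below (our own function, using the standard library's `random.Random.choices`) skips the redraws of already-seen colors and instead picks directly among the not-yet-ranked alternatives in proportion to their scores, which gives the same distribution over rank orders.

```python
import random


def sample_plackett_luce(scores, rng):
    """Draw one total rank order from Plackett-Luce by sequential sampling.

    At each step, pick one of the not-yet-ranked alternatives with probability
    proportional to its score. This matches the jar picture: redrawing until a
    new color appears picks each remaining color in proportion to its count.
    """
    remaining = list(scores)
    order = []
    while remaining:
        weights = [scores[j] for j in remaining]
        choice = rng.choices(remaining, weights=weights, k=1)[0]
        order.append(choice)
        remaining.remove(choice)
    return tuple(order)


rng = random.Random(0)                      # fixed seed for reproducibility
scores = {"a": 0.6, "b": 0.3, "c": 0.1}
samples = [sample_plackett_luce(scores, rng) for _ in range(100_000)]
freq = samples.count(("a", "b", "c")) / len(samples)
print(round(freq, 2))   # close to the exact value 0.45 computed in Example 13.18
```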
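Example 13.18. Suppose the alternatives are A = {a, b, c}, and consider Plackett-Luce with scores $\gamma_a = 0.6$, $\gamma_b = 0.3$, $\gamma_c = 0.1$. The model assigns the following probabilities to rank orders:

$$
\begin{aligned}
\Pr_\theta(a \succ' b \succ' c) &= \frac{0.6}{0.6+0.3+0.1} \cdot \frac{0.3}{0.3+0.1} \cdot \frac{0.1}{0.1} = 0.45\\
\Pr_\theta(b \succ' a \succ' c) &= \frac{0.3}{0.6+0.3+0.1} \cdot \frac{0.6}{0.6+0.1} \cdot \frac{0.1}{0.1} \approx 0.26\\
\Pr_\theta(c \succ' a \succ' b) &= \frac{0.1}{0.6+0.3+0.1} \cdot \frac{0.6}{0.6+0.3} \cdot \frac{0.3}{0.3} \approx 0.07
\end{aligned}
$$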
The generative viewpoint suggests a large urn with 600 a (aquamarine) balls, 300 b (beige)
balls, and 100 c (cobalt) balls. We can imagine that it is quite likely to draw aquamarine
first, and very unlikely to draw cobalt first.
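Equation (13.18) is a product of one "choice" factor per position. A short Python sketch (function name ours) evaluates it and reproduces Example 13.18; summing over all permutations confirms that the model defines a proper distribution.

```python
from itertools import permutations


def plackett_luce_prob(order, scores):
    """Probability (13.18) of a total rank order under Plackett-Luce."""
    prob = 1.0
    for k in range(len(order)):
        remaining = order[k:]                    # alternatives not yet placed
        prob *= scores[order[k]] / sum(scores[j] for j in remaining)
    return prob


scores = {"a": 0.6, "b": 0.3, "c": 0.1}
print(round(plackett_luce_prob(("a", "b", "c"), scores), 2))   # 0.45
print(round(plackett_luce_prob(("b", "a", "c"), scores), 2))   # 0.26
print(round(plackett_luce_prob(("c", "a", "b"), scores), 2))   # 0.07
print(round(sum(plackett_luce_prob(o, scores) for o in permutations(scores)), 6))  # 1.0
```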
Turning to estimation, we again follow the approach of MLE. For this, we need to construct expressions for the probability of each of the different kinds of observations.
First of all, we consider partial ranks, such as the observation type “rank orders on
alternatives b, d and g.” For this, we can ask “what is the probability that a total rank order $\succ' \in P$ agrees with partial rank $b \succ d \succ g$?” This is the marginal probability of $b \succ d \succ g$, and is defined as the probability that the rank order $\succ'$ is any one of the multiple rank orders that respect the rank $b \succ d \succ g$:

$$\Pr_\theta(\text{partial } b \succ d \succ g) = \sum_{\succ' \in P \,:\, b \succ' d \succ' g} \Pr_\theta(\succ'). \qquad (13.19)$$
A nice property of the Plackett-Luce model is that this marginal probability has a remarkably simple expression:
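$$\Pr_\theta(\text{partial } b \succ d \succ g) = \frac{\gamma_b}{\gamma_b + \gamma_d + \gamma_g} \cdot \frac{\gamma_d}{\gamma_d + \gamma_g} \cdot \frac{\gamma_g}{\gamma_g}. \qquad (13.20)$$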
We can check, for example, that this is a well-defined distribution on the possible orders
that result from just comparing alternatives b, d and g; in particular, the probabilities of all
possible orders on b, d and g add up to one. A proof of the correctness of this expression is
developed in Exercise 13.7.
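The marginal property in (13.20) can also be checked numerically for small m, by summing (13.18) over every total order that respects the partial rank, as in (13.19), and comparing with the closed form. The Python sketch below does this for seven alternatives a, ..., g; the scores are hypothetical values chosen only for illustration.

```python
from itertools import permutations
from math import isclose, prod


def pl_prob(order, scores):
    """Plackett-Luce probability (13.18) of `order` over the alternatives it contains."""
    return prod(scores[x] / sum(scores[y] for y in order[i:]) for i, x in enumerate(order))


def marginal_by_enumeration(partial, scores):
    """(13.19): total probability of the full orders that respect `partial`."""
    total = 0.0
    for order in permutations(scores):
        positions = [order.index(x) for x in partial]
        if positions == sorted(positions):       # partial's items appear in its order
            total += pl_prob(order, scores)
    return total


# Hypothetical scores for seven alternatives a, ..., g (not from the text).
scores = {"a": 0.25, "b": 0.2, "c": 0.15, "d": 0.12, "e": 0.1, "f": 0.1, "g": 0.08}
brute = marginal_by_enumeration(("b", "d", "g"), scores)
closed = pl_prob(("b", "d", "g"), scores)    # (13.20): Plackett-Luce on {b, d, g} alone
print(isclose(brute, closed))                # True
```

The enumeration touches all 7! = 5040 total orders, while the closed form needs only the three compared scores, which is what makes partial ranks cheap to handle in estimation.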
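There is nothing special about alternatives b, d and g. Writing $\tilde{\succ}_i$ to denote a partial rank, and observing that (13.20) is exactly the expression for a Plackett-Luce distribution on rank orders on b, d and g when the set of alternatives is A = {b, d, g}, we have:

$$\Pr_\theta(\text{partial } \tilde{\succ}_i) = \Pr_\theta(\tilde{\succ}_i), \qquad (13.21)$$

where the right-hand side is the Plackett-Luce probability of $\tilde{\succ}_i$ computed using only the alternatives that it compares.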
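Example 13.19. Let’s continue the earlier example, with alternatives a, b and c, and scores $\gamma_a = 0.6$, $\gamma_b = 0.3$, $\gamma_c = 0.1$. We have:

$$\Pr_\theta(\text{partial } a \succ c) = \frac{0.6}{0.6+0.1} \cdot \frac{0.1}{0.1} \approx 0.86, \qquad \Pr_\theta(\text{partial } c \succ b) = \frac{0.1}{0.3+0.1} \cdot \frac{0.3}{0.3} = 0.25$$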
The other observation types we consider are the top-k lists, for various values of k. Let’s
start with top-1 lists; e.g., “b is my top alternative.” From the generative interpretation of the Plackett-Luce model, we have:

$$\Pr_\theta(\text{top } b) = \frac{\gamma_b}{\sum_j \gamma_j}, \qquad (13.22)$$

since the first ball drawn from the jar has color b with probability $\gamma_b / \sum_j \gamma_j$. Top-2 observations are very similar. The probability of partial observation “b ≻ d are my top-2 alternatives” is,

$$\Pr_\theta(\text{top } b \succ d) = \frac{\gamma_b}{\sum_j \gamma_j} \cdot \frac{\gamma_d}{\sum_{j \neq b} \gamma_j}. \qquad (13.23)$$

For this, the generative process is modified to sample two balls in sequence (drawing again until the second ball has a new color). This samples the sequence of b then d with probability $\frac{\gamma_b}{\sum_j \gamma_j} \cdot \frac{\gamma_d}{\sum_{j \neq b} \gamma_j}$.
Example 13.20. Continuing with the earlier example, with alternatives a, b and c, and scores $\gamma_a = 0.6$, $\gamma_b = 0.3$, $\gamma_c = 0.1$. For top-1 probabilities, we have:

$$\text{top } a: \frac{0.6}{0.6+0.3+0.1} = 0.6; \qquad \text{top } b: \frac{0.3}{0.6+0.3+0.1} = 0.3; \qquad \text{top } c: \frac{0.1}{0.6+0.3+0.1} = 0.1$$

We can understand these easily from the generative interpretation of Plackett-Luce. For an example of a top-2 probability, we have:

$$\Pr_\theta(\text{top } a \succ c) = \frac{0.6}{0.6+0.3+0.1} \cdot \frac{0.1}{0.3+0.1} = 0.15$$

Note that $\Pr_\theta(\text{top } a \succ c) \neq \Pr_\theta(\text{partial } a \succ c)$, since the latter still allows for b to come before c. In fact, $\Pr_\theta(\text{top } a \succ c) = \Pr_\theta(a \succ c \succ b)$, given that we have only three alternatives. Introducing alternative d, and with $\gamma_a = 0.4$, $\gamma_b = 0.3$, $\gamma_c = 0.2$, $\gamma_d = 0.1$, we have

$$\Pr_\theta(\text{top } a \succ c) = \frac{0.4}{0.4+0.3+0.2+0.1} \cdot \frac{0.2}{0.3+0.2+0.1} \approx 0.133$$
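The pattern in (13.22) and (13.23) extends to top-k lists of any length: each listed alternative is chosen from those not yet listed. The following Python sketch (function name ours) implements this and reproduces the two top-2 probabilities of Example 13.20.

```python
def top_k_prob(prefix, scores):
    """Probability that the top len(prefix) places of the ranking are exactly `prefix`, in order.

    Generalizes (13.22) and (13.23): each listed alternative is chosen among the
    alternatives not yet ranked, with probability proportional to its score.
    """
    prob = 1.0
    remaining_mass = sum(scores.values())
    for x in prefix:
        prob *= scores[x] / remaining_mass
        remaining_mass -= scores[x]          # x is no longer available
    return prob


print(round(top_k_prob(("a", "c"), {"a": 0.6, "b": 0.3, "c": 0.1}), 3))            # 0.15
print(round(top_k_prob(("a", "c"), {"a": 0.4, "b": 0.3, "c": 0.2, "d": 0.1}), 3))  # 0.133
```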
We can now construct an expression for the log likelihood on each part of the data. For
this, let $D_\tau$ denote the sequence of $n_\tau$ observations (perhaps empty) of observation type τ. An example of a partial rank observation type is rank orders on alternatives {b, d, g}. An example of a top-k type is top-5 lists.
For a partial rank observation type, let $m_\tau$ denote the number of alternatives compared; e.g., $m_\tau = 3$ if τ is rank orders on {b, d, g}. Given observation $\tilde{\succ}_i$, let $\tilde{\succ}[i,k]$ denote the alternative ranked in kth place amongst the alternatives compared. From the earlier analysis, a general expression for the probability of this observation is,

$$\Pr_\theta(\text{partial } \tilde{\succ}_i) = \prod_{k=1}^{m_\tau - 1} \frac{\gamma_{\tilde{\succ}[i,k]}}{\sum_{\ell=k}^{m_\tau} \gamma_{\tilde{\succ}[i,\ell]}}. \qquad (13.24)$$

Because each observation in $D_\tau$ is independent, we can combine them and take the log in the usual way, so that the log likelihood for this part of the data is:

$$LL_1(\theta; D_\tau) = \sum_{i=1}^{n_\tau} \sum_{k=1}^{m_\tau - 1} \Big( \ln(\gamma_{\tilde{\succ}[i,k]}) - \ln\Big(\sum_{\ell=k}^{m_\tau} \gamma_{\tilde{\succ}[i,\ell]}\Big) \Big), \qquad (13.25)$$

where we adopt $LL_1(\theta; D_\tau)$ to distinguish this from the expression for another type. The same expression is valid for all partial rank observation types, and also for total ranks (with $m_\tau = m$).
Let’s now turn to the top-k observation types. Consider, for example, the data $D_\tau$ that are top-5 lists. For a single such observation, we have:

$$\Pr_\theta(\text{top } \tilde{\succ}_i) = \prod_{k=1}^{5} \frac{\gamma_{\tilde{\succ}[i,k]}}{1 - \sum_{\ell=1}^{k-1} \gamma_{\tilde{\succ}[i,\ell]}}, \qquad (13.26)$$

where the denominator follows from the earlier analysis, recognizing that the sum of all scores is one. Using independence, we can again combine them and take the log, so that the log likelihood for the top-5 data is:

$$LL_2(\theta; D_\tau) = \sum_{i=1}^{n_\tau} \sum_{k=1}^{5} \Big( \ln(\gamma_{\tilde{\succ}[i,k]}) - \ln\Big(1 - \sum_{\ell=1}^{k-1} \gamma_{\tilde{\succ}[i,\ell]}\Big) \Big). \qquad (13.27)$$
Leveraging independence across the data of different observation types, the overall MLE
problem can be formulated as maximizing the sum of the log-likelihood expressions for all
the observation types present in the data. The following example develops some intuition
for how partial observations interact with full rank observations.
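The two log-likelihood expressions (13.25) and (13.27), and their sum as the overall objective, can be sketched in Python as below. The function names are ours; each observation is assumed to be a tuple of alternatives listed best first, and `ll_top` assumes the scores sum to one, as in (13.26).

```python
import math


def ll_partial(observations, scores):
    """LL_1 (13.25): log-likelihood of partial (or total) rank observations."""
    ll = 0.0
    for ranked in observations:
        for k in range(len(ranked) - 1):       # the last factor equals 1 and is skipped
            ll += math.log(scores[ranked[k]]) - math.log(sum(scores[x] for x in ranked[k:]))
    return ll


def ll_top(observations, scores):
    """LL_2 (13.27): log-likelihood of top-k lists; assumes the scores sum to one."""
    ll = 0.0
    for ranked in observations:
        placed = 0.0                           # total score of alternatives already listed
        for x in ranked:
            ll += math.log(scores[x]) - math.log(1.0 - placed)
            placed += scores[x]
    return ll


def log_likelihood(partials, tops, scores):
    """Overall objective: sum of the log-likelihoods of the observation types."""
    return ll_partial(partials, scores) + ll_top(tops, scores)


scores = {"a": 0.6, "b": 0.3, "c": 0.1}
print(round(math.exp(ll_partial([("a", "c")], scores)), 2))   # 0.86, as in Example 13.19
print(round(math.exp(ll_top([("a", "c")], scores)), 2))       # 0.15, as in Example 13.20
```

The same tuple ("a", "c") gives different likelihoods depending on whether it is read as a partial rank or as a top-2 list, which is exactly the distinction drawn in Example 13.20.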
Example 13.21. Suppose there are three alternatives, a, b and c. First consider the following two data sets D and D', which are symmetric in that no alternative is preferred:

D:  6 @ partial a ≻ b;  6 @ partial b ≻ c;  6 @ partial c ≻ a
D': 2 @ c ≻ a ≻ b;  2 @ a ≻ c ≻ b;  2 @ a ≻ b ≻ c;  6 @ partial b ≻ c;  6 @ partial c ≻ a

In D' we replace the observation of “6 @ partial a ≻ b” with two observations of each of the three total orders consistent with the partial rank. In this case, the effect of the partial ranks is equivalent to having seen each total rank, and the maximum likelihood estimators are $\gamma_a = \gamma_b = \gamma_c = \frac{1}{3}$ for both data D and D'.
But now consider the following data sets:

D:  6 @ partial a ≻ b;  6 @ partial b ≻ c;  1 @ partial c ≻ a
D': 2 @ c ≻ a ≻ b;  2 @ a ≻ c ≻ b;  2 @ a ≻ b ≻ c;  6 @ partial b ≻ c;  1 @ partial c ≻ a

In this case, the maximum likelihood estimators are $\gamma_a \approx 0.81$, $\gamma_b \approx 0.16$, $\gamma_c \approx 0.03$ for data D, compared with $\gamma_a \approx 0.62$, $\gamma_b \approx 0.23$, $\gamma_c \approx 0.15$ when the data are D'. The observation “6 @ partial a ≻ b” makes no commitment to the underlying total ranks of these six data points beyond the requirement that a is ranked ahead of b, and in this case the rest of the data puts most weight on ranking b ahead of c, suggesting that a ≻ b ≻ c is likely. On the other hand, data D' commits to two observations of each of the three consistent ranks, and in particular sometimes ranks c ahead of a and ahead of b. The effect is to bring more balance to the estimated scores.
The structure of the log likelihood is similar to the expression for the Bradley-Terry
model, and modified versions of the MM algorithm can be used to compute the maximum
likelihood estimator of the scores in Plackett-Luce. See Exercise 13.9 and the chapter notes.
Having estimated the parameters of the model, the final step can return as the aggregate
rank the rank order with the highest probability. For the Plackett-Luce model, this is
just the order obtained from placing the alternatives in decreasing order of score. See
Exercise 13.7.
Example 13.22. We return to Example 13.5, and recall that the votes are:
23 @ a, b, c
17 @ b, c, a
10 @ c, a, b
8 @ c, b, a
2 @ b, a, c
Either by using the modified MM algorithm, or by direct consideration of the various possible values for the scores $\gamma_a, \gamma_b, \gamma_c$ and (13.25) (since we only have rank observations and no top-k observations), the maximum likelihood estimators for the scores of the Plackett-Luce model are:

$$\gamma_a = 0.297, \quad \gamma_b = 0.425, \quad \gamma_c = 0.278$$
Just to write out part of the log-likelihood expression, these scores maximize,

$$LL_1(\theta; D) = 23\big(\ln(\gamma_a) - \ln(\gamma_a + \gamma_b + \gamma_c) + \ln(\gamma_b) - \ln(\gamma_b + \gamma_c)\big) + \ldots + 8\big(\ln(\gamma_c) - \ln(\gamma_c + \gamma_b + \gamma_a) + \ln(\gamma_b) - \ln(\gamma_b + \gamma_a)\big),$$

where the first term corresponds to the observations of $a \succ' b \succ' c$ and the final term corresponds to the observations of $c \succ' b \succ' a$.
Based on these scores, the inference step would return aggregate rank $b \succ_R a \succ_R c$. This
ranking agrees with the result of using the Bradley-Terry model, and thus is again at odds
with the result of the Mallows/Kemeny approach.
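The chapter leaves the "modified MM algorithm" for Plackett-Luce to Exercise 13.9. For the special case of total rank observations, the Python sketch below uses one standard MM update from the literature; it is an illustration under that assumption, not necessarily the variant intended in the exercise. With rankings of only two alternatives it reduces to the Bradley-Terry update of Algorithm 13.1, and on the votes of Example 13.22 it recovers the reported scores.

```python
def plackett_luce_mm(rankings, tol=1e-12, max_iter=10_000):
    """MM iteration for Plackett-Luce scores from (count, total order) data.

    Update: gamma_j <- w_j / D_j, where w_j counts the stages at which j is
    chosen (every position except last), and D_j sums count / (total score of
    the still-unranked alternatives) over all stages before the last at which
    j is still unranked. Scores are renormalized to sum to one after each step.
    """
    alternatives = sorted({x for _, order in rankings for x in order})
    gamma = {j: 1.0 / len(alternatives) for j in alternatives}
    w = {j: 0 for j in alternatives}
    for count, order in rankings:
        for x in order[:-1]:
            w[x] += count
    for _ in range(max_iter):
        denom = {j: 0.0 for j in alternatives}
        for count, order in rankings:
            for k in range(len(order) - 1):
                rest = order[k:]                       # still-unranked alternatives
                mass = sum(gamma[x] for x in rest)
                for x in rest:
                    denom[x] += count / mass
        new = {j: w[j] / denom[j] for j in alternatives}
        total = sum(new.values())
        new = {j: v / total for j, v in new.items()}
        change = max(abs(new[j] - gamma[j]) for j in alternatives)
        gamma = new
        if change < tol:
            break
    return gamma


# Votes from Example 13.5 / Example 13.22, as (count, rank order) pairs.
VOTES = [(23, ("a", "b", "c")), (17, ("b", "c", "a")), (10, ("c", "a", "b")),
         (8, ("c", "b", "a")), (2, ("b", "a", "c"))]
scores = plackett_luce_mm(VOTES)
print({j: round(v, 3) for j, v in scores.items()})
# approximately {'a': 0.297, 'b': 0.425, 'c': 0.278}, as in Example 13.22
```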
13.3 Applications
Social choice methods are of course used throughout the world, and provide the basis for
the democratic process. In addition to the widely used plurality rule, we already mentioned
earlier in the chapter some places where plurality-with-elimination and single-transferable
vote (STV) style rules are used. Voting procedures are increasingly the subject of public
debate as well, in regard to whether one rule should be changed for another. For example, a 2011 referendum in the UK asked whether an STV-style procedure (the Alternative Vote) should be adopted in place of the plurality rule (68% voted against the change). There is also much debate about the
introduction of methods of computerized voting, and especially in regard to the need to
balance privacy with concerns about trustworthiness and security. We return to some of
these themes in Chapter 29.
A fun application of a positional-scoring rule is to determine the winner of the annual Eurovision song contest. A European popular culture sensation for decades, the contest mixes
music, glamour and politics. For example, forty-three countries were represented in 2011,
with the ultimate victors, Eldar and Nigar, performing “Running Scared” for Azerbaijan. Each country votes for ten other countries, with assigned scores of 12, 10, 8, 7, 6, ... down to 1. The winning country is the one with the highest total score. The contest famously resulted in a four-way tie in 1969, before any rule had been determined for how to handle ties. In the event, all four countries were declared as winners, leading to
general unhappiness and threatened walk-outs. (There is now a tie-breaking rule.)
A variation on Kemeny’s rule has been used by Duke University’s Computer Science department to rank Ph.D. applicants, where committee members give partial orders over the applicants. Schulze’s rule seems to be finding increasing application. For example, Schulze’s rule was used for the elections determining membership in the Wikimedia Foundation Board of Trustees in 2008 and 2009, and also by developers in the volunteer organization behind Debian, which is an open source code community with more than 1000 developers. The strong axiomatic support for the rule, along with its tractability, is likely why Schulze’s rule appears to be finding wider application than the Kemeny rule.
Methods of statistical rank aggregation are used in many places. For example, they have
been applied to combining the results of multiple web searches into a single rank order, and
for ranking movies based on user feedback. The international chess federation uses the “Elo”
system for ranking, which is based on the Bradley-Terry model, and TrueSkill is a statistical
ranking system for Xbox Live that is designed to update the ranking of individuals based
on the outcome of multi-team and multi-player match-ups. Similar to Exercise 13.9, many methods have been tested and compared on a dataset of NASCAR race results for the 2002 season. Experimental uses have been demonstrated in the context of real-time voting by visitors to an open house in honor of the 150th anniversary of MIT. Rank aggregation is also important for drug discovery, where it is used in predicting the function of proteins.
13.4 Notes
xx These notes will be compressed and citations included as numbers. But
please let us know if the credit looks wrong, or we are missing something. xx
The structure of this chapter was inspired by lecture notes by Yiling Chen.
S. J. Brams and P. C. Fishburn, “Voting procedures,” Ch. 4 in Handbook of Social Choice
and Welfare, Vol. 1, K. J. Arrow, A. K. Sen, K. Suzumura (eds.) (2002), provide a discussion
of different voting procedures, including approval voting and range voting. (The examples
in Exercise 13.1 (c) and (d) are due to Brams and Fishburn.)
What we refer to as social choice rules and social ranking rules are sometimes referred to
as social choice functions and social welfare functions, respectively, in the literature. For an
accessible proof of Arrow’s theorem, along with the related Muller-Satterthwaite theorem
that applies to social choice rules, see Chapter 9 of Yoav Shoham and Kevin Leyton-Brown’s
“Multiagent Systems: Algorithmic, Game-Theoretic, and Logical Foundations” (CUP 2009).
Example 13.5 is adapted from H. P. Young’s “Optimal voting rules” in the J. Economic
Perspectives 9 (1) 51-64, 1995. Theorem 13.2 is due to H. P. Young, “Social Choice Scoring
Functions,” SIAM Journal on Applied Mathematics, Vol. 28, No. 4 (Jun., 1975), pp. 824–838. A variation on Theorem 13.3 is presented in P. C. Fishburn “Paradoxes of Voting,”
The American Political Science Review 68(2): 537–546, 1974. Theorem 13.5 is due to H. P.
Young and A. Levenglick, “A consistent extension of Condorcet’s election principle,” SIAM
J. Appl. Math. 35:285–300 (1978).
The observation about pairwise-majority cycles in Olympic ice skating contests is from
M. Truchon, “Aggregation of Rankings in Figure Skating” (June 2004). CIRPEE Working
Paper No. 04-14. Exercise 13.5 (a) is due to Lirong Xia. Exercise 13.1 (b) is due to D. R.
Woodall. Magazine, 56(4):207-214, 1983.
The complexity of determining the outcome of choice rules was first studied in J.
Bartholdi, C. A. Tovey, and M. A. Trick “Voting schemes for which it can be difficult to tell
who won the election” in Social Choice and Welfare, 6:157-165, 1989. C. Dwork, R. Kumar,
M. Naor and D. Sivakumar, “Rank aggregation methods for the web” (WWW’01) prove
that Kemeny is NP-hard to compute even when there are only four voters, and describe the local-Kemenization
procedure (see Exercise 13.3). V. Conitzer, A. Davenport, and J. Kalagnanam, “Improved
bounds for computing Kemeny rankings” in Proceedings of the 21st AAAI Conference on
Artificial Intelligence (AAAI), pages 620-626, 2006 discuss computational approaches to
finding the Kemeny rank order.
M. Schulze, “A new monotonic, clone-independent, reversal symmetric, and Condorcet-consistent single-winner election method,” Social Choice and Welfare 36(2):267–303, 2011
introduces Schulze’s rule, and establishes its formal properties including its robustness
against clones.
Example 13.10 is from L. Xia and V. Conitzer, “Strategy-proof voting rules over multi-issue domains with restricted preferences,” Proc. WINE pp 402-414, 2010. The same
authors also characterize when strategyproofness can be obtained through sequential voting. J. Lang and L. Xia, “Sequential composition of voting rules in multi-issue
domains,” Mathematical Social Sciences 57(3): 304-324 (2009) study the axiomatic properties of sequential voting rules. Another approach to combinatorial voting through succinct
representations is “Multi-agent soft constraint aggregation via sequential voting,” G. Pozza,
M. Pini, F. Rossi, and K. B. Venable, Proc. IJCAI 2011.
L. Xia and V. Conitzer, “Stackelberg Voting Games: Computational Aspects and Paradoxes,” 24th AAAI Conference on Artificial Intelligence (AAAI-10), pp. 921-926, 2010,
and Y. Desmedt and E. Elkind, “Equilibria of Plurality Voting with Abstentions,” ACM
EC 2010, observe the paradoxical outcomes that can arise because of strategic behavior
in voting rules.
J. Bartholdi, C. A. Tovey, and M. A. Trick “The computational difficulty of manipulating
an election” Social Choice and Welfare, 6:227-241, 1989 first studied the use of computational intractability to provide a barrier to manipulation. The result that manipulation is
hard in plurality-with-elimination is in “Single Transferable Vote Resists Strategic Voting”
J.J. Bartholdi and J.B. Orlin, in Social Choice and Welfare 8, (1991), 341-354. Theorem 13.11 is due to E. Hemaspaandra and L. A. Hemaspaandra “Dichotomy for voting systems” in J. Comput. Syst. Sci. 73(1): 73-83 (2007). Theorem 13.12 is due to A. Procaccia
and J. Rosenschein “Average-Case Tractability of Manipulation in Voting via the Fraction
of Manipulators,” Proc. 6th Intl. Joint Conference on Autonomous Agents and Multiagent
Systems, pp. 718-720, May 2007 (short paper). A generalized theorem is proved by L. Xia
and V. Conitzer in “Generalized scoring rules and the frequency of coalitional manipulability” (EC’08). D. C. Parkes and L. Xia “A complexity-of-strategic-behavior comparison
between Schulze’s Rule and Ranked Pairs” in 26th AAAI (2012) consider the robustness
to manipulation of Schulze’s rule; these authors also contribute the WMG in Exercise 13.3.
For a survey on computational resistance to manipulation, see “AI’s War on Manipulation:
Are We Winning?”, by P. Faliszewski and A. D. Procaccia, AI Magazine 31(4):53-64, Dec
2010. For further reading on computational social choice, see the survey “Computational Social
Choice,” F. Brandt, V. Conitzer and U. Endriss, in G. Weiss (ed.), Multiagent Systems, MIT
Press (2012).
J. I. Marden, “Analyzing and Modeling Rank Data,” Chapman and Hall, 1995, is a good
introduction to rank aggregation models. “Condorcet’s Theory of Voting” by H. P. Young
in The American Political Science Review, Vol. 82, No. 4 (Dec., 1988), pp. 1231-1244,
formalizes the connection between Condorcet’s noisy votes model and the Kemeny rule, which is
the maximum likelihood estimator under that model. We adopt the standard terminology Mallows model, due to
C. L. Mallows, “Non-null ranking models,” Biometrika, 44: 114-130, 1957. More typically,
the Mallows model is presented as a distribution on rank orders ≻ in which
$\Pr_\theta(\succ) \propto e^{-\phi \cdot d(\succ, \succ_0)}$, where ≻₀ is the reference rank order, the distance metric d is Kendall tau, and the parameter φ > 0.
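To make the exponential form concrete, here is a small brute-force sketch (the helper names `kendall_tau`, `mallows_pmf` and `pairwise_pmf` are hypothetical) that enumerates every rank order of three alternatives. It also checks a standard identity under an assumed pairwise parameterization: if each order gets weight p raised to the number of pairs that agree with ≻₀, times (1 − p) raised to the number that disagree, then this coincides with the exponential form when φ = ln(p/(1 − p)). That pairwise form is our assumption for illustration, not necessarily the exact form used in the chapter body.

```python
import itertools
import math


def kendall_tau(order, reference):
    """Number of pairs of alternatives that `order` and `reference` rank differently."""
    pos = {x: i for i, x in enumerate(reference)}
    # combinations(order, 2) yields (x, y) with x ranked above y in `order`.
    return sum(1 for x, y in itertools.combinations(order, 2) if pos[x] > pos[y])


def mallows_pmf(reference, phi):
    """Pr(order) proportional to exp(-phi * d(order, reference)), by enumeration."""
    orders = list(itertools.permutations(reference))
    weights = [math.exp(-phi * kendall_tau(o, reference)) for o in orders]
    z = sum(weights)
    return {o: w / z for o, w in zip(orders, weights)}


def pairwise_pmf(reference, p):
    """Assumed form: Pr(order) proportional to p^(#agreeing pairs) * (1-p)^(#disagreeing pairs)."""
    n_pairs = len(reference) * (len(reference) - 1) // 2
    orders = list(itertools.permutations(reference))
    weights = [p ** (n_pairs - kendall_tau(o, reference)) * (1 - p) ** kendall_tau(o, reference)
               for o in orders]
    z = sum(weights)
    return {o: w / z for o, w in zip(orders, weights)}


reference = ("a", "b", "c")
p = 0.6
by_phi = mallows_pmf(reference, math.log(p / (1 - p)))
by_p = pairwise_pmf(reference, p)
for order in sorted(by_phi, key=by_phi.get, reverse=True):
    print("".join(order), round(by_phi[order], 4), round(by_p[order], 4))
```

Enumeration is only practical for a handful of alternatives (m! orders), but it makes the role of the dispersion parameter easy to see.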
The Bradley-Terry model dates back to a paper by E. Zermelo in 1929, but was popularized by R. A. Bradley and M. E. Terry “Rank analysis of incomplete block designs.
I. The method of paired comparisons,” Biometrika 39 324-345 (1952). The generalization
to the Plackett-Luce model is credited to R. Plackett, “The analysis of permutations,” Applied Statistics, 24:193-202, 1975 and R. D. Luce “Individual choice behavior: A theoretical
analysis” Wiley, 1959. The Plackett-Luce model is also referred to as the multinomial logit
model in econometrics.
D. R. Hunter “MM algorithms for generalized Bradley-Terry models” Ann. of Stats., 32,
384-406 (2004) presents and analyzes the MM algorithm, including the generalization that
we use in Exercise 13.9 for estimation in the Plackett-Luce model. Hunter also develops the
marginal probability analysis for partial ranks in Plackett-Luce (see also Exercise 13.7). The
RankCentrality algorithm is due to S. Negahban, S. Oh and D. Shah, “Iterative Ranking
from Pair-wise Comparisons” in Proc. NIPS 2012, who analyze it in the context of the
Bradley-Terry model but note that it can be used more generally. The compiled data for
the Eastern Division of the American League, and the suggestion to analyze it in the Bradley-Terry
model, is due to Luke Tierney. The condition under which the maximizer is unique is due
to Zermelo and L. R. Ford, Jr., “Solution of a ranking problem from binary comparisons”
Amer. Math. Monthly 64 28-33 (1957).
For applications of rank aggregation, see: J. Guiver and E. Snelson, “Bayesian inference
for Plackett-Luce ranking models,” Proc. 26th Intl. Conf. on Machine Learning (ICML’09), pp. 377-384, 2009 (Plackett-Luce applied to NASCAR driver data and movie preference
data); Herbrich, R., Minka, T., & Graepel, T. (2007). TrueSkill(TM): A Bayesian skill
rating system, Adv. in Neur. Inf. Proc. Sys. (NIPS) 19, 569-576 (a model related
to Bradley-Terry applied to the Xbox TrueSkill player ranking system); T. Lu and C.
Boutilier, “Learning Mallows Models with Pairwise Preferences” Proc. 28th ICML 2011
(Mallows applied to a sushi dataset, as well as movies); Hunter (Plackett-Luce applied
to NASCAR); Dwork et al. (Borda, local-Kemenization, and Markov chain approaches
related to RankCentrality applied to meta-search rank aggregation); and Volkovs and Zemel
“A Flexible Generative Model for Preference Aggregation” WWW 2012 (Plackett-Luce,
Bradley-Terry and other models applied to meta-search).
13.5 Comprehension Questions and Exercises
13.5.1 Comprehension Questions
c13.1 Suggest an example of a social choice problem where the viewpoint of “finding a fair
compromise on subjective opinions” seems valid.
c13.2 Suggest an example of a social choice problem where the viewpoint from statistical
rank aggregation of “finding the ground truth” seems valid.
c13.3 What’s a reason why Borda’s rule (or similar positional-scoring rule) might be used
in practice rather than Schulze’s rule?
c13.4 What challenge is presented by Kemeny’s rule, and how is this addressed by Schulze’s
rule?
c13.5 What difficult new problem arises in combinatorial voting and how do CP-nets try to
address this?
c13.6 Suggest an example of a domain that is familiar to you, where single-peaked preferences seem reasonable.
c13.7 What is the fundamental difference between the Mallows model and the Plackett-Luce
model?
c13.8 What is the fundamental difference between the Bradley-Terry model and the Plackett-Luce model?
13.5.2 Exercises
13.1 Plurality-with-elimination
(a) What is the outcome of plurality-with-elimination on the profile in Example 13.4?
(b) Consider two sets of votes:
set one          set two
6@ a, b, c       3@ a, b, c
4@ b, c, a       4@ b, c, a
3@ c, b, a       6@ c, b, a
Confirm a failure of the consistency axiom on this example.
(c) Consider the profile:
2@ a, c, b
2@ b, c, a
1@ c, b, a.
By changing both of the first two votes in the same way, show a failure of IIA.
(d) Consider the profile:
27@ a, b, c
42@ c, a, b
24@ b, c, a.
What happens when four votes switch from a ≻ b ≻ c to c ≻ a ≻ b, and what
axiomatic property does this violate?
(e) Consider the profile
27@ a, b, c
46@ c, a, b
24@ b, c, a.
What paradoxical outcome occurs when four voters with a ≻ b ≻ c don’t vote?
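To experiment with the profiles in this exercise, here is a minimal sketch of plurality-with-elimination. The function name and the profile encoding (a list of `(count, order)` pairs, with each order written as a string such as `"abc"`) are our own assumptions, as is the tie-breaking convention: among the alternatives tied for the fewest first-place votes, the one latest in a fixed `priority` string is eliminated.

```python
from collections import Counter


def plurality_with_elimination(profile, priority):
    """Winner under plurality-with-elimination.

    profile: list of (count, order) pairs; order is a string such as "abc",
             listing alternatives from most to least preferred.
    priority: all alternatives in tie-breaking order (earlier = favored).
              Among alternatives tied for the fewest first-place votes,
              the least favored one is eliminated (an assumed convention).
    """
    remaining = set(priority)
    while len(remaining) > 1:
        # Each vote counts for its most preferred alternative still in the race.
        tally = Counter({x: 0 for x in remaining})
        for count, order in profile:
            top = next(x for x in order if x in remaining)
            tally[top] += count
        fewest = min(tally.values())
        tied = [x for x in remaining if tally[x] == fewest]
        remaining.remove(max(tied, key=priority.index))
    return remaining.pop()


set_one = [(6, "abc"), (4, "bca"), (3, "cba")]
set_two = [(3, "abc"), (4, "bca"), (6, "cba")]
for name, profile in [("set one", set_one), ("set two", set_two),
                      ("both sets", set_one + set_two)]:
    print(name, plurality_with_elimination(profile, "abc"))
```

Eliminating until a single alternative remains gives the same winner as stopping once someone holds a majority, because a majority holder can never have the fewest votes in a later round.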
13.2 Axioms, Equivalences and Variations
(a) Show that all rules introduced in this chapter except range and approval reduce
to the majority rule when applied to two alternatives. Assume an odd number of
voters to avoid problems with ties.
(b) Prove that positional-scoring rules satisfy the six axioms in Theorem 13.2. (This
is only one direction of the theorem!).
(c) The local independence of irrelevant alternatives (LIIA) property states that whenever the alternatives are restricted to a set that forms a contiguous interval in the
social rank order (e.g., {b, a} is an interval given c R b R a R d), then the
social rank order is unchanged on these alternatives. Prove that the Kemeny rule
satisfies this property.
Figure 13.10: A 4-cycle and 5-cycle, used to explain why it is sufficient to preclude 3-cycles
in the Kemeny rule computation.
(d) Use the following profile to show that plurality is not LIIA:
35@ a, c, b
33@ b, a, c
32@ c, b, a
(e) Use the following profile to show that Borda is not LIIA (hint: consider an interval
of size three):
3@ a, b, c, d
2@ b, c, d, a
2@ c, d, a, b
(f) Consider the profile:
9@ a, b, c
8@ a, c, b
15@ b, c, a
16@ c, a, b
Introduce a clone a′ for a and a clone c′ for c. Show that the Kemeny rule fails
the independence of clones property on this example. [Hint: for the purpose of
Kemeny, the pairs of clones can be treated as a single alternative, with twice the
weight in regard to wins and defeats as a single clone.]
(g) Explain how to interpret plurality, plurality-with-elimination, and Borda as social
ranking rules.
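One way to check parts (d), (e) and (g) mechanically is a generic positional-scoring ranking that first restricts every vote to a chosen set of alternatives. This is a sketch under our own conventions: the names `positional_ranking`, `borda` and `plurality` are hypothetical, votes use the `(count, order)` string encoding, and ties in score are broken alphabetically.

```python
def positional_ranking(profile, alternatives, weights):
    """Rank `alternatives` by positional score after restricting each vote to them.

    profile: list of (count, order) pairs, order as a string like "abcd".
    weights: points for 1st, 2nd, ... place among the restricted alternatives.
    Ties in score are broken alphabetically (an arbitrary choice).
    """
    score = {x: 0 for x in alternatives}
    for count, order in profile:
        restricted = [x for x in order if x in alternatives]
        for position, x in enumerate(restricted):
            score[x] += count * weights[position]
    return sorted(alternatives, key=lambda x: (-score[x], x))


def borda(k):
    """Borda weights for k alternatives: k-1, k-2, ..., 0."""
    return list(range(k - 1, -1, -1))


def plurality(k):
    """Plurality weights for k alternatives: 1, 0, ..., 0."""
    return [1] + [0] * (k - 1)


profile_d = [(35, "acb"), (33, "bac"), (32, "cba")]
profile_e = [(3, "abcd"), (2, "bcda"), (2, "cdab")]
print(positional_ranking(profile_d, "abc", plurality(3)))
print(positional_ranking(profile_e, "abcd", borda(4)))
```

To test LIIA, compute the full social order, pick a contiguous interval of it, and call `positional_ranking` again with only those alternatives and weights of the matching length.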
13.3 Condorcet rules
(a) The IP formulation used to compute the Kemeny rule represents a rank order as
a set of edges on a graph (where alternatives are vertices), insisting on one edge
between every pair of vertices and no 3-cycles.
(i) Explain why the set of edges used to encode a preference order is acyclic, and
why no other order is consistent with the graph.
(ii) Referring to Figure 13.10, first explain by considering the possible edges that
must be selected between b and d on the 4-cycle why there can be no 4-cycle
without a 3-cycle. Now, consider possible edges between a and d on the 5-cycle
and argue that there must be a 3-cycle or a 4-cycle. Generalize the argument to
explain why there can be no k-cycle for any k > 3.
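The argument in (ii) can be sanity-checked by brute force on small graphs. The sketch below (all function names are hypothetical) enumerates every tournament, meaning one directed edge between each pair of vertices, and confirms that each one with no 3-cycle is acyclic, using Kahn's topological-sort algorithm for the acyclicity test. It is evidence for small n, not a proof.

```python
import itertools


def tournaments(n):
    """Yield every tournament on vertices 0..n-1 as a set of directed edges (u, v)."""
    pairs = list(itertools.combinations(range(n), 2))
    for bits in itertools.product([0, 1], repeat=len(pairs)):
        yield {(u, v) if b == 0 else (v, u) for (u, v), b in zip(pairs, bits)}


def has_3_cycle(edges, n):
    return any((u, v) in edges and (v, w) in edges and (w, u) in edges
               for u, v, w in itertools.permutations(range(n), 3))


def is_acyclic(edges, n):
    """Kahn's algorithm: acyclic iff every vertex can be removed in topological order."""
    indegree = {v: 0 for v in range(n)}
    for _, v in edges:
        indegree[v] += 1
    frontier = [v for v in range(n) if indegree[v] == 0]
    removed = 0
    while frontier:
        u = frontier.pop()
        removed += 1
        for a, b in edges:
            if a == u:
                indegree[b] -= 1
                if indegree[b] == 0:
                    frontier.append(b)
    return removed == n


def check(n):
    """True if every n-vertex tournament without a 3-cycle is acyclic."""
    return all(is_acyclic(e, n) for e in tournaments(n) if not has_3_cycle(e, n))


print([check(n) for n in (3, 4, 5)])
```

A related count ties back to part (i): the tournaments with no 3-cycle on n vertices are exactly the n! rank orders.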
Figure 13.11: A weighted majority graph on four alternatives (ignoring negative weight
edges).
(b) The rest of the question considers the WMG in Figure 13.11. [Although we don’t
provide the votes, note that there is always a set of votes that realize a WMG.] Is
there a Condorcet winner for this preference profile?
(c) Show that the Kemeny and Schulze rules do not agree when applied to this
weighted majority graph. (For Kemeny you might want to write a short program to enumerate the various orders.)
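As the exercise suggests, a short program helps. The following is a sketch, assuming a WMG encoded as a dictionary `{(x, y): weight}` with positive weights only, as in the figure. Kemeny is computed by enumerating all orders and maximizing the total margin of the pairs an order ranks. For Schulze we use margins as link strengths and a Floyd-Warshall style widest-path computation; this is one common variant, chosen here as an assumption. The example graphs are hypothetical; encode Figure 13.11's weights the same way to answer the question.

```python
import itertools


def margin(wmg, x, y):
    """Net margin of x over y, from a WMG given as {(x, y): positive weight}."""
    return wmg.get((x, y), 0) - wmg.get((y, x), 0)


def condorcet_winner(wmg, alts):
    for x in alts:
        if all(margin(wmg, x, y) > 0 for y in alts if y != x):
            return x
    return None


def kemeny(wmg, alts):
    """Brute force: an order maximizing the summed margins of the pairs it ranks."""
    def score(order):
        return sum(margin(wmg, x, y) for x, y in itertools.combinations(order, 2))
    return list(max(itertools.permutations(alts), key=score))


def schulze(wmg, alts):
    """Schulze ranking with margins as link strengths (widest paths, Floyd-Warshall style)."""
    p = {(x, y): max(margin(wmg, x, y), 0) for x in alts for y in alts if x != y}
    for k in alts:
        for i in alts:
            for j in alts:
                if len({i, j, k}) == 3:
                    p[i, j] = max(p[i, j], min(p[i, k], p[k, j]))
    # Sort by how many alternatives each one beats; ties keep the input order.
    beats = {x: sum(p[x, y] > p[y, x] for y in alts if y != x) for x in alts}
    return sorted(alts, key=lambda x: -beats[x])


transitive = {("a", "b"): 2, ("b", "c"): 2, ("a", "c"): 2}   # hypothetical WMG
cyclic = {("a", "b"): 6, ("b", "c"): 4, ("c", "a"): 2}       # hypothetical WMG
for wmg in (transitive, cyclic):
    print(condorcet_winner(wmg, "abc"), kemeny(wmg, "abc"), schulze(wmg, "abc"))
```

Enumeration costs m! evaluations, which is fine for four alternatives but is exactly the intractability that motivates integer programming or local search for larger m.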
(d) Local Kemenization (LK) is a procedure that approximates the Kemeny rule and
satisfies the Condorcet criterion. It finds a rank order that is locally optimal in
that no swap can reduce the Kemeny score. For a simple version: start with order
d, c, b, a, and make repeated passes from right to left in the style of bubble sort,
switching adjacent alternatives i, j if j defeats i in a pairwise comparison. What
is the outcome when applied to the example in Figure 13.11?
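The simple version of local Kemenization described above can be sketched as follows, again assuming a WMG encoded as `{(x, y): positive weight}`; the function name is hypothetical and the example graphs are not Figure 13.11. Each pass scans adjacent pairs from right to left and swaps a pair whenever the lower alternative defeats the upper one. Every swap raises the Kemeny score by twice the margin of that pair, so the loop must terminate.

```python
import itertools


def local_kemenize(order, wmg):
    """Bubble-sort style passes from right to left: swap adjacent (i, j) when j defeats i."""
    def defeats(x, y):
        return wmg.get((x, y), 0) - wmg.get((y, x), 0) > 0

    order = list(order)
    swapped = True
    while swapped:
        swapped = False
        for k in range(len(order) - 2, -1, -1):
            i, j = order[k], order[k + 1]
            if defeats(j, i):
                order[k], order[k + 1] = j, i
                swapped = True
    return order


linear = {(x, y): 1 for x, y in itertools.combinations("abcd", 2)}  # a > b > c > d
cyclic = {("a", "b"): 1, ("b", "c"): 1, ("c", "a"): 1}
print(local_kemenize("dcba", linear))
print(local_kemenize("cba", cyclic))
```

With a cycle, the result depends on the starting order, which is why local Kemenization only guarantees local optimality.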
13.4 CP-nets and Combinatorial voting
(a) Give a pair of alternatives, each a decision about main course and wine (distinct from
those provided in the text), for which the CP-net in Figure 13.4 gives no direct
preference order.
(b) Explain why the CP-net requires preference order (b, ro) ≻ (f, w).
(c) Why does the CP-net leave the relationship between (f, re) and (v, ro) undetermined? Provide another pair of alternatives for which the order is not determined.
(d) Explain how the issue graph would be modified, and state the number of entries
in the CPTs of any one agent, in the following two variations for the structure
on preferences when a decision must also be made about an appetizer:
(i) the preference for main course is unconditional, and the preference for wine
and for appetizer depend only on the choice of main course;
(ii) the preference for appetizer is unconditional, the preference for main course
depends only on the choice of appetizer, and the preference for wine depends on
both the appetizer and the main course.
13.5 Strategic behavior
(a) Consider plurality-with-elimination, and profile
2@ a, b, c
2@ b, c, a
2@ c, a, b.
Assume tie breaking in favor of a then b then c. Find a manipulation for one of
the voters.
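A brute-force search makes part (a) checkable. This sketch (hypothetical names) re-implements plurality-with-elimination on individual votes, eliminating the least favored alternative in the tie-breaking order among those with the fewest first-place votes, which is our reading of "tie breaking in favor of a then b then c". It then tries every misreport for every voter and keeps those that make the winner strictly better for that voter under the truthful vote.

```python
from collections import Counter
from itertools import permutations


def pwe_winner(votes, priority):
    """Plurality-with-elimination on a list of individual votes (strings like "abc")."""
    remaining = set(priority)
    while len(remaining) > 1:
        tally = Counter({x: 0 for x in remaining})
        for vote in votes:
            tally[next(x for x in vote if x in remaining)] += 1
        fewest = min(tally.values())
        # Eliminate the least favored alternative among those with the fewest votes.
        loser = max((x for x in remaining if tally[x] == fewest), key=priority.index)
        remaining.remove(loser)
    return remaining.pop()


def find_manipulations(votes, priority):
    """All (truthful vote, misreport) pairs that give that voter a strictly better winner."""
    truthful_winner = pwe_winner(votes, priority)
    found = set()
    for idx, truth in enumerate(votes):
        for report in map("".join, permutations(truth)):
            winner = pwe_winner(votes[:idx] + [report] + votes[idx + 1:], priority)
            if truth.index(winner) < truth.index(truthful_winner):
                found.add((truth, report))
    return found


profile = ["abc"] * 2 + ["bca"] * 2 + ["cab"] * 2
print(pwe_winner(profile, "abc"), find_manipulations(profile, "abc"))
```

Exhaustive search over all m! reports per voter is only feasible for tiny instances; the computational-complexity results in the chapter concern how this scales.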
(b) Prove that the median rule for single-peaked preferences is strategyproof.
(c) For single-peaked preferences, suppose that a location ℓ(j) ∈ [0, 1] is associated
with each alternative j ∈ A, with the alternatives arranged in order of increasing
location. Define the mean rule to choose the alternative that is closest to the
mean of the locations of the top choice of each voter. Explain why this rule is not
strategyproof.
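A tiny numerical experiment can guide the explanation. The sketch below implements the median rule (assuming an odd number of voters) and the mean rule for hypothetical locations a = 0, b = 0.5, c = 1. In the assumed scenario two voters peak at a and the third peaks at b; the third voter then exaggerates by reporting c. The function names and the data are ours, for illustration only.

```python
def median_rule(locations, peaks):
    """Alternative at the median of the reported peaks (odd number of voters assumed)."""
    ordered = sorted(peaks, key=lambda x: locations[x])
    return ordered[len(ordered) // 2]


def mean_rule(locations, peaks):
    """Alternative whose location is closest to the mean location of the reported peaks."""
    target = sum(locations[x] for x in peaks) / len(peaks)
    return min(locations, key=lambda x: abs(locations[x] - target))


locations = {"a": 0.0, "b": 0.5, "c": 1.0}
truthful_peaks = ["a", "a", "b"]
misreported_peaks = ["a", "a", "c"]  # the third voter, whose peak is b, reports c
for peaks in (truthful_peaks, misreported_peaks):
    print(peaks, median_rule(locations, peaks), mean_rule(locations, peaks))
```

The median does not move when a voter pushes a report further away from it, whereas the mean does, which is the intuition behind the contrast between the two rules.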
(d) A suggested procedure for FindManipulation in the Borda rule is: Consider
agent i, and fix the reports of the others. Compute the Borda score of each alternative
from the reports of the other agents alone. To construct misreport ≻̂_i: place the favored alternative a in top position, and rank the other alternatives in ascending order of
these scores, so that the alternative with the highest score is ranked last.
Prove that this procedure is the optimal manipulation in Borda, in the sense that
it will cause a to be elected whenever any manipulation can succeed.
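The claim can be tested empirically before proving it. This sketch (hypothetical names) compares the greedy report, which puts a first and then the others in ascending order of their Borda scores from the other voters, against exhaustive search over all reports on random small profiles. We assume Borda points m − 1, ..., 0 and that "a is elected" means a has a strictly higher total than every other alternative, so that tie-breaking plays no role.

```python
import itertools
import random


def borda_scores(votes, alts):
    m = len(alts)
    score = {x: 0 for x in alts}
    for vote in votes:
        for pos, x in enumerate(vote):
            score[x] += m - 1 - pos
    return score


def wins_strictly(votes, alts, a):
    s = borda_scores(votes, alts)
    return all(s[a] > s[x] for x in alts if x != a)


def greedy_report(others, alts, a):
    """a on top, then the rest in ascending order of their score from the other voters."""
    s = borda_scores(others, alts)
    return [a] + sorted((x for x in alts if x != a), key=lambda x: s[x])


def greedy_succeeds(others, alts, a):
    return wins_strictly(others + [greedy_report(others, alts, a)], alts, a)


def some_report_succeeds(others, alts, a):
    return any(wins_strictly(others + [list(r)], alts, a)
               for r in itertools.permutations(alts))


random.seed(0)
alts = "abcd"
mismatches = 0
for _ in range(300):
    others = [random.sample(alts, len(alts)) for _ in range(random.randint(1, 4))]
    for a in alts:
        mismatches += greedy_succeeds(others, alts, a) != some_report_succeeds(others, alts, a)
print("mismatches:", mismatches)
```

The intuition behind the proof: moving a to the top never hurts, and giving the most points to the weakest competitors minimizes the largest competing total.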
(e) What could be viewed as unrealistic about the way the FindManipulation problem is defined? Would relaxing this assumption be expected to make manipulation
easier or harder?
13.6 Single-peaked preferences and the median rule
(a) Prove that the median rule is Pareto optimal.
(b) Prove that there is a Condorcet winner in a single-peaked domain when there are
an odd number of votes.
(c) Explain, without explicitly checking the different ways to line up the alternatives,
why the profile in Figure 13.6 (b) is not single-peaked.
(d) Consider a generalized median rule where the designer first (without looking at
votes) positions some number of “phantom peaks”— top choices that are to be
considered along with the reported preferences in determining the outcome. Explain how the designer can use this to select the min, max and 2nd-lowest of the
top choices contained in the votes.
13.7 Statistical rank aggregation (Theory)
(a) Show that the maximum likelihood estimator of the mean of the normal distribution given data D = {x_1, . . . , x_n} is the sample mean $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$. [Hint:
maximize the joint probability density of the observations, and work with the log
likelihood.]
(b) Fixing ≻₀, how does the distribution in the Mallows model change as p varies
from 1 towards 0.5? [Just informally.]
(c) Consider the Mallows model on three alternatives with parameters ≻₀ : a, b, c
and p = 0.6. Suppose that it is applied to data that includes partial ranks and
total ranks. Explain why the marginal probability of “partial a ≻ b” is not just
the parameter p.
(d) The inference step on Bradley-Terry returns a rank order according to the estimated scores. By considering the expected outcome of 1,000 pairwise match-ups
between each pair of alternatives, provide a frequency-based justification for this
rank order.
(e) Why is the mode of the Plackett-Luce distribution obtained by placing the alternatives in decreasing order of score?
(f) Assuming a domain with four alternatives {a, b, c, d}, write out the expression for
marginal probability Pr_θ(top a ≻ b) and Pr_θ(partial a ≻ b); i.e., expressing these
as the sum over the probabilities for consistent rank orders. Do these expressions
explain why Pr_θ(top a ≻ b) < Pr_θ(partial a ≻ b)?
(g) The text uses the generative interpretation of Plackett-Luce to support the claim
that $\Pr_\theta(\text{top } b) = \frac{\gamma_b}{\gamma_a+\gamma_b+\gamma_c+\gamma_d}$, in a setting with four alternatives. Provide an
alternative explanation, by expressing this marginal probability as
$$\Pr_\theta(\text{top } b) = \sum_{\succ' \text{ s.t. } b \,\succ'\, \text{others}} \Pr_\theta(\succ'),$$
where others is an arbitrary order on the other alternatives, and working directly
with the definition of the Plackett-Luce distribution on rank orders.
(h) Consider alternatives A = {a, b, c}. Prove that $\Pr_\theta(\text{partial } a \succ b) = \frac{\gamma_a}{\gamma_a+\gamma_b}\cdot\frac{\gamma_b}{\gamma_b}$.
[Hint: write out the marginal probability, which will include a sum over the 3
distinct total orders consistent with the partial rank, and use algebra to simplify.]
(i) Suppose the argument generalizes, so that the analysis can be repeated for any
partial rank of size m − 1 for a set of alternatives A of size m. By using this idea
repeatedly, prove the claim in the notes that Pr_θ(partial ≻̃) = Pr_θ(≻̃).
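The identities in parts (f) to (i) can be checked numerically with exact arithmetic. This sketch (hypothetical names, hypothetical scores γ) computes the Plackett-Luce probability of a total order with the generative interpretation, picking each next alternative with probability proportional to its score among those remaining, and obtains marginals by summing over all consistent total orders, as the exercise describes.

```python
import itertools
from fractions import Fraction


def pl_prob(order, gamma):
    """Plackett-Luce probability of a total order (best first), computed exactly."""
    prob = Fraction(1)
    remaining = list(order)
    for x in order:
        prob *= Fraction(gamma[x]) / sum(Fraction(gamma[y]) for y in remaining)
        remaining.remove(x)
    return prob


def marginal(gamma, event):
    """Sum of Pr(order) over all total orders satisfying `event`."""
    return sum(pl_prob(o, gamma) for o in itertools.permutations(gamma) if event(o))


gamma = {"a": 4, "b": 3, "c": 2, "d": 1}      # hypothetical scores
top_b = marginal(gamma, lambda o: o[0] == "b")
top_ab = marginal(gamma, lambda o: o[:2] == ("a", "b"))
partial_ab = marginal(gamma, lambda o: o.index("a") < o.index("b"))
print(top_b, top_ab, partial_ab)
```

Using `Fraction` avoids floating-point noise, so each identity can be compared with `==`.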
13.8 Statistical rank aggregation (Bradley-Terry)
(a) Compare the results of using the MM algorithm and the RankCentrality algorithm
on the results of the Eastern Division of the American League for the 1987 baseball
season. [For RankCentrality, you will need to consult Chapter 21 for a method to
compute the stationary distribution of the random walk.]
Eastern Division of the American League for the 1987 baseball season
(rows: winning team; columns: losing team; entries: games won)
winning team    Milw.  Detroit  Toronto  NY  Bos.  Clevel.  Balt.
Milwaukee         -       7        9      7    7      9       11
Detroit           6       -        7      5   11      9        9
Toronto           4       6        -      7    7      8       12
New York          6       8        6      -    6      7       12
Boston            6       2        6      7    -      7       12
Cleveland         4       4        5      6    6      -        6
Baltimore         2       4        1      3    1      7        -
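A minimal sketch for part (a), assuming the table above is transcribed as `WINS[i][j]`, the number of games team i won against team j (0 on the diagonal). The MM step is the classic Zermelo/Ford-style update γ_i ← W_i / Σ_j N_ij/(γ_i + γ_j), where W_i is i's total wins and N_ij the number of games between i and j; this is our rendering and not a copy of Algorithm 13.1. For RankCentrality we follow our reading of Negahban, Oh and Shah: from i, move to j with probability equal to the fraction of i-vs-j games that j won, divided by the maximum number of distinct opponents, and put the remaining mass on a self-loop. We use plain power iteration for the stationary distribution instead of the method in Chapter 21.

```python
TEAMS = ["Milwaukee", "Detroit", "Toronto", "New York", "Boston", "Cleveland", "Baltimore"]
# WINS[i][j] = games team i won against team j, transcribed from the table above.
WINS = [
    [0, 7, 9, 7, 7, 9, 11],
    [6, 0, 7, 5, 11, 9, 9],
    [4, 6, 0, 7, 7, 8, 12],
    [6, 8, 6, 0, 6, 7, 12],
    [6, 2, 6, 7, 0, 7, 12],
    [4, 4, 5, 6, 6, 0, 6],
    [2, 4, 1, 3, 1, 7, 0],
]


def bradley_terry_mm(wins, iters=20000, tol=1e-12):
    """MM iteration for Bradley-Terry scores, normalized to sum to 1."""
    n = len(wins)
    gamma = [1.0 / n] * n
    for _ in range(iters):
        new = []
        for i in range(n):
            denom = sum((wins[i][j] + wins[j][i]) / (gamma[i] + gamma[j])
                        for j in range(n) if j != i)
            new.append(sum(wins[i]) / denom)
        total = sum(new)
        new = [g / total for g in new]
        done = max(abs(x - y) for x, y in zip(new, gamma)) < tol
        gamma = new
        if done:
            break
    return gamma


def rank_centrality(wins, iters=20000, tol=1e-13):
    """Stationary distribution of the RankCentrality random walk (power iteration)."""
    n = len(wins)
    games = [[wins[i][j] + wins[j][i] for j in range(n)] for i in range(n)]
    d_max = max(sum(1 for j in range(n) if j != i and games[i][j] > 0) for i in range(n))
    # P[i][j]: fraction of i-vs-j games won by j, scaled by 1/d_max; self-loop takes the rest.
    P = [[wins[j][i] / games[i][j] / d_max if i != j and games[i][j] > 0 else 0.0
          for j in range(n)] for i in range(n)]
    for i in range(n):
        P[i][i] = 1.0 - sum(P[i][j] for j in range(n) if j != i)
    pi = [1.0 / n] * n
    for _ in range(iters):
        new = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        done = max(abs(x - y) for x, y in zip(new, pi)) < tol
        pi = new
        if done:
            break
    return pi, P


for name, scores in [("MM", bradley_terry_mm(WINS)), ("RankCentrality", rank_centrality(WINS)[0])]:
    ranking = sorted(range(len(TEAMS)), key=lambda i: -scores[i])
    print(name, [TEAMS[i] for i in ranking])
```

The two score vectors live on different scales, so it is the induced rank orders that are worth comparing.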
(b) How can we easily see that the all-pairs paths property holds in this data? What
simple heuristic does this suggest for modifying a data set that does not satisfy
the all-pairs paths property to obtain a data set that does?
13.9 Statistical rank aggregation (Plackett-Luce)
In this exercise, you will use the Plackett-Luce model to determine an aggregate rank
from rank-order data.
For data that includes total or partial ranks (but no top-k observations), the modification to the MM algorithm (Algorithm 13.1) for Bradley-Terry to perform MLE for
the Plackett-Luce model uses the following update step:
$$\gamma_j^{(t+1)} = \frac{\#(j)}{\sum_{i=1}^{n}\sum_{k=1}^{m_i-1}\delta_{ikj}\left[\sum_{\ell=k}^{m_i}\gamma^{(t)}_{\tilde{\succ}[i,\ell]}\right]^{-1}},$$
where #(j) is the number of times that j is ranked above last position, m_i is the
number of alternatives ranked in data point i (m_i = m if every alternative is ranked),
≻̃[i, ℓ] is the alternative ranked in position ℓ of the ith observation,
and indicator δ_ikj = 1 if amongst the alternatives ranked in the ith observation,
alternative j is ranked in position k or lower, with δ_ikj = 0 otherwise.
This modified MM algorithm converges to the unique maximizer provided the data satisfy a generalized
all-pairs path property, where “i wins over j” is interpreted as i being ranked higher
than j.
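The update step above can be sketched directly in code. In this sketch (hypothetical names and data) each observation is a list of alternatives from best to worst, partial ranks are simply shorter lists, and every alternative is assumed to appear somewhere and to be ranked above last position at least once, so no denominator or score is zero. Because MM algorithms never decrease the log-likelihood, the sketch records the log-likelihood after each step as a sanity check.

```python
import math


def pl_loglik(observations, gamma):
    """Plackett-Luce log-likelihood of (possibly partial) rank orders, best first."""
    ll = 0.0
    for obs in observations:
        for k in range(len(obs) - 1):
            ll += math.log(gamma[obs[k]]) - math.log(sum(gamma[x] for x in obs[k:]))
    return ll


def pl_mm_step(observations, gamma):
    """One MM update for Plackett-Luce, following the update rule above."""
    wins = {j: 0 for j in gamma}      # #(j): times j is ranked above last position
    denom = {j: 0.0 for j in gamma}
    for obs in observations:
        for j in obs[:-1]:
            wins[j] += 1
        for k in range(len(obs) - 1):  # positions 1 .. m_i - 1
            tail = obs[k:]             # alternatives in position k or lower
            inv = 1.0 / sum(gamma[x] for x in tail)
            for j in tail:             # delta_ikj = 1 exactly for these j
                denom[j] += inv
    new = {j: wins[j] / denom[j] for j in gamma}
    total = sum(new.values())          # rescaling leaves the likelihood unchanged
    return {j: g / total for j, g in new.items()}


observations = [[0, 1, 2]] * 3 + [[1, 0, 2], [2, 0, 1], [1, 2, 0], [2, 1]]
gamma = {j: 1 / 3 for j in range(3)}
history = [pl_loglik(observations, gamma)]
for _ in range(200):
    gamma = pl_mm_step(observations, gamma)
    history.append(pl_loglik(observations, gamma))
print({j: round(g, 4) for j, g in gamma.items()}, round(history[-1], 4))
```

For real data, a convergence tolerance on the change in γ is more practical than a fixed number of iterations.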
(a) Find data for a recurring sports event (or competitive game) in which multiple
players (or teams) compete against each other within a season, and finish in a
rank order (e.g., motorcar racing, college cycling, and so forth.) One example
could be data from a Formula 1 Grand Prix season. The data can include partial
ranks, so that not every player needs to participate in every match-up.
Describe your data, including the total number of alternatives, the number of
observations, and any variation in the number of alternatives compared in each
rank order observation. Describe anything that you have done to clean up the
data in making it suitable for the Plackett-Luce model.
(b) Use the Floyd-Warshall all-pairs shortest-paths algorithm to check this property
for your data set, and drop any players (or teams) from consideration for which
it does not hold.
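For part (b), a sketch of the check, under our own encoding: observations are lists from best to worst, and there is an edge from i to j whenever i is ranked above j in some observation. The triple loop is the Boolean (reachability) specialization of Floyd-Warshall, often called Warshall's algorithm. The property holds when every player can reach every other. One possible clean-up heuristic, our assumption rather than the book's prescription, is to keep the largest group of mutually reachable players.

```python
def reachability(observations, players):
    """Boolean transitive closure of the 'ranked above' graph (Floyd-Warshall style)."""
    reach = {(i, j): i == j for i in players for j in players}
    for obs in observations:
        for pos, winner in enumerate(obs):
            for loser in obs[pos + 1:]:
                reach[winner, loser] = True
    for k in players:
        for i in players:
            for j in players:
                if reach[i, k] and reach[k, j]:
                    reach[i, j] = True
    return reach


def all_pairs_paths(reach, players):
    return all(reach[i, j] for i in players for j in players)


def mutually_reachable_groups(reach, players):
    """Group players that can reach each other (the strongly connected components)."""
    groups = []
    for p in players:
        for g in groups:
            if reach[p, g[0]] and reach[g[0], p]:
                g.append(p)
                break
        else:
            groups.append([p])
    return groups


players = [0, 1, 2]
good = reachability([[0, 1, 2], [2, 0]], players)
bad = reachability([[0, 1], [1, 0], [2, 0]], players)
print(all_pairs_paths(good, players), mutually_reachable_groups(bad, players))
```

Restricting the data to one mutually reachable group preserves the property for that group, since any path between two of its members stays inside the group.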
(c) Run the modified MM algorithm for Plackett-Luce on your data, and find maximum likelihood estimators of the score parameters.
(d) Compare the associated rank order with the official season results (if available!).
Comment on any interesting differences.