
2015 IEEE Symposium on Service-Oriented System Engineering
Game Theoretic Analysis for Offense-Defense
Challenges of Algorithm Contests on TopCoder
Zhenghui Hu and Wenjun Wu
State Key Laboratory of Software Development Environment
Department of Computer Science and Engineering, Beihang University
Beijing, China 100191
[email protected] [email protected]
Abstract—Software crowdsourcing platforms such as TopCoder have successfully adopted an offense-defense based quality assurance mechanism in the software development process to deliver high-quality software solutions. TopCoder algorithm contests run single-round matches (SRM) with a challenge phase that allows participants to find bugs in the submitted programs and eliminate their opponents. In this paper, we introduce a game theoretic model to study the competitive behaviors in the challenge phase of SRM. By analyzing the Nash Equilibrium of our multiple-person game model, we find that the probability of making a successful challenge and the effort cost are the major factors in contestants' decisions. To verify the theoretical result, we perform empirical data analysis on a dataset collected from the algorithm challenge phase on TopCoder. The results indicate that contestants with a high rating are more likely to launch challenges against lower-rated ones. However, contestants with the highest ratings may be unwilling to challenge in order to avoid the risk of losing their points in the contests.
1. INTRODUCTION

Crowdsourcing is a powerful paradigm that allows the wisdom of crowds to be applied to more problems with faster and better results [1]. As a new and efficient form of innovation, it has already demonstrated its capability as a problem-solving mechanism for various groups, such as governments, businesses, nonprofits, researchers, artists, and even software design and development [2]. Obtaining quality software is a common but challenging goal for software crowdsourcing projects. Software quality has many dimensions, including reliability, performance, security, safety, maintainability, controllability, and usability. Based on observations of successful crowdsourced software testing practices such as TopCoder (www.topcoder.com) and uTest (www.utest.com), we find that offense-defense based quality assurance is one of the most fundamental ways to eliminate potential defects in the documents, models, and code submitted by the crowd. Multiple groups from the community undertake different responsibilities in software development tasks and cooperate intensively with each other to find problems in their counterparts' work and reduce the bugs in their own work.

Although research efforts have been made to study the theory and practices of software crowdsourcing, few have addressed offense-defense quality assurance. Many research questions remain, such as the major decision factors for participants to launch offenses against their peers and the effectiveness of the offense-defense mechanism for quality assurance. In order to investigate the behaviors of participants in the offense-defense scenario of software crowdsourcing, this paper focuses on studying the offense and defense that occur during algorithm contests on the TopCoder platform.
TopCoder, a crowdsourcing development community, had attracted 716,180 registered members by December 2014 [3]. It applies competitive and engaging programming contests as the kernel of the platform's software development pattern. On TopCoder, a project is divided into several stages that present people with various categories of contests, such as conception, specification, architecture, design, development, assembly, and test suites. Among these types of contests, the algorithm contest is regarded as the most important programming contest on TopCoder and is used as an effective and primary way to attract and retain community members.
Algorithm contests, hosted fortnightly on TopCoder, are designed to encourage wide participation by the TopCoder community. The single round match (SRM), the main form of algorithm competition, consists of three phases: the coding phase, the challenge phase, and the system test phase. The major goal of the challenge phase is to collect valid test cases for finding the faulty programs submitted by participants in the coding phase. It reflects a strong min-max interaction in which players take offensive actions against their peers and eliminate the weaker players. Therefore, such a contest provides a great opportunity to observe and analyze the competitive behaviors of contestants in a real offense-defense scenario such as the algorithm challenge phase.
In our previous research on TopCoder's algorithm contests [4], we proposed a two-person competitive model on the basis of game theory. In order to investigate further the behaviors and performances of contestants during the algorithm challenge phase, we extend that model to a multiple-person game model in this paper, along with empirical analyses.
The rest of the paper is organized as follows. Section 2 provides an overview of related work. Section 3 introduces our multiple-person game model for software crowdsourcing. Section 4 presents an empirical data analysis of SRM history on TopCoder to validate the game theoretic model. Section 5 concludes the paper and discusses future work.
2. RELATED WORK

This section reviews related work in two main areas: software development crowdsourcing and game theory's applications in crowdsourcing.
A. Software Development Crowdsourcing

Crowdsourcing has become more and more popular for dealing with various issues that require human intelligence, and researchers have been prompted to apply it to software engineering, one of the most challenging and creative activities. Many companies, such as TopCoder, uTest, and oDesk, have already adopted this new development mode, software crowdsourcing, in different forms. TopCoder and uTest select members to solve problems, while oDesk serves as an online liaison between clients and freelance software developers [5]. Additionally, successful crowdsourcing platforms such as the App Store and TopCoder have demonstrated the capability and potential of crowdsourcing for supporting various software development activities, coding and testing included [6].
Several studies have already been conducted on software crowdsourcing. Lakhani, Garvin, and Lonstein [5] discussed a variety of issues concerning TopCoder in detail, such as the evolution of the community, the resource management mode, challenges, and the future. In [7], competitive and collaborative software frameworks for online crowdsourcing were investigated. Moreover, empirical analyses on TopCoder have been conducted to analyze strategic behaviors of contestants and collaborative competition in specific software development projects [8, 9]. Ke Mao et al. [10] addressed the important pricing questions that arise in crowdsourced software development, while Wenjun Wu et al. [6] presented an evaluation framework for software crowdsourcing. More recently, [11] analyzed the key factors for software quality in crowdsourcing development, [12] discussed the influence of the competition level and tournament size on performance, and the relationships between customer, business, and technical characteristics were described in [13]. Nevertheless, crowdsourced software development is still in its infancy, and few research efforts have been made on offense-defense quality assurance. Therefore, further work on competitive behaviors during the offense-defense process in software crowdsourcing is essential.
B. Game Theory's Applications in Crowdsourcing

Game theory, introduced by von Neumann and Morgenstern, studies mathematical models of conflict and cooperation between intelligent rational decision-makers [14]. It has been used in economics, political science, psychology, logic, and biology, as well as in crowdsourcing, by offering techniques for formulating competitions between parties that wish to reach an optimal position.

[15] and [16] applied game theory to design optimal crowdsourcing contests. [17] used all-pay auctions to evaluate the effects of reward size and early high-quality submissions on overall participation and submission quality on Taskcn. Other work adopted game theory or all-pay auctions to study incentives for workers to exert effort, the relationship between rewards and participation, and how to obtain high-quality outputs [18–23]. Most current game theoretic models of software crowdsourcing focus on design and programming contests where players attempt to outperform others for contest prizes. Few research papers tackle the issue of strong competitions through which players can eliminate their opponents. In this paper, we apply game theory to study and analyze competitive behaviors in SRM challenges by extending the two-person model defined in [4].

Fig. 1: Algorithm contests on TopCoder

3. MULTIPLE-PERSON GAME MODEL FOR SOFTWARE CROWDSOURCING
A. Overview of TopCoder Challenge in SRM
For each rated algorithm contest, there are three phases: the coding phase, the challenge phase, and the system-testing phase [24, 25].

(1) The Coding Phase

The Coding Phase is a timed event in which all contestants are presented with the same three algorithm questions, representing three levels of complexity and, accordingly, three levels of potential point earnings. Within the time constraint of 75 minutes, contestants have to design a solution and implement it in a programming language. Upon submission of a solution to a problem, the system checks whether the submitted code compiles successfully. If so, the contestant is awarded points calculated from the problem difficulty and the time elapsed in accomplishing the solution.
(2) The Challenge Phase

The Challenge Phase is an offense-defense process whose goal is to collect test cases against the submissions. It presents each competitor a chance to review their peers' source code and challenge the functionality of the solutions. Within 15 minutes, a contestant needs to locate the most vulnerable code and decide whether to send a test case to challenge it. A successful challenge results in the defender losing the points for the original problem submission and a 50-point reward for the challenger. An unsuccessful challenger incurs a 25-point penalty, applied against their total score in that round of competition.
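To make this scoring rule concrete, the following minimal Python sketch computes the score changes implied by a single challenge under the rules above; the function and variable names are ours, not TopCoder's.

# Sketch of the SRM challenge-phase scoring rule described above.
# Function and variable names are illustrative, not TopCoder's.
def challenge_outcome(successful, defender_solution_points):
    """Return (challenger_delta, defender_delta) for one challenge."""
    if successful:
        # The challenger earns a 50-point bonus; the defender loses the
        # points originally earned for the challenged solution.
        return 50, -defender_solution_points
    # An unsuccessful challenge costs the challenger 25 points;
    # the defender's score is unaffected.
    return -25, 0

# Example: challenges against a 250-point solution.
print(challenge_outcome(True, 250))   # (50, -250)
print(challenge_outcome(False, 250))  # (-25, 0)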
(3) The System-testing Phase

The System-testing Phase is not an interactive process. It is run by an automated tester responsible for checking the submitted source code against specified inputs. All successful challenges from the challenge phase are added to the set of inputs for the System-testing Phase. With the test cases from both the TopCoder team and the contestants in the challenge phase, the automated tester applies the test cases to all submitted code that has not already been successfully challenged. If the tester finds that a piece of code is flawed, the author of that submission loses all of the points originally earned for it.
Additionally, each algorithm contest consists of two
divisions: division I and division II, each of which hosts contestants based on their rating values. In each division, three algorithm problems with different difficulty levels are presented to participants, who are asked to design programs as proper solutions to the problems within the given time in the coding phase. In division I, the players are more skillful at the contests, as they are required to have a rating value greater than or equal to 1200. Contests in division II, with a lower requirement, allow less-skilled members, and even novices, to participate. In each division, all registered participants are placed into virtual competitive rooms of around 20 contestants, and these room assignments really only matter for the challenge phase.
In order to observe what decisions a rational person makes in a competitive environment, we concentrate on the behaviors and phenomena that occur in division I, because we assume that members allowed to enter division I are already familiar with TopCoder and the algorithm contest and will therefore make decisions rationally and reasonably during algorithm challenges. The simultaneous game we study takes place in the virtual room during the algorithm challenge phase in the form of challenges.
B. Description of Multiple-person Game Model
Consider n participants playing a simultaneous game in the form of challenges for some tempting rewards. This game is modelled by a tuple ⟨P, {Σ_i}, {μ_i}⟩, where P represents the set of n players, Σ_i denotes the finite strategy space of player i ∈ P, and μ_i(·) is the utility function of player i. Furthermore, we define π = (π_1, · · ·, π_n) as the strategy profile, which is an association of the strategies chosen by the n players. The strategy profile π can be described by a directed graph G_π = (P, E), where the set of nodes P = {p_1, p_2, · · ·, p_n} is equivalent to the set of players, and the set of directed edges E = {(p_i, p_j) | i challenges j's code at the challenge phase} represents the challenges in π.
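As an illustration of this representation, the short Python sketch below builds the directed graph G_π for one strategy profile and checks the one-challenge-at-a-time constraint introduced in the next paragraph; the player names and the sample edge set are hypothetical.

# Illustrative sketch (our own naming) of a strategy profile as a directed
# graph G_pi = (P, E): node i has an edge to node j when player i
# challenges j's code in the challenge phase.
players = ["p1", "p2", "p3", "p4"]          # the set P
challenges = [("p1", "p3"), ("p4", "p3")]   # the edge set E for one profile

out_edges = {p: [] for p in players}
for challenger, defender in challenges:
    out_edges[challenger].append(defender)

# Each player challenges at most one opponent, so out-degree <= 1.
assert all(len(targets) <= 1 for targets in out_edges.values())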
A player i is characterized by his skill level s_i and his resources r_i, in the form of the score earned for his performance in the coding phase. We presume that a player can only challenge one opponent at a time; that is, players are not allowed to challenge more than one person simultaneously. Thereby, for each player there are n available options or actions: do not challenge, or choose one of the remaining n − 1 players to challenge. During the game, a successful challenge is rewarded with a return R, while an unsuccessful one is penalized with a loss L. Here, we assume that the game is one of complete information, which means that every player knows the strategies and payoffs available to all players in the same contest. For simplicity, we presume that a contestant is associated with one solution of some difficulty level. We think this assumption is reasonable, as challenges can happen only one at a time and only one of a contestant's source code solutions can be challenged at once.
The beginning of the algorithm challenge phase is defined as the model's initial state π_0, in which there are no challenges. Every contestant i has an associated ability power s_i and resources r_i in the form of the score obtained in the algorithm coding phase. In every strategy profile π ≠ π_0, we say that there is a challenge between i and j if either contestant i challenges contestant j or contestant j challenges contestant i. Additionally, in the challenge phase any player can be a challenger and be challenged at the same time, so in our algorithm game the number of challenges between two contestants can be zero, one, or two.

When player i attempts to challenge player j, there are only two possible results: either i's effort succeeds or it fails. Let X_{i,j} be a Bernoulli random variable representing the outcome of the challenge between i and j:

X_{i,j} = \begin{cases} 1 & \text{if } i \text{ succeeds} \\ 0 & \text{otherwise} \end{cases}    (3.1)

The probability that player i wins the challenge against j is denoted by P(X_{i,j}). In our game model, we assume that a challenge entails a cost for its initiator, such as the time or financial expense incurred by the effort of finding test cases against other contestants' programs. We define C(X_{i,j}) to represent the cost of the challenge between contestants i and j. Specifically, if contestant i tries to challenge contestant j with a success probability of P(X_{i,j} = 1), player i's effort incurs a cost C_i(X_{i,j}); conversely, if contestant j aims to challenge contestant i with a winning probability of P(X_{j,i} = 1), the cost of player j is denoted C_j(X_{j,i}).

In each virtual room of algorithm challenges, under the strategy profile π_0 where no challenge occurs, the utility of every player is zero. Otherwise, we assume that each participant tries to maximize his expected utility. During the challenge phase, the expected utility of every player i ∈ P is the expected gain from the challenges that he gets involved in minus the cost, and the gain that a player can achieve from a challenge depends on his role in the challenge. In the following, we discuss the utility/payoff functions for contestants under the scenario in which player i initiates a challenge against player j.

Define π_{ij} as the strategy profile in which contestant i initiates an offense against j while contestant j does not launch an active offense back. Then the gain for contestant i is
Gain_i(X_{i,j}) = \begin{cases} 50 & \text{for } X_{i,j} = 1 \\ -25 & \text{for } X_{i,j} = 0 \end{cases}    (3.2)
Therefore, the (expected) utility of player i in π_{ij} is

μ_i(π_{ij}) = P(X_{i,j}) · Gain_i(X_{i,j}) − C_i(X_{i,j})
            = P(X_{i,j} = 1) · Gain_i(X_{i,j} = 1) + P(X_{i,j} = 0) · Gain_i(X_{i,j} = 0) − C_i(X_{i,j})    (3.3)
And the gain for contestant j is

Gain_j(X_{i,j}) = \begin{cases} -r_j & \text{for } X_{i,j} = 1 \\ 0 & \text{for } X_{i,j} = 0 \end{cases}    (3.4)
Therefore, the (expected) utility of player j in π_{ij} is

μ_j(π_{ij}) = P(X_{i,j}) · Gain_j(X_{i,j})
            = P(X_{i,j} = 1) · Gain_j(X_{i,j} = 1) + P(X_{i,j} = 0) · Gain_j(X_{i,j} = 0)
            = P(X_{i,j} = 1) · Gain_j(X_{i,j} = 1)    (3.5)
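A minimal Python sketch of Eqs. (3.2)-(3.5) is given below, using the 50/-25 point rules from the SRM challenge phase; the probability, cost, and score values in the example are assumed for illustration only.

# Sketch of the expected utilities in Eqs. (3.2)-(3.5); p is P(X_ij = 1),
# c_i is the challenger's effort cost and r_j the defender's coding score.
def utility_challenger(p, c_i, reward=50.0, penalty=25.0):
    # mu_i(pi_ij) = p * 50 + (1 - p) * (-25) - C_i(X_ij)
    return p * reward + (1.0 - p) * (-penalty) - c_i

def utility_defender(p, r_j):
    # mu_j(pi_ij) = p * (-r_j), since Gain_j = 0 when the challenge fails
    return p * (-r_j)

# Example with assumed values: a 60% chance of success, a cost of 10 points,
# against a 250-point solution.
print(utility_challenger(0.6, 10.0))   # 10.0
print(utility_defender(0.6, 250.0))    # -150.0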
C. Equilibrium of Multiple-person Game Model
Which strategy will contestants tend to select during the
algorithm challenge phase? Do they always adopt an attacking
strategy or a defensive one? What are the optimal strategies for
each of them? To answer these questions, we analyze the Nash Equilibrium of our multiple-person game model.
Definition 1 (Nash Equilibrium): A strategy profile π* = (π_1*, π_2*, · · ·, π_n*) is a Nash Equilibrium (NE) if no player can achieve a higher utility by unilaterally deviating from π*, i.e., ∀i ∈ P and any strategy π_i, μ_i(π_1*, · · ·, π_i, · · ·, π_n*) ≤ μ_i(π*).
Although it’s difficult to solve the Nash Equilibrium for
the multiple-person game directly, one can take advantage of
the characteristics of the problem to simplify the computing
procedure. And as previously discussed, the players in our
game are not able to challenge more than one person at a time.
Therefore, each player’s utility consists of two major parts: the
utility yield by his offense against his rival and loss incurred
by the offense against him.
μ_i(π) = μ_i(π_{i,j}) + \frac{\sum_{j=1}^{n-1} Gain_i(X_{j,i})}{\sum_{j=1}^{n-1} X_{j,i}}    (3.6)
Apparently, player i has no way to change the second part, because it depends entirely on the decisions made by the other players in the same room. The player can only decide whether to be a challenger. If his decision is positive, he needs to evaluate the capabilities of the other players and select the most promising target against whom to launch an offense for his gain. According to this assumption, we can reduce the Nash Equilibrium analysis of the multiple-person game to that of two-player games (one player against N − 1 players). In this paper, we focus on the direct utility output of these two-player games, which is enough to achieve the main goal of our research: studying the contestants' behaviors under competitive circumstances. We therefore extract the competitions between two players from our multiple-person game model; the one-vs-multi game's utility matrix is given in Table I.
In the game matrix below, {C_1, · · ·, C_{i-1}, C_{i+1}, · · ·, C_n} denotes the challenge strategies available to player i, C represents the offensive strategy that one contestant out of the N − 1 players can adopt to challenge player i, and D represents the defensive strategy.

TABLE I: One-vs-Multi Game (each cell lists the payoff of contestant i followed by the payoff of contestant j from the N − 1 players)

| Contestant i \ Contestant j | C                                        | D                    |
| C_1                         | μ_i(π_i1) + μ_i(π_ji),  μ_1(π_i1) + μ_j(π_ji) | μ_i(π_i1),  μ_1(π_i1) |
| ...                         | ...                                      | ...                  |
| C_j                         | μ_i(π_ij) + μ_i(π_ji),  μ_j(π_ij) + μ_j(π_ji) | μ_i(π_ij),  μ_j(π_ij) |
| ...                         | ...                                      | ...                  |
| C_n                         | μ_i(π_in) + μ_i(π_ji),  μ_n(π_in) + μ_j(π_ji) | μ_i(π_in),  μ_n(π_in) |
| D                           | μ_i(π_ji),  μ_j(π_ji)                    | 0,  0                |
In order to understand the contestants' rational actions in the face of competition, we infer their optimal strategies by solving the Nash Equilibrium of this game matrix, including the pure strategy Nash Equilibrium and the mixed strategy Nash Equilibrium.
Theorem 1 (Optimal Strategy): The optimal strategy π_{ik} for player i satisfies the following constraints:

(1) ∀j, j ≠ k: P(X_{i,j} = 1) ≤ P(X_{i,k} = 1) and C_i(X_{i,k}) ≤ C_i(X_{i,j});

(2) P(X_{i,k} = 1) > \frac{C_i(X_{i,k}) - Gain_i(X_{i,k} = 0)}{Gain_i(X_{i,k} = 1) - Gain_i(X_{i,k} = 0)}.

According to the game form in Table I, there are four major cases for the utility of every player i. Case 1: μ_i(π_{ik}) + μ_i(π_{ji}), when player i challenges player k out of the N − 1 players and player j out of the N − 1 players also challenges player i. Case 2: μ_i(π_{ik}), when player i challenges player k out of the N − 1 players and none of the N − 1 players challenges player i. Case 3: μ_i(π_{ji}), when player i adopts the defensive strategy and player j out of the N − 1 players challenges player i. Case 4: 0, when neither player i nor player j chooses to launch an offense.

No matter what strategy the rival out of the N − 1 players chooses, what player i needs to evaluate is whether μ_i(π_{ik}) is positive. If there exists such a player k, then the strategy profile π_{ik} is the optimal strategy for player i. Otherwise, if μ_i(π_{ik}) < 0 for every player k, the best response of player i is to stay defensive instead of launching challenges. In the case μ_i(π_{ik}) = 0, player i can take either the offensive or the defensive strategy.

Using Eq. (3.3), we can further derive the expression of μ_i(π_{ik}) as follows:

μ_i(π_{ik}) = P(X_{i,k}) · Gain_i(X_{i,k}) − C_i(X_{i,k})
            = P(X_{i,k} = 1) · Gain_i(X_{i,k} = 1) + P(X_{i,k} = 0) · Gain_i(X_{i,k} = 0) − C_i(X_{i,k})
            = P(X_{i,k} = 1) · Gain_i(X_{i,k} = 1) + [1 − P(X_{i,k} = 1)] · Gain_i(X_{i,k} = 0) − C_i(X_{i,k})
            = P(X_{i,k} = 1) · [Gain_i(X_{i,k} = 1) − Gain_i(X_{i,k} = 0)] + Gain_i(X_{i,k} = 0) − C_i(X_{i,k})    (3.7)
The rules of the challenge phase in TopCoder algorithm contests specify the constant values of Gain_i(X_{i,k} = 1) and Gain_i(X_{i,k} = 0) in Eq. (3.2). Thus, the variables in Eq. (3.7) are the probability of winning the challenge, P(X_{i,k} = 1), and C_i(X_{i,k}), the normalized cost of the effort to find a valid test case. From Eq. (3.7), we obtain the condition for player i to gain a positive utility from the challenge: μ_i(π_{ik}) > 0 if and only if P(X_{i,k} = 1) > V_i(X_{i,k}), where

V_i(X_{i,k}) = \frac{C_i(X_{i,k}) - Gain_i(X_{i,k} = 0)}{Gain_i(X_{i,k} = 1) - Gain_i(X_{i,k} = 0)}.

Player i evaluates all the possible strategies whose winning probability P(X_{i,k} = 1) is higher than the threshold value V_i(X_{i,k}). To achieve the maximum expected utility, he needs to select the weakest rival k against whom he can find a valid test case with the highest winning probability P(X_{i,k} = 1) and the lowest effort C_i(X_{i,k}).
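The decision rule implied by Eq. (3.7) can be sketched in Python as follows; the rival names, winning probabilities, and costs in the example are assumptions, while the 50/-25 constants come from the SRM rules.

# Sketch of the offense/defense decision implied by Eq. (3.7): challenge the
# rival with the best expected utility, but only if P(X_ik = 1) exceeds the
# threshold V_i(X_ik).
def threshold(cost, gain_win=50.0, gain_lose=-25.0):
    # V_i(X_ik) = (C_i(X_ik) - Gain_i(X_ik=0)) / (Gain_i(X_ik=1) - Gain_i(X_ik=0))
    return (cost - gain_lose) / (gain_win - gain_lose)

def best_response(rivals):
    """rivals: list of (name, win_probability, cost); returns a name or None."""
    best, best_utility = None, 0.0
    for name, p, cost in rivals:
        if p <= threshold(cost):
            continue  # expected utility would not be positive
        u = p * 50.0 + (1.0 - p) * (-25.0) - cost
        if u > best_utility:
            best, best_utility = name, u
    return best  # None means "stay defensive"

# Example with assumed probabilities and costs:
print(best_response([("k1", 0.3, 5.0), ("k2", 0.7, 10.0)]))  # k2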
When the probability of successfully challenging every rival is lower than the corresponding threshold value, i.e., ∀k, k ≠ i, P(X_{i,k} = 1) < V_i(X_{i,k}), player i will avoid the offensive strategy because the offense utility μ_i(π_{ik}) < 0, and thus has no incentive to challenge other players. If such a condition holds for every player, i.e., ∀i, k, k ≠ i, P(X_{i,k} = 1) < V_i(X_{i,k}), every player prefers to stay defensive, keeping the strategy profile at π_0.
4. DATA ANALYSIS OF TOPCODER SRM CHALLENGES

As mentioned in the previous section, there are Nash Equilibria in our multiple-person game model. If the probability of a successful challenge and the cost of challenging are known, with the other factors fixed, it is straightforward for contestants to calculate their optimal strategies. To validate the theoretical result, we need to collect a dataset from the TopCoder platform and further investigate the winning probability P(X_{i,j} = 1) and the challenge cost C_i(X_{i,j}).
As the winning probability is a latent variable in the TopCoder SRM process, it is not easy to tell whether a participant is willing to take the offensive strategy or the defensive one. We use statistical analysis of the SRM dataset to answer the following research questions:

Q1: What factors are relevant to the probability of winning a challenge?

Q2: What determines the winning probability? Can it simply be expressed as the ratio between the allocated skill levels of the contestants, such as P(X_{i,j} = 1) = s_i / (s_i + s_j) for challenges between contestants i and j? Or does there exist some other explicit mathematical formula for the probability?
In order to answer these questions, we implemented a Web crawler to download data from the TopCoder algorithm contests held before February 20, 2014. For each contest, we focus on the data relevant to the algorithm challenge phase. This data collection is composed of records with 19 attributes, including contest ID, division ID, room ID, room name, challenger ID, challenger name, defender ID, defender name, the challenged solution, the difficulty level of the solution, the score given to the challenged solution, the challenger's old rating, the defender's old rating, the challenger's coding time, the defender's coding time, the challenger's solution score, the defender's solution score, the challenge time, and the result of the challenge (succeeded or not). Note that each player has an algorithm rating calculated by TopCoder from his historical performance in the algorithm contests; this value can be regarded as a measure of the player's algorithmic programming ability.
After a data cleaning process to remove invalid items such as null values, we obtained 95,477 challenge records covering 576 algorithm contests. Among these contests, contest 14514 has the largest number of challenges (853), while the smallest number of challenges is 1. We replaced the values "Yes" and "No" with 1 and -1 for the attribute that marks the success or failure of a challenge. In addition to the extracted properties above, we conjecture that the difference between the ratings of the two players involved in the same challenge may also be important; therefore, we computed this difference value for every challenge record and added it to the subsequent analyses. On this dataset, we apply correlation analysis to study the primary factors related to the result of a challenge.
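The data preparation and correlation step described above could be realized roughly as in the following pandas sketch; the file name and column names are our own labels for the 19 crawled attributes, not an official schema.

# Sketch of the data cleaning and correlation analysis described above.
# File and column names are assumed labels for the crawled attributes.
import pandas as pd

df = pd.read_csv("srm_challenges.csv")
df = df.dropna()                                   # remove invalid items
df["result"] = df["result"].map({"Yes": 1, "No": -1})
df["rating_diff"] = df["challenger_old_rating"] - df["defender_old_rating"]

numeric_cols = ["result", "challenger_old_rating", "defender_old_rating",
                "rating_diff", "challenge_time",
                "challenger_solution_scores", "defender_solution_scores"]
print(df[numeric_cols].corr()["result"].sort_values(ascending=False))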
Fig. 2: Correlation analysis output1 for algorithm challenges
Fig. 3: Correlation analysis output2 for algorithm challenges
A. Correlation Analysis for Algorithm Challenges
From Figure 2 and Figure 3, one can find that the result of a challenge (succeeded or not) has a significant correlation with the old ratings, coding times, and solution scores of both the challenger and the defender, with the challenge time, and with the difference between the two competitors' ratings. Among these factors, the rating difference and the challenger's old rating are the top two elements with the strongest influence on the challenge result. The correlations for the challenger's rating and submission score are positive, while those for the defender's old rating and the challenge time are negative. These correlation results are in line with our intuition. Specifically, as the challenger's rating is an indicator of his programming skill, a challenger with a high rating seems more likely to find the bugs in his rival's code and successfully launch an offense if his rival is a low-ranked defender. Moreover, the negative correlation of the challenge time with the challenge outcome
implies that, within a shorter challenge time, the challenger i has more chances to find a valid test case to fail rival j's code, as the total challenge time is limited to 15 minutes. Given that the challenge time is proportional to the effort cost C_i(X_{i,j}), a lower cost reduces the threshold value for player i in making the decision to launch an offense.

Fig. 4: The relationship between the result and the skill ratio
With the correlation analysis results, we remove some irrelevant factors and retain the rest for follow-up data analysis. We then attempt to find whether there exists some explicit mathematical expression for the challenge result or the winning probability, in order to make predictions. Specifically, if an explicit formula were found to compute the winning probability and the challenge cost for challengers, one could infer contestants' decision-making under competitive circumstances in advance.

First, we examine whether the ratio between the allocated skill levels of the players can express the winning probability well. We test this expectation by computing the ratio of the allocated skill levels and performing a correlation analysis between the factor "succeeded or not" and the skill ratio. The result (see Figure 4) shows a significant positive correlation between them, but with a relatively low coefficient of 0.209. Thus, the ratio of the players' allocated skill levels alone is not enough to express the winning probability well. We then explore using all the retained significant factors to find a proper mathematical formula for the probability by applying several regression and classification methods, including linear regression, multi-variable regression, logistic regression, principal component analysis (PCA), and support vector machines (SVM). However, none of these methods yields a reliable prediction formula describing the functional relationship between the challenge result and the primary factors. One possibility is that some key factors that are not open to the public, such as the submitted solution source code, have not yet been included in our data collection. If so, we would need to ask TopCoder to share more data with us for further research.
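As one illustration of the model-fitting attempts listed above, a logistic regression over the retained factors might look like the following sketch; it assumes the DataFrame df prepared earlier and uses scikit-learn, and it is not the exact pipeline used in the study.

# Sketch of one model-fitting attempt: logistic regression of the challenge
# result on a few retained factors (column names are our assumed labels).
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

features = ["challenger_old_rating", "defender_old_rating",
            "rating_diff", "challenge_time"]
X_train, X_test, y_train, y_test = train_test_split(
    df[features], df["result"] == 1, test_size=0.3, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("held-out accuracy:", clf.score(X_test, y_test))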
B. Empirical Analysis for Algorithm Challenges

Based on the above outcomes of our multiple-person game model and the correlation analysis, we propose the following three hypotheses to study the contestants' competitive behaviors in TopCoder algorithm contests. According to the rating system on TopCoder [26], we divide contestants into three rating levels marked with different colors (red, yellow, and blue; see Figure 5), and we collectively describe these levels as rank1 (2200+), rank2 (1500-2199), and rank3 (1200-1499) in this paper.

Fig. 5: Algorithm rating division
H1: A contestant with a high rating is more likely to
make a successful challenge.
As the result of the correlation analysis indicates, the challenger's old rating may have a positive and significant impact on the result of a challenge. To further test hypothesis H1, we compute the number and ratio of challengers in the two cases (win and loss) according to the rating division rule; Figure 6 shows the results.

Fig. 6: Win/loss ratio of challengers
As shown in Figure 6, the percentage of successful challenges increases with the challengers' rating level. Therefore, we believe the assumption that a contestant with a high rating is more likely to make a successful challenge is valid.
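The H1 comparison can be reproduced with a simple grouping over the same DataFrame; the rank boundaries follow the rating division above, while the column names remain our assumed labels.

# Sketch of the H1 computation: bucket challengers into the three rating
# ranks and compare the share of successful challenges in each group.
def rank_of(rating):
    if rating >= 2200:
        return "rank1"
    if rating >= 1500:
        return "rank2"
    return "rank3"

df["challenger_rank"] = df["challenger_old_rating"].apply(rank_of)
win_ratio = (df["result"] == 1).groupby(df["challenger_rank"]).mean()
print(win_ratio)  # expected to increase from rank3 to rank1, as in Figure 6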
H2: High-ranked contestants seem to challenge more, and relatively low-ranked contestants are more likely to be chosen as rivals.

According to the correlation analysis, the rating difference and the challenger's old rating are the top two factors with the strongest influence on the challenge result. In other words, it seems easier for a challenger with a higher rating and a greater rating difference over the defender to make a successful challenge. Based on this observation, we have reason to believe that high-ranked contestants are more motivated to challenge, with a high winning probability, while low-ranked contestants are more likely to be chosen as rivals. On the basis of the rating division, we counted the numbers of challengers and defenders respectively to validate hypothesis H2. The outcomes and the frequency distribution histogram of the rating difference are shown in Figure 7.

Fig. 7: The frequency distributions for challengers, defenders, and the difference value
According to Figure 7(a), the number of challengers does not go up with rating. Instead, players in the middle rating level (rank2) form the biggest group that actively launches challenges. One possible explanation is that a player's decisions about offense are affected by psychological factors such as confidence. Specifically, a player in rank3 often has less experience in TopCoder algorithm contests and feels less confident about launching a risky challenge against programs submitted by more experienced peers. Moreover, a player in rank1 may think that his performance in the coding phase is already good enough to stand out from the rest of the competition, and thus opt not to make an extra effort in the challenge phase.
Figure 7(b) indicates that the distribution of defenders is quite different from the distribution of challengers: the lower a player's rating, the more likely he is to be challenged by players with superior ratings.

From Figure 7(c), one can see that the distribution of the rating difference between players is very similar to a normal distribution. In other words, players are not always apt to challenge the one with the lowest skill level; on the contrary, they may choose to challenge the strongest players in the same competition room. In our view, the reason they behave like this is that they are ambitious to compete in order to prove themselves and see who is the best, as Jeff Howe once observed [27].
H3: Contestants are less likely to challenge as the difficulty level rises.

From the Nash Equilibrium solution of our multiple-person game model, we obtain that if the probability P(X_{i,j} = 1) of a player making a successful challenge exceeds the threshold value V_i(X_{i,j}), then he will select the challenge strategy; and the threshold value is mainly determined by the effort cost of finding a valid test case. Obviously, it takes more effort to challenge a submitted solution with a high difficulty level, resulting in a higher cost. Therefore, the higher cost yields a higher threshold value, which discourages a player from launching offensive challenges against difficult programming problems.
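As a worked example with the SRM constants Gain_i(X_{i,k} = 1) = 50 and Gain_i(X_{i,k} = 0) = -25, and two illustrative (assumed) normalized cost values: V_i = (C_i + 25) / (50 + 25), so a cost of C_i = 5 gives V_i = 30/75 = 0.40, whereas a cost of C_i = 20 gives V_i = 45/75 = 0.60; the winning probability required to justify a challenge rises with the effort cost.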
By observing the behaviors of contestants in algorithm contests on TopCoder, we find that in general the number of challenges is low compared with the scale of participation in these competitions. There are about 20 members in a virtual competitive room, while the mean number of challenges is about 10, and the most frequent number of challenges is 2. As for this phenomenon, we think that for contestants in division I, with ratings above 1200, the challenge cost often seems too expensive for them to launch offensive challenges. During the algorithm challenge phase, there are solutions for three problems with different levels of difficulty, and we examine the number and ratio of challenged solutions for the three categories respectively.
Fig. 8: The number and ratio of challenges for the problems with different difficulty levels

Figure 8 illustrates that both the number and the ratio of challenged solutions decrease as the difficulty level of the corresponding problems increases. This is easy to understand: the challenge cost inevitably increases with the difficulty of the challenged solution, making it harder for the winning probability to exceed the threshold value V_i(X_{i,j}). Consequently, fewer contestants are willing to challenge others. That is to say, fewer contestants will participate in the algorithm challenge phase as the cost increases, other factors remaining unchanged.
5. CONCLUSION AND FUTURE WORK

This paper presents a multiple-person game model that applies complete information game theory to study the competitive behaviors and phenomena that occur during algorithm challenges on TopCoder. From the Nash Equilibrium solution, we find that if a contestant's probability of making a successful challenge exceeds a threshold value related to the cost of launching such a challenge, he will always decide to challenge.
The theoretical model is validated by empirical analysis of a dataset on the challenge phase of TopCoder SRM contests. Additionally, the analytical results indicate the following conclusions: (1) Both the rating difference between the challenger and the defender and the challenger's old rating have a relatively strong and positive influence on the result of a challenge; with proficient programming skills, a contestant with a high rating is more likely to deliver a successful challenge. (2) Contestants with mid-level rankings tend to challenge more, and relatively low-ranked contestants are more likely to be chosen as rivals, but contestants with the highest ratings may be unwilling to challenge due to psychological factors. (3) Fewer contestants tend to be active in the algorithm challenge phase as the cost of initiating challenges increases.

The research results in this paper provide a better understanding of competitive behaviors in the offense-defense process of software crowdsourcing. Nonetheless, there are still many open questions to be addressed. In future work, we plan to put more effort into data collection to find the key determinative factors that express the winning probability in an explicit mathematical formula. More attributes of challenges, especially the source code and the test cases used in challenges, need to be incorporated into our study to reveal the latent rationale of challenge decisions in terms of the winning probability and the challenge cost. More importantly, these program-related attributes may enable us to assess the effectiveness of offense-defense based quality assurance in the scenario of TopCoder SRM challenges.
ACKNOWLEDGMENT

This work is funded by the National High-Tech R&D Program of China (Grant No. 2013AA01A210) and the State Key Laboratory of Software Development Environment (Funding No. SKLSDE-2013ZX-03).
REFERENCES

[1] Crowdsourcing in 2014: With Great Power Comes Great Responsibility. http://www.wired.com/2014/01/crowdsourcing-2014great-power-comes-great-responsibility/
[2] Crowdsourcing. http://en.wikipedia.org/wiki/Crowdsourcing
[3] TopCoder. http://www.topcoder.com/
[4] Z. Hu and W. Wu, "A game theoretic model of software crowdsourcing," in 2014 IEEE 8th International Symposium on Service Oriented System Engineering (SOSE), April 2014, pp. 446-453.
[5] K. R. Lakhani, D. A. Garvin, and E. Lonstein, "Topcoder (a): Developing software through crowdsourcing," Harvard Business School, Tech. Rep., January 2010.
[6] W. Wu, W.-T. Tsai, and W. Li, "An evaluation framework for software crowdsourcing," Frontiers of Computer Science, vol. 7, no. 5, pp. 694-709, 2013.
[7] D. Fried, "Crowdsourcing in the software development industry," 2010.
[8] N. Archak, "Money, glory and cheap talk: Analyzing strategic behavior of contestants in simultaneous crowdsourcing contests on topcoder.com," in Proceedings of the 19th International Conference on World Wide Web, 2010, pp. 21-30.
[9] S. Nag, I. Heffan, A. Saenz-Otero, and M. Lydon, "Spheres zero robotics software development: Lessons on crowdsourcing and collaborative competition," in IEEE Conference Publications, 2012.
[10] K. Mao, Y. Yang, M. Li, and M. Harman, "Pricing crowdsourcing-based software development tasks," in Proceedings of the 2013 International Conference on Software Engineering, 2013, pp. 1205-1208.
[11] K. Li, J. Xiao, Y. Wang, and Q. Wang, "Analysis of the key factors for software quality in crowdsourcing development: An empirical study on topcoder.com," in Proceedings of the 2013 IEEE 37th Annual Computer Software and Applications Conference, 2013, pp. 812-817.
[12] K. J. Boudreau, K. Lakhani, and M. E. Menietti, "Performance responses to competition across skill-levels in rank order tournaments: Field evidence and implications for tournament design," 2014.
[13] A. F. M. H. Y. J. W. M. F. Sarro and Y. Zhang, "App store analysis: Mining app stores for relationships between customer, business and technical characteristics."
[14] Game theory. http://en.wikipedia.org/wiki/Game_theory
[15] N. Archak and A. Sundararajan, "Optimal design of crowdsourcing contests," in ICIS, 2009.
[16] S. Chawla, J. D. Hartline, and B. Sivan, "Optimal crowdsourcing contests," in Proceedings of the Twenty-third Annual ACM-SIAM Symposium on Discrete Algorithms, 2012, pp. 856-868.
[17] T. X. Liu, J. Yang, L. A. Adamic, and Y. Chen, "Crowdsourcing with all-pay auctions: A field experiment on taskcn," in Proceedings of the American Society for Information Science and Technology, 2011.
[18] D. DiPalantino and M. Vojnovic, "Crowdsourcing and all-pay auctions," in Proceedings of the 10th ACM Conference on Electronic Commerce, 2009, pp. 119-128.
[19] Y. Zhang and M. van der Schaar, "Reputation-based incentive protocols in crowdsourcing applications," CoRR, vol. abs/1108.2096, 2011.
[20] G. Ranade and L. Varshney, "To crowdsource or not to crowdsource?" 2012. http://www.aaai.org/ocs/index.php/WS/AAAIW12/paper/view/5241
[21] D. Yang, G. Xue, X. Fang, and J. Tang, "Crowdsourcing to smartphones: Incentive mechanism design for mobile phone sensing," in Proceedings of the 18th Annual International Conference on Mobile Computing and Networking, 2012, pp. 173-184.
[22] B. Hoh, T. Yan, D. Ganesan, K. Tracton, T. Iwuchukwu, and J.-S. Lee, "Trucentive: A game-theoretic incentive platform for trustworthy mobile crowdsourcing parking services," in Proceedings of IEEE ITSC, 2012.
[23] A. Ghosh, "Social computing and user-generated content: A game-theoretic approach," SIGecom Exch., vol. 11, no. 2, pp. 16-21, 2012.
[24] http://blog.csdn.net/touzani/article/details/1633572
[25] http://apps.topcoder.com/wiki/display/tc/Algorithm+Overview
[26] Algorithm Rating System. http://help.topcoder.com/datascience/srm-and-mm-rating-systems/algorithm-ratingsystem/
[27] A. Bingham and D. Spradlin, Case Study: Virtual Software Development: How TopCoder Is Rewriting the Code. FT Press, 2011, ch. 7.