2015 IEEE Symposium on Service-Oriented System Engineering

Game Theoretic Analysis for Offense-Defense Challenges of Algorithm Contests on TopCoder

Zhenghui Hu and Wenjun Wu
State Key Laboratory of Software Development Environment
Department of Computer Science and Engineering, Beihang University
Beijing, China 100191
[email protected] [email protected]

Abstract—Software crowdsourcing platforms such as TopCoder have successfully adopted the offense-defense based quality assurance mechanism into the software development process to deliver high-quality software solutions. TopCoder algorithm contests run single-round matches (SRMs) with a challenge phase that allows participants to find bugs in the submitted programs and eliminate their opponents. In this paper, we introduce a game theoretic model to study the competitive behaviors in the challenge phase of SRMs. By analyzing the Nash Equilibrium of our multiple-person game model, we find that the probability of making a successful challenge and the effort cost are the major factors in contestants' decisions. To verify the theoretical result, we perform an empirical analysis on a dataset collected from the algorithm challenge phase on TopCoder. The results indicate that contestants with a high rating are more likely to launch challenges against lower-rated ones. However, contestants with the highest ratings may be unwilling to challenge in order to avoid the risk of losing their points in the contests.

1. INTRODUCTION

Crowdsourcing is a powerful paradigm that allows the wisdom of crowds to be applied to more problems with faster and better results [1]. As a new and efficient form of innovation, it has already demonstrated its capability as a problem-solving mechanism for various groups, such as government, business, nonprofit organizations, researchers, artists, and even software design and development [2].

Obtaining quality software is a common but challenging goal for software crowdsourcing projects. Software quality has many dimensions including reliability, performance, security, safety, maintainability, controllability, and usability. Based on the observation of successful practices of crowdsourced software testing such as TopCoder (www.topcoder.com) and uTest (www.utest.com), we find that offense-defense based quality assurance is one of the most fundamental ways to eliminate potential defects in the documents, models, and code submitted by the crowd. Multiple groups from the community undertake different responsibilities in software development tasks and intensively cooperate with each other to find problems in their counterparts' work and reduce the bugs in their own work.

Although research efforts have been made to study the theory and practice of software crowdsourcing, little has been done on offense-defense quality assurance. Many research questions remain, such as the major decision factors for participants to launch offenses against their peers, and the effectiveness of the offense-defense mechanism for quality assurance. In order to investigate the behaviors of participants in the offense-defense scenario of software crowdsourcing, this paper focuses on studying the offense-defense interactions that occur during the algorithm contests on the TopCoder platform.

TopCoder, a crowdsourcing development community, had attracted 716,180 registered members by December 2014 [3]. It applies competitive and engaging programming contests as the kernel of the platform's software development pattern. On TopCoder, a project is divided into several stages and presents people with various categories of contests, such as conception, specification, architecture, design, development, assembly, and test suites. Among these types of contests, the algorithm contest is regarded as the most important programming contest on TopCoder, and it serves as an effective and primary way to attract and retain community members. Algorithm contests, hosted fortnightly on TopCoder, are designed to encourage wide participation by the TopCoder community. The single round match (SRM), the main form of algorithm competition, consists of three phases: the coding phase, the challenge phase, and the system test phase. The major goal of the challenge phase is to collect valid test cases that expose the faulty programs submitted by participants in the coding phase. It reflects a strong min-max interaction in which players take offensive actions against their peers and eliminate the weaker players. Therefore, such a contest provides us a great opportunity to observe and analyze the competitive behaviors of contestants in a real offense-defense scenario such as the algorithm challenge phase.

In our previous research on TopCoder's algorithm contests [4], we proposed a two-person competitive model on the basis of game theory. In order to make a further investigation into the behaviors and performances of contestants during the algorithm challenge phase, we extend that model to a multiple-person game model in this paper, along with empirical analyses.

The rest of the paper is organized as follows. Section 2 provides an overview of related work. Section 3 introduces our multiple-person game model for software crowdsourcing. Section 4 presents an empirical analysis of SRM history on TopCoder to validate the game theoretic model. We conclude the paper and discuss future work in Section 5.

2. RELATED WORK

This section reviews related work in two main areas: software development crowdsourcing and game theory's applications in crowdsourcing.

A. Software Development Crowdsourcing

Crowdsourcing has become more and more popular for dealing with various issues that require human intelligence. This has prompted researchers to apply it to software engineering, one of the most challenging and creative activities. Many companies, such as TopCoder, uTest, and oDesk, have already adopted this new development mode, software crowdsourcing, in different forms. TopCoder and uTest select members to solve problems, while oDesk serves as an online liaison between clients and freelance software developers [5]. Additionally, successful crowdsourcing platforms such as the App Store and TopCoder have demonstrated the capability and potential of crowdsourcing in supporting various software development activities, including coding and testing [6].

Several studies have already been conducted on software crowdsourcing. Lakhani, Garvin and Lonstein [5] discussed a variety of issues around TopCoder in detail, such as the evolution of the community, the resource management mode, challenges, and the future. In [7], competitive and collaborative software frameworks for online crowdsourcing were investigated. Moreover, empirical analyses on TopCoder have been performed to study strategic behaviors of contestants and collaborative competition in specific software development projects [8, 9].
Ke Mao et al. [10] addressed the important pricing questions that arise during crowdsourced software development, while Wenjun Wu et al. [6] presented an evaluation framework for software crowdsourcing. More recently, [11] analyzed the key factors for software quality in crowdsourcing development, [12] discussed the influence of the level of competition and tournament size on performance, and the relationships between customer, business, and technical characteristics were described in [13]. Nevertheless, crowdsourced software development is still in its infancy, and few research efforts have been devoted to offense-defense quality assurance. Therefore, it is essential to carry out further work on competitive behaviors during the offense-defense process in software crowdsourcing.

B. Game Theory's Applications in Crowdsourcing

Game theory, introduced by von Neumann and Morgenstern, studies mathematical models of conflict and cooperation between intelligent rational decision-makers [14]. It has been used in economics, political science, psychology, logic, and biology, as well as crowdsourcing, by offering techniques for formulating competitions between parties that wish to reach an optimal position. [15] and [16] applied game theory to design optimal crowdsourcing contests. [17] used all-pay auctions to evaluate the effects of reward size and early high-quality submissions on overall participation and submission quality on Taskcn. Other work adopted game theory or all-pay auctions to design effective incentives for workers to exert effort, to study the relationship between rewards and participation, and to examine how to obtain high-quality outputs [18–23].

Most existing game theoretic models of software crowdsourcing focus on design and programming contests in which players attempt to outperform others for contest prizes. Few research papers tackle the strong competitions through which players can eliminate their opponents. In this paper, we apply game theory to study and analyze competitive behaviors in SRM challenges by extending the two-person model defined in [4].

Fig. 1: Algorithm contests on TopCoder

3. MULTIPLE-PERSON GAME MODEL FOR SOFTWARE CROWDSOURCING

A. Overview of TopCoder Challenge in SRM

For each rated algorithm contest, there are three phases: the coding phase, the challenge phase, and the system-testing phase [24, 25].

(1) The Coding Phase. The coding phase is a timed event in which all contestants are presented with the same three algorithm problems, representing three levels of complexity and, accordingly, three levels of potential point earnings. Within the time constraint of 75 minutes, contestants have to design a solution and implement it in a programming language. Upon submission of a solution to a problem, the system checks whether the submitted code compiles successfully. If so, the contestant is awarded points calculated from the problem difficulty and the elapsed time taken to complete the solution.

(2) The Challenge Phase. The challenge phase is an offense-defense process whose goal is to collect test cases against the submissions. It gives each competitor a chance to review their peers' source code and challenge the functionality of the solutions. Within 15 minutes, a contestant needs to locate the most vulnerable code and decide whether to send a test case to challenge it. A successful challenge results in the defendant losing the original submission points for that problem, and a 50-point reward for the challenger. An unsuccessful challenger incurs a 25-point penalty, applied against their total score in that round of competition.

(3) The System-testing Phase. The system-testing phase is not an interactive process. It is run by an automated tester responsible for checking the submitted source code with specified inputs. All successful challenges from the challenge phase are added to the set of inputs for the system-testing phase. With the test cases from both the TopCoder team and the contestants in the challenge phase, the automated tester applies the test cases to all submitted code that has not already been successfully challenged. If the tester finds code that is flawed, the author of that submission loses all of the points originally earned for it.

Additionally, each algorithm contest consists of two divisions, division I and division II, each of which hosts contestants based on their rating values. In each division, three algorithm problems with different difficulty levels are presented to participants, who are asked to design programs as proper solutions to the problems within the given time in the coding phase. The players in division I are more skillful at the contests, as they are required to have a rating value greater than or equal to 1200, while division II has a lower entry requirement and admits less-skilled members and even novices. In each division, all registered participants are placed into virtual competitive rooms with around 20 contestants, and this room assignment really only matters for the challenge phase.

In order to observe what decisions a rational person makes in a competitive environment, we concentrate on the behaviors and phenomena observed in division I, because we assume that members allowed to enter division I are already familiar with TopCoder and the algorithm contest, and thus make decisions rationally and reasonably during algorithm challenges. The simultaneous game of interest occurs in the virtual room during the algorithm challenge phase in the form of challenges. In a challenge phase, any player can be a challenger and be challenged at the same time, and in our algorithm game the number of challenges between two contestants can be zero, one, or two.

B. Description of Multiple-person Game Model

Consider n participants playing a simultaneous game in the form of challenges for some tempting rewards. This game is modeled by a tuple ⟨P, {Σ_i}, {μ_i}⟩, where P represents the set of n players, Σ_i denotes the finite strategy space of player i ∈ P, and μ_i(·) is the utility function of player i. Furthermore, we define π = (π_1, ..., π_n) as the strategy profile, which is an association of strategies made by the n players. The strategy profile π can be described by a directed graph G_π = (P, E), where the set of nodes P = {p_1, p_2, ..., p_n} is the equivalent of the set of players, and the set of directed edges E = {(p_i, p_j) | i challenges j's code at the challenge phase} represents the challenges in π.

A player i is characterized by his skill level s_i and resources r_i in the form of scores for his performance in the coding phase. We presume that a player can only challenge one opponent at a time; that is, players are not allowed to challenge more than one person simultaneously. Thereby, each player has n available actions: do not challenge, or choose one of the other n − 1 players to challenge. During the game, a successful challenge is awarded a return R, while an unsuccessful one is penalized by a loss L. Here, we assume that the game is a complete information game, which means that every player knows the strategies and payoffs available to all players in the same contest. For simplicity, we presume that a contestant is associated with one solution of some difficulty level. We think this assumption is reasonable, as challenges can happen only once at a time, and only one of a contestant's source code solutions can be challenged at a time.

The beginning of the algorithm challenge phase is defined as the model's initial state π^0, where there are no challenges, and every contestant i has an associated ability s_i and resources r_i in the form of scores obtained in the coding phase. In every strategy profile π ≠ π^0, we say that there is a challenge between i and j if either contestant i challenges contestant j or contestant j challenges contestant i.

When player i attempts to challenge player j, there are only two possible results: i's effort either succeeds or fails. Let X_{i,j} be a Bernoulli random variable representing the outcome of the challenge between i and j:

\[
X_{i,j} = \begin{cases} 1 & \text{if } i \text{ succeeds} \\ 0 & \text{otherwise} \end{cases} \tag{3.1}
\]

The probability that player i wins the challenge against j is denoted by P(X_{i,j}). In our game model, we assume that a challenge entails a cost on its initiator, such as the time or financial expense incurred by the effort of finding test cases against other contestants' programs. We define C(X_{i,j}) to represent the cost of the challenge between contestants i and j. Specifically, if contestant i tries to challenge contestant j with a success probability P(X_{i,j} = 1), player i's effort incurs a cost C_i(X_{i,j}); conversely, if contestant j aims to challenge contestant i with a winning probability P(X_{j,i} = 1), the cost of player j is denoted C_j(X_{j,i}).

In each virtual room of algorithm challenges, under the strategy profile π^0 where no challenge occurs, the utility of every player is zero. Otherwise, we assume that each participant tries to maximize his expected utility. During the challenge phase, the expected utility of every player i ∈ P is the expected gain from the challenges that he gets involved in minus the cost, and the gain that a player can achieve from a challenge depends upon his role in it. In the following, we discuss the utility/payoff functions for contestants under the scenario in which player i initiates a challenge against player j.

Define π_{ij} as the strategy profile in which contestant i initiates the offense against j while contestant j does not launch an active offense back. Then the gain for contestant i is

\[
\mathrm{Gain}_i(X_{i,j}) = \begin{cases} 50 & \text{for } X_{i,j} = 1 \\ -25 & \text{for } X_{i,j} = 0 \end{cases} \tag{3.2}
\]

Therefore, the (expected) utility of player i in π_{ij} is

\[
\begin{aligned}
\mu_i(\pi_{ij}) &= P(X_{i,j}) \cdot \mathrm{Gain}_i(X_{i,j}) - C_i(X_{i,j}) \\
&= P(X_{i,j}=1)\,\mathrm{Gain}_i(X_{i,j}=1) + P(X_{i,j}=0)\,\mathrm{Gain}_i(X_{i,j}=0) - C_i(X_{i,j})
\end{aligned} \tag{3.3}
\]

And the gain for contestant j is

\[
\mathrm{Gain}_j(X_{i,j}) = \begin{cases} -r_j & \text{for } X_{i,j} = 1 \\ 0 & \text{for } X_{i,j} = 0 \end{cases} \tag{3.4}
\]

Therefore, the (expected) utility of player j in π_{ij} is

\[
\begin{aligned}
\mu_j(\pi_{ij}) &= P(X_{i,j}) \cdot \mathrm{Gain}_j(X_{i,j}) \\
&= P(X_{i,j}=1)\,\mathrm{Gain}_j(X_{i,j}=1) + P(X_{i,j}=0)\,\mathrm{Gain}_j(X_{i,j}=0) \\
&= P(X_{i,j}=1)\,\mathrm{Gain}_j(X_{i,j}=1)
\end{aligned} \tag{3.5}
\]
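To make the payoff structure above concrete, the following minimal Python sketch evaluates Eqs. (3.2)-(3.5) for a single challenge. The +50 reward, the -25 penalty, and the defender's loss of his submission score follow the SRM rules described earlier; the probability, cost, and score values used in the example are hypothetical placeholders, not data from the paper.

```python
# A minimal sketch of the single-challenge payoffs in Eqs. (3.2)-(3.5).
# Reward/penalty constants follow the SRM challenge rules; the example
# probability, cost, and defender score are hypothetical placeholders.

CHALLENGE_REWARD = 50.0    # points awarded for a successful challenge
CHALLENGE_PENALTY = -25.0  # points lost on an unsuccessful challenge


def challenger_utility(p_win: float, cost: float) -> float:
    """Expected utility of challenger i in profile pi_ij, Eq. (3.3)."""
    return p_win * CHALLENGE_REWARD + (1.0 - p_win) * CHALLENGE_PENALTY - cost


def defender_utility(p_win: float, defender_score: float) -> float:
    """Expected utility of defender j in profile pi_ij, Eq. (3.5)."""
    return p_win * (-defender_score)


if __name__ == "__main__":
    # Hypothetical challenge: 40% chance of success, an effort cost of 10
    # points, and a defender whose solution is currently worth 250 points.
    print(challenger_utility(p_win=0.4, cost=10.0))           # -5.0 (not worth it)
    print(defender_utility(p_win=0.4, defender_score=250.0))  # -100.0
```

The negative challenger utility in the example already hints at the threshold behavior analyzed in the next subsection: a challenge is only attractive when the winning probability is high enough relative to the effort cost.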
C. Equilibrium of Multiple-person Game Model

Which strategy will contestants tend to select during the algorithm challenge phase? Do they always adopt an attacking strategy or a defensive one? What are the optimal strategies for each of them? To answer such questions, we analyze the Nash Equilibrium of our multiple-person game model.

Definition 1 (Nash Equilibrium): A strategy profile π* = (π_1*, π_2*, ..., π_n*) is a Nash Equilibrium (NE) if for each player i, no other strategy can achieve a higher utility than π*, i.e., ∀ i ∈ P, μ_i(π_1*, ..., π_i, ..., π_n*) ≤ μ_i(π*).

Although it is difficult to solve the Nash Equilibrium of the multiple-person game directly, one can take advantage of the characteristics of the problem to simplify the computation. As previously discussed, the players in our game are not able to challenge more than one person at a time. Therefore, each player's utility consists of two major parts: the utility yielded by his offense against his rival, and the loss incurred by the offenses against him:

\[
\mu_i(\pi) = \mu_i(\pi_{ik}) + \sum_{j=1}^{n-1} X_{j,i}\,\mathrm{Gain}_i(X_{j,i}) \tag{3.6}
\]

Apparently, player i has no way to change the second part, because it depends entirely upon the decisions made by the other players in the same room. The player can only decide whether to be a challenger. If his decision is positive, he needs to evaluate the capability of the other players and select the most promising target against which to launch an offense for his gain. According to this observation, we can reduce the Nash Equilibrium analysis of the multiple-person game to that of a two-player game (one player against N − 1 players). In this paper, we focus on the direct utility output of these two-player games, which is enough to achieve the main goal of our research: studying the contestants' behaviors under competitive circumstances. We therefore extract the competitions between two players from our multiple-person game model; the one-vs-multi game's utility matrix is shown in Table I.

TABLE I: One-vs-Multi Game (rows: strategies of contestant i; columns: strategy of contestant j from the N−1 players; each cell lists the payoff of contestant i, then the payoff of the other player involved)

          C                                                          D
  C_1     μ_i(π_{i1}) + μ_i(π_{ji}),  μ_1(π_{i1}) + μ_j(π_{ji})      μ_i(π_{i1}),  μ_1(π_{i1})
  ...     ...                                                        ...
  C_j     μ_i(π_{ij}) + μ_i(π_{ji}),  μ_j(π_{ij}) + μ_j(π_{ji})      μ_i(π_{ij}),  μ_j(π_{ij})
  ...     ...                                                        ...
  C_n     μ_i(π_{in}) + μ_i(π_{ji}),  μ_n(π_{in}) + μ_j(π_{ji})      μ_i(π_{in}),  μ_n(π_{in})
  D       μ_i(π_{ji}),  μ_j(π_{ji})                                  0,  0

In this game matrix, {C_1, ..., C_{i−1}, C_{i+1}, ..., C_n} denotes the challenge strategies available to player i, C represents the offensive strategy that one contestant out of the N − 1 players can adopt to challenge player i, and D represents the defensive strategy. In order to understand the contestants' rational actions in the face of competition, we infer the contestants' optimal strategies by solving the matrix's Nash Equilibrium, including the pure-strategy and the mixed-strategy Nash Equilibrium.

Theorem 1 (Optimal Strategy): The optimal strategy π_{ik} for player i satisfies the following constraints:

(1) ∀ j, j ≠ k: P(X_{i,j} = 1) ≤ P(X_{i,k} = 1) and C_i(X_{i,k}) ≤ C_i(X_{i,j});

(2) P(X_{i,k} = 1) > [C_i(X_{i,k}) − Gain_i(X_{i,k} = 0)] / [Gain_i(X_{i,k} = 1) − Gain_i(X_{i,k} = 0)].

According to the game form in Table I, there are four major cases for the utility of every player i. Case 1: μ_i(π_{ik}) + μ_i(π_{ji}), when player i challenges player k out of the N − 1 players and player j out of the N − 1 players also challenges player i. Case 2: μ_i(π_{ik}), when player i challenges player k and none of the N − 1 players challenges player i. Case 3: μ_i(π_{ji}), when player i adopts the defensive strategy and player j out of the N − 1 players challenges player i. Case 4: 0, when neither player i nor player j chooses to launch an offense.

No matter what strategy the rival out of the N − 1 players chooses, what player i needs to evaluate is whether μ_i(π_{ik}) is positive. If there exists such a player k, the strategy profile π_{ik} is the optimal strategy for player i. Otherwise, if μ_i(π_{ik}) < 0 for every player k, the best response of player i is to stay defensive instead of launching challenges. In the case where μ_i(π_{ik}) = 0, player i can take either the offensive or the defensive strategy. Using Eq. (3.3), we can further derive the expression of μ_i(π_{ik}) as follows:

\[
\begin{aligned}
\mu_i(\pi_{ik}) &= P(X_{i,k}) \cdot \mathrm{Gain}_i(X_{i,k}) - C_i(X_{i,k}) \\
&= P(X_{i,k}=1)\,\mathrm{Gain}_i(X_{i,k}=1) + P(X_{i,k}=0)\,\mathrm{Gain}_i(X_{i,k}=0) - C_i(X_{i,k}) \\
&= P(X_{i,k}=1)\,\mathrm{Gain}_i(X_{i,k}=1) + [1 - P(X_{i,k}=1)]\,\mathrm{Gain}_i(X_{i,k}=0) - C_i(X_{i,k}) \\
&= P(X_{i,k}=1)\,[\mathrm{Gain}_i(X_{i,k}=1) - \mathrm{Gain}_i(X_{i,k}=0)] + \mathrm{Gain}_i(X_{i,k}=0) - C_i(X_{i,k})
\end{aligned} \tag{3.7}
\]

The rules of the challenge phase in the TopCoder algorithm contests fix the constant values of Gain_i(X_{i,k} = 1) and Gain_i(X_{i,k} = 0) in Eq. (3.2). Thus, the variables in Eq. (3.7) are the probability of winning the challenge, P(X_{i,k} = 1), and C_i(X_{i,k}), the normalized cost of the effort to find a valid test case. From Eq. (3.7), we obtain the condition for player i to have a positive utility from the challenge: μ_i(π_{ik}) > 0 if and only if P(X_{i,k} = 1) > V_i(X_{i,k}), where

\[
V_i(X_{i,k}) = \frac{C_i(X_{i,k}) - \mathrm{Gain}_i(X_{i,k}=0)}{\mathrm{Gain}_i(X_{i,k}=1) - \mathrm{Gain}_i(X_{i,k}=0)}
\]

Player i evaluates all the possible strategies where the winning probability P(X_{i,k} = 1) is higher than the threshold value V_i(X_{i,k}). To achieve the maximum expected utility, he needs to select the weakest rival k, against whom he can find a valid test case with the highest winning probability P(X_{i,k} = 1) and the lowest effort C_i(X_{i,k}). When the probability of beating every rival is lower than the threshold value, i.e., ∀ k, k ≠ i, P(X_{i,k} = 1) < V_i(X_{i,k}), player i avoids the offensive strategy because the offense utility μ_i(π_{ik}) < 0, so he has no incentive to challenge other players. If such a condition holds for every player, i.e., ∀ i, k, k ≠ i, P(X_{i,k} = 1) < V_i(X_{i,k}), every player prefers to stay defensive, thus keeping the strategy profile at π^0.
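The decision rule in Theorem 1 is straightforward to operationalize. The sketch below, written under the paper's +50/−25 scoring rule (so that V_i = (C_i + 25)/75), computes the threshold, evaluates each opponent, and defends when no opponent clears it. The room data in the example is hypothetical and only illustrates the rule.

```python
# A sketch of the best-response rule behind Theorem 1: challenge the most
# promising opponent only if the winning probability clears the threshold
# V_i = (C_i + 25) / 75 implied by the +50/-25 scoring rule; otherwise defend.
# The room data below is hypothetical and only illustrates the decision rule.

REWARD, PENALTY = 50.0, -25.0


def threshold(cost: float) -> float:
    """V_i(X_ik) = (C_i - Gain_fail) / (Gain_win - Gain_fail)."""
    return (cost - PENALTY) / (REWARD - PENALTY)


def expected_utility(p_win: float, cost: float) -> float:
    """mu_i(pi_ik) rewritten as in Eq. (3.7)."""
    return p_win * (REWARD - PENALTY) + PENALTY - cost


def best_response(candidates):
    """candidates: list of (opponent_id, p_win, cost) triples for player i.
    Returns ('challenge', opponent_id) or ('defend', None)."""
    best = max(candidates, key=lambda c: expected_utility(c[1], c[2]), default=None)
    if best is not None and expected_utility(best[1], best[2]) > 0:
        return ("challenge", best[0])
    return ("defend", None)


if __name__ == "__main__":
    # Hypothetical room: only the second opponent clears the threshold.
    room = [("p2", 0.30, 5.0), ("p5", 0.55, 8.0), ("p9", 0.20, 2.0)]
    print(threshold(8.0))       # 0.44
    print(best_response(room))  # ('challenge', 'p5')
```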
4. DATA ANALYSIS OF TOPCODER SRM CHALLENGES

As discussed in the previous section, there are Nash Equilibria in our multiple-person game model. If the probability of a successful challenge and the cost of challenging are known, with the other factors fixed, contestants can readily calculate their optimal strategies. To validate this theoretical result, we collect a dataset from the TopCoder platform and investigate the winning probability P(X_{i,j} = 1) and the challenge cost C_i(X_{i,j}). As the winning probability is a latent variable in the process of TopCoder SRM contests, it is not obvious whether a participant is willing to take the offensive strategy or the defensive one. We use statistical analysis of the SRM dataset to answer the following research questions:

Q1: What factors are relevant to the probability of winning a challenge?

Q2: What determines the winning probability? Can it simply be expressed as the ratio of the contestants' skill levels, such as P(X_{i,j} = 1) = s_i / (s_i + s_j) for challenges between contestants i and j? Or is there some other mathematical formula for the probability?

In order to answer these questions, we implemented a Web crawler to download data from the TopCoder algorithm contests held before Feb 20th, 2014. For each contest, we focus on the data relevant to the algorithm challenge phase. This data collection is composed of records with 19 attributes: contest ID, division ID, room ID, room name, challenger ID, challenger name, defender ID, defender name, the challenged solution, the difficulty level of the solution, given scores for the challenged solution, challenger's old rating, defender's old rating, challenger's coding time, defender's coding time, challenger's solution scores, defender's solution scores, challenge time, and the result of a challenge (succeeded or not). Note that each player has an algorithm rating calculated by TopCoder from his historical performance in the algorithm contests; this value can be taken as a measure of the player's algorithmic programming ability.

After a data cleaning process to remove invalid data items such as null values, we obtained 95,477 challenge records covering 576 algorithm contests. Among these contests, contest 14514 has the largest number of challenges (853), while the smallest number of challenges is 1. We replaced the values "Yes" and "No" with 1 and -1 for the attribute that marks the success or failure of a challenge. In addition to the extracted attributes, we conjectured that the difference between the ratings of the two players involved in the same challenge may also be important, so we computed this difference value for every challenge record and added it to the subsequent analyses. On this dataset, we apply correlation analysis to study the primary factors related to the result of a challenge.

Fig. 2: Correlation analysis output 1 for algorithm challenges

Fig. 3: Correlation analysis output 2 for algorithm challenges

A. Correlation Analysis for Algorithm Challenges

From Figure 2 and Figure 3, one can see that the result of a challenge (succeeded or not) has a significant correlation with the old rating, coding time, and solution scores of both challenger and defender, the challenge time, and the difference between the two competitors' ratings. Among these factors, the rating difference and the challenger's old rating are the two with the strongest influence on the challenge result. The correlations for the challenger's rating and submission scores are positive, while those for the defender's old rating and the challenge time are negative. These correlation results are in line with our intuition. Specifically, as the challenger's rating is an indicator of his programming skill, a challenger with a high rating seems more likely to find the bugs in his rival's code and successfully launch an offense if his rival is a low-ranked defender. Moreover, the negative correlation of the challenge time with the challenge outcome implies that, within a shorter challenge time, the challenger i has more chances to find a valid test case to fail the rival j's code, as the total challenge time is limited to 15 minutes. Given that the challenge time is proportional to the effort cost C_i(X_{i,j}), a lower cost reduces the threshold value for player i's decision to attack. With the correlation analysis results, we remove some irrelevant factors and retain the rest for the follow-up data analysis.

Fig. 4: The relationship between the result and the skill ratio

We then attempt to find whether there exists an explicit mathematical expression for the challenge result or the winning probability, in order to make predictions. Specifically, if an explicit formula is found to compute the winning probability and the challenge cost for challengers, one can infer contestants' decisions under competition in advance. First, we examine whether the ratio of the players' skill levels alone can express the winning probability well, by computing the skill ratio and performing a correlation analysis between the factor "succeeded or not" and the factor "skill ratio". The result (see Figure 4) shows a significant positive correlation between them, but with a relatively low coefficient of 0.209. Thus, the skill ratio alone is not enough to express the winning probability well. We then explore the use of all the retained significant factors to find a suitable mathematical formula for the probability, applying several regression and classification methods, including linear regression, multi-variable regression, logistic regression, principal component analysis (PCA), and support vector machines (SVM). However, none of these methods yields a reliable prediction formula describing the functional relationship between the challenge result and the primary factors. One possible reason is that some key factors that are not open to the public, such as the submitted solution source code, have not yet been included in our data collection. If so, we should ask TopCoder to share more data with us for further research.
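As a concrete illustration of the screening step above, the following Python sketch correlates the ±1 challenge outcome with each candidate factor (Pearson's r on a two-valued outcome is the point-biserial correlation). The file name and column names are assumed placeholders for the crawled attributes, not TopCoder's actual schema.

```python
# A sketch of the correlation screening described above: correlate the
# binary challenge outcome (+1 success / -1 failure) with each candidate
# factor. File and column names are assumed placeholders.
import pandas as pd
from scipy.stats import pearsonr

challenges = pd.read_csv("srm_challenges.csv")  # hypothetical cleaned dataset
challenges["rating_diff"] = (challenges["challenger_old_rating"]
                             - challenges["defender_old_rating"])

factors = ["challenger_old_rating", "defender_old_rating", "rating_diff",
           "challenger_solution_score", "defender_solution_score",
           "challenge_time", "challenger_coding_time", "defender_coding_time"]

for factor in factors:
    r, p_value = pearsonr(challenges["result"], challenges[factor])
    print(f"{factor:28s} r = {r:+.3f}  (p = {p_value:.3g})")
```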
B. Empirical Analysis for Algorithm Challenges

Based on the outcomes of our multiple-person game model and the correlation analysis above, we propose the following three hypotheses to study the contestants' competitive behaviors in TopCoder algorithm contests. According to the rating system on TopCoder [26], we divide contestants into three rating levels marked with different colors (red, yellow, and blue; see Figure 5), and we collectively refer to these levels as rank1 (2200+), rank2 (1500-2199), and rank3 (1200-1499) in this paper.

Fig. 5: Algorithm rating division

H1: A contestant with a high rating is more likely to make a successful challenge.

As the result of the correlation analysis indicates, the challenger's old rating may have a positive and significant impact on the result of a challenge. To further test hypothesis H1, we compute the number and ratio of challengers in the two cases (win and loss) according to the rating division rule; Figure 6 shows the results. As shown in Figure 6, the percentage of successful challenges increases with the challengers' rating level. Therefore, we consider the assumption that a contestant with a high rating is more likely to make a successful challenge to be valid.

Fig. 6: Win/loss ratio of challengers

H2: High-ranked contestants seem to challenge more, and relatively low-ranked contestants are more likely to be chosen as rivals.

According to the correlation analysis, the rating difference and the challenger's old rating are the top two factors with the strongest influence on the challenge result. In other words, it seems easier for a challenger with a higher rating and a greater rating advantage over the defender to make a successful challenge. Based on this observation, we have reason to believe that high-ranked contestants are more motivated to challenge, since they have a high winning probability, while low-ranked contestants are more likely to be chosen as rivals. On the basis of the rating division, we counted the numbers of challengers and defenders to validate hypothesis H2. The outcomes and the frequency distribution histogram of the rating difference are shown in Figure 7.

According to Figure 7(a), the number of challengers does not go up with the rise of rating. Instead, players in the middle rating level (rank2) form the biggest group that actively launches challenges. One possible explanation is that a player's decisions about offense are affected by psychological factors such as confidence. Specifically, a player in rank3 often has less experience in TopCoder algorithm contests and feels less confident about launching a risky challenge against programs submitted by more experienced peers. Moreover, a player in rank1 may think that his performance in the coding phase is already good enough to stand out from the rest of the competition, and thus opts out of making an extra effort in the challenge phase.

Figure 7(b) indicates that the distribution of defenders is quite different from the distribution of the challengers: the lower a player's rating, the more likely he is to be challenged by players with superior ratings. And from Figure 7(c), one can see that the distribution of the rating difference between players is very similar to a normal distribution. In other words, players are not always apt to challenge the one with the lowest skill level; on the contrary, they may choose to challenge the strongest players in the same competition room. In our view, the reason they behave like this is that they are ambitious to compete in order to prove themselves and see who is the best, as Jeff Howe once said [27].

Fig. 7: The frequency distributions for challengers, defenders, and the rating difference
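A minimal sketch of the tabulations behind H1 and H2 is given below: it buckets players into the three rating ranks defined above and compares success ratios and challenger/defender counts per rank. The file and column names are assumed placeholders, consistent with the earlier correlation sketch.

```python
# A sketch of the tabulations behind H1 and H2: bucket players into the
# three rating ranks, then compare the challenger success ratio per rank
# (H1) and how often each rank appears as challenger vs. defender (H2).
# File and column names are assumed placeholders.
import pandas as pd


def rank_of(rating: float) -> str:
    if rating >= 2200:
        return "rank1 (2200+)"
    if rating >= 1500:
        return "rank2 (1500-2199)"
    return "rank3 (1200-1499)"


challenges = pd.read_csv("srm_challenges.csv")  # hypothetical cleaned dataset
challenges["challenger_rank"] = challenges["challenger_old_rating"].map(rank_of)
challenges["defender_rank"] = challenges["defender_old_rating"].map(rank_of)

# H1: share of successful challenges (result == 1) per challenger rank.
print((challenges["result"] == 1).groupby(challenges["challenger_rank"]).mean())

# H2: how often each rank launches challenges and how often it is targeted.
print(challenges["challenger_rank"].value_counts())
print(challenges["defender_rank"].value_counts())
```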
H3: Contestants are less likely to challenge as the difficulty level rises.

From the Nash Equilibrium solution of our multiple-person game model, we know that if the probability P(X_{i,j} = 1) of a player making a successful challenge exceeds the threshold value V_i(X_{i,j}), then he will select the challenge strategy, and the threshold value is mainly determined by the effort cost of finding a valid test case. Obviously, it takes more effort to challenge a submitted solution with a high difficulty level, resulting in a higher cost. Therefore, the expensive cost raises the threshold value and discourages a player from launching offensive challenges against difficult programming problems.

By observing the behaviors of contestants in algorithm contests on TopCoder, we find that in general the number of challenges is low compared to the scale of participation in these competitions. There are about 20 members in a virtual competitive room, while the mean number of challenges is about 10, and the most frequent number of challenges is 2. We think that, for contestants in division I with a rating of more than 1200, the challenge cost often seems too expensive for them to make offensive challenges.

During the algorithm challenge phase, there are solutions for three problems with different levels of difficulty. We cross-examine the challenged number and ratio for the three categories. Figure 8 illustrates that both the number and the ratio of challenged solutions decrease as the difficulty level of the corresponding problems increases. This is easy to understand: the challenge cost inevitably increases with the difficulty of the challenged solutions, and it then becomes harder for the winning probability to exceed the threshold value V_i(X_{i,j}). Consequently, fewer contestants are willing to challenge others. That is to say, fewer contestants will participate in the algorithm challenge phase as the cost increases (leaving the other factors unchanged).

Fig. 8: The number and ratio of challenges for the problems with different difficulty levels
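To mirror the H3 breakdown, the short sketch below groups the challenge records by problem difficulty level; the column names are again assumed placeholders, and the share computed here is the share of all challenges rather than the challenged-to-submitted ratio plotted in Figure 8.

```python
# A sketch of the H3 breakdown: group the challenge records by the problem
# difficulty level and report counts and success ratios. The column name
# "difficulty_level" is an assumed placeholder.
import pandas as pd

challenges = pd.read_csv("srm_challenges.csv")  # hypothetical cleaned dataset
by_level = challenges.groupby("difficulty_level")["result"].agg(
    challenges_total="count",
    success_ratio=lambda r: (r == 1).mean(),
)
by_level["share_of_all_challenges"] = (by_level["challenges_total"]
                                       / by_level["challenges_total"].sum())
print(by_level)
```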
5. CONCLUSION AND FUTURE WORK

This paper presents a multiple-person game model, based on complete information game theory, to study the competitive behaviors and phenomena that occur during algorithm challenges on TopCoder. From the Nash Equilibrium solution, we find that if a contestant's probability of making a successful challenge exceeds a threshold value related to the cost of launching such a challenge, he will always decide to challenge.
The theoretical model is validated by an empirical analysis of the dataset about the challenge phase of TopCoder SRM contests. Additionally, the analytical results indicate the following conclusions: (1) Both the rating difference between the challenger and the defendant and the challenger's old rating have a relatively strong and positive influence on the result of a challenge; with proficient programming skills, a contestant with a high rating is more likely to deliver a successful challenge. (2) Contestants with mid-level rankings tend to challenge more, and relatively low-ranked contestants are easier to be chosen as rivals, but contestants with the highest ratings may be unwilling to challenge due to psychological factors. (3) Fewer contestants tend to be active in the algorithm challenge phase as the cost of initiating challenges increases.

The research results in this paper provide a better understanding of competitive behaviors in the offense-defense process of software crowdsourcing. Nonetheless, there are still many open questions to be addressed. In future work, we can put more effort into data collection to find the key determinative factors that express the winning probability in an explicit mathematical formula. More attributes of challenges, especially the source code and test cases involved in the challenges, need to be incorporated into our study to reveal the latent rationale of challenge decisions in terms of the winning probability and the challenge cost. More importantly, these program-related attributes may enable us to assess the effectiveness of offense-defense based quality assurance in the scenario of TopCoder SRM challenges.

ACKNOWLEDGMENT

This work is funded by the National High-Tech R&D Program of China (Grant No. 2013AA01A210) and the State Key Laboratory of Software Development Environment (Funding No. SKLSDE-2013ZX-03).

REFERENCES

[1] Crowdsourcing in 2014: With Great Power Comes Great Responsibility. http://www.wired.com/2014/01/crowdsourcing-2014-great-power-comes-great-responsibility/
[2] Crowdsourcing. http://en.wikipedia.org/wiki/Crowdsourcing
[3] TopCoder. http://www.topcoder.com/
[4] Z. Hu and W. Wu, "A game theoretic model of software crowdsourcing," in 2014 IEEE 8th International Symposium on Service Oriented System Engineering (SOSE), April 2014, pp. 446–453.
[5] K. R. Lakhani, D. A. Garvin, and E. Lonstein, "TopCoder (A): Developing software through crowdsourcing," Harvard Business School, Tech. Rep., January 2010.
[6] W. Wu, W.-T. Tsai, and W. Li, "An evaluation framework for software crowdsourcing," Frontiers of Computer Science, vol. 7, no. 5, pp. 694–709, 2013.
[7] D. Fried, "Crowdsourcing in the software development industry," 2010.
[8] N. Archak, "Money, glory and cheap talk: Analyzing strategic behavior of contestants in simultaneous crowdsourcing contests on topcoder.com," in Proceedings of the 19th International Conference on World Wide Web, 2010, pp. 21–30.
[9] S. Nag, I. Heffan, A. Saenz-Otero, and M. Lydon, "SPHERES Zero Robotics software development: Lessons on crowdsourcing and collaborative competition," in IEEE Conference Publications, 2012.
[10] K. Mao, Y. Yang, M. Li, and M. Harman, "Pricing crowdsourcing-based software development tasks," in Proceedings of the 2013 International Conference on Software Engineering, 2013, pp. 1205–1208.
[11] K. Li, J. Xiao, Y. Wang, and Q. Wang, "Analysis of the key factors for software quality in crowdsourcing development: An empirical study on topcoder.com," in Proceedings of the 2013 IEEE 37th Annual Computer Software and Applications Conference, 2013, pp. 812–817.
[12] K. J. Boudreau, K. Lakhani, and M. E. Menietti, "Performance responses to competition across skill-levels in rank order tournaments: Field evidence and implications for tournament design," 2014.
[13] A. Finkelstein, M. Harman, Y. Jia, W. Martin, F. Sarro, and Y. Zhang, "App store analysis: Mining app stores for relationships between customer, business and technical characteristics."
[14] Game theory. http://en.wikipedia.org/wiki/Game_theory
[15] N. Archak and A. Sundararajan, "Optimal design of crowdsourcing contests," in ICIS, 2009.
[16] S. Chawla, J. D. Hartline, and B. Sivan, "Optimal crowdsourcing contests," in Proceedings of the Twenty-third Annual ACM-SIAM Symposium on Discrete Algorithms, 2012, pp. 856–868.
[17] T. X. Liu, J. Yang, L. A. Adamic, and Y. Chen, "Crowdsourcing with all-pay auctions: A field experiment on Taskcn," in Proceedings of the American Society for Information Science and Technology, 2011.
[18] D. DiPalantino and M. Vojnovic, "Crowdsourcing and all-pay auctions," in Proceedings of the 10th ACM Conference on Electronic Commerce, 2009, pp. 119–128.
[19] Y. Zhang and M. van der Schaar, "Reputation-based incentive protocols in crowdsourcing applications," CoRR, vol. abs/1108.2096, 2011.
[20] G. Ranade and L. Varshney, "To crowdsource or not to crowdsource?" 2012. http://www.aaai.org/ocs/index.php/WS/AAAIW12/paper/view/5241
[21] D. Yang, G. Xue, X. Fang, and J. Tang, "Crowdsourcing to smartphones: Incentive mechanism design for mobile phone sensing," in Proceedings of the 18th Annual International Conference on Mobile Computing and Networking, 2012, pp. 173–184.
[22] B. Hoh, T. Yan, D. Ganesan, K. Tracton, T. Iwuchukwu, and J.-S. Lee, "TruCentive: A game-theoretic incentive platform for trustworthy mobile crowdsourcing parking services," in Proceedings of IEEE ITSC, 2012.
[23] A. Ghosh, "Social computing and user-generated content: A game-theoretic approach," SIGecom Exchanges, vol. 11, no. 2, pp. 16–21, 2012.
[24] http://blog.csdn.net/touzani/article/details/1633572
[25] http://apps.topcoder.com/wiki/display/tc/Algorithm+Overview
[26] Algorithm Rating System. http://help.topcoder.com/data-science/srm-and-mm-rating-systems/algorithm-rating-system/
[27] A. Bingham and D. Spradlin, Case Study: Virtual Software Development: How TopCoder Is Rewriting the Code. FT Press, 2011, ch. 7.