A Self-Adapting Intelligent Optimized Analytical Model for team selection using player performance utility in Cricket

Bharathan S¹, Sundarraj RP², Abhijeet³, Ramakrishnan S³

¹ Analytics Manager at Sports Mechanics and PhD scholar at the Indian Institute of Technology, Madras
² Professor, Indian Institute of Technology, Madras
³ Sports Mechanics India Private Limited

Abstract

Good team selection is vital for success in all sports. Today, team selection is a highly subjective decision, controlled by coaches and captains who rely on gut feel or a player's current form. We propose a new methodology for the objective evaluation of players for team selection. Player selection involves evaluating a player across multiple dimensions, viz. batting and bowling, based on the player's role in the team, the context, the opponents etc. In our study, we consider all possible team-level metrics that affect the outcome (win or loss) of a match, translate them into individual player metrics, develop a player evaluation utility and use it for team selection. The results of the model are demonstrated for the game of cricket and can easily be extended to other sports. The model successfully identified 75% of the good performers, and its team selection accuracy is 83%. The model can be used as an effective enabler to rate players, select teams, define salary caps, predict winners etc.

1 Introduction

The statistics of professional sports, players and teams provide numerous opportunities for research [1,2]. Two popular American sports, football and baseball, have long been characterized by a high degree of analytics. In contrast, the British game of cricket has not been subjected to the same degree of analytics. Cricket, the second most popular sport in the world after soccer with 2-3 billion fans [3], is a relatively new and emerging research area.
Traditionally, cricket has been classified into Test cricket, which is played between two countries over five days; one day international (ODI) cricket, which is played between two international teams with each team playing 50 overs; and first-class cricket, which is played either between domestic teams or between a visiting international team and a domestic team over three or four days. During the last three decades, the game of cricket has seen several changes, all directed at making it more popular among the masses and expanding its reach to non-cricket-playing nations. However, the most significant change came in the form of Twenty20 (T20) cricket. T20 is the latest innovation in the game and is a shorter version than ODI cricket: a T20 game lasts about 3 hours, and each team plays 20 overs. The frenetic pace and growing popularity of the T20 format are changing the way the other formats are played. For example, when Duckworth and Lewis came up with the rain rule in 1999, the average score was 225 runs in 50 overs, but today teams can score 225 runs in 20 overs in a T20 match. Teams now require players who are efficient in multiple roles, so selecting players for a team under various constraints is a complex task that can be viewed as a constrained multi-objective optimization and multiple-criteria decision-making problem. In forming a good and successful cricket team, the batting and bowling strength of the team are the major factors affecting its performance, and an optimum trade-off between them needs to be attained. Players' performance is measured today on numerous variables, but it is important to know whether all of them should be considered for player evaluation or only a subset, depending on the player's role in the team.
Currently, most team selections are made using heuristics, past experience or, at most, some crude methodologies. Generally, in a team selection committee, each member evaluates the players' performance individually and votes for inclusion in or exclusion from the team. Negotiations are then conducted to reach agreement among the members on which cricketers should finally be selected. In our work, we employ a two-phase approach: we first identify the variables of significance for evaluating a player, and then evaluate players and select them for inclusion in the team. The rest of the paper is organized as follows: section 2 presents an overview of the literature on player evaluation and team selection, section 3 explains the methodology in detail, section 4 discusses the results and section 5 concludes the paper with scope for future work.

2 Related Work

Studies addressing different research issues related to various dimensions of cricket can be found in the literature. Articles [4,5,6,7,8,9] introduce the game of cricket while specifically considering statistical methods for determining a player's performance. Performance measurement and classification of players based on their performance can be considered a researchers' delight, irrespective of the sport. The physical demands of English FA Premier League soccer for three positional classifications, viz. defender, midfielder and striker, were evaluated in [10]. In [11], elite Cuban baseball players were classified into five categories based on their fielding roles: infielders, outfielders, catchers, first basemen and pitchers. Similar studies are discussed in soccer and cricket [12], in rugby [13] and in football [14].
Some recent articles [15,16,17,18,19] discuss various multi-criteria decision-making models of player evaluation for multiplayer sports, namely baseball, basketball, cricket and football. Most studies have used the Analytic Hierarchy Process (AHP) to assess the relative importance of each variable and the Technique for Order of Preference by Similarity to Ideal Solution (TOPSIS) to rank players based on their relative closeness to the ideal player. However, these studies have two major shortcomings: 1) the weights used in player evaluation might not be accurate, as they are derived from subjective inputs, and 2) there is no relationship between the weights given and the outcome of the match. Selecting a cricket team under constraints such as the number of batsmen, bowlers, all-rounders and a wicket keeper is a complex task, as coaches must consider a number of qualitative and quantitative attributes. These attributes may include the players' individual skills and performance statistics, physical fitness, psychological factors and injuries, among others [20]. An integer programming model to select a squad of 15 players for a one day international cricket team was developed in [21]. A neural network approach that predicts each cricketer's future performance from past performance, and classifies players into three categories (performer, moderate and failure), was discussed in [22]. Based on the ratings generated and by applying heuristic rules, the authors recommended the cricketers to be included in the 2007 World Cup squad. In [23], a method for quantifying a cricket player's performance based on his ability to score runs and take wickets was proposed; the performance measures were then used to determine the optimal team using an integer programming model. An integer programming model was used in [24] for cricket team selection based on 2009 ICC Champions Trophy data.
A Data Envelopment Analysis (DEA) formulation was proposed in [25] for evaluating and ranking cricket players in different capabilities based on DEA scores. The ranking is then used to choose the required number of players for a cricket team in each cricketing capability.

3 Methodology

The proposed methodology for player evaluation and team selection is shown schematically in Figure 1. It has five stages: data identification, data preparation, data reduction, data modeling, and player evaluation and team selection.

Figure 1 Framework for player evaluation and team selection. Stages: data identification (interviews with experts to identify variables); data preparation (data cleaning to remove rain-affected matches, and data transformation); data reduction (principal component analysis to address multicollinearity); modelling (logistic regression to determine the significant factors that impact the outcome of the match); player evaluation and team selection (integer programming model to select players based on utility).

3.1 Data source

A Twenty20 cricket database, consisting of ball-by-ball details of more than 1500 international and domestic T20 matches played between 2008 and 2014 and tracked across 80+ variables by SportsMechanics, a sports technology and performance analytics company, is used for the study.

3.2 Data Identification

Alignment with users is key to the success of analytics, as users drive or at least strongly influence major decisions. If even one member of the team is not committed, analytics is unlikely to receive sustained and serious focus. So, as the first step, users and domain experts, including coaches, players, members of the national team selection committee and performance analysts, were involved in identifying the list of variables that should be considered for player evaluation and team selection. Focus group sessions and in-depth interviews were conducted, and a list of variables that could influence the outcome of the match was identified.
A description of the identified variables is given in Appendix A.

3.3 Data preparation

From the ball-by-ball details, data is aggregated to calculate performance variables such as batting strike rate, dot-ball percentage and bowling economy at match level for all matches, using standard cricketing definitions. The data is then cleaned to remove all drawn, shortened or rain-affected matches. Since the performance of players depends on their role in the team, we further divided the variables by batting or bowling role. Variables that do not directly measure a player's performance, such as venue, ground, pitch type and opponent, are used as filters to restrict the data to the opponent and venue for which the team is being selected. Preliminary scatter plot and correlation analysis was done to understand the distribution of the data and the relationship between the independent variables and the outcome of the match. Based on this analysis, variables such as partnerships and playing order, which have weak correlation (coefficient < 0.3), were removed from further modeling. The final list of variables used in the study and their naming convention are given in Tables 2 and 3 for batsmen and bowlers respectively.

Table 2 Batting variables under study
Role | Runs | Strike rate | Dot-ball % | Boundary % | Boundary frequency | RSS | Uncomfortables
Top | batopnar | batopnsr | batopndbp | batopnbp | batopnbf | batopnrss | batopuc
Middle | batmidar | batmidsr | batmiddbp | batmidbp | batmidbf | batmidrss | batmiduc
Lower middle | batlmar | batlmsr | batlmdbp | batlmbp | batlmbbf | batlmbrss | batlmbuc

Table 3 Bowling variables under study
Variable | Fast | Spin
Economy | fsteco | spneco
Average runs conceded | fstblavg | spnblavg
Bowling strike rate | fstblsr | spnblsr
Dot-ball % | fstbldb | spnbldb
Boundary % | fsblbp | spnblbp
Boundary frequency | fsblbf | spnblbf

3.4 Principal Component Analysis

Since most of the variables are derived from two basic variables, runs and balls, we expected multicollinearity among the independent variables.
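Because most of the variables in Tables 2 and 3 are built from the same few ball-by-ball counts, some multicollinearity is expected by construction. The sketch below makes that concrete for batting. It is a minimal illustration, not the pipeline used for this study: the `BattingInnings` record and the `batting_metrics` and `bowling_economy` functions are hypothetical, and the formulas assume common definitions (boundary percentage as the share of runs from fours and sixes, boundary frequency as balls faced per boundary, RSS as runs per scoring shot) that are consistent with Appendix A and the discussion of Table 4 but are not spelled out in the paper.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BattingInnings:
    """Hypothetical record: one batsman's counts aggregated from ball-by-ball data."""
    runs: int
    balls: int
    dot_balls: int
    fours: int
    sixes: int


def batting_metrics(inn: BattingInnings) -> dict:
    """Derive Table 2 style metrics; every one is a function of runs and balls.

    Assumed definitions (not spelled out in the paper):
      strike rate        = 100 * runs / balls
      dot-ball %         = 100 * dot balls / balls
      boundary %         = 100 * runs from fours and sixes / runs
      boundary frequency = balls faced per boundary
      RSS                = runs per scoring shot (non-dot ball)
    """
    scoring_shots = inn.balls - inn.dot_balls
    boundaries = inn.fours + inn.sixes
    boundary_runs = 4 * inn.fours + 6 * inn.sixes
    freq: Optional[float] = inn.balls / boundaries if boundaries else None
    return {
        "strike_rate": 100.0 * inn.runs / inn.balls if inn.balls else 0.0,
        "dot_ball_pct": 100.0 * inn.dot_balls / inn.balls if inn.balls else 0.0,
        "boundary_pct": 100.0 * boundary_runs / inn.runs if inn.runs else 0.0,
        "boundary_freq": freq,  # None when no boundaries were hit
        "rss": inn.runs / scoring_shots if scoring_shots else 0.0,
    }


def bowling_economy(runs_conceded: int, legal_balls: int) -> float:
    """Runs conceded per six-ball over."""
    return 6.0 * runs_conceded / legal_balls


if __name__ == "__main__":
    m = batting_metrics(BattingInnings(runs=30, balls=20, dot_balls=8, fours=3, sixes=1))
    print(m)  # {'strike_rate': 150.0, 'dot_ball_pct': 40.0, 'boundary_pct': 60.0, 'boundary_freq': 5.0, 'rss': 2.5}
    print(bowling_economy(32, 24))  # 8.0
```

Since strike rate, dot-ball percentage and RSS all share `runs` and `balls` in their numerators or denominators, they move together across innings, which is exactly the pattern the correlation analysis below detects.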
The correlation matrix of the data set was obtained to assess pairwise association between variables. Appendix B shows the Pearson correlation matrix with statistically significant correlation coefficients (p < 0.01) highlighted. Two interesting observations from this analysis are the absence of a relationship between a team's batting and bowling performance, and high correlation among variables within a particular player role. Since the number of variables is high and there is no correlation between variables of different roles, the correlation results are split by batting and bowling roles. From Appendix B, we find a substantial amount of correlation among the predictors, judging by the strength of the correlation coefficients between them. To remove multicollinearity, the observed variables are reduced to a smaller number of principal components (PCs) using Principal Component Analysis (PCA). PCA, a technique closely related to factor analysis, transforms the original set of inter-correlated variables into a new set of an equal number of uncorrelated variables, or PCs, that are linear combinations of the original variables. The principal components are ordered so that the first PC explains the largest share of the variance in the data, and each subsequent one accounts for the largest proportion of the variability not accounted for by its predecessors. We also used principal component methods to select subsets of variables for the regression equation, using varimax rotation of the principal components, which retains a subset of the original variables associated with each of the first few components. Tables 3 and 4 summarize the results of the varimax rotation on the 15 principal components, together with the amount of variance explained by each component. The higher the loading of a variable, the more that variable contributes to the variation accounted for by the particular principal component.
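The reduction step described above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the Stata workflow used for our results, and it assumes PCA on the correlation matrix (i.e. on standardized variables) followed by the usual SVD-based varimax iteration; `pca_correlation`, `varimax` and `retained_components` are hypothetical helper names. The eigenvalue-above-1 rule in `retained_components` is only the rule of thumb discussed next; the final number of components in this study was chosen from the scree plot and interpretability.

```python
import numpy as np


def pca_correlation(X):
    """PCA on the correlation matrix of X (rows = observations, cols = variables).

    Returns eigenvalues in descending order and the matching loadings
    (eigenvector * sqrt(eigenvalue)), i.e. variable-component correlations.
    """
    R = np.corrcoef(X, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(R)        # R is symmetric
    order = np.argsort(eigvals)[::-1]           # largest variance first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    loadings = eigvecs * np.sqrt(np.clip(eigvals, 0.0, None))
    return eigvals, loadings


def varimax(loadings, max_iter=100, tol=1e-6):
    """Varimax rotation (Kaiser) using the usual SVD-based fixed-point iteration."""
    p, k = loadings.shape
    rotation = np.eye(k)
    criterion = 0.0
    for _ in range(max_iter):
        L = loadings @ rotation
        target = L ** 3 - L @ np.diag(np.sum(L ** 2, axis=0)) / p
        u, s, vt = np.linalg.svd(loadings.T @ target)
        rotation = u @ vt                       # nearest orthogonal matrix
        new_criterion = s.sum()
        if criterion > 0 and new_criterion < criterion * (1 + tol):
            break                               # criterion has stopped improving
        criterion = new_criterion
    return loadings @ rotation


def retained_components(eigvals):
    """Kaiser criterion: keep components whose eigenvalue exceeds 1."""
    return int(np.sum(eigvals > 1.0))


if __name__ == "__main__":
    # Synthetic data: two latent "skills", four noisy variables on one, two on the other.
    rng = np.random.default_rng(0)
    n = 500
    f1, f2 = rng.normal(size=n), rng.normal(size=n)
    X = np.column_stack([f1 + 0.1 * rng.normal(size=n) for _ in range(4)]
                        + [f2 + 0.1 * rng.normal(size=n) for _ in range(2)])
    eigvals, loadings = pca_correlation(X)
    k = retained_components(eigvals)
    print(k, np.round(varimax(loadings[:, :k]), 2))
    print(np.round(np.cumsum(eigvals) / eigvals.sum(), 3))  # cumulative share of variance
```

Because the input is a correlation matrix, the eigenvalues sum to the number of variables, so each eigenvalue divided by that total gives the share of variance the component explains; the rotation only redistributes that variance among the retained components.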
A principal component with an eigenvalue greater than or equal to 1 is usually retained (the Kaiser criterion). Principal component analysis was performed in Stata to obtain orthogonal scores accounting for the variance in the attributes. A 10-component solution was chosen based on the scree plot (Figure 2) and the interpretability of the varimax-rotated components. The ten components accounted for 85.98% of the variance.

Figure 2 Scree plot

The ten PCs, the variables loaded on each of them, their eigenvalues and the cumulative percentage of variance explained are given in Table 4. From Table 4, we find that although boundary percentage is derived from runs and boundary frequency is derived from balls faced, these two variables are strongly correlated for top-order batsmen. Similarly, for fast bowlers, economy rate, which is a function of total runs, is correlated with the number of fours conceded by the bowler. Most of the variance in the data is explained by middle and lower-middle order performance rather than by top-order performance.
Also, we find that the number of uncomfortable balls faced by a batsman or bowled by a bowler is almost insignificant.

Table 3 Rotated principal component loadings
Variable | PC1 | PC2 | PC3 | PC4 | PC5 | PC6 | PC7 | PC8 | PC9 | PC10 | PC11 | PC12 | PC13 | PC14 | PC15
batopnar | 0.01 | 0.16 | 0.06 | -0.77 | 0.01 | 0.03 | -0.01 | -0.03 | 0.00 | -0.10 | -0.02 | -0.04 | -0.03 | 0.24 | -0.32
batopnsr | 0.00 | 0.04 | 0.52 | -0.79 | 0.06 | 0.05 | 0.02 | -0.06 | 0.02 | -0.04 | -0.06 | -0.07 | -0.02 | -0.05 | 0.08
batopndbp | 0.02 | -0.06 | 0.20 | 0.93 | -0.04 | -0.02 | -0.01 | 0.05 | -0.01 | 0.04 | 0.05 | 0.07 | 0.00 | 0.06 | -0.07
batopnbp | 0.04 | 0.01 | 0.92 | 0.07 | 0.09 | 0.09 | 0.01 | -0.01 | 0.02 | 0.01 | -0.03 | 0.00 | 0.03 | 0.00 | -0.02
batopuc | 0.09 | -0.01 | 0.04 | 0.13 | 0.07 | 0.03 | 0.02 | -0.01 | -0.05 | 0.01 | 0.04 | 0.01 | 0.57 | -0.02 | 0.02
batopnrss | 0.03 | -0.01 | 0.89 | 0.07 | 0.06 | 0.06 | 0.00 | -0.01 | 0.01 | 0.00 | -0.02 | -0.03 | -0.02 | 0.00 | 0.00
batopnbf | -0.01 | -0.02 | -0.75 | 0.41 | -0.08 | -0.05 | -0.01 | 0.02 | -0.02 | 0.01 | 0.01 | 0.02 | -0.04 | 0.00 | -0.03
batmidar | 0.36 | 0.14 | 0.06 | 0.22 | 0.01 | 0.01 | -0.02 | -0.64 | 0.01 | -0.11 | -0.01 | -0.03 | 0.00 | 0.11 | 0.40
batmidsr | 0.65 | 0.11 | 0.02 | -0.09 | 0.06 | 0.06 | 0.00 | -0.70 | -0.01 | -0.07 | -0.06 | -0.08 | -0.01 | 0.01 | -0.03
batmiddbp | 0.01 | -0.10 | -0.01 | 0.12 | -0.03 | -0.03 | 0.02 | 0.95 | 0.01 | 0.11 | 0.09 | 0.05 | 0.02 | 0.01 | 0.05
batmidbp | 0.93 | 0.05 | 0.03 | 0.03 | 0.09 | 0.07 | 0.01 | 0.03 | -0.01 | 0.00 | 0.00 | 0.00 | 0.02 | -0.01 | 0.02
batmiduc | 0.03 | -0.02 | 0.03 | -0.03 | 0.09 | 0.00 | 0.00 | 0.10 | 0.00 | 0.01 | 0.03 | 0.00 | 0.63 | 0.00 | -0.03
batmidrss | 0.95 | 0.04 | 0.02 | -0.01 | 0.05 | 0.06 | 0.02 | -0.03 | -0.01 | 0.02 | 0.00 | -0.06 | 0.02 | 0.02 | 0.01
batmdbf | -0.78 | -0.04 | -0.02 | -0.02 | -0.09 | -0.03 | -0.01 | 0.25 | -0.01 | 0.04 | 0.03 | -0.01 | -0.01 | -0.01 | -0.01
batlmar | -0.07 | 0.17 | 0.04 | 0.20 | 0.04 | 0.04 | 0.00 | 0.19 | 0.03 | -0.33 | -0.05 | -0.02 | -0.02 | -0.47 | -0.03
batlmsr | 0.05 | 0.74 | 0.00 | -0.09 | -0.03 | 0.02 | -0.01 | -0.13 | -0.01 | -0.59 | -0.03 | -0.04 | -0.02 | 0.01 | -0.02
batlmdbpr | -0.02 | -0.10 | 0.00 | 0.09 | 0.01 | 0.01 | 0.01 | 0.15 | 0.00 | 0.92 | 0.05 | 0.06 | 0.02 | 0.03 | -0.01
batlmbp | 0.06 | 0.92 | 0.01 | -0.04 | 0.04 | 0.04 | -0.02 | -0.03 | 0.00 | 0.07 | 0.01 | -0.01 | 0.00 | 0.00 | 0.02
batlmbuc | 0.03 | -0.07 | 0.02 | 0.02 | 0.11 | -0.03 | 0.02 | 0.00 | -0.02 | 0.11 | -0.02 | 0.00 | 0.53 | 0.03 | 0.05
batlmrss | 0.06 | 0.92 | 0.01 | -0.06 | -0.03 | 0.03 | -0.01 | -0.08 | -0.01 | 0.00 | 0.02 | 0.00 | -0.03 | 0.00 | -0.01
batlmdbf | -0.04 | -0.77 | 0.02 | 0.07 | -0.04 | -0.01 | 0.03 | 0.09 | 0.01 | 0.20 | 0.03 | 0.05 | -0.02 | 0.06 | -0.02
fsteco | 0.11 | 0.03 | 0.09 | -0.08 | 0.64 | 0.07 | 0.30 | -0.08 | 0.12 | -0.01 | -0.07 | -0.59 | 0.05 | -0.01 | -0.01
fstblavg | 0.03 | -0.01 | 0.02 | -0.02 | 0.23 | 0.04 | 0.94 | 0.00 | 0.05 | 0.00 | -0.03 | -0.18 | 0.01 | 0.00 | -0.01
fstblsr | -0.01 | -0.03 | -0.01 | 0.01 | 0.09 | 0.04 | 0.98 | 0.03 | 0.02 | 0.01 | -0.03 | -0.05 | 0.00 | 0.00 | 0.01
fstdbpr | -0.05 | -0.05 | -0.02 | 0.13 | -0.09 | -0.01 | -0.24 | 0.10 | -0.09 | 0.09 | 0.11 | 0.83 | 0.02 | 0.00 | -0.01
fsblbp | 0.11 | 0.02 | 0.09 | -0.01 | 0.92 | 0.10 | 0.15 | -0.02 | 0.07 | 0.03 | 0.00 | 0.09 | 0.03 | -0.01 | 0.00
fsblbf | -0.08 | 0.02 | -0.09 | 0.05 | -0.85 | -0.07 | -0.17 | 0.04 | -0.08 | 0.02 | 0.06 | 0.20 | -0.04 | 0.00 | -0.01
spneco | 0.08 | 0.01 | 0.08 | -0.07 | 0.09 | 0.67 | 0.07 | -0.10 | 0.20 | -0.02 | -0.61 | -0.05 | -0.01 | -0.01 | 0.02
spnblavg | 0.01 | 0.00 | 0.03 | -0.03 | 0.09 | 0.23 | 0.05 | -0.02 | 0.93 | -0.01 | -0.21 | -0.04 | -0.01 | -0.01 | 0.01
spnblsr | -0.02 | -0.01 | 0.00 | 0.01 | 0.07 | 0.01 | 0.03 | 0.03 | 0.98 | 0.01 | -0.02 | -0.05 | -0.01 | 0.01 | 0.00
spndbpr | -0.02 | 0.00 | -0.03 | 0.10 | -0.05 | -0.07 | -0.05 | 0.13 | -0.22 | 0.07 | 0.84 | 0.11 | 0.02 | 0.01 | 0.01
spnblbp | 0.09 | 0.05 | 0.09 | -0.01 | 0.09 | 0.94 | 0.04 | 0.00 | 0.08 | 0.02 | 0.06 | 0.01 | 0.00 | 0.00 | 0.00
spnblbf | -0.07 | -0.03 | -0.05 | 0.04 | -0.07 | -0.82 | -0.03 | 0.03 | -0.13 | 0.02 | 0.14 | 0.04 | 0.00 | 0.00 | 0.01

Table 4 Principal components and variables loaded
PC | Role | Variables loaded | Eigenvalue | Cumulative variance
PC1 | Middle order performance | batmidbp, batmidrss, batmdbf | 3.012 | 11.40%
PC2 | Lower middle order performance | batlmsr, batlmbp, batlmrss, batlmbf | 2.949 | 22.56%
PC3 | Top order performance | batopnbp, batopnrss, batopnbf | 2.568 | 32.28%
PC4 | Top order performance | batopnar, batopnsr, batopndbp | 2.452 | 41.56%
PC5 | Fast bowler performance | fsteco, fsblbp, fsblbf | 2.156 | 49.72%
PC6 | Spin bowler performance | spneco, spnblbp, spnblbf | 2.110 | 57.70%
PC7 | Fast bowler performance | fstblavg, fstblsr | 2.049 | 65.46%
PC8 | Middle order performance | batmidar, batmidsr, batmiddbp | 2.032 | 73.15%
PC9 | Spin bowler performance | spnblavg, spnblsr | 1.972 | 80.61%
PC10 | Middle order performance | batlmdbpr | 1.419 | 85.98%

3.5 Logistic Regression

Logistic regression is a statistical method for analyzing a dataset in which one or more independent variables determine a dichotomous outcome. The goal of logistic regression is to find the best-fitting model describing the relationship between the dichotomous characteristic of interest and a set of independent variables. Logistic regression generates the coefficients (and their standard errors and significance levels) of a formula that predicts a logit transformation of the probability that the characteristic of interest is present. The simple logistic model has the form

$$\operatorname{logit}(p) = \ln\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 X, \qquad p = P(Y = 1 \mid X)$$

The probability of the outcome of interest is given as

$$P(Y = 1 \mid X) = \frac{e^{\beta_0 + \beta_1 X}}{1 + e^{\beta_0 + \beta_1 X}}$$

where the coefficient $\beta_1$ determines the direction of the relationship between the independent variable $X$ and the logit of $Y$. The outcome of a cricket match is a binary variable (win or loss) and is influenced by various factors. Here, we use the principal component scores from Section 3.4 as independent variables in a stepwise logistic regression to determine the relationship between the PCs and the outcome of the match. The model is significant, and its pseudo R² is 0.725; loosely interpreted, the independent variables in the model explain about 72.5 percent of the variation in the outcome of the match.

Table 5 Regression modeling results
PC | Coef. | Std. Coef. | Std. Err. | Prob | 95% Conf. Interval
PC1 | 0.71 | 0.88 | 0.09 | 0.00 | [0.53, 0.88]
PC2 | 0.97 | 1.18 | 0.11 | 0.00 | [0.77, 1.18]
PC3 | 0.33 | 0.50 | 0.09 | 0.00 | [0.16, 0.50]
PC4 | -0.93 | -0.75 | 0.10 | 0.00 | [-1.12, -0.75]
PC5 | -1.31 | -1.09 | 0.11 | 0.00 | [-1.53, -1.09]
PC6 | -0.71 | -0.53 | 0.09 | 0.00 | [-0.89, -0.53]
PC7 | -0.99 | -0.80 | 0.10 | 0.00 | [-1.18, -0.80]
PC8 | -1.01 | -0.81 | 0.10 | 0.00 | [-1.21, -0.81]
PC9 | -0.76 | -0.58 | 0.09 | 0.00 | [-0.94, -0.58]
PC10 | -0.85 | -0.65 | 0.10 | 0.00 | [-1.05, -0.65]
constant | -0.01 | 0.15 | 0.08 | 0.92 | [-0.17, 0.15]

3.6 Player evaluation and Team selection

One of the greatest challenges in player evaluation in cricket is deciding which performance metrics should be considered.
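That choice is anchored in the regression of Section 3.5, which identifies the significant components and also supplies the weights used in the utility below. A minimal NumPy sketch of that step follows, under stated assumptions: a plain logit fitted by Newton-Raphson (the stepwise selection of PCs is not reproduced), McFadden's definition of pseudo R² (1 minus the ratio of the model log-likelihood to the intercept-only log-likelihood), and weights formed by normalizing the absolute standardized coefficients to sum to 1. Taking absolute values is an assumption of this sketch, since several coefficients in Table 5 are negative; `fit_logit`, `mcfadden_r2` and `weights_from_std_coefs` are hypothetical names.

```python
import math

import numpy as np


def fit_logit(X, y, max_iter=50, tol=1e-8):
    """Logistic regression by Newton-Raphson (an intercept column is added).

    Returns coefficients, standard errors, two-sided Wald p-values and fitted
    probabilities, i.e. the Coef./Std. Err./Prob columns of a table like Table 5.
    """
    Xd = np.column_stack([np.ones(len(X)), X])
    beta = np.zeros(Xd.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-Xd @ beta))
        hessian = Xd.T @ (Xd * (p * (1.0 - p))[:, None])  # X' W X
        step = np.linalg.solve(hessian, Xd.T @ (y - p))   # Newton step on the log-likelihood
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    p = 1.0 / (1.0 + np.exp(-Xd @ beta))
    cov = np.linalg.inv(Xd.T @ (Xd * (p * (1.0 - p))[:, None]))
    se = np.sqrt(np.diag(cov))
    # Two-sided normal p-value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    pvals = np.array([math.erfc(abs(z) / math.sqrt(2.0)) for z in beta / se])
    return beta, se, pvals, p


def log_likelihood(y, p):
    p = np.clip(p, 1e-12, 1.0 - 1e-12)  # guard against log(0)
    return float(np.sum(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))


def mcfadden_r2(y, p_model):
    """Pseudo R^2 = 1 - LL(model) / LL(intercept-only model)."""
    p_null = np.full(len(y), y.mean())
    return 1.0 - log_likelihood(y, p_model) / log_likelihood(y, p_null)


def weights_from_std_coefs(std_coefs):
    """Relative weights summing to 1 (absolute values: an assumption of this sketch)."""
    mags = np.abs(np.asarray(std_coefs, dtype=float))
    return mags / mags.sum()


if __name__ == "__main__":
    # Standardized coefficients for PC1-PC10 as reported in Table 5.
    table5_std = [0.88, 1.18, 0.50, -0.75, -1.09, -0.53, -0.80, -0.81, -0.58, -0.65]
    print(np.round(weights_from_std_coefs(table5_std), 3))
```

A stepwise variant would refit after dropping (or adding) the component with the weakest (or strongest) Wald p-value until all retained terms pass a chosen threshold.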
The question is whether one should consider all variables or only a subset, and whether all players should be evaluated on the same set of variables or on a set that depends on each player's role. There are various metrics for measuring the performance of batsmen and bowlers. In our case, we evaluate players by role, and within each role, the significant variables from Table 4 are considered for player evaluation. An integer programming model is proposed for team selection, in which players are selected using binary decision variables. The decision variables determine whether or not an individual is good enough for selection, based on optimizing a linear function of predefined criteria. Utility is used as the objective function to be maximized. To evaluate the utility of a player, we need to scale the values of the variables so that multidimensional attributes can be measured independently of their units and ranges. The performance variables can be classified into two subsets: positive and negative. Positive variables (strike rate, batting average etc.) should be maximized, while negative ones (economy, bowling average etc.) should be minimized. To normalize, the attributes are scaled to the range 0-1, with 1 representing the best player and 0 the worst, using the min-max normalization procedure defined in equations (1) and (2). Let $x_{pvr}$ be the observed value of player $p$ for variable $v$ (e.g. strike rate, runs) in role $r$ (e.g. top, middle), with the minimum and maximum taken over all players $q$ in role $r$. If a variable represents a positive dimension, the normalized variable is

$$\tilde{x}_{pvr} = \frac{x_{pvr} - \min_{q} x_{qvr}}{\max_{q} x_{qvr} - \min_{q} x_{qvr}} \tag{1}$$

and if a variable represents a negative dimension, the normalized variable is

$$\tilde{x}_{pvr} = \frac{\max_{q} x_{qvr} - x_{pvr}}{\max_{q} x_{qvr} - \min_{q} x_{qvr}} \tag{2}$$

Now the utility of a player is defined using the principles of TOPSIS, whereby we select the player who is closest to the best performance and farthest from the worst performance.
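Equations (1) and (2), and the weighted utility built on them, can be sketched for a single role as below. The sketch assumes the utility is the weight-averaged normalized score: on one normalized variable the distance from the worst value (0) is the score itself and the distance to the best (1) is one minus it, so the TOPSIS relative closeness reduces to the score. `min_max_normalize` and `player_utilities` are hypothetical helpers, and giving every player 0 on a constant column is a design choice of the sketch (it cannot change the ranking).

```python
def min_max_normalize(values, higher_is_better=True):
    """Equations (1)-(2): rescale one variable across players to [0, 1], 1 = best."""
    lo, hi = min(values), max(values)
    if hi == lo:                         # constant column carries no ranking information
        return [0.0] * len(values)
    if higher_is_better:                 # eq. (1), e.g. strike rate, batting average
        return [(x - lo) / (hi - lo) for x in values]
    return [(hi - x) / (hi - lo) for x in values]  # eq. (2), e.g. economy


def player_utilities(stats, weights, higher_is_better):
    """Weighted sum of normalized scores for every player in one role.

    stats: dict player -> dict variable -> raw value
    weights: dict variable -> weight (assumed to sum to 1)
    higher_is_better: dict variable -> bool (positive vs negative dimension)
    """
    players = list(stats)
    utility = {p: 0.0 for p in players}
    for var, w in weights.items():
        column = [stats[p][var] for p in players]
        for p, score in zip(players, min_max_normalize(column, higher_is_better[var])):
            utility[p] += w * score
    return utility


if __name__ == "__main__":
    stats = {"A": {"sr": 150, "eco": 7.0}, "B": {"sr": 120, "eco": 6.0}, "C": {"sr": 130, "eco": 8.0}}
    u = player_utilities(stats, {"sr": 0.6, "eco": 0.4}, {"sr": True, "eco": False})
    print(sorted(u, key=u.get, reverse=True))  # ['A', 'B', 'C']
```

Normalizing within a role, rather than across the whole squad, keeps a spinner's economy from being compared against a fast bowler's.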
Following TOPSIS, the utility of player $p$ in role $r$ is

$$U_{pr} = \sum_{v} w_{vr}\,\tilde{x}_{pvr} \tag{3}$$

where $\tilde{x}_{pvr}$ is the normalized value of variable $v$ for player $p$ in role $r$ from equation (1) or (2), and $w_{vr}$ is the weight assigned to performance variable $v$ for role $r$, based on the logistic regression results in Table 5. Standardized beta coefficients are used to arrive at the relative weights, such that the weights sum to 1.

Figure 3 Weights for performance variables by player role based on IPL 2014 data

The next step is to select a team based on performance utility. We use an integer programming model with the following decision variables, where $r$ ranges over the roles (top, middle, lower middle, fast, spin):

$$x_{pr} = \begin{cases} 1 & \text{if player } p \text{ is selected in role } r \\ 0 & \text{otherwise} \end{cases}$$

The objective of the model is to maximize the utility of the players in the team, where team utility is the sum of the players' utilities across all roles. Each variable within each role is assigned a weight $w_{vr}$ according to how much it influences the outcome of the match:

$$\max Z = \sum_{r}\sum_{p} x_{pr} \sum_{v} w_{vr}\,\tilde{x}_{pvr} \tag{4}$$

The constraints of the model can be changed according to the requirements of the selection. Constraint 1 (equation 5) ensures that at least $n_r$ players are selected for role $r$. Generally, $n_r$ depends on the team composition required and should be at least 2 for top-order batsmen, 3 for middle-order batsmen, 2 for lower-middle-order batsmen, 2 for fast bowlers and 2 for spin bowlers. Constraint 2 (equation 6) ensures that at most 15 players are selected. Constraint 3 (equation 7) ensures that each player is selected at most once, which matters in cricket because players can play multiple roles, as both batsman and bowler. Additional constraints, such as availability or national quota rules, can easily be added to this model according to the selectors' requirements.

$$\sum_{p} x_{pr} \ge n_r \quad \forall r \tag{5}$$
$$\sum_{r}\sum_{p} x_{pr} \le 15 \tag{6}$$
$$\sum_{r} x_{pr} \le 1 \quad \forall p \tag{7}$$
$$x_{pr} \in \{0, 1\} \quad \forall p, r \tag{8}$$

Model validation and Results

The Indian Premier League (IPL) is one of the finest Twenty20 competitions in cricket. Modelled on the English Premier League and the National Basketball Association, it brings together players from all cricket-playing countries, who play for franchises.
The tournament was inaugurated in 2008 and has taken Indian cricket to new heights. To illustrate the proposed model, data captured from the 2013 and 2014 IPL seasons is used to rate players by role. The final player rankings for 2013 from the model are shown graphically in Figure 3, with player role along the x-axis and player utility along the y-axis. Similar results were obtained for 2014.

Figure 3 Utility of players in IPL 2013

The optimization model proposed in Section 3.6 is run on the same dataset using Lingo, an optimization software package, to find an optimal team. The optimal team suggested by the model, based on player performance in IPL 2014, is Lendl Simmons, Robin Uthappa, David Warner (top order); Glenn Maxwell, AB de Villiers, Suresh Raina (middle order); MS Dhoni, James Faulkner, Yusuf Pathan (lower middle order); Lasith Malinga, Bhuvneshwar Kumar, Mitchell Starc (fast); and Sunil Narine, Axar Patel and Harbhajan Singh (spin). A few interesting observations from the model's output are the selection of James Faulkner as a lower-middle-order batsman, the omission of purple cap holder Mohit Sharma, and the selection of Harbhajan Singh and Yusuf Pathan, who are not part of the Indian international team. A possible reason for selecting James Faulkner is his all-rounder capability, while Mohit Sharma's omission may be explained by his poor bowling performance on every measure other than wickets taken. To validate the optimization model, the actual performance of the selected players in matches played after IPL 2014 is compared with the actual performance of other players. The comparison group is restricted to players in similar roles who played in the IPL but were not recommended by the model. A two-sample t-test was performed to check for significant differences in average runs, strike rate and boundary percentage for batsmen, and in strike rate, economy, bowling average and boundary frequency for bowlers.
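The two computational steps in this paragraph, choosing the utility-maximizing squad and comparing selected with non-selected players, can be illustrated on toy data. The sketch below is hypothetical and is not the Lingo model used for the results: `select_team` exhaustively enumerates every assignment of each player to one eligible role or to "not selected", which satisfies constraints (5) to (8) exactly but only scales to a handful of players, and `welch_t` computes the Welch (unequal-variance) form of the two-sample t statistic, chosen here because the paper does not state which variant was used.

```python
from itertools import product
from math import sqrt
from statistics import mean, variance


def select_team(utility, min_per_role, max_players):
    """Exhaustive solver for the selection model on a toy instance.

    utility: dict player -> dict role -> U_pr (only roles the player can fill)
    min_per_role: dict role -> n_r (constraint 5); max_players: constraint 6.
    Giving each player exactly one option (a role or None) enforces constraints 7-8.
    """
    players = list(utility)
    options = [[None] + list(utility[p]) for p in players]
    best_total, best_pick = float("-inf"), None
    for choice in product(*options):
        picked = [(p, r) for p, r in zip(players, choice) if r is not None]
        if len(picked) > max_players:
            continue
        counts = {r: 0 for r in min_per_role}
        for _, r in picked:
            counts[r] = counts.get(r, 0) + 1
        if any(counts[r] < n for r, n in min_per_role.items()):
            continue
        total = sum(utility[p][r] for p, r in picked)  # objective, eq. (4)
        if total > best_total:
            best_total, best_pick = total, dict(picked)
    return best_total, best_pick


def welch_t(a, b):
    """Two-sample t statistic with Welch-Satterthwaite degrees of freedom."""
    va, vb = variance(a) / len(a), variance(b) / len(b)  # sample variances / n
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


if __name__ == "__main__":
    # Player C is an all-rounder who can fill either role, but only once.
    utility = {"A": {"bat": 0.9}, "B": {"bat": 0.6}, "C": {"bat": 0.4, "bowl": 0.7},
               "D": {"bowl": 0.8}, "E": {"bowl": 0.3}}
    print(select_team(utility, {"bat": 2, "bowl": 2}, max_players=4))
    print(welch_t([1, 2, 3, 4, 5], [2, 3, 4, 5, 6]))  # (-1.0, 8.0)
```

For a real 15-player squad, the same formulation can be handed to a mixed-integer solver such as `scipy.optimize.milp`, and `scipy.stats.ttest_ind(a, b, equal_var=False)` returns the Welch statistic together with its p-value.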
Two-sample t-test results are summarized in Tables 6 and 7 for bowling and batting respectively.

Table 6 Two-sample t-test results for bowling (roles: fast, spin; variables: bowl strike rate, bowl economy, bowling average, boundary frequency)

Table 7 Two-sample t-test results for batting (roles: top, middle, lower middle; variables: average runs, SR, boundary percent; significant at 0.001)

From Tables 6 and 7, we infer that there is a statistically significant difference in performance between the players identified by the model and other players. However, there is no statistically significant difference for certain variables, such as economy for spin bowlers, boundary frequency for both fast and spin bowlers, runs scored by middle-order batsmen and SR for lower-middle-order batsmen. A possible reason is the close competition for slots in the team, combined with the model selecting players on overall performance rather than on any single performance variable. The optimization model was run for every team selection made by various teams for all tours and tournaments in 2014, and the results were compared with the actual selections made by the respective selection committees. Our model's team selection accuracy was 83%, i.e. roughly 8 out of every 10 players selected by the committees were correctly predicted. Overall, we found that 75% of good performers were correctly predicted, and that there was a significant difference in performance between the players selected by the model and other players. We also validated how the model can be used to predict team rankings based on player potential. For this, players were first ranked on their performance over the past year using the proposed methodology. IPL 2014 teams were then scored using their players' scores from the previous step and the weights given to each role. Each team's overall score was then compared with its actual performance in the IPL 2014 season.
The teams' actual ranks closely resembled the relative scores obtained from the model.

4 Conclusion

We have proposed a new methodology for the objective evaluation of players for team selection. The proposed integer programming model based on player utility is more realistic than traditional evaluation and selection methods. This is the first model of its kind in cricket, because it considers multiple variables across dimensions, viz. batting and bowling, that impact the outcome of the match, and distributes weights accordingly by role to evaluate players, making it a self-adaptive intelligent model. Our model's team selection accuracy is 83%, and 75% of the good performers were correctly identified. The proposed model can be used as an effective enabler to rate players, select teams, define salary caps etc. The results of the model are demonstrated for cricket and can easily be extended to other sports. Our future work includes using players' fitness and workload information for player evaluation and team selection.

5 Acknowledgements

We would like to thank CKM Dhananjai and Gaurav Sundararaman for their help and support for this work.

6 References

[1] De Silva, Basil M., and Tim B. Swartz. Winning the coin toss and the home team advantage in one-day international cricket matches. Department of Statistics and Operations Research, Royal Melbourne Institute of Technology, 1998.
[2] Durbach, Ian N., and Jani Thiart. "On a common perception of a random sequence in cricket: application." South African Statistical Journal 41.2 (2007): 161-187.
[3] http://sporteology.com/top-10-popular-sports-world/
[4] Lemmer, Hermanus H. "The combined bowling rate as a measure of bowling performance in cricket." South African Journal for Research in Sport, Physical Education and Recreation 24.2 (2002): p. 37.
[5] Lemmer, Hermanus H. "A measure for the batting performance of cricket players." South African Journal for Research in Sport, Physical Education and Recreation 26.1 (2004): p. 55.
[6] Lemmer, Hermanus H. "A measure of the current bowling performance in cricket." South African Journal for Research in Sport, Physical Education and Recreation 28.2 (2006): p. 91.
[7] Beaudoin, David, and Tim Swartz. "The best batsmen and bowlers in one-day cricket: general." South African Statistical Journal 37.2 (2003): 203-222.
[8] Barr, G. D. I., C. G. Holdsworth, and B. S. Kantor. "Evaluating performances at the 2007 cricket world cup." South African Statistical Journal 42.2 (2008): 125-142.
[9] Bracewell, Paul J., and Katya Ruggiero. "A parametric control chart for monitoring individual batting performances in cricket." Journal of Quantitative Analysis in Sports 5.3 (2009): 1-21.
[10] Bloomfield, Jonathan, Remco Polman, and Peter O'Donoghue. "Physical demands of different positions in FA Premier League soccer." Journal of Sports Science & Medicine 6.1 (2007): 63.
[11] Carvajal, Wiliam, et al. "Body type and performance of elite Cuban baseball players." MEDICC Review 11.2 (2009): 15-20.
[12] Clarke, S. R. "Performance Modeling in Sports." Unpublished Ph.D. dissertation, School of Mathematical Sciences, Swinburne University of Technology (1997).
[13] Gabbett, T. J. "Physiological characteristics of junior and senior rugby league players." British Journal of Sports Medicine 36.5 (2002): 334-339.
[14] McGee, Kimberly J., and Lee N. Burkett. "The National Football League combine: a reliable predictor of draft status?" The Journal of Strength & Conditioning Research 17.1 (2003): 6-11.
[15] Bozbura, F. Tunç, Ahmet Beşkese, and Tuna Sorgun Kaya. "TOPSIS Method on Player Selection in MBA."
[16] Chen, Chih-Cheng, Yung-Tan Lee, and Chung-Ming Tsai. "A Hybrid Assessment Method for Evaluating the Performance of Starting Pitchers in a Professional Baseball Team." (2013).
[17] Dey, Pabitra Kumar, Dipendra Nath Ghosh, and Abhoy Chand Mondal. "A MCDM Approach for Evaluating Bowlers Performance in IPL." Journal of Emerging Trends in Computing and Information Sciences 2.11 (2011).
[18] Dey, Pabitra Kumar, Dipendra Nath Ghosh, and Abhoy Chand Mondal. "Statistical Based Multi-Criteria Decision Making Analysis for Performance Measurement of Batsmen in Indian Premier League." International Journal of Advanced Research in Computer Science 3.4 (2012).
[19] Tavana, Madjid, et al. "A fuzzy inference system with application to player selection and team formation in multi-player sports." Sport Management Review 16.1 (2013): 97-110.
[20] Arnason, Arni, et al. "Risk factors for injuries in football." The American Journal of Sports Medicine 32.1 suppl (2004): 5S-16S.
[21] Gerber, Hannah, and Gary D. Sharp. "Selecting a limited overs cricket squad using an integer programming model." South African Journal for Research in Sport, Physical Education and Recreation 28.2 (2006): p. 81.
[22] Iyer, Subramanian Rama, and Ramesh Sharda. "Prediction of athletes performance using neural networks: An application in cricket team selection." Expert Systems with Applications 36.3 (2009): 5510-5522.
[23] Sharp, G. D., et al. "Integer optimisation for the selection of a Twenty20 cricket team." Journal of the Operational Research Society 62.9 (2011): 1688-1694.
[24] Lemmer, Hermanus Hofmeyr. "Team selection after a short cricket series." European Journal of Sport Science 13.2 (2013): 200-206.
[25] Amin, Gholam R., and Sujeet Kumar Sharma. "Cricket team selection using data envelopment analysis." European Journal of Sport Science 14.sup1 (2014): S369-S376.
Appendix A

Variable: Description
inningsno: Innings number in which the player batted and/or bowled
Batting Role: Role assigned to the player as a batsman
Bowling Role: Role assigned to the player as a bowler
MatchResult: Result of the match for the player’s team
Team Score: Final score (runs only) of the player’s team
Team Wickets: Total wickets lost by the player’s team in the innings
CompetitionName: Name of the competition in which the match took place
GroundName: Name of the ground on which the match took place
Countryname: Country in which the match took place
Pitchtype: Type of pitch: green top, rank turner or flat track
PlayedFor: Name of the player’s team
Opponent: Name of the opposing team against whom the player played
BatsmanRuns: Number of runs scored by the player as a batsman
BatsmanBalls: Number of balls faced by the player as a batsman
OverallBatsmanDotBalls: Number of dot balls (balls from which no runs were scored) faced by the player
Runs per Scoring Shot (RSS): Number of runs per scoring ball
PowerPlayRuns: Number of runs scored by the player in the powerplay overs (1–6)
OtherOverRuns: Number of runs scored by the player outside the powerplay overs (7–20)
PlayingOrder: Batting position of the player
Fours: Number of fours hit by the player as a batsman
Sixes: Number of sixes hit by the player as a batsman
SR: Strike rate of the player as a batsman, (runs/balls) × 100
BatsmanChances: Number of lives given to the player as a batsman
BowlerRuns: Number of runs conceded by the player as a bowler
Wicket: Number of wickets taken by the player as a bowler
BowlerLegalBalls: Number of legal balls bowled by the player as a bowler
BowlerWides: Number of wides conceded by the player as a bowler
BowlerNoballs: Number of no-balls bowled by the player as a bowler
Bowler Strike Rate: Number of balls bowled per wicket taken by the player (balls/wicket)
Total_Catch: Total catches taken by the player as a fielder
Total_Stump: Total stumpings made by the player as wicketkeeper
Runs Saved: Net number of runs saved in the field by the player
Player 30+ Partnership: Number of 30+ partnerships the player was involved in
30+ Partnership Breaker: Number of wickets taken by the player as a bowler that broke a 30+ partnership
30+ Scores: Number of individual scores of 30 or more by the player
50+ Scores: Number of individual scores of 50 or more by the player

Appendix B

Table A1 Bivariate correlation table for fast bowlers
Variable   fsteco    fstblavg  fstblsr   fstdbpr   fsblbp    fsblbf
fsteco     1
fstblavg   0.57      1
fstblsr    0.37      0.96      1
fstdbpr   -0.68     -0.40     -0.28      1
fsblbp     0.64      0.36      0.22     -0.03      1
fsblbf    -0.73     -0.40     -0.27      0.33     -0.84      1

Table A2 Bivariate correlation table for spin bowlers
Variable   spneco    spnblavg  spnblsr   spndbpr   spnblbp   spnblbf
spneco     1
spnblavg   0.51      1
spnblsr    0.20      0.93      1
spndbpr   -0.65     -0.42     -0.24      1
spnblbp    0.67      0.29      0.08     -0.01      1
spnblbf   -0.64     -0.34     -0.15      0.25     -0.80      1

Table A3 Bivariate correlation table for top order batsmen
Variable       batopnar     batopnsr     batopndbp    batopnbp     batopuc      batopnrss    batopnbdryfq
batopnar       1
batopnsr       0.6296       1
batopndbp     -0.6908      -0.6652       1
batopnbp       0.0216       0.4076       0.2547       1
batopuc       -0.1338      -0.0945       0.1379       0.0871       1
batopnrss      0.0067       0.4686       0.2708       0.8123       0.0381       1
batopnbdryfq  -0.3587      -0.6967       0.261       -0.738       -0.0079      -0.5816       1

Table A4 Bivariate correlation table for middle order batsmen
Variable       batmidar     batmidsr     batmiddbp    batmidbp     batmiduc     batmidrss    batmdbdryfq
batmidar       1
batmidsr       0.6972       1
batmiddbp     -0.5986      -0.7024       1
batmidbp       0.3479       0.5856       0.0297       1
batmiduc      -0.0955      -0.0482       0.105        0.0586       1
batmidrss      0.3797       0.7024      -0.0122       0.8686       0.04         1
batmdbdryfq   -0.4597      -0.6496       0.2764      -0.8019      -0.015       -0.6743       1

Table A5 Bivariate correlation table for lower middle order batsmen
Variable       batlmar      batlmsr      batlmdbpr    batlmbp      batlmbuc     batlmrss     batlmbdryfq
batlmar        1
batlmsr        0.2773       1
batlmdbpr     -0.3063      -0.6595       1
batlmbp        0.1204       0.6204      -0.037        1
batlmbuc      -0.0734      -0.13         0.1265      -0.0474       1
batlmrss       0.1288       0.763       -0.0928       0.8153      -0.092        1
batlmbdryfq   -0.2151      -0.6272       0.3192      -0.7958       0.0528      -0.6225       1
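Several Appendix A variables are ratios of raw counts, and Appendix B reports pairwise correlations between derived metrics for each role. The Python sketch below shows one way to compute both, under stated assumptions:

- SR follows the Appendix A definition.
- RSS assumes that scoring shots equal balls faced minus dot balls.
- Bowling strike rate, average and economy use the standard cricket definitions: balls per wicket, runs per wicket and runs per six-ball over.
- The Appendix B coefficients are assumed to be Pearson correlations, which the tables do not state.
- The eco, blavg and blsr suffixes are read as economy, bowling average and bowling strike rate.

All function names and the sample figures are hypothetical.

```python
from math import sqrt
from typing import Dict, List, Optional, Tuple


def batting_strike_rate(runs: int, balls: int) -> Optional[float]:
    """SR = (BatsmanRuns / BatsmanBalls) x 100, as defined in Appendix A."""
    return 100.0 * runs / balls if balls else None


def runs_per_scoring_shot(runs: int, balls: int, dot_balls: int) -> Optional[float]:
    """RSS, assuming scoring shots = balls faced minus dot balls (an assumption)."""
    scoring_shots = balls - dot_balls
    return runs / scoring_shots if scoring_shots else None


def bowling_strike_rate(legal_balls: int, wickets: int) -> Optional[float]:
    """Legal balls bowled per wicket; None when no wicket was taken."""
    return legal_balls / wickets if wickets else None


def bowling_average(runs_conceded: int, wickets: int) -> Optional[float]:
    """Runs conceded per wicket; None when no wicket was taken."""
    return runs_conceded / wickets if wickets else None


def economy(runs_conceded: int, legal_balls: int) -> Optional[float]:
    """Runs conceded per six-ball over."""
    return 6.0 * runs_conceded / legal_balls if legal_balls else None


def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation of two equal-length samples.

    Raises ZeroDivisionError if either sample is constant.
    """
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def lower_triangle(
    columns: Dict[str, List[float]], digits: int = 2
) -> List[Tuple[str, List[float]]]:
    """Rows of a lower-triangular correlation table laid out like Tables A1-A5."""
    names = list(columns)
    return [
        (a, [round(pearson(columns[a], columns[b]), digits) for b in names[: i + 1]])
        for i, a in enumerate(names)
    ]


if __name__ == "__main__":
    # Made-up totals per bowler: (BowlerRuns, Wicket, BowlerLegalBalls).
    bowlers = [(300, 12, 240), (410, 10, 260), (280, 15, 250), (350, 9, 230)]
    table = lower_triangle({
        "eco": [economy(r, b) for r, w, b in bowlers],
        "blavg": [bowling_average(r, w) for r, w, b in bowlers],
        "blsr": [bowling_strike_rate(b, w) for r, w, b in bowlers],
    })
    for name, row in table:
        print(name, row)
```

The per-innings helpers return None rather than a large sentinel value when the denominator is zero. This keeps undefined ratios, such as a wicketless spell, out of the correlation step instead of silently distorting it.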
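The Conclusion describes team selection as an integer programming model over player utilities, but this section does not restate the formulation. The sketch below is therefore a hypothetical stand-in, not the authors' model.

It picks 11 players from a squad to maximize total utility, subject to two illustrative constraints: exactly one wicketkeeper and at least five bowling options. It then solves this 0-1 program by exhaustive enumeration, which is exact but only practical for small squads.

The Player fields, the utility values and both constraints are assumptions. A production version would pass the same objective and constraints to an integer programming solver.

```python
from itertools import combinations
from typing import NamedTuple, Optional, Sequence, Tuple


class Player(NamedTuple):
    """Hypothetical squad member: a utility score plus role flags used as constraints."""
    name: str
    utility: float   # player-utility score from the evaluation step (assumed given)
    is_keeper: bool  # can keep wicket
    can_bowl: bool   # counts as a bowling option


def select_xi(
    squad: Sequence[Player], team_size: int = 11, min_bowlers: int = 5
) -> Tuple[Optional[Tuple[Player, ...]], float]:
    """Maximize total utility by enumerating every team of team_size players.

    Hypothetical constraints: exactly one wicketkeeper and at least
    min_bowlers bowling options. Returns (None, -inf) if nothing is feasible.
    """
    best: Optional[Tuple[Player, ...]] = None
    best_score = float("-inf")
    for team in combinations(squad, team_size):
        if sum(p.is_keeper for p in team) != 1:
            continue  # violates the one-wicketkeeper constraint
        if sum(p.can_bowl for p in team) < min_bowlers:
            continue  # too few bowling options
        score = sum(p.utility for p in team)
        if score > best_score:
            best, best_score = team, score
    return best, best_score


if __name__ == "__main__":
    # Made-up 13-player squad; names and utilities are illustrative only.
    squad = [
        Player("Keeper A", 7.1, True, False),
        Player("Keeper B", 6.4, True, False),
        Player("Batter 1", 8.2, False, False),
        Player("Batter 2", 7.9, False, False),
        Player("Batter 3", 7.5, False, False),
        Player("Batter 4", 6.8, False, False),
        Player("All-rounder 1", 7.7, False, True),
        Player("All-rounder 2", 7.0, False, True),
        Player("Pacer 1", 7.4, False, True),
        Player("Pacer 2", 6.9, False, True),
        Player("Pacer 3", 6.1, False, True),
        Player("Spinner 1", 7.2, False, True),
        Player("Spinner 2", 6.5, False, True),
    ]
    team, total = select_xi(squad)
    print(f"total utility {total:.1f}:", [p.name for p in team])
```

With 13 players there are only 78 candidate elevens, so enumeration is instant. Its cost grows combinatorially with squad size, which is why a solver is the usual choice for larger pools.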