Skyline - The Big Sky Undergraduate Journal
Volume 1 | Issue 1, Article 2, 2013

Analysis of Baseball Team Construction
Kevin Ferris, Montana State University

Recommended Citation: Ferris, Kevin (2013) "Analysis of Baseball Team Construction," Skyline - The Big Sky Undergraduate Journal: Vol. 1 : Iss. 1 , Article 2. Available at: http://skyline.bigskyconf.com/journal/vol1/iss1/2

Abstract

To succeed in the competitive environment of Major League Baseball, teams must assemble the best possible collection of players. While doing so primarily involves acquiring talented players, teams must also account for how these players fit together. In this paper, I used the WAR metric to explore how the distribution of player talent affects team performance. Specifically, an analysis was conducted to determine the effect that altering the moments of this distribution has on team performance.
I found that increasing the standard deviation of a team's position players tends to negatively impact the team, and that teams with higher standard deviation and skewness for pitchers also tend to perform slightly worse. These results suggest that teams could benefit by accounting for the spread of player talent.

Introduction

"But for 2013, the question [of how successful the Red Sox will be] centers on the star power [they] have retained." -- Gordon Edes, ESPN Boston [3]

"We're probably a ball club where people will say, 'They're missing the superstar.' Well, I don't really care. I think this team will challenge one another. I think they'll play hard. We're going to be that pest that never goes away." -- Kevin Towers, Arizona Diamondbacks General Manager [6]

Beginning with Bill James's Baseball Abstracts in the mid-1980s, and popularized by Michael Lewis's Moneyball, the rise of "advanced statistics" has brought a plethora of new metrics to the sports landscape. From hockey to tennis, teams and fans are analyzing sports differently now than they did 10 years ago.

The style of the statistical metrics differs widely from sport to sport. Since baseball primarily involves a faceoff between the batter and the pitcher, determining the "winner" of each plate appearance is relatively straightforward. A substantial amount of work has been done on evaluating the impact a single player has on his team by determining the number of plate appearances he "won." Conversely, in basketball and hockey, how a group of individuals plays together is more heavily scrutinized. In these sports, one notable statistic is plus/minus, which simply tells how many more points a group of five players scored than the other team while they were playing together.
This approach, which focuses on analyzing a team rather than an individual, arises because these sports feature a group of players acting together to achieve a common goal. Reducing their collective performance to an individual level is extremely difficult, so more emphasis has been placed on team-oriented analysis in these sports.

Here, I attempt to explore baseball performance through team-oriented analysis. Specifically, I set out to analyze how different types of players work together by answering the question, "Which is better for a team: to have two average players or one All Star caliber player and one subpar player?" It could be that there is no difference between the two scenarios, in which case teams should simply try to acquire the best possible collection of players. Alternatively, the effect of having one good player could outweigh the effect of a bad player, or vice versa. If that is the case, then teams will have to account for the team dynamic when trying to improve.

Data

To explore the effect that the composition of player talent can have on a baseball team's performance, this analysis needed a reliable estimator of player talent in a given year. This is difficult because player talent manifests itself in different ways. Some players excel on the pitching mound while others contribute in the field or at the plate. Ideally, this study could account for fielding, hitting, and pitching ability when analyzing distributions of player talent.

Traditionally, hitters have been evaluated by Batting Average, RBIs, and Home Runs, pitchers by Wins, ERA, and Strikeouts, and fielding performance by Errors and Fielding Percentage.¹ These statistics have been studied extensively, and research has found that they evaluate player performance poorly.
The batting and fielding statistics do not capture everything that a batter or fielder does, while the pitching statistics are influenced by many factors other than the pitcher [7]. Other player evaluation metrics had to be used.

¹ If the reader is unfamiliar with any of the statistics discussed in this section, I would recommend checking the Baseball Statistics Wikipedia page.

There are multiple outcomes that a hitter can have for each plate appearance. He could strike out (his worst scenario), hit a home run (his best scenario), or do something in between. Since each unique outcome of a plate appearance either helps or hurts a team, a useful hitting statistic would apply a weight to each outcome, where each weight depends on how much its respective outcome helps or hurts the team. A baseball player's hitting value would then be the sum of his weighted outcomes throughout the year. The Weighted On-Base Average (wOBA) statistic attempts to do just that. The weights it uses have been derived by calculating how many runs each outcome is worth on average [13]. Because it fulfills the desired properties, wOBA was used as a measure of the value of a player's batting performance over the course of a baseball season.

Fielding performance is the most difficult aspect to measure. Presently, the best way to measure a player's fielding performance is to compare it to the performance of an "average" fielder. Each fielder is given credit if he makes a play that an "average" fielder would fail to make, and loses credit if he fails to make a play that an "average" fielder would make. A player's total fielding value for a year is the sum of all these credits and deductions. Fielders can also be given credit, in a similar manner, for how often they throw out or otherwise influence base runners.
Ultimate Zone Rating (UZR), Defensive Runs Saved (DRS), and Total Zone Rating (TZR) are three statistics that model fielder performance in this manner [11, 14]. For this analysis, they were used to incorporate a player's fielding ability during a season.

How to properly evaluate pitchers has been the subject of much scrutiny. Most studies have found that pitchers have relatively little control over what happens when a ball is put in play [7]. These results suggest that pitching performance should be analyzed primarily based on outcomes over which a pitcher has control: walks, home runs, and strikeouts. The metrics built on these studies apply a weight to each of these outcomes. These weights are then used to calculate how many runs that pitcher "should" have allowed while "ignoring" other outcomes.² Fielding Independent Pitching (FIP) is a metric developed along these guidelines. The calculation of some metrics, however, is still based on how many runs a pitcher did allow. The number of runs is then adjusted according to the run scoring environment and the defense of the pitcher's team. An example of this type of metric is adjusted Runs Allowed (xRA). Despite the differences in calculation, the two metrics generally give similar results.

These metrics give a very good idea of a player's contribution in each facet of a baseball game. Since the purpose of this paper was to look at a player's overall talent level, the next step was to combine these statistics into one metric that summarizes a player's total contribution to his team. This is exactly what the Wins Above Replacement (WAR) statistic does.
It takes a player's hitting, fielding, and pitching performance, puts them on a common scale, and then sums them.³ In doing so, WAR attempts to answer the question, "If [a] player got injured and his team had to replace him with a minor leaguer, how many fewer games would his team win?" [12] WAR would therefore appear to be a very good evaluation of a player's overall contribution to his team during a season.

² The metrics don't ignore the other outcomes entirely, but they are almost entirely driven by walks, strikeouts, and home runs.

³ WAR also considers base running, but base running is a minor part of the game and does not contribute very much to WAR. As a result, I chose not to discuss it here.

It is important to note that WAR is not a perfect statistic. Especially with respect to fielding, the current estimates are not perfect representations of a player's value. However, WAR has been shown to have a very strong relationship with wins [1]. Using linear regression, it was found that on average a one WAR increase is associated with winning almost exactly one more game per season [2]. These results suggest that WAR is a fairly good representation of a player's contributions. This study proceeded using WAR as a measure of how much a player helped his team over the course of a season.

Because there are multiple metrics used to calculate player performance, there are many different ways that a player's WAR could be calculated. However, two versions, fWAR and rWAR, are the most widely used. fWAR uses wOBA, UZR, and FIP while rWAR uses wOBA, DRS, and xRA.⁴ To avoid any problems with combining fWAR and rWAR, this paper uses two separate models: one for fWAR, and one for rWAR. However, the metrics are very similar (the correlation between fWAR and rWAR for position players is 0.92), so the two models should yield similar results.
The data were collected in June 2012. The data used for this analysis begin in 1974 (the first year fWAR was available for pitchers) and end in 2011. The three strike-shortened seasons of 1981, 1994, and 1995 were omitted. The data consist of both fWAR and rWAR values for each player over this time period.

⁴ Prior to 2002, UZR and DRS are not available, so fWAR and rWAR use statistics that are similar to TZR.

Methodology

To analyze the theoretical difference between a team acquiring two average players and a team acquiring one good and one bad player, the moments of each team's distribution of player WAR were analyzed. Since the WAR values for two average players should be much closer together than the WAR values for a good and a bad player, the standard deviation of the two average players should be smaller. This concept may be extended beyond the simple case of two players to the entire team: teams that emphasize average players should have smaller standard deviations than teams that emphasize extreme players. Furthermore, a WAR above 4 is considered good, while an average WAR is close to 2 [12], so a team of more average players will also tend to have lower skewness values. This analysis proceeds by checking whether the standard deviation and skewness of a team's distribution of player talent influence that team's outcome.

[Figure 1: Batter WAR Density Plot. Panels: Distributions for 10 Worst Teams; Distributions for 10 Best Teams. x-axis: Player WAR; y-axis: Density.]

Figure 1 contrasts the distribution of position player WAR for the 10 most successful teams of the last 30 years with the distribution of the 10 least successful teams. It is clear that the successful teams have distributions that are more spread out; however, they also have fewer players with a WAR close to 0. Players with 0 WAR are near replacement level, and provide barely any value to the team.
It could be that successful teams are successful simply because they avoid having too many replacement-level players on the team. To account for this possibility, the model of team success has to control for a team's overall talent level.

The model this paper uses therefore begins by accounting for the sum of the WAR for the players on each team (an estimate of that team's overall talent level). It then examines the effect of changing a team's standard deviation and skewness. The model takes the following form: team wins are modeled as a function of the sum, the standard deviation, and the skewness of player WAR, with the effect of each moment allowed to depend on the sum.

While WAR attempts to put batters and pitchers on the same scale, there is some evidence that 1 WAR for batters does not have the same effect as 1 WAR for pitchers. As a result, the model is updated so the values for pitchers and position players are evaluated separately. rWAR separates pitchers and hitters, while fWAR goes further and separates starting and relief pitchers. Since WAR for relief pitchers (relievers) could differ from WAR for starting pitchers (starters), the model using fWAR contains a separate set of terms for position players, starters, and relievers (Table 1 lists the terms retained in the final model). The model using rWAR looks similar, but starter and reliever WAR are merged into a single pitcher term.

Finally, some controlling variables such as league, year, the team's wins in the previous year, and run scoring environment were considered. Year and run scoring environment did not provide any useful information and were removed from the model. The team's wins in the previous year were used for both the fWAR and rWAR models, while league was only needed in the rWAR model.
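The team-level predictors described in the Methodology (sum, standard deviation, and skewness of player WAR) can be sketched as follows. This is a minimal illustration and not the paper's own code: the function name `team_moments` and both rosters are hypothetical, and the skewness used here is the plain moment coefficient, which may differ slightly from the estimator used in the original analysis.

```python
import math


def team_moments(war_values):
    """Return (sum, sample standard deviation, skewness) of a roster's WAR.

    Assumption: skewness is the moment coefficient m3 / m2**1.5; the
    estimator used in the paper is not stated and may differ.
    """
    n = len(war_values)
    if n < 2:
        raise ValueError("need at least two players")
    total = sum(war_values)
    mean = total / n
    m2 = sum((w - mean) ** 2 for w in war_values) / n  # second central moment
    m3 = sum((w - mean) ** 3 for w in war_values) / n  # third central moment
    sd = math.sqrt(m2 * n / (n - 1))                   # sample standard deviation
    skew = m3 / m2 ** 1.5 if m2 > 0 else 0.0           # 0.0 when all values are equal
    return total, sd, skew


# Two hypothetical five-player rosters with the same total WAR of 10.
balanced = [2.0, 2.5, 1.5, 2.0, 2.0]    # mostly average players
star_heavy = [6.0, 1.0, 1.0, 1.0, 1.0]  # one star, four weak players

for name, roster in (("balanced", balanced), ("star-heavy", star_heavy)):
    total, sd, skew = team_moments(roster)
    print(f"{name}: sum={total:.1f}, sd={sd:.3f}, skewness={skew:.3f}")
```

Both rosters have the same sum, so a model that controls for the sum can only tell them apart through the standard deviation and skewness, which are both larger for the star-heavy roster.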
Results and Discussion

See Table 1 for a full summary of the regression results. Skewness dropped out of the model for batters and relievers (the p-value from testing the final model against a model with a skewness interaction was 0.9071). However, in the rWAR model, skewness was found to be important for pitchers while standard deviation was not. Both were important for starting pitchers in the fWAR model. Since the results were similar and the fWAR model has a slightly lower AIC, the subsequent discussion uses the fWAR model only.

A plot of the relationship between batter standard deviation and wins is presented in Figure 2. Each panel in the plot corresponds to a different sum of a team's batter fWAR.

[Figure 2: Batter Effects Plot. Title: Effects of Batter Standard Deviation for different WAR. Panels: Sum = -6, 6, 17, 29, 41, 53. x-axis: Batter SD; y-axis: Wins.]

The slope of the estimated regression line changes substantially across the panels. It appears that teams with poor batters (i.e., teams with a low sum) are generally helped by increasing the standard deviation of the team's position players. For teams with talented batters, the relationship changes, and it is generally harmful to increase the standard deviation. The visualizations are supported by the regression results: the estimated coefficient for the standard deviation term is 3.575, while the coefficient of the interaction term is estimated to be -0.134. Because the interaction term multiplies the standard deviation by the sum, its negative contribution grows with team talent. For weak teams, the negative effect of the interaction term is offset by the positive effect of the standard deviation term. For strong teams, however, the positive effect no longer outweighs the negative effect.
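The sign change described above can be worked through with the two reported coefficients. The sketch below assumes the usual reading of an interaction model, in which the effect of one more unit of batter standard deviation, holding everything else fixed, is the standard deviation coefficient plus the interaction coefficient times the batter WAR sum. The function names are hypothetical, and the sums looped over are the panel values from Figure 2.

```python
B_SD = 3.575       # reported coefficient on batter standard deviation
B_SUM_SD = -0.134  # reported coefficient on the sum-by-SD interaction


def sd_marginal_effect(batter_war_sum):
    """Estimated change in wins per unit of batter SD at a given WAR sum."""
    return B_SD + B_SUM_SD * batter_war_sum


def break_even_sum():
    """Batter WAR sum at which the estimated SD effect changes sign."""
    return -B_SD / B_SUM_SD


for war_sum in (-6, 6, 17, 29, 41, 53):
    effect = sd_marginal_effect(war_sum)
    print(f"sum={war_sum:>3}: {effect:+.3f} wins per unit of SD")

print(f"break-even sum: {break_even_sum():.1f}")
```

Under these assumptions the estimated effect is positive for batter WAR sums below roughly 26.7 and negative above it, which matches the change in slope across the panels of Figure 2.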
For starting pitchers, skewness is most important in the fWAR model. In Figure 3, it can be seen that starting pitcher skewness is negatively related to team performance regardless of the level of pitching talent. Increasing the skewness of a team's starting pitchers by one while holding the sum constant is associated with an estimated decrease of between 0.274 and 3.877 wins on average. High skewness values for a team usually mean that the team has one or two elite starters. In baseball, starting pitchers pitch approximately once every five days. These results might suggest that a team is better off acquiring two average starters, who combine to pitch two out of every five days, than one elite starter.

[Figure 3: Starter Effects Plot. Title: Effects of Starting Pitcher Skewness. Panels: Sum = -0.2, 5, 10, 16, 21, 26. x-axis: Starter Skewness; y-axis: Wins.]

The results of the fWAR regression provide strong evidence that reliever interactions are important. Figure 4 shows the effects of changing reliever standard deviation for teams with different reliever WAR. Here, it looks as though teams with different standard deviations can have fairly different records.

[Figure 4: Reliever Effects Plot. Title: Effects of Relief Pitcher SD. Panels: Sum = -4, -1, 2, 4, 7, 10. x-axis: Reliever SD; y-axis: Wins.]

Teams with a fairly weak bullpen and a low standard deviation appear to perform much worse than teams with a weak bullpen and a high standard deviation.
A team with a weak bullpen but a high standard deviation would probably have many poor relievers and only one or two strong relievers. Perhaps, in the past, such teams benefited by using the good relievers in close games and the weak ones in blowouts. This would also explain why the effect of reliever standard deviation decreases as the bullpen improves. The benefit of having one elite reliever for close games is not as powerful for a good bullpen, because in that case the next best option is still an above-average reliever.

Conclusion

This paper set out to answer the question of whether it is more beneficial for a team to acquire two average players or one good player and one bad player. The evidence found in this paper suggests that there is a difference between the approaches. Interestingly, the more beneficial option depends on the level of team talent. Teams with good position players might opt for a different decision than teams with poor position players. This suggests that team performance in baseball is far more complicated than just the overall talent of a team's players.

In the previous discussion of the effects of starter skewness, it was noted that increasing starter skewness is usually harmful to team performance. Based on this result, when teams try to improve their pitching staffs they should also try to minimize the increase in skewness. However, increasing talent level while holding skewness constant is challenging (the correlation between the two variables is estimated to be 0.59). Therefore, teams face a trade-off when considering how to improve their starting pitching staffs. They must decide whether the benefit from improving with a single starter is worth the skewness cost.

As a final note, this study is not meant to draw firm conclusions about the nature of baseball teams. Rather, it is more of an exploration into the mechanics of team performance.
The analysis found a strong relationship between the distribution of player talent and team performance. However, a team with poor relievers should not expect immediate improvement by increasing the standard deviation of its relievers. This study suggests that teams may be able to improve by accounting for their distribution of player talent.

Acknowledgements

This work was supported by funding from the Undergraduate Scholars Program at Montana State University. I would also like to thank Jim Robison-Cox for all the help he provided with the project. Further ideas and influence came from Steve Cherry and Mark Greenwood, and I greatly appreciate their contributions. Finally, I would like to thank Gregg Ferris, Lik Ming Aw, Blaine Ferris, and Else Trygstad-Burke for their help in preparing this paper.

References

[1] Dave Cameron. WAR: It works. Fangraphs, 2009.
[2] Glenn DuPaul. What is WAR good for? Hardball Times, 2012.
[3] Gordon Edes. Enough Sox star power? ESPN Boston, 2013.
[4] J. Fox and J. Hong. Effect displays in R for multinomial and proportional-odds logit models: Extensions to the effects package. Journal of Statistical Software, 2009.
[5] Marek Hlavac. stargazer: LaTeX code for well-formatted regression and summary statistics tables. Harvard University, Cambridge, USA, 2013. R package version 3.0.1.
[6] Tyler Kepner. Boom-and-bust Diamondbacks try for a steady approach. New York Times, 2013.
[7] J. Keri and Baseball Prospectus. Baseball Between the Numbers: Why Everything You Know about the Game Is Wrong. Basic Books, 2007.
[8] D. Meyer, E. Dimitriadou, K. Hornik, A. Weingessel, and F. Leisch. e1071: Misc Functions of the Department of Statistics (e1071), TU Wien, 2012. R package version 1.6-1.
[9] Jose Pinheiro, Douglas Bates, Saikat DebRoy, Deepayan Sarkar, and R Core Team. nlme: Linear and Nonlinear Mixed Effects Models, 2013. R package version 3.1-109.
[10] R Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria, 2013.
[11] Steve Slowinski. UZR. Fangraphs, 2013.
[12] Steve Slowinski. What is WAR? Fangraphs, 2013.
[13] Steve Slowinski. wOBA. Fangraphs, 2013.
[14] Baseball Reference Staff. Position player WAR calculations and details. Baseball Reference, 2013.
[15] Hadley Wickham. ggplot2: Elegant Graphics for Data Analysis. Springer New York, 2009.
[16] Hadley Wickham. The split-apply-combine strategy for data analysis. Journal of Statistical Software, 40(1):1-29, 2011.
[17] Yihui Xie. knitr: A general-purpose package for dynamic report generation in R, 2013. R package version 1.1.

Table 1: Regression Results

Term            fWAR Model           rWAR Model
Bsum            1.069*** (0.069)     1.125*** (0.056)
Bsd             3.575*** (1.252)     2.265** (0.935)
Bsum:Bsd        -0.134*** (0.041)    -0.087** (0.036)
fSSum           0.583*** (0.090)
fSsd            2.128*** (0.588)
fSSkew          -2.075** (0.919)
fSSum:fSSkew    0.093 (0.068)
fRsum           1.532*** (0.191)
fRsd            8.086*** (1.616)
fRsum:fRsd      -1.101*** (0.296)
rPsum                                0.966*** (0.023)
rPskew                               -0.812*** (0.215)
Constant        39.470*** (1.913)    48.122*** (1.217)

Note: *p<0.1; **p<0.05; ***p<0.01
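Two of the pitcher results discussed in the text can be approximately recovered from Table 1. The sketch below rests on assumptions that the paper does not state explicitly: that the values in parentheses are standard errors, that the interval quoted for starter skewness is a normal-approximation 95% interval on the fSSkew coefficient alone (ignoring the small fSSum:fSSkew interaction), and that the reliever effect is read from the interaction model in the usual way. The function names are hypothetical.

```python
# Values copied from the fWAR column of Table 1.
STARTER_SKEW = -2.075     # fSSkew estimate
STARTER_SKEW_SE = 0.919   # assumed to be the standard error of fSSkew
RELIEVER_SD = 8.086       # fRsd estimate
RELIEVER_SUM_SD = -1.101  # fRsum:fRsd estimate


def normal_interval(estimate, std_error, z=1.96):
    """Approximate 95% interval: estimate plus or minus z standard errors."""
    half_width = z * std_error
    return estimate - half_width, estimate + half_width


def reliever_sd_effect(reliever_war_sum):
    """Estimated change in wins per unit of reliever SD at a given WAR sum."""
    return RELIEVER_SD + RELIEVER_SUM_SD * reliever_war_sum


low, high = normal_interval(STARTER_SKEW, STARTER_SKEW_SE)
print(f"starter skewness effect: {low:.3f} to {high:.3f} wins")

# Panel sums taken from Figure 4.
for war_sum in (-4, -1, 2, 4, 7, 10):
    print(f"reliever sum={war_sum:>2}: {reliever_sd_effect(war_sum):+.3f}")
```

The interval comes out at roughly -3.88 to -0.27 wins, close to the decrease of between 0.274 and 3.877 wins quoted in the discussion of starters. The reliever effect shrinks as bullpen WAR rises and turns negative only for the strongest bullpens, which is consistent with the pattern described for Figure 4.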