In Search of David Ross
Scott A. Brave, R. Andrew Butters, and Kevin Roberts
Baseball 1636

1. Introduction

Chemistry, intangibles, and a whole that is greater than the sum of its parts: These are the euphemisms that often get thrown around in locker rooms and the sports media in an effort to rationalize how a team made up of seemingly inferior players manages to outperform another that on paper looks unbeatable. While these David vs. Goliath analogies are plentiful, little consensus exists on the proper way to attribute a team’s performance to its chemistry. Here, we set out on a journey to accomplish just that. While we are certainly not the first to go in search of this “holy grail” of sports analytics, we take a novel approach, drawing on spatial and network statistics to offer a new lens for viewing what it means for a team or player to exhibit good chemistry.[1]

Major League Baseball (MLB) represents an intriguing opportunity for such an analysis given the level of sophistication that has been developed in measuring the impact of individual performances on team outcomes. Furthermore, as fans of the 2016 World Champion Chicago Cubs, a personal motivation for this choice exists as well: a “search for David Ross.” David Ross is the epitome of where advanced metrics and player intangibles are at odds. As a back-up catcher, David Ross’ individual performances define him as nothing more than a serviceable role player; but as a teammate, David Ross is routinely characterized as someone who makes everyone around him better. Our aim is to quantify the “David Ross Effect,” or the indirect impact that an individual player can have on team wins through making their teammates better.

We begin our analysis by using FanGraphs’ wins-above-replacement metric, fWAR, to construct MLB player productivity residuals for the 1998-2015 seasons.
These residuals reflect the difference between the expected and actual number of team wins that can be attributed to each player in a given season. When aggregated across teammates, by construction they measure the difference between a team’s actual win count and what it would be expected to be based solely on individual player performances. This feature allows us to analyze the element of team performance that could instead be due to interactions between the players on a team. Our analysis suggests that the scope for this explanation of the win-loss ledger of MLB teams could be quite large, with a range of as much as 40 wins, or roughly 20 percent of the variation in wins across teams.

To account for player interactions, we use a spatial factor model to decompose our individual player productivity residuals into two separate unobserved components. The first component identifies what we call character players, or those players who positively influence their teammates regardless of the team that they play for, while the second component accounts for the role that a team’s field and front office staff play in team performance to isolate what we call team players. This second component also makes it possible to capture a team’s historical ability to consistently turn individual player talents into extraordinary team outcomes, allowing for a relative ranking of MLB teams that can be used to measure front office performance on the dimension of team chemistry, or what we refer to as organizational culture.

[1] See for instance SyncStrength (2016), Kelly (2016), Levine (2015), Phillips (2014), and Carleton (2013).

2017 Research Papers Competition Presented by:
Our methodology has a natural extension to network statistics that then allows us to construct refinements of fWAR that isolate a player’s own contribution to team wins irrespective of his teammates, fWAR−, and his contribution adjusted for his effect on his teammates through our two team chemistry factors, fWAR+. Using fWAR− to adjust for player interactions, we demonstrate that roughly 50% of the discrepancy between the sum of a team’s players’ fWAR and team wins can indeed be explained by our definition of team chemistry. Similarly, using fWAR+, we show that fWAR tends to overvalue the relative contribution of low impact players and undervalue the relative contributions of high impact players to their team’s performance.

We refer to the total network effect of a team’s players on each other, obtained by summing the differences between fWAR− and fWAR, as tcWAR, or team chemistry WAR. With this new metric, we document that high winning percentage teams do in fact tend to exhibit good team chemistry. That said, not all good teams exhibit good chemistry, and not all bad teams exhibit bad chemistry. Relating tcWAR to a team’s wins-above-average, we show that there exists considerable variation on this dimension, and identify teams for which our team chemistry factors played either a surprisingly large positive or negative role in its performance.

A player’s net impact on his team’s performance through his teammates, i.e., fWAR+ − fWAR, is then what we refer to as pcWAR, or player chemistry WAR. By constructing age-position profiles for pcWAR conditional on player and team characteristics, we show that the conventional wisdom that good players and older players make for good teammates has empirical support. However, the latter tends to vary by position, with designated hitters, relief pitchers, first basemen, and catchers making positive contributions to team chemistry at younger ages on average than other players.
Players who play more than one position also tend to have higher pcWAR values on average. Using our conditional age-position profiles, we then classify players based on their “intangibles,” defined by whether they exceed or fall short of their conditional age-position profile, and rank them on this dimension and their talent level.

It is here where our journey comes full circle. Looking at David Ross’ intangibles reveals a player who not only consistently outperformed his conditional age-position profile for much of his career even at low levels of fWAR−, but did so at a position that tends to support team chemistry more generally for older players.

2. Measuring Team Chemistry

The first step in our analysis of team chemistry is to construct individual player productivity residuals capturing the difference between the expected number of team wins arising from a player’s performance and the number of games that player’s team actually won.[2] To measure a player’s individual performance, we make use of FanGraphs’ wins-above-replacement metric, fWAR, an advanced sabermetric that captures how many total wins a player contributes to his team above a replacement level player at the same position (FanGraphs, 2016a). With these measures in hand, we then move to modeling the interactions between teammates and the indirect effect they may have on team performance.

[2] Details on the data and their sources can be found in the Appendix.

2.1. fWAR and Team Wins

The strength of fWAR is its convenience. It compresses all of the things that an individual baseball player can do to help his team win, both at the plate and in the field, into one number. fWAR is not perfect, however, and many have disagreed as to its value in judging the relative performance of players (e.g. Passan (2014), Keller (2014a)).
Another shortcoming of fWAR, and the focus of our analysis, is the lack of a role for interactions among players to impact team performance. We show in this paper that this tends to manifest itself in the fact that simply summing the fWAR values for a team across its players does not perfectly replicate its wins above those expected of a team composed entirely of replacement-level players.

To get a sense of exactly how important player interactions may be to team performance, we regressed the number of wins for each team on the sum total of its players’ fWAR. Specifically, we ran a linear regression of the form[3]

$$W_{nt} = \alpha + \beta\,\mathrm{fWAR}_{nt} + \varepsilon_{nt},$$

where $W_{nt}$ is the number of wins of team $n$ in season $t$ and $\mathrm{fWAR}_{nt}$ is the sum total of FanGraphs’ wins-above-replacement statistics for all players on team $n$ in season $t$. The $\varepsilon_{nt}$ in this regression are what we call team productivity residuals. A team with a large and positive $\varepsilon_{nt}$ is a team that outperformed, i.e., won more games than could be attributed to the sum of its individual player performances. Alternatively, a team with a large negative residual is a team that, despite having a high number of strong individual performances (as measured by fWAR), under-performed in terms of team wins.

The results from this regression using MLB team data from the 1998-2015 seasons provide several insights. First, it is clear that the estimate of $\beta$ ends up very close to 1.[4] This is intuitive given how fWAR is constructed (FanGraphs, 2016a), but also allows us to confidently use the idea that increasing a team’s fWAR should have a one-to-one relationship with their number of wins. Furthermore, the estimate for $\alpha$ comes out to be near 48. This estimate also has a natural interpretation as the number of wins one would expect a team full of replacement level players to accrue. At 48, clearly a team with only replacement level players is far from an average, or 0.500 winning percentage, baseball team. That said, it is consistent with the construction of fWAR and thus serves as a benchmark against which to evaluate teams. Re-arranging the regression equation and substituting in our estimates of $\alpha$ and $\beta$, team productivity residuals are then given by

$$\hat{\varepsilon}_{nt} = W_{nt} - 48 - \mathrm{fWAR}_{nt}.$$

[3] Keller (2014a, b) conducted a similar analysis in his defense of fWAR.
[4] In fact, the null hypothesis of $\beta = 1$ cannot be rejected at any standard significance level.

The $\hat{\varepsilon}_{nt}$ are our estimate of the element of team performance that is unexplained by the sum of its players’ individual performances, and the variation that we may potentially attribute to a team’s chemistry. Based on the $R^2$ value of the previous regression, this amounts to about 20% of the variation in team wins in our sample. Figure 1 further demonstrates just how important this element is by plotting a kernel density function of $\hat{\varepsilon}_{nt}$. With a standard deviation of 5 wins and a range equal to approximately 40 wins, it is evident that a considerable portion of the variability in team performance cannot be explained by the sum of individual player performances alone. Despite the immense progress sabermetricians have made in the sport of baseball, there exists significant room for the role of interactions among players to factor into the variation in team performances.

[Figure 1: Kernel Densities of Team Productivity Residuals. x-axis: Wins Above Team WAR; y-axis: Density; series: fWAR and fWAR−; kernel = gaussian, bandwidth = 1.2900.]

2.2. Team Wins and Player Interactions

Given the seemingly large empirical role that team chemistry may play in wins and losses, we next focus on decomposing these team residuals into player-specific productivity residuals.
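The regression and residual construction of Section 2.1 can be sketched as follows. This is a minimal illustration on made-up numbers, not the authors’ code or data; the function name `team_productivity_residuals` and the toy arrays are assumptions introduced only for this example.

```python
import numpy as np

def team_productivity_residuals(wins, team_fwar):
    """OLS of team wins on summed team fWAR.

    Returns (alpha, beta, residuals), where residuals play the role of
    the team productivity residuals in the text.
    """
    wins = np.asarray(wins, dtype=float)
    team_fwar = np.asarray(team_fwar, dtype=float)
    # Design matrix with an intercept column and the team fWAR column.
    X = np.column_stack([np.ones_like(team_fwar), team_fwar])
    coef, *_ = np.linalg.lstsq(X, wins, rcond=None)
    alpha, beta = coef
    residuals = wins - alpha - beta * team_fwar
    return alpha, beta, residuals

# Hypothetical team-seasons (illustrative values only, not the paper's data).
team_fwar = np.array([20.0, 30.0, 40.0, 50.0, 35.0])
wins = np.array([70.0, 76.0, 90.0, 97.0, 84.0])
alpha, beta, resid = team_productivity_residuals(wins, team_fwar)

# With the paper's reported estimates (alpha near 48, beta near 1) plugged in,
# the residual is simply wins minus 48 minus team fWAR.
resid_fixed = wins - 48.0 - team_fwar
```

The fitted version estimates the intercept and slope from whatever sample is supplied, while `resid_fixed` mirrors the rearranged equation in the text that substitutes 48 and 1 directly.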
To decompose team productivity residuals into contributions from individual players, we assume

$$\hat{\varepsilon}_{nt} = \sum_i \hat{\varepsilon}_{int} = \sum_i \hat{W}_{int} - \sum_i \mathrm{fWAR}_{int},$$

where $\hat{W}_{int}$ is a measure of the expected contribution of player $i$ to team wins taking into account his position and amount of playing time such that $\sum_i \hat{W}_{int} = W_{nt} - 48$. Player position weights are defined following FanGraphs’ methodology (FanGraphs, 2016a) and appearance weights are derived from at-bats and defensive outs for position players and outs recorded for pitchers after adjusting for the relative importance of defensive positions and starting versus relief pitchers.[5] When aggregated across players on a given team in a given season, our player productivity residuals by construction measure the difference between a team’s actual win count and what it would be expected to be based on the sum total of individual player performances.

We then model the interactions between players as a spatial autoregression (SAR),

$$\hat{\varepsilon}_{int} = \rho A \hat{\varepsilon}_{int} + \nu_{int},$$

where $A$ is an adjacency matrix identifying teammates in a given season.[6] Typically, an adjacency matrix is a symmetric matrix with 0’s on the diagonal and 1’s off the diagonal “connecting” teammates. However, in order to capture potential dynamics in teammate relationships, we replace the 1’s with the number of MLB teams that teammates have played together on through the end of each season. This gives added weight to repeated “connections” in the SAR in explaining player performance interactions and takes into account the panel nature of our dataset.

Furthermore, we assume that a common factor structure exists for the SAR residuals, $\nu_{int}$, such that player productivity residuals are driven by a player-season specific component ($f_{it}$), as well as a team specific component ($\lambda_n$).
The team specific component is constant over time and primarily reflects an organization’s tendency to over- or under-perform relative to the collection of its players’ fWARs. The player-season specific component instead traces out a player’s career arc, potentially across several teams, and reflects whether that player finds himself among over- or under-performing teammates in each season. Solving for $\hat{\varepsilon}_{int}$ then yields our spatial factor model with the spatial weight matrix $W = (I - \rho A)^{-1}$:

$$\hat{\varepsilon}_{int} = (I - \rho A)^{-1}(f_{it}\lambda_n) = W F \Lambda.$$

This model can be consistently estimated using spatial principal component analysis (SPCA) to extract the latent player-season and team specific components by imposing scale normalizations on either $F$ or the factor loadings $\lambda$ as well as $\rho$ and $A$.[7] In the next section, we provide motivation for what these factors may capture.

[5] Further details on the construction of $\hat{W}_{int}$ can be found in the Appendix.
[6] For more information on spatial autoregressions, see Conley (2008).
[7] Section 6.2.2 in the Appendix provides a more detailed discussion of the required normalizations. For more information on spatial principal components analysis, see Demsar et al. (2012).

3. The Network Effects of Team Chemistry

Our spatial factor model fits the definition of a “network.” The players on a team in a given season make up the “nodes” of the network, with the strength of the connections between teammates summarized by our factors and their loadings. In other words, our model is simply a statistical framework for measuring the importance of correlations across player performances. In this section, we refine fWAR in order to take into account the correlations in the performance of teammates and, at the same time, construct new measures of team and player chemistry.[8]

3.1. Sources of Team Chemistry

The primary difficulty that others have faced when trying to measure team chemistry has been their focus on identifying a priori the factors that drive the correlations between the performances of teammates. Our approach is different in that we treat these factors as latent variables and identify them off the correlations themselves. We view this as being consistent with the conventional wisdom that team chemistry is anything that makes teams better than they otherwise would be as individuals. Seen in this light, our methodology for measuring team chemistry boils down to nothing more than a decomposition of the spatial correlation matrix of teammates’ productivity residuals into an exact linear combination of latent factors.

To see this, consider that we can decompose our player productivity residuals into two parts: 1) a part that is unique to each player that we attribute to measurement error in team productivity residuals, and 2) a part that can be explained by each player’s interactions with his teammates that we attribute to team chemistry.

$$\hat{\varepsilon}_{int} = \underbrace{w_{ii}(f_{it}\lambda_n)}_{\text{Own Contribution}} + \underbrace{\sum_{j \neq i} w_{ij}(f_{jt}\lambda_n)}_{\text{Teammate Contribution}}.$$

We associate positive spill-overs with “good team chemistry” and negative spill-overs with “bad team chemistry.” This is because, given that $w_{ij} < 0$ as constructed, a player will exhibit positive spill-overs to his teammates’ productivity residuals as long as $f_{it}\lambda_n < 0$. Conversely, a player with $f_{it}\lambda_n > 0$ will necessarily exhibit negative spill-overs. We do not take a stance on what drives these spill-overs between teammates; in all likelihood, our latent factors capture a combination of many of the determinants of team chemistry that others have already explored. However, because we do not restrict them ex ante, they likely also embody elements of team chemistry that could not previously be measured.
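The spatial weight matrix and the own/teammate split above can be sketched numerically. This is only an illustration under stated assumptions: the three-player adjacency matrix, the value of `rho`, and the vector `common` (standing in for the products $f_{it}\lambda_n$) are all made up, the helper names are hypothetical, and the normalizations the paper imposes during estimation are not reproduced.

```python
import numpy as np

def spatial_weight_matrix(A, rho):
    """Return W = (I - rho * A)^{-1} for an adjacency matrix A."""
    A = np.asarray(A, dtype=float)
    return np.linalg.inv(np.eye(A.shape[0]) - rho * A)

def own_and_teammate_contributions(W, common):
    """Split W @ common into w_ii * c_i and sum over j != i of w_ij * c_j."""
    common = np.asarray(common, dtype=float)
    own = np.diag(W) * common        # diagonal term for each player
    teammate = W @ common - own      # everything coming from teammates
    return own, teammate

# Three hypothetical teammates. Off-diagonal entries count the number of
# teams each pair has played together on, as described in Section 2.2.
A = np.array([[0.0, 2.0, 1.0],
              [2.0, 0.0, 1.0],
              [1.0, 1.0, 0.0]])
rho = -0.1                            # assumed value, chosen for illustration
common = np.array([-0.5, 0.2, 0.1])   # assumed f_it * lambda_n values

W = spatial_weight_matrix(A, rho)
own, teammate = own_and_teammate_contributions(W, common)
eps_hat = own + teammate              # implied player productivity residuals
```

By construction `eps_hat` satisfies the SAR relation, i.e. it equals `rho * A @ eps_hat + common`, which is a quick way to check that the inverse was formed correctly.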
The extent to which we provide context for our factors is thus to appeal to the work of other social scientists who have singled out certain psychological traits, such as “character” and being a “team player,” as being attributes of individuals in groups that excel in working together. By allowing for two common factors and restricting the factor loadings across them such that $F = [ch, tp]$ and $\Lambda = [l, \lambda]$, where $l$ is a unit vector across teams, we can restrict our factor model to embody similar features.

$$\hat{\varepsilon}_{int} = w_{ii}(ch_{it}\,l_n + tp_{it}\lambda_n) + \sum_{j \neq i} w_{ij}(ch_{jt}\,l_n + tp_{jt}\lambda_n)$$

[8] For a comprehensive treatment of the network literature, see Jackson (2008) and the citations within.

We think of players with negative $ch$ values as being good character players, as they demonstrate positive spill-overs to their teammates which do not depend on the identity of their team. In contrast, we label players with negative $tp$ values as being good team players, because their contribution to their teammates through $tp$ depends on the team for which they play via $\lambda$. Teams with large $\lambda$ are then said to exhibit good organizational culture, as they either reinforce positive spill-overs ($tp < 0$ & $\lambda > 0$) or minimize negative spill-overs ($tp > 0$ & $\lambda < 0$).

Figure 2 plots estimated values of $\lambda$ for all 30 MLB teams. Certain organizations stand out along this dimension. For instance, the St. Louis Cardinals, San Francisco Giants, and Boston Red Sox demonstrate very large negative values of $\lambda$, suggesting that historically these teams have constructed their rosters in such a way as to reinforce the positive spill-overs from good chemistry players. In contrast, teams like the Oakland Athletics, Cincinnati Reds, and Miami Marlins appear to have instead minimized the negative spill-overs from bad chemistry players.

[Figure 2: MLB Team Chemistry Factor Loadings. x-axis: Organizational Culture (-.3 to .2). Team labels as extracted from the figure: STL, SF, BOS, ARI, PHI, SEA, HOU, ATL, CLE, COL, LAD, LAA, TEX, WAS, NYY, TOR, CHW, DET, TB, KC, CHC, PIT, BAL, MIL, NYM, MIN, SD, MIA, CIN, OAK. Negative values denote teams that reinforce positive spill-overs from good chemistry players. Positive values denote teams that minimize negative spill-overs from bad chemistry players.]

3.2. Adjusting fWAR for Team Chemistry

If fWAR measurements are indeed correlated across teammates, then the regression underlying our team productivity residuals is mis-measured. Namely, fWAR may be under- or over-counting the importance of individual player contributions to team wins by ignoring the interactions between teammates. To adjust for this possible source of bias, we construct an alternative measure called fWAR−, which subtracts from the fWAR of each player the portion of his productivity residual that can be explained by his teammates’ residuals. In network statistics, this is often referred to as the “in-degree” for a node.

$$\mathrm{fWAR}^{-}_{int} = \mathrm{fWAR}_{int} - \underbrace{\sum_{j \neq i} w_{ij} f_{jt}\lambda_n}_{\text{In-degree}}$$

Similarly, we can refine fWAR as a measure of player performance by taking into account how much a player affects his teammates’ performance. Here, we add to fWAR− the contribution of each player to all of his teammates’ productivity residuals, or what is referred to in network statistics as the “out-degree” of a node. We call this measure fWAR+.

$$\mathrm{fWAR}^{+}_{int} = \mathrm{fWAR}^{-}_{int} + \underbrace{\sum_{j \neq i} w_{ji} f_{it}\lambda_n}_{\text{Out-degree}}$$

Figure 1 demonstrates the relative importance of adjusting fWAR for correlated teammate performances by also plotting the kernel density of $\hat{\varepsilon}_{nt}$ constructed from fWAR−. The range of unexplained team performance shrinks by roughly 50%, with the majority of the reduction coming from under-performing teams.
This would seem to suggest that “poor clubhouse chemistry” may indeed explain why teams perform poorly more so perhaps than “superior clubhouse chemistry” explains why teams perform well.

We can get a sense of the impact that this adjustment has on the productivity residual for any individual team by examining the aggregation of their differences between fWAR− and fWAR over players in each season. This is often referred to as the network’s “total-degree.” We call it “team chemistry wins-above-replacement,” or tcWAR.

$$\mathrm{tcWAR}_{nt} = \underbrace{\sum_i \sum_{j \neq i} w_{ij} f_{jt}\lambda_n}_{\text{Total-degree}}$$

Figure 3 scatters a team’s wins in each season above an average team (i.e. roughly 81 wins) against its tcWAR. Clearly, the old adage that good teams have good chemistry is affirmed in this figure, though the positive correlation is not as one-for-one as is sometimes argued. This can be seen in the considerable distance for some teams from the 45 degree line in the figure.

[Figure 3: Team Chemistry and Wins-above-Average. x-axis: tcWAR; y-axis: Wins Above Average. Solid red line is a 45 degree line. Labeled team-seasons: 2001 Mariners, 1998 Yankees, 2004 Yankees, 2008 Angels, 1998 Padres, 2011 Tigers, 2011 Yankees, 2006 Athletics, 2007 D-backs, 2012 Orioles, 2011 Red Sox, 2009 Rays, 2003 Royals, 1999 Orioles, 1998 Orioles, 1998 Mariners, 2008 Braves, 1999 Rockies, 2002 Cubs, 1998 Tigers, 2015 Reds, 1999 Royals, 2008 Padres, 2003 Tigers.]

The figure also marks some of the best seasons for teams on both ends of the chemistry spectrum as well as a few other outlying values. Interestingly, record-high win teams, like the 1998 Yankees and 2001 Mariners, and record-high loss teams, like the 2003 Tigers, do not come across as particularly superior or inferior chemistry teams according to our metric. In fact, the figure makes clear that not all good teams display good chemistry and not all bad teams display bad chemistry on the basis of our metric.

Figure 4 scatters fWAR− versus fWAR.
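The in-degree, out-degree, and total-degree adjustments of Section 3.2 can be sketched for a single team-season as follows. The weight matrix `W`, the vector `common` (standing in for $f_{jt}\lambda_n$), the fWAR values, and the function name `chemistry_adjustments` are all hypothetical; the sketch follows the displayed formulas term by term and is not the authors’ implementation.

```python
import numpy as np

def chemistry_adjustments(fwar, W, common):
    """Return (fWAR-, fWAR+, tcWAR) for one team-season.

    in-degree_i  = sum over j != i of w_ij * c_j
    out-degree_i = sum over j != i of w_ji * c_i
    """
    fwar = np.asarray(fwar, dtype=float)
    W = np.asarray(W, dtype=float)
    common = np.asarray(common, dtype=float)
    off = W - np.diag(np.diag(W))          # drop the own-contribution terms
    in_degree = off @ common               # row i: teammates' effect on player i
    out_degree = off.sum(axis=0) * common  # column i: player i's effect on teammates
    fwar_minus = fwar - in_degree
    fwar_plus = fwar_minus + out_degree
    tcwar = in_degree.sum()                # total-degree of the team network
    return fwar_minus, fwar_plus, tcwar

# Hypothetical inputs for a three-player team (illustrative values only).
W = np.array([[1.0, -0.2, -0.1],
              [-0.2, 1.0, -0.1],
              [-0.1, -0.1, 1.0]])
common = np.array([-0.5, 0.2, 0.1])
fwar = np.array([3.0, 1.0, 0.5])

fwar_minus, fwar_plus, tcwar = chemistry_adjustments(fwar, W, common)
```

Because every teammate link is counted once as someone’s in-degree and once as someone’s out-degree, the in-degrees and out-degrees sum to the same team total, which is why a single number can summarize the team’s chemistry.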
Interestingly, fWAR− and fWAR on an individual player-season basis are very highly correlated, with the plotted points clustered fairly closely around the 45 degree line. Thus, it is the aggregation of somewhat small differences at the player level that leads to the drastic reduction in the unexplained variance of team performance in Figure 1. Figure 4 also contains a scatter plot of fWAR+ vs. fWAR. Here, the differences are much more pronounced. In particular, fWAR overestimates the relative performance of low impact (fWAR ≤ 1) and underestimates the relative performance of high impact (fWAR ≥ 4) players.

[Figure 4: fWAR− and fWAR+ vs. fWAR. Panels: fWAR+ vs. fWAR; fWAR− vs. fWAR. x-axis: fWAR. Solid red lines are 45 degree lines. Vertical lines denote thresholds for Scrub/Role (fWAR=1) and Good/Star (fWAR=4) players.]

The difference between fWAR+ and fWAR can therefore be used to evaluate players on the basis of their contribution to team performance through their impact on their teammates. In network statistics, this is what is called the “net-degree” for each node.

$$\mathrm{pcWAR}_{int} = \underbrace{\sum_{j \neq i} w_{ji} f_{it}\lambda_n - \sum_{j \neq i} w_{ij} f_{jt}\lambda_n}_{\text{Net-degree}}$$

In keeping with our terminology above, we instead refer to it as “player chemistry wins-above-replacement,” or pcWAR. The conventional wisdom that good players make their teammates better is confirmed by our analysis of pcWAR, as Figure 5 demonstrates that a strong positive correlation exists between pcWAR and fWAR− for all player-season combinations in our sample.[9] In the next section, we take a closer look at the characteristics of good team chemistry players.

[9] Consistent with Figure 5, a very similar correlation exists between pcWAR and fWAR as well. However, we choose to display our results in this way such that if one were to sum across the x-axis and y-axis of the graph, fWAR+ would be obtained.

[Figure 5: Player Chemistry and Wins-above-Replacement. x-axis: fWAR−; y-axis: pcWAR. Vertical lines denote fWAR thresholds for Scrub/Role (fWAR=1) and Good/Star (fWAR=4) players.]

4. The Intangibles of Team Chemistry

In this section, we construct age-position profiles of pcWAR controlling for fWAR− and various other player and team characteristics in order to examine a player’s team chemistry “intangibles.” We then use these conditional age-position profiles to classify players along this dimension.

4.1. Age-Position Profiles

To construct conditional age-position profiles of players, we run the following regression including up to quartic interaction terms in age,

$$\mathrm{pcWAR}_{it} = \sum_p \gamma_p\, pos_{pit} + \sum_p \theta_p (pos_{pit} * age_{it}) + \sum_p \psi_p (pos_{pit} * age_{it}^2) + \sum_p \tau_p (pos_{pit} * age_{it}^3) + \sum_p \omega_p (pos_{pit} * age_{it}^4) + \sum_k \delta_k X_{kit} + \sum_h \varphi_h Z_{hit} + \xi_{it},$$

where $pos$ is an indicator variable for a player’s primary field position including the designated hitter, $age$ is a player’s age, $X$ is a vector of player characteristics including fWAR− and controls for MLB experience and team tenure, batting and throwing hands, and multiple positions played, and $Z$ is a vector of team characteristics including both team and manager indicator variables.

By conditioning these profiles on so many observable dimensions, our goal is to isolate the player intangibles of team chemistry that fall beyond alternative explanations. In other words, we want to be able to measure the individual contributions to team wins that do not depend on a player’s team or manager as well as his talent level, experience, etc. Furthermore, the estimated coefficients of the above regression demonstrate that many of these factors are indeed important elements of team chemistry.
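The structure of the age-position regression above can be sketched by building the position-by-age-polynomial design matrix and fitting it by least squares. This is a simplified illustration, not the authors’ specification: the helper names `age_position_design` and `fit_profiles` are hypothetical, the data are made up and noiseless, and the team and manager indicators in $Z$ are left out (they could be passed through the optional `controls` argument).

```python
import numpy as np

def age_position_design(pos_idx, age, n_positions, degree=4):
    """Position dummies interacted with age**0 through age**degree."""
    pos_idx = np.asarray(pos_idx, dtype=int)
    age = np.asarray(age, dtype=float)
    dummies = np.eye(n_positions)[pos_idx]  # one-hot rows, (n_obs, n_positions)
    # One block of columns per power of age, each block ordered by position.
    blocks = [dummies * (age ** d)[:, None] for d in range(degree + 1)]
    return np.hstack(blocks)

def fit_profiles(pcwar, design, controls=None):
    """Least-squares fit; returns (coefficients, residuals xi)."""
    pcwar = np.asarray(pcwar, dtype=float)
    X = design if controls is None else np.hstack(
        [design, np.asarray(controls, dtype=float)])
    coef, *_ = np.linalg.lstsq(X, pcwar, rcond=None)
    return coef, pcwar - X @ coef

# Hypothetical player-seasons: two positions, linear age terms only for brevity.
pos_idx = [0, 0, 1, 1, 0, 1]
age = [25, 30, 25, 35, 40, 28]
D = age_position_design(pos_idx, age, n_positions=2, degree=1)
pcwar = D @ np.array([1.0, 2.0, 0.1, -0.05])  # noiseless toy outcome
coef, xi = fit_profiles(pcwar, D)
```

With `degree=4` the raw powers of age become very large, so in practice one might center or rescale age before forming the polynomial terms; that choice is a numerical convenience and is not something the text specifies.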
For instance, one additional win-above-replacement (adjusted for a player’s interactions with his teammates) and playing multiple positions increase his pcWAR by a statistically significant 0.33 and 0.07 wins, respectively.

[Figure 6: Age-Position Team Chemistry Profiles. Average Marginal Effects with 95% CIs. Panels: First Basemen, Second Basemen, Catcher, Third Basemen, Center Fielder, DH, Left Fielder, Right Fielder, Relief Pitcher, Starting Pitcher, Shortstop. x-axis: Age (21 to 41); y-axis: pcWAR (-.4 to .4). Conditional on fWAR−, league and team experience, batting and throwing hand, multiple positions played, and manager and team indicators.]

Figure 6 plots our conditional age-position pcWAR profiles with 95% confidence intervals. These plots show the conditional mean pcWAR for each position by age, such that transitions from negative to positive values over time denote the average age when the switch occurs from being a “bad intangibles” to a “good intangibles” player. The conventional wisdom that older players make for better teammates is certainly consistent with these profiles, as they tend to slope upward with age across all positions even after controlling for team and MLB experience. However, some additional interesting patterns also emerge from this analysis. For instance, designated hitters, relief pitchers, first basemen, and catchers achieve this transition earlier than others on average, most of whom do not reach this point until their late-thirties or early-forties.[10]

4.2. Player Rankings

We use the residuals from our conditional age-position profile regressions to construct player rankings for intangibles.
Positive values for $\xi_{it}$ capture players whose intangible contributions to team chemistry exceed their conditional age-position profile, whereas negative values correspond to players whose intangible contributions fall short of their profile. We can then jointly classify these players along the scale established by FanGraphs for fWAR applied to our fWAR− statistic to refine them into categories that reflect both their intangibles and talent level.

Figure 7 contains a scatter plot of $\xi_{it}$ versus $\mathrm{fWAR}^{-}_{it}$ for all player-season combinations, with the colored dots denoting the six types of players that we classify. Summing across the x-axis and y-axis of the graph produces an estimate of our $\mathrm{fWAR}^{+}_{it}$ metric that controls for MLB experience and team tenure, batting and throwing hands, and multiple positions played as well as team characteristics including both team and manager indicator variables. As such, it can be viewed as the combined value of the player to team performance stemming from his own performance and his intangibles.

[Figure 7: Intangibles and Wins-above-Replacement. x-axis: fWAR−; y-axis: Intangibles. Labeled player types: Franchise Player, Glue Guy, Diamond-in-the-Rough, Prima Donna, Clubhouse Cancer, Trade Bait. Intangibles are the residuals from pcWAR regressed on fWAR−, league and team experience, age-position profile, batting and throwing hand, multiple positions played, and manager and team indicators. Vertical lines denote thresholds for Scrub/Role (fWAR=1) and Good/Star (fWAR=4) players.]

[10] We want to caution against taking the results from this regression as “causal” estimates of age on intangibles, as the estimated coefficient is most likely also confounding a selection effect for older players. In other words, having good intangibles may make it more likely for a player to remain in the game for longer.

A simple hypothetical helps to put our classification, or typology, of players into context. Imagine two players with identical $\mathrm{fWAR}^{-}_{it}$ (or, before this paper, $\mathrm{fWAR}_{it}$). Now, imagine one of the players had a positive intangibles measure, while the other one had a negative measure. Given that these intangible qualities only manifest themselves as spill-overs to the performance of teammates, before having either player on your team both would seem equally qualified to sign. However, the player with a positive intangibles measure, once joining your team, would likely generate positive spill-overs and you would probably begin to value this player even more highly. Alternatively, the player with the negative intangibles measure, once joining your team, would likely not generate as many positive spill-overs and your view of the value of this player would not change much. Of course, the extent to which this example holds true also depends on the level of these players’ individual performances on your team. Therefore, it is on this joint dimension that we characterize players.

We begin with the two player types that have very high, or “Star” (fWAR ≥ 4), quality individual performances. Among these “Star” players, our first type, the Franchise Player, are those players with good intangibles. These are the players who make their teammates better. The active player with at least one year of service time that best embodies this type based on his average intangibles and fWAR− scores is Joey Votto. Others with similar intangible scores are Giancarlo Stanton and Mike Trout. Contrast these players with our first type of bad intangibles player, the Prima Donna, who makes a positive contribution to his team through his own performance, but has less of an impact on his teammates than his stature on the team would normally dictate. Here, we find a very small group of players, but one that includes (in descending order of intangible scores) Max Scherzer, Adrian Beltre, Clayton Kershaw, Jason Heyward, and Buster Posey.
These are all players that an MLB franchise would be happy to build its team around based on their individual talent, but whose talents seem less likely to cascade through to their teammates.

\[ \text{Franchise Player} \equiv \xi_{it} > 0,\ \mathrm{fWAR}^{-}_{it} \geq 4 \]
\[ \text{Prima Donna} \equiv \xi_{it} < 0,\ \mathrm{fWAR}^{-}_{it} \geq 4 \]

In the middle are those players that can be classified as “Role/Solid Starter/Good” players, i.e., 1 < fWAR < 4. Under a conventional sabermetric approach, these players would be sought after to fill out your roster around your “Star” players. Under our typology, a Glue Guy fits this bill under fWAR but is also a player with good intangibles. Not only do teams seek these players out for their individual performance, but once they get to the team, they also tend to positively impact their teammates. The active player best embodying this type is Kevin Kiermaier, closely followed by Chris Sale and the recently deceased Jose Fernandez. On the other side, a Trade Bait player, as the name would suggest, is one who has the sought-after, or appealing, individual performance, but who otherwise has less of a meaningful impact on his teammates. Here, we find a wide range of players including, but not limited to, a top three of Gregory Polanco, Elvis Andrus, and Evan Gattis.

\[ \text{Glue Guy} \equiv \xi_{it} > 0,\ 1 < \mathrm{fWAR}^{-}_{it} < 4 \]
\[ \text{Trade Bait} \equiv \xi_{it} < 0,\ 1 < \mathrm{fWAR}^{-}_{it} < 4 \]

We affectionately term our third type of good intangibles player the Diamond-in-the-Rough. This player is exemplified by an fWAR ≤ 1. An example case here would be the “Scrub” whose contribution to the performance of others is much greater than his own. Most often these are journeymen players, typically relievers, who do not stick around long with their teams despite their impact on their teammates. However, a few household names like Rich Hill and Welington Castillo do appear on this list.
Conversely, the Clubhouse Cancer contributes little to the team from his own performance and tends to make his teammates worse off. Surprisingly, it is not all that difficult to find fairly well-known examples of this type, for instance: Nick Castellanos, Jeremy Hellickson, Skip Schumaker, Mitch Moreland, and James Loney. We suspect that this is because teams often favor raw talent over intangibles, holding on to such players longer than they otherwise would on the chance that their talent develops enough to justify their place on the team.

\[ \text{Diamond-in-the-Rough} \equiv \xi_{it} > 0,\ \mathrm{fWAR}^{-}_{it} \leq 1 \]
\[ \text{Clubhouse Cancer} \equiv \xi_{it} < 0,\ \mathrm{fWAR}^{-}_{it} \leq 1 \]

Returning to our original motivation, at this point we can also address where David Ross fits into our typology. Figure 8 plots the $\xi$ and fWAR− values for all of David Ross’ seasons played through 2015. More than one labeled instance of a season occurs whenever he was traded mid-season. The vast majority of David Ross’ playing career would characterize him as a “Glue Guy” or “Diamond-in-the-Rough,” consistent with his reputation among his teammates. While he does not fall in the upper echelon for either category, his contributions to team chemistry relative to his conditional age-position profile are not trivial, ranging from a high of about +0.25 wins in mid-career to a low of about -0.30 wins in 2015.

[Figure 8: scatter plot of David Ross’ Intangibles (y-axis) against fWAR− (x-axis), with each point labeled by season, 2002 through 2015.]
Intangibles are the residuals from pcWAR regressed on fWAR−, league and team experience, age-position profile, multiple positions played, and manager and team indicators.
Figure 8: David Ross’ Intangibles Profile

It will be interesting to see, once the data are fully available, how much of the Chicago Cubs’ league-leading 103 wins in 2016 can be attributed to the late-career resurgence David Ross experienced. His subsequent retirement also poses a challenge for the Cubs if this was indeed the case.
For instance, on November 30, 2016, the Cubs signed a center fielder, Jon Jay. In discussing the signing, the general manager of the Cubs said the following:

“From a makeup and leadership standpoint, he’s got an off-the-charts reputation... We knew that losing David Ross would be a big void for us, and bringing in a guy like Jon would be important for us. He can come in and complement the good group of young leaders we already have... We didn’t feel like there were that many guys who could come into a team that just won a World Series and be able to fit that seamlessly and be able to help lead this team. And I think he can, given his reputation and a lot of comments we’ve gotten from his now-teammates indicate his reputation precedes him.”
Jed Hoyer, Chicago Cubs GM (Gonzalez, 2016)

Figure 9 plots the $\xi$ and fWAR− values for all of Jon Jay’s seasons played through 2015. Interestingly, his intangibles profile does not suggest that he has been an above-average team chemistry player in his time in MLB, with the exception of the 2015 season, where we would characterize him as a “Diamond-in-the-Rough” based on his conditional age-position profile. Perhaps the Cubs are ahead of the curve in recognizing 2015 as a turning point for Jon Jay, but the balance of his career so far would suggest otherwise. Furthermore, we are unlikely to gain much additional information from his 2016 season given that he was injured for most of it. Thus, the 2017 season may serve as the proving ground for the Cubs’ faith in his intangible qualities.

[Figure 9: scatter plot of Jon Jay’s Intangibles (y-axis) against fWAR− (x-axis), with each point labeled by season, 2010 through 2015.]
Intangibles are the residuals from pcWAR regressed on fWAR−, league and team experience, age-position profile, multiple positions played, and manager and team indicators.
Figure 9: Jon Jay’s Intangibles Profile

5. Conclusion

In this paper, we outlined a methodology for quantifying how a player may influence his team’s performance outside of his direct contribution measured by advanced individual metrics like wins-above-replacement. We introduced in the process fWAR−, fWAR+, tcWAR, and pcWAR as new advanced metrics that quantify the indirect effects of players on their teammates and team performance while providing an intuitive analog to FanGraphs’ well-documented fWAR metric. With these new metrics, we then outlined the importance of accounting for player interactions in explaining team performance differentials unexplained by fWAR, and identified MLB teams that have effectively utilized these effects in their roster construction.

Our efforts were motivated by a “search for David Ross,” a back-up catcher known more for the positive impact he has on his teammates than for his own performance. We showed that certain types of players are more likely than others to serve in this role (e.g., those who have high fWAR− values, play multiple positions, and are older), and that designated hitters, relievers, first basemen, and catchers tend to contribute positively to team chemistry at an earlier age on average than other players. We were then able to rank players on the basis of how they performed relative to their conditional age-position team chemistry profiles, or intangibles. Doing so, David Ross’ intangibles profile was shown to align with his reputation.

It should also be noted that the team chemistry effects that we find for individual players are not trivial. For instance, with a team win valued at roughly $6 million in MLB, a player with an fWAR− value of 0 and a pcWAR value as low as 0.1 (worth roughly 0.1 × $6 million = $600,000) would still be worth paying the minimum salary.
Considering that for some of the best players we estimate pcWAR values of upwards of 4 wins, the value of team chemistry to an MLB team can be just as high as what fWAR would currently assign to a typical borderline “Star” player.

In future work, we plan to verify whether using alternative measures of wins-above-replacement, like that produced by Baseball Reference, leads to similar results. In addition, we plan a richer exploration of the strength of the interconnections between teammates. For example, the chemistry effect of catchers might be stronger among the pitchers they catch, or middle infielders might have stronger interactions than other pairs of position players. It would also be natural to imagine that organizational culture has its own set of dynamics. One way to capture this would be to include a team’s field and front office staff in our model. Furthermore, most of the analysis here leveraged the playing time of individual players to explain player interactions and their impact on team performance differences. As a consequence, what is still left to understand is how to estimate the effect of players who have positive or negative spillovers to their teammates through their off-the-field interactions.

References

[1] Carleton, R. A. (2013). Is Brandon Inge worth 10 wins behind closed doors? http://www.baseballprospectus.com/article.php?articleid=19944.
[2] Conley, T. G. (2008). Spatial Econometrics. The New Palgrave Dictionary of Economics. Palgrave Macmillan, second edition.
[3] Dempster, A. P., N. M. Laird, and D. B. Rubin (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B, 39(1):1–38.
[4] Demsar, U., P. Harris, C. Brunsdon, A. S. Fotheringham, and S. McLoone (2012). Principal Component Analysis on Spatial Data: An Overview. Annals of the Association of American Geographers.
[5] FanGraphs (2016a). What is WAR? http://www.fangraphs.com/library/misc/war/.
[6] FanGraphs (2016b). Positional Adjustment. http://www.fangraphs.com/library/misc/war/positional-adjustment/.
[7] Gonzalez, M. (November 30, 2016). Cubs newcomer Jon Jay targeted to fill roles of David Ross, Dexter Fowler. Chicago Tribune. http://www.chicagotribune.com/sports/baseball/cubs/.
[8] Jackson, M. O. (2008). Social and Economic Networks. Princeton University Press.
[9] Keller, J. J. (2014a). In defense of WAR: My response to Jeff Passan. http://fansided.com/2014/09/11/defense-war-response-jeff-passan/.
[10] Keller, J. J. (2014b). MLB: An update on the correlation between fWAR and wins. http://statliners.com/2014/11/21/mlb-update-correlation-fwar-wins/.
[11] Kelly, D. (2016). Measuring team chemistry in MLB. http://www.slideshare.net/DavidKelly75/measuring-team-chemistry-in-mlb.
[12] Levine, B. (2015). Measuring team chemistry with social science theory. http://www.fangraphs.com/community/measuring-team-chemistry-with-social-science-theory/.
[13] Passan, J. (2014). Why WAR doesn’t always add up. http://sports.yahoo.com/news/10degrees–why-war-doesn-t-always-add-up-030133203.html.
[14] Phillips, J. (2014). Chemistry 162. http://insider.espn.com/mlb/story/_/id/10628418/mlbdivision-previews-based-formula-clubhouse-chemistry-espn-magazine.
[15] Reis, R. and M. W. Watson (2010). Relative goods’ prices, pure inflation, and the Phillips correlation. American Economic Journal: Macroeconomics, 2(3):128–157.
[16] Shumway, R. H. and D. S. Stoffer (1982). An approach to time series smoothing and forecasting using the EM algorithm. Journal of Time Series Analysis, 3(4):253–264.
[17] SyncStrength (2016). Measuring team chemistry using player biology. http://www.syncstrength.com/team_chemistry/.
[18] Watson, M. W. and R. F. Engle (1983). Alternative algorithms for the estimation of dynamic factor, MIMIC and varying coefficient regression models. Journal of Econometrics, 23:385–400.
Appendix

6.1 Data

Our data comprise 24,668 player-season observations over the 1998-2015 period. Nearly all players who participated in an MLB game during the 1998-2015 seasons appear in our analysis. The only exceptions are players who appeared in a game but failed to record an at-bat or an out, which excludes 21 observations from our sample. fWAR data come from the online database at fangraphs.com, while all additional player, team, and performance information comes from the databases maintained by Sean Lahman at seanlahman.com. While the Lahman database allows us to observe performance data by team for players that change teams within a season, FanGraphs only publishes fWAR at the season level of observation. In these cases, we divide a player’s season fWAR proportionally by his appearances for his respective teams, following the appearance weighting described below. Thus, our dataset includes multiple observations within seasons for such players, one for each team on which they appear.

6.1.1 Player Productivity Residuals

In order to construct player productivity residuals, we use the following weights to define a player’s expected contribution to his team’s wins, $\hat{W}_{int}$, based on his position ($\alpha_p$) and his share of his team’s players’ appearances ($g_{ip}$).

\[ \hat{W}_{int} = \alpha_p \, g_{ip} \, W_{int} \]

\[ \alpha_p = \begin{cases} 0.57 & \text{if } p \text{ is a position player} \\ 0.43 & \text{if } p \text{ is a pitcher} \end{cases} \]

\[ g_{ip} = \begin{cases} \dfrac{AB_i + p_i \cdot DOuts_i}{\sum_{k \neq i}^{K_p} \left( AB_k + p_i \cdot DOuts_k \right)} & \text{if } p \text{ is a position player} \\[2ex] \dfrac{p_i \cdot POuts_i}{\sum_{k \neq i}^{K_p} p_i \cdot POuts_k} & \text{if } p \text{ is a pitcher} \end{cases} \]

FanGraphs constructs fWAR such that players contribute 1,000 WAR per 2,430 games league-wide (162 games for each of 30 teams, with two teams per game). The terms 0.57 and 0.43 correspond to the proportion of league-wide WAR they apportion to position players and pitchers, respectively. This split is based on the assumption that because position players appear on both sides of the ball, their contribution should be weighted somewhat higher (FanGraphs, 2016b).
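The two pieces of bookkeeping described above, the league-wide WAR split and the proportional allocation of a traded player's season fWAR, can be illustrated with a minimal Python sketch. The function names and the example appearance counts are hypothetical, and the per-team figures are simple league averages implied by the stated 1,000 WAR and the 0.57/0.43 split, not values taken from the paper's data.

```python
# Illustrative sketch only: names and example numbers are assumptions,
# not the authors' code or data.

LEAGUE_WAR = 1000.0   # total WAR per season, as stated in the text
TEAMS = 30            # number of MLB teams
ALPHA = {"position": 0.57, "pitcher": 0.43}  # position weights alpha_p


def average_war_per_team():
    """Average WAR available to one team, split by player group."""
    per_team = LEAGUE_WAR / TEAMS
    return {group: share * per_team for group, share in ALPHA.items()}


def split_season_fwar(season_fwar, appearances_by_team):
    """Allocate a season-level fWAR across teams in proportion to appearances.

    `appearances_by_team` maps a team label to the player's appearance
    measure with that team (for example AB plus weighted defensive outs).
    """
    total = sum(appearances_by_team.values())
    if total <= 0:
        raise ValueError("player has no recorded appearances")
    return {team: season_fwar * n / total
            for team, n in appearances_by_team.items()}


if __name__ == "__main__":
    print(average_war_per_team())
    # A hypothetical player traded mid-season with 3.0 fWAR for the year.
    print(split_season_fwar(3.0, {"Team A": 300, "Team B": 100}))
```

In this hypothetical case the player's 3.0 season fWAR is divided three-to-one, so each team-season observation carries only the share of fWAR matching his playing time there.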
To generate appearance weights, we use the sum of at-bats (AB) and defensive outs (DOuts) for position players in order to capture the contributions of different types of players, such as pinch-hitters and defensive substitutions. For pitchers, outs recorded (POuts) proves to be the most precise measure for capturing a variety of pitching contributions (middle relievers, one-out guys, etc.). We then differentially weight DOuts and POuts according to the positional run adjustments and replacement level win percentages for starting and relief pitchers that FanGraphs uses to construct fWAR, where we normalize $p_i$ to sum to 1 across positions and pitcher types, separately. These weights are presented in Table 1.

Table 1: Position Weights

6.1.2 Regression Covariates

The regression analysis presented in Section 4.1 uses several covariates from the dataset that we construct from FanGraphs and the Lahman database. Our position indicators correspond to the position that the Lahman database indicates as the primary position for each player. We include an additional indicator variable for whether the player appeared in multiple positions over his season-team tenure. Age is simply defined as the difference between the season year and the player’s birth year. Team and handedness indicators are pulled directly from the Lahman database, while we generate running totals for a player’s years in MLB and years with his current team to control for experience and team tenure. Finally, manager indicators correspond to each team’s manager on opening day, thus ignoring any managerial changes within seasons.

6.2 A Spatial Factor Model

Here, we describe the mechanics of our spatial factor model and its estimation.
In matrix form, the model can be written as

\[ Y = WF\Lambda + W\varepsilon \tag{1} \]

where $Y$ is an $ST \times N$ matrix of outcomes, $W$ is an $ST \times ST$ matrix of spatiotemporal weights, $F$ is an $ST \times K$ matrix of common factors, $\Lambda$ is a $K \times N$ matrix of factor loadings, and $\varepsilon$ is an $ST \times N$ matrix of idiosyncratic determinants of $Y$.

6.2.1 The Reduced Form of a Spatial Autoregression

Equation 1 can be viewed as the reduced form of a spatial autoregression, or SAR. To see this, consider the following representation of a SAR

\[ Y = \rho A Y + \upsilon \tag{2} \]

where $Y$ is an $ST \times N$ matrix of outcomes, $A$ is an $ST \times ST$ adjacency matrix, $\rho$ is a scalar parameter, and $\upsilon$ is an $ST \times N$ matrix of residuals. Re-arranging the elements of equation 2, it can be rewritten as $Y = (I - \rho A)^{-1}\upsilon$. Defining $W \equiv (I - \rho A)^{-1}$ and assuming the approximate common factor structure $\upsilon = F\Lambda + \varepsilon$, equation 2 is shown to be equivalent to equation 1, since $Y = W(F\Lambda + \varepsilon) = WF\Lambda + W\varepsilon$.

6.2.2 Estimation

Estimation of equation 1 proceeds with spatial principal components analysis, or SPCA, given a number of common factors and appropriate scale and sign normalizations. For the latter, a choice can be made to scale either the factor loadings or the factors such that $\Lambda'\Lambda = I$ or $F'W'WF = I$, respectively, with the signs of the factors set by restricting the columns of $\Lambda$ to sum to zero. In addition, we set $\rho = -1$ and restrict the row sums of the adjacency matrix $A$ to be equal to 1. Combined, these normalizations satisfy the sufficient condition for $W$ to exist that $(I - \rho A)$ be strictly diagonally dominant, i.e., $1 - \rho A_{ii} \geq \sum_{j \neq i} -\rho A_{ij}$. Factor loading restrictions are handled by the expectation-maximization (EM) algorithm developed in Dempster, Laird, and Rubin (1977), Shumway and Stoffer (1982), and Watson and Engle (1983), extended to include unit loading restrictions by Reis and Watson (2010).
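As a rough illustration of how the normalizations above produce the weights matrix, the following Python sketch builds $W = (I - \rho A)^{-1}$ for a toy three-player adjacency matrix with $\rho = -1$ and rows summing to 1. The matrix values, including the positive self-weights on the diagonal, and all function names are assumptions made for this example; the paper does not report its adjacency matrix in this section. The sketch uses only the standard library, so the inverse is computed with a small Gauss-Jordan routine.

```python
# Illustrative sketch only: the toy adjacency matrix and helper names are
# assumptions for this example, not the authors' implementation.

def invert(m):
    """Invert a square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(m)
    aug = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def is_strictly_diagonally_dominant(m):
    """True if each diagonal entry exceeds the sum of |off-diagonals| in its row."""
    for i, row in enumerate(m):
        off_diagonal = sum(abs(v) for j, v in enumerate(row) if j != i)
        if abs(row[i]) <= off_diagonal:
            return False
    return True


def spatial_weights(a, rho=-1.0):
    """Return W = (I - rho*A)^{-1} after checking the stated normalizations."""
    n = len(a)
    if any(abs(sum(row) - 1.0) > 1e-9 for row in a):
        raise ValueError("rows of A must sum to 1")
    m = [[(1.0 if i == j else 0.0) - rho * a[i][j] for j in range(n)]
         for i in range(n)]
    if not is_strictly_diagonally_dominant(m):
        raise ValueError("I - rho*A is not strictly diagonally dominant")
    return invert(m)


if __name__ == "__main__":
    # Hypothetical adjacency: each player puts half his weight on himself.
    A = [[0.50, 0.25, 0.25],
         [0.25, 0.50, 0.25],
         [0.25, 0.25, 0.50]]
    for row in spatial_weights(A):
        print([round(v, 4) for v in row])
```

For this toy matrix the resulting $W$ has entries of about 0.7 on the diagonal and about -0.1 elsewhere. A row-normalized matrix with a zero diagonal would only meet the dominance condition with equality, which is why the sketch rejects it rather than attempting the inverse.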
To get a sense of how the algorithm operates, consider the following: If the factors were known, then it would be possible to consistently estimate the factor loadings by a weighted least squares (WLS) regression of the form

\[ \hat{\Lambda} = (F'W'WF)^{-1}(F'W'Y). \]

Similarly, if the factor loadings were known, the factors would be consistently estimated by

\[ \hat{F} = (W^{-1}Y\Lambda')(\Lambda'\Lambda)^{-1}. \]

Given an unrestricted initial estimate of $\Lambda$ or $F$, and depending on the choice of scale normalization, the EM algorithm iterates between these two WLS regressions until the sum of squared errors for equation 1 is minimized, imposing the factor loading restrictions at each iteration. While the approximate factor structure we assume here is necessary for the EM algorithm to run, we can still use it to obtain the exact factor structure of our model by setting a convergence criterion which brings the sum of squared errors arbitrarily close to zero for a given number of common factors. This is achieved quite easily with our two-factor model using a criterion which stops the algorithm when successive differences in the sum of squared errors are less than 1e-6.
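The alternating structure of the iteration can be sketched in a few lines of Python. This is a simplified stand-in, not the authors' EM routine: it omits the scale, sign, and unit loading restrictions, and it runs both least squares updates on the transformed data $Z = W^{-1}Y = (I - \rho A)Y$ rather than regressing $Y$ on $WF$ in the loadings step, so that each update reduces the same sum of squared errors. The code stores $\Lambda$ as $K \times N$, as in equation 1, so the Gram matrix it inverts in the factor step is the $K \times K$ product $\Lambda\Lambda'$. All function names and the starting guess are assumptions for illustration.

```python
# Illustrative sketch only: a plain alternating least squares loop under the
# simplifying assumptions stated in the text above.

def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def invert(m):
    """Invert a square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(m)
    aug = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def spatial_filter(y, a, rho=-1.0):
    """Return Z = (I - rho*A) Y, which equals W^{-1} Y without forming W."""
    ay = matmul(a, y)
    return [[yv - rho * av for yv, av in zip(y_row, ay_row)]
            for y_row, ay_row in zip(y, ay)]


def sum_squared_errors(z, f, lam):
    fit = matmul(f, lam)
    return sum((zv - fv) ** 2
               for z_row, f_row in zip(z, fit) for zv, fv in zip(z_row, f_row))


def als_factors(z, k, tol=1e-6, max_iter=500):
    """Alternate least squares updates of loadings (K x N) and factors (ST x K)."""
    f = [row[:k] for row in z]  # crude starting guess: first k columns of Z
    previous = float("inf")
    for _ in range(max_iter):
        ft = transpose(f)
        lam = matmul(invert(matmul(ft, f)), matmul(ft, z))    # loadings given F
        lt = transpose(lam)
        f = matmul(matmul(z, lt), invert(matmul(lam, lt)))    # factors given Lambda
        current = sum_squared_errors(z, f, lam)
        if abs(previous - current) < tol:  # stop on small successive differences
            break
        previous = current
    return f, lam, current
```

The stopping rule mirrors the criterion described above, halting once successive differences in the sum of squared errors fall below the tolerance. The starting guess assumes the first K columns of Z are not collinear; a different initialization would be needed otherwise.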