In Search of David Ross - MIT Sloan Sports Analytics Conference

In Search of David Ross
Scott A. Brave, R. Andrew Butters, and Kevin Roberts
1. Introduction
Chemistry, intangibles, and a whole that is greater than the sum of its parts: These are the
euphemisms that often get thrown around in locker rooms and the sports media in an effort to
rationalize how a team made up of seemingly inferior players manages to outperform another that
on paper looks unbeatable. While these David vs. Goliath analogies are plentiful, little consensus
exists on the proper way to attribute a team’s performance to its chemistry. Here, we set out on a
journey to accomplish just that. While we are certainly not the first to go in search of this “holy
grail” of sports analytics, we take a novel approach, drawing on spatial and network statistics to
offer a new lens for viewing what it means for a team or player to exhibit good chemistry.¹
Major League Baseball (MLB) represents an intriguing opportunity for such an analysis given the
level of sophistication that has been developed in measuring the impact of individual performances
on team outcomes. Furthermore, as fans of the 2016 World Champion Chicago Cubs, a personal
motivation for this choice exists as well: a “search for David Ross.” David Ross is the epitome of
where advanced metrics and player intangibles are at odds. As a back-up catcher, David Ross’
individual performances define him as nothing more than a serviceable role player; but as a
teammate, David Ross is routinely characterized as someone who makes everyone around him
better. Our aim is to quantify the “David Ross Effect,” or the indirect impact that an individual
player can have on team wins through making their teammates better.
We begin our analysis by using FanGraphs’ wins-above-replacement metric, fWAR, to construct
MLB player productivity residuals for the 1998-2015 seasons. These residuals reflect the difference
between the expected and actual number of team wins that can be attributed to each player in a
given season. When aggregated across teammates, by construction they measure the difference
between a team’s actual win count and what it would be expected to be based solely on individual
player performances. This feature allows us to analyze the element of team performance that could
instead be due to interactions between the players on a team. Our analysis suggests that the scope
for this explanation of the win-loss ledger of MLB teams could be quite large, with a range of as
much as 40 wins, or roughly 20 percent of the variation in wins across teams.
To account for player interactions, we use a spatial factor model to decompose our individual
player productivity residuals into two separate unobserved components. The first component
identifies what we call character players, or those players who positively influence their teammates
regardless of the team that they play for; while the second component accounts for the role that a
team’s field and front office staff have on team performance to isolate what we call team players.
¹ See for instance SyncStrength (2016), Kelly (2016), Levine (2015), Phillips (2014), and Carleton (2013).
2017 Research Papers Competition
Presented by:
This second component also makes it possible to capture a team’s historical ability to consistently
turn individual player talents into extraordinary team outcomes, allowing for a relative ranking of
MLB teams that can be used to measure front office performance on the dimension of team
chemistry, or what we refer to as organizational culture.
Our methodology has a natural extension to network statistics that then allows us to construct
refinements of fWAR that isolate a player’s own contribution to team wins irrespective of his
teammates, fWAR⁻, and his contribution adjusted for his effect on his teammates through our two
team chemistry factors, fWAR⁺. Using fWAR⁻ to adjust for player interactions, we demonstrate that
roughly 50% of the discrepancy between the sum of a team’s players’ fWAR and team wins can
indeed be explained by our definition of team chemistry. Similarly, using fWAR⁺, we show that
fWAR tends to overvalue the relative contribution of low impact players and undervalue the relative
contributions of high impact players to their team’s performance.
We refer to the total network effect of a team’s players on each other, obtained by summing the
differences between fWAR⁻ and fWAR, as tcWAR, or team chemistry WAR. With this new metric, we
document that high winning percentage teams do in fact tend to exhibit good team chemistry. That
said, not all good teams exhibit good chemistry, and not all bad teams exhibit bad chemistry.
Relating tcWAR to a team’s wins-above-average, we show that there exists considerable variation
on this dimension, and identify teams for which our team chemistry factors played either a
surprisingly large positive or negative role in its performance.
A player’s net impact on his team’s performance through his teammates, i.e. fWAR⁺ − fWAR, is
then what we refer to as pcWAR, or player chemistry WAR. By constructing age-position profiles for
pcWAR conditional on player and team characteristics, we show that the conventional wisdom that
good players and older players make for good teammates has support empirically. However, the
latter tends to vary by position, with designated hitters, relief pitchers, first basemen, and catchers
making positive contributions to team chemistry at younger ages on average than other players.
Players who play more than one position also tend to have higher pcWAR values on average.
Using our conditional age-position profiles, we then classify players based on their “intangibles,”
defined by whether or not they exceed or fall short of their conditional age-position profile, and
rank them on this dimension and their talent level. It is here where our journey comes full circle.
Looking at David Ross’ intangibles reveals a player who not only consistently outperformed his
conditional age-position profile for much of his career even at low levels of fWAR⁻, but did so at a
position that tends to support team chemistry more generally for older players.
2. Measuring Team Chemistry
The first step in our analysis of team chemistry is to construct individual player productivity
residuals capturing the difference between the expected number of team wins arising from a
player’s performance and the number of games that player’s team actually won.² To measure a
player’s individual performance, we make use of FanGraphs’ wins-above-replacement metric,
fWAR, an advanced sabermetric that captures how many total wins a player contributes to his team
² Details on the data and their sources can be found in the Appendix.
above a replacement level player at the same position (FanGraphs, 2016a). With these measures in
hand, we then move to modeling the interactions between teammates and the indirect effect they
may have on team performance.
2.1. fWAR and Team Wins
The strength of fWAR is its convenience. It compresses all of the things that an individual baseball
player can do to help his team win, both at the plate and in the field, into one number. fWAR is not
perfect, however, and many have disagreed as to its value in judging the relative performance of
players (e.g. Passan (2014), Keller (2014a)). Another shortcoming of fWAR, and the focus of our
analysis, is the lack of a role for interactions among players to impact team performance. We show
in this paper that this tends to manifest itself in the fact that simply summing the fWAR values for a
team across its players does not perfectly replicate its wins above those expected of a team
composed entirely of replacement-level players.
To get a sense of exactly how important player interactions may be to team performance, we
regressed the number of wins for each team on the sum total of its players’ fWAR. Specifically, we
ran a linear regression of the form³
$$W_{nt} = \alpha + \beta\, fWAR_{nt} + \varepsilon_{nt},$$
where $W_{nt}$ is the number of wins of team n in season t and $fWAR_{nt}$ is the sum total of FanGraphs’
wins-above-replacement statistics for all players on team n in season t. The $\varepsilon_{nt}$ in this regression are
what we call team productivity residuals. A team with a large and positive $\varepsilon_{nt}$ is one that outperformed, i.e. won more games than could be attributed to the sum of its individual player
performances. Conversely, a team with a large negative residual is one that, despite
having many strong individual performances (as measured by fWAR), under-performed
in terms of team wins.
The results from this regression using MLB team data from the 1998-2015 seasons provide several
insights. First, the estimate of β is very close to 1.⁴ This is intuitive given how
fWAR is constructed (FanGraphs, 2016a), but it also allows us to treat an increase in
a team’s fWAR as having a one-to-one relationship with its number of wins.
Furthermore, the estimate for α comes out to be near 48. This estimate also has a natural
interpretation of being the number of wins one would expect a team full of replacement level
players to accrue. At 48, clearly a team with only replacement level players is far from an average,
or 0.500 winning percentage, baseball team. With that being said, it is consistent with the
construction of fWAR; and, thus, serves as a benchmark for us to evaluate teams.
Re-arranging the regression equation and substituting in our estimates of α and β, team
productivity residuals are then given by
$$\hat{\varepsilon}_{nt} = W_{nt} - 48 - fWAR_{nt}.$$
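The regression and the resulting residuals can be illustrated with a minimal sketch. The four team-seasons below are made-up numbers, not data from our sample, and the helper name `ols_fit` is hypothetical; the only values taken from the text are the rounded estimates α ≈ 48 and β ≈ 1.

```python
# Hypothetical team-season data (illustrative only, not from the paper's sample).
team_fwar = [20.0, 30.0, 40.0, 50.0]   # summed player fWAR per team-season
wins = [70.0, 76.0, 90.0, 96.0]        # actual team wins

def ols_fit(x, y):
    """Least-squares fit of y = alpha + beta * x with a single regressor."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    beta = sxy / sxx
    alpha = y_bar - beta * x_bar
    return alpha, beta

alpha, beta = ols_fit(team_fwar, wins)

# Residuals from the fitted line on the toy data.
fitted_resid = [w - alpha - beta * f for f, w in zip(team_fwar, wins)]

# Team productivity residuals using the rounded estimates from the text
# (alpha = 48, beta = 1): wins minus replacement-level wins minus team fWAR.
REPLACEMENT_WINS = 48.0
calibrated_resid = [w - REPLACEMENT_WINS - f for f, w in zip(team_fwar, wins)]
```

On real data the two sets of residuals would be close whenever the fitted coefficients are near 48 and 1; on this toy sample they differ because the fitted line is not constrained to those values.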
³ Keller (2014a, b) conducted a similar analysis in his defense of fWAR.
⁴ In fact, the null hypothesis of β = 1 cannot be rejected at any standard significance level.
The $\hat{\varepsilon}_{nt}$ are our estimate of the element of team performance that is unexplained by the sum of its
players’ individual performances, and the variation that we may potentially attribute to a team’s
chemistry. Based on the R² value of the previous regression, this amounts to about 20% of the
variation in team wins in our sample.
Figure 1 further demonstrates just how important this element is by plotting a kernel density
function of $\hat{\varepsilon}_{nt}$. With a standard deviation of 5 wins and a range equal to approximately 40 wins, it
is evident that a considerable portion of the variability in team performance cannot be explained by
the sum of individual player performances alone. Despite the immense progress sabermetricians
have made in the sport of baseball, there exists significant room for the role of interactions among
players to factor into the variation in team performances.
[Kernel density plot: Density (y-axis) against Wins Above Team WAR (x-axis, roughly −20 to 20), with one curve based on fWAR and one based on fWAR-. kernel = gaussian, bandwidth = 1.2900]
Figure 1: Kernel Densities of Team Productivity Residuals
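For readers unfamiliar with the estimator behind Figure 1, the following is a minimal sketch of a Gaussian kernel density estimate. The residual values are hypothetical, the function name `gaussian_kde` is our own, and the bandwidth simply reuses the value reported in the figure note.

```python
import math

def gaussian_kde(samples, bandwidth):
    """Return a function x -> estimated density using a Gaussian kernel."""
    n = len(samples)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        # Average of Gaussian bumps centered on each sample point.
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density

# Hypothetical team productivity residuals (wins above team WAR).
residuals = [2.0, -2.0, 2.0, -2.0, 5.5, -7.0, 0.5]
density = gaussian_kde(residuals, bandwidth=1.29)

# Evaluate on a grid from -20 to 20 and check that the estimate integrates to ~1.
step = 0.1
grid = [-20.0 + step * k for k in range(401)]
values = [density(x) for x in grid]
area = sum(values) * step
```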
2.2. Team Wins and Player Interactions
Given the seemingly large role empirically that team chemistry may have on wins and losses, we
next focus on decomposing these team residuals into player-specific productivity residuals. To
decompose team productivity residuals into contributions from individual players, we assume
$$\hat{\varepsilon}_{nt} = \sum_i \hat{\varepsilon}_{int} = \sum_i \hat{W}_{int} - \sum_i fWAR_{int},$$
where $\hat{W}_{int}$ is a measure of the expected contribution of player i to team wins taking into account
his position and amount of playing time such that $\sum_i \hat{W}_{int} = W_{nt} - 48$. Player position weights are
defined following FanGraphs’ methodology (FanGraphs, 2016a) and appearance weights are
derived from at-bats and defensive outs for position players and outs recorded for pitchers after
adjusting for the relative importance of defensive positions and starting versus relief pitchers.⁵
When aggregated across players on a given team in a given season, our player productivity
residuals by construction measure the difference between a team’s actual win count and what it
would be expected to be based on the sum total of individual player performances. We then model
the interactions between players as a spatial autoregression (SAR),
$$\hat{\varepsilon}_{int} = \rho A \hat{\varepsilon}_{int} + \nu_{int},$$
where A is an adjacency matrix identifying teammates in a given season.⁶ Typically, an adjacency
matrix is a symmetric matrix with 0’s on the diagonal and 1’s off the diagonal “connecting”
teammates. However, in order to capture potential dynamics in teammate relationships, we replace
the 1’s with the number of MLB teams that teammates have played together on through the end of
each season. This allows for added weight to repeated “connections” in the SAR in explaining
player performance interactions and takes into account the panel data nature of our dataset.
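A minimal sketch of the weighted adjacency matrix described above follows. The rosters, team codes, and player labels are placeholders, and the helper names are hypothetical; we also assume here that "number of teams played together on" means the count of distinct teams on which two players shared a roster up to and including the season in question.

```python
from itertools import combinations

# Hypothetical rosters keyed by (team, season); names are placeholders.
rosters = {
    ("AAA", 2014): {"p1", "p2", "p3"},
    ("BBB", 2015): {"p1", "p2", "p4"},
    ("AAA", 2015): {"p3", "p5"},
}

def shared_team_count(rosters, a, b, through_season):
    """Distinct teams on which a and b were teammates up to through_season."""
    teams = {
        team
        for (team, season), players in rosters.items()
        if season <= through_season and a in players and b in players
    }
    return len(teams)

def weighted_adjacency(rosters, team, season):
    """Symmetric matrix with 0's on the diagonal and shared-team counts off it."""
    players = sorted(rosters[(team, season)])
    index = {p: k for k, p in enumerate(players)}
    n = len(players)
    A = [[0] * n for _ in range(n)]
    for a, b in combinations(players, 2):
        w = shared_team_count(rosters, a, b, season)
        A[index[a]][index[b]] = w
        A[index[b]][index[a]] = w
    return players, A

players, A = weighted_adjacency(rosters, "BBB", 2015)
```

In this toy example p1 and p2 have been teammates on two different teams by the end of 2015, so their connection receives a weight of 2 rather than 1.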
Furthermore, we assume that a common factor structure exists for the SAR residuals, $\nu_{int}$, such that
player productivity residuals are driven by a player-season specific component ($f_{it}$), as well as a
team specific component ($\lambda_n$). The team specific component is constant over time and primarily
reflects an organization’s tendency to over- or under-perform relative to the collection of its
players’ fWARs. The player-season specific component instead traces out a player’s career arc,
potentially across several teams, and reflects whether that player finds himself among over- or
under-performing teammates in each season.
Solving for $\hat{\varepsilon}_{int}$ then yields our spatial factor model with the spatial weight matrix $W = (I - \rho A)^{-1}$:
$$\hat{\varepsilon}_{int} = (I - \rho A)^{-1}(f_{it}\lambda_n) = W F \Lambda.$$
This model can be consistently estimated using spatial principal component analysis (SPCA) to
extract the latent player-season and team specific components by imposing scale normalizations on
either F or the factor loadings λ as well as ρ and A.⁷ In the next section, we provide motivation for
what these factors may capture.
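To make the spatial weight matrix concrete, the sketch below computes $(I - \rho A)^{-1}$ for a two-player toy network and verifies that the implied residuals solve the SAR equation. The value of ρ, the matrix A, and the SAR residuals are all hypothetical; in the paper ρ is estimated subject to the normalizations discussed in the appendix, and the small Gauss-Jordan routine stands in for whatever linear algebra library one would normally use.

```python
def invert(M):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(M)
    aug = [
        [float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
        for i, row in enumerate(M)
    ]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular to working precision")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def spatial_weights(A, rho):
    """Return the spatial weight matrix (I - rho * A)^(-1)."""
    n = len(A)
    system = [
        [(1.0 if i == j else 0.0) - rho * A[i][j] for j in range(n)]
        for i in range(n)
    ]
    return invert(system)

A = [[0, 1], [1, 0]]   # two teammates, hypothetical
rho = -0.5             # hypothetical value, chosen so off-diagonal weights are negative
W_spatial = spatial_weights(A, rho)

nu = [1.0, 2.0]        # hypothetical SAR residuals
eps = [sum(W_spatial[i][j] * nu[j] for j in range(2)) for i in range(2)]

# Check that eps solves eps = rho * A * eps + nu.
sar_rhs = [rho * sum(A[i][j] * eps[j] for j in range(2)) + nu[i] for i in range(2)]
```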
⁵ Further details on the construction of $\hat{W}_{int}$ can be found in the Appendix.
⁶ For more information on spatial autoregressions, see Conley (2008).
⁷ Section (6.2.2) in the appendix provides a more detailed discussion of the required normalizations. For more information on spatial principal components analysis, see Demsar et al. (2012).
3. The Network Effects of Team Chemistry
Our spatial factor model fits the definition of a “network.” The players on a team in a given season
make up the “nodes” of the network, with the strength of the connections between teammates
summarized by our factors and their loadings. In other words, our model is simply a statistical
framework for measuring the importance of correlations across player performances. In this
section, we refine fWAR in order to take into account the correlations in the performance of
teammates; and, at the same time, construct new measures of team and player chemistry.⁸
3.1. Sources of Team Chemistry
The primary difficulty that others have faced when trying to measure team chemistry has been
their focus on identifying a priori the factors that drive the correlations between the performances
of teammates. Our approach is different in that we treat these factors as latent variables and
identify them off the correlations themselves. We view this as being consistent with the
conventional wisdom that team chemistry is anything that makes teams better than they otherwise
would be as individuals. Seen in this light, our methodology for measuring team chemistry boils
down to nothing more than a decomposition of the spatial correlation matrix of teammates’
productivity residuals into an exact linear combination of latent factors.
To see this, consider that we can decompose our player productivity residuals into two parts: 1) a
part that is unique to each player that we attribute to measurement error in team productivity
residuals, and 2) a part that can be explained by each player’s interactions with his teammates that
we attribute to team chemistry.
( f itλn) +
εˆint = 
w
ii 
"Own Contribution"
∑w ( f
λ
)
.
ij
jt n
j ≠i

"Teammate Contribution"
We associate positive spill-overs with “good team chemistry” and negative spill-overs with “bad
team chemistry.” This is because, given that $w_{ij} < 0$ as constructed, a player will exhibit positive
spill-overs to his teammates’ productivity residuals as long as $f_{it}\lambda_n < 0$. Conversely, a player with
$f_{it}\lambda_n > 0$ will necessarily exhibit negative spill-overs.
We do not take a stance on what drives these spill-overs between teammates; and, in all likelihood,
our latent factors probably capture a combination of many of the determinants of team chemistry
that others have already explored. However, by not restricting them ex-ante, they likely also
embody elements of team chemistry that have not previously been able to be measured. The extent
to which we provide context for our factors is thus to appeal to the work of other social scientists
who have singled out certain psychological traits, such as “character” and being a “team player,” as
being attributes of individuals in groups that excel in working together. By allowing for two
common factors and restricting the factor loadings across them such that F = [ch,tp] and Λ = [l,λ],
where l is a unit vector across teams, we can restrict our factor model to embody similar features.
$$\hat{\varepsilon}_{int} = w_{ii}\,(ch_{it} l_n + tp_{it}\lambda_n) + \sum_{j \neq i} w_{ij}\,(ch_{jt} l_n + tp_{jt}\lambda_n)$$
⁸ For a comprehensive treatment of the network literature, see Jackson (2008) and the citations within.
We think of players with negative ch values as being good character players, as they demonstrate
positive spill-overs to their teammates which do not depend on the identity of their team. In
contrast, we label players with negative tp values as being good team players, because their
contribution to their teammates through tp depends on the team for which they play via λ.
Teams with large λ are then said to exhibit good organizational culture, as they either reinforce
positive spill-overs (tp < 0 & λ > 0) or minimize negative spill-overs (tp > 0 & λ < 0).
Figure 2 plots estimated values of λ for all 30 MLB teams. Certain organizations stand out along this
dimension. For instance, the St. Louis Cardinals, San Francisco Giants, and Boston Red Sox
demonstrate very large negative values of λ, suggesting that historically these teams have
constructed their rosters in such a way as to reinforce the positive spill-overs from good chemistry
players. In contrast, teams like the Oakland Athletics, Cincinnati Reds, and Miami Marlins appear
to have instead minimized the negative spill-overs from bad chemistry players.
[Chart of estimated λ by team; horizontal axis: Organizational Culture (roughly −0.3 to 0.2). Team labels shown: STL, SF, BOS, ARI, PHI, SEA, HOU, ATL, CLE, COL, LAD, LAA, TEX, WAS, NYY, TOR, CHW, DET, TB, KC, CHC, PIT, BAL, MIL, NYM, MIN, SD, MIA, CIN, OAK. Negative values denote teams that reinforce positive spill-overs from good chemistry players. Positive values denote teams that minimize negative spill-overs from bad chemistry players.]
Figure 2: MLB Team Chemistry Factor Loadings
3.2. Adjusting fWAR for Team Chemistry
If fWAR measurements are indeed correlated across teammates, then the regression underlying our
team productivity residuals is mis-measured. Namely, fWAR may be under- or over-counting the
importance of individual player contributions to team wins by ignoring the interactions between
teammates. To adjust for this possible source of bias, we construct an alternative measure called
fWAR⁻, which subtracts from the fWAR of each player the portion of his productivity residual that
can be explained by his teammates’ residuals. In network statistics, this is often referred to as the
“in-degree” for a node.
$$fWAR^{-}_{int} = fWAR_{int} - \underbrace{\sum_{j \neq i} w_{ij} f_{jt}\lambda_n}_{\text{``In-degree''}}$$
Similarly, we can refine fWAR as a measure of player performance by taking into account how much
a player affects his teammates’ performance. Here, we add to fWAR⁻ the contribution of each
player to all of his teammates’ productivity residuals, or what is referred to in network statistics as
the “out-degree” of a node. We call this measure fWAR⁺.
$$fWAR^{+}_{int} = fWAR^{-}_{int} + \underbrace{\sum_{j \neq i} w_{ji} f_{it}\lambda_n}_{\text{``Out-degree''}}$$
Figure 1 demonstrates the relative importance of adjusting fWAR for correlated teammate
performances by also plotting the kernel density of $\hat{\varepsilon}_{nt}$ constructed from fWAR⁻. The range of
unexplained team performance shrinks by roughly 50%, with the majority of the reduction coming
from under-performing teams. This would seem to suggest that “poor clubhouse chemistry” may
indeed explain why teams perform poorly more so than “superior clubhouse chemistry”
explains why teams perform well. We can get a sense of the impact that this adjustment has on
the productivity residual for any individual team by aggregating the differences
between fWAR⁻ and fWAR over its players in each season. This is often referred to as the network’s
“total-degree.” We call it “team chemistry wins-above-replacement,” or tcWAR.
$$tcWAR_{nt} = \underbrace{\sum_i \sum_{j \neq i} w_{ij} f_{jt}\lambda_n}_{\text{``Total-degree''}}$$
Figure 3 scatters a team’s wins in each season above an average team (i.e. roughly 81 wins) against
its tcWAR. Clearly, the old adage that good teams have good chemistry is affirmed in this figure,
though the positive correlation is not as one-for-one as is sometimes argued. This can be seen in the
considerable distance for some teams from the 45 degree line in the figure.
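The three network adjustments defined in this subsection (in-degree, out-degree, and total-degree) can be sketched on a toy three-player team. The weight matrix, chemistry scores (standing in for $f_{jt}\lambda_n$), and fWAR values below are invented for illustration, and the function name `network_degrees` is hypothetical.

```python
def network_degrees(W, scores):
    """Return (in_degree, out_degree) per player.

    W[i][j] is the spatial weight linking player j to player i, and
    scores[j] stands in for f_jt * lambda_n.
    """
    n = len(scores)
    in_degree = [
        sum(W[i][j] * scores[j] for j in range(n) if j != i) for i in range(n)
    ]
    out_degree = [
        sum(W[j][i] * scores[i] for j in range(n) if j != i) for i in range(n)
    ]
    return in_degree, out_degree

# Hypothetical inputs: negative off-diagonal weights, as in the text.
W = [
    [1.2, -0.3, -0.1],
    [-0.3, 1.1, -0.2],
    [-0.1, -0.2, 1.3],
]
scores = [-1.0, 0.5, 2.0]   # player 0 has a negative score: positive spill-overs
fwar = [0.5, 3.0, 5.0]

in_degree, out_degree = network_degrees(W, scores)

# fWAR-minus removes what teammates contribute to a player's residual.
fwar_minus = [f - d for f, d in zip(fwar, in_degree)]
# fWAR-plus adds back what the player contributes to his teammates.
fwar_plus = [fm + d for fm, d in zip(fwar_minus, out_degree)]
# Total-degree as in the displayed tcWAR formula: the double sum of in-degrees.
tc_war = sum(in_degree)
```

In this toy example the in-degrees and out-degrees sum to the same team total, since both add up the same off-diagonal terms, so the team sum of fWAR⁺ equals the team sum of fWAR.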
[Scatter plot: Wins Above Average (y-axis, roughly −40 to 40) against tcWAR (x-axis, roughly −20 to 20). Solid red line is a 45 degree line. Labeled team-seasons: 2001 Mariners, 1998 Yankees, 2004 Yankees, 2008 Angels, 1998 Padres, 2011 Tigers, 2011 Yankees, 2006 Athletics, 2007 D-backs, 2012 Orioles, 2011 Red Sox, 2009 Rays, 2003 Royals, 1999 Orioles, 1998 Orioles, 1998 Mariners, 2008 Braves, 1999 Rockies, 2002 Cubs, 1998 Tigers, 2015 Reds, 1999 Royals, 2008 Padres, 2003 Tigers.]
Figure 3: Team Chemistry and Wins-above-Average
The figure also marks some of the best seasons for teams on both ends of the chemistry spectrum
as well as a few other outlying values. Interestingly, teams with record-high win totals, like the 1998 Yankees
and 2001 Mariners, and record-high loss totals, like the 2003 Tigers, do not come across as particularly
superior or inferior chemistry teams according to our metric. In fact, the figure makes clear that not
all good teams display good chemistry and not all bad teams display bad chemistry on the basis of
our metric.
Figure 4 scatters fWAR⁻ versus fWAR. Interestingly, fWAR⁻ and fWAR on an individual player-season basis are very highly correlated, with the plotted points clustered fairly closely around the
45 degree line. Thus, it is the aggregation of somewhat small differences at the player level that
leads to the drastic reduction in the unexplained variance of team performance in Figure 1. Figure 4
also contains a scatter plot of fWAR⁺ vs. fWAR. Here, the differences are much more pronounced.
In particular, fWAR overestimates the relative performance of low impact (fWAR ≤ 1) and
underestimates the relative performance of high impact (fWAR ≥ 4) players.
[Two scatter-plot panels: fWAR+ vs. fWAR and fWAR- vs. fWAR, each with fWAR on the x-axis. Solid red lines are 45 degree lines. Vertical lines denote thresholds for Scrub/Role (fWAR=1) and Good/Star (fWAR=4) players.]
Figure 4: fWAR⁻ and fWAR⁺ vs. fWAR
The difference between fWAR⁺ and fWAR can therefore be used to evaluate players on the basis of
their contribution to team performance through their impact on their teammates. In network
statistics, this is what is called the “net-degree” for each node.
$$pcWAR_{int} = \underbrace{\sum_{j \neq i} w_{ji} f_{it}\lambda_n - \sum_{j \neq i} w_{ij} f_{jt}\lambda_n}_{\text{``Net-degree''}}$$
In keeping with our terminology above, we instead refer to it as “player chemistry wins-above-replacement,” or pcWAR. The conventional wisdom that good players make their teammates better
is confirmed by our analysis of pcWAR, as Figure 5 demonstrates that a strong positive correlation exists
between pcWAR and fWAR⁻ for all player-season combinations in our sample.⁹ In the next section,
we take a closer look at the characteristics of good team chemistry players.
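For completeness, the net-degree expression for pcWAR follows from the definitions of fWAR⁻ and fWAR⁺ in Section 3.2 by substitution. The short derivation below uses only the symbols already introduced there:

```latex
\begin{align*}
pcWAR_{int} &= fWAR^{+}_{int} - fWAR_{int} \\
  &= \Big( fWAR^{-}_{int} + \sum_{j \neq i} w_{ji} f_{it}\lambda_n \Big) - fWAR_{int} \\
  &= \Big( fWAR_{int} - \sum_{j \neq i} w_{ij} f_{jt}\lambda_n
       + \sum_{j \neq i} w_{ji} f_{it}\lambda_n \Big) - fWAR_{int} \\
  &= \sum_{j \neq i} w_{ji} f_{it}\lambda_n - \sum_{j \neq i} w_{ij} f_{jt}\lambda_n .
\end{align*}
```

That is, a player's pcWAR is his out-degree minus his in-degree: what he adds to his teammates' residuals less what they add to his.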
[Scatter plot: pcWAR (y-axis, roughly −2 to 6) against fWAR- (x-axis, roughly −5 to 15). Vertical lines denote fWAR thresholds for Scrub/Role (fWAR=1) and Good/Star (fWAR=4) players.]
Figure 5: Player Chemistry and Wins-above-Replacement
4. The Intangibles of Team Chemistry
In this section, we construct age-position profiles of pcWAR controlling for fWAR⁻ and various other
player and team characteristics in order to examine a player’s team chemistry “intangibles.” We
then use these conditional age-position profiles to classify players along this dimension.
4.1. Age-Position Profiles
To construct conditional age-position profiles of players, we run the following regression including
up to quartic interaction terms in age,
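$$\begin{aligned}
pcWAR_{it} = {} & \sum_p \gamma_p\, pos_{pit} + \sum_p \theta_p\,(pos_{pit} \cdot age_{it}) + \sum_p \psi_p\,(pos_{pit} \cdot age_{it}^{2}) \\
& + \sum_p \tau_p\,(pos_{pit} \cdot age_{it}^{3}) + \sum_p \omega_p\,(pos_{pit} \cdot age_{it}^{4}) \\
& + \sum_k \delta_k X_{kit} + \sum_h \varphi_h Z_{hit} + \xi_{it},
\end{aligned}$$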
⁹ Consistent with Figure 5, a very similar correlation exists between pcWAR and fWAR as well. However, we choose to display our results in this way such that if one were to sum across the x-axis and y-axis of the graph, fWAR⁺ would be obtained.
where pos is an indicator variable for a player’s primary field position including the designated
hitter, age is a player’s age, X is a vector of player characteristics including fWAR⁻ and controls for
MLB experience and team tenure, batting and throwing hands, and multiple positions played, and Z
is a vector of team characteristics including both team and manager indicator variables.
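As a rough illustration of the position-by-age terms in this regression, the sketch below builds the quartic age-position regressors for a single player-season. The position codes and feature names are our own labels (the eleven positions follow the panels of Figure 6), and the controls in X and Z are omitted; this is only a sketch of how such a design row could be assembled, not the estimation code itself.

```python
# Position labels are hypothetical shorthand for the eleven positions in Figure 6.
POSITIONS = ["C", "1B", "2B", "3B", "SS", "LF", "CF", "RF", "DH", "SP", "RP"]
MAX_POWER = 4  # up to quartic terms in age

def age_position_features(position, age):
    """Return the position indicators interacted with age^0 .. age^4."""
    if position not in POSITIONS:
        raise ValueError(f"unknown position: {position}")
    features = {}
    for p in POSITIONS:
        indicator = 1.0 if p == position else 0.0
        for k in range(MAX_POWER + 1):
            # k = 0 gives the position intercept (gamma_p); k = 1..4 give the
            # theta, psi, tau, and omega terms respectively.
            features[f"{p}_age{k}"] = indicator * age ** k
    return features

row = age_position_features("C", 30)
nonzero = {name: value for name, value in row.items() if value != 0.0}
```

For a 30-year-old catcher only the five catcher terms are non-zero, so each position traces out its own fourth-order polynomial in age.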
By conditioning these profiles on so many observable dimensions, our goal is to isolate the player
intangibles of team chemistry that fall beyond alternative explanations. In other words, we want to
be able to measure the individual contributions to team wins that do not depend on a player’s team
or manager as well as his talent level, experience, etc. Furthermore, the estimated coefficients of the
above regression demonstrate that many of these factors are indeed important elements of team
chemistry. For instance, one additional win-above-replacement, adjusted for a player’s interactions
with his teammates, or playing multiple positions increases his pcWAR by a statistically significant
0.33 and 0.07 wins, respectively.
[Panel plots of average marginal effects with 95% CIs: pcWAR (y-axis, roughly −0.4 to 0.4) against Age (x-axis, 21 to 41), one panel per position: First Basemen, Second Basemen, Catcher, Third Basemen, Center Fielder, DH, Left Fielder, Right Fielder, Relief Pitcher, Starting Pitcher, Shortstop. Conditional on fWAR-, league and team experience, batting and throwing hand, multiple positions played, and manager and team indicators.]
Figure 6: Age-Position Team Chemistry Profiles
Figure 6 plots our conditional age-position pcWAR profiles with 95% confidence intervals. These
plots demonstrate the conditional mean pcWAR for each position by age, such that transitions from
negative to positive values over time denote the average age when the switch occurs from being a
“bad intangibles” to a “good intangibles” player. The conventional wisdom that older players make
for better teammates is certainly consistent with these profiles, as they tend to slope upward with
age across all positions even after controlling for team and MLB experience. However, some
additional interesting patterns also emerge from this analysis. For instance, designated hitters,
relief pitchers, first basemen, and catchers achieve this transition earlier than others on average,
most of whom do not reach this point until their late-thirties or early-forties.¹⁰
4.2. Player Rankings
We use the residuals from our conditional age-position profile regressions to construct player
rankings for intangibles. Positive values for $\xi_{it}$ capture players whose intangible contributions to
team chemistry exceed their conditional age-position profile, whereas negative values correspond
to players whose intangible contributions fall short of their profile. We can then jointly classify
these players along the scale established by FanGraphs for fWAR applied to our fWAR⁻ statistic to
refine them into categories that reflect both their intangibles and talent level.
Figure 7 contains a scatter plot of $\xi_{it}$ versus $fWAR^{-}_{it}$ for all player-season combinations, with the
colored dots denoting the six types of players that we classify. Summing across the x-axis and y-axis of
the graph produces an estimate of our $fWAR^{+}_{it}$ metric that controls for MLB experience and team
tenure, batting and throwing hands, and multiple positions played as well as team characteristics
including both team and manager indicator variables. As such, it can be viewed as the combined
value of the player to team performance stemming from his own performance and his intangibles.
[Scatter plot: Intangibles (y-axis, roughly −2 to 2) against fWAR- (x-axis, roughly −4 to 14), with points colored by player type: Franchise Player, Glue Guy, Diamond-in-the-Rough, Prima Donna, Clubhouse Cancer, Trade Bait. Intangibles are the residuals from pcWAR regressed on fWAR-, league and team experience, age-position profile, batting and throwing hand, multiple positions played, and manager and team indicators. Vertical lines denote thresholds for Scrub/Role (fWAR=1) and Good/Star (fWAR=4) players.]
Figure 7: Intangibles and Wins-above-Replacement
A simple hypothetical helps to put our classification, or typology, of players into context. Imagine
two players with identical $fWAR^{-}_{it}$ (or, before this paper, $fWAR_{it}$). Now, imagine one of the players
¹⁰ We caution against taking the results from this regression as “causal” estimates of age on intangibles, as the estimated
coefficient is most likely also confounded by a selection effect for older players. In other words, having good intangibles may make it more
likely for a player to remain in the game for longer.
had a positive intangibles measure, while the other one had a negative measure. Given that these
intangible qualities only manifest themselves as spill-overs to the performance of teammates, before
having either player on your team both would seem equally qualified to sign. However, the player
with a positive intangibles measure, once joining your team, would likely generate positive spill-overs and you would probably begin to value this player even more highly. Alternatively, the player
with the negative intangibles measure, once joining your team, would likely not generate as many
positive spill-overs and your view of the value of this player would not change much. Of course, the
extent to which this example holds true also depends on the level of these players’ individual
performances on your team. Therefore, it is on this joint dimension that we characterize players.
We begin with the two player types that have very high, or “Star” ( fWAR ≥ 4 ), quality individual
performances. Among these “Star” players, our first type, the Franchise Player, are those players
with good intangibles. These are the players who make their teammates better. The active player
with at least one year of service time who best embodies this type based on his average intangibles
and fWAR⁻ scores is Joey Votto. Others with similar intangible scores are Giancarlo Stanton and
Mike Trout. Contrast these players with our first type of bad intangibles player, the Prima Donna,
who makes a positive contribution to his team through his own performance, but has less of an
impact on his teammates than his stature on the team would normally dictate. Here, we find a very
small group of players, but one that includes (in descending order of intangible scores) Max
Scherzer, Adrian Beltre, Clayton Kershaw, Jason Heyward, and Buster Posey. These are all players
that an MLB franchise would be happy to build their team around based on their individual talent;
but seem less likely to have those talents cascade through to their teammates.
$$\text{Franchise Player} \equiv \xi_{it} > 0, \quad fWAR^{-}_{it} \geq 4$$
$$\text{Prima Donna} \equiv \xi_{it} < 0, \quad fWAR^{-}_{it} \geq 4$$
In the middle are those players that can be classified as “Role/Solid Starter/Good” players, i.e.
1 < fWAR < 4. These players under a conventional sabermetric approach would be sought after to
fill out your roster around your “Star” players. Under our typology, a Glue Guy fits this bill under
fWAR but also is a player with good intangibles. Not only do teams seek these players out for their
individual performance; but once they get to the team, they also tend to positively impact their
teammates. The active player best embodying this type is Kevin Kiermaier, closely followed by
Chris Sale and the recently deceased Jose Fernandez. On the other side, a Trade Bait player, as the
name would suggest, is one who has the sought-after, or appealing, individual performance, but
who otherwise has less of a meaningful impact on his teammates. Here, we find a wide range of
players including, but not limited to, a top three of Gregory Polanco, Elvis Andrus, and Evan Gattis.
$$\text{Glue Guy} \equiv \xi_{it} > 0, \quad 1 < fWAR^{-}_{it} < 4$$
$$\text{Trade Bait} \equiv \xi_{it} < 0, \quad 1 < fWAR^{-}_{it} < 4$$
We affectionately term our third type of good intangibles player the Diamond-in-the-Rough. This
player is exemplified by an fWAR ≤ 1 . An example case here would be the “Scrub” whose
contribution to the performance of others is much greater than his own. Most often these are
journeymen players, typically relievers, who do not stick around long with their teams despite their
impact on their teammates. However, a few household names like Rich Hill and Welington Castillo
do appear on this list. Conversely, the Clubhouse Cancer contributes little to the team from his own
performance and tends to make his teammates worse off. Surprisingly, it is not all that difficult to
find fairly well-known examples of this type, for instance: Nick Castellanos, Jeremy Hellickson, Skip
Schumaker, Mitch Moreland, and James Loney. We suspect that this is because teams often favor
raw talent over intangibles, holding on to such players longer than they normally otherwise would
on the chance that their talent develops enough to justify their place on the team.
$$\text{Diamond-in-the-Rough} \equiv \xi_{it} > 0, \quad fWAR^{-}_{it} \leq 1$$
$$\text{Clubhouse Cancer} \equiv \xi_{it} < 0, \quad fWAR^{-}_{it} \leq 1$$
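The six definitions above can be collected into a single lookup. The following is a minimal sketch, not the authors' code: the function name `classify_player` and the choice to return `None` when ξ is exactly zero (the definitions use strict inequalities on ξ) are assumptions made for illustration.

```python
from typing import Optional

STAR_CUTOFF = 4.0   # fWAR- >= 4: "Star" players
SCRUB_CUTOFF = 1.0  # fWAR- <= 1: "Scrub" players


def classify_player(xi: float, fwar_minus: float) -> Optional[str]:
    """Return the typology label for one player, or None if xi == 0."""
    if xi == 0:
        return None  # the definitions only cover xi > 0 and xi < 0
    good_intangibles = xi > 0
    if fwar_minus >= STAR_CUTOFF:
        return "Franchise Player" if good_intangibles else "Prima Donna"
    if fwar_minus > SCRUB_CUTOFF:
        return "Glue Guy" if good_intangibles else "Trade Bait"
    return "Diamond-in-the-Rough" if good_intangibles else "Clubhouse Cancer"


# Hypothetical inputs, chosen only to exercise each branch.
print(classify_player(0.25, 0.5))   # Diamond-in-the-Rough
print(classify_player(-0.10, 4.0))  # Prima Donna
print(classify_player(0.10, 2.5))   # Glue Guy
```

The inputs would be a player's average ξ and fWAR⁻ values, as in the rankings discussed in the text.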
Returning to our original motivation, at this point we can also address where David Ross fits into
our typology. Figure 8 plots the ξ and fWAR⁻ values for all of David Ross’ seasons played through
2015. More than one labeled instance of a season occurs whenever he was traded mid-season. The
vast majority of David Ross’ playing career would characterize him as a “Glue Guy” or “Diamond-in-the-Rough,” consistent with his reputation among his teammates. While he does not fall in the
upper echelon for either category, his contributions to team chemistry relative to his conditional
age-position profile are not trivial, ranging from a high of about +0.25 wins in mid-career to a low of
about -0.30 wins in 2015.
[Figure 8: scatter plot of Intangibles (vertical axis) against fWAR- (horizontal axis), points labeled by season, 2002-2015]
Intangibles are the residuals from pcWAR regressed on fWAR-, league and team experience,
age-position profile, multiple positions played, and manager and team indicators.
Figure 8: David Ross’ Intangibles Profile
It will be interesting to see, once the data are fully available, how much of the Chicago Cubs’ league-leading
103 wins in 2016 can be attributed to the late-career resurgence David Ross experienced.
His subsequent retirement also poses a challenge for the Cubs if this was indeed the case. For
instance, on November 30, 2016, the Cubs signed a center fielder, Jon Jay. In discussing the signing,
the general manager of the Cubs said the following:
From a makeup and leadership standpoint, he’s got an off-the-charts reputation... We knew
that losing David Ross would be a big void for us, and bringing in a guy like Jon would be
important for us. He can come in and complement the good group of young leaders we
already have... We didn’t feel like there were that many guys who could come into a team
that just won a World Series and be able to fit that seamlessly and be able to help lead this
team. And I think he can, given his reputation and a lot of comments we’ve gotten from his
now-teammates indicate his reputation precedes him.
Jed Hoyer, Chicago Cubs GM (Gonzalez, 2016)
Figure 9 plots the ξ and fWAR⁻ values for all of Jon Jay’s seasons played through 2015.
Interestingly, his intangibles profile does not suggest that he has been an above-average team
chemistry player in his time in MLB, with the exception of the 2015 season where we would
characterize him as a “Diamond-in-the-Rough” based on his conditional age-position profile.
Perhaps the Cubs are ahead of the curve in recognizing 2015 as a turning point for Jon Jay, but the
balance of his career so far would suggest otherwise. Furthermore, we are unlikely to gain much
additional information from his 2016 season given that he was injured for most of it. Thus, the 2017
season may serve as the proving ground for the Cubs’ faith in his intangible qualities.
[Figure 9: scatter plot of Intangibles (vertical axis) against fWAR- (horizontal axis), points labeled by season, 2010-2015]
Intangibles are the residuals from pcWAR regressed on fWAR-, league and team experience,
age-position profile, multiple positions played, and manager and team indicators.
Figure 9: Jon Jay’s Intangibles Profile
5. Conclusion
In this paper, we outlined a methodology for quantifying how a player may influence his team’s
performance outside of his direct contribution measured by advanced individual metrics like wins-above-replacement. We introduced in the process fWAR⁻, fWAR⁺, tcWAR, and pcWAR as new
advanced metrics that quantify the indirect effects of players on their teammates and team
performance while providing an intuitive analog to FanGraphs’ well-documented fWAR metric.
With these new metrics, we then outlined the importance of accounting for player interactions in
explaining team performance differentials unexplained by fWAR, and identified MLB teams that
have effectively utilized these effects in their roster construction.
Our efforts were motivated by a “search for David Ross,” a back-up catcher known more for the
positive impact he has on his teammates than for his own performance. We showed that certain
types of players are more likely than others to serve in this role (e.g., those with high fWAR⁻ values,
who play multiple positions, and who are older), and that designated hitters, relievers, first basemen,
and catchers tend to contribute positively to team chemistry at an earlier age on average than other
players. We were then able to rank players on the basis of how they performed relative to their
conditional age-position team chemistry profiles, or intangibles. In doing so, we showed that David Ross’
intangibles profile aligns with his reputation.
It should also be noted that the team chemistry effects that we find for individual players are not
trivial. For instance, with a team win valued at roughly $6 million in MLB, a player with an fWAR⁻
value of 0 and a pcWAR value as low as 0.1 would still be worth paying the minimum salary.
Considering that for some of the best players we estimate pcWAR values of upwards of 4 wins, the
value of team chemistry to an MLB team can be just as high as what fWAR would currently assign to
a typical borderline “Star” player.
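The arithmetic behind these two claims can be made explicit. The sketch below uses the $6 million per win figure from the text; the minimum salary of $507,500 (the 2016 MLB minimum) is an assumption added here for comparison and does not appear in the paper, and the function name is hypothetical.

```python
WIN_VALUE_USD = 6_000_000  # from the text: roughly $6 million per team win
MIN_SALARY_USD = 507_500   # assumption: 2016 MLB minimum salary (not in the paper)


def chemistry_value_usd(pc_war, win_value_usd=WIN_VALUE_USD):
    """Dollar value of the wins a player adds through his teammates."""
    return pc_war * win_value_usd


low_end = chemistry_value_usd(0.1)   # player with fWAR- of 0 and pcWAR of 0.1
high_end = chemistry_value_usd(4.0)  # upper range of estimated pcWAR values

print(f"pcWAR 0.1 -> ${low_end:,.0f} (minimum salary ${MIN_SALARY_USD:,})")
print(f"pcWAR 4.0 -> ${high_end:,.0f}")
```

Under these assumptions, 0.1 wins is worth about $600,000, which exceeds the assumed minimum salary, and 4 wins is worth about $24 million, the same amount an fWAR of 4 would imply at this price per win.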
In future work, we plan to verify whether or not using alternative measures of wins-above-replacement, like that produced by Baseball Reference, leads to similar results. In addition, we plan a
richer exploration of the strength of the interconnections between teammates. For example, the
chemistry effect of catchers might be stronger among the pitchers they catch, or middle
infielders might have stronger interactions than other pairs of position players. It would also be
natural to imagine that organizational culture has its own set of dynamics as well. One way to
capture this would be to include a team’s field and front office staff in our model. Furthermore,
most of the analysis here leveraged the playing time of individual players to explain player
interactions and their impact on team performance differences. As a consequence, what is still left
to understand is how to estimate the effect of players who have positive/negative spill-overs to
their teammates through their off-the-field interactions.
References
[1] Carleton, R. A. (2013). Is Brandon Inge worth 10 wins behind closed doors? http://www.baseballprospectus.com/article.php?articleid=19944.
[2] Conley, T. G. (2008). Spatial Econometrics. The New Palgrave Dictionary of Economics. Palgrave Macmillan, second edition.
[3] Dempster, A. P., N. M. Laird, and D. B. Rubin (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, 39(1):1–38.
[4] Demsar, U., P. Harris, C. Brunsdon, A. S. Fotheringham, and S. McLoone (2012). Principal Component Analysis on Spatial Data: An Overview. Annals of the Association of American Geographers.
[5] FanGraphs (2016a). What is WAR? http://www.fangraphs.com/library/misc/war/.
[6] FanGraphs (2016b). Positional Adjustment. http://www.fangraphs.com/library/misc/war/positional-adjustment/.
[7] Gonzalez, M. (November 30, 2016). Cubs newcomer Jon Jay targeted to fill roles of David Ross, Dexter Fowler. Chicago Tribune. http://www.chicagotribune.com/sports/baseball/cubs/.
[8] Jackson, M. O. (2008). Social and Economic Networks. Princeton University Press.
[9] Keller, J. J. (2014a). In defense of WAR: My response to Jeff Passan. http://fansided.com/2014/09/11/defense-war-response-jeff-passan/.
[10] Keller, J. J. (2014b). MLB: An update on the correlation between fWAR and wins. http://statliners.com/2014/11/21/mlb-update-correlation-fwar-wins/.
[11] Kelly, D. (2016). Measuring team chemistry in MLB. http://www.slideshare.net/DavidKelly75/measuring-team-chemistry-in-mlb.
[12] Levine, B. (2015). Measuring team chemistry with social science theory. http://www.fangraphs.com/community/measuring-team-chemistry-with-social-science-theory/.
[13] Passan, J. (2014). Why WAR doesn’t always add up. http://sports.yahoo.com/news/10degrees–why-war-doesn-t-always-add-up-030133203.html.
[14] Phillips, J. (2014). Chemistry 162. http://insider.espn.com/mlb/story/_/id/10628418/mlbdivision-previews-based-formula-clubhouse-chemistry-espn-magazine.
[15] Reis, R. and M. W. Watson (2010). Relative goods’ prices, pure inflation, and the Phillips correlation. American Economic Journal: Macroeconomics, 2(3):128–157.
[16] Shumway, R. H. and D. S. Stoffer (1982). An approach to time series smoothing and forecasting using the EM algorithm. Journal of Time Series Analysis, 3(4):253–264.
[17] SyncStrength (2016). Measuring team chemistry using player biology. http://www.syncstrength.com/team_chemistry/.
[18] Watson, M. W. and R. F. Engle (1983). Alternative algorithms for the estimation of dynamic factor, MIMIC and varying coefficient regression models. Journal of Econometrics, 23:385–400.
Appendix
6.1 Data
Our data comprise 24,668 player-season observations over the 1998-2015 period. Nearly all
players who participated in an MLB game during the 1998-2015 seasons appear in our analysis.
The only exceptions are players who appeared in a game but failed to record an at-bat or an out,
which excludes 21 observations from our sample. fWAR data come from the online database at
fangraphs.com, while all additional player, team, and performance information come from the
databases maintained by Sean Lahman at seanlahman.com. While the Lahman database allows us to
observe performance data by team for players that change teams within a season, FanGraphs only
publishes fWAR at the season level of observation. In these cases, we divide a player’s season fWAR
proportionally by his appearances for his respective teams, following the appearance weighting
described below. Thus, our dataset includes multiple observations within seasons for such players
corresponding to each team on which they appear.
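As an illustration of this proportional split, the following is a minimal sketch rather than the authors' code. The function name and the example numbers are hypothetical, and the appearance measure is left as a generic number per team (in the paper it would follow the appearance weighting of Section 6.1.1).

```python
def split_season_fwar(season_fwar, appearances_by_team):
    """Divide a season-level fWAR across teams in proportion to appearances."""
    total = sum(appearances_by_team.values())
    if total <= 0:
        raise ValueError("player has no recorded appearances")
    return {
        team: season_fwar * appearances / total
        for team, appearances in appearances_by_team.items()
    }


# Hypothetical player traded mid-season: 300 weighted appearances with
# his first team, 100 with his second, and a season fWAR of 2.0.
stints = split_season_fwar(2.0, {"Team A": 300, "Team B": 100})
print(stints)  # {'Team A': 1.5, 'Team B': 0.5}
```

Each key of the result then corresponds to one player-team-season observation in the dataset.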
6.1.1 Player Productivity Residuals
In order to construct player productivity residuals, we use the following weights to define a player’s
expected contribution to his team’s wins, Ŵ_int, based on his position (α_p) and his share of his team’s
players’ appearances (g_ip).
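$$\hat{W}_{int} = \alpha_p \, g_{ip} \, W_{int}$$
$$\alpha_p = \begin{cases} 0.57 & \text{if } p \text{ is a position player} \\ 0.43 & \text{if } p \text{ is a pitcher} \end{cases}$$
$$g_{ip} = \begin{cases} \dfrac{AB_i + p_i \cdot DOuts_i}{\sum_{k \neq i}^{K_p} \left( AB_k + p_i \cdot DOuts_k \right)} & \text{if } p \text{ is a position player} \\[2ex] \dfrac{p_i \cdot POuts_i}{\sum_{k \neq i}^{K_p} p_i \cdot POuts_k} & \text{if } p \text{ is a pitcher.} \end{cases}$$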
FanGraphs constructs fWAR such that players contribute 1,000 WAR per 2,430 games league-wide
(162 games for 30 teams). The terms 0.57 and 0.43 correspond to the proportion of league-wide
WAR they apportion to position players and pitchers, respectively. This split is based on the
assumption that because position players appear on both sides of the ball, their contribution
should be weighted somewhat higher (FanGraphs 2016b). To generate appearance weights, we use
the sum of at-bats (AB) and defensive outs (DOuts) for position players in order to capture the
contributions of different types of players, such as pinch-hitters and defensive substitutions. For
pitchers, outs recorded (POuts) proves to be the most precise measure for capturing a variety of
pitching contributions (middle relievers, one-out guys, etc.). We then differentially weight DOuts
and POuts according to the positional run adjustments and replacement level win percentages for
starting and relief pitchers that FanGraphs uses to construct fWAR, where we normalize p_i to sum to 1
across positions and pitcher types, separately. These weights are presented in Table 1.
Table 1: Position Weights
6.1.2 Regression Covariates
The regression analysis presented in Section 4.1 uses several covariates from the dataset that we
construct from FanGraphs and the Lahman database. Our position indicators correspond to the
position that the Lahman database indicates as the primary position for each player. We include an
additional indicator variable for whether the player appeared in multiple positions over his season-team tenure. Age is simply defined as the difference between the season year and the player’s birth
year. Team and handedness indicators are pulled directly from the Lahman database, while we
generate running totals for a player’s years in MLB and years with his current team to control for
experience and team tenure. Finally, manager indicators correspond to each team’s manager on
opening day, thus ignoring any managerial changes within seasons.
6.2 A Spatial Factor Model
Here, we describe the mechanics of our spatial factor model and its estimation. In matrix form, the
model can be written as
$$Y = WF\Lambda + W\varepsilon \tag{1}$$
where Y is an ST × N matrix of outcomes, W is an ST × ST matrix of spatiotemporal weights, F
is an ST × K matrix of common factors, Λ is a K × N matrix of factor loadings, and ε is an
ST × N matrix of idiosyncratic determinants of Y.
6.2.1 The Reduced Form of a Spatial Autoregression
Equation 1 can be viewed as the reduced form of a spatial autoregression, or SAR. To see this,
consider the following representation of a SAR
$$Y = \rho A Y + \upsilon \tag{2}$$
where Y is an ST × N matrix of outcomes, A is an ST × ST adjacency matrix, ρ is a scalar
parameter, and υ is an ST × N matrix of residuals. Rearranging the elements of equation 2, it can
be rewritten as
$$Y = (I - \rho A)^{-1} \upsilon.$$
Defining W ≡ (I − ρA)⁻¹ and assuming the approximate common factor structure υ = FΛ + ε,
equation 2 is shown to be equivalent to equation 1.
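Spelling out the intermediate steps may help. The following is a sketch of the algebra, assuming that (I − ρA) is invertible (the condition for this is discussed in Section 6.2.2):

```latex
\begin{align*}
Y &= \rho A Y + \upsilon \\
(I - \rho A)\, Y &= \upsilon \\
Y &= (I - \rho A)^{-1} \upsilon = W \upsilon \\
  &= W (F\Lambda + \varepsilon) = WF\Lambda + W\varepsilon .
\end{align*}
```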
6.2.2 Estimation
Estimation of equation 1 proceeds with spatial principal components analysis, or SPCA, given a
number of common factors and appropriate scale and sign normalizations. For the latter, a choice
can be made to scale either the factor loadings or the factors such that ΛΛ′ = I or F′W′WF = I,
respectively, with the signs of the factors set by restricting the columns of Λ to sum to zero. In
addition, we set ρ = −1 and restrict the row sums of the adjacency matrix A to be equal to 1 .
Combined, these normalizations satisfy the sufficient condition for W to exist that (I − ρA) be
strictly diagonally dominant, i.e. |1 − ρA_ii| ≥ ∑_{j≠i} |−ρA_ij|.
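As a quick check of this condition, consider the following sketch. It relies on an added assumption, not stated in the text, that the entries of A are non-negative. With ρ = −1 and unit row sums, the off-diagonal entries of row i sum to 1 − A_ii, so

```latex
\begin{align*}
|1 - \rho A_{ii}| &= 1 + A_{ii}, \\
\sum_{j \neq i} |-\rho A_{ij}| &= \sum_{j \neq i} A_{ij} = 1 - A_{ii}, \\
1 + A_{ii} \geq 1 - A_{ii} &\iff A_{ii} \geq 0 .
\end{align*}
```

Under that assumption the inequality always holds, and it is strict for any row with A_ii > 0.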
Factor loading restrictions are handled by the expectation-maximization (EM) algorithm developed
in Dempster, Laird, and Rubin (1977), Shumway and Stoffer (1982), and Watson and Engle (1983),
extended to include unit loading restrictions by Reis and Watson (2010). To get a sense of how the
algorithm operates, consider the following: If the factors were known, then it would be possible to
consistently estimate the factor loadings by a weighted least squares (WLS) regression of the form
$$\hat{\Lambda} = (F'W'WF)^{-1}(F'W'Y).$$
Similarly, if the factor loadings were known, the factors would be consistently estimated by
$$\hat{F} = (W^{-1}Y\Lambda')(\Lambda\Lambda')^{-1}.$$
Given an unrestricted initial estimate of Λ or F, and depending on the choice of scale normalization,
the EM algorithm iterates between these two WLS regressions until the sum of squared errors for
equation 1 is minimized, imposing the factor loading restrictions at each iteration.
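The alternation between the two regressions can be sketched in a few lines of NumPy. This is an illustrative alternating least squares loop under stated assumptions, not the authors' implementation: it omits the scale and sign normalizations and the factor loading restrictions, it starts from a random F, and the function name, toy data, and seeds are all hypothetical. The factor update uses the K × K matrix ΛΛ′ so that the dimensions conform with Λ being K × N.

```python
import numpy as np


def fit_spatial_factors(Y, A, n_factors=2, rho=-1.0, tol=1e-6, max_iter=500, seed=0):
    """Alternate the two least-squares updates for Y = W F Lam + W eps."""
    n = A.shape[0]
    W_inv = np.eye(n) - rho * A   # W^{-1} = I - rho * A
    W = np.linalg.inv(W_inv)
    Z = W_inv @ Y                 # W^{-1} Y, used in the factor update
    F = np.random.default_rng(seed).standard_normal((n, n_factors))
    prev_sse = np.inf
    for _ in range(max_iter):
        WF = W @ F
        # Loadings given factors: (F'W'WF)^{-1} F'W'Y
        Lam = np.linalg.solve(WF.T @ WF, WF.T @ Y)
        # Factors given loadings: (W^{-1} Y Lam')(Lam Lam')^{-1}
        F = np.linalg.solve(Lam @ Lam.T, Lam @ Z.T).T
        sse = float(np.sum((Y - W @ F @ Lam) ** 2))
        # Stop when successive SSE differences fall below the tolerance.
        if abs(prev_sse - sse) < tol:
            break
        prev_sse = sse
    return F, Lam, sse


# Hypothetical toy data with an exact two-factor structure.
rng = np.random.default_rng(1)
A_demo = rng.random((6, 6))
A_demo /= A_demo.sum(axis=1, keepdims=True)  # row sums equal 1
W_demo = np.linalg.inv(np.eye(6) + A_demo)   # rho = -1
Y_demo = W_demo @ rng.standard_normal((6, 2)) @ rng.standard_normal((2, 4))

F_hat, Lam_hat, sse = fit_spatial_factors(Y_demo, A_demo)
print(f"final SSE: {sse:.2e}")
```

Because the toy data have an exact two-factor structure, the loop drives the sum of squared errors to numerical zero. The estimated F and Λ are only identified up to an invertible K × K rotation, which is what the normalizations in the text pin down.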
While the approximate factor structure we assume here is necessary for the EM algorithm to run,
we can still use it to obtain the exact factor structure of our model by setting a convergence
criterion which brings the sum of squared errors arbitrarily close to zero for a given number of
common factors. This is achieved quite easily with our two-factor model using a criterion which
stops the algorithm when successive differences in the sum of squared errors are less than 1e-6.