February, 2009 - Phil Birnbaum

By the Numbers
Volume 19, Number 1
The Newsletter of the SABR Statistical Analysis Committee
February, 2009
Review
Academic Research: Competitive Balance
Charlie Pavitt
The author reviews an academic study showing an increase in competitive balance due to integration and the influx of foreign players.
Martin B. Schmidt, The Nonlinear Behavior of Competition: The Impact of Talent Compression on Competition, Journal of Population Economics, 2009, Volume 22, pp. 57-74
In this issue
Academic Research: Competitive Balance ..... Charlie Pavitt
A Model for Estimating Run Creation ..... Richard Schell
McCracken and Wang Revisited ..... Pete Palmer

Schmidt has been a huge contributor to the academic sabermetric literature, with ten relevant publications of which I am aware, almost all relevant to competitive balance. As much as this issue has been studied, I still think that there is room for further analysis, as it is as yet unclear whether the steady improvement in balance among teams over the decades has stalled or even reversed since the mid-1990s. What is clear is the steady improvement until at least that time, which needs no further demonstration. What it does need is explanation, and Schmidt’s latest makes what I think to be a critical contribution to this issue.

In doing this work, Schmidt sits on the shoulders of the famed paleontologist and baseball fan Stephen Jay Gould, whose work I will discuss here for readers unfamiliar with it. Gould wrote a couple of essays demonstrating through the use of batting averages the steady increase in the ability of position players over the course of major league baseball history. Average BAs for regulars have drifted around during the decades. They were in the .250s during the first and second decades of the twentieth century, in the high .280s in the 1920s, and then fell to around .260 between 1940 and 1960. They dipped below that in the next decade before popping back toward .260 in the 1970s. In contrast, the within-season variation has decreased steadily over that time. In the first essay, Gould showed that the difference between the five highest batting averages and the league average was in the range of about 90 points during the 19th century, 80 points in the first three decades of the 20th, and 70 points since; the difference between the lowest and fifth-lowest began at about 60 or 70 points and has decreased to about 35 since. In the second essay, Gould made the same point in a more statistically trustworthy way, presenting the fact that the standard deviation in within-season batting averages has decreased from about .05 at the beginning to about .03 at the time of the essay. In both analyses, the improvement was asymptotic, occurring quickly at the beginning but slowing down toward some yet-unknown limiting figure.

Gould’s original explanation for this effect was the standardization of play; for example, teams have gotten progressively savvier in positioning their fielders, making it harder to get hits. But by the second essay he began to realize the real explanation: the overall increase in talent. We must assume that the greatest stars have been about equally good over the interim, which would follow from Gould’s belief that there is a limit to ability based on the very facts of human physiology (an “outer limit of human capacity”) that the best players can approach but never cross. If so, then the average player has gotten closer to the stars, and the greater availability of competent players has increased the replacement level such that .180 hitters are just too poor to play regularly anymore no matter how well they field. There is a critical implication of this conjecture. Although the number of major league teams has almost doubled since 1960, the population from which major league players are procured has far more than doubled. Not only has the population of the United States kept up with expansion, the number of players from the other American nations has exploded, along with East Asians, an ever-increasing number of Australians, and even the occasional European.
Schmidt (2009) transferred this reasoning into the context of teams. As poorer teams as a whole have poorer players than good teams, if player strength increases mostly at the bottom end, then it will be the poorer teams that get the biggest payoff, thus increasing competitive balance. Further, as player improvement is asymptotic, so should be team improvement. Schmidt used team data from 1911-2005 and concluded from his analyses that the increase in competitive balance over that period of time was indeed asymptotic, but with a significant breakpoint of improvement in the mid-1950s, when the impact of integration on levels of talent would have been at its highest. Further, the degree of competitive balance correlates with the proportion of foreign-born players, which has been rising since 1940 from perhaps 2 percent at best to more than 25 percent. This latter result implies that improvements in competitive balance are likely at least partly a result of that influx, consistent with Gould in at least the sense that a bigger player pool to choose from means better overall talent.
Charlie Pavitt, [email protected] ♦
Informal Peer Review
The following committee members have volunteered to be contacted by other members for informal peer review of articles.
Please contact any of our volunteers on an as-needed basis – that is, if you want someone to look over your manuscript in
advance, these people are willing. Of course, I’ll be doing a bit of that too, but, as much as I’d like to, I don’t have time to
contact every contributor with detailed comments on their work. (I will get back to you on more serious issues, like if I don’t
understand part of your method or results.)
If you’d like to be added to the list, send your name, e-mail address, and areas of expertise (don’t worry if you don’t have any –
I certainly don’t), and you’ll see your name in print next issue.
Expertise in “Statistics” below means “real” statistics, as opposed to baseball statistics: confidence intervals, testing, sampling,
and so on.
Member              E-mail              Expertise
Shelly Appleton     [email protected]   Statistics
Ben Baumer          [email protected]   Statistics
Chris Beauchamp     [email protected]   Statistics
Jim Box             [email protected]   Statistics
Keith Carlson       [email protected]   General
Dan Evans           [email protected]   General
Rob Fabrizzio       [email protected]   Statistics
Larry Grasso        [email protected]   Statistics
Tom Hanrahan        [email protected]   Statistics
John Heer           [email protected]   Proofreading
Dan Heisman         [email protected]   General
Bill Johnson        [email protected]   Statistics
Mark E. Johnson     [email protected]   General
David Kaplan        [email protected]   Statistics (regression)
Keith Karcher       [email protected]   Statistics
Chris Leach         [email protected]   General
Chris Long          [email protected]   Statistics
John Matthew IV     [email protected]   Apostrophes
Nicholas Miceli     [email protected]   Statistics
John Stryker        [email protected]   General
Tom Thress          [email protected]   Statistics (regression)
Joel Tscherne       [email protected]   General
Dick Unruh          [email protected]   Proofreading
Steve Wang          [email protected]   Statistics
Study
A Model for Estimating Run Creation
Richard Schell
There are various run estimators in current use – runs created, linear weights, base runs, etc. Here, the author comes up with a new
estimator, deriving it from first principles, and tests its accuracy compared to the others.
The following describes a model for estimating team run scoring based on the idea that run scoring is directly related to the average number
of men a team has on base when a batter comes to the plate. The model can be employed for other purposes, described at the end of the
paper. It fits available data exceptionally well compared to other run estimators in use today and has an additional advantage, in that it is
related to what happens in baseball games.
Background
Run estimation has several goals. The first is to take a set of data specific to a team – familiar baseball statistics – and use that data to
estimate how many runs that team should or will score. A second use is to estimate the number of runs a player on a given team created (or
will create) for his team. Related to that is the allocation of runs a team actually scored to its players in order to determine the value of that
player to his team in runs (and wins, following Bill James’s Win Shares statistic [1]).
Familiar run estimators include Bill James’s well-known Runs Created estimators, of which many versions exist. James introduced this
formula nearly thirty years ago [2] and has continued to refine it from time to time to improve its fit to available data. James’s formula for
estimating runs has a simple form: Runs = A × B / C. What has varied from version to version is what James uses for A, B, and C. This
estimator is non-linear – that is, the equation is not a linear equation. Roughly contemporary with Runs Created is Pete Palmer’s Linear
Weights [3], whose very name implies that it is a linear estimator – the estimator is a linear equation relating runs to other baseball statistics.
Obviously, all run estimators are either linear or non-linear. Among the first category, in addition to Linear Weights, Jim Furtado has
developed an improvement called Extrapolated Runs (XR) [4], which is quite effective. David Smyth created a very elegant non-linear
model, called Base Runs (BsR) [5], which is similar to Runs Created in some ways, and different in a couple of very important features.
Dan Fox explains that run estimators can be differentiated in a different way: statistical and intuitive [6]. Most linear estimators, such as
Linear Weights and XR, are statistical in nature. They employ linear regression to determine the coefficients, or weights, they assign to
various baseball events. Non-linear estimators, such as Bill James’s various Runs Created formulas, as well as BsR, are intuitive in nature.
These categories are useful, even if they are not pure in nature. (Both James and Smyth tweaked their formulas to fit the available data. Paul
Johnson’s Estimated Run Production [7], which is linear in nature, is driven by an intuitive look at how runs are scored.)
To Fox’s two categories, I will add an additional one: constructive models. Constructive models start with an intuition, such as the one I
described earlier – namely that run scoring is connected to the average number of men on base, but then use relationships that exist among
baseball events to construct a model of how run scoring works.
Constructing Models
Models can be constructed by examining how runs are scored. To obtain David Smyth’s model, start from a simple axiom, that the number of runs scored cannot exceed the number of men who reach base minus those who are out on base (via double play, being caught stealing, or out running the bases on a hit). In mathematical terms, Runs ≤ Reached − OutOnBase. Another way to state this is Runs = f × (Reached − OutOnBase), where f is a function that never exceeds the value 1. A third alternative is to write this as Runs − HR = f × (Reached − HR − OutOnBase), noting that home runs always score one run. What can be said about f? It should have the value 1 if no batter makes an out, so a good form for it is X / (OWB + X), where OWB stands for “out while batting”, including all sacrifice hits and sacrifice flies. That leads to the run estimator

$$\text{Runs} = \frac{X \times (\text{Reached} - \text{HR} - \text{OutOnBase})}{X + \text{OWB}} + \text{HR}$$

which is the form of Base Runs. Finding X is a challenge, but one that can be addressed by using linear regression (carefully). This leads to something very like Base Runs, with a single exception – Reached should include reaching on error, a piece of data that is now available for a number of teams for a number of seasons, through Retrosheet. Note that the form of Bill James’s original Runs Created can be written in the form Runs = f × TB, where X is simply RBS (reaching base safely, not including reaching on error). There isn’t a fundamental premise that leads to the derivation of this form. As a result, Runs Created is subject to the anomalous behavior mentioned earlier.
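The Base Runs form above can be written as a short function. This is a minimal sketch rather than Smyth's published formula: the function name, the argument names, and the sample inputs are hypothetical, and X is passed in directly instead of being fitted by regression.

```python
def base_runs_form(reached, hr, out_on_base, owb, x):
    """Runs = X * (Reached - HR - OutOnBase) / (X + OWB) + HR.

    All arguments are team totals. `x` is the term X from the text; it is
    supplied by the caller here, not estimated from data.
    """
    if x + owb <= 0:
        raise ValueError("X + OWB must be positive")
    # Scoring fraction f = X / (X + OWB); equals 1 when no batter makes an out.
    f = x / (x + owb)
    return f * (reached - hr - out_on_base) + hr


# Hypothetical team season, for illustration only (not real data).
example = base_runs_form(reached=2100, hr=160, out_on_base=180, owb=4100, x=1900)
```

With OWB set to zero the function returns Reached − OutOnBase, which is the upper bound stated by the axiom.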
Another, more complex, construction starts from the observation that runners who reach base either score, are out on base, or are left at the
end of an inning. In mathematical terms, Runs = Reached − OutOnBase − LeftOnBase . Add to this two additional
assumptions related to the average number of men on base. The first assumption is driven by intuition and observation. It is obvious that a
home run or triple will (on average) drive in the average number of men on base; a double will drive in a substantial percentage of that
number; and a single will drive in a smaller percentage, a walk a smaller percentage still. Because home runs always drive in the batter, they
should be separated. The implication of these two principles is that Scored = V × m + HR , where V is (hopefully) a linear function
and m stands for average men on base.
The second assumption can be motivated as follows. If every batter came up with the same average number of men on base as did the last batter, then the number of men left on base per inning would be the same as the average number of men on base. That isn’t the case, for a variety of reasons, not the least of which is that the first man in the inning always comes to the plate with no runner on base. It is intuitive that LeftOn = k × m × Innings, where the number k (which could be a variable or a constant) is in the neighborhood of 1. Data indicates (and other models that will be discussed bear out) that this number can be expressed as BOPI × (1 − z) / 2, where BOPI is short for “batter outs per inning” and z is some (fairly small, per actual team data) number.

If these assumptions are combined, and using the fact that Innings × BOPI = OWB, the result is

$$\left[V + \frac{1 - z}{2} \times \text{OWB}\right] \times m = \text{Reached} - \text{HR} - \text{OutOnBase}$$

where OWB stands for “out while batting”. It is convenient to denote the quantity ReachedBase − HR by “Occupied”, and the quantity Scored − HR by “TDI” (for teammates driven in). Then, substituting the result for m into TDI = V × m yields

$$\text{TDI} = \frac{2 \times V \times (\text{Occupied} - \text{OutOnBase})}{2 \times V + (1 - z) \times \text{OWB}}$$

The problem is that V and z are both unknown. They can both be obtained using linear regression techniques (and some tricks with algebra), but that obscures what is going on to actually score runs. The variable z is small, and doesn’t vary much, according to historical data, so the real problem is in V. Empirically, V = (TB − HR) / 3 + U, where U includes terms in doubles, triples, stolen bases, sacrifice hits, and sacrifice flies.¹
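The algebra that links the two assumptions to the TDI formula is short and can be written out step by step. The sketch below uses only the relations stated in this section (Scored = V × m + HR, LeftOn = k × m × Innings with k = BOPI × (1 − z) / 2, and Innings × BOPI = OWB); no further assumption is made.

```latex
\begin{align*}
\text{Scored} &= \text{Reached} - \text{OutOnBase} - \text{LeftOn} \\
V m + \text{HR} &= \text{Reached} - \text{OutOnBase} - \tfrac{1 - z}{2}\,\text{OWB}\,m \\
\left[V + \tfrac{1 - z}{2}\,\text{OWB}\right] m &= \text{Occupied} - \text{OutOnBase} \\
m &= \frac{2\,(\text{Occupied} - \text{OutOnBase})}{2V + (1 - z)\,\text{OWB}} \\
\text{TDI} = V m &= \frac{2V\,(\text{Occupied} - \text{OutOnBase})}{2V + (1 - z)\,\text{OWB}}
\end{align*}
```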
A Third Construction
Markov models can be used as a tool to develop a better model, one that has a clearer relationship to what happens in the game. Markov models are based on stochastic processes that model real world activities, such as baseball games. Simply put, they explain how an inning is likely to progress from one batter to the next. That enables determining the average state of the game, including such numbers as average men on base, average men on first, second, and third, average scoring from any given base, and so on. Their application requires matrix algebra involving sub-matrices of a 25x25 matrix and row (or column) vectors. Their actual use is beyond the scope of this paper, but they can be useful in deriving certain key expressions and approximations. For a deeper discussion, see [8] and [9].

¹ David Smyth asked in an online forum whether sacrifice flies should be given special treatment in run estimation models. In an early version of the run estimation model described here, sacrifice flies were given the same treatment as home runs; in the current version, they are treated as a variable part of V. Both treatments yield roughly the same “value” for sacrifice flies. Neither treatment is right or wrong.
A Markov model for baseball [10] can be used to derive the relationship

$$m = \frac{2 \times (\text{Occupied} - k_1 \times \text{TDI} - k_2 \times \text{OOB})}{\text{OWB}}$$

where OOB is short for OutOnBase. The numbers k1 and k2 are variable, but can generally be treated as constants for the purposes of modeling the process of run scoring. Markov models suggest that these numbers are roughly .8 and .875.² This last relationship produces a slightly different formula for run estimation. Using TDI = V × m, this version yields

$$\text{TDI} = \frac{2 \times V \times (\text{Occupied} - k_2 \times \text{OOB})}{\text{OWB} + 2 \times k_1 \times V} \qquad \text{(Est)}$$

This latter form is identical to the prior one if k1 and k2 are both 1. Note the similarity between this formula and Base Runs. David Smyth did not incorporate “out on base” (equivalent to k2 = 0) and had an identical term in the numerator and denominator (equivalent to k1 = 1 and A = 2 × V).
To finish the process requires devising a linear function, V, for which TDI = V × m holds. To do that, Markov models can be used to derive simple relationships among various averages of men on base – total men on base as well as average numbers of men on each base. A really simple sketch proceeds as follows. Let V1, V2, and V3 be linear functions that approximate the rate of driving in men from first, second, and third base respectively. Then, the simple relationship m2 = m − m1 − m3 (where m1 is the average number of men on first, m2 the average number of men on second, and m3 the average number of men on third) yields

$$\text{TDI} = V_2 \times m - (V_2 - V_1) \times m_1 + (V_3 - V_2) \times m_3$$

The Markov model also produces good approximate coefficients a, b, c, and d, such that

$$m_3 \approx a \times m - b \times m_1 + c \times \left(\frac{\text{3B} + d \times \text{OOB}}{\text{BFP}}\right)$$

This enables the equation for TDI to be written in the form

$$\text{TDI} = U_1 \times m - U_2 \times m_1 + U_3 \times \left(\frac{\text{3B} + d \times \text{OOB}}{\text{BFP}}\right)$$

Ultimately, m1, as well as the final term in the right hand side, can be related to m. In this way one obtains a linear function V that includes only first-order terms (no constants) for which the relationship TDI = V × m holds. Using historical values to create these approximations, and making small adjustments to obtain simple coefficients, yields

V = .5 × (TB + ROE − .3 × H − 2 × HR + .3 × 3B + 3.1 × SF + .7 × SB + .1 × (SH + BB + HBP) − .06 × SO)

² This is equivalent to computing the value of z discussed earlier.
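Putting the linear function V together with formula (Est) gives a complete estimator. The following is a minimal sketch under stated assumptions: the function names are hypothetical, the team totals are invented for illustration, Reached is assumed to be H + BB + HBP + ROE, and OOB and OWB are passed in directly because the text does not spell out how to count them from box-score statistics.

```python
def v_linear(tb, roe, h, hr, triples, sf, sb, sh, bb, hbp, so):
    """Linear function V from the text, built from team season totals."""
    return 0.5 * (tb + roe - 0.3 * h - 2 * hr + 0.3 * triples
                  + 3.1 * sf + 0.7 * sb + 0.1 * (sh + bb + hbp) - 0.06 * so)


def tdi_est(v, occupied, oob, owb, k1=0.8, k2=0.875):
    """Teammates driven in, formula (Est), with the Markov-suggested k1, k2."""
    return 2 * v * (occupied - k2 * oob) / (owb + 2 * k1 * v)


def runs_est(v, occupied, oob, owb, hr, k1=0.8, k2=0.875):
    """Estimated runs: teammates driven in plus the home-run hitters themselves."""
    return tdi_est(v, occupied, oob, owb, k1, k2) + hr


# Hypothetical team totals, for illustration only (not real data).
v = v_linear(tb=2300, roe=65, h=1450, hr=170, triples=30, sf=45, sb=90,
             sh=40, bb=550, hbp=55, so=1050)
occupied = (1450 + 550 + 55 + 65) - 170   # assumed: Reached = H + BB + HBP + ROE
tdi = tdi_est(v, occupied, oob=190, owb=4150)
# Average men on base implied by the Markov relation for m.
m = 2 * (occupied - 0.8 * tdi - 0.875 * 190) / 4150
runs = runs_est(v, occupied, oob=190, owb=4150, hr=170)
```

As a consistency check, the value of m recovered from the Markov relation satisfies TDI = V × m, which is the substitution used to obtain (Est).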
Note also that this formula, unlike others, uses reached-on-error³, a statistic that is available for many years through Retrosheet. Observing that m/2 is approximately (Reached − HomeRuns) / BFP for most actual team data, runs scored can be roughly estimated by something of the form

$$(\text{TB} + X) \times \frac{\text{Reached}}{\text{BFP}} + \left(1 - \frac{2 \times \text{Reached} + (\text{TB} - 2 \times \text{HR} + X)}{\text{BFP}}\right) \times \text{HR}$$

If X and the term multiplying HR are both negligible (or offset), this is very similar to the original Bill James formula for Runs Created. Runs Created works not because it was derived from first principles, but because it approximates a model that was.
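The step from V and m to this form can be filled in. The sketch below assumes, as the text does, that m/2 ≈ (Reached − HR) / BFP, and writes V = .5 × (TB − 2 × HR + X), where X stands for all the remaining terms of V.

```latex
\begin{align*}
\text{Runs} &= V \times m + \text{HR} \\
 &\approx (\text{TB} - 2\,\text{HR} + X)\,\frac{\text{Reached} - \text{HR}}{\text{BFP}} + \text{HR} \\
 &= (\text{TB} + X)\,\frac{\text{Reached}}{\text{BFP}}
   + \left(1 - \frac{2\,\text{Reached} + (\text{TB} - 2\,\text{HR} + X)}{\text{BFP}}\right)\text{HR}
\end{align*}
```

Dropping X and the HR term leaves TB × Reached / BFP, which has the A × B / C shape of Runs Created.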
Effectiveness
Is this a good estimator? It is quite good, as measured first by fit to data. The following table shows the results using one hundred
years of data from 1907 through 2006. The rows in the table show the average values of actual runs, estimated runs, the correlation between
these, and the standard error for the period since the year in the first column. (This method is designed to remove era-specific estimation.
Better results for correlation and standard error can be obtained for any specific span of time, but only at the expense of some other span of
time.)
Since   Actual   Est.    Correl   Std Err
1907    691.4    691.2   .979     22.68
1920    704.1    703.6   .978     21.77
1930    700.6    700.5   .978     21.70
1940    694.4    695.2   .979     20.97
1950    697.7    698.3   .980     20.40
1960    699.1    699.1   .981     19.95
1970    708.6    709.2   .981     20.08
1980    721.5    721.0   .982     20.49
1990    747.6    746.7   .978     20.40
2000    775.5    775.3   .963     21.67
Since 1960, the estimator performs remarkably well.⁴ One should not expect an estimator to do much better, because the average variance in
run scoring for two teams that have identical statistics should be between twenty and twenty one as well.
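The two fit measures in the table can be computed from paired lists of actual and estimated team runs. This is a minimal sketch: the article does not define "standard error", so it is assumed here to be the root-mean-square difference between actual and estimated runs, and the five team-seasons are invented for illustration.

```python
import math


def correlation(actual, estimated):
    """Pearson correlation between two equal-length sequences."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_e = sum(estimated) / n
    cov = sum((a - mean_a) * (e - mean_e) for a, e in zip(actual, estimated))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_e = sum((e - mean_e) ** 2 for e in estimated)
    return cov / math.sqrt(var_a * var_e)


def standard_error(actual, estimated):
    """Root-mean-square difference (an assumed definition, see lead-in)."""
    n = len(actual)
    return math.sqrt(sum((a - e) ** 2 for a, e in zip(actual, estimated)) / n)


# Hypothetical team-seasons, for illustration only (not real data).
actual = [702, 655, 810, 745, 690]
estimated = [715, 640, 798, 760, 701]
r = correlation(actual, estimated)
se = standard_error(actual, estimated)
```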
A comparison of the model’s fit to data against RC, XR, and BsR is quite favorable. For example, since 1960, the latest technical version of
RC produces a correlation of .976 and a standard error of 22.9. Slightly modified versions⁵ of XR and BsR produce correlations of .978 and
.979, and standard errors of 22.1 and 21.8, respectively. Some of the advantage that the model has over RC, XR, and BsR exists because it
incorporates reached on error (ROE), while the others do not. It requires a bit of manipulation, but both BsR and XR can be extended to
include ROE; only the most recent version of RC is amenable to this, one with an elaborate weighting scheme. The following table
compares RC, BsR, XR, and the new model since 1960.
          RC               BsR              XR               New Model
          W ROE    W/O     W ROE    W/O     W ROE    W/O
Correl    .979     .976    .980     .978    .979     .977    .981
Stderr    21.1     22.9    21.0     21.8    21.2     22.2    19.9

³ Adding reached on error is an important feature of this model. Modern teams reach base on error fewer than 70 times (the 2006 league average was 66), versus an average of 180 times in 1907. That difference equates to over fifty runs. Accounting explicitly for outs on base is also important, as that number varies greatly year over year; modern teams hit into more double plays, but are caught stealing fewer times than historical averages.

⁴ For example, use of linear regression on the period since 1960 produces only slightly better correlation and standard error, but at the expense of estimating earlier periods well.

⁵ The author tweaked coefficients in both models based on findings related to his own model; these tweaks improved both XR and BsR in terms of fit to data. The structure of each model was left intact. Such tweaking was not justified with RC, as Bill James has repeatedly tweaked his own model, unlike Furtado and Smyth.
All these methods clearly improve with the inclusion of ROE, although the new model still outperforms them.
Fit to data is only one criterion for effectiveness. This estimator also “explains” why it works. Intuitively, a relationship exists between
scoring runs and the average number of men a team has on base. This model exploits that relationship, yielding good estimates for average
men on base as well as run scoring. The fact that the linear function V can be obtained by construction (as well as by linear regression) is, I
believe, a benefit. As David Smyth remarked about his own Base Runs in [5], it does a “good job of modeling the scoring process”. (For
Smyth, this was more important than standard error.) The men-on-base model improves on Base Runs, in that it provides an explicit
connection among the terms in the estimator.
The model also explains the net values of various statistics in run scoring. Using standard mathematical techniques [11], one can derive
linear functions for run scoring from the non-linear formula denoted in the foregoing by (Est). These linear functions can be derived from
team-specific data, from yearly league averages, or from averages over longer periods of time. For example, using this technique on
historical averages for various statistics yields linear weights of .498, .751, 1.029, 1.421, and .353 for singles, doubles, triples, home runs,
and walks (as well as hit by pitch). It also yields a weight of .168 for stolen bases and -.396 for caught stealing.
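One way to see how a non-linear estimator implies linear weights is to add a single event to a baseline and measure the change in estimated runs. The sketch below is not the technique of [11], which is unpublished; it is a simple finite-difference stand-in. It is applied to the basic Runs Created formula, (H + BB) × TB / (AB + BB), rather than to (Est), and the baseline totals are invented for illustration.

```python
def runs_created(s):
    """Basic Runs Created: A * B / C with A = H + BB, B = TB, C = AB + BB."""
    return (s["H"] + s["BB"]) * s["TB"] / (s["AB"] + s["BB"])


# How one extra event of each kind changes the counting statistics.
EVENT_DELTAS = {
    "single":   {"H": 1, "AB": 1, "TB": 1},
    "double":   {"H": 1, "AB": 1, "TB": 2},
    "triple":   {"H": 1, "AB": 1, "TB": 3},
    "home_run": {"H": 1, "AB": 1, "TB": 4},
    "walk":     {"BB": 1},
    "out":      {"AB": 1},
}


def marginal_weights(estimator, baseline):
    """Change in estimated runs from adding one event to the baseline."""
    base = estimator(baseline)
    weights = {}
    for event, delta in EVENT_DELTAS.items():
        bumped = dict(baseline)
        for stat, amount in delta.items():
            bumped[stat] += amount
        weights[event] = estimator(bumped) - base
    return weights


# Hypothetical team totals, for illustration only (not real data).
baseline = {"AB": 5500, "H": 1450, "BB": 550, "TB": 2300}
weights = marginal_weights(runs_created, baseline)
```

Because the estimator is non-linear, the weights depend on the baseline, which is why they can be derived separately from team data, league averages, or longer historical averages.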
Measured in terms of simplicity, this model is not nearly as elegant as James’s original Runs Created, but compared with any of the more
“accurate” versions of RC, or any other estimator, it holds its own. It is possible to simplify the expression for V and obtain a reasonable
result, but at a fair sacrifice to accuracy. One potential use for this model is allocating team runs to individual players to evaluate their
performance (see [111]). Using the full model, rather than an abbreviated version, improves the accuracy of this allocation as well.
Summary
The goal of this paper was to elaborate a new model for estimating runs, one that adheres to the process of run scoring that actually occurs
during games. Rather than use linear regression to obtain an estimator, it uses fundamental principles. It ties its structure to other run
estimators, both non-linear and linear. With indirect help from a stochastic (Markov) model, it produces a formula for run estimation that is
apparently superior in fitting data to any other simple estimator available.⁶
References
1. James, Bill, Win Shares.
2. James, Bill, The 1979 Baseball Abstract.
3. Palmer, Pete and Thorn, John, The Hidden Game of Baseball.
4. Furtado, Jim, The 1999 Big Bad Baseball Annual.
5. Smyth, David, Base Runs Primer, no longer available online.
6. Fox, Dan, in Dan Agonistes, October 7, 2004, available online at http://danagonistes.blogspot.com/2004/10/brief-history-of-run-estimationruns.html. This work references as an original source for the idea, Curve Ball, by Jim Albert and Jay Bennett, an excellent source for understanding the role of chance and stochastic processes in baseball.
7. Johnson, Paul, in The 1985 Baseball Abstract (James).
8. For those interested, a good starting point is Mark Pankin’s tutorial at http://www.pankin.com/markov/.
9. "The Book: Playing the Percentages in Baseball" also makes extensive use of Markov models.
10. Schell, Richard, Using Markov Models as a Tool for Run Estimation, to be published.
11. Schell, Richard, Linear Weights from Non-Linear Run Estimators, to be published.

Richard Schell, [email protected] ♦

⁶ Full Markov models should do better, but are more complex to understand and use.
Submissions
Phil Birnbaum, Editor
Submissions to By the Numbers are, of course, encouraged. Articles should be concise (though not necessarily short), and
pertain to statistical analysis of baseball. Letters to the Editor, original research, opinions, summaries of existing research,
criticism, and reviews of other work are all welcome.
Articles should be submitted in electronic form, either by e-mail or on CD. I can read most word processor formats. If you send
charts, please send them in word processor form rather than in spreadsheet. Unless you specify otherwise, I may send your
work to others for comment (i.e., informal peer review).
If your submission discusses a previous BTN article, the author of that article may be asked to reply briefly in the same issue in
which your letter or article appears.
I usually edit for spelling and grammar. If you can (and I understand it isn’t always possible), try to format your article roughly
the same way BTN does.
I will acknowledge all articles upon receipt, and will try, within a reasonable time, to let you know if your submission is accepted.
Send submissions to:
Phil Birnbaum
88 Westpointe Cres., Nepean, ON, Canada, K2G 5Y8
[email protected]
Get Your Own Copy
If you’re not a member of the Statistical Analysis Committee, you’re probably reading a friend’s copy of this issue of BTN, or
perhaps you paid for a copy through the SABR office.
If that’s the case, you might want to consider joining the Committee, which will get you an automatic subscription to BTN.
There are no extra charges (besides the regular SABR membership fee) or obligations – just an interest in the statistical
analysis of baseball.
The easiest way to join the committee is to visit http://members.sabr.org, click on “my SABR,” then “committees and regionals,”
then “add new” committee. Add the Statistical Analysis Committee, and you’re done. You will be informed when new issues
are available for downloading from the internet.
If you would like more information, send an e-mail (preferably with your snail mail address for our records) to Neal Traven, at
[email protected]. If you don’t have internet access, we will send you BTN by mail; write to Neal at
4317 Dayton Ave. N. #201, Seattle, WA, 98103-7154.
Study
McCracken and Wang Revisited
Pete Palmer
The author comments on two issues. First, Voros McCracken's DIPS theory suggests that pitchers have only a small amount of control over
whether balls in play turn into base hits. Here, the author shows how much real-life variances in this ability contribute to ERA differences
between groups of pitchers. Second, the author comments on Victor Wang's recent study on OBP and SLG ratios, running regressions to
provide more data on what linear combination of on-base and slugging best predicts runs.
Voros McCracken suggested that there is no difference among pitchers for outs per balls in play. This has been analyzed and contested and
McCracken backed off somewhat, saying that there is very little difference. In actuality, there is a fair difference, but not nearly as much as
would be expected based on the change in earned run average. However, there is a good reason for that, because counting outs per balls in
play eliminates most of the real differences between pitchers. The table below covering data for 2001-2008, a reasonably homogeneous
sample, shows typical stats for pitchers grouped by earned run average. In this sample, only pitchers with at least 1/3 of the appearances as
starters were considered. This is because relief pitchers have a slight advantage in ERA because all runners on base when they come in are
charged to the previous pitcher, even though the current pitcher is partly responsible for them. Including all pitchers changes the results
only slightly.
Statistics for pitchers with at least 1/3 starts, 2001-2008, grouped by ERA
num pit  era   tbf    h      2b    3b   hr    bb    hb   so    bip    outs   o/bip  oba   slg
122      2.64  36.40  7.48   1.45  .16  0.69  2.46  .30  8.08  24.37  17.59  .722   .284  .341
162      3.28  37.19  8.15   1.59  .15  0.83  2.67  .28  7.28  25.60  18.29  .714   .301  .372
253      3.76  37.88  8.71   1.76  .17  0.97  2.80  .32  6.44  26.80  19.06  .711   .315  .401
313      4.25  38.52  9.18   1.89  .19  1.06  2.97  .35  6.17  27.41  19.29  .704   .327  .422
284      4.73  39.24  9.70   2.00  .21  1.18  3.14  .37  5.76  28.15  19.63  .698   .340  .446
217      5.22  39.95  10.13  2.14  .22  1.29  3.37  .41  5.71  28.52  19.68  .690   .351  .467
627      6.41  41.38  11.06  2.33  .24  1.51  3.90  .44  5.55  29.28  19.73  .674   .375  .506
Taking the 2nd and 6th lines, the ratio of era is 5.22/3.28 or 1.59, while the ratio of outs/bip is only .714/.690 or 1.03. Taking hits/bip instead, you still get .310/.286 or 1.08. For pitchers, runs allowed are proportional to on-base times slugging. Batters, on the other hand, show runs scored as a function of on-base plus slugging (OPS). This is because a batter's turn comes up only once every nine lineup positions, so even a player who hit a home run every time up would add only about 6 runs per game. A pitcher who gave up a homer every time up would allow an infinite number of runs. So for a hits/bip ratio of 1.08, the pitcher would be charged with about 17% more runs rather than 59%.
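The ratios quoted here can be reproduced from the table rows for the 3.28 and 5.22 ERA groups. This is a small sketch with two assumptions that the text implies but does not state: hits/bip is taken as 1 − outs/bip, and the 17% figure is read as the hits/bip ratio applied to both on-base and slugging, so that their product scales by the ratio squared.

```python
# Values from the table rows with ERA 3.28 and ERA 5.22.
era_low, era_high = 3.28, 5.22
outs_per_bip_low_era, outs_per_bip_high_era = 0.714, 0.690

era_ratio = era_high / era_low
outs_ratio = outs_per_bip_low_era / outs_per_bip_high_era

# Assumed: hits/bip = 1 - outs/bip (gives .286 and .310).
hits_ratio = (1 - outs_per_bip_high_era) / (1 - outs_per_bip_low_era)

# Assumed reading: runs allowed ~ OBA x SLG, and both scale with hits/bip,
# so the implied run ratio is the square of the hits/bip ratio.
implied_run_ratio = hits_ratio ** 2
extra_runs_pct = (implied_run_ratio - 1) * 100
```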
The correlation between outs/bip and earned run average is still fairly decent at an R-squared of 42%; however, on-base times slugging has a
77% mark. The slope and intercept show the regression line values. Correlation is in terms of R in the table. So the least squares best fit for
ERA as a function of outs/bip is ERA equals –38.1409 x outs/bip + 31.6966. Sigma is the standard deviation of ERA using this formula,
which for outs/bip is 2.1018 runs. The average ERA is high because each pitcher is counted equally regardless of innings.
Regression equation coefficients for predicting ERA from a single other statistic
Item    Mean   Slope      Int        Corr-r    Sigma
O/BIP   .692   -38.1409   31.6966    -0.6523   2.1018
OXS     .163   37.7772    -0.8879    0.8778    1.3285
OPS     .806   16.3936    -7.9314    0.8475    1.4719
OBA     .349   45.4494    -10.5559   0.8256    1.5647
SLG     .458   21.6610    -4.6283    0.7900    1.7001
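The regression table can be used directly to predict ERA from any one statistic and to convert the correlation R into R-squared. The following is a minimal sketch; the dictionary and function names are hypothetical, and the coefficients are copied from the table.

```python
# (slope, intercept, correlation r) for ERA regressed on each statistic.
COEFFS = {
    "O/BIP": (-38.1409, 31.6966, -0.6523),
    "OXS":   (37.7772, -0.8879, 0.8778),
    "OPS":   (16.3936, -7.9314, 0.8475),
    "OBA":   (45.4494, -10.5559, 0.8256),
    "SLG":   (21.6610, -4.6283, 0.7900),
}


def predict_era(stat, value):
    """Least-squares prediction: ERA = slope * value + intercept."""
    slope, intercept, _ = COEFFS[stat]
    return slope * value + intercept


def r_squared(stat):
    """Share of ERA variance explained by the statistic."""
    return COEFFS[stat][2] ** 2


era_at_mean = predict_era("O/BIP", 0.692)   # prediction at the mean outs/bip
share_obip = r_squared("O/BIP")             # about .43
share_oxs = r_squared("OXS")                # about .77
```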
What is the difference between a good pitcher and a poor pitcher? If the good pitcher turns a home run into an extra base hit, his outs/bip
goes down because there are more BIPs and the same number of outs. If he turns an extra base hit into a single, he gets no gain at all. If he
turns an ordinary out into a strikeout, his outs/bip goes down again because there are fewer BIPs and fewer outs. If he turns a walk into a ball
in play, his outs/bip is unaffected, assuming the batter would make outs at the same rate as existing batters. The only time he makes a gain is
if he turns a single (or, equally, a non-HR hit) into an out.
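The cases above can be checked numerically. This is only a sketch: it starts from the second group's outs and balls in play per nine innings (18.29 and 25.60, from the tables in this article) and changes one event per nine innings, which is an arbitrary amount chosen for illustration.

```python
def outs_per_bip(outs, bip):
    """Outs on balls in play divided by balls in play."""
    return outs / bip


# Second-group rates per nine innings.
OUTS, BIP = 18.29, 25.60
baseline = outs_per_bip(OUTS, BIP)

# Each scenario changes one event per nine innings (hypothetical size).
scenarios = {
    "home run -> extra base hit in play": outs_per_bip(OUTS, BIP + 1),
    "extra base hit -> single": outs_per_bip(OUTS, BIP),
    "out in play -> strikeout": outs_per_bip(OUTS - 1, BIP - 1),
    # The new ball in play becomes an out at the existing rate.
    "walk -> ball in play": outs_per_bip(OUTS + baseline, BIP + 1),
    "hit in play -> out in play": outs_per_bip(OUTS + 1, BIP),
}

for label, value in scenarios.items():
    print(f"{label}: {value - baseline:+.3f}")
```

Only the last scenario raises outs/bip; the first and third lower it even though the pitcher has done something good.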
Looking at linear weights instead, you can see that only about 25% of the total difference between pitchers in the 2nd and 5th groups is
explained by outs/bip. Since doubles and triples count as singles in outs/bip, only the difference between a single and the double or triple is
counted. The difference in actual runs is slightly higher than the difference in earned runs.
Components of the ERA difference between the sixth group (5.22) and the second group (3.28)
(all figures per nine innings)
item   2nd     5th     diff    weight   net
ERA    3.28    5.22    1.94    1.00     1.940
R      3.59    5.65    2.06    1.00     2.060

HR      .83    1.29     .46    1.40      .644
3B      .15     .22     .07     .55*     .038
2B     1.59    2.14     .55     .38*     .220
BB     2.67    3.37     .70     .33      .231
HB      .28     .41     .13     .33      .053
SO     7.28    5.71   -1.57    -.28      .440
1B     8.15   10.13    1.98     .47      .931
out   18.29   19.68    1.39    -.28     -.389
total                                   2.168

*as compared to a single
For example: the second group gave up .83 home runs per nine innings. The fifth group gave up 1.29.
The difference is .46 home runs. Multiplied by a linear weight of 1.4, we see that the difference in
home run rates was responsible for .644 runs per nine innings between the two groups.
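The same multiplication can be repeated for every row of the table. The sketch below uses the rounded figures as printed, so some products differ slightly from the published net column, which was presumably computed from unrounded rates. The variable names are illustrative.

```python
# item: (second group, fifth group, linear weight), per nine innings,
# copied from the table of ERA components.
COMPONENTS = {
    "HR": (0.83, 1.29, 1.40),
    "3B": (0.15, 0.22, 0.55),
    "2B": (1.59, 2.14, 0.38),
    "BB": (2.67, 3.37, 0.33),
    "HB": (0.28, 0.41, 0.33),
    "SO": (7.28, 5.71, -0.28),
    "1B": (8.15, 10.13, 0.47),
    "out": (18.29, 19.68, -0.28),
}


def run_contribution(second, fifth, weight):
    """Runs per nine innings attributed to one item."""
    return (fifth - second) * weight


net = {item: run_contribution(*row) for item, row in COMPONENTS.items()}
total = sum(net.values())

# Share of the total tied to results on balls in play (the 1B and out rows).
in_play_share = (net["1B"] + net["out"]) / total

for item, value in net.items():
    print(f"{item:>3}: {value:+.3f}")
print(f"total: {total:.3f}, in-play share: {in_play_share:.0%}")
```

The in-play share comes out near the 25% mentioned in the text.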
There has been talk that on-base average is more important than slugging average in correlating with team runs. This was first reported in
Michael Lewis’ Moneyball, where Oakland had determined that the ratio should be 3 to 1. Mark Pankin gave a talk at SABR which
proposed 2 to 1, or to be more exact, 1.8 to 1. Victor Wang in this publication verified 1.8. However, he also showed that the differences
in correlation to team runs across the various ratios were very small. OPS is an approximation of NOPS, normalized OPS, which correlates
directly with run scoring. The formula is oba/lg + slg/lg - 1, where lg is the league average of the statistic in question. This has the effect of
weighting oba slightly higher than slg, since the league average oba is around .333 and slugging is around .400. This results in weighting oba
1.20 times slg. An increase of 20% in oba and 20% in slugging results in run scoring at 40% above average. However, OPS shows only a 20%
increase. NOPS itself is an approximation of linear weight runs.
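A minimal sketch of the NOPS formula follows, using the approximate league averages quoted above (.333 and .400) as default arguments. The function names are illustrative.

```python
def ops(oba, slg):
    """On-base plus slugging."""
    return oba + slg


def nops(oba, slg, lg_oba=0.333, lg_slg=0.400):
    """Normalized OPS: oba/lg + slg/lg - 1, where 1.00 is league average."""
    return oba / lg_oba + slg / lg_slg - 1.0


# A team 20% above the league in both on-base and slugging.
oba, slg = 0.333 * 1.2, 0.400 * 1.2

print(f"NOPS: {nops(oba, slg):.2f}")
print(f"OPS relative to league: {ops(oba, slg) / ops(0.333, 0.400):.2f}")
# Dividing by the league averages weights a point of oba more than a point of slg.
print(f"implied oba weight relative to slg: {0.400 / 0.333:.2f}")
```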
When you compare OBA and SLG to team runs, you have to use runs per inning batted, because high scoring teams will usually win more
games and therefore bat in fewer innings. This is of course due to the fact that the home team does not bat in the last of the ninth if ahead.
Using runs per game will often show these teams scoring fewer runs than predicted.
Here is a trick you can use to amaze your friends. All you need are complete batting and pitching stats for a team, except wins and losses.
You can calculate innings batted by taking total plate appearances minus runs and minus left on base, all divided by 3. You already have
innings pitched from the team pitching. Wins are simply equal to games over 2 plus innings pitched minus innings batted. This is because
for each home win, you have one fewer inning batted, and for each road win, you have one more inning pitched. The only problem involves
games that are won in the last inning by the home team, but these tend to cancel out. The standard deviation using this method is about 2
wins per year, compared to 4 using normal methods employing team runs scored and allowed. This does not work for teams that have an
unbalanced home/away schedule, which occurred during the strikes of 1981 and 1994. Also in 1991, Montreal was forced to play their last
26 games on the road because of structural problems at Olympic Stadium, losing about 13 home games.
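The trick can be sketched in a few lines. The season below is hypothetical and idealized: every game lasts nine innings and the home team never bats in the ninth when it wins, so walk-off wins and extra innings are ignored. The function names and all the numbers are made up for illustration.

```python
def innings_batted(plate_appearances, runs, left_on_base):
    """Innings batted: (PA - R - LOB) / 3, i.e. batting outs over three."""
    return (plate_appearances - runs - left_on_base) / 3


def estimated_wins(games, innings_pitched, innings_bat):
    """Wins = games/2 + innings pitched - innings batted."""
    return games / 2 + innings_pitched - innings_bat


# Hypothetical balanced schedule: 81 home games and 81 road games.
home_wins, home_losses, road_wins, road_losses = 50, 31, 40, 41
games = home_wins + home_losses + road_wins + road_losses

# The home team skips the bottom of the ninth in each home win.
ib = 8 * home_wins + 9 * home_losses + 9 * (road_wins + road_losses)
# The road team does not pitch the bottom of the ninth in each road loss.
ip = 9 * (home_wins + home_losses) + 9 * road_wins + 8 * road_losses

print(estimated_wins(games, ip, ib))

# Innings batted recovered from hypothetical batting totals.
runs, lob = 780, 1150
pa = 3 * ib + runs + lob
print(innings_batted(pa, runs, lob))
```

In this idealized season the estimate reproduces the 90 actual wins exactly; real seasons add the noise described above.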
An aside on comparing runs to wins: Bill James’ Pythagorean Theorem calculates winning percentage by taking runs scored squared divided
by the sum of runs scored squared plus runs allowed squared. The best method varies over the years, but in some cases, using an exponent
of 1.83 works better than 2. This formula caused Bill to take 20 years to figure the relationship between runs created and wins when he
developed Win Shares. There is a much easier formula that makes the relationship between runs and wins much more obvious, namely 10
runs per win. This states that for every 10 more runs scored or 10 fewer runs allowed, the team will win one more game. So wins equals runs
scored minus runs allowed, all over 10, plus games over 2. Bill’s formula works better for high scoring teams or teams with a big run
differential, like 160 or more. The easy formula can match this using a divisor of 10 times the square root of runs scored per inning by both
teams. For example, if each team scores 6 runs per game, the runs scored per inning by both teams would be 12/9, which would give 11.5
runs per win. Since most teams have small run differences, extreme cases where one method might be better than another get swamped, so
you have to look at the unusual cases to find a difference in methods.
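The estimators described above can be written side by side. This is a sketch, not the code used for the article: the function names are illustrative, and the 800 runs scored and 700 runs allowed are hypothetical.

```python
import math


def pythagorean_wins(runs_scored, runs_allowed, games, exponent=2.0):
    """Wins from the Pythagorean formula with a chosen exponent."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return games * rs / (rs + ra)


def linear_wins(runs_scored, runs_allowed, games, runs_per_win=10.0):
    """Wins = games/2 + (runs scored - runs allowed) / runs per win."""
    return games / 2 + (runs_scored - runs_allowed) / runs_per_win


def runs_per_win_sqi(runs_scored, runs_allowed, innings):
    """Divisor of 10 times the square root of runs per inning by both teams."""
    return 10.0 * math.sqrt((runs_scored + runs_allowed) / innings)


# The example from the text: 6 runs per game by each team over 9 innings.
print(f"{runs_per_win_sqi(6, 6, 9):.1f} runs per win")

# A hypothetical team over 162 games.
rs, ra, games = 800, 700, 162
print(f"Pythag(2):    {pythagorean_wins(rs, ra, games):.1f}")
print(f"Pythag(1.83): {pythagorean_wins(rs, ra, games, exponent=1.83):.1f}")
print(f"runs(10):     {linear_wins(rs, ra, games):.1f}")
```

The square-root divisor can be passed to linear_wins as runs_per_win to get the adjusted version of the easy formula.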
Typical results for various periods are shown below.
Standard Deviations of various Wins-From-Runs estimators
          Minimum run           standard deviation in wins
period    difference    teams   runs(10)  runs(sqi)  Pythag(2)  Pythag(1.83)
1900-19     0            328    4.65      4.34       4.51       4.32
1900-19   160             79    4.88      4.30       5.08       4.89

1920-99     0           1622    4.06      3.95       4.05       3.96
1920-99   160            269    4.29      3.91       4.11       3.95
1920-99   240             70    5.00      4.47       4.84       4.65
1920-99   300             18    6.49      5.12       5.04       4.83

2000-08     0            270    4.09      4.06       4.06       4.08
2000-08   160             33    4.20      4.13       4.29       4.27
If you knew exactly how good each team was at the beginning of the season and tried to predict how many games they would win, the
expected standard deviation for each team would be square root of (pqn), where p is the probability of winning, q is the probability of losing
(which equals 1-p) and n is the number of games.
For a .500 team over 162 games, this is the square root of 40.5 or 6.36. Yet the derived values are much less, around 4. This is because at the
end of the season you have more data, namely the actual number of runs scored and allowed. Even though a team was known to be a .500 team,
if by luck they scored more runs than allowed, they would usually win a few more games. So what is the absolute minimum that can be achieved?
Looking at all teams from 1920 to 2008 who had a run differential of 5 or less (84 teams), the standard deviation of wins was 4.07, which
suggests that 4 is about the best that can be done.
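The binomial figure can be reproduced directly. This is a small sketch that assumes independent games with a fixed win probability; the function name is illustrative.

```python
import math


def win_sd(p, games):
    """Standard deviation of wins for win probability p: sqrt(p * q * n)."""
    q = 1.0 - p
    return math.sqrt(p * q * games)


# A true .500 team over a 162-game season: the square root of 40.5.
print(f"{win_sd(0.5, 162):.2f}")
# A stronger team has a slightly smaller spread.
print(f"{win_sd(0.6, 162):.2f}")
```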
Now let’s get back to the OBA multiplier. I used values of x between 0.5 and 1.5 and compared OPS as calculated by x times OBA + (2-x)
times SLG to team runs per 27 outs. So the ratio of the OBA weight to the SLG weight equals x over (2-x). There was very little difference for
an OBA to SLG ratio of about 1.3 up to 2.3. Looking at 2000 through 2008, the lowest sigma was around 0.16 runs per game or about 25 runs
per season, which is about the best you can do without bringing in steals and double plays. The broad minimum varied only from around 0.158
to 0.160 or about 0.3 runs per year.
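The sweep over x can be sketched as follows. The RATIO column in the tables corresponds to x/(2-x); for example, x = 0.50 gives 0.33. The team lines below are invented purely to make the example run, and sigma is taken here as the root mean square of the residuals, which is an assumption about how the published figure was computed.

```python
def weighted_ops(oba, slg, x):
    """Weighted OPS: x * OBA + (2 - x) * SLG."""
    return x * oba + (2.0 - x) * slg


def oba_to_slg_ratio(x):
    """Weight on OBA relative to the weight on SLG."""
    return x / (2.0 - x)


def fit_line(xs, ys):
    """Least squares slope, intercept, and residual sigma."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((a - mean_x) ** 2 for a in xs)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [b - (slope * a + intercept) for a, b in zip(xs, ys)]
    sigma = (sum(r * r for r in residuals) / n) ** 0.5
    return slope, intercept, sigma


def sweep(teams, weights):
    """Sigma of the fit for each weight x; teams are (oba, slg, runs per 27 outs)."""
    runs = [r for _, _, r in teams]
    return {
        x: fit_line([weighted_ops(o, s, x) for o, s, _ in teams], runs)[2]
        for x in weights
    }


# Hypothetical team lines, not real data.
TEAMS = [
    (0.320, 0.390, 4.10),
    (0.335, 0.420, 4.70),
    (0.345, 0.445, 5.15),
    (0.328, 0.432, 4.60),
    (0.352, 0.410, 4.95),
]

for x, sigma in sweep(TEAMS, [0.5, 1.0, 1.5]).items():
    print(f"x={x:.2f} ratio={oba_to_slg_ratio(x):.2f} sigma={sigma:.4f}")
```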
Results of team run regressions for various weightings of OBA and SLG, 2000-2008
OBA   SLG   RATIO  OPS   RPG   SLOPE    INT      CORR-R  SIGMA
0.50  1.50  0.33   .804  4.82  11.3611  -4.3077  0.9211  0.1959
0.55  1.45  0.38   .799  4.82  11.5871  -4.4373  0.9238  0.1927
0.60  1.40  0.43   .795  4.82  11.8191  -4.5697  0.9265  0.1893
0.65  1.35  0.48   .790  4.82  12.0579  -4.7053  0.9291  0.1860
0.70  1.30  0.54   .786  4.82  12.2997  -4.8411  0.9318  0.1827
0.75  1.25  0.60   .781  4.82  12.5447  -4.9773  0.9341  0.1796
0.80  1.20  0.67   .777  4.82  12.7927  -5.1136  0.9364  0.1766
0.85  1.15  0.74   .772  4.82  13.0545  -5.2583  0.9388  0.1733
0.90  1.10  0.82   .768  4.82  13.3139  -5.3988  0.9409  0.1705
0.95  1.05  0.90   .763  4.82  13.5808  -5.5427  0.9429  0.1676
1.00  1.00  1.00   .759  4.82  13.8493  -5.6854  0.9447  0.1650
1.05  0.95  1.11   .754  4.82  14.1102  -5.8200  0.9460  0.1631
1.10  0.90  1.22   .750  4.82  14.3830  -5.9612  0.9473  0.1612
1.15  0.85  1.35   .745  4.82  14.6527  -6.0977  0.9484  0.1596
1.20  0.80  1.50   .741  4.82  14.9256  -6.2340  0.9491  0.1585
1.25  0.75  1.67   .736  4.82  15.1916  -6.3628  0.9495  0.1579
1.30  0.70  1.86   .732  4.82  15.4619  -6.4924  0.9495  0.1579
1.35  0.65  2.08   .727  4.82  15.7255  -6.6147  0.9491  0.1585
1.40  0.60  2.33   .723  4.82  15.9762  -6.7253  0.9481  0.1600
1.45  0.55  2.64   .718  4.82  16.2319  -6.8372  0.9468  0.1619
1.50  0.50  3.00   .714  4.82  16.4627  -6.9291  0.9447  0.1650
The 1920-1999 period had a slightly higher run variation between predicted and actual, but the broad minimum had about the same range. I
started at 1920 because that is the first year that left on base stats were kept.
Results of team run regressions for various weightings of OBA and SLG, 1920-1999
OBA   SLG   RATIO  OPS   RPG   SLOPE    INT      CORR-R  SIGMA
0.50  1.50  0.33   .747  4.47  11.1666  -3.8749  0.9197  0.2631
0.55  1.45  0.38   .744  4.47  11.3890  -4.0077  0.9231  0.2577
0.60  1.40  0.43   .741  4.47  11.6256  -4.1498  0.9268  0.2516
0.65  1.35  0.48   .739  4.47  11.8700  -4.2962  0.9304  0.2455
0.70  1.30  0.54   .736  4.47  12.1094  -4.4375  0.9337  0.2398
0.75  1.25  0.60   .733  4.47  12.3567  -4.5832  0.9371  0.2338
0.80  1.20  0.67   .730  4.47  12.6045  -4.7278  0.9402  0.2281
0.85  1.15  0.74   .727  4.47  12.8579  -4.8750  0.9432  0.2225
0.90  1.10  0.82   .724  4.47  13.1182  -5.0258  0.9462  0.2168
0.95  1.05  0.90   .721  4.47  13.3750  -5.1724  0.9489  0.2114
1.00  1.00  1.00   .718  4.47  13.6337  -5.3190  0.9514  0.2063
1.05  0.95  1.11   .715  4.47  13.8978  -5.4679  0.9537  0.2014
1.10  0.90  1.22   .712  4.47  14.1684  -5.6199  0.9561  0.1964
1.15  0.85  1.35   .709  4.47  14.4169  -5.7546  0.9574  0.1935
1.20  0.80  1.50   .706  4.47  14.6867  -5.9029  0.9591  0.1896
1.25  0.75  1.67   .703  4.47  14.9396  -6.0378  0.9602  0.1872
1.30  0.70  1.86   .700  4.47  15.1927  -6.1712  0.9608  0.1858
1.35  0.65  2.08   .698  4.47  15.4390  -6.2986  0.9610  0.1852
1.40  0.60  2.33   .695  4.47  15.6761  -6.4180  0.9608  0.1856
1.45  0.55  2.64   .692  4.47  15.9015  -6.5279  0.9600  0.1876
1.50  0.50  3.00   .689  4.47  16.1211  -6.6325  0.9587  0.1905
1900 to 1920 was quite a bit worse, and the broad minimum was around a 1.0 OBA to SLG ratio. This was probably due to the fact that
there were fewer homers and a higher correlation between batting average and slugging average. Stolen bases, not included in OPS, were
more important. There were more errors and more unearned runs. The percent of unearned runs was 32% in 1900, 20% in 1920, 14% in
1930 and 8% today. The lack of left on base stats made the innings batted calculation less accurate, as I had to derive it using team wins and
losses instead. Another problem resulted from trying to apply the same regression line over a period where things were changing. If the
sample was divided into two parts, then the best result for 1900-1909 was a sigma of 0.2563 runs per game at a ratio of 1.35, and for 1910-19,
it was a sigma of 0.2140 runs per game at a ratio of 1.86. Changing to 5 year intervals reduced the sigma in runs per game by another 10%, with
a ratio around 1.8.
Results of team run regressions for various weightings of OBA and SLG, 1900-1920
OBA   SLG   RATIO  OPS   RPG   SLOPE    INT      CORR-R  SIGMA
0.50  1.50  0.33   .658  4.02  12.7592  -4.3779  0.8774  0.3519
0.55  1.45  0.38   .657  4.02  12.9292  -4.4791  0.8791  0.3497
0.60  1.40  0.43   .656  4.02  13.0992  -4.5801  0.8805  0.3477
0.65  1.35  0.48   .655  4.02  13.2704  -4.6816  0.8820  0.3457
0.70  1.30  0.54   .655  4.02  13.4375  -4.7801  0.8832  0.3441
0.75  1.25  0.60   .654  4.02  13.6058  -4.8791  0.8842  0.3426
0.80  1.20  0.67   .653  4.02  13.7725  -4.9768  0.8852  0.3413
0.85  1.15  0.74   .652  4.02  13.9374  -5.0731  0.8859  0.3402
0.90  1.10  0.82   .651  4.02  14.0990  -5.1669  0.8865  0.3394
0.95  1.05  0.90   .651  4.02  14.2617  -5.2612  0.8870  0.3387
1.00  1.00  1.00   .650  4.02  14.4171  -5.3505  0.8872  0.3385
1.05  0.95  1.11   .649  4.02  14.5660  -5.4353  0.8870  0.3387
1.10  0.90  1.22   .648  4.02  14.7205  -5.5235  0.8869  0.3388
1.15  0.85  1.35   .647  4.02  14.8577  -5.6003  0.8863  0.3397
1.20  0.80  1.50   .646  4.02  14.9973  -5.6784  0.8856  0.3407
1.25  0.75  1.67   .646  4.02  15.1290  -5.7511  0.8845  0.3422
1.30  0.70  1.86   .645  4.02  15.2525  -5.8183  0.8832  0.3440
1.35  0.65  2.08   .644  4.02  15.3677  -5.8800  0.8815  0.3463
1.40  0.60  2.33   .643  4.02  15.4711  -5.9339  0.8795  0.3491
1.45  0.55  2.64   .642  4.02  15.5680  -5.9835  0.8771  0.3523
1.50  0.50  3.00   .642  4.02  15.6545  -6.0262  0.8745  0.3558
So the ratio of the importance of oba to slg for optimal run scoring is probably higher than unity, but the actual difference in team runs over
the course of a season is small.
Pete Palmer, [email protected] ♦