Do Two Wrongs Make a Right in NBA Officiating?

MIT Sloan Sports Analytics Conference 2012
March 2-3, 2012, Boston, MA, USA
An Analysis of Referee Bias in Make-Up Call Situations
Paul Gift
Pepperdine University, Graziadio School of Business and Management
Los Angeles, CA, USA, 90045
Email: [email protected]
Abstract
A make-up call is a particularly enigmatic type of potential referee bias in sports.
Examples could include a wrong call to balance a prior wrong call or a questionable call to
balance a prior questionable call. Motivation for a make-up call may derive from the
rationalization that “two wrongs make a right,” from crowd or team pressures, or from a
league’s explicit or implicit incentives. In this paper, I investigate whether NBA referees
may consciously or subconsciously be affected by such factors using play-by-play data of
over 1.1 million possessions from 6,538 games played during five seasons from 2006-2011.
I examine the probability of various judgment call turnovers on one team when a
judgment call was recently made against the opposing team, using information on non-judgment turnovers to control for possible changes in player aggression and awareness.
Findings support a make-up call hypothesis, whether intentional or not. Results do not
support a hypothesis of make-up non-calls; i.e., that a referee is less likely to make a
judgment call because a potentially incorrect call had recently been made on the same
team. This paper sheds important light on an oft-suggested but seldom-studied type of
potential behavioral bias.
1 Introduction
On Jan. 24, 2011, reporter Jon Krawczynski tweeted, “Ref Bill Spooner told Rambis he'd 'get
it back' after a bad call. Then he made an even worse call on Rockets. That's NBA officiating folks.”
Referee Spooner subsequently sued the Associated Press and its sportswriter, and a settlement was
reached in December. He claimed he told coach Rambis he would “get back” to him after reviewing
the tape. The NBA investigated and concluded that the referee had acted properly. [1] Regardless of
the outcome, this story is one of numerous anecdotes and suggestions that can be found regarding the
possible existence of make-up calls. In general, a make-up call can be thought of as a referee’s
conscious or subconscious balancing of a wrong/questionable call on one team with a subsequent
wrong/questionable call on the opposing team. In this paper, I move out of the realm of anecdotes
and empirically investigate make-up call situations in the NBA using actual in-game decisions of
referees for more than 6,500 games played over five seasons.
The sports world can be a useful economic research lab because there is typically very
detailed and precise reporting of individual behavior and decisions in a variety of situations. It is a
fitting setting to investigate referee bias as allegations of make-up calls are quite common in a number
of sports. This may involve the judgment calls of pass interference or roughing the passer in football,
strikes and balls in baseball, cross-checking or two-man advantages in hockey, red/yellow cards or
offside in soccer, point deductions in mixed martial arts and boxing, and offensive fouls or certain
violations in basketball.
Prior research on NBA referees has examined home and losing team bias [2], racial bias [3],
and omission bias [4]. Sequential judgment effects have been analyzed for penalty decisions in soccer
[5], gymnastics judging [6], and ball/strike calls in baseball [4]. Some were done in an experimental
lab setting and others analyzed actual judge/referee decisions. I model and analyze make-up call
situations in the NBA using in-game play-by-play data to investigate the changes in probability of
certain referee judgment calls following recent judgment calls on the opposing team. While make-up
calls cannot be “proven,” my findings are consistent with a make-up call hypothesis. I also investigate
the changes in probability of certain referee judgment calls following recent judgment calls on the same
team. Results initially appear to support this make-up non-call hypothesis, but a more detailed
analysis suggests that they are likely explained by changes in player awareness.
The NBA may have a latent history with make-up calls. [7] Shortly after the Tim Donaghy
scandal in the summer of 2007, the league hired former federal prosecutor Lawrence Pedowitz to
conduct an investigation of its refereeing structure. Among other things, Donaghy alleged that an
incorrect call had been made in a Minnesota/New Orleans game and he was later told by another
referee that they “could have made something up at the other end…calling a traveling violation on
Kevin Garnett.” [8] After interviewing every referee [9] as well as other team and league personnel,
Pedowitz found that there were two historical refereeing philosophies, the old and the new. Under
the old philosophy, “…if a referee recognized that he or his crew had made an incorrect call, a referee
might whistle a ‘make-up call’ soon thereafter.” [7] But, the league changed its officiating philosophy
in 2003, establishing 16 performance standards where referees were to “strive for the unattainable
goal of perfection,” “get the calls right,” and make “accurate calls, regardless of the circumstances of
the game.” [7] This would seem to at least discourage conscious make-up calls, but the effect on
subconscious ones remains debatable.
2 Data
Play-by-play data were obtained from basketballvalue.com for five NBA regular and post
seasons from 2006-07 to 2010-11, firmly within the time period of the new refereeing philosophy.
The data contain complete information for 6,538 games and over 1.2 million possessions.1 More than
300 random spot checks were conducted and compared to online play-by-play information from
nba.com (the original source), espn.com, and cbssports.com. The data were then distilled down to
the possession-by-possession level.2 Possessions were dropped if there were less than 24 seconds
remaining in the period or less than two minutes remaining in the 4th quarter or overtime. This was
done in an effort to exclude situations where the offense does not execute a typical full possession or
the defense commits an intentional foul when trailing towards the end of a game. The final dataset
contains over 1.1 million possessions. The outcome of interest for each possession is whether it ends
in a turnover, and, if so, which type.
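The exclusion rule described above can be sketched as a simple filter. This is only an illustration, not the code used for the paper: the `Possession` record, its field names, the choice to measure time at the start of the possession, and the coding of overtime as period 5 or higher are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Possession:
    period: int               # 1-4 for regulation; 5+ assumed to denote overtime (hypothetical coding)
    seconds_remaining: float  # seconds left in the period at the start of the possession (assumed field)


def keep_possession(p: Possession) -> bool:
    """Return True if the possession survives the end-of-period exclusions."""
    # Drop possessions with less than 24 seconds left in any period.
    if p.seconds_remaining < 24:
        return False
    # Drop possessions with less than two minutes left in the 4th quarter or overtime.
    if p.period >= 4 and p.seconds_remaining < 120:
        return False
    return True


sample = [
    Possession(1, 300.0),  # kept
    Possession(2, 20.0),   # dropped: under 24 seconds
    Possession(4, 100.0),  # dropped: under two minutes in the 4th quarter
    Possession(4, 500.0),  # kept
    Possession(5, 90.0),   # dropped: under two minutes in overtime
]
kept = [p for p in sample if keep_possession(p)]
```

Applying both conditions to every period keeps the rule in one place; a possession late in the first three quarters is dropped only by the 24-second condition.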
The make-up call situations I examine involve offensive fouls and violations. While
defensive foul calls could technically be “made up,” their impact on the game can vary greatly. The
effect on a player’s foul total may be de minimis, it may put a star or bench player in foul trouble, or it
may possibly add points as a shooting foul. However, every single time an offensive foul or violation
is called it results in a turnover and loss of possession. This could be easily remedied on a conscious
or subconscious level by whistling an erroneous or questionable offensive foul or violation on the
opposing team, thus “evening things out.”
Footnote 1: 18 games were missing and seven were dropped due to incomplete data.
Footnote 2: I use the definition of a possession that is associated with the Points Per Possession statistic. Under this
definition, a team’s possession ends with a made basket or free throw, a turnover, or a defensive rebound or
out of bounds to the opposing team. This differs from the NBA’s definition of a team possession which “ends
when the defensive team gains possession or there is a field goal attempt which hits the rim.” [10]
Researchers have noted [11] that time pressure and ambiguity are two conditions under
which implicit attitudes may arise. The best candidates for ambiguous calls are those involving the
most judgment. I classify offensive fouls, traveling violations, and 3 second violations as judgment
calls (JCs). In the sample, each respectively occurs on average 2.08, 1.28, and .36 times every 100
possessions. I classify 24 second violations, step out-of-bounds turnovers, bad pass turnovers, lost
ball turnovers, bad pass steals, and lost ball steals as non-judgment calls (NJCs). Each respectively
occurs on average .64, .28, 1.58, .94, 4.70, and 3.09 times every 100 possessions.3 These classifications
do not imply that NJCs involve a lack of judgment, just significantly less than JCs.4
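The classification and the reported frequencies can be collected in a small lookup, which also makes it easy to convert a rate per 100 possessions into "one event every N possessions." The dictionary keys and function names below are illustrative labels, not identifiers from the underlying data; the rates are those reported in the text.

```python
# Sample frequencies per 100 possessions, as reported in the text.
JUDGMENT_CALLS = {
    "offensive_foul": 2.08,
    "traveling": 1.28,
    "three_second_violation": 0.36,
}
NON_JUDGMENT_CALLS = {
    "twenty_four_second_violation": 0.64,
    "step_out_of_bounds": 0.28,
    "bad_pass_turnover": 1.58,
    "lost_ball_turnover": 0.94,
    "bad_pass_steal": 4.70,
    "lost_ball_steal": 3.09,
}


def classify(turnover: str) -> str:
    """Return 'JC' or 'NJC' for one of the nine turnover types."""
    if turnover in JUDGMENT_CALLS:
        return "JC"
    if turnover in NON_JUDGMENT_CALLS:
        return "NJC"
    raise KeyError(f"unclassified turnover type: {turnover}")


def possessions_per_event(rate_per_100: float) -> float:
    """Average number of possessions between events for a given rate per 100 possessions."""
    return 100.0 / rate_per_100


jc_total = sum(JUDGMENT_CALLS.values())       # judgment calls per 100 possessions
njc_total = sum(NON_JUDGMENT_CALLS.values())  # non-judgment calls per 100 possessions
three_sec_gap = possessions_per_event(JUDGMENT_CALLS["three_second_violation"])
```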
3 Hypotheses and Method
Hypothesis 1 Make-up Calls: The probability of a judgment call turnover on one team
will increase following a recent judgment call turnover on the opposing team.
Hypothesis 2 Make-up Non-Calls: The probability of a judgment call turnover on one team
will decrease following a recent judgment call turnover on the same team.
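The two hypotheses differ only in whose previous possession is examined. A minimal sketch of how the two lagged indicators could be built from an ordered list of possessions follows. The tuple layout, function name, and team labels are hypothetical, and the sketch ignores details such as period breaks.

```python
from typing import Dict, List, Optional, Tuple

# (team on offense, turnover type ending the possession or None); an assumed layout.
Possession = Tuple[str, Optional[str]]


def lagged_indicators(possessions: List[Possession], z_type: str) -> List[Tuple[str, Optional[str], int, int]]:
    """For each possession, flag whether z_type ended the opponent's previous
    possession (Hypothesis 1) or the same team's previous possession (Hypothesis 2)."""
    last_outcome: Dict[str, Optional[str]] = {}  # team -> outcome of its most recent possession
    rows = []
    for team, turnover in possessions:
        # With two teams in a game, there is at most one opponent entry.
        opponent_prev = [t for tm, t in last_outcome.items() if tm != team]
        z_opponent = int(bool(opponent_prev) and opponent_prev[0] == z_type)
        z_same = int(last_outcome.get(team) == z_type)
        rows.append((team, turnover, z_opponent, z_same))
        last_outcome[team] = turnover
    return rows


game = [
    ("LAL", "offensive_foul"),
    ("GSW", None),
    ("LAL", None),
    ("GSW", "traveling"),
    ("LAL", "traveling"),
]
rows = lagged_indicators(game, "offensive_foul")
```

In this toy sequence, the second possession is flagged for Hypothesis 1 (the opponent just committed an offensive foul) and the third for Hypothesis 2 (the same team committed one in its own previous possession).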
The probability of a particular turnover is not solely dependent on referee behavior. It is
also affected by team characteristics such as the players, their motivation to play the opposing team,
and the coaching styles; game/possession characteristics such as the score differential, time remaining,
home/away possession, and if it is a playoff game; the current aggression level of the players; and the
current awareness of the players. I model the probability of a turnover in the current possession as
\[ \Pr(Y) = f_Y\big(\mathit{team},\ \mathit{ref}(X, Z),\ \mathit{pag}(X, Z),\ \mathit{paw}_Y(X, Z)\big) \]
where Y is an indicator variable equal to one if there is a particular turnover in the current possession
and zero otherwise, X is a vector of game/possession characteristics, Z is an indicator variable equal
to one if there was a particular turnover in the opponent’s previous possession and zero otherwise,
team is a vector of observable and unobservable team characteristics, ref is a referee behavior function,
pag is a player aggression function, and paw is a player awareness function. I control for team factors
using a regression model with offense-defense-season fixed effects. The fixed effects account for any
invariant characteristics among the two teams in a season; e.g., during possessions of the Lakers’
offense against the Warriors’ defense in 2010-11. The regression model is
\[ y_{ij} = \alpha_i + z_{ij}\beta_1 + x_{ij}'\beta_2 + \varepsilon_{ij} \qquad (1) \]
for i = 1…4,350 (30×29×5) offense-defense-season combinations and j = 1…J_i possessions. The
parameter of interest is β1, the marginal impact of a particular recent turnover by the opposing team
on the probability of a particular turnover by the current team. Interactions of Z and X were tested
but were mostly insignificant and did not pass the Likelihood Ratio test.
Footnote 3: These nine turnovers have the highest frequency of occurrence in the data. Prior research [2] has classified the
JCs as calls involving more referee discretion and five of the six NJCs as events involving less discretion. In
addition, whether true or not, Tim Donaghy used an erroneous traveling call as an example of a make-up call
situation. And, former NBA player and current TV analyst, Chris Webber, recently stated [12], “I'd just take
anything subjective out of the game, so the charge [an offensive foul] is something that there's not a definite rule
on…It has to be the same thing every time.” (emphasis and brackets added)
Footnote 4: The data do not reveal the quality of individual calls.
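The linear-probability version of Equation 1 can be illustrated with a within-group (demeaning) estimator, which removes the fixed effect α_i by subtracting group means. The sketch below is a deliberate simplification and not the estimation code behind the paper: it omits the controls x_ij, uses a tiny made-up dataset, and the function name is hypothetical.

```python
from collections import defaultdict
from typing import Hashable, List, Sequence


def within_estimate(groups: Sequence[Hashable], y: Sequence[float], z: Sequence[float]) -> float:
    """Within (fixed-effects) estimate of beta_1 in y_ij = alpha_i + z_ij * beta_1 + e_ij."""
    # Accumulate sum of y, sum of z, and count for each group i.
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for g, yi, zi in zip(groups, y, z):
        s = sums[g]
        s[0] += yi
        s[1] += zi
        s[2] += 1

    # Regress demeaned y on demeaned z (a single regressor, so no matrix algebra is needed).
    num = 0.0
    den = 0.0
    for g, yi, zi in zip(groups, y, z):
        sum_y, sum_z, n = sums[g]
        y_dm = yi - sum_y / n
        z_dm = zi - sum_z / n
        num += z_dm * y_dm
        den += z_dm * z_dm
    if den == 0.0:
        raise ValueError("z has no within-group variation")
    return num / den


# Toy data: two offense-defense-season groups with different baseline turnover rates.
groups: List[str] = ["A", "A", "A", "A", "B", "B", "B", "B"]
z = [0, 0, 1, 1, 0, 0, 1, 1]  # indicator for the turnover in the relevant previous possession
y = [0, 0, 0, 1, 1, 0, 1, 1]  # indicator for the turnover in the current possession
beta1 = within_estimate(groups, y, z)
```

In the toy data the turnover rate is 0.5 higher when Z = 1 within each group, even though the groups have different baselines, and the within estimator recovers that common difference.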
To identify the probable cause of β1, a few basic assumptions are needed to disentangle
changes in referee behavior from possible changes in player aggression or awareness.
Assumption 1 Referee behavior does not affect nor is affected by non-judgment calls
Assumption 2 Recent turnovers do not affect player awareness of different turnovers
Assumption 3 If there are no changes in player awareness from recent non-judgment calls,
then there are also no changes from recent judgment calls
Assumption 4 If there are no changes in player aggression from recent non-judgment calls,
then there are also no changes from recent judgment calls
By way of example, Assumption 1 implies that referee behavior does not affect nor is affected by 24
second violations or lost ball turnovers. Assumption 2 implies that a bad pass turnover should not
make players think about traveling. Assumption 3 implies that if a 24 second violation does not make
players more aware of 24 second violations, then a 3 second violation does not make players more
aware of 3 second violations. Assumption 4 implies that if stepping out of bounds does not make
players more aggressive with respect to charging, then traveling does not make players more
aggressive with respect to charging.
4 Results
Equation 1 was estimated using a fixed effects logit (FE Logit) and fixed effects linear
probability model (FELPM). The FE Logit avoids the possible nonsense probability problem but
does not allow for estimation of average partial effects of β1. Estimates of β1 alone do not have much
intuitive value for interested parties such as league and team personnel, the media, or fans. The
FELPM allows for estimation of average partial effects, but may lead to some nonsense estimates if a
probability is near zero or one. In what follows, I report sign and significance from the FE Logit and
the average partial effect (in percent change format) from the FELPM.5 Sign and significance results
for β1 were qualitatively similar for both models so this distinction is largely academic.
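The "percent change format" and the nonsense-probability problem can both be illustrated with simple arithmetic. The text does not spell out the conversion, so the sketch below assumes the percent change is the FELPM partial effect divided by a baseline probability. The baseline uses the reported 0.36 3 second violations per 100 possessions, while the two partial effects are hypothetical numbers chosen only for illustration.

```python
def percent_change(beta1: float, baseline_prob: float) -> float:
    """Express a linear-probability partial effect as a percent of the baseline probability."""
    return 100.0 * beta1 / baseline_prob


def implied_probability(beta1: float, baseline_prob: float) -> float:
    """Probability implied by the linear model when the previous-possession event occurred (Z = 1)."""
    return baseline_prob + beta1


# Baseline: 3 second violations occur 0.36 times per 100 possessions in the sample.
baseline = 0.36 / 100

# Hypothetical partial effects (not estimates from the paper).
positive_effect = 0.0018
large_negative_effect = -0.0045

pct_up = percent_change(positive_effect, baseline)
pct_down = percent_change(large_negative_effect, baseline)
prob_down = implied_probability(large_negative_effect, baseline)
```

Under these assumptions, a partial effect that is more negative than the baseline probability produces a percent change below -100% and an implied probability below zero, which is the kind of nonsense estimate a linear probability model can yield for rare events.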
Hypothesis 1 (Make-Up Calls) is tested by examining the average percent change in the
probability of certain current turnovers (Y) associated with various turnovers in the opponent’s previous
possession (Z). These results are presented in Table 1. The upper-left panel shows the probability
changes of current JCs associated with various JCs in the previous possession. An offensive foul is
associated with a significant increase in probability of all three JCs in the next possession. Traveling
and 3 second violations have some positive, significant associations in the next possession and all JCs
have a positive, significant association with subsequent 3 second violations.6 Having controlled for
team factors and game/situational factors, these probability increases may be due to changes in
referee behavior or player behavior.
If changes in player aggression are affecting current turnovers, this should be apparent upon
examination of the upper-right panel of Table 1. This section shows the probability changes of
current JCs associated with various NJCs in the previous possession. These NJCs should not affect
referee behavior nor should they affect player awareness of JCs. 17 out of 18 of the statistics in this
area are negative or insignificant, supporting the notion that changes in player aggression are not
driving the earlier results.
Footnote 5: I make this distinction believing the FE Logit to be the preferred statistical model but the FELPM more useful
for interpreting the results.
Footnote 6: The percent changes in probability of current NJCs associated with JCs in the previous possession are all
insignificant or negative and have been excluded due to space limitations.
If changes in player awareness are affecting current turnovers, this should be apparent upon
examination of the lower-right panel of Table 1. This section shows the probability changes of
current NJCs associated with the same NJC in the previous possession. These NJCs should not affect
referee behavior and changes in player aggression have been previously rejected. All statistics in this
area are insignificant, supporting the notion that changes in player awareness are not driving the initial
results.
Table 1: Percent Change in Probability of Current Turnovers Associated with
Various Turnovers in the Opponent’s Previous Possession
Previous Possession of Opposing Team (Z), with Judgment Calls (JC) in the first three columns and Non‐Judgment Calls (NJC, the Player Aggression panel) in the last six:
Current Turnover (Y) | Offensive Foul | Traveling | 3 Second Violation | 24 Second Violation | Step Out of Bounds | Bad Pass Turnover | Lost Ball Turnover | Bad Pass Steal | Lost Ball Steal
Offensive Foul
21.7% **
5.2%
11.5%
8.1%
‐1.0%
‐9.1%
11.3%
‐0.3%
1.6%
Traveling
16.4% **
66.5% **
30.2% **
49.2% **
23.6%
6.6%
3.7%
0.3%
8.1%
‐8.5% *
‐4.8%
10.0%
‐19.1%
‐11.8%
3 Second Violation
52.3% *
53.1% **
‐14.3%
‐5.3%
0.5%
‐2.2%
Player Awareness
Same variable as Z
21.7% **
30.2% **
52.3% *
‐25.0%
28.2%
1.1%
‐15.4%
Notes: Current turnovers (Y) are on the vertical axis and previous turnovers (Z) are on the horizontal axis. The second most
recent possession by the opposing team is used for both steal categories since these turnovers are often immediately followed
by fastbreaks. ** and * indicate significance at the 1 and 5 percent levels, respectively.
Hypothesis 2 (Make-Up Non-Calls) is tested by examining the average percent change in the
probability of certain current turnovers (Y) associated with various turnovers in the previous
possession of the same team (Z). These results are presented in Table 2. Examination of the upper-left panel reveals that the probability of a current JC decreases only when the same JC was whistled in
the team’s previous possession. Estimates in the upper-right panel are all weak and insignificant,
suggesting that changes in player aggression are not meaningful. Results thus far are consistent with
the idea that referees are less likely to whistle the same JC on the same team two possessions in a row.
Evidence from the lower-right panel strongly suggests that player awareness adjusts after
committing a turnover, making the team much less likely to commit the same turnover again in their
next possession.7 This holds for every single NJC and is therefore a very plausible explanation for the
observed JC results. Thus, one cannot disentangle the effects of possible changes in referee behavior
from changes in player awareness. The strong results in this panel do not support Hypothesis 2 and
suggest that changes in player awareness explain most, if not all, of the observed JC probability
declines.
Footnote 7: Notice the two nonsense probability changes of -126.2% and -118.7%. Since 3 second violations and step out
of bounds are the most infrequent of the nine JC and NJC turnovers and their probability of occurrence is
sufficiently close to zero, the FELPM yields nonsense probabilities when there is a strong negative association
of Z to Y. This does not pose a concern because the numeric estimates are not important in this case. What
matters is the significant, negative estimate of β1, and both the FE Logit and FELPM models have this result.
Table 2: Percent Change in Probability of Current Turnovers Associated with
Various Turnovers in the Previous Possession of the Same Team
Previous Possession of Same Team (Z), with Judgment Calls (JC) in the first three columns and Non‐Judgment Calls (NJC, the Player Aggression panel) in the last six:
Current Turnover (Y) | Offensive Foul | Traveling | 3 Second Violation | 24 Second Violation | Step Out of Bounds | Bad Pass Turnover | Lost Ball Turnover | Bad Pass Steal | Lost Ball Steal
Offensive Foul | ‐25.8% ** | ‐0.4% | ‐0.2% | 7.4% | ‐4.9% | ‐0.1% | 11.5% | 3.0% | 1.7%
Traveling | 3.5% | ‐34.2% ** | 2.0% | ‐17.6% | ‐7.8% | 0.7% | ‐4.6% | 0.6% | ‐1.4%
3 Second Violation | ‐6.9% | ‐6.0% | ‐126.2% ** | 4.1% | ‐4.9% | 6.0% | ‐13.8% | ‐0.9% | ‐9.2%
Same variable as Z
‐44.9% **
‐6.0% **
‐25.8% **
Player Awareness
‐34.2% ** ‐126.2% ** ‐58.1% ** ‐118.7% *
‐35.0% **
‐10.3% **
Notes: Current turnovers (Y) are on the vertical axis and previous turnovers (Z) are on the horizontal axis. ** and * indicate
significance at the 1 and 5 percent levels, respectively.
5 Summary and Conclusions
The mysterious make-up call is something that can evoke strong opinions and emotions
from any sports fan or anyone who has ever played organized sports. In this study, I find evidence
consistent with a make-up call hypothesis but not supportive of a make-up non-call hypothesis at the
highest level of professional basketball (the NBA). The most likely source for a make-up call appears
to be an offensive foul. After an offensive foul on one team, the likelihood of a judgment call in the
next possession of the opposing team increases by 16-66% depending on the type of call. Increases
of this magnitude lend credence to the position that the make-up call balancing effect is subconscious.
For example, a 3 second violation occurs approximately one time every 300 possessions. A conscious
attempt to increase scrutiny of this violation on the opposing team would likely increase the
probability of this event by substantially more than 66% because of its infrequency of occurrence. On
the other hand, the statistics in this paper use information on all turnover calls. Thus, they are a likely
lower bound to the probability increases estimated with knowledge of truly questionable calls.
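As a worked illustration of the magnitudes involved, take the sample frequency of 0.36 3 second violations per 100 possessions reported in Section 2 and apply the 66% increase cited above. This assumes the percent change applies to that sample-average frequency.

```latex
\[
0.36 \times (1 + 0.66) \approx 0.60 \ \text{per 100 possessions},
\qquad
\frac{100}{0.36} \approx 278
\ \longrightarrow\
\frac{100}{0.60} \approx 167 \ \text{possessions per call}.
\]
```

Under that assumption, the expected gap between 3 second violations shrinks from roughly 278 possessions to roughly 167, a change that remains small in absolute terms.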
Due to space limitations, a thorough examination of the time trend of subsequent probability
increases was excluded from this paper. In general, I find significant and long-lasting increases in the
probability of offensive foul calls and 3 second violations on one team following an offensive foul on
the opposing team. These increases can remain noteworthy for up to 10-15 possessions (a team has
23 possessions in a typical quarter). Significant increases in the probability of traveling and 3 second
violations on one team following a traveling violation on the opposing team tend to disappear after 3-6 possessions. All probability increases on one team following a 3 second violation on the opposing
team are transitory. This appears to be consistent with a theory of subconscious make-up calls. A 3
second violation is often away from the ball, so the initial call may be less questionable with less
referee desire to subsequently balance. Traveling and offensive fouls occur mostly on or near the ball
where time and social pressures may be the greatest. In particular, when a block/charge event occurs,
everyone in the arena knows that something just happened, but there might be differences of opinion
as to what it was. The implicit desire to subsequently balance may be greater in these situations.
The subject of make-up calls in the NBA has important implications for not only employee
(referee) behavior and training, but for the reputation and perceived quality of a multibillion dollar
business enterprise. There are also important strategic implications for coaches and players. This
paper attempts to shed a little light on this frequently-suggested but seldom-examined topic.
Acknowledgments
Excellent research assistance was provided by Matthew “MGM” Morgan. I am grateful for helpful
comments and suggestions from Mike Beauregard, seminar participants at Pepperdine University, and
conference participants at the Academy of Business Research.
References
[1] Associated Press, “AP and NBA Referee Reach Settlement in Lawsuit over Reporter’s Twitter
Message,” Washingtonpost.com, 7 Dec. 2011, Web, 14 Dec. 2011.
<http://www.washingtonpost.com/national/ap-and-nba-referee-reach-settlement-in-lawsuit-over-reporters-twitter-message/2011/12/07/gIQAiXMqcO_story.html>
[2] Price et al., “Sub-Perfect Game: Profitable Biases of NBA Referees,” Working Paper, Oct. 2010.
[3] J. Price and J. Wolfers, “Racial Discrimination among NBA Referees,” The Quarterly Journal of
Economics, vol. 125, no. 4, pp. 1859-1877, Nov. 2010.
[4] T. Moskowitz and L. Wertheim, Scorecasting: The Hidden Influences Behind How Sports Are
Played and Games Are Won, 1st Edition, New York, NY, 2011.
[5] H. Plessner and T. Betsch, “Sequential Effects in Important Referee Decisions: The Case of
Penalties in Soccer,” Journal of Sport and Exercise Psychology, vol. 23, no. 3, pp. 254-259, Sept. 2001.
[6] Damisch et al., “Olympic Medals as Fruits of Comparison? Assimilation and Contrast in
Sequential Performance Judgments,” Journal of Experimental Psychology: Applied, vol. 12, no. 3, pp. 166-178, Sept. 2006.
[7] Lawrence Pedowitz, “Report to the Board of Governors of the National Basketball
Association,” Wachtell, Lipton, Rosen & Katz, 2008.
[8] Tim Donaghy, Personal Foul: A First-Person Account of the Scandal that Rocked the NBA, 1st
Edition, Sarasota, FL, 2009.
[9] Mark Stein, “League Won't Immediately Release Probe's Findings,” Espn.com, 29 Jul. 2008, Web, 4
Jan. 2012. <http://sports.espn.go.com/nba/news/story?id=3509550>
[10] Official Rules 2010-2011, National Basketball Association, Aug. 2010.
[11] Bertrand et al., “Implicit Discrimination,” The American Economic Review, vol. 95, no. 2,
pp. 94-98, May 2005.
[12] “If I Was Commish,” Open Court, NBA TV, 21 Dec. 2011, Television.