The Generalist Bias: Estimating the Value of Three-Point Shooting in the National
Basketball Association
A Thesis
Presented to
The Established Interdisciplinary Committee for Mathematics and Economics
Reed College
In Partial Fulfillment
of the Requirements for the Degree
Bachelor of Arts
Torrey Payne
May 2014
Approved for the Committee
(Mathematics and Economics)
Jeffrey Parker and Albert Kim
Table of Contents
Chapter 1
1.1 Introduction
1.2 Literature Review
Chapter 2: Data and Models
2.1 Data
2.2 Models
2.2.1 Basic Model
2.2.2 Full Performance Model
Chapter 3: Results and Discussion
3.1.1 Basic Performance Model: Two-point Shooting, Three-point Shooting, and standard performance statistics
3.1.2 Full Performance Model: Non-scoring Box Score Statistics & Advanced Player Statistics
3.2 Discussion
Conclusion
Appendix
Bibliography
Abstract
This thesis investigates the effect of scoring on real average salaries in the NBA,
using observations of newly signed contracts and the previous season’s performance
statistics. I use both box-score statistics and advanced statistics to analyze my data.
Results suggest that scoring from beyond the three-point line has a slightly larger impact
on wage than two-point scoring, but these results are not strongly confirmed. The true
impact of three-point shooting on salary is still unclear, but the evidence suggests it has at
least a mild impact.
Chapter 1
1.1 Introduction
This thesis investigates a field of economics that perhaps doesn’t receive as much
attention as its sports analytics cousin: sports economics. My thesis specifically
investigates the value of three-point shooting in the NBA labor market. The three-point
shot has only recently started to be fully utilized by basketball players around the world;
its introduction has fundamentally changed the pace of the game and the spacing on
offensive and defensive plays, and it has created countless highlights and moments.
Fig. 1. The layout of a standard NBA basketball court (Britannica 2013).
The basic rules of basketball address the number of players, positions, scoring,
violations, and fouls. At the competitive level, basketball teams are made up of 5
players on the court and 5 players sitting on the bench who can be used for substitution
during the whole period of the game. Each player is assigned a position on the court,
which is usually determined by the height of the player. The tallest player on the team
usually plays “center”, also known as position 5, while the shortest players play “guard”, or
positions 1 and 2. The “forwards” are medium height and play positions 3 and 4 (FIBA
2014).
A player scores when he manages to throw, or “shoot”, the ball into the basket,
with the ball passing through the basket from above the hoop. Scoring a basket increases
the team’s score by 3, 2 or 1 point. If the player successfully shoots from outside of the
three-point line, the basket is worth 3 points, otherwise it is worth 2 points. A player can
also score one point when shooting from the free throw line after a personal foul,
technical foul, or other violation (FIBA 2014).
Perhaps most interesting in the economics of professional basketball is the impact
of the 3-point line. The 3-point line was first used in a professional sports league in 1961
by the American Basketball League, which folded after less than two years (Wood 2013).
In the NBA, the line is 22 feet from the rim in the corners and 23 ft. 9 in. elsewhere. For
the WNBA and international play, the line is 20 ft. 6 in. In the NCAA, the line is 20 ft.
9 in. for men’s basketball and 19 ft. 9 in. for women’s basketball (the women’s and high
school lines are the same).
Fig. 2. The distance of the three-point line in different basketball leagues and
competitions (Condotta 2008).
The American Basketball Association, or ABA, adopted the line in 1967 as part
of its experimentation with fan-friendly ideas. “‘We called it the home run, because the 3-pointer was exactly that,’ George Mikan said, ‘it brought fans out of their seats’” (Wood
2013). The ABA and NBA merged in 1976, and in 1979 the NBA finally adopted the 3-point line. After its implementation, a whole generation of basketball coaches had to
rethink their fundamental understanding of the game, since this line gave a new incentive
to long-range shooting. Hubie Brown, a former ABA and NBA coach, is noted for saying
in the book Loose Balls:
“Don’t give them the 25-footer, which is something players had been
conditioned to do all their lives. [And] as a coach, if you have a
shooter with range, you have to give him the freedom to take the 25-footer, which is probably a philosophy that goes against what you
learned as a young coach—namely, pound the ball inside.” (Wood
2013)
Use of three-point shooting has exploded in the past decade. Ignoring the three-year period in which the NBA decided to shorten the 3-point line (it was restored in the
1997-98 regular season), 3-pt. shooting attempts steadily increased soon after the 1997-98 season. In 1992-93, not a single team attempted more than 1,100 three-pointers. In the
2012-13 season, “each and every team attempted more than 1,100 three-pointers” (Beer
2013). The New York Knicks set all-time records for most three-point attempts and
makes in a season shooting 891 of 2,371, followed by the Houston Rockets with 867 of
2,369, 350 more than the entire league in 1979-80. Stephen Curry of the Golden State
Warriors set the NBA record for most individual three-point shots made with 272, 178
more than the entire 1982-83 San Antonio Spurs, the league leader in three-point shots
made for that season. Additionally, the top two seeds in each of the league conferences
(Miami Heat, New York Knicks, San Antonio Spurs, and Oklahoma City Thunder) all
finished in the top five in three-point percentage last season. Two of those teams, the
Miami Heat and San Antonio Spurs, met in the NBA Finals. Of the 16 teams to qualify for the
playoffs (8 in the Eastern Conference and 8 in the Western Conference), 11 were in the
top half of the league in 3-pt. shots made, and the top 5 all made the playoffs. For
attempts, 8 of the top 9 teams in 3-pointers attempted qualified for the playoffs. A similar
story is true for 3-pt. efficiency: the top 5 all qualified for the playoffs, as did 9 of the top 15
(Basketball-Reference 2013).
1.2 Literature Review
Scoring is one of the most emphasized statistics in basketball. Berri, Brook, and
Schmidt (2007) summarize the current economic literature on scoring by commenting,
“points scored dominates the evaluation of player productivity in the NBA… The only
factor consistently found to be correlated with player evaluation in the NBA is points
scored.” Their study uses the “standard approach” in the relevant sports economics
literature, following Becker (1971), by using the following model:
$$Y = \alpha_0 + \alpha_1 X + \alpha_2 R + \varepsilon_i$$
where Y equals a decision variable such as salary, employment, or playing time, X equals
the measures of worker productivity (player characteristics and performance data, as well
as market variables), R is a dummy variable for a worker’s race, and $\varepsilon_i$ is an error term.
The study references a 2006 survey by Berri of twelve studies
examining racial discrimination in the NBA, each of which employed a model similar to the
above equation.
In 14 of the 15 models examined in Berri’s survey, points scored had
the expected sign and was statistically significant. Even though efficiency in
utilizing shot attempts would also be an indicator of a player’s worth to a team, field goal
percentage was not statistically significant in the majority of studies where it was
considered. “In other words, a player who scores points can expect to receive a higher
salary. Evidence that scoring needs to be achieved via efficient shooting is not quite as
clear… Given the ambiguous results uncovered with respect to everything else besides a
player’s points scored per game, these results suggest that a player interested in
maximizing salary, draft position, employment tenure, and playing time should primarily
focus upon taking as many shots as a coach allows.”
Berri, Brook, and Schmidt (2007) follow Jenkins (1996) by restricting the study
of salary in professional sports to recent free agents. Researchers often regressed current
salary upon current player statistics. NBA teams often sign players to multi-year contracts;
in the 2002-2003 season, for example, 70% of players were under contracts at least
three years in length. Therefore, it was argued that to determine the relationship between
productivity and salary one must consider information at the time the salary was
determined. Interestingly, all these models fail to account for 3-pt. shooting. The
regressions in the paper found that NBA Efficiency per game (an official NBA
statistic) explains 64% of the variation in player salary, Wins Produced has explanatory power of 41%, and
points scored per game, used as the sole measure, explains 59% of the variation in a player’s
average wage. When all the vectors of performance data were used, only scoring,
rebounds, and blocked shots had a statistically significant impact on player compensation. In terms of
elasticity measures, a 10% increase in points scored per game increases average salary by
7.7%. A similar increase in rebounds only leads to a 4.8% increase in compensation. In
conclusion, the analysis shows that player evaluation in the NBA seems overly focused
on scoring.
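As a rough illustration of how these elasticity figures can be read, the sketch below backs out the implied elasticities from the percentages quoted above and applies them to a hypothetical salary. The function names and the $5 million salary are assumptions made for illustration only, and the linear approximation is reasonable only for small changes.

```python
def implied_elasticity(pct_change_salary, pct_change_stat):
    """Elasticity = percent change in salary per percent change in the statistic."""
    return pct_change_salary / pct_change_stat


def approx_salary_change(salary, elasticity, pct_change_stat):
    """Linear approximation of the dollar change in salary."""
    return salary * elasticity * pct_change_stat / 100.0


# Percentages quoted in the text: a 10% increase in the statistic
# raises average salary by 7.7% (points) or 4.8% (rebounds).
points_elasticity = implied_elasticity(7.7, 10.0)
rebounds_elasticity = implied_elasticity(4.8, 10.0)

# Hypothetical player earning $5 million per year (illustrative value).
hypothetical_salary = 5_000_000
extra_for_points = approx_salary_change(hypothetical_salary, points_elasticity, 10.0)
extra_for_rebounds = approx_salary_change(hypothetical_salary, rebounds_elasticity, 10.0)

print(f"Implied elasticity, points:   {points_elasticity:.2f}")
print(f"Implied elasticity, rebounds: {rebounds_elasticity:.2f}")
print(f"Approx. raise from 10% more points:   ${extra_for_points:,.0f}")
print(f"Approx. raise from 10% more rebounds: ${extra_for_rebounds:,.0f}")
```

Under these assumptions, the same proportional improvement in scoring is rewarded noticeably more than an equal proportional improvement in rebounding.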
Berri (1999) investigates how to measure productivity of an individual
participating in basketball. Berri creates a model that links the player’s statistics in the
NBA to team wins. This model is then employed in the measurement of each player’s
marginal product. He begins with a fixed-effects model, estimated using aggregate team
data from the 1994-95 through 1997-98 seasons:
The $f_i$ are team-specific fixed effects. Using this model, Berri finds that total
points a team scores and surrenders in a season explain 95% of the variation in team
wins. Such findings suggest that how many points a team scores and surrenders per game
is a good approximation for team wins, hence the value of a player should simply be a
function of how many points he scores and allows the opponent to score per contest.
However, Berri acknowledges that scoring is determined by various factors that can be
quantified; a team’s scoring should be a function of how the team acquires the ball,
efficiency of ball handling, and ability to convert possessions into points.
Berri introduces two additional equations: Y2 is virtually identical to the previous
equation for wins, and Y3 represents the opponent’s scoring. The equations include all the
typical performance statistics. This model presents a basic theory of basketball, with the
primary determinants of offense and defense laid forth and then connected to wins. Berri
uses a factor, team tempo, which was not generally accounted for in previous academic
studies but is crucial in accurately measuring a player’s statistical output from the
philosophical view of a coaching staff. The number of shots a team takes on average per
game plays a significant role in determining how many opportunities the opponent will
have. A slower tempo implies the opponent will have less time of possession; given less
time the opponent will, ceteris paribus, score less. Controlling for rebounds and
turnovers further explains possessions and scoring opportunities. Since a team playing at a
faster pace will have more opportunities, players from these teams will accumulate
larger statistical totals. Weighting by tempo will mitigate such bias.
The results of the regressions show an interesting case for players in the 1996-97
season. According to the paper, Dennis Rodman actually outproduced Michael Jordan
due to his incredible rebounding abilities. The results also seemed to accurately weigh
win contribution. Differences between actual wins and predicted wins (when summing up
player wins contributed individually) are relatively small, and for over five teams this
difference is less than 1.
Goldman and Rao (2013) attempt to determine the right proportion of 2- and 3-point shots to take. This paper is significant because it quantifies some of the in-game
impact of three-point shooting on a possession-by-possession basis. In their study, they
investigate optimal two-point and three-point shooting selection. As time remaining
decreases, the trailing team should place an increasingly positive value on risk, and the
opposite (a negative value on risk) for the leading team. Hence, a testable optimality
condition: 3-point success rate must fall relative to 2-point success rate when a team’s
preference for risk increases. This should be true since teams should be forcing more 3-pt. shots to shorten the lead. For teams with a lead, as the gap in score decreases, the team
should become more risk-neutral. Their findings show that this condition holds only for
the trailing team. The leading team in fact is not efficiently allocating its shots, and
hence score differentials become tighter than they should be. Their paper also shows that
if the offense shoots more 3’s as it becomes risk-loving this implies the attack can be
varied more readily than the defensive adjustment.
In their analysis, they exclude situations in which one team has less than a 5%
chance of winning (“garbage time”), end of quarter shots, and fast-break shots, as all
these situations tend to have very different strategies than a half-court offensive set. The
study revolves around a parameter, α, defined as:
The increases in win probability of adding 2 or 3 points to the team’s current
score are denoted WV2 and WV3 respectively. α defines the degree to which 3-pointer
win value diverges from 1.5 2-pointers. When α > 1.5, the win value of a 3-pointer
exceeds its nominal value. This occurs for the trailing team, especially late in the game.
The opposite is true for the leading team, where α < 1.5—here a 3-pointer is worth less
than usual since the team should be risk-averse. Using this parameter, as well as a
basketball analysis concept called a “usage curve”, Goldman and Rao (2013) create and
analyze an optimization problem centered on fraction of shots attempted as 3’s, with the
first-order condition that marginal returns to 2-pointers and 3-pointers should be equal.
The above graph gives a representation of the maximization problem.
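One reading of α that is consistent with the description above, stated here as an assumption rather than as Goldman and Rao's exact notation, is the ratio of the two win values:

$$\alpha = \frac{WV_3}{WV_2}$$

Under this reading, $\alpha = 1.5$ is the neutral benchmark at which a made 3-pointer adds exactly as much win probability as its nominal 1.5 two-pointers; $\alpha > 1.5$ corresponds to the risk-loving trailing team and $\alpha < 1.5$ to the risk-averse leading team.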
Lutz (2012) attempts to redefine positions on the basketball court and observe the
contributions of types of basketball players through cluster analysis. This study helps to
clarify the role that a three-point shooter has on a basketball team. Lutz uses data on games
played, minutes played per game, percent of made field goals that are assisted, assist rate,
turnover rate, offensive rebound rate, defensive rebound rate, steals per 40 minutes,
blocks per 40 minutes, and the number of shots attempted per 40 minutes at each of the
following locations: at the rim, from 3-9 feet, from 10-15 feet, from 16-23 feet, and
beyond the 3-point line. All the variables are standardized using z-scores in order to put
them all on the same scale and thus give equal weight to each variable. An expectation-maximization algorithm for Gaussian mixture models is employed to do the clustering. The
Mclust function is used with the Bayesian Information Criterion to determine the
parameters of the model and how many clusters to use, which yields the 10 categories
the paper settles on. Fisher’s Linear Discriminant is utilized to place players into one of
the 10 clusters.
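The standardization step described above can be sketched as follows. This is a minimal illustration, not Lutz's code: the variable names and sample values are hypothetical, and the use of the population standard deviation is an assumption (the source does not say which form was used).

```python
from statistics import mean, pstdev


def z_scores(values):
    """Standardize a list of numbers to mean 0 and standard deviation 1."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation (assumed)
    if sigma == 0:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical values for two statistics measured on very different scales.
three_pt_attempts_per_40 = [0.5, 2.0, 6.5, 4.0]
minutes_per_game = [12.0, 24.0, 36.0, 30.0]

standardized = {
    "three_pt_attempts_per_40": z_scores(three_pt_attempts_per_40),
    "minutes_per_game": z_scores(minutes_per_game),
}

for name, column in standardized.items():
    print(name, [round(z, 3) for z in column])
```

After this transformation each variable has the same scale, so no single statistic dominates the distance calculations that a clustering algorithm relies on.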
In the investigation, Lutz finds that players in the Durable Shooters cluster “are
most often members of winning organizations… 66.7% of these players are on a winning
team.” A typical member of the Durable Shooters cluster is Ray Allen, who is the all-time leader in 3-pt. field goals made in NBA history (NBA). These players, statistically,
can be differentiated from the other clusters by a high number of 3-pt. field goal attempts,
above-average minutes played, above-average steal rates, low rebound and turnover rates,
and many more games played than average (these are represented by the z-scores in
Table 3). In second was the Combo Guard cluster, containing players who attempt more
3s than average but mainly accumulate high assist and turnover rates, high steal rates, and
very high assist ratios (defined as the assist-to-turnover ratio); 62% were on winning teams.
The next closest clusters, Defensive Bigs and Elite Bigs, have percentages of players on
winning teams of 55% and 52%, respectively. Every other group had percentages under
50%. The lowest were the Big Bodies and Active Bigs (41% and 38% respectively).
Active Bigs tend to shoot a lot more, be more active in rebounding, and have good
rebound and block rates compared to Big Bodies, but both groups tend to miss more
games and play fewer minutes than average, and to have below-average assist
ratios and above-average rebound rates. The Ball Handlers cluster also falls below 50%; these
players find themselves on winning teams 44% of the time.
When comparing the abundance of players in each cluster, Durable Shooters are
scarce and Ball Handlers are quite abundant (57 Durable Shooters and 172 Ball
Handlers).
When looking at the p-values of the percentages on winning teams, only 3 out of
10 clusters have a p-value less than .05, and 4 out of 10 have a p-value less than .1.
Lutz also considers groups of clusters and their “interaction effect” on point
differential. The Durable Shooters combination was found the most on winning teams
with a p-value of .007, and the top four pairs are combinations with Durable Shooters. Of
the top 10 3-way combinations found on winning teams, five contained a Durable
Shooters cluster, and all but one contained one cluster from Durable Shooters or Combo
Guards.
Michaelides (2010) stresses the importance of testing for unobserved
heterogeneity in analyzing basketball compensating differences. The results from his
investigation indicate that the quality of empirical results is distorted when important
measures of player skills are omitted from the specifications. Michaelides uses data on all
professional basketball players employed in the NBA between 1999 and 2003. The data
contain on-court performance (minutes played, points, rebounds, etc.), race, age, height,
place of birth, year entered the league, and draft pick number at the annual league draft.
The paper uses the classical hedonic wage equation of other studies (Berri 1999) that
includes all available measures of player productivity and team-specific characteristics
that capture employer heterogeneity.
A major contribution in his paper is the specification for firm heterogeneity. In the
context of professional basketball, this would include location amenities, quality of the
team’s coaching staff, and team success. Michaelides obtains measures that capture coach
and team quality through official reports of the NBA such as the Association of
Professional Basketball Research. To account for location amenities, he obtains weather
conditions from the National Climatic Data Center and the Meteorological Service of
Canada. Additional specifications were included to control for the salary structures of
rookie contracts and veteran contracts.
Finally, there is the paper by Wang and Murnighan (2001) on generalist bias. Their
paper investigates a tendency to reward and select people with general skills when
complementary, specialized skills are needed. The paper includes five studies to
investigate these effects. Their second study investigated compensation of NBA players,
comparing two-point scoring to three-point scoring. The study identifies three-point
shooters as specialists because they have special skills and are typically not as
individually productive as players who are generalists with a wider variety of skills.
Three-point shooters’ long-range shooting abilities allow their contributions to
complement those of their teammates, who may be more overall skilled at scoring.
According to their study, two-point shooting accounted for 82.3% of team scoring in
2005 and 81.6% in 2006. On average, 21% of the players made 76% of their team’s
three-point shots in 2005, and 23% of the players made 75% of these shots in 2006. This
is all evidence that three-point shooting is a specialized skill and that most players on
most teams focus on two-point shooting, and teams may evaluate their players more on
the basis of their two-point scoring than their three-point scoring.
The study identified three-point shooters as guards whose three-point scoring
represented more than 20% of their overall scoring. Using a subset of 35 players from the
NBA player pool who met their definition of a three-point shooter, they
found that three-point scoring was statistically insignificant, and two-point scoring was
significant with p < 0.01. Analyzing two-point shooters, only two-point scoring was
significant, with p < 0.001. Their results suggested the bias is restricted to true specialists.
Chapter 2: Data and Models
2.1 Data
To examine the market value of various NBA performance statistics, data are
combined from numerous sources. The core performance data, which include season
totals of the main NBA performance measures, such as points, rebounds, games played,
and minutes, come from BasketballReference.com, a subsidiary of SportsReference.com.
The salary data, which include contract length, contract amount, and final year of
contract, were collected from University of Michigan professor Rodney Fort. Each year’s
data were collected from news sources such as USAToday or basketball websites like
Hoopsdata.com.
The data contain relevant seasonal information on NBA players who signed with
an NBA franchise during the NBA free-agency/re-signing periods from the 2002-2003
NBA regular season and 2004-05 through 2007-08 seasons. This would include restricted
free agents and unrestricted free agents, and would exclude rookies. We only included
players who played at least 12 minutes per game, since playing fewer than 12 minutes a
game may lead to skewed performance statistics. The year 2004-05 is excluded from the
models due to restrictions with availability of the data. The data do not contain
observations of players who were released and signed with another team in the same
season, as this created an issue with allocating a particular set of performance statistics
with a certain team and salary. Hence, we only have the most recent year’s performance
information linked to the contract that was signed after it. Some players may appear
multiple times in the data in different years, but not in the same year.
The salary measures are calculated according to 2008 U.S. dollars; the previous
years are inflated according to the CPI. Salary is calculated by dividing the total contract
amount by contract length. This method avoids the unbalanced contract structure that
many players agree to in their negotiations; players may have a back-weighted or front-weighted
salary structure that may otherwise appear to be correlated with year-to-year
performance.
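The salary construction described above can be sketched in a few lines. This is an illustrative sketch only: the function name, the example contract, and the CPI index levels are hypothetical placeholders, not figures from the data set or official CPI values.

```python
import math


def real_average_salary(contract_amount, contract_years, cpi_signing_year, cpi_base_year):
    """Average annual salary, inflated to base-year dollars using the CPI."""
    nominal_average = contract_amount / contract_years
    return nominal_average * (cpi_base_year / cpi_signing_year)


# Hypothetical contract: $30 million over 4 years.
# Illustrative CPI index levels (placeholders, not official figures).
CPI_SIGNING_YEAR = 190.0
CPI_2008 = 215.0

real_avg = real_average_salary(30_000_000, 4, CPI_SIGNING_YEAR, CPI_2008)
lsal = math.log(real_avg)  # logged real average salary

print(f"Real average salary (2008 dollars): ${real_avg:,.0f}")
print(f"Logged real average salary: {lsal:.4f}")
```

Averaging over the contract length means a back-weighted and a front-weighted deal with the same total value and length produce the same salary observation.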
The models used in my investigation take many core elements from Berri, Brook, and Schmidt (2007);
the models will include additional independent variables and a longer timespan. The
dependent variable in my models is real (logged) average salary. I use an average because
the NBA functions on a season-by-season basis, and different length contracts may
be determined by the age and durability of a player. Average salary ideally should
represent some portion of the expected marginal product that the player contributes to a
team in a particular upcoming season, regardless of whether the player is expected to do
this for multiple seasons or one. For this reason, age may not have as strong an effect on
salary as on contract length or structure; older players may simply
sign shorter contracts but still get paid according to their expected marginal labor product
based on the previous season's performance.
The independent variables used in the data fall into roughly two categories: non-performance variables and performance variables. Non-performance variables include
contract year, team, and position. The year variable controls for year-by-year changes
in the free agent market, whether from changes in the collective bargaining agreement,
the salary cap ceiling, or the scarcity of talented players in the free
agent market. The team categorical variable controls for any relevant factors that may
be due to a specific market location. The year variable indicates the regular season
immediately before the (year)-(year+1) contract was signed (2005 would indicate
player performance in the 2005-06 NBA regular season and the contract for the 2006-07
regular season and/or further seasons).
Table 1: Summary Statistics
Variable          Obs    Mean         Std. Dev.    Min          Max
year              248    2002.665     1.888835     2000         2007
g                 248    62.1129      21.2539      2            82
age               248    27.58871     4.073241     18           38
per               248    13.65565     4.018597     1.6          27
ts                248    0.5170887    0.0549687    0.326        0.85
efg               248    0.4745685    0.0569163    0.28         0.8
ftr               248    0.3225605    0.1590199    0            1.1
par               248    0.1843992    0.1875033    0            0.735
orb               248    5.880645     4.10655      0            17.2
drb               248    13.95847     5.655695     3.1          33
trb               248    9.918548     4.596802     3.2          22.5
ast               248    14.28427     9.70982      0            49.3
stl               248    1.672581     0.7321245    0            5.3
tov               248    14.34476     4.08967      3.6          42.9
usg               248    18.08145     4.694011     6            32.9
ortg              248    104.5323     9.066944     74           129
drtg              248    104.9194     4.337198     94           115
ws48              248    0.0915726    0.0545501    -0.115       0.248
contractyrs       248    3.028226     1.937069     1            7
contractamt       248    2.16E+07     3.01E+07     231625.7     1.50E+08
mpg               248    24.21865     8.047507     12           42.5
ppg               248    8.993101     5.131943     1.405063     27.58537
rpg               248    4.163952     2.47007      0.7058824    14.14286
apg               248    2.166794     1.800436     0            10.5122
spg               248    0.7736303    0.450692     0            2.885246
bpg               248    0.4952926    0.6187575    0            3.317073
topg              248    1.382882     0.7126767    0.2          3.7
fpg               248    2.226145     0.6709745    0.625        4.012821
fgmpg             248    3.34372      1.872388     0.5189874    11.22857
fgapg             248    7.581034     4.10485      1.253165     23.65854
ftpct             245    0.7421754    0.1013789    0.4328358    1
fgpct             248    0.4422836    0.0584132    0.28         0.7
tpmpg             248    0.5015793    0.5973277    0            2.756098
tpapg             248    1.427805     1.581141     0            6.804878
threeptpct        210    0.2743201    0.1482258    0            0.6666667
salary            248    4744826      4896657      231625.7     2.20E+07
lsal              248    14.86526     1.032914     12.35288     16.90655
ftmpg             248    1.804082     1.372363     0            6.95122
twoptscoring      248    5.68428      3.545835     1.037975     22.4
threeptscoring    248    1.504738     1.791983     0            8.268292

ts: True Shooting percentage; g: Games Played; per: Player Efficiency
Rating (Hollinger); efg: Effective Field-Goal Percentage; ftr: Free-Throw Rating; par: 3-pt. Shooting Rating; ast-tov: rates for analogous performance statistics, based on overall
rates by team; ws: Win-Shares; ppg-drpg: per-game performance statistics; lsal: logged salary; tpmpg: Three-points made per game; tp12, tp23: Indicator variables for amount of
3-pt. shots made per game; ftpct: Free-Throws made percentage.
The performance variables include all major performance statistics used by the NBA in
their box scores, such as points per game, rebounds per game, and minutes per game, as
well as some advanced statistics either utilized by the NBA or constructed by sports
statisticians and economists. These statistics include Win Shares, Win Shares Per 48
minutes, Hollinger PER, True Shooting percentage, and Three-point shooting rate. Per-game rates are used instead of season totals because they are generally recognized as
better indicators of player performance; the 82-game season is long enough that
even generally healthy players may miss a few games. The WS48 (Win Shares per 48
minutes) and PER independent variables are significant additions to the data, since these
are considered to be statistics that evaluate overall player contribution. Also, the Three-point shooting rate is significant since it perhaps captures how heavily a player depends
on the three-point line for scoring his field goals.
Twoptscoring and threeptscoring are two variables constructed specifically for the
models. Twoptscoring represents the average number of points scored through two-point
field goals per game. This was constructed using the following equation:
twoptscoring = 2*(fgm-tpm)/g
where fgm is the total field goals made in a season, and tpm is the total number of three-point field goals made in a season. Dividing by g gives us a per-game rate. Multiplying by 2
gives us the point value of the shot. Analogously, I constructed a threeptscoring variable:
threeptscoring = 3*tpm/g
These two variables, in addition to ftmpg, the number of free throws made per game,
account for the total points per game for each player. Therefore, these three variables
should sum exactly to ppg if calculated correctly. Looking at the
means in the summary table, this is confirmed. By breaking up points per game, we can
more closely inspect the variation in scoring and better explain the variance.
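The construction of the scoring variables, and the check against the summary-table means, can be sketched as follows. The function name and the season totals are hypothetical; the means are the values reported in Table 1.

```python
def scoring_components(fgm, tpm, ftm, g):
    """Split per-game scoring into two-point, three-point, and free-throw points."""
    twoptscoring = 2 * (fgm - tpm) / g   # points per game from two-point field goals
    threeptscoring = 3 * tpm / g         # points per game from three-point field goals
    ftmpg = ftm / g                      # free throws made per game (1 point each)
    return twoptscoring, threeptscoring, ftmpg


# Hypothetical season totals for one player.
fgm, tpm, ftm, g = 400, 100, 150, 80
two, three, ft = scoring_components(fgm, tpm, ftm, g)

# Points per game computed directly from total points.
ppg = (2 * (fgm - tpm) + 3 * tpm + ftm) / g
assert abs((two + three + ft) - ppg) < 1e-9

# The same identity checked against the Table 1 means.
mean_sum = 5.68428 + 1.504738 + 1.804082   # twoptscoring + threeptscoring + ftmpg
mean_ppg = 8.993101
print(f"Sum of component means: {mean_sum:.6f}  vs. mean ppg: {mean_ppg:.6f}")
```

Because the mean is linear, the component means must add up to the mean of ppg whenever the decomposition holds for every player, which is why the summary table provides a quick sanity check.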
where $\alpha$ represents the intercept term, and $\varepsilon_{it}$ represents the error terms in year t.
2.2 Models
2.2.1 Basic Model
A variation of the model will attempt to estimate three-point shooting
specialization with the threeptscoring performance variable, using only basic
performance statistics. The model aims to quantify how much an increase of 1 point in
each of the scoring statistics increases real average salary. Finally, I will rerun the
regression on the observations that fall into the Three-point shooting category to
investigate any effects within this specific subset. I will obtain estimates using only standard
performance explanatory variables, and also with the advanced performance statistics.
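Because the dependent variable is logged salary, a coefficient on a scoring variable is read as an approximate proportional change in salary per additional point per game. The sketch below converts a coefficient into an exact percentage change; the function name is hypothetical, and the coefficient 0.197 is used only as an illustrative input of the same magnitude as the estimates reported in Chapter 3.

```python
import math


def pct_change_from_log_coef(beta, delta=1.0):
    """Exact percent change in salary implied by a log-salary coefficient
    when the regressor increases by `delta` units."""
    return 100.0 * (math.exp(beta * delta) - 1.0)


illustrative_beta = 0.197  # illustrative coefficient on a per-game scoring variable

exact_pct = pct_change_from_log_coef(illustrative_beta)
approx_pct = 100.0 * illustrative_beta  # the usual small-coefficient approximation

print(f"Exact change per extra point per game:  {exact_pct:.2f}%")
print(f"Approximate change (100 * beta):        {approx_pct:.2f}%")
```

The gap between the exact and approximate figures grows with the size of the coefficient, so the exact conversion is the safer one to quote for larger estimates.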
Table 1: Basic Model. Dependent variable is player’s average wage, regressed on some
performance explanatory variables.

Dependent Variable         Indep. Variable of Interest    Other Explanatory Variables
Logged Avg. Real Salary    Threeptscoring                 Team
                           Twoptscoring                   Minutes per game
                                                          Year
                                                          Position
According to the relevant literature, this model should explain much of the
variation in average salary. Of concern would be the potential collinearity between the
scoring variables and minutes per game; we would expect players who play more minutes to
score more points, due to increased opportunities. However, if the results are robust to the
inclusion of mpg, then we can conclude that these statistics are appropriate explanatory
variables for describing variation in salary.
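One way to write the basic specification is sketched below. The coefficient symbols, the fixed-effect terms, and the player and year subscripts are assumptions made for illustration; ftmpg is included because it appears in the reported regressions.

```latex
\ln(\mathit{sal}_{it}) = \alpha
  + \beta_1\,\mathit{twoptscoring}_{it}
  + \beta_2\,\mathit{threeptscoring}_{it}
  + \beta_3\,\mathit{ftmpg}_{it}
  + \beta_4\,\mathit{mpg}_{it}
  + \gamma_{\mathrm{pos}(i)}
  + \delta_{\mathrm{team}(i,t)}
  + \tau_t
  + \varepsilon_{it}
```

Here the gamma, delta, and tau terms stand for the position, team, and year indicator variables, and each beta is read as the approximate proportional change in salary from a one-unit increase in the corresponding per-game statistic.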
2.2.2 Full Performance Model
The Full Performance model will follow in the footsteps of the Basic Performance
model, but additionally include the major box score statistics, and the more advanced
performance statistics that do not appear on the box score. The final regression will
include a combination of the performance statistics. The expected result of including
these statistics is to more accurately control for the skill sets of these players, especially
the defensive impact.
Table 2: Harnessing all performance statistics.

Dependent Variable        Indep. Variable of Interest   Other Explanatory Variables
Logged Avg. Real Salary   Threeptscoring                Team
                          Twoptscoring                  Minutes per game
                                                        Year
                                                        Position
                                                        Non-scoring Box Score Statistics
                                                        Advanced Player Statistics
Chapter 3: Results and Discussion
This chapter presents and discusses the regression results for the models introduced in
the previous chapter.
3.1.1 Basic Performance Model: Two-point Shooting, Three-point Shooting, and standard performance statistics.
The basic model is useful because it utilizes variables that until recently were the
core explanatory variables for describing anything that happens on the basketball court.
These variables are presented in the box scores for every basketball game, and many
awards are based off these statistics. From the perspective of an NBA front office, these
variables are the first ones seen when investigating a player’s marginal labor product.
Table 3: Regression Results of Basic Model – Dependent Variable: Log Avg. Salary

                  (1)         (2)         (3)         (4)         (5)
                  lsal        lsal        lsal        lsal        lsal
twoptscoring      0.197       0.132       0.076       0.068       0.056
                  (0.014)**   (0.023)**   (0.028)**   (0.026)**   (0.026)*
threeptscoring                0.046       -0.011      0.080       0.040
                              (0.027)+    (0.033)     (0.034)*    (0.034)+
ftmpg                         0.206       0.168       0.175       0.171
                              (0.057)**   (0.054)**   (0.049)**   (0.059)**
mpg                                       0.038       0.037       0.043
                                          (0.012)**   (0.011)**   (0.010)**
Center                                                0.726       0.608
                                                      (0.138)**   (0.153)**
PF                                                    0.415       0.375
                                                      (0.141)**   (0.149)*
SF                                                    -0.108      0.004
                                                      (0.149)     (0.158)
SG                                                    -0.037      -0.034
                                                      (0.128)     (0.130)
2bn.Tm                                                            -0.021
                                                                  (0.309)
3.Tm                                                              0.187
                                                                  (0.310)
4.Tm                                                              0.049
                                                                  (0.326)
5.Tm                                                              0.286
                                                                  (0.253)
6.Tm                                                              0.293
                                                                  (0.320)
7.Tm                                                              -0.020
                                                                  (0.332)
8.Tm                                                              0.350
                                                                  (0.281)
9.Tm                                                              0.084
                                                                  (0.273)
10.Tm                                                             0.395
                                                                  (0.414)
11.Tm                                                             0.312
                                                                  (0.277)
12.Tm                                                             0.354
                                                                  (0.432)
13.Tm                                                             -0.094
                                                                  (0.319)
14.Tm                                                             0.096
                                                                  (0.421)
15.Tm                                                             0.548
                                                                  (0.548)
16.Tm                                                             -0.004
                                                                  (0.290)
17.Tm                                                             0.199
                                                                  (0.254)
18.Tm                                                             0.099
                                                                  (0.241)
19.Tm                                                             -0.093
                                                                  (0.400)
20.Tm                                                             0.600
                                                                  (0.332)
21.Tm                                                             0.428
                                                                  (0.395)
22.Tm                                                             0.002
                                                                  (0.267)
23.Tm                                                             0.603
                                                                  (0.250)*
24.Tm                                                             0.186
                                                                  (0.283)
25.Tm                                                             0.450
                                                                  (0.323)
26.Tm                                                             0.185
                                                                  (0.287)
27.Tm                                                             -0.127
                                                                  (0.290)
28.Tm                                                             0.491
                                                                  (0.321)
29.Tm                                                             0.111
                                                                  (0.246)
30.Tm                                                             0.312
                                                                  (0.260)
2001bn.year                                                       -0.356
                                                                  (0.257)
2002.year                                                         -0.045
                                                                  (0.139)
2004.year                                                         0.245
                                                                  (0.135)
2005.year                                                         0.208
                                                                  (0.178)
2006.year                                                         -0.470
                                                                  (0.379)
2007.year                                                         0.517
                                                                  (0.224)*
_cons             13.744      13.676      13.226      12.959      12.691
                  (0.085)**   (0.090)**   (0.153)**   (0.166)**   (0.257)**
R2                0.46        0.49        0.52        0.58        0.66
N                 248         248         248         248         248

Standard errors in parentheses. + p<0.10; * p<0.05; ** p<0.01
The results from the basic model regressions corroborate some expected results
when it comes to scoring. Twoptscoring is positive and significant at the 1% significance
level in four out of five regressions, and significant at the 5% level in all five. In the first
regression, a one-point increase in two-point scoring is associated with a roughly 20%
increase in salary, and in the final regression with an increase of between 5% and 6%.
Over the same sequence of models, the R2 rises from 0.46 in the first model to 0.66 in the
last, a difference of 0.20. The coefficient on twoptscoring decreases in magnitude
by roughly half when mpg (minutes per game) is included in the model. The correlation
coefficient between mpg and twoptscoring is 0.7860; that twoptscoring remains significant
indicates it is an effective explanatory variable even when strong collinearity is present.
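Because the dependent variable is logged salary, a coefficient is only approximately a percent change. The sketch below compares the usual approximation (100 times the coefficient) with the exact conversion for a log-linear model; the function names are illustrative, and the two coefficients are the twoptscoring estimates from the first and last regressions.

```python
import math


def pct_change_approx(beta: float, delta: float = 1.0) -> float:
    """Approximate percent change in salary: 100 * beta * delta."""
    return 100.0 * beta * delta


def pct_change_exact(beta: float, delta: float = 1.0) -> float:
    """Exact percent change implied by a log-linear model."""
    return 100.0 * (math.exp(beta * delta) - 1.0)


# twoptscoring coefficients from the first and last basic-model regressions.
for beta in (0.197, 0.056):
    print(
        f"beta={beta}: approx {pct_change_approx(beta):.1f}%, "
        f"exact {pct_change_exact(beta):.1f}%"
    )
```

The gap between the two readings grows with the size of the coefficient, so the approximation is closest for the smaller estimates in the later regressions.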
The twoptscoring coefficient was an expected result; PPG overall had a
significant coefficient in most of the regressions run by Berri.
The coefficient on threeptscoring is not as significant as that on twoptscoring; in three
out of the four models that include it, threeptscoring is statistically significant at the 10%
level, but it is significant at the 5% level in only one of the three (p < 0.02). In the one model
where threeptscoring is not statistically significant, the sign is slightly negative. In an
F-test on the coefficients in the second model, the p-value for the hypothesis that the
twoptscoring and threeptscoring coefficients are equal was approximately 1%; an analogous
F-test on twoptscoring and ftmpg gave a p-value of 0.34, and the test on threeptscoring and
ftmpg gave 0.02. The F-test on whether all the coefficients were the same gave a p-value
under 1%.
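For reference, the test of equality between two coefficients can be written as a single linear restriction. The notation below is an assumption for illustration (n is the number of observations and k the number of estimated parameters), and the reference distribution holds under the usual homoskedastic, normal-error assumptions.

```latex
H_0:\ \beta_{\mathrm{two}} = \beta_{\mathrm{three}}
\qquad
F = \frac{\left(\hat\beta_{\mathrm{two}} - \hat\beta_{\mathrm{three}}\right)^2}
         {\widehat{\operatorname{Var}}(\hat\beta_{\mathrm{two}})
          + \widehat{\operatorname{Var}}(\hat\beta_{\mathrm{three}})
          - 2\,\widehat{\operatorname{Cov}}(\hat\beta_{\mathrm{two}}, \hat\beta_{\mathrm{three}})}
\ \sim\ F_{1,\,n-k}
```

A small p-value rejects equality of the two coefficients; the covariance term matters here because the scoring variables are correlated with one another.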
Ftmpg had the largest coefficient of the scoring explanatory variables in all four
regressions that include it and was statistically significant at the 1% level in each. After
controlling for minutes per game, the coefficient changed little in magnitude across
variations of the model. This is an unexpected result; free-throw shots are worth the least
in terms of point value in an NBA game. However, it is not uncommon for players to shoot
free-throws at a percentage as high as 80% or 90%. Also, the correlation coefficient between
ftmpg and twoptscoring is 0.8244, which is very strong. It is therefore likely that the
coefficients for both variables describe similar effects on salary, most plausibly skill set
(players who shoot more free-throws are likely to be more talented on offense and draw more
personal fouls). This is corroborated by the fact that introducing both ftmpg and
threeptscoring into the model increased the R2 by 0.03, a 6% increase in explanatory power.
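One rough way to gauge how much these correlations inflate coefficient variances is the variance inflation factor. The sketch below uses only the two pairwise correlations reported above; in the full regressions the factor depends on the R2 from regressing each variable on all the other regressors, so these figures are lower bounds rather than estimates from the thesis data.

```python
def pairwise_vif(r: float) -> float:
    """Variance inflation factor implied by a single pairwise correlation r."""
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    return 1.0 / (1.0 - r * r)


# Correlations reported in the text.
reported = {
    "mpg vs twoptscoring": 0.7860,
    "ftmpg vs twoptscoring": 0.8244,
}
for label, r in reported.items():
    print(f"{label}: r = {r:.4f}, pairwise VIF = {pairwise_vif(r):.2f}")
```

A factor of about 3 means the sampling variance of the coefficient is roughly three times what it would be with uncorrelated regressors, which is consistent with the loss of significance seen when correlated variables enter together.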
The position categorical variables had two statistically significant coefficients,
those on the Power Forward and Center indicator variables. This result was robust
throughout all the variations of the model that included position; b3.pos1 was significant
at the 1% level, and 2.pos1 significant at the 5% level. These are also the positions with
the tallest and largest players, and they comprise 37.5% of the observations.
Adding these explanatory variables increased R2 by approximately 12% from model (3)
to model (4).
3.1.2 Full Performance Model: Non-scoring Box Score Statistics & Advanced Player Statistics
Table 4: Regressions with all performance statistics.

                  (1)         (2)         (3)
                  lsal        lsal        lsal
twoptscoring      0.056       0.137       0.098
                  (0.028)     (0.038)**   (0.043)*
threeptscoring    0.067       0.212       0.230
                  (0.036)     (0.052)**   (0.067)**
ftmpg             0.160       0.256       0.217
                  (0.062)*    (0.096)**   (0.106)*
mpg               0.020
                  (0.016)
apg               0.100                   -0.085
                  (0.065)                 (0.101)
rpg               0.041
                  (0.040)
spg               -0.027
                  (0.140)
bpg               0.285                   0.223
                  (0.105)**               (0.106)*
topg              -0.084                  0.188
                  (0.136)                 (0.259)
fpg               0.060                   0.102
                  (0.102)                 (0.096)
Center            0.423       0.443       0.312
                  (0.231)     (0.268)     (0.268)
Power Forward     0.304       0.295       0.256
                  (0.214)     (0.243)     (0.238)
Small Forward     0.050       0.052       0.029
                  (0.215)     (0.234)     (0.228)
Shooting Guard    0.087       0.023       0.049
                  (0.182)     (0.187)     (0.183)
2bn.Tm            0.027       0.135       0.171
                  (0.293)     (0.283)     (0.294)
3.Tm              0.191       0.121       0.227
                  (0.295)     (0.276)     (0.296)
4.Tm              -0.036      0.202       0.164
                  (0.309)     (0.298)     (0.299)
5.Tm              0.217       0.035       0.015
                  (0.258)     (0.257)     (0.283)
6.Tm              0.268       0.294       0.303
                  (0.327)     (0.318)     (0.329)
7.Tm              -0.156      -0.038      -0.159
                  (0.295)     (0.296)     (0.297)
8.Tm              0.259       0.390       0.398
                  (0.293)     (0.300)     (0.314)
9.Tm              0.055       0.002       0.043
                  (0.304)     (0.269)     (0.293)
10.Tm             0.386       0.299       0.329
                  (0.416)     (0.379)     (0.396)
11.Tm             0.239       0.223       0.217
                  (0.296)     (0.309)     (0.328)
12.Tm             0.224       0.102       0.054
                  (0.391)     (0.426)     (0.409)
13.Tm             -0.062      0.043       0.036
                  (0.330)     (0.323)     (0.335)
14.Tm             0.070       0.095       0.150
                  (0.440)     (0.436)     (0.468)
15.Tm             0.439       0.584       0.460
                  (0.521)     (0.535)     (0.521)
16.Tm             -0.097      -0.050      -0.076
                  (0.307)     (0.301)     (0.314)
17.Tm             0.210       0.201       0.272
                  (0.267)     (0.277)     (0.296)
18.Tm             -0.004      0.095       0.063
                  (0.267)     (0.240)     (0.266)
19.Tm             -0.073      0.008       0.060
                  (0.387)     (0.418)     (0.423)
20.Tm             0.588       0.545       0.549
                  (0.342)     (0.334)     (0.335)
21.Tm             0.392       0.398       0.408
                  (0.396)     (0.379)     (0.389)
22.Tm             0.029       0.004       0.036
                  (0.287)     (0.281)     (0.299)
23.Tm             0.512       0.584       0.559
                  (0.251)*    (0.258)*    (0.269)*
24.Tm             0.149       0.193       0.191
                  (0.305)     (0.285)     (0.292)
25.Tm             0.419       0.541       0.559
                  (0.329)     (0.342)     (0.340)
26.Tm             0.189       0.068       0.160
                  (0.326)     (0.317)     (0.327)
27.Tm             -0.205      -0.252      -0.293
                  (0.307)     (0.287)     (0.310)
28.Tm             0.442       0.527       0.511
                  (0.332)     (0.313)     (0.336)
29.Tm             0.024       0.046       0.024
                  (0.269)     (0.243)     (0.268)
30.Tm             0.333       0.399       0.437
                  (0.280)     (0.280)     (0.301)
2001bn.year       -0.309      -0.289      -0.285
                  (0.262)     (0.263)     (0.267)
2002.year         -0.038      -0.071      -0.081
                  (0.139)     (0.139)     (0.137)
2004.year         0.210       0.363       0.317
                  (0.140)     (0.146)*    (0.148)*
2005.year         0.182       0.412       0.351
                  (0.204)     (0.225)     (0.236)
2006.year         -0.525      -0.355      -0.394
                  (0.384)     (0.384)     (0.383)
2007.year         0.475       0.762       0.728
                  (0.228)*    (0.234)**   (0.233)**
ts                            -0.373      -0.029
                              (1.069)     (1.118)
ftr                           -0.228      -0.213
                              (0.591)     (0.589)
par                           -1.125      -1.535
                              (0.567)*    (0.621)*
trb                           -0.016      -0.021
                              (0.023)     (0.023)
ast                           0.013       0.028
                              (0.011)     (0.018)
stl                           -0.159      -0.127
                              (0.084)     (0.078)
tov                           0.005       -0.019
                              (0.014)     (0.023)
usg                           -0.056      -0.059
                              (0.015)**   (0.022)**
drtg                          -0.062      -0.053
                              (0.018)**   (0.018)**
_cons             12.774      21.009      20.053
                  (0.294)**   (2.370)**   (2.334)**
R2                0.68        0.69        0.70
N                 248         248         248

Standard errors in parentheses. * p<0.05; ** p<0.01
In the Full Performance Model, we see that twoptscoring and threeptscoring are
statistically significant at the 5% level or better in two out of three regressions, and
ftmpg in all three. The first regression uses standard performance statistics that are
reported in the box score for every basketball game. When we include all these variables,
and additionally control for team-specific, position-specific, and year-specific effects, we
get an R2 of 0.68, only 0.02 higher than model (5) in the basic performance model.
Furthermore, the only statistically significant scoring variable in this regression is ftmpg,
which is significant at the 5% level. Due to the high correlation between mpg, ftmpg, and
twoptscoring, it is likely that collinearity is obscuring the explanatory power of certain
scoring variables.
Amongst the newly introduced explanatory variables, 3Par, drtg, and usg are
statistically significant at the 5% level or better in both models in which they are
included. 3Par describes the number of attempts a player takes from the 3-pt. line as a
percentage of his overall shot attempts; an increase in this variable means a player relies
more heavily on the 3-pt. line. The coefficient on this variable is quite large, significant
only at the 5% level, and negative. Read alongside the positive coefficient on
threeptscoring, this result is somewhat confusing and counterintuitive.
Usg, or usage rate, is an estimate of the percentage of a team's plays that are run
through a specific player, and is intended to capture how heavily a team relies on that
player. One would expect to see this variable positively correlated with better skill sets
and more productive players, yet the coefficient is negative and statistically significant.
However, if we assume the other variables are capturing player skill, then usg may be
controlling for the statistical inflation a player receives by being the only good player (if
a team is not very talented, the most talented player may receive a heavier load of plays
and have inflated performance statistics).
Overall, the three models have roughly the same R2, and do not have much
stronger explanatory power than the Basic Performance Model. The addition of non-scoring
performance explanatory variables barely aids in describing the variation in real average
salary.
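Since the fuller models add regressors to a sample of 248 observations, an adjusted R2 gives a fairer comparison than the raw R2. The sketch below is illustrative only: the regressor counts (43 for the final basic model and 49 for the first full model) are tallied from the rows of Tables 3 and 4 and should be treated as assumptions, and the thesis does not report adjusted values.

```python
def adjusted_r2(r2: float, n: int, k: int) -> float:
    """Adjusted R^2 for n observations and k regressors (intercept excluded)."""
    if n - k - 1 <= 0:
        raise ValueError("need more observations than estimated parameters")
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


# (label, reported R2, assumed number of regressors)
models = [
    ("Basic model (5)", 0.66, 43),
    ("Full model (1)", 0.68, 49),
]
for label, r2, k in models:
    print(f"{label}: R2 = {r2:.2f}, adjusted R2 = {adjusted_r2(r2, 248, k):.3f}")
```

Under these assumed counts the adjusted values sit closer together than the raw ones, which fits the observation that the extra performance statistics add little explanatory power.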
3.2 Discussion
Inspecting the results of both models, the Basic Performance Model explains
roughly two-thirds (R2 of 0.66) of the variation in logged real average salary with
relatively few performance statistics. Controlling for team-specific, time-specific, and
position-specific effects, we can conclude from the model that all three forms of shooting
have relatively the same impact on salary. Testing for differences in coefficients, we cannot
reject the null hypothesis that the coefficients are equal, but the model suggests that the
scoring variables are likely appropriate explanatory variables for salary. Even though minutes
per game is highly correlated with scoring, our results were robust to the inclusion of mpg
when it came to two-point scoring and free-throw scoring; the effect of three-point
shooting was milder but significant as well.
In terms of estimating the impact of three-point shooting on salary, we see a
positive effect in both models. In the first model, this effect is weak, and in the
second, it is strong. This mildly supports the generalist bias assumption prevalent
in the literature. Three-point shooting specialization does have market value, and that
value could perhaps be slightly higher than for two-point shooting. This would be a sensible
conclusion to make since three-point shooting is more difficult physically, and more
valuable from a game theory perspective.
There should still be concern as to how strong this effect is. The advanced model
utilized explanatory variables that partially control for skill level and overall scoring
ability. Introducing the advanced performance variables, most notably 3Par, caused the
coefficient on threeptscoring to almost quadruple in magnitude. In this model, a player
who attempts 20% of his shots from beyond the three-point line would see a decrease of 0.2
in logged salary, or roughly a 20% decrease in salary. If the player scores three points a
game from the three-point line (that is, only one three-point field goal per game), then this
would increase his salary by 79%, for an overall net increase of roughly 59%. Controlling for
reliance on the three-point line, this result would imply a very high market value for
three-point shooting. This would also imply that we reject the generalist bias almost
completely, and instead conclude that two-point field goals are valued less than three-point
field goals; two-point shooting only has a strong effect on salary since it is the main scoring
mechanism for most players. This would also accurately reflect the relative scarcity of
elite three-point shooters. Finally, this result would also suggest that players who rely on
the three-point line but do not score much from there will be negatively evaluated for
their inefficiency; bad or ineffective scoring is not rewarded in the market.
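The net-effect arithmetic in this paragraph can be sketched as below. The per-unit effects are backed out from the 79% and 20% figures quoted in the text rather than taken from the regression tables, so they are assumptions for illustration; the function and variable names are hypothetical.

```python
import math


def net_log_salary_effect(
    beta_threept: float,
    threept_points_per_game: float,
    beta_3par: float,
    three_point_attempt_rate: float,
) -> float:
    """Sum the two partial effects on log salary, other regressors held fixed."""
    return (
        beta_threept * threept_points_per_game
        + beta_3par * three_point_attempt_rate
    )


# Assumed per-unit effects implied by the figures quoted in the text.
beta_threept = 0.79 / 3.0   # 0.79 for three points per game from three-pointers
beta_3par = -0.20 / 0.20    # a 0.2 decrease at a 20% three-point attempt rate

net = net_log_salary_effect(beta_threept, 3.0, beta_3par, 0.20)
exact_pct = 100.0 * (math.exp(net) - 1.0)
print(f"net change in log salary: {net:.2f}")
print(f"exact percent change implied by the log-linear form: {exact_pct:.1f}%")
```

The sum of the two log-point effects is the "roughly 59%" reading; because the dependent variable is logged, converting a change of that size back to levels implies a noticeably larger percent change in salary.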
Finally, it is important to note that the explanatory power of the models was not
greatly improved by the inclusion of more performance variables beyond the scoring ones.
Conclusion
The purpose of my thesis was to determine whether three-point shooting is
valued differently from two-point shooting in NBA labor markets. Using a small
regression model, I found that three-point shooting has a statistically significant effect
on real average salaries for NBA players, and there is some evidence to suggest this
coefficient may differ from that on two-point scoring. When the model included minutes
per game, the results were much milder; when 3Par was included, the effect was
amplified. In multiple regressions, the coefficient on three-point scoring was higher than
that on two-point scoring. These results were not expected but are in line with the expected
value of the two shots in a competitive basketball game.
There are issues with the results of the models when it comes to the estimates on
three-point shooting. For one, the sample size of 248 players is relatively small; a sample
spanning more than a decade would improve the precision of the results. Also, it is
difficult to determine which variables are explaining the same underlying effects on the
court; perhaps all the variables just describe overall talent when it comes to the superstar
players, and since these players are in a class of their own, they may skew the results in
ways that would not hold for players who are just average or below-average.
The model also could have benefited from better explanatory variables. Instead of
using age, it would have been better to have years of experience as a variable; players can
come into the league at any age over 18 or 19 and be at different levels of experience.
Finally, it would have been more interesting to include more recent years of
performance in the data. From the results presented, it is clear that utilization of the
three-point line has been on the increase, and it would be interesting to determine
whether the market value of three-point baskets has increased in recent years.
Overall, the results suggest that three-point shooting has a positive, significant
impact on salary. Three-point shooting has at least as much impact on per-year wage
as two-point shooting on a point-by-point scale. The fact that the coefficients are slightly
larger on three-point shooting suggests NBA front offices correctly put slightly more value
on the three-point shot.
Appendix
Bibliography
Wang, Long, and J. Keith Murnighan. 2013. “The Generalist Bias.” Organizational
Behavior and Human Decision Processes 120:47-61.
Lazear, Edward P., and Sherwin Rosen. 1981. “Rank-Order Tournaments as Optimum
Labor Contracts.” Journal of Political Economy 89.5: 841. Print.
Lutz, Dwight. 2012. “A Cluster Analysis of NBA Players.” MIT Sloan Conference.
Berri, David J. 1999. “Who is ‘Most Valuable’? Measuring the Player’s Production of
Wins in the National Basketball Association.” Managerial and Decision
Economics 20.8: 411-27. JSTOR.
Wood, Ryan. 2013. “The History of the 3-Pointer.” IHoops. Youth USAB.
Goldman, Matthew, and Justin M. Rao. 2012. “Live by the Three, Die by the Three? The
Price of Risk in the NBA.” MIT Sloan Conference.
“NBA Player Salaries – National Basketball Association – ESPN.” 2013. ESPN.com.
Michaelides, M. “A Test of Compensating Differences: Evidence on the Importance of
Unobserved Heterogeneity.” Journal of Sports Economics 11.5: 475-95. Sagepub.
Condotta, Bob. 2008. “College Men’s New 3-point Line Sparks Debate.” College Sports.
Seattle Times.
Berri, David J., Stacey L. Brook, and Martin B. Schmidt. “Does One Simply Need to
Score to Score.” International Journal of Sport Finance 2, no. 4: 190-205.
“Three-point Line: Basketball.” 2006. Image. Encyclopedia Britannica – Kids
Encyclopedia.
Gaines, Cork. 2013. “Does The NBA Need To Move Back The Three-Point Line?”
Business Insider.
“NBA & ABA Basketball Statistics & History.” 2013. Basketball-Reference.com. Sports
Reference LLC.
“NBA.com, Official Site of the National Basketball Association.” 2013. NBA.com.