An Investigation of Three-point Shooting through an Analysis of NBA

AnInvestigationofThree-pointShootingthroughanAnalysisofNBAPlayerTrackingData
By
BradleyA.Sliz
ThesisProject
Submittedinpartialfulfillmentofthe
Requirementsforthedegreeof
MASTEROFSCIENCEINPREDICTIVEANALYTICS
December,2016
Dr.AliannaJeanAnnMaren,FirstReader
ThomasRobinson,SecondReader
2
Abstract
Inmythesis,Iaddressthedifficultchallengeofmeasuringtherelativeinfluenceofcompeting
basketballgamestrategies,andIapplymyanalysistoplaysresultinginthree-pointshots.Iuse
aglutofSportVUplayertrackingdatafromover600NBAgamestoderivecustomposition-based
featuresthatcapturetangiblegamestrategiesfromgame-playdata,suchasteamwork,player
matchups, and on-ball defender distances. Then, I demonstrate statistical methods for
measuringtherelativeimportanceofanygivenbasketballstrategy.Indoingso,Ihighlightthe
highimportanceofteamworkbasedstrategiesinaffectingthree-pointshotsuccess.Bycoupling
SportVUdatawithanadvancedvariableimportancealgorithmIamabletoextractmeaningful
resultsthatwouldhavebeenimpossibletoachieveeven3yearsago.
Further,Idemonstratehowplayer-trackingbasedfeaturescanbeusedtomeasurethethreepointshootingpropensityofplayers,andIshowhowthismeasurementcanidentifyeffective
shooters that are either highly-utilized or under-utilized. Altogether, my findings provide a
substantialbodyofworkforinfluencingbasketballstrategy,andformeasuringtheeffectiveness
ofbasketballplayers.
3
Acknowledgements
Firstly,Iwouldliketoexpressmysincereappreciationtomythesiscommittee,Dr.AliannaMaren
andThomasRobinson.Theirpatienceandsupportwasabeaconthathelpedguidemetothe
end.Thankyou!
LastlyIwouldliketoexpressmydeepestgratitudetoDr.RajivShah.Hisadvice,collaboration,
andexcitementprovidedmetheenergyneededtofinishmythesisamidthestressesoffamily
lifeandfulltimeemployment.Withouthisfriendshipandcooperation,myworkwouldhave
beenonlyashellofwhatitis.Thankyou!
4
TableofContents
Abstract.................................................................................................................................2
Acknowledgements................................................................................................................3
Introduction...........................................................................................................................5
Background............................................................................................................................7
ReviewoftheLiterature.........................................................................................................9
SportVU...........................................................................................................................................9
VariableImportance.......................................................................................................................13
Methods................................................................................................................................16
Make-MissModel..........................................................................................................................16
PlayerModel..................................................................................................................................17
Results..................................................................................................................................23
Make-MissModel..........................................................................................................................23
PlayerModel..................................................................................................................................25
Conclusions...........................................................................................................................32
References............................................................................................................................35
5
Introduction
Basketballisagameofathleticism,skill,positioning,andteamwork.Teamsthatoptimizeeach
ofthesefacetsoftheirgamecangenerallyexpecttobesuccessful.However,itisdifficultto
measurethedegreetowhichagivenstrategycaninfluencebasketballsuccess,becausethere
are many competing influencers (i.e. did a player make a shot because they were open, or
becausetheyareagoodshooter?),andbecausethereissomuchnoisemixedinwiththesignal
(i.e.evengreatthree-pointshootersonlymake40%oftheirshots).
Withtheadventofplayertrackingdata,ithasbecomepossibletoexploregamestrategiesina
newlight.Playertrackingdataenablesmeasurementsthatwerenotbeforemeasureablebeyond
subjectivesuppositionsandterseremarks.Infact,acrosssports,playertrackingisrevolutionizing
the sports-analytics movement with copious collections of fine-grained game observations,
enablinganassortmentof(literally)game-changinganalyses.Inbasketballresearch,muchwork
hasbeendonetoleverageplayertrackingdata,butlittleworkhasusedittoanalyzethree-point
shooting.
Inmythesis:
• Ianalyzeplayertrackingdatafromover600gamesfromthefirsthalfofthe2015-2016
NBAseason,tofindplaysresultinginthree-pointshots.
• Iderivecustomposition-basedfeaturesthatcapturetangiblegamestrategiesfromgameplaydata.
6
• I propose statistical methods for measuring the relative importance of any given
basketballstrategy.
• Idemonstratehowtheseposition-basedfeaturescanbeusedtomeasurethethree-point
shootingpropensityofplayers.
• Finally,Ishowhowthispropensitymetriccanidentifyeffectiveshootersthatareeither
highly-utilizedorunder-utilized.
7
Background
Between 2010 and 2013, the NBA equipped all of its arenas with motion capture cameras.
Throughoutthesubsequentbasketballseasons,positionaldatawerecollectedineveryregular
seasonandpostseasongame.Duringeachgame,thepositionsoftheballandeachplayeron
thecourtwererecordedatarateof25observationspersecond.Thisrichdatasethasenabled
researchers,analysts,andbasketballaficionadosaliketoexplorethegameofbasketballinways
thatwereneverbeforepossible.
YonggangNiu[2014]offersanexcellentdescriptiononthebackgroundofthetechnologythat
enables the collection of this data in their paper Application of the SportVU Motion Capture
SystemintheTechnicalStatisticsandAnalysisinBasketballGames.Thefollowingparaphrases
thediscussioninthatpaperontheSportVUtechnology:
TheSportVUsystem(Multi-lensTracingSystem)wasinventedin2005,byIsraeliscientist
MickeyTamir,andwasoriginallyintendedformissiletrackinginamilitarysetting.The
technologywasalsoshowntohavefunctionalapplicationsinsports.In2008,thesports
analyticsfirmSTATSacquiredtheSportVUtechnologyandfocuseditontheanalysisof
basketballgames.Today,thissystemhasbeeninstalledineveryNBAteams’homecourt
andhascapturedmotiondataforover1000professionalbasketballgames.
To date, this NBA SportVU data has already occupied an important position in the
academic world. The annual Sloan Sports Analytics Conference at the Massachusetts
8
InstituteofTechnologyisthetoptechnologyeventinthesportsworld.Amongthepapers
submittedtoSloanaboutbasketballlastyear,halfwerebasedonthedatacapturedfor
theNBAbytheSportVUsystem.
The SportVU system is run by STATS Data Corporation Limited. The ceiling of every
basketballgymnasiumintheNBAisequippedwith6camerasandeveryhalf-courthas3
cameras,allsynchronizedtoeachother.Collectively,thesecamerascaptureplayerand
ballmovements,andextractXYZlocationsrelativetothecourtatarateof25framesper
second.Furthermore,thesepositionaldataarecollectedwithaforeignkeythatcanbe
usedtojoinontoeachgame’sPlay-by-Playrecords.
ThisdecisionbytheNBAtoequipallofitsarenaswithSTATSSportVUsystemswaspivotalin
usheringinanewageofdatadrivenstrategytothegameofprofessionalbasketball.
9
ReviewoftheLiterature
SportVU
In his paper CourtVision: New Visual and Spatial Analytics for the NBA, Goldsberry [2012]
proposedtheuseofspatialanalyticaltechniquestoassessNBAplayer’sshootingabilities.His
workwasoneofanumberofeffortsbeginningtochallengebox-scoreanalyticsasthestatusquo
forbasketballperformanceassessment.Hesuggestedthatspatialanalysiswasvitaltothestudy
ofNBAbasketball,andthissuggestionhasonlybecomemoretrueinthepastfiveyears.Indeed,
hisworkhelpedpavethewayfortheNBAtobuyintocollectingplayertrackingdatawithSTATS
SportVU,whichspawnedaflurryofin-depthNBAspatialanalysesthatcontinuetocontribute
substantiallytothedomainofbasketballanalytics.
WiththeadventofSTATSSportVUtrackingdataintheNBA,basketballresearchershavebeen
abletoexplorein-gameinteractions,strategies,andplayerperformanceininnovativewaysthat
have not before been possible. Specifically, the granularity at which the SportVU data are
collectedenableaprecisionofmeasurementthatbeforewasnotpossibleinanalyzingthegame
ofbasketball.Indeed,inthefouryearssinceGoldsberry'sseminalwork,thefieldofbasketball
analyticshasbeenrevolutionizedbyanalyticswithSportVUtrackingdata.Ithasbeenleveraged
to inform all facets of the game, from team member selection, to team strategy, to player
development.Thefollowingaresomeexamplesofthisradicalre-envisioning:
•
Cervone et al. [2014] demonstrate that player-tracking data can be leveraged to
10
evaluateeverydecisionmadeduringabasketballgame,whetheritbetopass,dribble,
shoot,etc.Furthermore,theyshowthatbyapplyingtheirmodelingframeworktoevery
moment(25framespersecond)ofabasketballgame,amultitudeofnewmetricsand
analysesofbasketballbecomefeasible;theyoffersomeexamplesofthesenewmetrics
foransweringrealbasketballdecisions.
•
Inamorerecentpaper,Cervoneetal.[2016]expandontheirpreviousworktoshowhow
newpositional-basedmetricscanbeleveragedtoinfluencebasketballstrategy.They
useSportVUtrackingdatatoassessthevalueofthespatialregionsofthebasketballcourt.
Theyinferthevalueofcourtrealestatebasedonplayerandballmovementalone.Asin
theirpreviouswork,theydevelopnewmetricsforassessingbothoffensesanddefenses
attheplayerandteamlevels.
•
Maheswaranetal.[2014]showthatsimplebasketballstatisticssuchasreboundscanbe
observedinmuchmorecomplexwaysthansimplynumbersinabox-score.Theyuse
player tracking data to deconstruct rebounds into subcomponents that help to better
explain rebound events. They propose that a rebound can be considered from three
distinctdimensions:Positioning,HustleandConversion,andthatplayertrackingdatacan
enablereboundeventstobeobservedinthesecontexts.LikeCervone,theydemonstrate
howsportstrackingdatacanenablethecreationofnovelmetricsforevaluatingthegame
ofbasketball.
11
•
Luceyetal.[2014]useplayertrackingdatatoexplainhowshootersgetopen.First,they
confirmthenotionthaton-balldefensivepressurereducesshootingpercentages.Given
this,theyinvestigatehowanoffensecangetshootersopen.Theydemonstratethatthe
frequency of defensive role-swaps is predictive of open shots, and use this finding to
measureteams’defensiveeffectiveness.Furthermore,theydescribeamethodthatcan
beusedtoquerysimilarhistoricalplaysbyusingtrackingdataasthequeryinput.
Remarkably,thisisonlyasmallsampleoftheworkdonetodatethathasdemonstratedthevalue
ofSportVUdata.Morerecentresearchispushingitslimitsevenfarther,fromautomaticplay
categorization, to applications with neural networks, to the prediction of injuries before they
happen. Truly, the uses of SportVU data are bountiful. More significantly, SportVU data is
enablingsportsanalysesthatarebothuniqueandmeaningfultothegameofbasketball.Here
areafewexceptionalexamples:
•
McIntyreetal.[2016]proposethattheirworkcanbeconsumedasonecomponentofa
coachingassistancetoolforanalyzingplays.Theyuseplayertrackingdatatotraina
classifierthatlabelsballscreenplaysaccordingtocommondefensiveresponsestrategies:
Over,Under,Trap,andSwitch.
•
Wang and Zemel [2016] demonstrate how long short term memory (LSTM) recurrent
12
neuralnetworkscanconsumevoluminousamountsofthefine-grainedSportVUdatato
performanalysesandcomparisonsofbasketballplaysthatwouldnotbepossiblefora
humanobserveralone.Theyfocusontheclassificationofoffensiveplays.Theuseofan
LSTMallowstheirnetworktolearnthecomplexinteractionsbetweenalltheplayerson
thecourtastheyevolveoverthecourseofaplay.Furthermore,theyshowhowtheir
modelcanstillperformwellwhentrainedononeseasonandtestedonthenext.
•
Talukderetal.[2016]presentamodelthatusesSportVUplayertrackingdatatopredict
the likelihood that any given player will sustain an injury during the course of an
upcominggame.Theycombineplay-by-playgamedata,SportVUdata,playerworkload
andmeasurements,andteamschedulestotraintheirpredictivemodel.Theyarguethat
bycombiningtheirresultswithinformationonteamschedulesandrestdays,teamscan
identifythebesttimetoresttheirstarplayersandreducelong-terminjuryrisk.Thiswork
is significant because it demonstrates how player tracking data can impact the game
beyond just basketball strategy; it can be harnessed to manage player health, and, by
association,faninterestandrevenue.Furthermore,itcanbeusedbyfantasysportsfans
tomanagetheirowninvestmentrisks.
Insum,thereisasubstantialbodyofworkdevelopedinthelastfewyearsencompassingthe
analysisofbasketballwithNBASportVUtrackingdata.Becausepositioningissocentraltothe
game of basketball, Goldsberry’s [2012] suggestion is becoming more and more true: spatial
13
analysisisvitaltothestudyofthegame.ThefloodofdatacollectedduringgamesviaSportVUis
revolutionizingbasketballanalytics.Thisrevolutionischallengingcoreprinciplesofthegame
includinggamestrategy,performanceassessment,andteamandplayermanagement.Likewise
itasanexcitingtimetobeinvolvedinbasketballresearchbecauseeachnewinnovationopens
doorstomanynewanalysesandposesquestionsabouthowweunderstandthegame.
VariableImportance
Importantvariablemeasurementisakeycomponentofthiswork,soconsidersomebackground
onthistopic.Someofthemostcommonlyusedmachinelearningalgorithmssuchasrandom
forests and gradient boosting machines provide measures for predictor variable importance
alongwiththeirresultantmodels.Breiman[2001]discussesvariableimportanceinhisRandom
Forests paper. He describes how out-of-bag predictors are randomly permuted to measure
percentincreaseinmisclassificationrateforeachpredictorvariable,togiveastrongestimateof
variableimportanceforthegivenclassificationorregressiontask.Healsodescribeshowrandom
forestsarerobusttocollinearity,andcanimplicitlycapturevariableinteractionsintheirvariable
importancemeasurements.Sincetheirintroduction,randomforestshavebecomeastandard
method for measuring important variables. Given their strengths, random forests may be a
perfectvehicleforassessingbasketballstrategiesinmywork.
However, random forests do have some flaws in variable importance measurement. In their
paperBiasinrandomforestvariableimportancemeasures:Illustrations,sourcesandasolution,
Strobl et al. [2007] discuss how random forests are not reliable in situations where predictor
14
variablesvaryintheirscaleofmeasurementortheirnumberofcategories.Specifically,they
demonstrate that when random forest variable importance measures are used with data of
varying types, the results are misleading because suboptimal predictor variables may be
artificially preferred. They propose conditional inference forests as a strategy to counteract
thesebiases.
One downside to the conditional inference forests proposed by Strobl et al. [2007] is
computational inefficiency, so I consider an alternative method for my work. In their paper
Feature Selection with the Boruta Package, Kursa and Rudnicki [2010] describe how their
algorithmBorutacontrolsforthevariableimportancebiasesofarandomforest.Specifically,
they standardize importance measures to z scores, and intentionally include features in the
modelthatarerandombydesign;theseareknownas‘shadow’features.Ashadowfeature’s
Boruta importance score can be nonzero only due to random fluctuations. Thus the set of
importancescoresofshadowfeaturesisusedasareferencefordecidingwhichactualfeatures
are truly important. Effectively, anything that performs worse than these shadow features is
considerednobetterthanrandom.Further,theBorutaalgorithmimplementationisefficient
enoughthatdozensofiterationscanbeperformedonmydatatoassemblefeatureimportance
distributions,ratherthanmerelyscalarmeasurements.
Inmywork,IusetheBorutaalgorithmtomeasurevariableimportance,becauseitisamore
advanced (and more current) method which can overcome the deficiencies of random forest
variableimportancemeasurementformyproblem.Bycouplingthemostrecentinnovationin
15
basketballdata-gathering(SportVU),withanadvancedvariableimportancealgorithm(Boruta),I
amabletoextractmeaningfulresultsthatwouldhavebeenimpossibletoachieveeven3years
ago.
16
Methods
Make-MissModel
Asawhole,thisresearchinvestigatesthree-pointshotstrategiesintheNBA.Toaccomplishthis,
three-pointstrategyisinvestigatedfromtwodifferentframesofreference.First,three-pointers
arestudiedattheplaylevel,wherein-gamestrategiesandactionsarecomparedfortheirpower
atinfluencingthree-pointshotsuccess.Specifically,amodelistrainedtomeasureeachvariable’s
importanceininfluencingamakeormiss.Tovisualizethismake-missmodel,considerhoweach
basketballgameismadeupofmanyplays,andhoweachplayismadeupoftheactionsof5
playersfromeachteamandtheball,asdepictedinFigure1.
Figure1:Depictionofthemake-missmodelframeofreference
17
AsdepictedinFigure1,eachplayismadeupoftheactionsof5playersfromeachteamandthe
ball.Iusetheseplayerandballactionstoconstructcustomfeaturesthatcapturegamestrategy
such as teamwork, player matchups, and on-ball defender distances. These features are
aggregated into a single observation for each play, across all games. Likewise, the make-miss
modelisconstructedonthiscollectionofobservationsofmycustomfeaturesforeachplay.
Thestructureofabasketballgamelendsitselfperfectlytoaclassificationproblem,becauseevery
shottakenhasabinaryoutcome:amake,oramiss.Thisanalysisusestheplay(specificallythreepoint plays) as its unit of measurement, and seeks to quantify the relative value of different
offensivestrategiesatthatplaylevel.Likewise,thevariableimportancemeasuresreturnedby
theBorutaalgorithmareperfectvehiclesforquantifyingtherelativevaluesofplaystrategies.By
consideringthemake/missofathree-pointerasaclassificationproblem,Ifitamodeltopredict
theoutcomeofaplay,thencomparetheimportanceofthedependentvariables.
PlayerModel
Tobecompetitiveatmakingthree-pointshotsintheNBA,understandingtherelativestrengthof
variousgamestrategiesandactionsisastrongstart.However,three-pointshootingisaskill,and
onethatvariesgreatlyevenattheprofessionallevel.Likewise,itishighlyvaluabletoassess
three-pointshootingacrossplayers.
18
ThesecondframeofreferenceIusetoanalyzethree-pointshootingisattheplayerlevel,where
the same in-game strategies and actions measured in the make-miss model are collapsed to
comprehensivevaluesforeachplayer.Tovisualizetheplayermodel,considerhowineachgame,
agivenplayermaytakeathree-pointshotonmultipleplays.Foreachplayer,Iaggregateallof
theirthree-pointshootingplaysacrossallgames,asdepictedinFigure2.
Figure2:Depictionoftheplayermodelframeofreference
AsdepictedinFigure2,playerAshotthree-pointersonmultipleplays.Icollectthemake-miss
modelfeaturesforallthree-pointshootingplaysforplayerA,acrossallgames,andaggregate
them to form a single observation for player A. I do this aggregation for all players who
attemptedathree-pointshot.ThiscollectionofplayerobservationsformsthedataonwhichI
buildtheplayermodel.
19
Intheplayermodel,Iaggregatethemetricsderivedinthemake-missmodeltoeachshooterin
mydatasettoidentifytrendsinplayerusage.Byaggregatingthefeaturesdefinedinthemakemissmodel,Iamabletocapturecomprehensivemeasurementsofthemovementofplayersand
theirteamsontheirthree-pointshootingplays.Specifically,theplayermodelusesagradient
boosting machine regression algorithm to predict three-point attempts. By comparing the
model’spredictionforaplayer’sper-gamethree-pointattemptratetotheiractualthree-point
attemptrate,Icanidentifyplayerswhoarebehavinginunexpectedways.Iquantifyboththe
mosteffectiveshooters,andthemostunder-utilizedshooters.
Next, consider the modeling strategy I deployed for the player model problem. A typical
modelingframeworkmightincludeatraindataset,andatestdataset,suchthatthetrainsetis
usedtotrainthemodel,andthetestsetisusedtoevaluatethemodel’sperformanceonunseen
data.ThisarchitecturewouldlooksomethinglikeFigure3,wheretheorangeboxrepresents
trainingdatacontainingobservationsforplayers1throughn,andtheblueboxrepresentstesting
datacontainingobservationsforplayersmthroughz:
20
ModelTraining
Holdout
Figure3:Typicalmodelingdataframework
However,becauseeachplayerinmydatasetneedsaprediction,thismodelingmethodologywill
notsuffice.Instead,Ideployaniterativeleave-one-outmodelingapproachontopofmytraintestsplit.Whilethetestsetremainsasanunseenholdout,thetrainsetissplitfurther,suchthat
Itrainonemodelforeachplayerinthetrainset,asinthefollowingFigure:
ModelTraining
Holdout
Figure4:Iterativeleave-one-outmodelingdataframework
ThemodelingarchitecturedisplayedinFigure4allowsforeveryplayertobescoredonamodel
inwhichtheywerenotincludedfortraining.Thisisimportantbecauseitprotectstheplayer
scoresfrombeingover-biased,asinacasewherethemodelhasalready“seen”theplayeritis
21
scoring.Also,bymaintainingaholdouttestset,Icanevaluatetheperformanceofeveryplayer’s
modelandassessmodelconsistencyacrosstheplayers;andbecauseeachplayermodel’straining
setonlydiffersbyoneobservation,wecanexpectconsistentmodelperformance.
Next, consider the means by which players can be assessed based on the outputs of their
respectiveplayermodels.AsIdescribedabove,Ifirstfindthedeviationbetweenthemodel’s
predictionforaplayer’sper-gamethree-pointattemptrateandtheiractualthree-pointattempt
rate.Inasense,Iusetheerrortermoftheregressionmodeltoidentifyplayerswhoarebehaving
inunexpectedways.Specifically,Imeasureplayermodeldeviationlikethis:
𝑑𝑒𝑣𝑖𝑎𝑡𝑖𝑜𝑛 = 𝑎𝑐𝑡𝑢𝑎𝑙3𝑃𝐴 − (𝑝𝑟𝑒𝑑𝑖𝑐𝑡𝑒𝑑3𝑃𝐴)
Intheaboveequation,deviationisdefinedasthedifferencebetweenaplayer’sactualthreepointattemptrate,andtheirmodel-predictedthree-pointattemptrate.Thisdeviationalonecan
identifyplayerswhoshootthree-pointersmoreorlessfrequentlythanotherplayerswithlikeingameexperiences.However,asmentionedbefore,three-pointsuccessishighlydependenton
player skill. Likewise, I propose a new metric for measuring a given player’s three-point
propensity, by applying a penalty on deviation according to the player’s three-point shooting
percentage.Specifically,Imeasurepropensitylikethis:
𝑃𝑟𝑜𝑝𝑒𝑛𝑠𝑖𝑡𝑦 = 𝐷𝑒𝑣𝑖𝑎𝑡𝑖𝑜𝑛(3𝑃%): In the above equation, propensity is defined as a player’s deviation times their three-point
shootingpercentagecubed.Bycubingthree-pointshootingpercentage,Iensurethattheworst
22
shooters receive a large compounded penalty, while the best shooters receive the smallest
penalty.Whenplayersareorderedbytheirpropensity,thosewiththehighestscoresareboth
effectiveandhighlyutilized,whileplayerswiththemostnegativescoresaretheleastutilized,
thoughstillveryeffective.
23
Results
Make-MissModel
First,considertheresultsofthemake-missmodel.Recallthatthemake-missmodelwastrained
to measure the relative importance of each feature in predicting a made shot. Figure 5
summarizes the returned Boruta importance scores for each feature relative to each other.
Figure5:Borutafeatureimportancedistributionsforthemake-missmodel
AsdepictedinFigure5,theboxesfortheshadowvariablesonthefarleft-handsideofthefigure
representtheBorutaimportancescoresforrandomlypermutedvariables.Becauseeachshadow
feature represents the distribution of importance scores for random source feature
permutations,wecaninferthateachofourfeaturesisatleastmorepredictivethanrandom.
24
OtherkeytakeawaysfromFigurefivearethatteamworkmetrics(e.g.,offensiveconvexhulland
ballmovement)aregenerallymorepredictiveofsuccessthanplayermatchups.Unsurprisingly,
some of the strongest predictors capture the distance between the shooter and the nearest
defender.
Therestoftheseresultscanbeeasytoglossover,soIwilldescribesomeofthemorenuanced
findingshere,andprovidecontext.First,ofallthemetricstested,theonethatismostpredictive
ofamakeormissistheaverage(median)distancebetweentheshooterandtheclosestdefender
overthecourseofthatplay.Thisresultisexpected:itiseasierforaplayertoshootwhenthey
areopen,anditismoredifficulttoshootwhentheyarebeingdefendedclosely.Inasense,this
findingprovidesasanitycheckontherestofthefindingsinthisanalysis.
Next,IwanttojumpdownthelistalittletopointoutPLAYER1_ID.Thisfeaturerepresentsthe
identity of the shooter. Consider what that means in this context. The identity of a player
essentially captures the difference in player skill and efficiency in one feature. Its relative
importancetellsushowsignificantitisthatagoodplayerisshootingvs.abadone.Furthermore,
itspositiononthelistofimportantfeaturesisverynoteworthybecausetherearemanyfeatures
aheadofit.Thissuggeststhatmanyfeatures,suchasballmovement,andshottiming,aremore
predictiveofthree-pointsuccessintheNBAthantheshooter’sskill.
Next,considerthevariousfeaturesthatcaptureshooter-defendermatchups.Manyoffensive
gamestrategiesinvolveluringthedefenseintopersonnelmismatches,throughscreensetting,
orothermeans.Forexample,itisgenerallyacceptedthatbigplayerscanover-powersmaller
25
defenders on post-up plays. However, it is not as well understood how mismatches can be
exploited on three-point shots. According to these results, the difference in height, weight,
experience, and position between a shooter and his nearest defender all have relatively low
poweratpredictingthree-pointshotoutcomeswhencomparedtostrategiesthatinvolveball
movement,courtspacing,shottiming.
PlayerModel
Next,considertheresultsoftheplayermodel.Recalltheleave-one-outmodelingarchitecture
thatwasdeployedforscoringplayersintheplayermodel,andconsiderthedistributionofmodel
performance observed on each of the player models. Below, I plot a histogram of model
performanceintermsofR2andRMSE(rootmeansquarederror)forthetestsetscoredoneach
oftheplayermodels.
26
Figure6:HistogramsofRMSEandRsquaredacrossallplayermodels
In the histograms depicted in Figure 6, we can see that the distribution of player model
performance is approximately normal for both RMSE and R2. The narrow shape of each
distributionsuggestsstablemodelperformanceacrossplayers.Furthermore,wecanobserve
thatthemodelsdisplayareasonablevariance;meanR2isaround0.46,withmaximumaround
0.55, and minimum around 0.39. These results should offer confidence in the stability of
performanceacrossplayermodels.
Next,recallthattheplayermodelaggregatesthefeaturesderivedinthemake-missmodelto
each player in the dataset for a comprehensive measurement of player and team movement
duringthree-pointshots.Theplayermodelusestheseaggregatefeaturestoinfereachplayer’s
27
per-gamethree-pointattemptrate.Bycomparingaplayer’smodel-inferredthree-pointattempt
ratetotheiractualthree-pointattemptrate,wecanobserveplayerswhobehaveinuniqueways
intermsoftheirthree-pointshooting.Specifically,thiscomparisonallowsustodeduceifagiven
playershootsmorefrequentlyorlessfrequentlythanwouldbeexpectedofanotherplayerin
theirsituation.Considerfirstplayerswhoshotmorethreespergamethanexpected:
Figure7:Playerswhoshotmorethreesthantheirmodelexpected,coloredbytheirrespectivethree-pointshootingpercentage
InFigure7,thesizeofthebarassociatedwitheachplayercorrespondstothedeviationoftheir
actual three-point attempt rate from their model-expected three-point attempt rate (more
28
three-point attempts than expected). The color of each bar offers context by conveying the
three-pointshootingpercentageofthecorrespondingplayer.TheplayersshowninFigure7are
the top ten positive deviators from their model’s projection. We can see that Stephen Curry
averaged 5.7 more three-point attempts per game than expected and is also a very efficient
three-pointshooter.Giventhehighefficiencyofthetwo-timemostvaluableplayer,heshould
beawelcomeoutlier.
Conversely,weseethatKobeBryantaveraged3morethree-pointattemptspergamethanhis
modelexpected,butwasaveryinefficientthree-pointshooter.Knowinghisspecificsituationis
revealing;2015-16wasthefinalseasonofBryant’slongandstoriedcareer.Thoughtheseresults
suggesthewasforcingupmanymorethree-pointersthanotherplayersinhispositionwould
have, his team presumably put up with such inefficient performance in honor of his final
professionalseason,andtogivetheirfansafinalglimpseofhiminaction.Next,considerplayers
whoshotfewerthreespergamethanexpected.
29
Figure8:Playerswhoshotfewerthreesthantheirmodelexpected,coloredbytheirrespectivethree-pointshootingpercentage
InFigure8,weseethetoptennegativedeviatorsfromtheirmodel’sprojection.Wecanobserve
that Karl-Anthony Towns averages 2 fewer three-point attempts than expected. Given the
relativelyhighefficiencywithwhichheshootsthethreefromthecenterposition,itwouldbea
promisingstrategytostretchhimouttothethree-pointlinemoreoften.Conversely,thoughmy
modelprojectsAnthonyDavistoshoot2.8morethreespergamethanhereallydid,hismediocre
shootingpercentagedoesnotwarrantagamestrategywherehetakestoomanymorethreepointshots.
30
TheresultsillustratedinFigures7and8anddiscussedaboveareverytelling.Theyhighlight
effective and ineffective shooters in the context of how other players would perform in their
situation. However, they do not convey the whole story. As demonstrated in this discussion,
thereisameaningfulrelationshipbetweenaplayer’sdeviationfrommodel-expectedthree-point
attemptsandtheirthree-pointshootingpercentage.Likewise,Idefinedthepropensitymetric
formeasuringthisrelationship.Recallthatwhenplayersareorderedbytheirpropensity,those
withthehighestscoresarebotheffectiveandhighlyutilized;theseplayersconsistentlymake
shotsthattheirpeerswouldnot.Conversely,playerswiththemostnegativepropensityscores
areveryeffectiveshooterswhoareunder-utilized;theyrepresentplayerswiththemostmissed
opportunities;despitebeingeffectiveshooters,theyrefrainfromshootingmoreoftenthantheir
peerswouldinsimilarsituations.Considertheplayerswiththestrongestpropensityfromeach
ofthesetwogroups(effectivehigh-utilizationandeffectivelow-utilization),aslistedinFigure9.
31
Effective,High-utilization
Player
Propensity
StephenCurry
KlayThompson
DamianLillard
WesleyMatthews
JamesHarden
HollisThompson
PaulGeorge
KyleLowry
J.R.Smith
IsaiahCanaan
0.5250
0.1984
0.1881
0.1310
0.1206
0.1156
0.1119
0.1075
0.1008
0.0989
…
JeffTeague
-0.0939
TroyDaniels
-0.0974
ChrisPaul
-0.0977
DeronWilliams
-0.1005
IanClark
-0.1005
KawhiLeonard
-0.1023
Karl-AnthonyTowns
-0.1117
TyrekeEvans
-0.1179
JrueHoliday
-0.1290
LuisScola
-0.1468
Effective,LowUtilization
Figure9:Three-pointshooters,orderedandcoloredbytheirthree-pointshootingpropensity
The most effective, highly utilized players are observed in the first table of Figure 9. The list
includesmanyhouseholdnames,suchasthehistoricallygreatshooterandMVPStephenCurry,
histeammateKlayThompsan,aswellasDamianLillard,JamesHarden,andPaulGeorge.These
players’labelasgreatshooterswillbenosurprisetoNBAfans.However,whenassessedbythe
same standards, several other lesser-heralded shooters rank highly; Wesley Matthews, Hollis
Thompsan,andIsaiahCanaanareallwellregardedshooters,butrarelyhavetheirthree-point
32
shootingprowesscomparedtothesuperstarscitedabove.
Similarly, the second table of Figure 9 lists the most effective and under-utilized three-point
shooters.Again,thislistisofparticularinterestbecauseitcallstolightplayerswhocouldexpect
to be successful if they shoot more three-pointers. As before, we see the rookie-of-the-year
centerKarl-AnthonyTownswithastrongrankingbythismetric.Inshort,thisisasignificantlist
becausetheseplayershaveunlockedpotentialintermsofthree-pointshooting.Knowingthis,
teams can adjust game strategy around these players, or target under-the-radar players for
sneakytalentacquisition.
33
Conclusions
Inmythesis,Imeasuretherelativeinfluenceofcompetingbasketballgamestrategies,andIapply
myanalysistoplaysresultinginthree-pointshots.IuseSportVUplayertrackingdatafromNBA
games to derive custom position-based features that capture tangible game strategies from
game-playdata.Then,Idemonstratestatisticalmethodsformeasuringtherelativeimportance
ofanygivenbasketballstrategy.Indoingso,Ihighlightthehighimportanceofteamworkbased
strategies in affecting three-point shot success. By coupling the most recent innovation in
basketballdata-gathering(SportVU),withanadvancedvariableimportancealgorithm(Boruta),I
amabletoextractmeaningfulresultsthatwerenotfeasibleeven3yearsago.Furthermore,I
demonstrate how player-tracking based features can be used to measure the three-point
shootingpropensityofplayers,andIshowhowthismeasurementcanidentifyeffectiveshooters
thatareeitherhighly-utilizedorunder-utilized.Altogether,thesefindingsprovideasubstantial
body of work for influencing basketball strategy, and for measuring the quality of basketball
players.
Thoughthree-pointshootingwasthefocusofmyresearch,thatchoicewasanarbitraryoneto
narrowmyscope.ThemethodsIdemonstrateinmyresearchcanbeappliedtoanumberof
game targets as long as they can be measured (i.e. 2-point shooting, pick-and rolls, team
rebounding,defense,etc.).Similarly,thefeaturesthatIdefineinthemake-missmodelwerealso
onlyarbitraryselectionsbasedonquantifiablegamestrategies;anygamestrategycanbetested
inthisframeworkaslongasitcanbemeasured.
34
Intheplayermodel,Iconstructahighlymeaningfulmodelthatwastrainedonlyonthefeatures
definedforthemake-missmodel.However,thesefeaturesarelimitedintheirabilitytocapture
relevant game-play information, and their explicit definitions are not relevant for the player
model’sutilization.Likewise,amoreencompassingapproachtotrainingaplayermodelwould
bebasedonaneuralnetworkstylearchitecture.Thebenefitofaneuralnetworkinthissituation
isthatitcantaketherawplayertrackingdataasinputs,andautomaticallylearntherelevant
featuresandinteractionsforagiventarget(i.e.three-pointshooting).Onecouldthusexpecta
neuralnetworkstylemodeltoachieveevenbetterperformancethanthemodelIdemonstrate
inthisresearch.Moreover,asdiscussedintheliteraturereview,neuralnetworkshavealready
beensuccessfullydemonstratedforuse-casesontheNBAplayertrackingdata.
Inclose,myworkpushestheenvelopeforanalyzingbasketballstrategy,andformeasuringthe
qualityofbasketballplayers.Untilrecently,theanalysesdemonstratedinthispaperwerenot
evenfeasible.Theywereonlymadepossiblewiththeavailabilityofplayertrackingdataandwith
thelatestadvancesinstatisticallearning.Muchisstillyettobedonetoadvancebothmywork
andthefieldofbasketballanalyticsasawhole.SportVUdatahasopenedmanynewdoorsfor
basketballanalytics,andeachnewanalysissnowballsmanymorequestionsaboutourperception
ofthegame.
35
References
Breiman,L.(2001).Randomforests.MachLearn,45(1),5-32.
DanCervone,LukeBornn,KirkGoldsberry(2016).NBACourtRealty,MITSloanSportsAnalytics
Conference.
DanCervone,AlexanderD’Amour,LukeBornn,KirkGoldsberry(2014).POINTWISE:Predicting
Points and Valuing Decisions in Real Time with NBA Optical Tracking Data, MIT Sloan Sports
AnalyticsConference.
J.H.Friedman(2001).GreedyFunctionApproximation:AGradientBoostingMachine,Annalsof
Statistics29(5):1189-1232.
Kirk Goldsberry (2012). CourtVision: New Visual and Spatial Analytics for the NBA, MIT Sloan
SportsAnalyticsConference.
MironB.Kursa,WitoldR.Rudnicki(2010).FeatureSelectionwiththeBorutaPackage.Journalof
StatisticalSoftware,36(11),p.1-13.URL:http://www.jstatsoft.org/v36/i11/.
PatrickLucey,AlinaBialkowski,PeterCarr,YisongYueandIainMatthews(2014).“HowtoGetan
Open Shot”: Analyzing Team Movement in Basketball using Tracking Data, MIT Sloan Sports
AnalyticsConference.
Rajiv Maheswaran, Yu-Han Chang, Jeff Su, Sheldon Kwok, Tal Levy, Adam Wexler, Noel
Hollingsworth (2014). The Three Dimensions of Rebounding, MIT Sloan Sports Analytics
Conference.
AveryMcIntyre,JoelBrooks,JohnGuttag,andJennaWiens(2016).RecognizingandAnalyzing
BallScreenDefenseintheNBA,MITSloanSportsAnalyticsConference.
RCoreTeam(2015).R:Alanguageandenvironmentforstatisticalcomputing.RFoundationfor
StatisticalComputing,Vienna,Austria.URLhttps://www.R-project.org/.
36
CarolinStrobl,Anne-LaureBoulesteix,AchimZeileis,TorstenHothorn(2007).Biasinrandom
forestvariableimportancemeasures:Illustrations,sourcesandasolution,BMCBioinformatics.
HishamTalukder,ThomasVincent,GeoffFoster,CamdenHu,JuanHuerta,AparnaKumar,Mark
Malazarte,DiegoSaldana,ShawnSimpson(2016).Preventingin-gameinjuriesforNBAplayers,
MITSloanSportsAnalyticsConference.
Kuan-ChiehWang,RichardZemel(2016).ClassifyingNBAOffensivePlaysUsingNeural
Networks,MITSloanSportsAnalyticsConference.
YonggangNiu,HaojieHuang,HuanbinZhao(2014).ApplicationoftheSportVUMotionCapture
SystemintheTechnicalStatisticsandAnalysisinBasketballGames,AsianSportsScience.