AnInvestigationofThree-pointShootingthroughanAnalysisofNBAPlayerTrackingData By BradleyA.Sliz ThesisProject Submittedinpartialfulfillmentofthe Requirementsforthedegreeof MASTEROFSCIENCEINPREDICTIVEANALYTICS December,2016 Dr.AliannaJeanAnnMaren,FirstReader ThomasRobinson,SecondReader 2 Abstract Inmythesis,Iaddressthedifficultchallengeofmeasuringtherelativeinfluenceofcompeting basketballgamestrategies,andIapplymyanalysistoplaysresultinginthree-pointshots.Iuse aglutofSportVUplayertrackingdatafromover600NBAgamestoderivecustomposition-based featuresthatcapturetangiblegamestrategiesfromgame-playdata,suchasteamwork,player matchups, and on-ball defender distances. Then, I demonstrate statistical methods for measuringtherelativeimportanceofanygivenbasketballstrategy.Indoingso,Ihighlightthe highimportanceofteamworkbasedstrategiesinaffectingthree-pointshotsuccess.Bycoupling SportVUdatawithanadvancedvariableimportancealgorithmIamabletoextractmeaningful resultsthatwouldhavebeenimpossibletoachieveeven3yearsago. Further,Idemonstratehowplayer-trackingbasedfeaturescanbeusedtomeasurethethreepointshootingpropensityofplayers,andIshowhowthismeasurementcanidentifyeffective shooters that are either highly-utilized or under-utilized. Altogether, my findings provide a substantialbodyofworkforinfluencingbasketballstrategy,andformeasuringtheeffectiveness ofbasketballplayers. 3 Acknowledgements Firstly,Iwouldliketoexpressmysincereappreciationtomythesiscommittee,Dr.AliannaMaren andThomasRobinson.Theirpatienceandsupportwasabeaconthathelpedguidemetothe end.Thankyou! LastlyIwouldliketoexpressmydeepestgratitudetoDr.RajivShah.Hisadvice,collaboration, andexcitementprovidedmetheenergyneededtofinishmythesisamidthestressesoffamily lifeandfulltimeemployment.Withouthisfriendshipandcooperation,myworkwouldhave beenonlyashellofwhatitis.Thankyou! 4 TableofContents Abstract.................................................................................................................................2 Acknowledgements................................................................................................................3 Introduction...........................................................................................................................5 Background............................................................................................................................7 ReviewoftheLiterature.........................................................................................................9 SportVU...........................................................................................................................................9 VariableImportance.......................................................................................................................13 Methods................................................................................................................................16 Make-MissModel..........................................................................................................................16 PlayerModel..................................................................................................................................17 Results..................................................................................................................................23 Make-MissModel..........................................................................................................................23 PlayerModel..................................................................................................................................25 Conclusions...........................................................................................................................32 References............................................................................................................................35 5 Introduction Basketballisagameofathleticism,skill,positioning,andteamwork.Teamsthatoptimizeeach ofthesefacetsoftheirgamecangenerallyexpecttobesuccessful.However,itisdifficultto measurethedegreetowhichagivenstrategycaninfluencebasketballsuccess,becausethere are many competing influencers (i.e. did a player make a shot because they were open, or becausetheyareagoodshooter?),andbecausethereissomuchnoisemixedinwiththesignal (i.e.evengreatthree-pointshootersonlymake40%oftheirshots). Withtheadventofplayertrackingdata,ithasbecomepossibletoexploregamestrategiesina newlight.Playertrackingdataenablesmeasurementsthatwerenotbeforemeasureablebeyond subjectivesuppositionsandterseremarks.Infact,acrosssports,playertrackingisrevolutionizing the sports-analytics movement with copious collections of fine-grained game observations, enablinganassortmentof(literally)game-changinganalyses.Inbasketballresearch,muchwork hasbeendonetoleverageplayertrackingdata,butlittleworkhasusedittoanalyzethree-point shooting. Inmythesis: • Ianalyzeplayertrackingdatafromover600gamesfromthefirsthalfofthe2015-2016 NBAseason,tofindplaysresultinginthree-pointshots. • Iderivecustomposition-basedfeaturesthatcapturetangiblegamestrategiesfromgameplaydata. 6 • I propose statistical methods for measuring the relative importance of any given basketballstrategy. • Idemonstratehowtheseposition-basedfeaturescanbeusedtomeasurethethree-point shootingpropensityofplayers. • Finally,Ishowhowthispropensitymetriccanidentifyeffectiveshootersthatareeither highly-utilizedorunder-utilized. 7 Background Between 2010 and 2013, the NBA equipped all of its arenas with motion capture cameras. Throughoutthesubsequentbasketballseasons,positionaldatawerecollectedineveryregular seasonandpostseasongame.Duringeachgame,thepositionsoftheballandeachplayeron thecourtwererecordedatarateof25observationspersecond.Thisrichdatasethasenabled researchers,analysts,andbasketballaficionadosaliketoexplorethegameofbasketballinways thatwereneverbeforepossible. YonggangNiu[2014]offersanexcellentdescriptiononthebackgroundofthetechnologythat enables the collection of this data in their paper Application of the SportVU Motion Capture SystemintheTechnicalStatisticsandAnalysisinBasketballGames.Thefollowingparaphrases thediscussioninthatpaperontheSportVUtechnology: TheSportVUsystem(Multi-lensTracingSystem)wasinventedin2005,byIsraeliscientist MickeyTamir,andwasoriginallyintendedformissiletrackinginamilitarysetting.The technologywasalsoshowntohavefunctionalapplicationsinsports.In2008,thesports analyticsfirmSTATSacquiredtheSportVUtechnologyandfocuseditontheanalysisof basketballgames.Today,thissystemhasbeeninstalledineveryNBAteams’homecourt andhascapturedmotiondataforover1000professionalbasketballgames. To date, this NBA SportVU data has already occupied an important position in the academic world. The annual Sloan Sports Analytics Conference at the Massachusetts 8 InstituteofTechnologyisthetoptechnologyeventinthesportsworld.Amongthepapers submittedtoSloanaboutbasketballlastyear,halfwerebasedonthedatacapturedfor theNBAbytheSportVUsystem. The SportVU system is run by STATS Data Corporation Limited. The ceiling of every basketballgymnasiumintheNBAisequippedwith6camerasandeveryhalf-courthas3 cameras,allsynchronizedtoeachother.Collectively,thesecamerascaptureplayerand ballmovements,andextractXYZlocationsrelativetothecourtatarateof25framesper second.Furthermore,thesepositionaldataarecollectedwithaforeignkeythatcanbe usedtojoinontoeachgame’sPlay-by-Playrecords. ThisdecisionbytheNBAtoequipallofitsarenaswithSTATSSportVUsystemswaspivotalin usheringinanewageofdatadrivenstrategytothegameofprofessionalbasketball. 9 ReviewoftheLiterature SportVU In his paper CourtVision: New Visual and Spatial Analytics for the NBA, Goldsberry [2012] proposedtheuseofspatialanalyticaltechniquestoassessNBAplayer’sshootingabilities.His workwasoneofanumberofeffortsbeginningtochallengebox-scoreanalyticsasthestatusquo forbasketballperformanceassessment.Hesuggestedthatspatialanalysiswasvitaltothestudy ofNBAbasketball,andthissuggestionhasonlybecomemoretrueinthepastfiveyears.Indeed, hisworkhelpedpavethewayfortheNBAtobuyintocollectingplayertrackingdatawithSTATS SportVU,whichspawnedaflurryofin-depthNBAspatialanalysesthatcontinuetocontribute substantiallytothedomainofbasketballanalytics. WiththeadventofSTATSSportVUtrackingdataintheNBA,basketballresearchershavebeen abletoexplorein-gameinteractions,strategies,andplayerperformanceininnovativewaysthat have not before been possible. Specifically, the granularity at which the SportVU data are collectedenableaprecisionofmeasurementthatbeforewasnotpossibleinanalyzingthegame ofbasketball.Indeed,inthefouryearssinceGoldsberry'sseminalwork,thefieldofbasketball analyticshasbeenrevolutionizedbyanalyticswithSportVUtrackingdata.Ithasbeenleveraged to inform all facets of the game, from team member selection, to team strategy, to player development.Thefollowingaresomeexamplesofthisradicalre-envisioning: • Cervone et al. [2014] demonstrate that player-tracking data can be leveraged to 10 evaluateeverydecisionmadeduringabasketballgame,whetheritbetopass,dribble, shoot,etc.Furthermore,theyshowthatbyapplyingtheirmodelingframeworktoevery moment(25framespersecond)ofabasketballgame,amultitudeofnewmetricsand analysesofbasketballbecomefeasible;theyoffersomeexamplesofthesenewmetrics foransweringrealbasketballdecisions. • Inamorerecentpaper,Cervoneetal.[2016]expandontheirpreviousworktoshowhow newpositional-basedmetricscanbeleveragedtoinfluencebasketballstrategy.They useSportVUtrackingdatatoassessthevalueofthespatialregionsofthebasketballcourt. Theyinferthevalueofcourtrealestatebasedonplayerandballmovementalone.Asin theirpreviouswork,theydevelopnewmetricsforassessingbothoffensesanddefenses attheplayerandteamlevels. • Maheswaranetal.[2014]showthatsimplebasketballstatisticssuchasreboundscanbe observedinmuchmorecomplexwaysthansimplynumbersinabox-score.Theyuse player tracking data to deconstruct rebounds into subcomponents that help to better explain rebound events. They propose that a rebound can be considered from three distinctdimensions:Positioning,HustleandConversion,andthatplayertrackingdatacan enablereboundeventstobeobservedinthesecontexts.LikeCervone,theydemonstrate howsportstrackingdatacanenablethecreationofnovelmetricsforevaluatingthegame ofbasketball. 11 • Luceyetal.[2014]useplayertrackingdatatoexplainhowshootersgetopen.First,they confirmthenotionthaton-balldefensivepressurereducesshootingpercentages.Given this,theyinvestigatehowanoffensecangetshootersopen.Theydemonstratethatthe frequency of defensive role-swaps is predictive of open shots, and use this finding to measureteams’defensiveeffectiveness.Furthermore,theydescribeamethodthatcan beusedtoquerysimilarhistoricalplaysbyusingtrackingdataasthequeryinput. Remarkably,thisisonlyasmallsampleoftheworkdonetodatethathasdemonstratedthevalue ofSportVUdata.Morerecentresearchispushingitslimitsevenfarther,fromautomaticplay categorization, to applications with neural networks, to the prediction of injuries before they happen. Truly, the uses of SportVU data are bountiful. More significantly, SportVU data is enablingsportsanalysesthatarebothuniqueandmeaningfultothegameofbasketball.Here areafewexceptionalexamples: • McIntyreetal.[2016]proposethattheirworkcanbeconsumedasonecomponentofa coachingassistancetoolforanalyzingplays.Theyuseplayertrackingdatatotraina classifierthatlabelsballscreenplaysaccordingtocommondefensiveresponsestrategies: Over,Under,Trap,andSwitch. • Wang and Zemel [2016] demonstrate how long short term memory (LSTM) recurrent 12 neuralnetworkscanconsumevoluminousamountsofthefine-grainedSportVUdatato performanalysesandcomparisonsofbasketballplaysthatwouldnotbepossiblefora humanobserveralone.Theyfocusontheclassificationofoffensiveplays.Theuseofan LSTMallowstheirnetworktolearnthecomplexinteractionsbetweenalltheplayerson thecourtastheyevolveoverthecourseofaplay.Furthermore,theyshowhowtheir modelcanstillperformwellwhentrainedononeseasonandtestedonthenext. • Talukderetal.[2016]presentamodelthatusesSportVUplayertrackingdatatopredict the likelihood that any given player will sustain an injury during the course of an upcominggame.Theycombineplay-by-playgamedata,SportVUdata,playerworkload andmeasurements,andteamschedulestotraintheirpredictivemodel.Theyarguethat bycombiningtheirresultswithinformationonteamschedulesandrestdays,teamscan identifythebesttimetoresttheirstarplayersandreducelong-terminjuryrisk.Thiswork is significant because it demonstrates how player tracking data can impact the game beyond just basketball strategy; it can be harnessed to manage player health, and, by association,faninterestandrevenue.Furthermore,itcanbeusedbyfantasysportsfans tomanagetheirowninvestmentrisks. Insum,thereisasubstantialbodyofworkdevelopedinthelastfewyearsencompassingthe analysisofbasketballwithNBASportVUtrackingdata.Becausepositioningissocentraltothe game of basketball, Goldsberry’s [2012] suggestion is becoming more and more true: spatial 13 analysisisvitaltothestudyofthegame.ThefloodofdatacollectedduringgamesviaSportVUis revolutionizingbasketballanalytics.Thisrevolutionischallengingcoreprinciplesofthegame includinggamestrategy,performanceassessment,andteamandplayermanagement.Likewise itasanexcitingtimetobeinvolvedinbasketballresearchbecauseeachnewinnovationopens doorstomanynewanalysesandposesquestionsabouthowweunderstandthegame. VariableImportance Importantvariablemeasurementisakeycomponentofthiswork,soconsidersomebackground onthistopic.Someofthemostcommonlyusedmachinelearningalgorithmssuchasrandom forests and gradient boosting machines provide measures for predictor variable importance alongwiththeirresultantmodels.Breiman[2001]discussesvariableimportanceinhisRandom Forests paper. He describes how out-of-bag predictors are randomly permuted to measure percentincreaseinmisclassificationrateforeachpredictorvariable,togiveastrongestimateof variableimportanceforthegivenclassificationorregressiontask.Healsodescribeshowrandom forestsarerobusttocollinearity,andcanimplicitlycapturevariableinteractionsintheirvariable importancemeasurements.Sincetheirintroduction,randomforestshavebecomeastandard method for measuring important variables. Given their strengths, random forests may be a perfectvehicleforassessingbasketballstrategiesinmywork. However, random forests do have some flaws in variable importance measurement. In their paperBiasinrandomforestvariableimportancemeasures:Illustrations,sourcesandasolution, Strobl et al. [2007] discuss how random forests are not reliable in situations where predictor 14 variablesvaryintheirscaleofmeasurementortheirnumberofcategories.Specifically,they demonstrate that when random forest variable importance measures are used with data of varying types, the results are misleading because suboptimal predictor variables may be artificially preferred. They propose conditional inference forests as a strategy to counteract thesebiases. One downside to the conditional inference forests proposed by Strobl et al. [2007] is computational inefficiency, so I consider an alternative method for my work. In their paper Feature Selection with the Boruta Package, Kursa and Rudnicki [2010] describe how their algorithmBorutacontrolsforthevariableimportancebiasesofarandomforest.Specifically, they standardize importance measures to z scores, and intentionally include features in the modelthatarerandombydesign;theseareknownas‘shadow’features.Ashadowfeature’s Boruta importance score can be nonzero only due to random fluctuations. Thus the set of importancescoresofshadowfeaturesisusedasareferencefordecidingwhichactualfeatures are truly important. Effectively, anything that performs worse than these shadow features is considerednobetterthanrandom.Further,theBorutaalgorithmimplementationisefficient enoughthatdozensofiterationscanbeperformedonmydatatoassemblefeatureimportance distributions,ratherthanmerelyscalarmeasurements. Inmywork,IusetheBorutaalgorithmtomeasurevariableimportance,becauseitisamore advanced (and more current) method which can overcome the deficiencies of random forest variableimportancemeasurementformyproblem.Bycouplingthemostrecentinnovationin 15 basketballdata-gathering(SportVU),withanadvancedvariableimportancealgorithm(Boruta),I amabletoextractmeaningfulresultsthatwouldhavebeenimpossibletoachieveeven3years ago. 16 Methods Make-MissModel Asawhole,thisresearchinvestigatesthree-pointshotstrategiesintheNBA.Toaccomplishthis, three-pointstrategyisinvestigatedfromtwodifferentframesofreference.First,three-pointers arestudiedattheplaylevel,wherein-gamestrategiesandactionsarecomparedfortheirpower atinfluencingthree-pointshotsuccess.Specifically,amodelistrainedtomeasureeachvariable’s importanceininfluencingamakeormiss.Tovisualizethismake-missmodel,considerhoweach basketballgameismadeupofmanyplays,andhoweachplayismadeupoftheactionsof5 playersfromeachteamandtheball,asdepictedinFigure1. Figure1:Depictionofthemake-missmodelframeofreference 17 AsdepictedinFigure1,eachplayismadeupoftheactionsof5playersfromeachteamandthe ball.Iusetheseplayerandballactionstoconstructcustomfeaturesthatcapturegamestrategy such as teamwork, player matchups, and on-ball defender distances. These features are aggregated into a single observation for each play, across all games. Likewise, the make-miss modelisconstructedonthiscollectionofobservationsofmycustomfeaturesforeachplay. Thestructureofabasketballgamelendsitselfperfectlytoaclassificationproblem,becauseevery shottakenhasabinaryoutcome:amake,oramiss.Thisanalysisusestheplay(specificallythreepoint plays) as its unit of measurement, and seeks to quantify the relative value of different offensivestrategiesatthatplaylevel.Likewise,thevariableimportancemeasuresreturnedby theBorutaalgorithmareperfectvehiclesforquantifyingtherelativevaluesofplaystrategies.By consideringthemake/missofathree-pointerasaclassificationproblem,Ifitamodeltopredict theoutcomeofaplay,thencomparetheimportanceofthedependentvariables. PlayerModel Tobecompetitiveatmakingthree-pointshotsintheNBA,understandingtherelativestrengthof variousgamestrategiesandactionsisastrongstart.However,three-pointshootingisaskill,and onethatvariesgreatlyevenattheprofessionallevel.Likewise,itishighlyvaluabletoassess three-pointshootingacrossplayers. 18 ThesecondframeofreferenceIusetoanalyzethree-pointshootingisattheplayerlevel,where the same in-game strategies and actions measured in the make-miss model are collapsed to comprehensivevaluesforeachplayer.Tovisualizetheplayermodel,considerhowineachgame, agivenplayermaytakeathree-pointshotonmultipleplays.Foreachplayer,Iaggregateallof theirthree-pointshootingplaysacrossallgames,asdepictedinFigure2. Figure2:Depictionoftheplayermodelframeofreference AsdepictedinFigure2,playerAshotthree-pointersonmultipleplays.Icollectthemake-miss modelfeaturesforallthree-pointshootingplaysforplayerA,acrossallgames,andaggregate them to form a single observation for player A. I do this aggregation for all players who attemptedathree-pointshot.ThiscollectionofplayerobservationsformsthedataonwhichI buildtheplayermodel. 19 Intheplayermodel,Iaggregatethemetricsderivedinthemake-missmodeltoeachshooterin mydatasettoidentifytrendsinplayerusage.Byaggregatingthefeaturesdefinedinthemakemissmodel,Iamabletocapturecomprehensivemeasurementsofthemovementofplayersand theirteamsontheirthree-pointshootingplays.Specifically,theplayermodelusesagradient boosting machine regression algorithm to predict three-point attempts. By comparing the model’spredictionforaplayer’sper-gamethree-pointattemptratetotheiractualthree-point attemptrate,Icanidentifyplayerswhoarebehavinginunexpectedways.Iquantifyboththe mosteffectiveshooters,andthemostunder-utilizedshooters. Next, consider the modeling strategy I deployed for the player model problem. A typical modelingframeworkmightincludeatraindataset,andatestdataset,suchthatthetrainsetis usedtotrainthemodel,andthetestsetisusedtoevaluatethemodel’sperformanceonunseen data.ThisarchitecturewouldlooksomethinglikeFigure3,wheretheorangeboxrepresents trainingdatacontainingobservationsforplayers1throughn,andtheblueboxrepresentstesting datacontainingobservationsforplayersmthroughz: 20 ModelTraining Holdout Figure3:Typicalmodelingdataframework However,becauseeachplayerinmydatasetneedsaprediction,thismodelingmethodologywill notsuffice.Instead,Ideployaniterativeleave-one-outmodelingapproachontopofmytraintestsplit.Whilethetestsetremainsasanunseenholdout,thetrainsetissplitfurther,suchthat Itrainonemodelforeachplayerinthetrainset,asinthefollowingFigure: ModelTraining Holdout Figure4:Iterativeleave-one-outmodelingdataframework ThemodelingarchitecturedisplayedinFigure4allowsforeveryplayertobescoredonamodel inwhichtheywerenotincludedfortraining.Thisisimportantbecauseitprotectstheplayer scoresfrombeingover-biased,asinacasewherethemodelhasalready“seen”theplayeritis 21 scoring.Also,bymaintainingaholdouttestset,Icanevaluatetheperformanceofeveryplayer’s modelandassessmodelconsistencyacrosstheplayers;andbecauseeachplayermodel’straining setonlydiffersbyoneobservation,wecanexpectconsistentmodelperformance. Next, consider the means by which players can be assessed based on the outputs of their respectiveplayermodels.AsIdescribedabove,Ifirstfindthedeviationbetweenthemodel’s predictionforaplayer’sper-gamethree-pointattemptrateandtheiractualthree-pointattempt rate.Inasense,Iusetheerrortermoftheregressionmodeltoidentifyplayerswhoarebehaving inunexpectedways.Specifically,Imeasureplayermodeldeviationlikethis: 𝑑𝑒𝑣𝑖𝑎𝑡𝑖𝑜𝑛 = 𝑎𝑐𝑡𝑢𝑎𝑙3𝑃𝐴 − (𝑝𝑟𝑒𝑑𝑖𝑐𝑡𝑒𝑑3𝑃𝐴) Intheaboveequation,deviationisdefinedasthedifferencebetweenaplayer’sactualthreepointattemptrate,andtheirmodel-predictedthree-pointattemptrate.Thisdeviationalonecan identifyplayerswhoshootthree-pointersmoreorlessfrequentlythanotherplayerswithlikeingameexperiences.However,asmentionedbefore,three-pointsuccessishighlydependenton player skill. Likewise, I propose a new metric for measuring a given player’s three-point propensity, by applying a penalty on deviation according to the player’s three-point shooting percentage.Specifically,Imeasurepropensitylikethis: 𝑃𝑟𝑜𝑝𝑒𝑛𝑠𝑖𝑡𝑦 = 𝐷𝑒𝑣𝑖𝑎𝑡𝑖𝑜𝑛(3𝑃%): In the above equation, propensity is defined as a player’s deviation times their three-point shootingpercentagecubed.Bycubingthree-pointshootingpercentage,Iensurethattheworst 22 shooters receive a large compounded penalty, while the best shooters receive the smallest penalty.Whenplayersareorderedbytheirpropensity,thosewiththehighestscoresareboth effectiveandhighlyutilized,whileplayerswiththemostnegativescoresaretheleastutilized, thoughstillveryeffective. 23 Results Make-MissModel First,considertheresultsofthemake-missmodel.Recallthatthemake-missmodelwastrained to measure the relative importance of each feature in predicting a made shot. Figure 5 summarizes the returned Boruta importance scores for each feature relative to each other. Figure5:Borutafeatureimportancedistributionsforthemake-missmodel AsdepictedinFigure5,theboxesfortheshadowvariablesonthefarleft-handsideofthefigure representtheBorutaimportancescoresforrandomlypermutedvariables.Becauseeachshadow feature represents the distribution of importance scores for random source feature permutations,wecaninferthateachofourfeaturesisatleastmorepredictivethanrandom. 24 OtherkeytakeawaysfromFigurefivearethatteamworkmetrics(e.g.,offensiveconvexhulland ballmovement)aregenerallymorepredictiveofsuccessthanplayermatchups.Unsurprisingly, some of the strongest predictors capture the distance between the shooter and the nearest defender. Therestoftheseresultscanbeeasytoglossover,soIwilldescribesomeofthemorenuanced findingshere,andprovidecontext.First,ofallthemetricstested,theonethatismostpredictive ofamakeormissistheaverage(median)distancebetweentheshooterandtheclosestdefender overthecourseofthatplay.Thisresultisexpected:itiseasierforaplayertoshootwhenthey areopen,anditismoredifficulttoshootwhentheyarebeingdefendedclosely.Inasense,this findingprovidesasanitycheckontherestofthefindingsinthisanalysis. Next,IwanttojumpdownthelistalittletopointoutPLAYER1_ID.Thisfeaturerepresentsthe identity of the shooter. Consider what that means in this context. The identity of a player essentially captures the difference in player skill and efficiency in one feature. Its relative importancetellsushowsignificantitisthatagoodplayerisshootingvs.abadone.Furthermore, itspositiononthelistofimportantfeaturesisverynoteworthybecausetherearemanyfeatures aheadofit.Thissuggeststhatmanyfeatures,suchasballmovement,andshottiming,aremore predictiveofthree-pointsuccessintheNBAthantheshooter’sskill. Next,considerthevariousfeaturesthatcaptureshooter-defendermatchups.Manyoffensive gamestrategiesinvolveluringthedefenseintopersonnelmismatches,throughscreensetting, orothermeans.Forexample,itisgenerallyacceptedthatbigplayerscanover-powersmaller 25 defenders on post-up plays. However, it is not as well understood how mismatches can be exploited on three-point shots. According to these results, the difference in height, weight, experience, and position between a shooter and his nearest defender all have relatively low poweratpredictingthree-pointshotoutcomeswhencomparedtostrategiesthatinvolveball movement,courtspacing,shottiming. PlayerModel Next,considertheresultsoftheplayermodel.Recalltheleave-one-outmodelingarchitecture thatwasdeployedforscoringplayersintheplayermodel,andconsiderthedistributionofmodel performance observed on each of the player models. Below, I plot a histogram of model performanceintermsofR2andRMSE(rootmeansquarederror)forthetestsetscoredoneach oftheplayermodels. 26 Figure6:HistogramsofRMSEandRsquaredacrossallplayermodels In the histograms depicted in Figure 6, we can see that the distribution of player model performance is approximately normal for both RMSE and R2. The narrow shape of each distributionsuggestsstablemodelperformanceacrossplayers.Furthermore,wecanobserve thatthemodelsdisplayareasonablevariance;meanR2isaround0.46,withmaximumaround 0.55, and minimum around 0.39. These results should offer confidence in the stability of performanceacrossplayermodels. Next,recallthattheplayermodelaggregatesthefeaturesderivedinthemake-missmodelto each player in the dataset for a comprehensive measurement of player and team movement duringthree-pointshots.Theplayermodelusestheseaggregatefeaturestoinfereachplayer’s 27 per-gamethree-pointattemptrate.Bycomparingaplayer’smodel-inferredthree-pointattempt ratetotheiractualthree-pointattemptrate,wecanobserveplayerswhobehaveinuniqueways intermsoftheirthree-pointshooting.Specifically,thiscomparisonallowsustodeduceifagiven playershootsmorefrequentlyorlessfrequentlythanwouldbeexpectedofanotherplayerin theirsituation.Considerfirstplayerswhoshotmorethreespergamethanexpected: Figure7:Playerswhoshotmorethreesthantheirmodelexpected,coloredbytheirrespectivethree-pointshootingpercentage InFigure7,thesizeofthebarassociatedwitheachplayercorrespondstothedeviationoftheir actual three-point attempt rate from their model-expected three-point attempt rate (more 28 three-point attempts than expected). The color of each bar offers context by conveying the three-pointshootingpercentageofthecorrespondingplayer.TheplayersshowninFigure7are the top ten positive deviators from their model’s projection. We can see that Stephen Curry averaged 5.7 more three-point attempts per game than expected and is also a very efficient three-pointshooter.Giventhehighefficiencyofthetwo-timemostvaluableplayer,heshould beawelcomeoutlier. Conversely,weseethatKobeBryantaveraged3morethree-pointattemptspergamethanhis modelexpected,butwasaveryinefficientthree-pointshooter.Knowinghisspecificsituationis revealing;2015-16wasthefinalseasonofBryant’slongandstoriedcareer.Thoughtheseresults suggesthewasforcingupmanymorethree-pointersthanotherplayersinhispositionwould have, his team presumably put up with such inefficient performance in honor of his final professionalseason,andtogivetheirfansafinalglimpseofhiminaction.Next,considerplayers whoshotfewerthreespergamethanexpected. 29 Figure8:Playerswhoshotfewerthreesthantheirmodelexpected,coloredbytheirrespectivethree-pointshootingpercentage InFigure8,weseethetoptennegativedeviatorsfromtheirmodel’sprojection.Wecanobserve that Karl-Anthony Towns averages 2 fewer three-point attempts than expected. Given the relativelyhighefficiencywithwhichheshootsthethreefromthecenterposition,itwouldbea promisingstrategytostretchhimouttothethree-pointlinemoreoften.Conversely,thoughmy modelprojectsAnthonyDavistoshoot2.8morethreespergamethanhereallydid,hismediocre shootingpercentagedoesnotwarrantagamestrategywherehetakestoomanymorethreepointshots. 30 TheresultsillustratedinFigures7and8anddiscussedaboveareverytelling.Theyhighlight effective and ineffective shooters in the context of how other players would perform in their situation. However, they do not convey the whole story. As demonstrated in this discussion, thereisameaningfulrelationshipbetweenaplayer’sdeviationfrommodel-expectedthree-point attemptsandtheirthree-pointshootingpercentage.Likewise,Idefinedthepropensitymetric formeasuringthisrelationship.Recallthatwhenplayersareorderedbytheirpropensity,those withthehighestscoresarebotheffectiveandhighlyutilized;theseplayersconsistentlymake shotsthattheirpeerswouldnot.Conversely,playerswiththemostnegativepropensityscores areveryeffectiveshooterswhoareunder-utilized;theyrepresentplayerswiththemostmissed opportunities;despitebeingeffectiveshooters,theyrefrainfromshootingmoreoftenthantheir peerswouldinsimilarsituations.Considertheplayerswiththestrongestpropensityfromeach ofthesetwogroups(effectivehigh-utilizationandeffectivelow-utilization),aslistedinFigure9. 31 Effective,High-utilization Player Propensity StephenCurry KlayThompson DamianLillard WesleyMatthews JamesHarden HollisThompson PaulGeorge KyleLowry J.R.Smith IsaiahCanaan 0.5250 0.1984 0.1881 0.1310 0.1206 0.1156 0.1119 0.1075 0.1008 0.0989 … JeffTeague -0.0939 TroyDaniels -0.0974 ChrisPaul -0.0977 DeronWilliams -0.1005 IanClark -0.1005 KawhiLeonard -0.1023 Karl-AnthonyTowns -0.1117 TyrekeEvans -0.1179 JrueHoliday -0.1290 LuisScola -0.1468 Effective,LowUtilization Figure9:Three-pointshooters,orderedandcoloredbytheirthree-pointshootingpropensity The most effective, highly utilized players are observed in the first table of Figure 9. The list includesmanyhouseholdnames,suchasthehistoricallygreatshooterandMVPStephenCurry, histeammateKlayThompsan,aswellasDamianLillard,JamesHarden,andPaulGeorge.These players’labelasgreatshooterswillbenosurprisetoNBAfans.However,whenassessedbythe same standards, several other lesser-heralded shooters rank highly; Wesley Matthews, Hollis Thompsan,andIsaiahCanaanareallwellregardedshooters,butrarelyhavetheirthree-point 32 shootingprowesscomparedtothesuperstarscitedabove. Similarly, the second table of Figure 9 lists the most effective and under-utilized three-point shooters.Again,thislistisofparticularinterestbecauseitcallstolightplayerswhocouldexpect to be successful if they shoot more three-pointers. As before, we see the rookie-of-the-year centerKarl-AnthonyTownswithastrongrankingbythismetric.Inshort,thisisasignificantlist becausetheseplayershaveunlockedpotentialintermsofthree-pointshooting.Knowingthis, teams can adjust game strategy around these players, or target under-the-radar players for sneakytalentacquisition. 33 Conclusions Inmythesis,Imeasuretherelativeinfluenceofcompetingbasketballgamestrategies,andIapply myanalysistoplaysresultinginthree-pointshots.IuseSportVUplayertrackingdatafromNBA games to derive custom position-based features that capture tangible game strategies from game-playdata.Then,Idemonstratestatisticalmethodsformeasuringtherelativeimportance ofanygivenbasketballstrategy.Indoingso,Ihighlightthehighimportanceofteamworkbased strategies in affecting three-point shot success. By coupling the most recent innovation in basketballdata-gathering(SportVU),withanadvancedvariableimportancealgorithm(Boruta),I amabletoextractmeaningfulresultsthatwerenotfeasibleeven3yearsago.Furthermore,I demonstrate how player-tracking based features can be used to measure the three-point shootingpropensityofplayers,andIshowhowthismeasurementcanidentifyeffectiveshooters thatareeitherhighly-utilizedorunder-utilized.Altogether,thesefindingsprovideasubstantial body of work for influencing basketball strategy, and for measuring the quality of basketball players. Thoughthree-pointshootingwasthefocusofmyresearch,thatchoicewasanarbitraryoneto narrowmyscope.ThemethodsIdemonstrateinmyresearchcanbeappliedtoanumberof game targets as long as they can be measured (i.e. 2-point shooting, pick-and rolls, team rebounding,defense,etc.).Similarly,thefeaturesthatIdefineinthemake-missmodelwerealso onlyarbitraryselectionsbasedonquantifiablegamestrategies;anygamestrategycanbetested inthisframeworkaslongasitcanbemeasured. 34 Intheplayermodel,Iconstructahighlymeaningfulmodelthatwastrainedonlyonthefeatures definedforthemake-missmodel.However,thesefeaturesarelimitedintheirabilitytocapture relevant game-play information, and their explicit definitions are not relevant for the player model’sutilization.Likewise,amoreencompassingapproachtotrainingaplayermodelwould bebasedonaneuralnetworkstylearchitecture.Thebenefitofaneuralnetworkinthissituation isthatitcantaketherawplayertrackingdataasinputs,andautomaticallylearntherelevant featuresandinteractionsforagiventarget(i.e.three-pointshooting).Onecouldthusexpecta neuralnetworkstylemodeltoachieveevenbetterperformancethanthemodelIdemonstrate inthisresearch.Moreover,asdiscussedintheliteraturereview,neuralnetworkshavealready beensuccessfullydemonstratedforuse-casesontheNBAplayertrackingdata. Inclose,myworkpushestheenvelopeforanalyzingbasketballstrategy,andformeasuringthe qualityofbasketballplayers.Untilrecently,theanalysesdemonstratedinthispaperwerenot evenfeasible.Theywereonlymadepossiblewiththeavailabilityofplayertrackingdataandwith thelatestadvancesinstatisticallearning.Muchisstillyettobedonetoadvancebothmywork andthefieldofbasketballanalyticsasawhole.SportVUdatahasopenedmanynewdoorsfor basketballanalytics,andeachnewanalysissnowballsmanymorequestionsaboutourperception ofthegame. 35 References Breiman,L.(2001).Randomforests.MachLearn,45(1),5-32. DanCervone,LukeBornn,KirkGoldsberry(2016).NBACourtRealty,MITSloanSportsAnalytics Conference. DanCervone,AlexanderD’Amour,LukeBornn,KirkGoldsberry(2014).POINTWISE:Predicting Points and Valuing Decisions in Real Time with NBA Optical Tracking Data, MIT Sloan Sports AnalyticsConference. J.H.Friedman(2001).GreedyFunctionApproximation:AGradientBoostingMachine,Annalsof Statistics29(5):1189-1232. Kirk Goldsberry (2012). CourtVision: New Visual and Spatial Analytics for the NBA, MIT Sloan SportsAnalyticsConference. MironB.Kursa,WitoldR.Rudnicki(2010).FeatureSelectionwiththeBorutaPackage.Journalof StatisticalSoftware,36(11),p.1-13.URL:http://www.jstatsoft.org/v36/i11/. PatrickLucey,AlinaBialkowski,PeterCarr,YisongYueandIainMatthews(2014).“HowtoGetan Open Shot”: Analyzing Team Movement in Basketball using Tracking Data, MIT Sloan Sports AnalyticsConference. Rajiv Maheswaran, Yu-Han Chang, Jeff Su, Sheldon Kwok, Tal Levy, Adam Wexler, Noel Hollingsworth (2014). The Three Dimensions of Rebounding, MIT Sloan Sports Analytics Conference. AveryMcIntyre,JoelBrooks,JohnGuttag,andJennaWiens(2016).RecognizingandAnalyzing BallScreenDefenseintheNBA,MITSloanSportsAnalyticsConference. RCoreTeam(2015).R:Alanguageandenvironmentforstatisticalcomputing.RFoundationfor StatisticalComputing,Vienna,Austria.URLhttps://www.R-project.org/. 36 CarolinStrobl,Anne-LaureBoulesteix,AchimZeileis,TorstenHothorn(2007).Biasinrandom forestvariableimportancemeasures:Illustrations,sourcesandasolution,BMCBioinformatics. HishamTalukder,ThomasVincent,GeoffFoster,CamdenHu,JuanHuerta,AparnaKumar,Mark Malazarte,DiegoSaldana,ShawnSimpson(2016).Preventingin-gameinjuriesforNBAplayers, MITSloanSportsAnalyticsConference. Kuan-ChiehWang,RichardZemel(2016).ClassifyingNBAOffensivePlaysUsingNeural Networks,MITSloanSportsAnalyticsConference. YonggangNiu,HaojieHuang,HuanbinZhao(2014).ApplicationoftheSportVUMotionCapture SystemintheTechnicalStatisticsandAnalysisinBasketballGames,AsianSportsScience.
© Copyright 2026 Paperzz