Predictive Analytics at the NHL
Eric Blabac, Director of Decision Science – Membership Analytics, Sam’s Club
May 6th, 2017

Agenda
• Introductions
• SAP Partnership with the NHL
• Playoff Predictions
• Probability of Making Playoffs
• Q&A

© 2017 SAP AG or an SAP affiliate company. All rights reserved.

Introductions
Eric Blabac
Director of Decision Science – Membership Analytics, Sam’s Club

Eric is a global analytics expert and data science evangelist, and is currently the Director of Decision Science, Membership Analytics at Sam’s Club. Prior to Sam’s Club, Eric held the role of Principal Data Scientist at SAP. His background is in advanced analytics, with substantial experience in statistical modeling, predictive analytics, data mining, forecasting and management consulting. He has worked across a variety of industries, including retail, financial services, consumer product goods, healthcare, and sports and entertainment.

He is also the author of The Encyclopedia of Baseball Statistics - From A to ZR, a complete reference of all modern baseball statistics: what they really mean, how to calculate them and how to use them.

Eric holds two Master of Science (MS) degrees, in Statistics and Applied Mathematics, from Iowa State University, a Master of Business Administration (MBA) from Grand Canyon University and a Bachelor of Science (BSc) degree in Mathematics from Iowa State University.

SAP Partnership with the NHL
*5-year sponsorship agreement
• Phase 1a (Enhanced Stats): Oct 2014 – Feb 2015
• Phase 1b (Playoff Predictions): Jan – April 2015
• Phases 2 and 3 (UX Revamp + Additional Stats): June – Aug 2015
  • Advanced Game Level Filtering
  • Statistic Charting/Player Comparison
  • Stats by Context (Faceoffs By Zone, Shots By Type, etc.)
  • Stats by Strength (e.g. 3on3 Goals)
  • Team Power Index
• Phase 4 (Enhancements): Nov 2015 – Jan/Feb 2016
  • Probability of Making Playoffs
  • Line Analysis
  • In-Game Win Expectancy
Project Team: PM, 3 consultants (ETL/HANA modeling, Data Scientist, Solution Architect)

Playoff Predictions

NHL Playoff Predictions Overview
GOAL: Predict the Stanley Cup winner
THINGS TO CONSIDER (aka Requirements):
BUSINESS
• Need to predict every game AND series leading up to the finals. What does the output need to look like?
• The model needs to incorporate “Enhanced Statistics” (SAP Marketing)
• The model needs to be EASY to explain/interpret (for the NHL)
• The output needs to be EASY to understand (for fans)
• The factors used need to be EASY to understand, but compelling (for the NHL, fans, media)
ANALYTICAL
• What data should I use? Do I need to calculate additional variables?
• Define “prediction” (e.g. explicit win/loss, win probability)
• Which statistical model should I use?
• How do I implement the model? How do I “simulate” the Stanley Cup playoffs?
• Predictions for game x need to account for the results of game x-1 and of previous series (bracket)
TIME
• Began in early Jan, with a deadline of mid-March (three weeks before the playoffs start)

NHL Playoff Predictions Solution Overview
A logistic regression model was developed to calculate the probability that a team would win a specific playoff game. The model incorporated various factors, including:
• Standard regular season stats
  • Penalty Kill %, Goals Against Per Game, etc.
• Enhanced and Advanced regular season stats
  • Shot Attempts % Behind, Save % on High Quality Shots, Shooting Efficiency %, etc.
• Game Context factors
  • Home vs. Away, Time Zones Travelled, Opponent Strength, etc.
[Diagram: Regular Season Results; Playoff Game Results; Team Level Stats; Game Context; Streak and Strength Factors; Game Level Win Probabilities; Simulating Remaining Bracket (Game Level); Series Level Win Probabilities]

NHL Playoff Predictions Solution Development
DATA PREPARATION
• Created an exhaustive list of all factors that we thought might be predictive (NHL/SAP)
• We came up with 78 different factors, which together generated 241 variables
  • E.g. Factor: Winning Percentage → Variables: Current Winning Percentage, Opponent Winning Percentage, Winning Percentage Last X Games, Winning Percentage League Rank, etc.
• The data was prepared in a HANA (database) stored procedure drawing on over 20 different source tables in the NHL’s data landscape (over 1,500 lines of code)
MODELING
• Chose a model that was appropriate for the problem (classification) and met the NHL requirements (e.g. “EASY” to develop, interpret and explain) → Logistic Regression
• I initially grouped the 241 variables into eight (8) subgroups based on the type of variable (e.g. Possession, Special Teams, Goalie, etc.). Models were run on each subgroup to identify the variables with high predictive power. The variables selected from each subgroup were then combined into one final model, yielding the final set of 37 variables.
SIMULATION and IMPLEMENTATION
• I then developed code to simulate the remaining bracket given the current state of the playoffs: it loops through each game, series and round, predicting each future game in the bracket format
• Predictions were generated every morning and made available on the HANA cloud for fans to access from any platform on NHL.com

NHL Playoff Predictions Implementation
[Screenshots: NHL.com Series Preview; SAP Match-up Analysis (Bracket Challenge)]
Do you notice anything missing?

NHL Playoff Predictions Day “0” (Before the Playoffs Began)
[Image: Bracket Predictions and Results]
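The Solution Overview and Solution Development slides describe a logistic regression game model whose probabilities drive a simulation of the remaining bracket. The Python sketch below illustrates that pipeline under stated assumptions: the feature names, the coefficients, the 2-2-1-1-1 home schedule and the use of Monte Carlo sampling are all hypothetical choices for illustration, not the NHL/SAP implementation (which ran in HANA). Each game probability comes from a logistic function of the features, a best-of-seven is played out game by game from the current series state, and the bracket advances round by round.

```python
import math
import random
from collections import Counter
from typing import Callable, Dict, List

# P(team A wins game #game_number | series currently at wins_a-wins_b)
GameProb = Callable[[int, int, int], float]


def game_win_probability(features: Dict[str, float],
                         coefficients: Dict[str, float],
                         intercept: float) -> float:
    """Logistic regression: P(win) = 1 / (1 + exp(-(b0 + sum_i b_i * x_i)))."""
    z = intercept + sum(coefficients[name] * x for name, x in features.items())
    return 1.0 / (1.0 + math.exp(-z))


def play_series(rng: random.Random, p_game: GameProb,
                wins_a: int = 0, wins_b: int = 0) -> bool:
    """Play out one best-of-seven from the current state; True if team A wins."""
    a, b = wins_a, wins_b
    while a < 4 and b < 4:
        if rng.random() < p_game(a + b + 1, a, b):
            a += 1
        else:
            b += 1
    return a == 4


def series_win_probability(p_game: GameProb, wins_a: int = 0, wins_b: int = 0,
                           n_sims: int = 10_000, seed: int = 0) -> float:
    """Monte Carlo estimate of P(team A wins the series) from the current state."""
    rng = random.Random(seed)
    won = sum(play_series(rng, p_game, wins_a, wins_b) for _ in range(n_sims))
    return won / n_sims


def simulate_bracket(teams: List[str],
                     game_prob: Callable[[str, str, int], float],
                     n_sims: int = 10_000, seed: int = 0) -> Dict[str, float]:
    """Share of simulations in which each team wins the whole bracket.

    teams is in bracket order (adjacent pairs meet; length a power of two) and
    game_prob(team_a, team_b, game_number) returns P(team_a wins that game).
    """
    rng = random.Random(seed)

    def matchup(a: str, b: str) -> GameProb:
        def p(game_number: int, _wins_a: int, _wins_b: int) -> float:
            return game_prob(a, b, game_number)
        return p

    champions: Counter = Counter()
    for _ in range(n_sims):
        alive = list(teams)
        while len(alive) > 1:  # one pass of this loop = one playoff round
            alive = [a if play_series(rng, matchup(a, b)) else b
                     for a, b in zip(alive[0::2], alive[1::2])]
        champions[alive[0]] += 1
    return {team: n / n_sims for team, n in champions.items()}


# Hypothetical coefficients for illustration only (not the NHL/SAP model's values).
COEFS = {"home": 0.25, "win_pct_diff": 2.0}
A_HOME_GAMES = {1, 2, 5, 7}  # assumed 2-2-1-1-1 format, team A holding home ice


def p_a_wins(game_number: int, _wins_a: int, _wins_b: int) -> float:
    home = 1.0 if game_number in A_HOME_GAMES else -1.0
    return game_win_probability({"home": home, "win_pct_diff": 0.05}, COEFS, 0.0)


print(series_win_probability(p_a_wins))        # series not yet started (0-0)
print(series_win_probability(p_a_wins, 1, 3))  # team A down 3 games to 1
```

Passing the current series score into series_win_probability is one way to meet the requirement that predictions for game x reflect earlier results; in practice the probabilities would come from the fitted model rather than the hypothetical COEFS.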
NHL Playoff Predictions Initial Results
The initial results were mixed, but overall positive: the model successfully predicted the Chicago Blackhawks to win the Cup on “Day 0” (!!)
However, the game level model had some issues:
• Stubbornness and predicting “too many” sweeps
  • In many cases, the model stuck with the initial series prediction, even in light of in-series performance (e.g. a team that lost the first 2 games, or a team down 3 games to 1)
• Picked “too many” big upsets
  • While some predicted upsets turned out to be correct, “too many” big upsets simply didn’t look right (both examples below were upsets of Presidents’ Trophy winners)
  • E.g. PIT over NYR in 2014-2015
  • E.g. PHI over WSH in 2015-2016

NHL Playoff Predictions Second Version
The second version of the playoff predictions was structured in “phases”. The first phase is a series level prediction that uses all available historical playoff data (back to the 1987-88 season). This series level prediction can be used on its own (e.g. Bracket Challenge), but it is also an input to a new game level model.
[Diagram: Regular Season Results; Team Level Stats; Playoff Game Results; Game Context; Historical Playoff Performance; Simulated Remaining Bracket (Series Level)*; Series Level Win Probabilities; Game Level Win Probabilities; Simulated Remaining Bracket (Game Level)*]
The new game level model takes into account game context factors (home vs. away, days between games, etc.) as well as historical playoff performance, for more “realistic” game predictions.
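The second version adds Historical Playoff Performance, and the game level results tables for this version list a historical probability of winning the series for each series state. One plausible way to build such a lookup, sketched here as an assumption rather than the method actually used, is to count how often teams that reached a given state (e.g. up 3-1) went on to win. Counting each series from both teams' viewpoints forces tied states to exactly 50% and makes mirrored states sum to 100%, the same pattern visible in those tables (e.g. 87.23% at 3-1 vs. 12.77% at 1-3). The string encoding of series and the function name are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

State = Tuple[int, int]  # (own wins, opponent wins) before the next game is played


def historical_state_table(series_results: List[str]) -> Dict[State, float]:
    """Estimate P(win series | series state) from completed best-of-seven series.

    Each string lists game results from one team's viewpoint, e.g. "WLWWLW"
    for a 4-2 series win. Every series is counted from both teams' viewpoints.
    """
    swap = str.maketrans("WL", "LW")
    reached: Dict[State, int] = defaultdict(int)  # times a team stood at this state
    won: Dict[State, int] = defaultdict(int)      # ...and went on to win the series
    for games in series_results:
        for seq in (games, games.translate(swap)):
            won_series = seq.count("W") == 4
            own = opp = 0
            for result in seq:
                reached[(own, opp)] += 1
                won[(own, opp)] += int(won_series)
                if result == "W":
                    own += 1
                else:
                    opp += 1
    return {state: won[state] / reached[state] for state in reached}


# Toy history, made up for illustration (not real NHL series).
history = ["WWWW", "WLWWLW", "LLWWWLW", "WLLLL"]
table = historical_state_table(history)
print(table[(3, 1)], table[(1, 3)], table[(2, 2)])
```

With real data, history would hold every playoff series back to 1987-88, and the resulting lookup could be fed to the game level model as one more input alongside the game context factors.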
NHL Playoff Predictions Results Comparison – Series Level (2015-2016)
[Bracket diagram: v1 and v2 series level win probabilities for each 2015-2016 matchup: DAL vs MIN, STL vs CHI, ANA vs NSH, LAK vs SJS, FLA vs NYI, TBL vs DET, WSH vs PHI, PIT vs NYR, DAL vs STL, NSH vs SJS, NYI vs TBL, WSH vs PIT, STL vs SJS, TBL vs PIT, SJS vs PIT]

NHL Playoff Predictions Results Comparison – Game Level (2014-2015)

CGY vs VAN, 1st round

Calgary Flames (series win probability 49.31%)
Game  Series W/L  Home  Timezone Diff.  Hist. Prob. Winning Series  Game Win Prob.  Win
1     0-0         0     -1              50.00%                      43.21%          1
2     1-0         0     -1              65.93%                      43.27%          0
3     1-1         1     1               50.00%                      56.11%          1
4     2-1         1     1               66.78%                      56.27%          1
5     3-1         0     -1              87.23%                      43.34%          0
6     3-2         1     1               75.61%                      56.36%          1

Vancouver Canucks (series win probability 50.69%)
Game  Series W/L  Home  Timezone Diff.  Hist. Prob. Winning Series  Game Win Prob.  Win
1     0-0         1     1               50.00%                      56.79%          0
2     0-1         1     1               34.07%                      56.73%          1
3     1-1         0     -1              50.00%                      43.89%          0
4     1-2         0     -1              33.22%                      43.73%          0
5     1-3         1     1               12.77%                      56.66%          1
6     2-3         0     -1              24.39%                      43.64%          0

Note: Even though the initial series level model showed a slight edge to VAN, the game level model used the current series performance to give the eventual edge to CGY late in the series.

NYR vs WSH, 2nd round

New York Rangers (series win probability 66.58%)
Game  Series W/L  Home  Days Between GP  Hist. Prob. Winning Series  Game Win Prob.  Win
1     0-0         1     6                50.00%                      61.65%          0
2     0-1         1     2                34.07%                      62.33%          1
3     1-1         0     2                50.00%                      49.17%          0
4     1-2         0     2                33.22%                      47.79%          0
5     1-3         1     2                12.77%                      63.78%          1
6     2-3         0     2                24.39%                      47.06%          1
7     3-3         1     2                50.00%                      61.24%          1

Washington Capitals (series win probability 33.42%)
Game  Series W/L  Home  Days Between GP  Hist. Prob. Winning Series  Game Win Prob.  Win
1     0-0         0     3                50.00%                      38.35%          1
2     1-0         0     2                65.93%                      37.67%          0
3     1-1         1     2                50.00%                      50.83%          1
4     2-1         1     2                66.78%                      52.21%          1
5     3-1         0     2                87.23%                      36.22%          0
6     3-2         1     2                75.61%                      52.94%          0
7     3-3         0     2                50.00%                      38.76%          0

NHL Playoff Predictions Results Comparison Notes
• v1 had better overall success at the series level (v1 vs. v2):
  • 2014-2015: 11/15 (73%) vs. 9/15 (60%)
  • 2015-2016: 12/15 (80%) vs. 8/15 (53%)
• Both the new series level and game level predictions are much more “conservative”
  • In fact, with the new model using 28 seasons of playoff data (420 series), only 39 series had a series win probability above 80%
• Better “eye test” success (NYR and WSH being Presidents’ Trophy winners)
  • 2014-2015: NYR vs PIT
    • Old: NYR (16.01%) vs. PIT (83.99%)
    • New: NYR (63.81%) vs. PIT (36.19%)
  • 2015-2016: WSH vs PHI
    • Old: WSH (43.25%) vs. PHI (56.75%)
    • New: WSH (62.69%) vs. PHI (37.31%)

Questions?
Thank you!
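The Results Comparison Notes reduce to two simple counts: how many series the favourite actually won, and how many series had a favourite above 80%. A minimal sketch of that bookkeeping follows; the input format (one probability and one outcome per series) and the function names are assumptions, and a probability of exactly 50% is treated as a pick for the opponent.

```python
from typing import List, Tuple

# One entry per series: (model's P(team A wins the series), did team A win?)
Prediction = Tuple[float, bool]


def evaluate_series(preds: List[Prediction],
                    threshold: float = 0.80) -> Tuple[int, int, int]:
    """Return (correct picks, series evaluated, series with a favourite above threshold)."""
    correct = sum((p > 0.5) == won for p, won in preds)  # p == 0.5 counts as picking team B
    confident = sum(max(p, 1.0 - p) > threshold for p, _ in preds)
    return correct, len(preds), confident


def format_hit_rate(correct: int, total: int) -> str:
    """Format a hit rate in the style of the notes slide, e.g. '11/15 (73%)'."""
    return f"{correct}/{total} ({correct / total:.0%})"


# Made-up toy predictions, for illustration only.
toy = [(0.70, True), (0.30, True), (0.90, True), (0.15, False)]
n_correct, n_total, n_confident = evaluate_series(toy)
print(format_hit_rate(n_correct, n_total), n_confident)

# Reproducing the percentages quoted on the slide from its own counts.
for c in (11, 9, 12, 8):
    print(format_hit_rate(c, 15))  # first line printed: 11/15 (73%)
```

Applied to the 420 historical series with threshold=0.80, the third count corresponds to the "only 39 series" figure quoted for the new model.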