NHL Playoff Predictions - Statistical Sports Consulting

Predictive Analytics at the NHL
Eric Blabac, Director of Decision Science – Membership Analytics, Sam’s Club
May 6th, 2017
Agenda
• Introductions
• SAP Partnership with the NHL
• Playoff Predictions
• Probability of Making Playoffs
• Q&A
© 2017 SAP AG or an SAP affiliate company. All rights reserved.
Introductions
Eric Blabac
Director of Decision Science – Membership Analytics, Sam’s Club
Eric is a global analytics expert and data science evangelist, and is currently the Director of Decision Science, Membership Analytics at Sam’s Club. Prior to Sam’s Club, Eric held the role of Principal Data Scientist at SAP.
Eric’s background is in advanced analytics, with substantial experience in statistical modeling, predictive analytics, data mining, forecasting and management consulting. He has worked across a variety of industries including retail, financial services, consumer product goods, healthcare, and sports and entertainment.
He is also the author of The Encyclopedia of Baseball Statistics - From A to ZR, a complete reference of all modern baseball statistics: what they really mean, how to calculate them and how to use them.
Eric holds two Master of Science (MS) degrees, in Statistics and Applied Mathematics, from Iowa State University; a Master of Business Administration (MBA) from Grand Canyon University; and a Bachelor of Science (BSc) degree in Mathematics from Iowa State University.
SAP Partnership with the NHL
*5-year sponsorship agreement
Phase 1a (Enhanced Stats) : Oct 2014 – Feb 2015
Phase 1b (Playoff Predictions): Jan – April 2015
Phases 2 and 3 (UX Revamp + Additional Stats): June – Aug 2015
• Advanced Game Level Filtering
• Statistic Charting/Player Comparison
• Stats by Context (Faceoffs By Zone, Shots By Type, etc …)
• Stats by Strength (e.g. 3on3 Goals)
• Team Power Index
Phase 4 (Enhancements): Nov 2015 – Jan/Feb 2016
• Probability of Making Playoffs
• Line Analysis
• In-Game Win Expectancy
Project Team: PM, 3 consultants (ETL/HANA modeling, Data Scientist, Solution Architect)
Playoff Predictions
NHL Playoff Predictions
Overview
GOAL: Predict the Stanley Cup winner
THINGS TO CONSIDER (aka Requirements):
BUSINESS
• Need to predict every game AND series leading up to the finals. What does the output need to look like?
• Model needs to incorporate “Enhanced Statistics” (SAP Marketing)
• The model needs to be EASY to explain/interpret (for the NHL)
• The output needs to be EASY to understand (for fans)
• The factors used need to be EASY to understand, but compelling (for the NHL, fans, media)
ANALYTICAL
• What data should I use? Do I need to calculate additional variables?
• Define “prediction” (e.g. explicit win/loss, win probability)
• Which statistical model should I use?
• How do I implement the model? How do I “simulate” the Stanley Cup playoffs?
• Predictions for game x need to account for results in game x-1 and previous series (Bracket)
TIME
• Began in early Jan, deadline of mid-March (three weeks before playoffs start)
NHL Playoff Predictions
Solution Overview
A logistic regression model was developed to calculate the probability a team would win a
specific playoff game. This model incorporated various factors including:
• Standard regular season stats
  • Penalty Kill %, Goals Against Per Game, etc …
• Enhanced and Advanced regular season stats
  • Shot Attempts % Behind, Save % on High Quality Shots, Shooting Efficiency %, etc …
• Game Context factors
  • Home vs. Away, Time Zones Travelled, Opponent Strength, etc …
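A minimal sketch of the logistic form described above, in plain Python. The three feature names, the toy training rows and the gradient-descent fit are all assumptions made for illustration; the deck does not give the actual variables, coefficients or fitting procedure.

```python
import math

def sigmoid(z):
    """Map a linear score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(rows, labels, lr=0.1, epochs=2000):
    """Fit [bias, w1, w2, ...] by batch gradient descent on the log-loss."""
    weights = [0.0] * (len(rows[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(weights)
        for x, y in zip(rows, labels):
            p = sigmoid(weights[0] + sum(w * v for w, v in zip(weights[1:], x)))
            err = p - y
            grad[0] += err
            for j, v in enumerate(x):
                grad[j + 1] += err * v
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad)]
    return weights

def win_probability(weights, x):
    """Probability that the team wins a game with feature vector x."""
    return sigmoid(weights[0] + sum(w * v for w, v in zip(weights[1:], x)))

# Hypothetical toy rows: [is_home, penalty_kill_pct_diff, goals_against_diff]
games = [
    [1, 0.03, -0.4], [1, 0.01, -0.1], [0, 0.02, -0.3], [1, 0.00, 0.3],
    [1, -0.02, 0.2], [0, -0.03, 0.5], [0, -0.01, 0.1], [0, 0.04, -0.2],
]
won = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = team won the game

weights = fit_logistic(games, won)
p_home = win_probability(weights, [1, 0.02, -0.2])
p_away = win_probability(weights, [0, 0.02, -0.2])
print(f"home: {p_home:.3f}  away: {p_away:.3f}")
```

The output is a win probability per game rather than an explicit win/loss call, which is what lets game level results roll up into series level probabilities.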
[Diagram: Regular Season Results; Playoff Game Results; Team Level Stats; Game Context; Streak and Strength Factors; Game Level Win Probabilities; Simulating Remaining Bracket (Game Level); Series Level Win Probabilities]
NHL Playoff Predictions
Solution Development
DATA PREPARATION
• Created an exhaustive list of all factors that we thought may be predictive (NHL/SAP)
• We came up with 78 different factors, which generated 241 variables
  • E.g. Factor: Winning Percentage → Variables: Current Winning Percentage, Opponent Winning Percentage, Winning Percentage Last X Games, Winning Percentage League Rank, etc …
• The data was prepared in a HANA (database) stored procedure utilizing over 20 different source tables in the NHL’s data landscape - over 1500 lines of code
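To make the factor-to-variable expansion concrete, here is a small Python sketch that turns the single factor "winning percentage" into the four variables named in the example. The actual preparation was done in a HANA stored procedure; the function names, the three-team league and the results below are hypothetical.

```python
def win_pct(results):
    """Share of games won; results is a list of 1 (win) / 0 (loss)."""
    return sum(results) / len(results) if results else 0.0

def winning_pct_variables(team, opponent, season_results, last_x=10):
    """Expand the factor 'winning percentage' into several model variables."""
    pct = {name: win_pct(games) for name, games in season_results.items()}
    # League rank 1 = best winning percentage.
    ranked = sorted(pct, key=pct.get, reverse=True)
    return {
        "win_pct": pct[team],
        "opp_win_pct": pct[opponent],
        f"win_pct_last_{last_x}": win_pct(season_results[team][-last_x:]),
        "win_pct_league_rank": ranked.index(team) + 1,
    }

# Hypothetical three-team league, eight games each (most recent game last).
season_results = {
    "AAA": [1, 1, 0, 1, 1, 0, 1, 1],
    "BBB": [0, 1, 0, 0, 1, 0, 1, 0],
    "CCC": [1, 0, 1, 0, 1, 0, 1, 0],
}
variables = winning_pct_variables("CCC", "AAA", season_results, last_x=4)
print(variables)
```

Applying the same pattern (current value, opponent value, recent window, league rank) across many factors is one way 78 factors could grow into 241 variables.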
MODELING
• Chose a model that was appropriate for the problem (classification) and met the NHL requirements (e.g. “EASY” to develop, interpret and explain) → Logistic Regression
• I initially grouped the 241 variables into eight (8) subgroups based on the type of variable (e.g. Possession, Special Teams and Goalie, etc …). Models were run on each subgroup to determine the factors with high predictive power. The selected variables were then combined into one final model, yielding the final 37 variables.
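The two-stage selection can be sketched as follows. This is an illustration under stated assumptions, not the procedure actually used: the selection criterion here is coefficient magnitude on standardized inputs (the deck only says "high predictive power"), and the variable names, subgroups and synthetic outcomes are made up.

```python
import math
import random

def fit_logistic(X, y, lr=0.5, epochs=300):
    """Batch gradient descent for logistic regression; returns [bias, w1, ...]."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for row, target in zip(X, y):
            z = w[0] + sum(wj * v for wj, v in zip(w[1:], row))
            err = 1.0 / (1.0 + math.exp(-z)) - target
            grad[0] += err
            for j, v in enumerate(row):
                grad[j + 1] += err * v
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad)]
    return w

def screen_subgroups(data, y, subgroups, threshold=0.5):
    """Stage 1: fit one model per subgroup, keep variables with large coefficients."""
    selected = []
    for names in subgroups.values():
        X = [[data[name][i] for name in names] for i in range(len(y))]
        coefs = fit_logistic(X, y)[1:]
        selected += [n for n, c in zip(names, coefs) if abs(c) >= threshold]
    return selected

random.seed(0)
n = 200
# Hypothetical standardized variables (mean 0, standard deviation 1).
names = ["shot_attempt_pct", "faceoff_pct", "pk_pct", "save_pct"]
data = {name: [random.gauss(0, 1) for _ in range(n)] for name in names}
# Synthetic outcome driven by two of the four variables plus noise.
y = [
    1 if data["shot_attempt_pct"][i] + data["save_pct"][i] + random.gauss(0, 0.5) > 0 else 0
    for i in range(n)
]
subgroups = {
    "possession": ["shot_attempt_pct", "faceoff_pct"],
    "special_teams_and_goalie": ["pk_pct", "save_pct"],
}

selected = screen_subgroups(data, y, subgroups)
# Stage 2: combine the survivors of every subgroup into one final model.
final_X = [[data[name][i] for name in selected] for i in range(n)]
final_weights = fit_logistic(final_X, y)
print(selected, [round(w, 2) for w in final_weights])
```

Screening within subgroups keeps each model small and easy to explain, at the cost of possibly missing variables that only matter in combination with variables from another subgroup.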
SIMULATION and IMPLEMENTATION
• I then developed code to simulate the remaining bracket given the current state of the playoffs: loop through each game, series and round and predict each future game in the bracket format
• Predictions were generated every morning and were available on the HANA cloud for fans to access over any platform on NHL.com
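One way to roll game level probabilities up through a series and then a bracket is sketched below. It uses exact recursion over the remaining games instead of random simulation, and it assumes a best-of-seven 2-2-1-1-1 home pattern; the home/away probabilities, the four-team bracket and the function names are placeholders, not outputs or code from the project.

```python
from functools import lru_cache

# Games hosted by the team with home-ice advantage in a 2-2-1-1-1 best-of-seven.
HOME_GAMES = {1, 2, 5, 7}

def series_win_probability(p_home, p_away, wins=0, losses=0):
    """Probability the home-ice team wins the series from the current state.

    p_home / p_away: that team's game win probability at home and on the road.
    wins / losses: games already won and lost by that team in this series.
    """
    @lru_cache(maxsize=None)
    def remaining(w, l):
        if w == 4:
            return 1.0
        if l == 4:
            return 0.0
        p = p_home if (w + l + 1) in HOME_GAMES else p_away
        return p * remaining(w + 1, l) + (1 - p) * remaining(w, l + 1)
    return remaining(wins, losses)

def title_probabilities(bracket, series_prob):
    """Propagate series probabilities round by round.

    bracket: teams in bracket order (length must be a power of two).
    series_prob(a, b): probability that team a beats team b in a series.
    """
    slots = [{team: 1.0} for team in bracket]
    while len(slots) > 1:
        next_slots = []
        for left, right in zip(slots[0::2], slots[1::2]):
            merged = {}
            for a, pa in left.items():
                for b, pb in right.items():
                    p = series_prob(a, b)
                    merged[a] = merged.get(a, 0.0) + pa * pb * p
                    merged[b] = merged.get(b, 0.0) + pa * pb * (1 - p)
            next_slots.append(merged)
        slots = next_slots
    return slots[0]

def toy_series_prob(team_a, team_b):
    # Placeholder: the first-listed team always has home ice and a 55/45 split.
    return series_win_probability(p_home=0.55, p_away=0.45)

cup_odds = title_probabilities(["A", "B", "C", "D"], toy_series_prob)
print(series_win_probability(0.5, 0.5, wins=3, losses=1))  # up 3-1 in an even series
print(cup_odds)
```

Passing the current wins and losses is what lets a prediction for game x account for the results through game x-1; rerunning the propagation each morning with updated states refreshes the whole bracket.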
NHL Playoff Predictions
Implementation
NHL.com Series Preview
SAP Match-up Analysis
(Bracket Challenge)
Do you notice anything missing?
NHL Playoff Predictions
Day “0” (Before the Playoffs Began) Bracket Predictions and Results
NHL Playoff Predictions
Initial Results
The initial results were mixed, but overall positive as the model successfully
predicted the Chicago Blackhawks to win the Cup on “Day 0” (!!)
However, the game level model had some issues:
• Stubbornness and predicting “too many” sweeps
  • In many cases, the model stuck with the initial series prediction regardless of in-series performance (e.g. team lost first 2 games, team down 3 games to 1)
• Picked “too many” big upsets
  • While some upsets turned out to be predicted correctly, “too many” big upsets simply didn’t look right (both examples below were upsets of Presidents’ Trophy winners)
    • E.g. PIT over NYR in 2014-2015
    • E.g. PHI over WSH in 2015-2016
NHL Playoff Predictions
Second Version
The second version of the playoff predictions was structured in “phases”. The first phase is a series level prediction, using all available historical playoff data (back to the 1987-88 season). This series level prediction can be used on its own (e.g. Bracket Challenge), but is also used as an input into a new game level model.
[Diagram: Regular Season Results; Team Level Stats; Playoff Game Results; Game Context; Historical Playoff Performance; Series Level Win Probabilities; Simulated Remaining Bracket (Series Level)*; Game Level Win Probabilities; Simulated Remaining Bracket (Game Level)*]
The new game level model takes into account game context factors (home vs. away, days between games, etc …) as well as historical playoff performance, for more “realistic” game predictions.
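A sketch of how a "historical probability of winning the series" input could be derived and combined with the other game level inputs. It assumes the input is the share of past best-of-seven series won from a given wins-losses state, counted from both teams' points of view; the three toy series are invented, and the function and key names are hypothetical. The example input row reuses the Calgary Game 1 values from the CGY vs VAN comparison in this deck.

```python
from collections import defaultdict

def historical_series_win_rates(series_results):
    """Share of series eventually won from each (wins, losses) state.

    series_results: completed best-of-seven series, each a list of game
    results (1 = win, 0 = loss) from one team's point of view.
    """
    reached = defaultdict(int)
    won = defaultdict(int)
    for games in series_results:
        # Count every series from both teams' points of view.
        for view in (games, [1 - g for g in games]):
            won_series = sum(view) == 4
            w = l = 0
            for result in view:
                reached[(w, l)] += 1          # state before this game was played
                won[(w, l)] += int(won_series)
                w += result
                l += 1 - result
    return {state: won[state] / reached[state] for state in reached}

# Hypothetical history: a sweep, a 4-2 win and a 3-4 loss.
toy_history = [
    [1, 1, 1, 1],
    [1, 0, 1, 0, 1, 1],
    [0, 0, 1, 1, 1, 0, 0],
]
rates = historical_series_win_rates(toy_history)

# Inputs for one game: series level probability, game context, series state.
game_inputs = {
    "series_win_probability": 0.4931,
    "home": 0,
    "timezone_difference": -1,
    "historical_series_win_rate": rates[(0, 0)],
}
print(game_inputs)
```

Because every series is counted from both sides, the rates for mirrored states always sum to 100%, matching the pattern in the deck's tables (e.g. 65.93% at 1-0 and 34.07% at 0-1).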
NHL Playoff Predictions
Results Comparison – Series Level (2015-2016)
[Figure: 2015-2016 playoff brackets comparing v1 and v2 series level win probabilities for each matchup, e.g. WSH vs PHI: v1 43.25% / 56.75%, v2 62.69% / 37.31%]
NHL Playoff Predictions
Results Comparison – Game Level (2014-2015)
CGY vs VAN, 1st round
Calgary Flames
Game # | Win Probability SERIES | Home | Timezone Difference | Series W/L | Historical Probability of Winning Series | Win Probability GAME | Win
1 | 49.31% | 0 | -1 | 0-0 | 50.00% | 43.21% | 1
2 | 49.31% | 0 | -1 | 1-0 | 65.93% | 43.27% | 0
3 | 49.31% | 1 | 1 | 1-1 | 50.00% | 56.11% | 1
4 | 49.31% | 1 | 1 | 2-1 | 66.78% | 56.27% | 1
5 | 49.31% | 0 | -1 | 3-1 | 87.23% | 43.34% | 0
6 | 49.31% | 1 | 1 | 3-2 | 75.61% | 56.36% | 1
Vancouver Canucks
Game # | Win Probability SERIES | Home | Timezone Difference | Series W/L | Historical Probability of Winning Series | Win Probability GAME | Win
1 | 50.69% | 1 | 1 | 0-0 | 50.00% | 56.79% | 0
2 | 50.69% | 1 | 1 | 0-1 | 34.07% | 56.73% | 1
3 | 50.69% | 0 | -1 | 1-1 | 50.00% | 43.89% | 0
4 | 50.69% | 0 | -1 | 1-2 | 33.22% | 43.73% | 0
5 | 50.69% | 1 | 1 | 1-3 | 12.77% | 56.66% | 1
6 | 50.69% | 0 | -1 | 2-3 | 24.39% | 43.64% | 0
Note: Even though the initial series level model showed a slight edge to VAN, the game level model utilized the current series
performance to give the eventual edge to CGY late in the series
NYR vs WSH, 2nd round
New York Rangers
Game # | Win Probability SERIES | Home | Days Between GP | Series W/L | Historical Probability of Winning Series | Win Probability GAME | Win
1 | 66.58% | 1 | 6 | 0-0 | 50.00% | 61.65% | 0
2 | 66.58% | 1 | 2 | 0-1 | 34.07% | 62.33% | 1
3 | 66.58% | 0 | 2 | 1-1 | 50.00% | 49.17% | 0
4 | 66.58% | 0 | 2 | 1-2 | 33.22% | 47.79% | 0
5 | 66.58% | 1 | 2 | 1-3 | 12.77% | 63.78% | 1
6 | 66.58% | 0 | 2 | 2-3 | 24.39% | 47.06% | 1
7 | 66.58% | 1 | 2 | 3-3 | 50.00% | 61.24% | 1
Washington Capitals
Game # | Win Probability SERIES | Home | Days Between GP | Series W/L | Historical Probability of Winning Series | Win Probability GAME | Win
1 | 33.42% | 0 | 3 | 0-0 | 50.00% | 38.35% | 1
2 | 33.42% | 0 | 2 | 1-0 | 65.93% | 37.67% | 0
3 | 33.42% | 1 | 2 | 1-1 | 50.00% | 50.83% | 1
4 | 33.42% | 1 | 2 | 2-1 | 66.78% | 52.21% | 1
5 | 33.42% | 0 | 2 | 3-1 | 87.23% | 36.22% | 0
6 | 33.42% | 1 | 2 | 3-2 | 75.61% | 52.94% | 0
7 | 33.42% | 0 | 2 | 3-3 | 50.00% | 38.76% | 0
NHL Playoff Predictions
Results Comparison Notes
• Better overall success at the series level (v1 vs. v2)
  • 2014-2015: 11/15 (73%) vs. 9/15 (60%)
  • 2015-2016: 12/15 (80%) vs. 8/15 (53%)
• Both new series and game level predictions are much more “conservative”
  • In fact, with the new model using 28 seasons of playoff data (420 series), only 39 series had more than an 80% series win probability
• Better “eye test” success (NYR and WSH being Presidents’ Trophy winners)
  • 2014-2015: NYR vs PIT
    • Old: NYR (16.01%) vs. PIT (83.99%)
    • New: NYR (63.81%) vs. PIT (36.19%)
  • 2015-2016: WSH vs PHI
    • Old: WSH (43.25%) vs. PHI (56.75%)
    • New: WSH (62.69%) vs. PHI (37.31%)
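The arithmetic behind the figures above can be checked in a few lines of Python. The only assumption added here is that each season contributes 15 series (a 16-team bracket: 8 + 4 + 2 + 1), which is consistent with 28 seasons giving 420 series; the helper name is made up.

```python
def percent(part, whole, digits=0):
    """Percentage of whole represented by part, rounded to the given digits."""
    return round(100 * part / whole, digits)

SERIES_PER_SEASON = 8 + 4 + 2 + 1  # four rounds of a 16-team bracket

# Correct series picks out of 15, in the order the two figures are listed.
accuracy = {
    "2014-2015": (percent(11, 15), percent(9, 15)),
    "2015-2016": (percent(12, 15), percent(8, 15)),
}
total_series = 28 * SERIES_PER_SEASON
share_above_80 = percent(39, total_series, 1)

print(accuracy)
print(total_series, share_above_80)
```

So fewer than one in ten historical series received a series win probability above 80% under the new model.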
Questions?
Thank you!