Elijah Molloy
Bradley Korabik
Brandon Harris
NHL Game Prediction Project Description
Our group project seeks an accurate way to predict the winner of a hockey game between
teams in the Central Division of the NHL's Western Conference. The statistics our algorithms use
include a team’s win-loss record, the number of games it plays at home versus away, and various
league averages that allow one team to be compared with another. We use data from the
2014/2015 season to predict selected NHL games.
The formal definition of predicting win probability is as follows: “Given input variables
from two separate NHL teams, find the probability of winning by using a discrete probability
distribution (Poisson distribution).” Our error measure is
$$\sum_{k=1}^{n} (P_k - A_k)^2 .$$
Our goal is to drive this sum as close to 0 as possible, thereby eliminating as much error as
possible. In addition, we use the Poisson distribution as a tool in generating the predictions.
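As a small illustration of the squared-error sum, the sketch below assumes that P_k is a predicted value for game k and A_k is the corresponding actual value; the text does not define either symbol, so this reading and the sample numbers are assumptions.

```python
def squared_error(predicted, actual):
    """Sum of (P_k - A_k)^2 over paired predictions and outcomes."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    return sum((p - a) ** 2 for p, a in zip(predicted, actual))


# Hypothetical values for three games (assumption: P_k is a predicted win
# probability and A_k is 1 for a win, 0 for a loss).
predicted = [0.5, 1.0, 0.0]
actual = [1, 1, 0]
error = squared_error(predicted, actual)
```

A perfect set of predictions gives a sum of exactly 0, which is the target the formula describes.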
In researching this problem, our group found that little analysis had been done on
predicting hockey games on a game-by-game basis. Most of the existing work predicts soccer
games, and soccer is comparable to hockey in its statistics and overall game play. We used our
research into soccer game prediction to develop our algorithms for predicting hockey games.
Our first game prediction algorithm, Poisson distribution, uses the statistics “Home
Team Goals For”, “Home Team Goals Against”, “Away Team Goals For”, and “Away Team
Goals Against.” The Poisson distribution, defined as
$$P(k; \lambda) = \frac{\lambda^k e^{-\lambda}}{k!},$$
is used to determine which team would win a game. Here $\lambda$ represents the average number
of goals that a particular team is expected to score in a game, while $k$ represents the actual
number of goals scored by that team. Start with one team, team one, and calculate its average
goals scored per game by dividing the goals it has scored by the number of games it has played.
Next, determine team one’s attack strength by dividing that average by the league scoring
average, which shows how team one ranks against the rest of the league on offense. To find the
defensive strength of the opposing team, team two, divide the number of goals team two has
allowed by the number of games it has played, then divide that average by the league average
for goals allowed. Combining team one’s attack strength with team two’s defensive strength
gives the average number of goals that team one is most likely to score. Team two’s average
number of goals is found the same way, with the roles of team one and team two reversed. With
both averages in hand, the Poisson distribution gives the probability of each team scoring a
given number of goals. We evaluate it only from zero to eight goals, because NHL games last
season show that a team is very unlikely to score more than eight. Each team’s win percentage
is then the sum, over every score in which that team leads (1 to 0, 2 to 0, 2 to 1, and so on up
to 8 to 7), of the product of the two teams’ goal probabilities. The team with the higher
percentage is the predicted winner of that game.
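The procedure above can be sketched in Python as follows. This is a minimal sketch, not the project's actual code: the function names and the season totals are hypothetical, and multiplying attack strength, defensive strength, and the league average to get the expected goals is an assumption, since the text does not spell out how the two strengths are combined.

```python
from math import exp, factorial

MAX_GOALS = 8  # the text caps the distribution at eight goals per team


def poisson_pmf(k, lam):
    """P(k; lam) = lam^k * e^(-lam) / k!"""
    return lam ** k * exp(-lam) / factorial(k)


def strength(total_goals, games_played, league_avg_per_game):
    """Per-game average divided by the league per-game average."""
    return (total_goals / games_played) / league_avg_per_game


def expected_goals(attack, defense, league_avg_per_game):
    # Assumption: the strengths are combined by multiplying them with the
    # league average; the text does not state this step explicitly.
    return attack * defense * league_avg_per_game


def win_probabilities(lam_one, lam_two, max_goals=MAX_GOALS):
    """Return (P(team one wins), P(team two wins), P(tied score))."""
    p_one = [poisson_pmf(g, lam_one) for g in range(max_goals + 1)]
    p_two = [poisson_pmf(g, lam_two) for g in range(max_goals + 1)]
    # Team one wins on every score (i, j) with i > j: 1-0, 2-0, 2-1, ..., 8-7.
    one_wins = sum(p_one[i] * p_two[j]
                   for i in range(max_goals + 1) for j in range(i))
    two_wins = sum(p_one[i] * p_two[j]
                   for j in range(max_goals + 1) for i in range(j))
    tie = sum(p_one[g] * p_two[g] for g in range(max_goals + 1))
    return one_wins, two_wins, tie


# Hypothetical season totals, not taken from the 2014/2015 data.
league_avg = 2.7  # goals per team per game (scored and allowed are equal)
attack_one = strength(240, 82, league_avg)   # team one goals for
defense_two = strength(200, 82, league_avg)  # team two goals against
attack_two = strength(210, 82, league_avg)   # team two goals for
defense_one = strength(190, 82, league_avg)  # team one goals against

lam_one = expected_goals(attack_one, defense_two, league_avg)
lam_two = expected_goals(attack_two, defense_one, league_avg)
one_wins, two_wins, tie = win_probabilities(lam_one, lam_two)
predicted_winner = "team one" if one_wins > two_wins else "team two"
```

Because the distribution is cut off at eight goals, the three probabilities sum to slightly less than 1. The sketch also reports the tied-score probability separately, which the win-percentage comparison in the text leaves out.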
The second game prediction algorithm uses “Average Home Team Goals Scored” and
“Average Away Team Goals Scored.” The Skellam distribution is defined as
$$P(k; \mu_1, \mu_2) = e^{-(\mu_1 + \mu_2)} \left(\frac{\mu_1}{\mu_2}\right)^{k/2} I_{|k|}\left(2\sqrt{\mu_1 \mu_2}\right),$$
where $I_k(z)$ is the modified Bessel function of the first kind. It is used to determine the
probability of one team beating another using only the home and away games between the two
teams, and no other statistics for either team. Specifically, it gives the probability of the
difference of two statistically independent Poisson random variables with expected values
$\mu_1$ and $\mu_2$, representing “Average Home Team Goals Scored/Game” and “Average Away
Team Goals Scored/Game.” The algorithm determines the probability of a specific difference $k$
in goals scored by team one and team two. Summing the probabilities of team one winning by any
number of goals gives the total probability of team one winning, and the same goes for team
two. After determining both totals, the team with the higher probability can be expected to
win.
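The Skellam calculation can be sketched with the standard library alone. This is an illustrative sketch under stated assumptions: it uses the standard Skellam form e^(-(mu1+mu2)) (mu1/mu2)^(k/2) I_|k|(2 sqrt(mu1 mu2)), evaluates the Bessel function through a truncated power series, and stops summing at a goal difference of 15. The function names, the truncation limits, and the sample averages are all hypothetical.

```python
from math import exp, factorial, sqrt


def bessel_i(order, z, terms=40):
    """Modified Bessel function of the first kind, I_order(z), for an
    integer order >= 0, from its power series truncated at `terms` terms."""
    return sum((z / 2) ** (2 * m + order) / (factorial(m) * factorial(m + order))
               for m in range(terms))


def skellam_pmf(k, mu1, mu2):
    """Probability that (team one goals) - (team two goals) equals k."""
    return (exp(-(mu1 + mu2)) * (mu1 / mu2) ** (k / 2)
            * bessel_i(abs(k), 2 * sqrt(mu1 * mu2)))


def skellam_outcomes(mu1, mu2, max_diff=15):
    """Return (P(team one wins), P(team two wins), P(tied score))."""
    # Team one wins when the difference is positive, team two when negative.
    one_wins = sum(skellam_pmf(k, mu1, mu2) for k in range(1, max_diff + 1))
    two_wins = sum(skellam_pmf(-k, mu1, mu2) for k in range(1, max_diff + 1))
    tie = skellam_pmf(0, mu1, mu2)
    return one_wins, two_wins, tie


# Hypothetical per-game averages, not taken from the 2014/2015 data.
home_avg, away_avg = 3.0, 2.5
home_wins, away_wins, tie = skellam_outcomes(home_avg, away_avg)
predicted_winner = "home" if home_wins > away_wins else "away"
```

If SciPy is available, `scipy.stats.skellam` offers the same distribution; the hand-rolled version is used here only to keep the example dependency-free.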
The third game prediction algorithm uses the total probability of a specific team winning,
found with the Skellam distribution, together with the total number of trials and the number of
successes over those trials. The negative binomial distribution can be defined as
$$P(X = k) = \binom{k-1}{r-1} p^r (1-p)^{k-r}.$$
This algorithm determines the probability, $P$, of a certain number of successes, $r$, over a
specified number of trials, $k$, where $p$ is the probability of success on a single trial.
Using the Skellam probability of team one beating team two as $p$, and the number of games
those teams have played against each other over a season as $k$, one can determine the
probability of team one winning $r$ times against team two. Using the series Chicago played
against St. Louis, Dallas, and Nashville throughout the season, we were able to calculate the
percentage of games each team would win both home and away.
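A short sketch of the negative binomial step follows. It assumes the trials-based form of the distribution, C(k-1, r-1) p^r (1-p)^(k-r), which gives the probability that the r-th success arrives on trial k. The function name, the per-game win probability, and the series length are hypothetical.

```python
from math import comb


def negative_binomial_pmf(k, r, p):
    """Probability that the r-th success occurs on trial k, given a
    per-trial success probability p (trials-based negative binomial)."""
    if not 1 <= r <= k:
        return 0.0
    return comb(k - 1, r - 1) * p ** r * (1 - p) ** (k - r)


# Hypothetical inputs: a per-game win probability (as would come from the
# Skellam step) and a five-game season series.
p_win = 0.55
games = 5
by_wins = {r: negative_binomial_pmf(games, r, p_win)
           for r in range(1, games + 1)}
```

Note that this form fixes the final game as a win. If the quantity wanted is the chance of exactly r wins anywhere in k games, the binomial distribution is the usual choice instead.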
Overall, our results were not as good as we had hoped to get from our algorithms. We
acknowledge that picking the winner by simply choosing the team with the higher percentage led
to some inaccuracy. Many of the percentages were close to one another, so the team that was not
predicted to win sometimes won. For the four- or five-game series between Chicago, Dallas,
St. Louis, and Nashville, our predictions often got 2 or 3 of the games right, which is not bad
for a small sample size. In future work we would like to use a larger sample, such as Chicago
versus every team over the entire season, and add factors beyond goals scored and allowed.
In the future we would like to incorporate more data into our algorithms, which would
allow us to predict games for all teams in the NHL. We would also like to include other
variables such as power plays, injuries, penalty minutes, games played in a row, and fan
attendance. We think that adding some of these factors could help predict the outcome of games
more accurately. Furthermore, we would like to develop a way to plot predictions using live
game results, to see how the graph changes with real-time stats and goals. The algorithms we
designed piqued our interest in this problem, and we hope to continue working on it in the
future.