1 | Page - Troubadour Research and Consulting

Contents
Introduction
Methodology
Assumptions of the Model
Analysis
Further Analysis
Discussion
Why
Figures
Figure 1 - Play Calling by Down 2000-2013
Figure 2 - Number of Drives Per Game
Figure 3 – Simulation Score Differential: Benchmarking Test
Figure 4 – Simulation Score Differential: Never Punt Test
Figure 5 – Winning Percentages: Benchmark and Never Punt Test
Figure 6 – Winning Percentages: Never Punt After the 50 Test
Figure 7 – Simulation Score Differential: Never Punt After 50 Test
Figure 8 – Winning Percentages: Never Punt Never Kick
Figure 9 – Simulation Score Differential: Never Punt After 50 Test
Introduction
Likely anyone who has played a football video game began with a super aggressive strategy that always
involved going for it on fourth down. This year, I even named my fantasy football team “The
Neverpunting Story.” Primarily, I was looking for a fun, quotable name with some sort of football pun
(the object of every fantasy team name, in my humble opinion), but it got me thinking. Have you ever
wondered what would happen if an NFL team changed its strategy to never punt?
With a big surge in publicly available data and advanced analytics throughout the sports world over the
last few years, armchair coaches across the country now have the numbers to back up their stand-up,
fist-shaking pleas: “It’s 4th and 1! Just go for it!” Kevin Kelley, head coach of the Pulaski Academy football
team in Little Rock, Arkansas, wouldn’t think to do any differently. Featured by Sports Illustrated [1], ESPN [2],
the New York Times [3], and dozens of other sports media outlets, Coach Kelley is turning the
football playbook inside out by rarely punting and calling for onside kicks more often than not. In a
video produced by Grantland [4], Coach Kelley talks about the statistics that led him to question
traditional play-calling and in the process become one of the most unconventional coaches in football.
Late in the 2013 NFL season, the New York Times launched the NYT 4th Down Bot [5]. Using NFL data from
the last 10 years, the 4th Down Bot uses a model [6] to determine when teams should and
shouldn’t go for it on 4th down instead of punting. The bot even live tweets [7] during NFL games about
what decision it would have made in any given team’s 4th down situation (as it points out to its 14,000+
followers in its Twitter bio, it “mostly tweet[s] disagreements”).
In this analysis, we are only looking at the decision to never punt, rather than a broader change in
kicking strategy (e.g. never field goal, always onside kick). Committing to never punt the ball away
means more than simply going for it on fourth down; the psychology behind play calling would be
different as well. Currently, first down is treated as the “feeling out” down; see what can happen.
Second down is a response to first down, but coaches are allowed to be more aggressive. Third down is
a conservative down, where the objective is to convert to a first down. If the objective were to never
punt, that down psychology would change in that third down could also be aggressive like second down,
and fourth down becomes the conservative down.
[1] http://www.si.com/more-sports/2011/09/15/kelley-pulaski
[2] http://espn.go.com/espn/playbook/story/_/id/8307736/tmq-praises-coach-punt-celebrates-innovative-mindfootball
[3] http://www.nytimes.com/2012/08/19/sports/football/calculating-footballs-risk-of-not-punting-on-fourthdown.html
[4] https://www.youtube.com/watch?v=AGDaOJAYHfo
[5] http://nyt4thdownbot.com/
[6] http://www.nytimes.com/2014/09/05/upshot/4th-down-when-to-go-for-it-and-why.html
[7] https://twitter.com/nyt4thdownbot
Figure 1 - Play Calling by Down 2000-2013
The purpose of this report is to provide an analysis and an estimate of the value of a punter to the team.
Methodology
Using NFL play data from 2010 to 2013, we performed an analysis to understand how teams, on
average, make play calls and drive the ball down the field. The NFL play database was downloaded from
ArmchairAnalysis.com. The end goal of the analysis was to run a simulation whereby the play calling
strategy is changed for one of the teams in the simulation.
The data were filtered to exclude plays from the fourth quarter and overtime. The
assumption was that play calling philosophies change more dramatically in the fourth quarter depending
on score differential and whether the team is winning or losing. Since the goal is to determine the
longer-term effect of removing the punter from the game, it is more appropriate to test on the first three
quarters of the game. Certainly, by the fourth quarter a team that is ahead could begin to play more
conservatively and start punting, and likewise more aggressive play from the losing team may make the
‘never punt’ strategy more or less successful.
For first down, second down, and third down, play calling strategy was estimated as a function of the
current down and yards-to-go using a logistic regression model to determine pass or run. Field position
was tested as an additional explanatory variable, but adding it made no meaningful difference,
since yards-to-go already carries the necessary information when the offense is close
to the goal line. (Note: because the sample size of plays was so large, many differences were statistically
significant, so variable selection was based on analytical judgment.)
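As a rough illustration of the pass-or-run model described above, the sketch below evaluates a logistic function of down and yards-to-go. The coefficient values and the function name are hypothetical placeholders chosen for illustration; the report does not publish its fitted values.

```python
import math

# Hypothetical coefficients for illustration only (not the report's fitted values).
INTERCEPT = -0.6
COEF_DOWN = 0.35
COEF_YARDS_TO_GO = 0.08

def pass_probability(down, yards_to_go):
    """P(pass) from a logistic model with down and yards-to-go as predictors."""
    z = INTERCEPT + COEF_DOWN * down + COEF_YARDS_TO_GO * yards_to_go
    return 1.0 / (1.0 + math.exp(-z))

# With these assumed coefficients, 3rd-and-long leans more toward the pass
# than 3rd-and-short.
p_short = pass_probability(down=3, yards_to_go=1)
p_long = pass_probability(down=3, yards_to_go=10)
print(f"3rd and 1: {p_short:.2f}   3rd and 10: {p_long:.2f}")
```

In a simulation, a play call would then be drawn by comparing a uniform random number against this probability.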
Fourth down play calling used a multinomial logit model to predict play choice probabilities. In this case,
play choice is regarded as a function of yards-to-go and field position.
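A multinomial logit of this kind can be sketched as a softmax over one linear score per play choice. The three choices, the coefficient values, and the `yards_from_goal` encoding of field position are all assumptions made for illustration, not the report's fitted model.

```python
import math

# Hypothetical coefficients: (intercept, yards_to_go, yards_from_goal).
# "punt" is the reference category, so its score is fixed at zero.
COEFS = {
    "punt": (0.0, 0.0, 0.0),
    "field_goal": (4.0, 0.0, -0.12),
    "go_for_it": (0.5, -0.4, -0.02),
}

def fourth_down_probs(yards_to_go, yards_from_goal):
    """Return choice probabilities from a multinomial logit (softmax of scores)."""
    scores = {
        choice: b0 + b1 * yards_to_go + b2 * yards_from_goal
        for choice, (b0, b1, b2) in COEFS.items()
    }
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {choice: math.exp(s - top) for choice, s in scores.items()}
    total = sum(exps.values())
    return {choice: e / total for choice, e in exps.items()}

deep = fourth_down_probs(yards_to_go=8, yards_from_goal=80)   # own 20
close = fourth_down_probs(yards_to_go=8, yards_from_goal=20)  # opponent's 20
print(deep)
print(close)
```

With these placeholder values the punt dominates deep in a team's own territory and the field goal dominates near the opponent's goal line, which is the qualitative pattern such a model is meant to capture.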
Play outcomes were modeled by first determining the categorical outcome of the play: touchdown, pick
6 (which was used as a category defining any defensive touchdown), fumble lost, interception, sack,
safety, field goal attempt, punt, complete pass, incomplete pass, or run. Each outcome was analyzed
separately to determine the probability distributions for yardage (or yard differential in the case of
punts and turnovers).
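The two-stage idea above (draw a categorical outcome first, then draw yardage from that outcome's distribution) might look like the following for a pass play. The outcome probabilities and the yardage distributions are invented for illustration and cover only a subset of the report's outcome categories.

```python
import random

# Hypothetical outcome probabilities for a called pass (illustrative only).
PASS_OUTCOMES = {
    "complete pass": 0.58,
    "incomplete pass": 0.33,
    "sack": 0.06,
    "interception": 0.03,
}

def sample_yards(outcome, rng):
    """Stage 2: draw yardage from an assumed distribution for the outcome."""
    if outcome == "complete pass":
        return round(rng.gauss(11, 9))        # assumed mean and spread
    if outcome == "sack":
        return -round(abs(rng.gauss(6, 3)))   # sacks never gain yards
    return 0  # incomplete pass; interception return yardage omitted here

def sample_pass_play(rng):
    """Stage 1: draw the categorical outcome, then its yardage."""
    outcome = rng.choices(
        list(PASS_OUTCOMES), weights=list(PASS_OUTCOMES.values())
    )[0]
    return outcome, sample_yards(outcome, rng)

rng = random.Random(7)
plays = [sample_pass_play(rng) for _ in range(5)]
print(plays)
```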
Having the analytical inputs for the model, a simulation algorithm was created to simulate, play by play,
two teams playing against each other. The experiment was set up where the two simulated teams
would play a game of football. However, the rules for the test environment were such that:
• There were no penalties, and
• Instead of a timed game, the match was set up whereby there were a total of 10 drives
(after the 10th drive, the game ended).
We regarded these rules to be sufficient for addressing the research objective as to whether or not a
team could have long-term success with a ‘never punt’ strategy.
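The test environment can be sketched as a toy drive-by-drive loop. This is a heavily simplified stand-in, not the report's simulation: the yardage list, turnover rate, field goal range, make rate, punt distance, and starting field position are all assumed values, and play outcomes here do not come from the regression models described earlier. Only the two rules (no penalties, game ends after 10 drives) are taken from the report.

```python
import random

DRIVES_PER_GAME = 10  # report's rule: the game ends after the 10th drive

def simulate_drive(start, never_punt, rng):
    """Play one drive from `start` (yards from own goal line).

    Returns (points scored, opponent's starting position).
    """
    pos, down, to_go = start, 1, 10
    while True:
        if down == 4 and not never_punt:
            if pos >= 65:                        # assumed field goal range
                made = rng.random() < 0.80       # assumed make rate
                return (3, 20) if made else (0, 100 - pos)
            return 0, max(20, 100 - (pos + 40))  # assumed 40-yard net punt
        if rng.random() < 0.03:                  # assumed turnover rate per play
            return 0, 100 - pos
        gain = rng.choice([-2, 0, 0, 2, 3, 4, 5, 7, 10, 15])  # assumed yardage
        pos = max(pos + gain, 1)                 # simplification: no safeties
        if pos >= 100:
            return 7, 20                         # touchdown, extra point assumed good
        to_go -= gain
        if to_go <= 0:
            down, to_go = 1, 10                  # first down
        elif down == 4:
            return 0, 100 - pos                  # failed 4th down: turnover on downs
        else:
            down += 1

def simulate_game(treatment_never_punts, rng):
    """Return the treatment team's score minus the control team's score."""
    scores = [0, 0]  # index 0 = treatment, index 1 = control
    start = 20       # assumed starting field position
    for drive in range(DRIVES_PER_GAME):
        team = drive % 2  # teams alternate possessions
        never_punt = treatment_never_punts and team == 0
        points, start = simulate_drive(start, never_punt, rng)
        scores[team] += points
    return scores[0] - scores[1]

rng = random.Random(2014)
diffs = [simulate_game(True, rng) for _ in range(1000)]
mean_diff = sum(diffs) / len(diffs)
print(f"mean differential over {len(diffs)} games: {mean_diff:+.2f}")
```

Because every input here is a placeholder, the printed differential says nothing about the real strategy; the point is only the structure of the loop: no penalties, no clock, and a hard stop after ten drives.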
One thousand simulations were run to assess how the two teams with different strategies performed.
Monte Carlo simulations often involve as many as 50,000 or 100,000 iterations; however, the
complexity of this simulation had a significant impact on processing time. One thousand games meant
10,000 drives to simulate and resulted in over 68,000 individual plays. Although much of the post-analysis
is conducted at the game level, 1,000 games are sufficient for statistical testing and drawing conclusions. Those
tests are conducted at the 99% confidence level.
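To give a sense of what 1,000 games buys at the 99% confidence level, the sketch below computes a normal-approximation margin of error for a winning percentage. The 45% win rate is the benchmark figure reported later in this document; treating games as independent trials and using the normal approximation are assumptions of this example.

```python
import math
from statistics import NormalDist

n_games = 1000     # from the report
win_rate = 0.45    # treatment win rate in the report's benchmark test
confidence = 0.99  # from the report

# Two-sided critical value for the chosen confidence level.
z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
margin = z * math.sqrt(win_rate * (1 - win_rate) / n_games)
print(f"99% margin of error: +/- {margin:.3f}")
```

Under these assumptions the margin is roughly four percentage points, so differences in winning percentage much larger than that should be detectable with 1,000 games.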
Figure 2 - Number of Drives Per Game
With a typical game containing between 20 and 27 drives, the results of this analysis would refer to a
little less than half of a typical NFL game.
Assumptions of the Model
Due to the nature of the analysis and simulations, many factors are held constant including home field
advantage/disadvantage, weather, and turf type.
The analysis holds constant the talent and play calling philosophies of specific teams, so it should be
regarded that all findings are among an average offense playing an average defense. It’s likely that no
game has existed between two teams both exhibiting average skill and philosophies on both offense and
defense, but the point here is to identify whether or not, in the long run, a ‘never-punt’ strategy could
have legs (so to speak).
Further, philosophies and play calling aggression do not change as the game progresses. Because the
purpose is to assess long-term viability of the strategy, we deem it acceptable to hold this constant.
Therefore, the analysis should be regarded as only pertaining to first half play, or perhaps first three
quarters.
Finally, a key assumption being made with regard to play calling among the ‘never punt’ team (hereafter
referred to as the Treatment team) treats third down decisions as though they were second down
decisions and fourth down decisions as third down (that is, those decisions are based on the analysis
pertaining to the prior down). It may be that a real team employing the strategy would evolve toward
different second, third, and fourth down strategies in a way that would be more effective. If so, this
analysis would be considered more conservative.
Analysis
Figure 3 – Simulation Score Differential: Benchmarking Test
In the build-up to Super Bowl XLVIII between the Seattle Seahawks and Denver Broncos, the match-up
was being touted as one of the closest match-ups in Super Bowl history. Betting lines had a spread as
large as 3pts (-2.5) and as small as 1pt (-0.5), with some giving the edge to the Seahawks, others to the
Broncos. The outcomes being predicted by Vegas and experts were far from unanimous. The final score
was something few, if any, predicted: Seattle cruised to a 43-8 victory over Denver. For a match-up that
was supposed to be a coin toss in most people’s eyes, the final score seemed to show otherwise.
However, that doesn’t mean the game wasn’t truly a “coin toss”.
In our simulation of two evenly matched teams playing 10 drives, the average score differential was
almost exactly zero with a standard deviation of 9.7 points. Revisiting the Super Bowl XLVIII score
differential of 35 points and scaling it to a 10 drive game by dividing it by 1.9 (there were 19 drives in the
Super Bowl), we get a score differential of 18.4 points. In our simulation, 94.9% of games concluded with
a score differential of 18 or less. Put another way, if the Seahawks and Broncos were perfectly matched,
then there was about a 5% probability of seeing the game end with a differential of 35 points or more in
either team’s favor. But then, perhaps the two were not perfectly matched.
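The scaling arithmetic can be checked in a few lines, along with a rough cross-check of the tail probability. The cross-check assumes the simulated differential is approximately normal with mean 0 and standard deviation 9.7; the report reads its 94.9% directly from the simulated games, so the two figures need not match exactly.

```python
from statistics import NormalDist

sd_10_drives = 9.7      # from the report's benchmark simulation
super_bowl_margin = 35  # Seattle 43, Denver 8
super_bowl_drives = 19  # from the report

# Scale the full-game margin down to a 10-drive game.
scaled_margin = super_bowl_margin / (super_bowl_drives / 10)

# ASSUMPTION: differential ~ Normal(0, 9.7). Two-sided tail beyond the margin.
tail = 2 * (1 - NormalDist(0, sd_10_drives).cdf(scaled_margin))
print(f"scaled margin: {scaled_margin:.1f} points, two-sided tail: {tail:.3f}")
```

The normal approximation lands between 5% and 6%, in the same neighborhood as the roughly 5% taken from the simulated distribution.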
Figure 4 – Simulation Score Differential: Never Punt Test
In our “Never Punt” simulation, the score differentials looked a little different. On average, a “Never
Punt” team playing against an equivalent NFL team will lose by 2.4 points over 10 drives, or by 5.6 points in
an average NFL game.
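The conversion from a 10-drive result to a full game is not spelled out, but the factor can be inferred from the two figures given. The sketch below backs it out; the resulting scale and implied drive count are inferences from the quoted numbers, not values stated in the report.

```python
points_10_drives = 2.4  # report: average Never Punt deficit over 10 drives
points_full_game = 5.6  # report: the same deficit in an average NFL game

# Inferred, not stated in the report.
scale = points_full_game / points_10_drives  # about 2.33
implied_drives = 10 * scale                  # about 23.3 drives per game

def to_full_game(points):
    """Convert a 10-drive point margin to a full-game equivalent."""
    return points * scale

print(f"scale: {scale:.2f}, implied drives per game: {implied_drives:.1f}")
print(f"9 points -> {to_full_game(9):.0f}, 6 points -> {to_full_game(6):.0f}")
```

An implied average of about 23 drives sits inside the 20 to 27 range quoted for a typical game, and the same factor reproduces the 9-to-21 and 6-to-14 point conversions used in the discussion of win margins.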
Figure 5 – Winning Percentages: Benchmark and Never Punt Test
Looking at the winning percentages in the benchmark simulation, both the control and treatment teams
won nearly the same percentage of games (46% and 45% respectively, tying the other 9%), as one would
expect of two equal teams playing each other. In the “Never Punt” simulation, the control
team won more than half of the games played and the teams again tied 9% of the time, leaving the
treatment team with wins in slightly more than a third of games. A Chi-Square analysis returned a p-value of
less than 0.0001, indicating that the difference in the treatment team’s winning percentage between the
two simulations is statistically significant (strongly so at that). There seems to be a simple conclusion:
the never punt strategy is a losing one. However, let’s look at the games the treatment team does win.
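For readers who want to see the shape of such a test, here is a minimal 2x2 chi-square sketch comparing treatment wins across the two simulations. The counts are hypothetical: 450 of 1,000 follows from the reported 45%, while 340 of 1,000 is only a stand-in for "slightly more than a third". The report's own test may have been set up differently (for example on win/loss/tie categories).

```python
import math

# Hypothetical counts out of 1,000 games per simulation (illustrative only).
benchmark_wins, benchmark_other = 450, 550    # 45% is from the report
never_punt_wins, never_punt_other = 340, 660  # ASSUMED stand-in for "about a third"

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

chi2 = chi_square_2x2(benchmark_wins, benchmark_other,
                      never_punt_wins, never_punt_other)
# Upper-tail probability of a chi-square variable with 1 degree of freedom.
p_value = math.erfc(math.sqrt(chi2 / 2))
print(f"chi-square = {chi2:.2f}, p = {p_value:.2e}")
```

With these placeholder counts the p-value falls well below 0.0001, consistent with the significance level described above.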
Of their wins, the “Never Punt” treatment team won 38% of games by 9 or more points, the equivalent
of 21+ points in an average NFL game, which is considered by most a blowout victory. Almost
two-thirds, 62%, of wins were by 6 or more points, or 14+ points in an NFL game. Looking at their losses, 56%
were by 9 or more points and 73% by 6 or more points.
The take-away is that there remains a lot of variability in a game. There are lots of opportunities to
make a play and keep a drive going. Sometimes a run of unsuccessful plays stops you. In the long run,
the “Never Punt” strategy helps marginally in keeping the drive alive, but having that run of bad luck can
have more serious consequences.
Further Analysis
While we’ve shown that, in the long run, firing the punter would not be a good idea for an average team,
we’ve also tried a couple of modifications of the test. In an additional test, we modified the play calling
rule so that the Treatment team never punts after crossing the 50 yard line. This provides a greater chance
that the opposing team, when taking over after a failed 4th down, is not easily within field goal range.
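The modified rule amounts to a small change in the fourth-down decision logic. A minimal sketch follows, with hypothetical strategy names and a `yards_from_own_goal` field-position encoding that are assumptions of this example rather than the report's implementation.

```python
def may_punt(strategy, yards_from_own_goal):
    """Whether the team is allowed to punt on fourth down at this spot."""
    if strategy == "benchmark":
        return True                        # conventional play calling
    if strategy == "never_punt":
        return False                       # punting is never an option
    if strategy == "never_punt_after_50":
        return yards_from_own_goal <= 50   # punt only before crossing midfield
    raise ValueError(f"unknown strategy: {strategy}")

print(may_punt("never_punt_after_50", 35))  # own 35: punting still allowed
print(may_punt("never_punt_after_50", 60))  # opponent's 40: must go for it
```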
Figure 6 – Winning Percentages: Never Punt After the 50 Test
Figure 7 – Simulation Score Differential: Never Punt After 50 Test
While the mean score left the Treatment team down by six-tenths of a point, the difference was not
statistically different from zero. Similarly, the win/loss/tie percentages were not statistically different
from the benchmark test. From these simulations, there’s not enough evidence to conclude whether a team
would be better off or worse off by never punting after the 50. Such a strategy would likely have the
same end result as the typical strategy coaches employ (although we have not accounted for
team-specific variables that might allow this to be more or less successful).
On the other hand, we ran another test, going even more aggressive. We conducted the simulation as
though the coach fired the kicker as well as the punter.
Figure 8 – Winning Percentages: Never Punt Never Kick
Figure 9 – Simulation Score Differential: Never Punt After 50 Test
If your team fired the punter and the kicker, you would have a right to be angry. While the peak is at a 4
point differential in favor of the Treatment team, you can see that there are more significant wins
overall on the Control side. In fact, the Treatment team lost by an average of 2 points, and only won
41% of the time (compared to the benchmark of 45%).
What’s interesting is that this suggests that if a coach fired the punter, he might as well go ahead and
fire the kicker as well, and commit to going for it on every 4th down.
Discussion
There are certainly other modifications we can make to the strategy that might prove to be a productive
move in the 4th down strategy of NFL teams. There may also be particular team dynamics that would
allow for a never-punt, modified never-punt (i.e. never after the 50), or a never-punt-never-kick
strategy.
If there’s a particular strategy you’d like us to test, please let us know and we’ll run it and post it to our
blog.
Why
So why did we do this?
1. We love sports, and watching how competitive strategies are implemented and measured –
Sports are real world experiments, success or fail, every game;
2. We love data, and are really passionate about finding multiple sources of data, linking them
together, and gaining some new level of understanding;
So to conclude, we did this because we love data, sports, and we are really good at linking them
together. But that doesn’t just go for sports data – we love all data, measuring strategies, suggesting
implementations, and giving YOU a deeper level of understanding. Whether it be <sports reference> or
your consumer behavior and loyalty, this is what we love. And we are really good at it.
Troubadour Research & Consulting is a marketing sciences and research firm dedicated to helping
businesses plot a data-driven strategy to better connect with their target customer and stay ahead of
the competition. “Big data” has become a big buzzword over the past couple of years, and that means
there are many new sources for understanding consumer behavior and preferences. We make sense out
of that data.
For a free quote or consultation on how Troubadour can improve your business strategy, visit us at
www.troubadourconsulting.com/contact-us.