Empirical Project: NFL Home-Field Advantage

NFL Home-Field Advantage:
Does It Exist?
Shawn Rembecky
The College of New Jersey
ECO 231-05 Applied Business Statistics
Dr. David W. Letcher
December 4, 2009
Table of Contents
I. Purpose of the Study
II. Review of the Literature
III. Formulation of the Models
IV. Data Sources and Description
V. Analysis and Interpretation of Results
VI. Interpretation of the Analyses, Predictions, and Conclusions
VII. Suggestions for Future Research
VIII. References
IX. “Home” Dataset
X. “Home” Dataset
XI. “Away” Dataset
XII. “Away” Dataset
I. Purpose of the Study
As a marketing major who aspires to work for a professional football team one day, I chose to analyze how much of an advantage a National Football League (NFL) team has over its opponent when playing in its own stadium rather than on the road. Whether because the home team does not have to travel, is familiar with the particulars of its own stadium, or is lifted by its fans, or for any number of other proposed reasons, I hypothesize that the study's results will show that NFL home teams have a distinct advantage over visiting teams.
To test this hypothesis, I examine several variables that may affect the number of games an NFL team wins at home: the number of points the home team scored, the number of points the visiting team scored, the number of offensive and defensive penalties the home team committed, the number of home games that were nationally televised, the average percentage of home attendance, and whether or not the home stadium has a roof. This first dataset will be referred to as the “Home” data. The same variables, excluding home attendance and roof, are also examined for games a team played on the road, to measure how much of a disadvantage it faces against the home team. This second dataset will be referred to as the “Away” data.
The data for this study are drawn from the 2005 NFL regular season. Because of Hurricane Katrina, the New Orleans Saints did not have access to their home stadium, the Louisiana Superdome, during the 2005 season.
II. Review of the Literature
Before the “Home” and “Away” data were created and analyzed, a literature review was performed to determine what research on the NFL home field advantage already exists. Below are the titles of the articles found during the review, each followed by a summary of its findings.
NFL Home Field Advantage Research
The statisticians of TwoMinuteWarning.com analyzed the supposed NFL home field advantage using data from the 1999 to 2002 seasons. They examined the NFL home field advantage (HFA) week by week, using a ‘smoothed’ average of their calculated HFAs. The results showed a clear home field advantage. From Week 1 through Week 12, the home team scored an average of 1.3 more points than the visiting team. From Week 13 on, the home team's margin jumped to an enormous 5.3 points. The statisticians concluded that a home field advantage exists in the NFL. They also concluded that something drastic appears to happen in the closing games of the season, because playing at home seems to confer more of an advantage at the end of the season than during the first two-thirds of it. Because the study was based on only four NFL seasons, they concluded that further study is needed.
NFL Home Field Advantage and Team Strength
The statistician of AdvancedNFLStats.com hypothesizes that the supposed NFL home field advantage becomes more evident the more evenly matched the two teams are. To test this, data were gathered from the 2002 to 2005 seasons. Each match-up was placed in one of three categories:
- “good vs. good,” where two playoff-caliber teams played each other;
- “good vs. bad,” where a playoff-caliber team played a weak team;
- “bad vs. bad,” where two weak teams played each other.
A graph then plotted the home team's winning percentage against the difference between the two teams' season win totals. The graph's least squares line clearly showed that as this difference decreased (that is, as both teams finished with similar records), the home team's winning percentage increased. However, the statistician also admits that the plotted data are not consistently smooth, so a good deal of randomness is involved.
Home No Big NFL Advantage Lately
Author Larry Weisman takes a closer look at how the NFL home field advantage has changed over the last several seasons. When the article was published, all four home teams had just lost in the wild-card round of the 2006 playoffs, something that had last happened in the 2002 playoffs. A year earlier, in the 2005 playoffs, three of the four home teams had lost in the wild-card round. Weisman points out that “from 1993-2002, home teams went 75-25 (.750) in the playoffs. During 2003-2005: 16-14 (.533).” There may be evidence of a home field advantage during the NFL regular season, but Weisman concludes that it virtually disappears during the playoffs.
Home Not So Sweet Anymore in NFL
Author G.E. Branch III notices a shift in the effectiveness of the NFL home field advantage during the regular season. Some teams have kept their home field edge, rating a 3½- to 4-point advantage for playing at home: Denver (high altitude), Miami (humidity), and Pittsburgh (reigning champions, enthusiastic fans). Other teams, such as St. Louis, Tampa Bay, Cleveland, and Detroit, noticeably lack the loud, passionate, loyal fans that the more successful home teams have in common. Branch also points out that a brand-new stadium is expected to excite a fan base, yet teams that have debuted new stadiums are only .500 this decade. Branch attributes this shift to the “spreading of talent since liberalized free agency began in 1993.” Fans have traditionally rooted for an entire team. Now they root for individual players and cheer for them on the road, which decreases the true value of home field advantage in the NFL.
III. Formulation of the Models
While determining which variables were good predictors, six separate models were created for each of the “Home” and “Away” datasets. The two final models were formed as follows.
First, consider the original “Home” dataset, which we call “Home Stats – 1.” It has eight predictor variables. Below, the first version of the model shows the numbered x variables and the second shows each variable's name. In both, E(y) = average home wins:
$$E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4 + \beta_5 x_5 + \beta_6 x_6 + \beta_7 x_7 + \beta_8 x_8$$
$$E(y) = \beta_0 + \beta_1(\text{ENC}) + \beta_2(\text{FS}) + \beta_3(\text{NZI}) + \beta_4(\text{OFF}) + \beta_5(\text{OFK}) + \beta_6(\text{nationally televised}) + \beta_7(\text{home attendance}) + \beta_8(\text{roof?})$$
The predictors are defined as follows:
ENC = number of encroachment penalties committed at home
FS = number of false start penalties committed at home
NZI = number of neutral zone infraction penalties committed at home
OFF = number of offside penalties committed at home
OFK = number of offside on free kick penalties committed at home
nationally televised = number of home games that were nationally televised
home attendance = average percentage of home attendance
roof? = 1 if the home stadium has a roof, 0 if not
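The model above is an ordinary multiple linear regression. The sketch below shows how coefficients such as β0, …, β8 can be estimated by solving the least-squares normal equations (XᵀX)β = Xᵀy in plain Python. The paper does not say which software produced its estimates. This is therefore a generic example of the technique, not the author's procedure, and the function name `fit_ols` is hypothetical.

```python
def fit_ols(rows, y):
    """Estimate [b0, b1, ..., bk] by ordinary least squares.

    rows: one list of predictor values (x1..xk) per observation.
    y:    one response value (e.g. home wins) per observation.
    Solves the normal equations (X^T X) b = X^T y by Gauss-Jordan elimination.
    """
    # Design matrix: a leading 1.0 in each row gives the intercept b0.
    X = [[1.0] + [float(v) for v in r] for r in rows]
    p = len(X[0])
    # Augmented matrix [X^T X | X^T y].
    A = []
    for i in range(p):
        row = [sum(x[i] * x[j] for x in X) for j in range(p)]
        row.append(sum(x[i] * yi for x, yi in zip(X, y)))
        A.append(row)
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("predictors are linearly dependent")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    # A is now diagonal in its first p columns.
    return [A[i][p] / A[i][i] for i in range(p)]
```

Take the toy data `rows = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 3]]` and `y = [1, 3, 4, 6, 14]`, which were generated from y = 1 + 2x₁ + 3x₂ and are not NFL data. For these, `fit_ols(rows, y)` returns coefficients very close to `[1, 2, 3]`. A statistics package would also report the standard errors and p-values that this paper relies on.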
Every predictor except home attendance was deemed a poor predictor. A second dataset was therefore created in which all penalties were grouped as either offensive or defensive. We call this dataset “Home Stats – 2.” After the grouping, there were five predictor variables. As before, the first version of the model below shows the numbered x variables and the second shows the variable names, where E(y) = average home wins:
$$E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4 + \beta_5 x_5$$
$$E(y) = \beta_0 + \beta_1(\text{offensive}) + \beta_2(\text{defensive}) + \beta_3(\text{nationally televised}) + \beta_4(\text{home attendance}) + \beta_5(\text{roof?})$$
The new predictors, “offensive” and “defensive,” are defined as follows:
offensive = number of offensive penalties committed at home
defensive = number of defensive penalties committed at home
Again, every predictor except home attendance was deemed a poor predictor. It became apparent that more data would likely be needed to find good predictor variables. This prompted the addition of two new predictors, “points scored” and “opp points scored,” to a third dataset, which we call “Home Stats – 3.” With these two additions, there were seven predictor variables. The first version of the model below shows the numbered x variables and the second shows the variable names, where E(y) = average home wins:
$$E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4 + \beta_5 x_5 + \beta_6 x_6 + \beta_7 x_7$$
$$E(y) = \beta_0 + \beta_1(\text{points scored}) + \beta_2(\text{opp points scored}) + \beta_3(\text{offensive}) + \beta_4(\text{defensive}) + \beta_5(\text{nationally televised}) + \beta_6(\text{home attendance}) + \beta_7(\text{roof?})$$
The new predictors, “points scored” and “opp points scored,” are defined as follows:
points scored = sum of all points scored by the team playing at home
opp points scored = sum of all points scored by the team playing on the road
Although “Home Stats – 3” produced the highest $R^2$ value, three of its seven predictors were still deemed poor predictors. The final three adjustments to the dataset removed these three poor predictors one at a time. The hope was that omitting one or two of them might turn one of the others into a good predictor. Unfortunately, this did not happen. The final dataset, “Home Stats – 6,” consisted of the four good predictor variables and was used to create the final model. The first version of the model below shows the numbered x variables and the second shows the variable names, where E(y) = average home wins:
$$E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4$$
$$E(y) = \beta_0 + \beta_1(\text{points scored}) + \beta_2(\text{opp points scored}) + \beta_3(\text{nationally televised}) + \beta_4(\text{home attendance})$$
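The “consecutive omissions” described above resemble backward elimination: refit the model, drop the weakest predictor, and repeat until every remaining predictor is significant. Below is a minimal sketch. It assumes a refitting function that returns p-values, and the callable `fit_pvalues` is hypothetical. It uses α = 0.05, as in the analysis section. The paper does not say in what order the three poor predictors were dropped, so removing the largest p-value first is an assumption.

```python
def backward_eliminate(predictors, fit_pvalues, alpha=0.05):
    """Drop the weakest predictor one at a time until every p-value < alpha.

    predictors:  list of predictor names to start from.
    fit_pvalues: hypothetical callable that refits the model on a list of
                 names and returns a dict {name: p-value}.
    """
    current = list(predictors)
    while current:
        pvalues = fit_pvalues(current)
        worst = max(current, key=lambda name: pvalues[name])
        if pvalues[worst] < alpha:
            break  # every remaining predictor is significant
        current.remove(worst)  # omit the poorest predictor, then refit
    return current
```

Refitting after each removal matters because a predictor's p-value can change once a correlated predictor is dropped. That possibility is exactly what the author hoped for.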
Next, consider the original “Away” dataset, which we call “Away Stats – 1.” It has six predictor variables. The first version of the model below shows the numbered x variables and the second shows each variable's name, where E(y) = average away wins:
$$E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4 + \beta_5 x_5 + \beta_6 x_6$$
$$E(y) = \beta_0 + \beta_1(\text{ENC}) + \beta_2(\text{FS}) + \beta_3(\text{NZI}) + \beta_4(\text{OFF}) + \beta_5(\text{OFK}) + \beta_6(\text{nationally televised})$$
The predictors are defined as follows:
ENC = number of encroachment penalties committed on the road
FS = number of false start penalties committed on the road
NZI = number of neutral zone infraction penalties committed on the road
OFF = number of offside penalties committed on the road
OFK = number of offside on free kick penalties committed on the road
nationally televised = number of road games that were nationally televised
As with the “Home” data, all of the predictor variables proved to be poor predictors. A second dataset was therefore created in which all penalties were grouped as either offensive or defensive. We call this dataset “Away Stats – 2.” After the grouping, there were only three predictor variables. The first version of the model below shows the numbered x variables and the second shows the variable names, where E(y) = average away wins:
$$E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3$$
$$E(y) = \beta_0 + \beta_1(\text{offensive}) + \beta_2(\text{defensive}) + \beta_3(\text{nationally televised})$$
The new predictors, “offensive” and “defensive,” are defined as follows:
offensive = number of offensive penalties committed on the road
defensive = number of defensive penalties committed on the road
Once again, all three predictor variables were deemed poor predictors. As with the “Home” data, it became apparent that more data would likely be needed to find good predictor variables. This prompted the addition of two new predictors, “points scored” and “opp points scored,” to a third dataset, which we call “Away Stats – 3.” With these two additions, there were five predictor variables. The first version of the model below shows the numbered x variables and the second shows the variable names, where E(y) = average away wins:
$$E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4 + \beta_5 x_5$$
$$E(y) = \beta_0 + \beta_1(\text{points scored}) + \beta_2(\text{opp points scored}) + \beta_3(\text{offensive}) + \beta_4(\text{defensive}) + \beta_5(\text{nationally televised})$$
The new predictors, “points scored” and “opp points scored,” are defined as follows:
points scored = sum of all points scored by the team playing on the road
opp points scored = sum of all points scored by the team playing at home
Although “Away Stats – 3” produced the highest $R^2$ value, three of its five predictors were still deemed poor predictors. The final three adjustments to the dataset removed these three poor predictors one at a time. The hope was that omitting one or two of them might turn one of the others into a good predictor. Unfortunately, this did not happen. The final dataset, “Away Stats – 6,” consisted of only two good predictor variables and was used to create the final model. The first version of the model below shows the numbered x variables and the second shows the variable names, where E(y) = average away wins:
$$E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2$$
$$E(y) = \beta_0 + \beta_1(\text{points scored}) + \beta_2(\text{opp points scored})$$
Interestingly, the “nationally televised” variable proved to be a good predictor of home wins in the “Home” model but a very poor predictor of away wins in the “Away” model. This may suggest that national television exposure is a very influential factor in home field advantage, even though visiting teams appear indifferent to playing a nationally televised game on the road.
IV. Data Sources and Description
Data used for this study was collected from several sources. Three separate datasets
provided data regarding the win-loss record for each team, final scores of each game, and
penalties committed by each team per game. Other data, such as how many points a team scored
at home or on the road and a count of how many offensive or defensive penalties were
committed per team, were also derived from the three datasets. Unfortunately, the origin of the
three datasets is currently unknown. Websites such as NFL.com and ESPN.com provided home
and away win-loss records, fan attendance, the number of nationally televised games the teams
played in during the season, and whether the team’s home stadium had a roof or not.
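Team-level values such as points scored at home, opponents' points scored at home, and home wins come from rolling game-level results up to one row per team. The origin and layout of the three source datasets are unknown. The sketch below therefore assumes a hypothetical record format, one dict per game with the keys `home`, `away`, `home_score` and `away_score`, and the function name `home_totals` is likewise illustrative.

```python
from collections import defaultdict

def home_totals(games):
    """Aggregate game-level results into per-team home statistics.

    games: iterable of dicts with the hypothetical keys
           'home', 'away', 'home_score', 'away_score' (one dict per game).
    Returns {team: (points scored at home, opp points scored at home, home wins)}.
    """
    scored = defaultdict(int)
    allowed = defaultdict(int)
    wins = defaultdict(int)
    for g in games:
        team = g["home"]
        scored[team] += g["home_score"]
        allowed[team] += g["away_score"]
        # A tie is not counted as a win.
        wins[team] += int(g["home_score"] > g["away_score"])
    return {team: (scored[team], allowed[team], wins[team]) for team in scored}
```

The “Away” totals would follow the same pattern, keyed on the visiting team with the two scores swapped. Penalty counts per team could be summed the same way from a per-game penalty table.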
V. Analysis and Interpretation of Results
After the two final models were formulated, a multiple regression analysis was performed on each to test for an NFL home field advantage. First, consider the final “Home” model (“Home Stats – 6”). The multiple regression analysis produced a strong $R^2$ of 0.8654. This means that 86.54% of the total variation in home wins is accounted for by the four predictors:
- the sum of all points scored by the team playing at home;
- the sum of all points scored by the team playing on the road;
- the number of home games that were nationally televised;
- the average percentage of home attendance.
Used together, these four variables are therefore strong predictors of home wins. Their significance is tested below with a global F-test and individual p-values.
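$R^2$ never decreases when a predictor is added to a model. Adjusted $R^2$ is therefore a common companion statistic when comparing models with different numbers of predictors, as was done across the six datasets. A minimal sketch follows. It assumes one observation per team, which the paper implies but does not state, giving n = 32 teams in the 2005 season. It also uses k = 4 predictors, as in the final “Home” model.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R^2 = 1 - (1 - R^2)(n - 1)/(n - k - 1) for k predictors, n observations."""
    if n - k - 1 <= 0:
        raise ValueError("need n > k + 1")
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Assumption: n = 32 teams (one row per team) and k = 4 predictors.
print(round(adjusted_r2(0.8654, n=32, k=4), 4))  # 0.8455
```

Under these assumptions the penalty for using four predictors is small, and the adjusted value stays close to the reported 0.8654.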
The residual plot for each predictor was examined to check that the residuals behave like random draws from a normal distribution with constant variance. The four residual plots are shown below.
[Figure: Residual plots for the four “Home” predictors: “points scored,” “opp points scored,” “nationally televised,” and “home attendance.” Each plot shows the residuals against the predictor, with red lines at ±2 standard errors.]
A residual plot is deemed acceptable if it meets three conditions:
- approximately 95% of the residuals lie within 2 standard errors of the zero line;
- there are no curved (concave) trends;
- there are no fanning patterns.
The red lines in each residual plot mark ±2 standard errors. For the “Home” data, 2 × standard error = 1.4152. Each plot contains approximately 95% of its residuals within 2 standard errors of the zero line, and none shows a concave trend. With the exception of the “home attendance” residual plot, there are no fanning patterns.
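The first residual condition above can be checked numerically. Curvature and fanning still call for looking at the plots. The sketch below counts the share of residuals inside the ±2 standard error band, using the reported 2 × standard error of 1.4152, so the standard error itself is about 0.7076. The helper name `share_within_band` is illustrative, and the residuals in the example are toy values, not the study's.

```python
def share_within_band(residuals, two_se):
    """Fraction of residuals lying within +/- two_se of the zero line."""
    inside = sum(1 for r in residuals if abs(r) <= two_se)
    return inside / len(residuals)

TWO_SE = 1.4152  # 2 x standard error reported for the "Home" model

# Toy residuals, for illustration only: 1.6 falls outside the band.
print(share_within_band([0.3, -1.2, 1.6, -0.4], TWO_SE))  # 0.75
```

With the study's residuals, a result near 0.95 would match the rule of thumb described above.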
A test for multicollinearity was also performed, and its results suggest that multicollinearity is not present in the “Home” data: no large correlation coefficients were detected between pairs of predictors. A global F-test for overall fit then showed a good fit between the regression equation and the data:
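$$H_0: \beta_1 = \beta_2 = \beta_3 = \beta_4 = 0$$
$$H_A: \text{at least one } \beta_i \neq 0$$
The p-value for F is $2.211 \times 10^{-11} \approx 0.0000 < \alpha = 0.05$, so $H_0$ is rejected → Good fit!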
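The two checks above can be sketched in plain Python. For multicollinearity, the sketch computes the Pearson correlation of every pair of predictors and flags large ones. The cutoff of 0.8 is an assumed rule of thumb, since the paper does not state its threshold. For overall fit, the global F statistic can be recovered from $R^2$ as $F = (R^2/k)\,/\,((1-R^2)/(n-k-1))$. Its p-value is then read from an F distribution with $(k,\ n-k-1)$ degrees of freedom. The sample size n = 32, one row per team, is an assumption, and the function names are illustrative.

```python
from itertools import combinations
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def flag_collinear_pairs(columns, threshold=0.8):
    """List predictor pairs whose |r| exceeds threshold.

    columns: {predictor name: values, one per team}.
    threshold: cutoff for a "large" correlation (0.8 is an assumed rule of thumb).
    """
    flagged = []
    for a, b in combinations(columns, 2):
        r = pearson_r(columns[a], columns[b])
        if abs(r) > threshold:
            flagged.append((a, b, r))
    return flagged

def global_f_statistic(r2, n, k):
    """F statistic for H0: beta_1 = ... = beta_k = 0, computed from R^2."""
    return (r2 / k) / ((1 - r2) / (n - k - 1))

# Assumption: n = 32 teams (one row per team) and k = 4 predictors.
print(round(global_f_statistic(0.8654, n=32, k=4), 1))  # 43.4
```

An F statistic this large on (4, 27) degrees of freedom is consistent with the very small p-value reported above.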
Also, all four predictors have small p-values:
points scored: p = $1.903 \times 10^{-8}$ < α = 0.05 → Good predictor!
opp points scored: p = $6.752 \times 10^{-9}$ < α = 0.05 → Good predictor!
nationally televised: p = 0.009 < α = 0.05 → Good predictor!
home attendance: p = 0.014 < α = 0.05 → Good predictor!
Since all four predictors proved to be good predictors of home wins, the model is now used for prediction. The averages of the teams' data serve as the input values for the prediction equation. The first version of the model below shows the name of each variable. The second substitutes the coefficients and averaged data values for the variable names, where E(y) = average home wins:
$$E(y) = \beta_0 + \beta_1(\text{points scored}) + \beta_2(\text{opp points scored}) + \beta_3(\text{nationally televised}) + \beta_4(\text{home attendance})$$
$$E(y) = 1.1076 + 0.0274(186.19) - 0.0314(156.75) - 0.299(1.41) + 4.0509(0.9504)$$
$$E(y) \approx 4.7156 \text{ home wins}$$
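The substitution above can be checked with a few lines of Python. The coefficients and averages are copied from the prediction equation. The dictionary layout and the function name `predict_home_wins` are illustrative choices, not part of the original analysis.

```python
# Coefficients and team averages as reported for the final "Home" model.
coefficients = {
    "intercept": 1.1076,
    "points scored": 0.0274,
    "opp points scored": -0.0314,
    "nationally televised": -0.299,
    "home attendance": 4.0509,
}
averages = {
    "points scored": 186.19,
    "opp points scored": 156.75,
    "nationally televised": 1.41,
    "home attendance": 0.9504,  # 95.04% expressed as a proportion
}

def predict_home_wins(coefs, x):
    """E(y) = b0 + sum of b_i * x_i over the named predictors."""
    return coefs["intercept"] + sum(coefs[name] * value for name, value in x.items())

print(round(predict_home_wins(coefficients, averages), 4))  # 4.7156
```

Keeping each term separate also shows the individual contributions. For example, the attendance term adds about 3.85 wins, while the nationally televised term subtracts about 0.42.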