Empirical Project: NFL Home-Field Advantage

NFL Home-Field Advantage: Does It Exist?
Shawn Rembecky
The College of New Jersey
ECO 231-05 Applied Business Statistics
Dr. David W. Letcher
December 4, 2009

Table of Contents
I. Purpose of the Study
II. Review of the Literature
III. Formulation of the Models
IV. Data Sources and Description
V. Analysis and Interpretation of Results
VI. Interpretation of the Analyses, Predictions, and Conclusions
VII. Suggestions for Future Research
VIII. References
IX. “Home” Dataset
X. “Home” Dataset
XI. “Away” Dataset
XII. “Away” Dataset

I. Purpose of the Study

As a marketing major who aspires to work for a professional football team one day, I chose to analyze how much of an advantage a National Football League (NFL) team has over its opponent when playing in its own stadium as opposed to playing on the road. Whether it is because the home team does not travel, because it is very familiar with its home stadium’s particulars, because its fans make a big difference, or for any number of other reasons, I hypothesize that the study’s results will show that NFL home teams have a distinct advantage over visiting teams.

To test this, a number of variables that affect the number of games an NFL team wins at home will be examined: the number of points the home team scored, the number of points the visiting team scored, the number of offensive and defensive penalties committed by the home team, how many games were nationally televised while the team was at home, the average percentage of home attendance, and whether or not the home stadium has a roof. This first dataset will be referred to as the “Home” data. The same variables will also be examined when the team is on the road to show how much of a disadvantage it faces against the home team. This second dataset will be referred to as the “Away” data.

It should be noted that the data collected for this study is derived from the NFL 2005 regular season. Also, due to Hurricane Katrina, the New Orleans Saints did not have access to their home stadium, the Louisiana Superdome, during the 2005 NFL season.

II. Review of the Literature

Prior to creating and analyzing the “Home” and “Away” data, a literature review was performed to determine what research on and knowledge of the NFL home field advantage already exist. Below are the titles of the articles found during the literature review, each followed by a summary of its findings.

NFL Home Field Advantage Research

The statisticians of TwoMinuteWarning.com analyzed the supposed NFL home field advantage (HFA) by gathering data from the 1999 to 2002 seasons. They looked at the effect of HFA week by week using a “smoothed” average of their calculated HFAs. The results showed a clear home field advantage: from Week 1 to Week 12, the home team scored an average of 1.3 more points than the visiting team. Then, from Week 13 on, the home team’s point differential jumped to 5.3 more points than the visiting team. The statisticians concluded that not only does a home field advantage exist in the NFL, but playing at home also appears to matter more in the closing games of the season than during the first two-thirds of the season. Since the study was based on only four NFL seasons, they also concluded that further study is needed.

NFL Home Field Advantage and Team Strength

The statistician of AdvancedNFLStats.com hypothesizes that the supposed NFL home field advantage is increasingly evident when teams are more evenly matched. To conduct this study, data was gathered from the 2002 to 2005 seasons. All of the match-ups were categorized as “good vs. good,” where two playoff-caliber teams played against each other; “good vs. bad,” where one playoff-caliber team played a weak team; or “bad vs. bad,” where two weak teams played against each other. A graph then plotted the home team’s winning percentage against the season win total differential between the two teams playing.
The graph’s least squares line clearly showed that as the season win total differential between the two teams decreased (that is, as both teams finished with nearly the same record), the home team’s winning percentage increased significantly. However, the statistician of AdvancedNFLStats.com also admits that since the plotted data are not consistently smooth, there is a good deal of randomness involved.

Home No Big NFL Advantage Lately

Author Larry Weisman takes a closer look at how NFL home field advantage has changed over the last several seasons. At the time this article was published, all four home teams had just lost in the wild-card round of the 2006 Playoffs, the first time this had happened since the 2002 Playoffs. The year before, in the 2005 Playoffs, three of the four home teams had lost in the wild-card round. Weisman points out that “from 1993-2002, home teams went 75-25 (.750) in the playoffs. During 2003-2005: 16-14 (.533).” Although there may be evidence of home field advantage during the NFL regular season, Weisman concludes that it virtually disappears during the playoffs.

Home Not So Sweet Anymore in NFL

Author G.E. Branch III notices a shift in the effectiveness of NFL home field advantage during the regular season. Although some teams like Denver (high altitude), Miami (humidity), and Pittsburgh (reigning champs, enthusiastic fans) have managed to maintain their home field edge, rating a 3½- to 4-point advantage for playing at home, other teams (St. Louis, Tampa Bay, Cleveland, and Detroit) are noticeably lacking the loud, passionate, loyal fans that the more successful home teams all have in common. As Branch also points out, while debuting a brand new stadium is expected to excite a fan base, those teams are only .500 this decade.
Branch attributes this shift to the “spreading of talent since liberalized free agency began in 1993.” Rather than rooting for an entire team as fans traditionally have, fans are rooting for individual players and cheering them on the road, which decreases the true value of home field advantage in the NFL.

III. Formulation of the Models

While determining which predictor variables were good predictors, six separate models were created for each of the “Home” and “Away” datasets. Here is how our two final models were formed.

First, we look at our original “Home” dataset. We will call this dataset “Home Stats – 1.” Since there were eight predictor variables, the first version of the model shows each of the numbered x variables while the second version shows the names of each variable, where E(y) = average home wins:

E(y) = β₀ + β₁x₁ + β₂x₂ + β₃x₃ + β₄x₄ + β₅x₅ + β₆x₆ + β₇x₇ + β₈x₈
E(y) = β₀ + β₁(ENC) + β₂(FS) + β₃(NZI) + β₄(OFF) + β₅(OFK) + β₆(nationally televised) + β₇(home attendance) + β₈(roof?)

The predictors are defined as follows:

ENC = encroachment penalty committed at home
FS = false start penalty committed at home
NZI = neutral zone infraction penalty committed at home
OFF = offside penalty committed at home
OFK = offside on free kick penalty committed at home
nationally televised = number of home games that were nationally televised
home attendance = average percentage of home attendance
roof? = 1 if the home stadium has a roof, 0 if not

However, since all but one predictor, home attendance, were deemed poor predictors, a second dataset was created to group all penalties into either offensive or defensive penalties. We will call this dataset “Home Stats – 2.” After the grouping, there were five predictor variables.
The first version of the model below shows each of the numbered x variables while the second version shows the names of each variable, where E(y) = average home wins:

E(y) = β₀ + β₁x₁ + β₂x₂ + β₃x₃ + β₄x₄ + β₅x₅
E(y) = β₀ + β₁(offensive) + β₂(defensive) + β₃(nationally televised) + β₄(home attendance) + β₅(roof?)

The new predictors, “offensive” and “defensive,” are defined as follows:

offensive = offensive penalty committed at home
defensive = defensive penalty committed at home

Again, all but one predictor, home attendance, were deemed poor predictors. It became apparent that more data might have to be added in order to discover good predictor variables. This prompted the addition of two new predictors, “points scored” and “opp points scored,” to the third dataset. We will call this dataset “Home Stats – 3.” With the addition of the two predictors, there were seven predictor variables. The first version of the model below shows each of the numbered x variables while the second version shows the names of each variable, where E(y) = average home wins:

E(y) = β₀ + β₁x₁ + β₂x₂ + β₃x₃ + β₄x₄ + β₅x₅ + β₆x₆ + β₇x₇
E(y) = β₀ + β₁(points scored) + β₂(opp points scored) + β₃(offensive) + β₄(defensive) + β₅(nationally televised) + β₆(home attendance) + β₇(roof?)

The new predictors, “points scored” and “opp points scored,” are defined as follows:

points scored = sum of all points scored by the team playing at home
opp points scored = sum of all points scored by the team playing on the road

Although “Home Stats – 3” resulted in the highest R² value, three of the seven predictors were still deemed poor predictors. The final three adjustments made to the dataset were consecutive omissions of the three poor predictors. The hope was that omitting one or two of them might turn the second or third into a good predictor. Unfortunately, this did not happen.
The final dataset, “Home Stats – 6,” consisted of the four good predictor variables, and the final model was created. The first version of the model below shows each of the numbered x variables while the second version shows the names of each variable, where E(y) = average home wins:

E(y) = β₀ + β₁x₁ + β₂x₂ + β₃x₃ + β₄x₄
E(y) = β₀ + β₁(points scored) + β₂(opp points scored) + β₃(nationally televised) + β₄(home attendance)

Next, we look at our original “Away” dataset. We will call this first dataset “Away Stats – 1.” Since there were six predictor variables, the first version of the model shows each of the numbered x variables while the second version shows the names of each variable, where E(y) = average away wins:

E(y) = β₀ + β₁x₁ + β₂x₂ + β₃x₃ + β₄x₄ + β₅x₅ + β₆x₆
E(y) = β₀ + β₁(ENC) + β₂(FS) + β₃(NZI) + β₄(OFF) + β₅(OFK) + β₆(nationally televised)

The predictors are defined as follows:

ENC = encroachment penalty committed on the road
FS = false start penalty committed on the road
NZI = neutral zone infraction penalty committed on the road
OFF = offside penalty committed on the road
OFK = offside on free kick penalty committed on the road
nationally televised = number of games on the road that were nationally televised

As with our “Home” data, all of the predictor variables proved to be poor predictors, so a second dataset was created to group all penalties into either offensive or defensive penalties. We will call this dataset “Away Stats – 2.” After the grouping, there were only three predictor variables.
The first version of the model below shows each of the numbered x variables while the second version shows the names of each variable, where E(y) = average away wins:

E(y) = β₀ + β₁x₁ + β₂x₂ + β₃x₃
E(y) = β₀ + β₁(offensive) + β₂(defensive) + β₃(nationally televised)

The new predictors, “offensive” and “defensive,” are defined as follows:

offensive = offensive penalty committed on the road
defensive = defensive penalty committed on the road

Once again, all three predictor variables were deemed poor predictors. As with our “Home” data, it became apparent that more data might have to be added in order to discover good predictor variables. This prompted the addition of two new predictors, “points scored” and “opp points scored,” to the third dataset. We will call this dataset “Away Stats – 3.” With the addition of the two predictors, there were five predictor variables. The first version of the model below shows each of the numbered x variables while the second version shows the names of each variable, where E(y) = average away wins:

E(y) = β₀ + β₁x₁ + β₂x₂ + β₃x₃ + β₄x₄ + β₅x₅
E(y) = β₀ + β₁(points scored) + β₂(opp points scored) + β₃(offensive) + β₄(defensive) + β₅(nationally televised)

The new predictors, “points scored” and “opp points scored,” are defined as follows:

points scored = sum of all points scored by the team playing on the road
opp points scored = sum of all points scored by the team playing at home

Although “Away Stats – 3” resulted in the highest R² value, three of the five predictors were still deemed poor predictors. The final three adjustments made to the dataset were consecutive omissions of the three poor predictors. The hope was that omitting one or two of them might turn the second or third into a good predictor. Unfortunately, this did not happen.
The final dataset, “Away Stats – 6,” consisted of only two good predictor variables, and the final model was created. The first version of the model below shows each of the numbered x variables while the second version shows the names of each variable, where E(y) = average away wins:

E(y) = β₀ + β₁x₁ + β₂x₂
E(y) = β₀ + β₁(points scored) + β₂(opp points scored)

Interestingly, while the “nationally televised” variable for the “Home” model proved to be a good predictor of “home wins,” the “nationally televised” variable for the “Away” model proved to be a very poor predictor of “away wins.” This may suggest that nationally televised home games are a very influential factor in home field advantage, even though the opposing team is indifferent to playing in a nationally televised game on the road.

IV. Data Sources and Description

Data used for this study was collected from several sources. Three separate datasets provided data regarding the win-loss record for each team, the final score of each game, and the penalties committed by each team per game. Other data, such as how many points a team scored at home or on the road and a count of how many offensive or defensive penalties each team committed, were derived from the three datasets. Unfortunately, the origin of the three datasets is currently unknown. Websites such as NFL.com and ESPN.com provided home and away win-loss records, fan attendance, the number of nationally televised games each team played in during the season, and whether or not each team’s home stadium had a roof.

V. Analysis and Interpretation of Results

After the two final models were formulated, a multiple regression analysis was performed to assess whether an NFL home field advantage exists. First, we look at our “Home” data.
The multiple regression analysis revealed a strong R² of 0.8654, which means that 86.54% of the total variation in home wins is accounted for by using the sum of all points scored by the team playing at home, the sum of all points scored by the team playing on the road, the number of home games that were nationally televised, and the average percentage of home attendance in the regression analysis. Clearly, all four variables, when used together, are significant predictors of home wins.

The residual plot of each predictor was examined to check that the residuals appear to be drawn randomly from a normal distribution. Below are the four plots of the predictor variables.

[Figure: four residual plots, titled “points scored” Residual Plot, “opp points scored” Residual Plot, “nationally televised” Residual Plot, and “home attendance” Residual Plot. Each plots Residuals (vertical axis) against the named predictor (horizontal axis).]

In order for these residual plots to be deemed valid, approximately 95% of the residuals must lie within 2 × Standard Error of the zero line, and there must be no concave trends and no fanning patterns. The red lines in each of the residual plots represent 2 × Standard Error. For the “Home” data, 2 × Standard Error = 1.4152. Each of the plots contains approximately 95% of its residuals within 2 × Standard Error of the zero line. There also appear to be no concave trends. With the exception of the “home attendance” residual plot, there are no fanning patterns.

A test for multi-collinearity was also performed, and its results suggest that multi-collinearity is not occurring in the “Home” data: large correlation coefficients were not detected between pairs of predictors.
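The residual-plot criterion described above (roughly 95% of residuals within 2 × Standard Error of the zero line) can be checked numerically as well as visually. The sketch below is only an illustration. The function name `share_within_band` is hypothetical, and the residual values are made up for the example, not taken from the study. Only the band width 1.4152 comes from the “Home” analysis.

```python
def share_within_band(residuals, band):
    """Return the fraction of residuals whose absolute value is at most `band`."""
    if not residuals:
        raise ValueError("residuals must not be empty")
    inside = sum(1 for r in residuals if abs(r) <= band)
    return inside / len(residuals)


# 2 x Standard Error as reported for the "Home" data.
TWO_SE_HOME = 1.4152

# Hypothetical residuals for illustration only; NOT values from the study.
example_residuals = [0.3, -1.1, 0.8, -0.2, 1.5, -0.6, 0.1, 0.9, -1.3, 0.4]

share = share_within_band(example_residuals, TWO_SE_HOME)
print(f"{share:.0%} of the example residuals lie within 2 x SE of zero")
```

With the study's actual residuals in place of the example list, a share near 0.95 would be consistent with the visual reading of the plots. The check says nothing about concave trends or fanning, which still have to be judged from the plots themselves.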
Then, a global F-test for assessment of overall fit showed we have a good fit between the regression equation and the data:

H₀: β₁ = β₂ = β₃ = β₄ = 0
Hₐ: at least one β ≠ 0

The p-value for F = 2.211 × 10⁻¹¹ ≈ 0.0000 → Good fit!

Also, all four predictors have small p-values:

points scored = 1.903 × 10⁻⁸ < α = 0.05 Good predictor!
opp points scored = 6.752 × 10⁻⁹ < α = 0.05 Good predictor!
nationally televised = 0.009 < α = 0.05 Good predictor!
home attendance = 0.014 < α = 0.05 Good predictor!

Since our four predictors have proven their worth as good predictors of home wins, our model will now be used for prediction. An average of the teams’ data was taken to serve as the values for the prediction equation. The first version of the model below shows the names of each variable while the second version has the coefficients and averaged data values substituted for the variable names, where E(y) = average home wins:

E(y) = β₀ + β₁(points scored) + β₂(opp points scored) + β₃(nationally televised) + β₄(home attendance)
E(y) = 1.1076 + 0.0274(186.19) + (−0.0314)(156.75) + (−0.299)(1.41) + 4.0509(0.9504)
E(y) = 4.7156 home wins
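The arithmetic behind the predicted value can be reproduced with a short script. This is a minimal sketch rather than the software used in the study. The coefficients and team averages are the ones reported in the prediction equation, while the function name `predict_wins` and the dictionary layout are assumptions made for illustration.

```python
# Intercept and coefficients as reported for the final "Home" model.
INTERCEPT = 1.1076
COEFFICIENTS = {
    "points scored": 0.0274,
    "opp points scored": -0.0314,
    "nationally televised": -0.299,
    "home attendance": 4.0509,
}

# Team-average values substituted into the prediction equation.
AVERAGES = {
    "points scored": 186.19,
    "opp points scored": 156.75,
    "nationally televised": 1.41,
    "home attendance": 0.9504,  # average attendance as a proportion (95.04%)
}


def predict_wins(intercept, coefficients, values):
    """Evaluate the linear model: intercept + sum of coefficient * value."""
    return intercept + sum(coefficients[name] * values[name] for name in coefficients)


predicted = predict_wins(INTERCEPT, COEFFICIENTS, AVERAGES)
print(f"Predicted home wins: {predicted:.4f}")
```

Keeping the coefficients and inputs in dictionaries keyed by predictor name makes it easy to substitute a single team's values for the league averages.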