Kentucky Derby Historical Winning Times
By: xxxxx xxxxx
Fall 2012
1/16/2013

Introduction:
The Kentucky Derby is an annual race for three-year-old thoroughbreds, held on the first Saturday of May in Louisville, Kentucky. It always draws a large crowd of spectators, who are famous for wearing large, fancy hats and drinking mint juleps. The race is one and a quarter miles long and is often called “The Most Exciting Two Minutes in Sports.” Its popularity has grown over the years, and it is now the most famous thoroughbred race in the United States. Over the past couple of years I have followed the race more closely and have come to enjoy the festivities and the analysis surrounding it. This project was therefore a good opportunity to learn more about the race and to determine whether there has been any trend over time in the winning times of “The Most Exciting Two Minutes in Sports.”

Data:
To make sure the data was credible, I obtained the winning race times from ESPN. The data can be found at the following link, or on the “Data” tab of my Excel workbook: http://espn.go.com/sports/horse/topics/_/page/kentucky-derby. The fields are the year, winning horse, jockey, trainer, and winning time in minutes. Only the year and the winning time were used in the analysis; the names of the horse, jockey, and trainer are included for informational purposes. Using this data I created a plot of year vs. winning time (in minutes), which is shown below and on the “Time vs. Year” tab.
[Figure: Kentucky Derby Winner Times (In Minutes). Race time in minutes (about 1.90 to 2.90) plotted against year, 1875 to 2010.]

Model Specification:
The earlier years, roughly 1875 to 1895, have much higher winning times than recent years, and after 1895 there is a sharp decrease to stable winning times of around 2.05 minutes. Since the data does not appear to exhibit any strong upward or downward trend after this decrease in 1895, the series appears to be stationary. To check whether this is the case, I examined the autocorrelations to see whether they paint the same picture. These calculations can be seen on the “Autocorrelation Calc” tab, and the resulting correlogram is shown below.

[Figure: Correlogram. Autocorrelation (about -0.4 to 1) plotted against lag, 1 to 136.]

Auto-Regression Analysis:
In the correlogram, the autocorrelations decline geometrically toward zero, which is consistent with a stationary series and suggests that an autoregressive model might be a good fit. The autoregressive model of order p, AR(p), is:

$$Y_t = \delta + \sum_{i=1}^{p} \phi_i Y_{t-i} + \varepsilon_t$$

where $\delta$ is a constant, the $\phi_i$ are the autoregressive coefficients, and $\varepsilon_t$ is a random error term. Using this model, I examined the AR(1), AR(2), and AR(3) models, each of which is presented in the sections that follow.

AR(1) Model:
Using the data from the “AR(P) Calc” tab and the Data Analysis package in Excel, I estimated the parameters of the AR(1) model. A summary of the output is shown below and on the “AR(1) Regression” tab.
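The two calculations described in this section, the correlogram and the AR(1) regression, can be sketched in plain Python as below. This is a minimal sketch under stated assumptions: `sample_acf`, `fit_ar1`, and `predict_ar1` are hypothetical helper names; `sample_acf` uses the usual sample autocorrelation (deviations from the overall mean, divided by the lag-0 sum of squares), which is assumed, not confirmed, to match the “Autocorrelation Calc” tab; and `fit_ar1` computes the ordinary least-squares intercept and slope of each year's time on the previous year's in closed form. The 2.03-minute prior-year time in the usage lines is illustrative, not a value from the data.

```python
def sample_acf(y, max_lag):
    """Sample autocorrelations r_0..r_max_lag.

    Assumption: standard definition using deviations from the overall mean;
    the workbook's exact formula is not stated in the report.
    """
    n = len(y)
    if not 0 <= max_lag < n:
        raise ValueError("max_lag must be between 0 and len(y) - 1")
    mean = sum(y) / n
    dev = [v - mean for v in y]
    denom = sum(d * d for d in dev)  # lag-0 sum of squares
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
            for k in range(max_lag + 1)]


def fit_ar1(y):
    """Closed-form OLS fit of Y_t = delta + phi * Y_{t-1}; returns (delta, phi)."""
    if len(y) < 3:
        raise ValueError("need at least three observations")
    x, z = y[:-1], y[1:]  # x holds Y_{t-1}, z holds Y_t
    n = len(z)
    mean_x, mean_z = sum(x) / n, sum(z) / n
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_xz = sum((a - mean_x) * (b - mean_z) for a, b in zip(x, z))
    phi = s_xz / s_xx              # slope on the lagged value
    delta = mean_z - phi * mean_x  # intercept
    return delta, phi


def predict_ar1(delta, phi, y_prev):
    """One-step prediction from an AR(1) equation."""
    return delta + phi * y_prev


# Toy series (not Derby data), just to show the call shape.
print(sample_acf([1, 2, 3, 4, 5], 2))  # [1.0, 0.4, -0.1]
# Coefficients from the Excel AR(1) output; 2.03 is an illustrative prior-year time.
print(round(predict_ar1(0.056327617, 0.975915585, 2.03), 4))  # 2.0374
```

Computing the slope and intercept in closed form keeps the sketch dependency-free; for a single lag it gives the same estimates as a general least-squares solver applied to the same rows.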
SUMMARY OUTPUT (AR(1))

Regression Statistics
| Statistic | Value |
|---|---|
| Multiple R | 0.960666311 |
| R Square | 0.922879762 |
| Adjusted R Square | 0.922308501 |
| Standard Error | 0.061141762 |
| Observations | 137 |

ANOVA
| | df | SS | MS | F | Significance F |
|---|---|---|---|---|---|
| Regression | 1 | 6.039297654 | 6.039297654 | 1615.513 | 5.45753E-77 |
| Residual | 135 | 0.50467254 | 0.003738315 | | |
| Total | 136 | 6.543970195 | | | |

| | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% |
|---|---|---|---|---|---|---|
| Intercept (δ) | 0.056327617 | 0.052642993 | 1.069992674 | 0.286532 | -0.047784022 | 0.160439257 |
| φ₁ | 0.975915585 | 0.024280464 | 40.19344804 | 5.46E-77 | 0.927896299 | 1.023934871 |

From this output, $\phi = 0.976$ and $\delta = 0.0563$. Since the AR(1) equation is $Y_t = \delta + \phi Y_{t-1}$, the fitted process is $Y_t = 0.0563 + 0.976\,Y_{t-1}$. The absolute value of $\phi$ is less than one, which is consistent with the process being stationary. The AR(1) process has a high R², a low standard error, and a high F-statistic, which indicates a good fit. However, to investigate whether it is the best fit, I continue with the AR(2) and AR(3) processes.

AR(2) Model:
Following the same procedure as for the AR(1) process, I used the data from the “AR(P) Calc” tab and the Data Analysis package in Excel to estimate the parameters of the AR(2) model. A summary of the output is shown below and on the “AR(2) Regression” tab.
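The AR(2) and AR(3) regressions reuse the AR(1) setup with additional lagged columns. Below is a minimal NumPy sketch of that general AR(p) least-squares fit; `fit_ar` and the list `winning_times` in the usage comment are hypothetical names, and this is not the Excel Data Analysis procedure itself. One assumption to flag: the sketch drops the first p years so that every row has all p lags, so it fits n - p rows. The Excel regression outputs in this report each list 137 observations, so the workbook may have aligned its lag columns differently, and the estimates need not match it exactly.

```python
import numpy as np


def fit_ar(y, p):
    """OLS fit of Y_t = delta + sum_{i=1..p} phi_i * Y_{t-i}.

    Returns (coef, r_squared), where coef = [delta, phi_1, ..., phi_p].
    Assumption: the first p values serve only as lags, so n - p rows are fitted.
    """
    y = np.asarray(y, dtype=float)
    n = len(y)
    if p < 1 or n - p <= p + 1:
        raise ValueError("series too short for the requested order p")
    target = y[p:]  # Y_t for t = p, ..., n-1
    # Column i holds Y_{t-i}, aligned row by row with target.
    lags = [y[p - i:n - i] for i in range(1, p + 1)]
    X = np.column_stack([np.ones(n - p)] + lags)
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    resid = target - X @ coef
    r_squared = 1.0 - (resid @ resid) / np.sum((target - target.mean()) ** 2)
    return coef, float(r_squared)


# Hypothetical usage, where winning_times is a list of winning times in minutes:
# for p in (1, 2, 3):
#     coef, r2 = fit_ar(winning_times, p)
```

Using `numpy.linalg.lstsq` on an explicit design matrix (a column of ones plus one column per lag) mirrors how a spreadsheet regression lays out the intercept and the lagged input ranges.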
SUMMARY OUTPUT (AR(2))

Regression Statistics
| Statistic | Value |
|---|---|
| Multiple R | 0.960764127 |
| R Square | 0.923067708 |
| Adjusted R Square | 0.921919464 |
| Standard Error | 0.061294653 |
| Observations | 137 |

ANOVA
| | df | SS | MS | F | Significance F |
|---|---|---|---|---|---|
| Regression | 2 | 6.040527566 | 3.020264 | 803.8956651 | 2.34035E-75 |
| Residual | 134 | 0.503442629 | 0.003757 | | |
| Total | 136 | 6.543970195 | | | |

| | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% |
|---|---|---|---|---|---|---|
| Intercept (δ) | 0.05606173 | 0.052776678 | 1.062244 | 0.290035411 | -0.048321342 | 0.160444802 |
| φ₁ | 0.95979658 | 0.037231404 | 25.77922 | 8.66837E-54 | 0.886159352 | 1.033433808 |
| φ₂ | 0.016388538 | 0.028643487 | 0.572156 | 0.568175046 | -0.040263288 | 0.073040365 |

From this output, $\phi_1 = 0.9598$, $\phi_2 = 0.016$, and $\delta = 0.0561$. Since the AR(2) equation is $Y_t = \delta + \phi_1 Y_{t-1} + \phi_2 Y_{t-2}$, the fitted process is $Y_t = 0.0561 + 0.9598\,Y_{t-1} + 0.016\,Y_{t-2}$.

AR(3) Model:
Lastly, using the same procedure, I estimated the parameters of the AR(3) model; they are shown below and on the “AR(3) Regression” tab.

SUMMARY OUTPUT (AR(3))

Regression Statistics
| Statistic | Value |
|---|---|
| Multiple R | 0.960769808 |
| R Square | 0.923078623 |
| Adjusted R Square | 0.921343554 |
| Standard Error | 0.061520288 |
| Observations | 137 |

ANOVA
| | df | SS | MS | F | Significance F |
|---|---|---|---|---|---|
| Regression | 3 | 6.040598996 | 2.013532999 | 532.0127368 | 7.43565E-74 |
| Residual | 133 | 0.503371198 | 0.003784746 | | |
| Total | 136 | 6.543970195 | | | |

| | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% |
|---|---|---|---|---|---|---|
| Intercept (δ) | 0.056035689 | 0.052971296 | 1.05785006 | 0.292041365 | -0.048739482 | 0.160810859 |
| φ₁ | 0.959873729 | 0.037372677 | 25.6838363 | 2.11499E-53 | 0.885952022 | 1.033795436 |
| φ₂ | 0.020237463 | 0.040142651 | 0.50413869 | 0.614998044 | -0.059163144 | 0.09963807 |
| φ₃ | -0.0039501 | 0.028753088 | -0.137380022 | 0.890938179 | -0.060822593 | 0.052922393 |

From this output, $\phi_1 = 0.9599$, $\phi_2 = 0.02$, $\phi_3 = -0.004$, and $\delta = 0.0561$.
Since the AR(3) equation is $Y_t = \delta + \phi_1 Y_{t-1} + \phi_2 Y_{t-2} + \phi_3 Y_{t-3}$, the fitted process is $Y_t = 0.0561 + 0.9599\,Y_{t-1} + 0.02\,Y_{t-2} - 0.004\,Y_{t-3}$.

Analysis:
Now that we have estimated the AR(1), AR(2), and AR(3) models, we must determine which model best explains the data. A model with a good fit will have a high R², adjusted R², and F-statistic, along with a low standard error and a low significance (p-value) of the F-statistic. The following table compares these measures; it can also be seen on the “Model Comparison” tab.

| Process | R Squared | Adjusted R Squared | Standard Error | F-Statistic | Significance of F-Statistic |
|---|---|---|---|---|---|
| AR(1) | 0.9228798 | 0.9223085 | 0.061142 | 1615.5133 | 5.45753E-77 |
| AR(2) | 0.9230677 | 0.9219195 | 0.061295 | 803.89567 | 2.34035E-75 |
| AR(3) | 0.9230786 | 0.9213436 | 0.06152 | 532.01274 | 7.43565E-74 |

Conclusion:
The table above shows that the AR(1) model has the highest adjusted R² and F-statistic, and the lowest standard error and significance of the F-statistic. Although the AR(1) model does not have the highest R², its R² is extremely close to those of the AR(2) and AR(3) models. This is expected, since adding lagged terms can never lower R², whereas adjusted R² penalizes each additional parameter. Therefore, the AR(1) model provides the best fit to the data. The graph below, also on the “AR(1) Actual vs. Predicted” tab, compares the actual winning times with those predicted by the AR(1) model.

[Figure: Kentucky Derby Actual Vs. Predicted Winning Times. Actual and predicted race times in minutes (about 1.80 to 2.80) plotted against year, 1875 to 2010.]
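The comparison measures follow from each regression's R², residual sum of squares, the number of observations n, and the number of lagged regressors k, using the standard regression formulas. The short Python sketch below (function names are hypothetical) recomputes the Adjusted R Squared, Standard Error, and F-Statistic columns from the R Square and Residual SS values in the three Excel outputs, with n = 137 and k taken from the Regression df; under those inputs its printed values agree with the table up to rounding.

```python
import math


def adjusted_r2(r2, n, k):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - k - 1), for k regressors."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


def f_statistic(r2, n, k):
    """Overall regression F = (R^2 / k) / ((1 - R^2) / (n - k - 1))."""
    return (r2 / k) / ((1 - r2) / (n - k - 1))


def standard_error(ss_residual, n, k):
    """Standard error of the regression = sqrt(SS_residual / (n - k - 1))."""
    return math.sqrt(ss_residual / (n - k - 1))


# R Square, Residual SS, and k (Regression df) from the Excel outputs; n = 137 in each.
outputs = {
    "AR(1)": (0.922879762, 0.50467254, 1),
    "AR(2)": (0.923067708, 0.503442629, 2),
    "AR(3)": (0.923078623, 0.503371198, 3),
}
for name, (r2, ss_res, k) in outputs.items():
    print(name,
          round(adjusted_r2(r2, 137, k), 7),
          round(standard_error(ss_res, 137, k), 6),
          round(f_statistic(r2, 137, k), 2))
```

Because the F-statistic divides R² by k, an extra lag that barely raises R² cuts F roughly in proportion to the added degree of freedom, which is why F falls from AR(1) to AR(3) even as R² creeps up.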