Kentucky Derby - Neas

Kentucky Derby
Historical Winning Times
By: xxxxx xxxxx
Fall 2012
1/16/2013
Introduction:
The Kentucky Derby is an annual race for three-year-old thoroughbreds, held on the first
Saturday of May. The race takes place in Louisville, Kentucky, and always draws a large
crowd of spectators, who are known for wearing large, fancy hats and drinking mint juleps. The
Kentucky Derby is one and a quarter miles in length and is often called “The Most Exciting Two
Minutes in Sports.” The race has gained great popularity over the years and is now without a
doubt the most famous thoroughbred race in the United States. Over the past couple of years, I
have started to follow this race more closely, and I quite enjoy the festivities and analysis
surrounding it. Therefore, I thought this project would be a great opportunity to learn more
about the race and to determine whether there has been any time trend in the “Most Exciting Two
Minutes in Sports.”
Data:
To make sure the data was credible, I obtained the winning race times from ESPN. The data
can be found at http://espn.go.com/sports/horse/topics/_/page/kentucky-derby or on the “Data”
tab of my Excel workbook. The fields contained within the data are the year, winning horse,
jockey, trainer, and winning time in minutes. Only the year and winning time were used for the
analysis, but for informative purposes I included the names of those involved in winning this
prestigious race. Using this data I created a plot of winning time (in minutes) against year,
which can be seen below and on the “Time vs. Year” tab.
[Figure: Kentucky Derby Winner Times (In Minutes). Vertical axis: Race Time in Minutes, 1.90 to 2.90. Horizontal axis: Year, 1875 to 2010.]
Model Specification:
The earlier years, roughly 1875 to 1895, have much higher winning times than the recent
years, and after 1895 there is a sharp decrease to stable winning times around 2.05 minutes.
Since the data doesn’t appear to exhibit any strong positive or negative trend after this
decrease in 1895, the data appears to be stationary. To verify that this is indeed the case, I
examined the autocorrelations to see whether they paint the same picture. These autocorrelation
calculations can be seen in the “Autocorrelation Calc” tab, and the correlogram is depicted
below.
[Figure: Autocorrelation (correlogram). Vertical axis: Auto Correlation, -0.4 to 1. Horizontal axis: Lag, 1 to 136.]
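The workbook's exact autocorrelation formula is not shown here, so the following is only a minimal sketch of the standard sample autocorrelation estimator, which is one common way to produce a correlogram. The function name `sample_autocorrelation` and the short `times` list are hypothetical and are used purely for illustration; they are not the Derby data.

```python
import numpy as np

def sample_autocorrelation(series, max_lag):
    """Return the sample autocorrelations r_1 .. r_max_lag of a 1-D series."""
    x = np.asarray(series, dtype=float)
    centered = x - x.mean()
    denom = np.sum(centered ** 2)  # lag-0 sum of squares
    return np.array([
        np.sum(centered[k:] * centered[:-k]) / denom
        for k in range(1, max_lag + 1)
    ])

# Hypothetical placeholder values, NOT the actual winning times.
times = [2.62, 2.64, 2.63, 2.61, 2.10, 2.09, 2.07, 2.06, 2.05, 2.04]
acf = sample_autocorrelation(times, max_lag=3)
print(acf)
```

Plotting the returned values against the lag number would give a correlogram comparable in form to the one on the “Autocorrelation Calc” tab.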
Auto-Regression Analysis:
Looking at the correlogram, the autocorrelations decline geometrically to zero, which
supports the view that the data is stationary and suggests that an autoregressive model might
be a good fit. The autoregressive model of order p is as follows:

$$Y_t = \delta + \sum_{i=1}^{p} \phi_i Y_{t-i} + \varepsilon_t$$

where δ is the intercept, the \phi_i (written ф in the regression output) are the
autoregressive coefficients, and \varepsilon_t is the error term.
Therefore, using this knowledge, I decided to examine the AR(1), AR(2), and AR(3)
models. All three of these models can be seen in the subsequent sections.
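The analysis itself was done with Excel's regression tool. As a rough equivalent, an AR(p) model can be fit by regressing each value on its own p previous values with ordinary least squares. The sketch below assumes NumPy is available; the function name `fit_ar` and the simulated `demo` series are hypothetical and stand in for the workbook's “AR(P) Calc” layout.

```python
import numpy as np

def fit_ar(series, p):
    """Fit Y_t = delta + sum_i phi_i * Y_{t-i} by ordinary least squares.

    Returns (delta, phis), where phis[i - 1] is the coefficient on lag i.
    """
    y = np.asarray(series, dtype=float)
    n = len(y)
    target = y[p:]  # Y_t for t = p .. n-1
    # Column i holds the lag-i values aligned with each entry of target.
    lags = np.column_stack([y[p - i:n - i] for i in range(1, p + 1)])
    design = np.column_stack([np.ones(n - p), lags])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return coef[0], coef[1:]

# Simulated series for illustration only (not the Derby data).
rng = np.random.default_rng(0)
demo = [2.0]
for _ in range(199):
    demo.append(0.5 + 0.75 * demo[-1] + rng.normal(scale=0.05))

for order in (1, 2, 3):
    delta, phis = fit_ar(demo, order)
    print(order, delta, phis)
```

Each additional lag removes one usable observation from the start of the series, which is why the lagged columns are sliced to a common length before fitting.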
AR(1) Model:
Using the data from the “AR(P) Calc” tab and the data analysis package within Excel, I
was able to determine the parameters for the AR(1) model. A summary of the output can be
seen below, as well as on the “AR(1) Regression” tab.
SUMMARY OUTPUT

Regression Statistics
Multiple R           0.960666311
R Square             0.922879762
Adjusted R Square    0.922308501
Standard Error       0.061141762
Observations         137

ANOVA
             df    SS            MS            F          Significance F
Regression   1     6.039297654   6.039297654   1615.513   5.45753E-77
Residual     135   0.50467254    0.003738315
Total        136   6.543970195

             Coefficients   Standard Error   t Stat        P-value    Lower 95%      Upper 95%     Lower 95.0%    Upper 95.0%
Intercept    0.056327617    0.052642993      1.069992674   0.286532   -0.047784022   0.160439257   -0.047784022   0.160439257
ф₁           0.975915585    0.024280464      40.19344804   5.46E-77   0.927896299    1.023934871   0.927896299    1.023934871
Using this output we are able to determine that ф₁ = .976 and δ = .0563. Since the
equation for AR(1) is Yₜ = δ + ф₁Yₜ₋₁, the fitted process is Yₜ = .0563 + .976Yₜ₋₁. The
absolute value of the estimate of ф₁ is less than one, which is consistent with a stationary
process, although the upper bound of its 95% confidence interval (1.024) is slightly above one.
The AR(1) process gives a high R², low standard error, and high F-Statistic, which
implies that it is a good fit. However, in order to investigate whether it is the best fit, I
will continue with the AR(2) and AR(3) processes.
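To make the fitted AR(1) equation concrete, the sketch below plugs the two coefficients from the regression output into a one-step prediction and into the long-run mean δ / (1 - ф₁) that a stationary AR(1) process implies. The function names are hypothetical, and the previous winning time of 2.03 minutes is an assumed value chosen only for illustration.

```python
DELTA = 0.056327617  # intercept from the AR(1) regression output
PHI_1 = 0.975915585  # lag-1 coefficient from the AR(1) regression output

def predict_next(previous_time):
    """One-step-ahead AR(1) prediction of the winning time, in minutes."""
    return DELTA + PHI_1 * previous_time

def long_run_mean():
    """Mean implied by a stationary AR(1) process: delta / (1 - phi)."""
    return DELTA / (1.0 - PHI_1)

# 2.03 minutes is a hypothetical previous winning time, not a value from the data.
print(round(predict_next(2.03), 4))
print(round(long_run_mean(), 2))
```

Because ф₁ is close to one, the denominator 1 - ф₁ is small, so the implied long-run mean is sensitive to small changes in the estimated coefficients.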
AR(2) Model:
Similar to the procedure used for the AR(1) process, I used the data from the “AR(P)
Calc” tab and the data analysis package within Excel to determine the parameters for the AR(2)
model. A summary of the output can be seen below, as well as on the “AR(2) Regression” tab.
SUMMARY OUTPUT

Regression Statistics
Multiple R           0.960764127
R Square             0.923067708
Adjusted R Square    0.921919464
Standard Error       0.061294653
Observations         137

ANOVA
             df    SS            MS         F             Significance F
Regression   2     6.040527566   3.020264   803.8956651   2.34035E-75
Residual     134   0.503442629   0.003757
Total        136   6.543970195

             Coefficients   Standard Error   t Stat     P-value       Lower 95%      Upper 95%     Lower 95.0%    Upper 95.0%
Intercept    0.05606173     0.052776678      1.062244   0.290035411   -0.048321342   0.160444802   -0.048321342   0.160444802
ф₁           0.95979658     0.037231404      25.77922   8.66837E-54   0.886159352    1.033433808   0.886159352    1.033433808
ф₂           0.016388538    0.028643487      0.572156   0.568175046   -0.040263288   0.073040365   -0.040263288   0.073040365
Using this output we are able to determine that ф₁ = .9598, ф₂ = .016 and δ = .0561.
Since the equation for AR(2) is Yₜ = δ + ф₁Yₜ₋₁ + ф₂Yₜ₋₂, the fitted process is Yₜ = .0561 +
.9598Yₜ₋₁ + .016Yₜ₋₂.
AR(3) Model:
Lastly, using the procedure mentioned above, we were able to determine the
parameters for the AR(3) model, and they can be seen below, as well as on the “AR(3)
Regression” tab.
SUMMARY OUTPUT

Regression Statistics
Multiple R           0.960769808
R Square             0.923078623
Adjusted R Square    0.921343554
Standard Error       0.061520288
Observations         137

ANOVA
             df    SS            MS            F             Significance F
Regression   3     6.040598996   2.013532999   532.0127368   7.43565E-74
Residual     133   0.503371198   0.003784746
Total        136   6.543970195

             Coefficients   Standard Error   t Stat         P-value       Lower 95%      Upper 95%     Lower 95.0%    Upper 95.0%
Intercept    0.056035689    0.052971296      1.05785006     0.292041365   -0.048739482   0.160810859   -0.048739482   0.160810859
ф₁           0.959873729    0.037372677      25.6838363     2.11499E-53   0.885952022    1.033795436   0.885952022    1.033795436
ф₂           0.020237463    0.040142651      0.50413869     0.614998044   -0.059163144   0.09963807    -0.059163144   0.09963807
ф₃           -0.0039501     0.028753088      -0.137380022   0.890938179   -0.060822593   0.052922393   -0.060822593   0.052922393
Using this output we are able to determine that ф₁ = .9599, ф₂ = .02, ф₃ = -.004 and δ =
.0560. Since the equation for AR(3) is Yₜ = δ + ф₁Yₜ₋₁ + ф₂Yₜ₋₂ + ф₃Yₜ₋₃, the fitted process
is Yₜ = .0560 + .9599Yₜ₋₁ + .02Yₜ₋₂ - .004Yₜ₋₃.
Analysis:
Now that we’ve run the AR(1), AR(2), and AR(3) models, we must determine which model
does the best job of explaining the data. A model with a good fit will have a high R²,
adjusted R², and F-Statistic, along with a low standard error and a low significance of
F-Statistic. The following table compares these items, and it can also be seen on the “Model
Comparison” tab.
Process   R Squared   Adjusted R Squared   Standard Error   F-Statistic   Significance of F-Statistic
AR(1)     0.9228798   0.9223085            0.061142         1615.5133     5.45753E-77
AR(2)     0.9230677   0.9219195            0.061295         803.89567     2.34035E-75
AR(3)     0.9230786   0.9213436            0.06152          532.01274     7.43565E-74
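The adjusted R² and F-Statistic values in the comparison can be reproduced from figures already reported in the three regression outputs. The sketch below uses the textbook formulas, adjusted R² = 1 - (1 - R²)(n - 1)/(n - p - 1) and F = (regression SS / p) / (residual SS / (n - p - 1)), with n = 137 observations and p lags. The function names and the `models` dictionary are hypothetical helpers; the numbers are copied from the outputs above.

```python
def adjusted_r_squared(r_squared, n_obs, n_predictors):
    """Adjusted R^2, which penalizes each additional predictor."""
    return 1.0 - (1.0 - r_squared) * (n_obs - 1) / (n_obs - n_predictors - 1)

def f_statistic(ss_regression, df_regression, ss_residual, df_residual):
    """F = mean square of the regression / mean square of the residuals."""
    return (ss_regression / df_regression) / (ss_residual / df_residual)

N_OBS = 137  # observations reported in each regression output

# (R Square, number of lags, regression SS, residual SS) from the outputs
models = {
    "AR(1)": (0.922879762, 1, 6.039297654, 0.50467254),
    "AR(2)": (0.923067708, 2, 6.040527566, 0.503442629),
    "AR(3)": (0.923078623, 3, 6.040598996, 0.503371198),
}

results = {}
for name, (r2, p, ss_reg, ss_res) in models.items():
    adj = adjusted_r_squared(r2, N_OBS, p)
    f = f_statistic(ss_reg, p, ss_res, N_OBS - p - 1)
    results[name] = (adj, f)
    print(f"{name}: adjusted R^2 = {adj:.7f}, F = {f:.2f}")
```

The extra lags barely reduce the residual sum of squares, so the degrees-of-freedom penalty outweighs the gain and both the adjusted R² and the F-Statistic fall as lags are added.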
Conclusion:
Using the table above, we see that the AR(1) model has the highest adjusted R² and
F-Statistic, while having the lowest standard error and significance of F-Statistic. Although
the AR(1) model doesn’t have the highest R² value, it is extremely close to the R² of the AR(2)
and AR(3) models. R² never decreases when regressors are added to a regression on the same
observations, so this small gap is expected. Therefore, the best fit of the data is achieved
through the AR(1) model. A visual representation of this fit can be seen in the graph below and
on the “AR(1) Actual vs. Predicted” tab, which compares the actual data to the predicted data.
[Figure: Kentucky Derby Actual Vs. Predicted Winning Times. Vertical axis: Race Time in Minutes, 1.80 to 2.80. Horizontal axis: Year, 1875 to 2010. Series: Actual, Predicted.]