Data Mining Cave-Ins

Roland Minton
Roanoke College
Golf by the Numbers, JHU Press
• System of volunteers and lasers
• 1.2 million shots (20,000 rounds) per
year, 2004-present
• PGA Tour events only
{R,PGATOUR,2008,380,01035,745,027,
Tommy,Armour III,01,The Barclays,Ridgewood
Country Club,
15, 3,03,155,01,S,TeeBox,,Green,
Unknown,05171,05400,N,N,,
00000000228,0907,Good,With,Level,
9,922.2740, 8,232.4720,
74.6350,00022,00000}
Where Did the Ball Stop?
• It started 5171 inches away.
• It traveled 5400 inches.
• It ended 228 inches away.
• The ball went too far (long).
• The ball was not on line.
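The three distances quoted above are in inches, which is awkward to read on a golf course. A minimal sketch of the unit conversion and the "too far" check follows; the variable names are illustrative only and are not field names from the data feed.

```python
INCHES_PER_FOOT = 12
INCHES_PER_YARD = 36

# Values quoted on the slide for the sample record (all in inches).
start_to_hole = 5171  # distance from the hole before the shot
shot_length = 5400    # distance the ball traveled
end_to_hole = 228     # distance from the hole after the shot


def to_feet(inches):
    """Convert inches to feet."""
    return inches / INCHES_PER_FOOT


def to_yards(inches):
    """Convert inches to yards."""
    return inches / INCHES_PER_YARD


# The ball went long if it traveled farther than its starting distance to the hole.
went_long = shot_length > start_to_hole

print(f"start: {to_yards(start_to_hole):.1f} yards")  # start: 143.6 yards
print(f"shot:  {to_yards(shot_length):.1f} yards")    # shot:  150.0 yards
print(f"left:  {to_feet(end_to_hole):.1f} feet")      # left:  19.0 feet
print(f"long:  {went_long}")                          # long:  True
```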
Da Secret Code
• It actually is the result of rounding.
• When I “unrounded” the data,
• The inner branch is approximately
• If B changes to B – 1 and x = 0,
• Assume that B and d are about the same, and convert to feet.
Coming Up Short
Means of y-values
Different Strokes
Std deviations of y-values
Going Offline
Means of |x|-values
Std deviations of |x|-values
Correlations
• Sort the data by tournament.
• Correlate percentage of putts made
from a given distance and score.
• The first time I did this I got
• From 0-3 feet, -.150
• From 3-4 feet, -.104
• From 4-5 feet, -.061
• From 5-6 feet, -.034
• From 10-15 feet, -.053
• From 15-20 feet, .110
• From 20-25 feet, .229
• From 20-25 feet, the scatter plot is
• When only those who made the cut
were included in the calculation,
• From 15-20 feet, -.051
• From 20-25 feet, -.008
• From 25-30 feet, -.096
• What does this mean?
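The calculation described in this section (group by tournament, then correlate putting percentage from one distance band with score, optionally keeping only players who made the cut) can be sketched as follows. The rows, event names, and the helper `correlations_by_tournament` are hypothetical and only illustrate the mechanics; they are not the ShotLink data or the author's code, and the sketch makes no claim about what the real correlations should be.

```python
from collections import defaultdict
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant data")
    return sxy / sqrt(sxx * syy)


# Hypothetical rows: (tournament, fraction of putts made from one
# distance band, score, made_cut). All values are invented.
rows = [
    ("Event A", 0.10, 72.5, False),
    ("Event A", 0.25, 69.0, True),
    ("Event A", 0.15, 70.5, True),
    ("Event A", 0.20, 70.0, True),
    ("Event B", 0.05, 73.0, False),
    ("Event B", 0.30, 68.5, True),
    ("Event B", 0.20, 71.0, True),
    ("Event B", 0.10, 70.0, True),
]


def correlations_by_tournament(rows, cut_only=False):
    """Return {tournament: correlation of putt percentage with score}."""
    groups = defaultdict(list)
    for tournament, pct_made, score, made_cut in rows:
        if cut_only and not made_cut:
            continue  # keep only players who made the cut
        groups[tournament].append((pct_made, score))
    return {
        name: pearson([p for p, _ in pairs], [s for _, s in pairs])
        for name, pairs in groups.items()
        if len(pairs) >= 2
    }


everyone = correlations_by_tournament(rows)
made_cut_only = correlations_by_tournament(rows, cut_only=True)
```

Running both versions side by side makes it easy to see how much a correlation depends on which players are included.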
Lake Wobegon Open
• Ranking system for bunker (sand) play:
• For given player, for each bunker shot
• Compare “score” to “average score”.
• Add up all differences and divide by total number of bunker shots.
• Compare “score” to “average score”.
• Distance before the shot is B; distance after is A.
• “Score”: replace A with the tour avg. # of putts from A.
• Find the tour avg. distance after a shot from B.
• “Average score”: replace that distance with the tour avg. # of putts from it.
• The difference is the value of the shot.
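The shot valuation and player rating described above can be sketched in a few lines. Both lookup functions below are invented stand-ins: the real calculation would use tour averages computed from the data, and the sign convention (positive means better than average) is an assumption.

```python
from math import log1p


def avg_putts(distance):
    """Hypothetical stand-in for the tour-average number of putts from
    a given distance. The curve is invented; it is only chosen to be
    increasing and concave down."""
    return 1.0 + 0.3 * log1p(distance)


def avg_distance_after(distance_before):
    """Hypothetical stand-in for the tour-average distance left after
    a bunker shot played from distance_before. The factor is invented."""
    return 0.2 * distance_before


def shot_value(before, after):
    """Value of one shot: "average score" minus "score".

    Assumed sign convention: positive means better than tour average.
    """
    score = avg_putts(after)                               # putts expected from A
    average_score = avg_putts(avg_distance_after(before))  # putts expected after a typical shot from B
    return average_score - score


def player_rating(shots):
    """Mean shot value over a list of (before, after) distance pairs."""
    if not shots:
        raise ValueError("no shots to rate")
    return sum(shot_value(b, a) for b, a in shots) / len(shots)


# Hypothetical player: two bunker shots from 30 units away,
# one finishing at 3 and one finishing at 12.
rating = player_rating([(30, 3), (30, 12)])
```

With these stand-ins, a shot from 30 that finishes at 6 (the assumed average result) has value zero, one that finishes closer has positive value, and one that finishes farther has negative value.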
• Ranking of tour players looks right.
• In 2008, #1 Mike Weir 0.147
• The sum of all the ratings is 1.084.
• (These guys are good!)
• So A – A > 0 !
• Two problems with averaging.
• 1) The function avgputts(d) is not linear but is concave down.
• So 5 feet better beats 5 feet worse.
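The concavity effect can be checked numerically. The curve below is a hypothetical concave-down stand-in for avgputts(d), not the tour's actual function, and the 15-foot baseline is an arbitrary choice.

```python
from math import log1p


def avg_putts(distance_ft):
    """Hypothetical concave-down stand-in for avgputts(d); the real
    curve would come from tour data."""
    return 1.0 + 0.3 * log1p(distance_ft)


typical = 15.0  # assumed average leftover distance, in feet

baseline = avg_putts(typical)
gain = baseline - avg_putts(typical - 5)  # strokes saved by finishing 5 feet closer
loss = avg_putts(typical + 5) - baseline  # strokes lost by finishing 5 feet farther

# One shot 5 feet better and one shot 5 feet worse average out to the
# typical distance, yet the mean shot value is positive because the
# gain is larger than the loss.
mean_value = (gain - loss) / 2

print(gain > loss)     # True
print(mean_value > 0)  # True
```

For any concave-down curve the same inequality holds, so shots that average out to the typical distance still produce a positive average rating.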
• 2) “All” is not the same as “all.”
• The tour average is ALL shots, but I
only computed ratings for the most
active 230 players.
• For this type of shot, the regular
players are better than the irregulars.
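The second problem can be illustrated with a toy example. The numbers below are invented: ratings measured against the average of ALL shots sum to zero over all shots, but not over the subset of regular players if that subset is better than the rest.

```python
# Hypothetical outcomes for one type of shot:
# (strokes needed to hole out, hit by a regular player?)
shots = [
    (2.4, True),
    (2.5, True),
    (2.6, True),
    (2.9, False),
    (3.1, False),
]

# The tour average uses ALL shots, regulars and irregulars alike.
tour_avg = sum(strokes for strokes, _ in shots) / len(shots)

# Ratings (tour average minus strokes) summed over every shot cancel out.
total_all = sum(tour_avg - strokes for strokes, _ in shots)

# Ratings computed only for the regular players do not cancel.
regular = [strokes for strokes, is_regular in shots if is_regular]
mean_regular_rating = sum(tour_avg - s for s in regular) / len(regular)

print(abs(total_all) < 1e-9)    # True
print(mean_regular_rating > 0)  # True
```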
Any Questions?