
Modeling March Madness
Roland Minton
Roanoke College
NCAA Basketball Tournament
Four Regions of 16
Teams Seeded 1-16
1 v 16, 8 v 9, etc.
Advance in bracket
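The pairing rule on this slide (1 v 16, 8 v 9, etc.) can be sketched in a few lines. This is only an illustration: the function name bracket_order is hypothetical, and the position order it returns matches the real bracket's matchup structure but not necessarily its printed top-to-bottom layout.

```python
def bracket_order(n_seeds=16):
    """Return seeds in bracket position order.

    Adjacent pairs meet in round 1, adjacent groups of four feed
    round 2, and so on. n_seeds is assumed to be a power of two.
    """
    order = [1]
    while len(order) < n_seeds:
        size = 2 * len(order)
        # Each seed s already placed is paired with seed (size + 1 - s).
        order = [x for s in order for x in (s, size + 1 - s)]
    return order


ORDER = bracket_order()
# First-round matchups: seed s always meets seed 17 - s.
FIRST_ROUND = [(ORDER[i], ORDER[i + 1]) for i in range(0, len(ORDER), 2)]
```

With this layout, #1 can meet #12 and #2 can meet #11 no earlier than the third round, and #1 and #2 can meet only in the regional final.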
Is #8 better than #10?
- Properly seeded?
- Fair structure?
First Round Results
100 regions, 1991-2015
#1 100, #16 0; #2 93, #15 7; …
First Round Results
Why 5 v 12? Why linear?
#5 is “big” school, #12 not
Overall #5 wins 64%
Big #5s win 71%
Medium #5s win 40%
Medium #5 v small #12: #5 wins 31%
Medium #5s are bad bets.
Small #12s are hard to rate.
Other Fun Facts
Similar pattern for 6 v 11 games
Third round: 1 v 12 15-0
2 v 11 8-0
Finals 1 v 2 17-17
Two Models
Logistic (Elo)
A,B Ratings a, b
P(A) = 1 / (1 + e^(-.24(a-b)))
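A minimal sketch of the logistic model, assuming the exponent is -.24(a-b) as the formula indicates. The function name logistic_win_prob and the parameter name k are illustrative, not from the talk.

```python
import math

K = 0.24  # scale constant taken from the slide's formula


def logistic_win_prob(a, b, k=K):
    """Probability that team A (rating a) beats team B (rating b)."""
    return 1.0 / (1.0 + math.exp(-k * (a - b)))
```

Equal ratings give 0.5, and only the difference a - b matters, so adding a constant to every rating leaves all probabilities unchanged.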
Log5 (James)
A,B base probs p1, p2
P(A) = p1(1-p2) / (p1(1-p2) + p2(1-p1))
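The Log5 formula can be sketched the same way. The function name log5_win_prob is illustrative; p1 and p2 are the base probabilities from the slide and are assumed to lie strictly between 0 and 1 so the denominator is nonzero.

```python
def log5_win_prob(p1, p2):
    """Probability that A (base prob p1) beats B (base prob p2) under Log5."""
    num = p1 * (1.0 - p2)
    den = num + p2 * (1.0 - p1)
    return num / den
```

Against an average opponent (p2 = 0.5) the formula returns p1 itself, which is one way to read the base probabilities.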
Find ratings to minimize SSE over all games 1986-2015.
e.g., if the model predicts #1 to win 13.5 of its 15 games v #12 (actual: 15-0),
the error is 1.5 and the squared error is 2.25.
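The error measure can be sketched as below, assuming the error for each seed matchup is actual wins minus expected wins, as in the example above. The three records are the ones quoted on the Other Fun Facts slide; the function names and the ratings dictionary are hypothetical placeholders, and the minimization itself is left to whatever optimizer is preferred.

```python
import math


def win_prob(a, b, k=0.24):
    """Logistic model: probability that rating a beats rating b."""
    return 1.0 / (1.0 + math.exp(-k * (a - b)))


def sse(ratings, results, prob=win_prob):
    """Sum of squared errors between actual and expected wins.

    results holds (seed_a, seed_b, wins_a, wins_b) tuples, one per matchup.
    """
    total = 0.0
    for seed_a, seed_b, wins_a, wins_b in results:
        games = wins_a + wins_b
        expected = games * prob(ratings[seed_a], ratings[seed_b])
        total += (wins_a - expected) ** 2
    return total


# Matchup records quoted on the slides: 1 v 12 15-0, 2 v 11 8-0, 1 v 2 17-17.
RESULTS = [(1, 12, 15, 0), (2, 11, 8, 0), (1, 2, 17, 17)]

# Hypothetical ratings, for illustration only (not fitted values).
example_ratings = {1: 16.0, 2: 15.0, 11: 6.0, 12: 5.0}
example_sse = sse(example_ratings, RESULTS)
```

A model that gives #1 a 0.9 chance in each of the 15 games v #12 expects 13.5 wins, which reproduces the squared error of 2.25.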
Model Results
12 higher than 8,9,11
10 is even higher
7 higher than 5,6
This is the result of:
- Bad seeding
- Over/underperforming
Model Results
Fit models to idealized (linear) results.
Here, #9 > #10 > #11 > #12 …
Notice the S-shape near the edges.
Model Results
Compute probabilities of seeds reaching given round.
Reach second round (win round 1):
Model Results
Reach third round (win rounds 1,2):
Model Results
Win regional (win rounds 1,2,3,4):
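One way the round-by-round probabilities on these slides could be computed is to carry each seed's survival probability through the bracket. This is a sketch under stated assumptions, not the method used in the talk: game outcomes are treated as independent, the logistic model is used, and the linear ratings (17 minus seed) are hypothetical stand-ins for fitted ratings.

```python
import math


def win_prob(a, b, k=0.24):
    """Logistic model: probability that rating a beats rating b."""
    return 1.0 / (1.0 + math.exp(-k * (a - b)))


# Bracket positions: adjacent pairs meet in round 1, adjacent fours in round 2, ...
ORDER = [1, 16, 8, 9, 4, 13, 5, 12, 2, 15, 7, 10, 3, 14, 6, 11]


def reach_probs(ratings, order=ORDER, prob=win_prob):
    """Return {seed: [P(win round 1), P(win rounds 1-2), ...]}."""
    n = len(order)
    alive = {s: 1.0 for s in order}  # P(seed is still playing)
    history = {s: [] for s in order}
    block = 2
    while block <= n:
        nxt = {}
        half = block // 2
        for start in range(0, n, block):
            left = order[start:start + half]
            right = order[start + half:start + block]
            for mine, theirs in ((left, right), (right, left)):
                for s in mine:
                    # Weight each possible opponent by its chance of being there.
                    beat = sum(alive[t] * prob(ratings[s], ratings[t])
                               for t in theirs)
                    nxt[s] = alive[s] * beat
        alive = nxt
        for s in order:
            history[s].append(alive[s])
        block *= 2
    return history


# Hypothetical linear ratings, for illustration only.
linear_ratings = {seed: 17.0 - seed for seed in range(1, 17)}
probs = reach_probs(linear_ratings)
```

Here probs[5][0] is the chance that #5 wins round 1 and probs[1][3] is the chance that #1 wins the regional. The values in each round sum to the number of teams advancing, which gives a quick sanity check.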
Conclusions
The Madness has a high level of statistical regularity.
The seeding structure creates some inequities.
There is evidence of mis-seeding and/or consistent over/underperformance.
Thank You!
Any Questions?