Modeling March Madness
Roland Minton
Roanoke College

NCAA Basketball Tournament
- Four regions of 16 teams
- Seeded 1-16
- 1 v 16, 8 v 9, etc.
- Advance in bracket
- Is #8 better than #10?
  - Properly seeded?
  - Fair structure?

First Round Results
- 100 regions, 1991-2015
- #1 100, #16 0; #2 93, #15 7; …

First Round Results
- Why 5 v 12? Why linear?
- #5 is “big” school, #12 not
- Overall #5 wins 64%
- Big #5s win 71%
- Medium #5s win 40%
- Med #5 v Small #12 31%
- Medium #5s are bad bets.
- Small #12s are hard to rate.

Other Fun Facts
- Similar pattern for 6 v 11 games
- Third round: 1 v 12 15-0, 2 v 11 8-0
- Finals: 1 v 2 17-17

Two Models
Logistic (Elo)
- A, B ratings a, b
- P(A) = 1 / (1 + e^(-0.24(a-b)))
Log 5 (James)
- A, B base probs p1, p2
- P(A) = p1(1-p2) / (p1(1-p2) + p2(1-p1))
Find ratings to minimize SSE over all games 1986-2015.
e.g., if model predicts #1 v #12 to win 13.5/15, error is 1.5, SE is 2.25.

Model Results
- 12 higher than 8, 9, 11
- 10 is even higher
- 7 higher than 5, 6
This is the result of:
- Bad seeding
- Over/underperforming

Model Results
- Fit models to idealized (linear) results.
- Here, #9 > #10 > #11 > #12 …
- Notice S-shape near edges.

Model Results
- Compute probabilities of seeds reaching given round.
- Reach second round (win round 1):

Model Results
- Reach third round (win rounds 1,2):

Model Results
- Win regional (win rounds 1,2,3,4):

Conclusions
- The Madness has a high level of statistical regularity.
- The seeding structure creates some inequities.
- There is evidence of mis-seeding and/or consistent over/under performance.

Thank You!
Any Questions?
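
Appendix sketch (not from the slides): the two formulas on the "Two Models" slide and the squared-error measure can be written out as a few lines of Python. This is only a minimal illustration under stated assumptions. The function names, the `matchups` tuple layout, and the constant-probability stand-in model are hypothetical; only the 0.24 constant, the two formulas, and the 13.5/15 worked example come from the slides. The fitting step itself (searching for ratings that minimize SSE) is not shown.

```python
import math

K = 0.24  # scaling constant from the slide's logistic formula


def logistic_prob(a, b, k=K):
    """P(A beats B) under the logistic (Elo-style) model with ratings a, b."""
    return 1.0 / (1.0 + math.exp(-k * (a - b)))


def log5_prob(p1, p2):
    """P(A beats B) under Log 5 with base probabilities p1, p2."""
    num = p1 * (1.0 - p2)
    return num / (num + p2 * (1.0 - p1))


def sse(matchups, prob):
    """Sum of squared errors between predicted and observed win counts.

    matchups: iterable of (x, y, wins, games), where x and y are the model
    inputs for the two sides (ratings or base probabilities), and `wins`
    is how many of `games` the first side actually won.
    (This tuple layout is an assumption made for the sketch.)
    """
    return sum((games * prob(x, y) - wins) ** 2
               for x, y, wins, games in matchups)


# Slide example: the model predicts 13.5 wins in 15 games (probability 0.9)
# and the observed record is 15-0, so the error is 1.5 and the SE is 2.25.
# A constant-probability stand-in model is used here purely for illustration.
example_se = sse([(None, None, 15, 15)], lambda x, y: 0.9)
print(example_se)
```

Both probability functions return 0.5 for evenly matched sides, and `logistic_prob(a, b) + logistic_prob(b, a)` equals 1, which is a quick sanity check on any fitted ratings.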