Mathematics 344
“Goodness of Fit”
February 20, 2015
Model: multinomial with n trials and parameters π_1, ..., π_k.
Data: o_1, ..., o_k (the “observed”), the number of outcomes in each class.
Null Hypothesis: the null hypothesis is a model for π_1, ..., π_k.
Test statistics: compute e_1, ..., e_k, the “expected” frequencies under the null hypothesis.
$$\chi^2 = \sum_{i=1}^{k} \frac{(o_i - e_i)^2}{e_i} \qquad\qquad G = 2 \sum_{i=1}^{k} o_i \ln\frac{o_i}{e_i}$$
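Under the null hypothesis, each of these has approximately a chi-square distribution with p degrees of freedom. The value of p is determined by the nature of the null hypothesis: it is the dimension of the full parameter space (k − 1) minus the dimension of the null hypothesis.
All the work is in computing e_i, the expected count in each class under the null hypothesis.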
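As a minimal sketch of the two statistics, consider the simplest case, where the null hypothesis fixes every π_i and nothing has to be estimated. The die-roll counts below are made up for illustration and are not from the course data; the variable names are likewise assumptions.

```r
# Hypothetical data: 60 rolls of a die; null hypothesis "the die is fair".
obs <- c(8, 12, 9, 11, 6, 14)   # observed counts o_i (made up)
pi0 <- rep(1/6, 6)              # null probabilities pi_i
e   <- sum(obs) * pi0           # expected counts e_i = n * pi_i (10 each)

chisq_stat <- sum((obs - e)^2 / e)          # Pearson statistic: 42/10 = 4.2
G_stat     <- 2 * sum(obs * log(obs / e))   # likelihood-ratio statistic

# Cross-check the Pearson statistic against base R's chisq.test().
builtin <- chisq.test(obs, p = pi0)$statistic
c(chisq = chisq_stat, G = G_stat, builtin = unname(builtin))
```

Note that G as written returns NaN if any observed count is zero, since R evaluates 0 * log(0) as NaN; such terms are conventionally taken to be zero.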
Example: NBA free-throw shooting percentages.
Null hypothesis: free-throw shooting percentages in the NBA are beta-distributed.
library(dplyr)   # mutate()
library(mosaic)  # tally(), plotDist()
library(MASS)    # fitdistr()
ft <- mutate(freethrow, PCT = FTM/FTA)
cutpts <- c(0, 0.6, 0.65, 0.7, 0.75, 0.8, 0.85, 0.9, 1)
obs <- tally(cut(ft$PCT, cutpts))
First argument should be a formula...
But I’ll try to guess what you meant
obs
   (0,0.6] (0.6,0.65] (0.65,0.7] (0.7,0.75] (0.75,0.8] (0.8,0.85] (0.85,0.9]
         3          8         14         19         33         37         18
   (0.9,1]
         5
params <- fitdistr(ft$PCT, "beta", start = list(shape1 = 10, shape2 = 5))$estimate
pihat <- diff(pbeta(cutpts, params[1], params[2]))
exp <- pihat * sum(obs)
exp
[1]  2.932  6.076 13.720 24.504 33.428 32.374 19.068  4.899
chi <- sum((obs - exp)^2/exp)
G <- 2 * sum(obs * log(obs/exp))
chi
[1] 2.581
G
[1] 2.602
Dimension of null hypothesis: 2. Dimension of parameter space: k − 1 = 7 (there are k = 8 classes). Degrees of freedom: 7 − 2 = 5.
1 - pchisq(chi, 5)
[1] 0.7642
1 - pchisq(G, 5)
[1] 0.7611
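The degrees-of-freedom bookkeeping can be wrapped in a small helper. This is only a sketch: `gof_pvalue` and its argument names are hypothetical, and the statistic 4.2 with six classes is an illustrative value rather than a result from the free-throw data.

```r
# Hypothetical helper: upper-tail p-value for a goodness-of-fit statistic.
#   stat     : value of the chi-square or G statistic
#   k        : number of classes
#   n_fitted : number of parameters estimated under the null hypothesis
gof_pvalue <- function(stat, k, n_fitted = 0) {
  df <- (k - 1) - n_fitted      # dimension of full space minus dimension of null
  stopifnot(df > 0)
  pchisq(stat, df = df, lower.tail = FALSE)   # same as 1 - pchisq(stat, df)
}

gof_pvalue(4.2, k = 6)                # fully specified null: df = 5
gof_pvalue(4.2, k = 6, n_fitted = 1)  # one fitted parameter: df = 4
```

Using `lower.tail = FALSE` is equivalent to `1 - pchisq(...)` but avoids loss of precision when the p-value is very small.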
p <- densityplot(~PCT, data = ft, plot.points = FALSE)
plotDist("beta", params = params, add = TRUE, col = "red")
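[Figure: density plot of PCT (horizontal axis: PCT, 0.4 to 1.0; vertical axis: Density, 0 to 5) with the fitted beta density overlaid in red.]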
Subtle issue: which maximum likelihood problem should be solved to find the parameters? Maximize the likelihood of the count data (the multinomial likelihood of the binned counts) under the null hypothesis, not the likelihood of the raw data!
loglik <- function(p) sum(obs * log(diff(pbeta(cutpts, p[1], p[2]))))
maxlik <- nlmax(loglik, p = c(21, 6))
alpha <- maxlik$estimate[1]
beta <- maxlik$estimate[2]
pihat <- diff(pbeta(cutpts, shape1 = alpha, shape2 = beta))
exp <- sum(obs) * pihat
G <- 2 * sum(obs * log(obs/exp))
G
[1] 2.513
1 - pchisq(G, 5)  # df = (k - 1) - 2 = 5, since there are k = 8 classes
[1] 0.7745
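The same grouped-data fit can be sketched with base R's `optim()` in place of `nlmax()`. This is a swapped-in optimizer, not the handout's method. The counts and cut points are copied from the output above; optimizing on the log scale is an added assumption that keeps both shape parameters positive.

```r
cutpts <- c(0, 0.6, 0.65, 0.7, 0.75, 0.8, 0.85, 0.9, 1)
obs    <- c(3, 8, 14, 19, 33, 37, 18, 5)   # binned counts shown earlier

# Negative multinomial log-likelihood of the counts (up to a constant).
# theta holds log(shape1) and log(shape2).
negloglik <- function(theta) {
  pihat <- diff(pbeta(cutpts, exp(theta[1]), exp(theta[2])))
  -sum(obs * log(pihat))
}

fit   <- optim(log(c(21, 6)), negloglik)   # Nelder-Mead by default
shape <- exp(fit$par)                      # back to the original scale

e <- sum(obs) * diff(pbeta(cutpts, shape[1], shape[2]))  # expected counts
G <- 2 * sum(obs * log(obs / e))
G
```

Because G differs from −2 times this log-likelihood only by a constant, maximizing the grouped likelihood is the same as minimizing G over the beta family.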
Similarly, one can instead choose the parameters that minimize the value of the chi-square statistic.
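A minimum chi-square fit might look like the following sketch, again using base R's `optim()` as an assumed optimizer and the counts and cut points shown earlier. The function name `chisq_at` is hypothetical.

```r
cutpts <- c(0, 0.6, 0.65, 0.7, 0.75, 0.8, 0.85, 0.9, 1)
obs    <- c(3, 8, 14, 19, 33, 37, 18, 5)   # binned counts shown earlier

# Pearson statistic as a function of log(shape1) and log(shape2).
chisq_at <- function(theta) {
  e <- sum(obs) * diff(pbeta(cutpts, exp(theta[1]), exp(theta[2])))
  sum((obs - e)^2 / e)
}

fit       <- optim(log(c(21, 6)), chisq_at)   # Nelder-Mead by default
shape_min <- exp(fit$par)                     # minimum chi-square estimates
chisq_min <- fit$value                        # minimized statistic
c(shape1 = shape_min[1], shape2 = shape_min[2], chisq = chisq_min)
```

The minimized statistic is referred to the same chi-square distribution as before, since two parameters are still being estimated from the counts.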