Statistical analysis of the FoPV duck race

1. The Duck Race
There are N numbered ducks in the race. The first w ducks finishing the race determine the winning
numbers. From the random nature of the race (mixing of the ducks in a bag at the start, random
currents, eddies and obstacles in the river, etc), we can assume these w ducks are a random sample
from the N ducks, and each duck has equal probability 1/N of winning. These assumptions are
denoted by H.
Duck numbers are sold more than once in m stages of selling. Efforts are made to use up one set of
numbers (one per duck) before moving on to the next stage of selling. Prizes are provided for every
winning 'ticket': eg if there are three tickets for duck number i, and duck i wins, then three 1st prizes
are awarded. Consequently, there can be more than w winning tickets as each winning number
could have been sold several times, which may lead to post-race adjustment of w to suit the number
of prizes.
In the sixth duck race of 2014, N = 2500, m = 3 and w = 20. A total of 7,223 tickets were sold. Since every number was sold at least twice (2 × 2500 = 5000 tickets), 2223 numbers were sold 3 times and the remaining 277 were sold twice.
2. Probabilities of winning prizes
2.1 Winning 1st prize
Consider the simple proposition Di ≡ 'duck numbered i finishes first'. The propositions D1, D2, …,
DN form an equi-probable partition given H: i.e. the propositions are mutually exclusive, exhaustive
and have equal probabilities. It follows that the probability of Di is
$$P(D_i \mid H) = \frac{1}{N} \qquad (2.1)$$
and the probability of the proposition E ≡ 'one of n specified ducks finishes first' is
$$P(E \mid H) = \frac{n}{N} \qquad (2.2)$$
Consider now the proposition Ti,j ≡ 'ticket numbered i, sold in stage j of selling, wins a 1st prize'. The propositions T1,j, T2,j, …, TN,j form a partition given j. Here there is a one-to-one correspondence between tickets and ducks, so the partition is equi-probable given H and

$$P(T_{i,j} \mid j \wedge H) = \frac{1}{N} \qquad (2.3)$$
and the probability of the proposition F ≡ 'one of n tickets wins a 1st prize' is
$$P(F \mid j \wedge H) = \frac{n}{N} \qquad (2.4)$$
Now consider the proposition G ≡ 'one or more of n tickets wins a 1st prize where the tickets are
sold from different stages of selling'. Let the sample of n tickets consist of n1 tickets sold from the
1st stage of selling, n2 tickets from the 2nd stage etc, so that n1 + n2 + … + nm = n.
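Let Tj ≡ 'one of the nj tickets sold in stage j of selling wins a 1st prize'. Then, since each stage of selling forms a partition, and assuming the samples of tickets are random and mutually independent,

$$P(G \mid H) = 1 - P(\neg G \mid H) = 1 - \prod_{j=1}^{m} P(\neg T_j \mid H) = 1 - \prod_{j=1}^{m} \frac{N - n_j}{N} \qquad (2.5)$$

Rewriting (2.4) as

$$P(F \mid j \wedge H) = 1 - \frac{N - n}{N}$$

it's clear that P(F | j∧H) > P(G | H) whenever the tickets come from two or more stages (ie n > ni for i = 1, 2, …, m), because the product in (2.5) is then greater than 1 − (n1 + n2 + … + nm)/N = (N − n)/N. For two stages, for example, (1 − n1/N)(1 − n2/N) = 1 − (n1 + n2)/N + n1n2/N². So it seems a better strategy to buy tickets from one stage of selling, rather than randomly buy tickets from more than one stage.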
Buying from one stage also ensures, of course, that all the tickets have different numbers. The effect of buying tickets with the same number is clear from the following. Suppose a punter buys two tickets, both for duck i, in stages 1 and 2 of selling. Consider the probability of Ti,1 or Ti,2, or both tickets winning. By the disjunction theorem, this probability is

$$P(T_{i,1} \vee T_{i,2} \mid H) = P(T_{i,1} \mid H) + P(T_{i,2} \mid H) - P(T_{i,1} \mid H)\, P(T_{i,2} \mid T_{i,1} \wedge H)$$

But given Ti,1 is true, Ti,2 is also true, so P(Ti,2 | Ti,1 ∧ H) = 1 and

$$P(T_{i,1} \vee T_{i,2} \mid H) = P(T_{i,2} \mid H)$$
This proves what may be obvious, that buying two tickets for the same duck has the same chance of
winning a 1st prize as buying one ticket. On the other hand, it offers the chance of winning two 1st
prizes. In general, the probability of winning m 1st prizes by buying m tickets for the same duck (necessarily one from each stage of selling) is

$$P(T_{i,1} \vee T_{i,2} \vee \dots \vee T_{i,m} \mid H) = \frac{1}{N} \qquad (2.6)$$
Example 1
Boy B buys six tickets for the 2014 race. What is the probability one of his tickets wins a 1st prize?
(i) Assuming B's six numbers are all from one stage of selling, (2.4) gives a probability of exactly
0.0024, which is odds of between 415 and 416 to 1.
(ii) Suppose B randomly bought two tickets from each of the three stages of selling. The probability
of winning a 1st prize, from (2.5), would then be 0.002398 to six decimal places, which is odds of
slightly over 416 to 1.
(iii) Suppose B bought three pairs of tickets for three different ducks, the two tickets in each pair being for the same duck. The events that each of the three ducks wins are mutually exclusive, so the probability of winning two 1st prizes, using (2.6), would be 3/2500 = 0.0012, which is odds of about 832 to 1.
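The arithmetic in cases (i) and (ii) can be reproduced with a few lines of Python. This is a minimal sketch of equations (2.4) and (2.5) only; the function names are our own, and "odds" here means odds against, (1 − p)/p.

```python
N = 2500  # ducks (numbers) in the 2014 race


def p_one_stage(n, N):
    """Equation (2.4): n distinct tickets bought from a single stage of selling."""
    return n / N


def p_several_stages(stage_counts, N):
    """Equation (2.5): n_j tickets bought at random from stage j, stages independent."""
    p_no_first_prize = 1.0
    for n_j in stage_counts:
        p_no_first_prize *= (N - n_j) / N
    return 1.0 - p_no_first_prize


def odds_against(p):
    """Odds against an event of probability p, read as 'k to 1'."""
    return (1.0 - p) / p


p_i = p_one_stage(6, N)                # case (i): six tickets, one stage
p_ii = p_several_stages([2, 2, 2], N)  # case (ii): two tickets from each stage
print(f"(i)  P = {p_i:.6f}, odds {odds_against(p_i):.2f} to 1")
print(f"(ii) P = {p_ii:.6f}, odds {odds_against(p_ii):.2f} to 1")
```

The difference between the two cases is tiny, but case (ii) is always the smaller of the two, as argued after (2.5).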
2.2 Winning x prizes
Now consider prizes corresponding to all w winning ducks, and the proposition that in a sample of
n tickets, x tickets win a prize. This is the familiar situation of sampling without replacement from a
population of N elements consisting of w successes and (N − w) failures, also known as a deal of n
from (w, N − w). The probability of obtaining x winning tickets is then the binary hypergeometric
probability
$$P(X = x \mid w \wedge H) = \frac{\binom{n}{x}\binom{N-n}{w-x}}{\binom{N}{w}} \qquad (2.7)$$

where X is a random variable having a hypergeometric distribution and range [max(0, w + n − N), min(n, w)]. The bracketed expressions are binomial coefficients, eg

$$\binom{n}{x} \equiv \frac{n!}{x!\,(n-x)!}$$
The probability is made conditional on w, to avoid, for the moment, the complication that w may
depend on the numbers of times the winning numbers were sold.
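Equation (2.7) translates directly into code. The sketch below uses Python's `math.comb` and checks two properties stated in the text: the probabilities sum to 1 over the range of X, and with w = 1 the probability of x = 1 reduces to n/N as in (2.2). The name `hypergeom_pmf` is ours.

```python
from math import comb


def hypergeom_pmf(x, n, w, N):
    """Equation (2.7): P(X = x | w and H) for n distinct tickets,
    when w of the N duck numbers are winners."""
    lo, hi = max(0, w + n - N), min(n, w)   # range of X
    if not lo <= x <= hi:
        return 0.0
    return comb(n, x) * comb(N - n, w - x) / comb(N, w)


# 2014 figures: N = 2500 numbers, w = 20 winners, a sample of n = 245 tickets
N, w, n = 2500, 20, 245
probs = [hypergeom_pmf(x, n, w, N) for x in range(0, min(n, w) + 1)]
print(f"total probability = {sum(probs):.12f}")
print(f"mean = {sum(x * p for x, p in enumerate(probs)):.4f}")   # equals n*w/N
print(f"w = 1, x = 1: {hypergeom_pmf(1, n, 1, N)} vs n/N = {n / N}")
```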
Consider now the m stages of selling. Let the sample of n tickets consist of n1 tickets sold from the 1st stage of selling, n2 tickets from the 2nd stage etc, so that n1 + n2 + … + nm = n. Let x1 be the number of winning tickets from the 1st stage of selling, x2 the number of winning tickets from the 2nd stage etc, so that the total number of winning tickets in the sample is

$$x_1 + x_2 + \dots + x_m = x \qquad (2.8)$$
Each stage of selling forms a deal of ni from (w, N − w), so the probability of xi winning tickets is

$$P(X = x_i \mid w \wedge H) = \frac{\binom{n_i}{x_i}\binom{N-n_i}{w-x_i}}{\binom{N}{w}} \qquad (2.9)$$
for xi on the interval [max(0, w + ni – N) , min(ni, w)] and i = 1, 2, …, m. (Note that this probability
is no different in the mth stage, when fewer than N tickets are sold. The total number of tickets sold
in any stage is irrelevant – the deal of ni tickets always relates to N numbers.)
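It's reasonable to assume the stages of selling tickets are mutually independent. Writing Xi for the number of winning tickets among the ni tickets from stage i, the probability of the conjunction of x1, x2, …, xm winning tickets from the respective stages is then

$$P((X_1 = x_1) \wedge (X_2 = x_2) \wedge \dots \wedge (X_m = x_m) \mid w \wedge H) = \prod_{i=1}^{m} P(X = x_i \mid w \wedge H) \qquad (2.10)$$

where each factor is given by (2.9).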
There are various combinations of x1, x2 , …, xm satisfying (2.8). Since these are mutually exclusive,
we sum the probabilities (2.10) for all such combinations to obtain the probability of a total of x
winning tickets:
$$P(X = x \mid w \wedge H) = \sum_{a \in A} \left[ \prod_{i=1}^{m} P(X = x_i \mid w \wedge H) \right] \qquad (2.11)$$

where the summation is over all m-tuple members of set A defined by

$$A = \left\{ a : a = (x_1, x_2, \dots, x_m),\ x_i \in \mathbb{Z},\ \max(0, w + n_i - N) \le x_i \le \min(n_i, w),\ \sum_{i=1}^{m} x_i = x \right\}$$
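Summing the product (2.10) over every tuple in A is the same as convolving the per-stage distributions (2.9), which avoids enumerating A explicitly. The following is a sketch built on that observation (the convolution is our implementation choice, not a method stated in the text; the function names are ours). It checks that a single stage reproduces (2.7), and that for w = 1 and two tickets from each of three stages the probability of exactly one 1st prize agrees with the closed form m(n'/N)((N − n')/N)^(m−1) given in (2.12).

```python
from math import comb


def hypergeom_pmf(x, n, w, N):
    """Equation (2.9): probability of x winners among n tickets from one stage."""
    if not max(0, w + n - N) <= x <= min(n, w):
        return 0.0
    return comb(n, x) * comb(N - n, w - x) / comb(N, w)


def multi_stage_pmf(stage_counts, w, N):
    """Equation (2.11): distribution of the total X = x_1 + ... + x_m when
    n_i tickets come from independent stage i. Returns a list, dist[x]."""
    dist = [1.0]                      # no stages yet: X = 0 with certainty
    for n_i in stage_counts:
        stage = [hypergeom_pmf(k, n_i, w, N) for k in range(min(n_i, w) + 1)]
        new = [0.0] * (len(dist) + len(stage) - 1)
        for a, p_a in enumerate(dist):        # running total so far
            for b, p_b in enumerate(stage):   # winners from this stage
                new[a + b] += p_a * p_b       # product term of (2.10)
        dist = new
    return dist


N = 2500
one_stage = multi_stage_pmf([245], 20, N)
three_stages = multi_stage_pmf([2, 2, 2], 1, N)   # w = 1: 1st prizes only
closed_form = 3 * (2 / N) * ((N - 2) / N) ** 2
print(f"P(exactly one 1st prize) = {three_stages[1]:.6f} (closed form {closed_form:.6f})")
```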
When w = 1, (2.7) and (2.11) give the probabilities of winning 1st prizes. The right-hand side of (2.7) reduces to n/N, as given by (2.2), when w = 1 and x = 1 (the only value of x that corresponds to winning the 1st prize).
Probability (2.11) can be used to calculate the probability of obtaining exactly x = 1, 2, …, m
1st prizes. To take the simplest case, the probability of obtaining exactly one 1st prize when
n1 = n2 = … = nm = n' is, from (2.11),
$$P(X = 1 \mid (w = 1) \wedge H) = \frac{m n'}{N} \left( \frac{N - n'}{N} \right)^{m-1} \qquad (2.12)$$
Example 1 (continued)
(iv) Suppose B randomly bought two tickets from each of the three stages of selling. The
probability of winning exactly one 1st prize, from (2.12), would then be 0.002396 to six decimal
places, which is odds of over 416 to 1.
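Closed form (2.12) is cheap to evaluate directly. A short sketch for case (iv), with our own function name, comparing it with the probability of at least one 1st prize from (2.5):

```python
def p_exactly_one_first_prize(m, n_prime, N):
    """Equation (2.12): exactly one 1st prize with n' tickets from each of m stages."""
    p = n_prime / N                    # chance a given stage's tickets hold the winner
    return m * p * (1 - p) ** (m - 1)


N = 2500
p_iv = p_exactly_one_first_prize(3, 2, N)
p_at_least_one = 1 - ((N - 2) / N) ** 3          # equation (2.5), case (ii)
print(f"exactly one: {p_iv:.6f}, odds {(1 - p_iv) / p_iv:.1f} to 1")
print(f"at least one: {p_at_least_one:.6f}")
```

The small gap between the two printed probabilities is the chance of winning two or three 1st prizes.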
Example 2
In 2014, FoPV member C sold 245 tickets of which 5 won a prize.
(i) What is the probability of selling this many winning tickets (ie of selling 5 or more winning
tickets) knowing that there were 20 winning ducks?
Assume C sold exclusively from one series of numbers (ie no duplicated numbers). Using (2.7),
$$P(X \ge 5 \mid w \wedge H) = 1 - \sum_{x=0}^{4} P(X = x \mid w \wedge H) = 1 - 0.961 = 0.039$$
which is odds of about 24 to 1.
(ii) Suppose C sold numbers from all 3 stages of selling. Let's assume he sold almost equal numbers of tickets from each stage: ie n1 = n2 = 82, n3 = 81. The probability using (2.11) is then

$$P(X \ge 5 \mid w \wedge H) = 1 - \sum_{x=0}^{4} P(X = x \mid w \wedge H) = 1 - 0.954 = 0.046$$

which is odds of about 21 to 1.
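Both tail probabilities in Example 2 can be checked numerically. This sketch uses a hypergeometric probability for one stage and a convolution across stages for (2.11); the helper names are ours, and the results can only be expected to match the quoted values to the three decimals given.

```python
from math import comb

N, w = 2500, 20    # numbers per stage and winning ducks in 2014


def hypergeom_pmf(x, n, w, N):
    """Equations (2.7)/(2.9): x winners among n distinct tickets."""
    if not max(0, w + n - N) <= x <= min(n, w):
        return 0.0
    return comb(n, x) * comb(N - n, w - x) / comb(N, w)


def multi_stage_pmf(stage_counts, w, N):
    """Equation (2.11) via convolution of the per-stage distributions."""
    dist = [1.0]
    for n_i in stage_counts:
        stage = [hypergeom_pmf(k, n_i, w, N) for k in range(min(n_i, w) + 1)]
        new = [0.0] * (len(dist) + len(stage) - 1)
        for a, p_a in enumerate(dist):
            for b, p_b in enumerate(stage):
                new[a + b] += p_a * p_b
        dist = new
    return dist


def upper_tail(dist, k):
    """P(X >= k) = 1 - sum of P(X = x) for x < k."""
    return 1.0 - sum(dist[:k])


tail_i = upper_tail(multi_stage_pmf([245], w, N), 5)           # (i): one stage
tail_ii = upper_tail(multi_stage_pmf([82, 82, 81], w, N), 5)   # (ii): three stages
print(f"(i)  P(X >= 5) = {tail_i:.3f}")
print(f"(ii) P(X >= 5) = {tail_ii:.3f}")
```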
2.3 Random number of winners W
The probabilities derived so far are conditional on w, but in reality the number of winners depends on the number of times the winning duck numbers were sold. With a fixed number of prizes, z, available, w may be reduced if most of the winning duck numbers were sold m times, or increased if many were sold fewer than m times. In 2014, for example, 58 prizes were awarded to 18 numbers sold 3 times and 2 numbers sold twice (18 × 3 + 2 × 2 = 58). Suppose all 20 winning numbers had been sold 3 times; then 60 prizes would have been required. In this case, however, the two extra prizes would probably have been provided, rather than change the nice round number of w = 20 winning numbers. At the other extreme, suppose the first 30 duck numbers had all been sold just twice. Given 60 prizes, w would then be increased to 30. We must therefore consider a random variable W taking values w on the interval [20, 30]. If z is not fixed, we also have to consider a random variable Z which depends on the particular rule we choose for determining the number of prizes.
Let ym and ym-1 be the numbers of winning ducks whose numbers were sold m and (m − 1) times; then

$$w = y_m + y_{m-1} \qquad (2.13)$$

$$z = m\,y_m + (m-1)\,y_{m-1} \qquad (2.14)$$

Eliminating ym-1 from these equations gives

$$w = \frac{z - y_m}{m-1} \qquad (2.15)$$
Given Z = z, therefore, the probability distribution of W = w can be elaborated in terms of the
distribution of Y = ym . Incidentally, equation (2.15) shows that fixing the number of prizes z is
untenable, since it would not always give an integer value for w.
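Relations (2.13) to (2.15) can be checked against the 2014 figures (18 numbers sold 3 times, 2 sold twice). This small sketch, with hypothetical helper names, also illustrates the point about integer values: fixing z = 59 with ym = 18 would require w = 20.5.

```python
def winners_and_prizes(y_m, y_m_minus_1, m):
    """Equations (2.13) and (2.14)."""
    w = y_m + y_m_minus_1
    z = m * y_m + (m - 1) * y_m_minus_1
    return w, z


def winners_from_prizes(z, y_m, m):
    """Equation (2.15); returns None when w = (z - y_m)/(m - 1) is not an integer."""
    q, r = divmod(z - y_m, m - 1)
    return q if r == 0 else None


w, z = winners_and_prizes(18, 2, 3)                  # 2014 race
print(w, z, winners_from_prizes(z, 18, 3))
print(winners_from_prizes(59, 18, 3))               # no integer w exists
```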
Consider sampling w ducks without replacement from a population of N ducks sold either m times
(“success”) or (m – 1) times (“failure”): ie a deal of w from (Nm, N – Nm) where Nm is the number of
ducks sold m times. The probability of ym winning ducks is then the binary hypergeometric probability

$$P(Y = y_m \mid (Z = z) \wedge H) = \frac{\binom{w}{y_m}\binom{N-w}{N_m-y_m}}{\binom{N}{N_m}}$$

where Y is a random variable having a hypergeometric distribution and range [max(0, w + Nm − N), min(w, Nm)].
We need to define all possible values of z, which depend on the rule chosen for awarding prizes. Let
zmax be the maximum number of prizes available, then a simple rule would be to award prizes to the
maximum number of finishing ducks such that z ≤ zmax . Using this rule, the above probability
needs modification in cases when z = zmax – (m – 1), ie when the next duck could have been sold
(m – 1) times, satisfying the rule but incrementing the number of winning ducks, w, by one. In 2014,
for example, z = 58, y3 = 18 and w = 20. If zmax = 60, then w could have increased to 21 if the next
duck past the finishing line had been sold just twice. In elaborating the probability of W = 20,
therefore, we need to take into account the probability that the next duck was not sold twice, ie that it was sold 3 times. Let Qm ≡ 'the next finishing duck was sold m times'. Given that ym of the w winning ducks were sold m times, this probability is clearly

$$P(Q_m \mid (Y = y_m) \wedge H) = \frac{N_m - y_m}{N - w}$$

so when z = zmax − (m − 1), using the multiplication law,

$$P((Y = y_m) \wedge Q_m \mid (Z = z) \wedge H) = P(Q_m \mid (Y = y_m) \wedge H)\, P(Y = y_m \mid (Z = z) \wedge H) = \frac{\binom{w}{y_m}\binom{N-w-1}{N_m-y_m-1}}{\binom{N}{N_m}}$$

after some simplification, using

$$\frac{N_m - y_m}{N - w}\binom{N-w}{N_m-y_m} = \binom{N-w-1}{N_m-y_m-1}.$$

So for this rule of determining z, we have
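$$P(W = w \mid (Z = z) \wedge H) = \begin{cases} \dfrac{\binom{w}{y_m}\binom{N-w-1}{N_m-y_m-1}}{\binom{N}{N_m}}, & z = z_{\max} - m + 1 \\[2ex] \dfrac{\binom{w}{y_m}\binom{N-w}{N_m-y_m}}{\binom{N}{N_m}}, & z > z_{\max} - m + 1 \end{cases} \qquad (2.16)$$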
where, from (2.15),
ym = z – (m – 1) w
and from its range as a hypergeometric random variable,
max(0, w + Nm – N) ≤ ym ≤ min(w, Nm).
The probability (2.16) is zero outside this range of ym. For a particular value of w, the possible values of z are mutually exclusive, so to calculate the probability of W = w we sum the probabilities given by (2.16) over all possible values of z, ie

$$P(W = w \mid H) = \sum_{z \in B} P(W = w \mid (Z = z) \wedge H) \qquad (2.17)$$

where B is the set defined by

$$B = \{ z : z \in \mathbb{Z},\ z_{\max} - m + 1 \le z \le z_{\max},\ 0 \le z - (m-1)w \le w \}$$
Table 1 shows the results of applying (2.16) and (2.17) to the 2014 race, with zmax = 60.
            z = 58   z = 59   z = 60   P(W = w)
w = 20      0.251    0.238    0.095    0.584
w = 21      0.046    0.123    0.219    0.388
w = 22      0.001    0.006    0.021    0.028
P(Z = z)    0.298    0.367    0.335
Table 1: Conditional distribution of W given Z in 2014
The cells of Table 1 contain the probabilities P(W = w | (Z = z)∧H) given by (2.16) for w = 20, 21, 22 and z = 58, 59, 60. Probabilities for w > 22 are negligible. The final row of Table 1 shows the marginal distribution of Z, ie the probabilities that 58, 59 or 60 prizes would be awarded. Because each column sums to P(Z = z) rather than to 1, the cell entries are in effect the joint probabilities of W = w and Z = z, which is why summing them over z in (2.17) gives the marginal distribution of W. The most important information in Table 1, however, is its final column showing the marginal distribution of W.
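Table 1 can be reproduced from (2.16) and (2.17). The sketch below assumes the 2014 figures N = 2500, Nm = 2223 (numbers sold 3 times), m = 3 and zmax = 60, together with the simple prize rule described above; the function name `cell` is ours. It checks that the entries over all w and z sum to 1, and that summing along rows and columns recovers the two margins of the table.

```python
from math import comb

N, N_m, m, z_max = 2500, 2223, 3, 60     # 2014: 2223 numbers were sold 3 times
TOTAL = comb(N, N_m)                     # common denominator of (2.16)


def cell(w, z):
    """Probability (2.16) for W = w and Z = z under the rule 'largest w with z <= z_max'."""
    if not z_max - m + 1 <= z <= z_max:          # z must lie in the set B
        return 0.0
    y = z - (m - 1) * w                          # winners sold m times, from (2.15)
    if not max(0, w + N_m - N) <= y <= min(w, N_m):
        return 0.0
    if z == z_max - m + 1:
        # The next duck must also have been sold m times, otherwise it would win too.
        return comb(w, y) * comb(N - w - 1, N_m - y - 1) / TOTAL
    return comb(w, y) * comb(N - w, N_m - y) / TOTAL


w_values = range(z_max // m, z_max // (m - 1) + 1)    # 20 to 30, as in (2.18)
z_values = range(z_max - m + 1, z_max + 1)            # 58, 59, 60
table = {(w, z): cell(w, z) for w in w_values for z in z_values}
p_w = {w: sum(table[w, z] for z in z_values) for w in w_values}   # (2.17)
p_z = {z: sum(table[w, z] for w in w_values) for z in z_values}

for w in (20, 21, 22):
    row = "  ".join(f"{table[w, z]:.3f}" for z in z_values)
    print(f"w = {w}: {row}  P(W = w) = {p_w[w]:.3f}")
print("P(Z = z):", "  ".join(f"{p_z[z]:.3f}" for z in z_values))
```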
Using the extending-the-argument theorem (the law of total probability), we can combine the marginal distribution of W with the probabilities of §2.2 to remove the conditioning on w. The probability of obtaining x winning tickets from one stage of selling is then

$$P(X = x \mid H) = \sum_{w = \lfloor z_{\max}/m \rfloor}^{\lfloor z_{\max}/(m-1) \rfloor} P(X = x \mid (W = w) \wedge H)\, P(W = w \mid H) \qquad (2.18)$$

where P(X = x | (W = w)∧H) is given by (2.7). The theorem could similarly be applied to the probability of obtaining x winning tickets from m stages of selling using (2.11). We can now make probability statements about the number of winning tickets in a sample before the race is run, that is, before knowing how many ducks will be declared winners.
Example 2 (continued)
(iii) Before the outcome of the race is known, what is the probability of C selling 5 or more winning
tickets from one stage of selling, given that a maximum of 60 prizes will be awarded?
Using (2.7) in (2.18),
$$P(X \ge 5 \mid H) = 1 - \sum_{x=0}^{4} P(X = x \mid H) = 1 - 0.957 = 0.043$$
which is odds of about 22 to 1.
(iv) Before the outcome of the race is known, what is the probability of C selling 5 or more winning
tickets from n1 = n2 = 82, n3 = 81 tickets sold from each stage of selling, given that a maximum of
60 prizes will be awarded?
Using (2.11) in (2.18),
$$P(X \ge 5 \mid H) = 1 - \sum_{x=0}^{4} P(X = x \mid H) = 1 - 0.951 = 0.049$$
which is odds of about 19 to 1.
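Equation (2.18) is a weighted average of the conditional distributions over W. The sketch below takes P(W = w) from the final column of Table 1 (rounded, with w > 22 ignored as negligible) rather than recomputing it, so it can only be expected to agree with the quoted values to about the third decimal. Helper names are ours.

```python
from math import comb

N, n = 2500, 245
p_W = {20: 0.584, 21: 0.388, 22: 0.028}   # marginal of W, final column of Table 1


def hypergeom_pmf(x, n, w, N):
    """Equations (2.7)/(2.9)."""
    if not max(0, w + n - N) <= x <= min(n, w):
        return 0.0
    return comb(n, x) * comb(N - n, w - x) / comb(N, w)


def multi_stage_pmf(stage_counts, w, N):
    """Equation (2.11) via convolution of the per-stage distributions."""
    dist = [1.0]
    for n_i in stage_counts:
        stage = [hypergeom_pmf(k, n_i, w, N) for k in range(min(n_i, w) + 1)]
        new = [0.0] * (len(dist) + len(stage) - 1)
        for a, p_a in enumerate(dist):
            for b, p_b in enumerate(stage):
                new[a + b] += p_a * p_b
        dist = new
    return dist


def mix_over_w(conditional):
    """Equation (2.18): sum over w of P(X = x | W = w) P(W = w)."""
    out = {}
    for w, pw in p_W.items():
        for x, px in enumerate(conditional(w)):
            out[x] = out.get(x, 0.0) + px * pw
    return out


single = mix_over_w(lambda w: multi_stage_pmf([n], w, N))          # one stage
staged = mix_over_w(lambda w: multi_stage_pmf([82, 82, 81], w, N))  # three stages
tail_iii = sum(p for x, p in single.items() if x >= 5)
tail_iv = sum(p for x, p in staged.items() if x >= 5)
mean = sum(x * p for x, p in single.items())
print(f"(iii) {tail_iii:.3f}  (iv) {tail_iv:.3f}  E[X] = {mean:.2f}")
```

The one-stage mixture `single` is the same unconditional distribution that §3.1 later uses as a null distribution.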
3. Is the duck race fair?
The duck race is fair if each and every ticket sold has the same chance of winning any one of the
prizes. Since sufficient prizes are provided for every winning ticket, the fairness of the race depends
on H – the assumption that each duck has equal probability 1/N of winning. Throughout §2 we have
assumed that H is true, but here we treat it as a hypothesis to be tested.
3.1 Testing a sample of tickets
Example 2 is based on fact: in 2014 an FoPV member actually sold 245 tickets from one series of
numbers, five of which were winners. Table 2 shows these results in the form of a 2 X 2
contingency table.
             Sold by C    Not sold by C   Total
Winners      5 (2)        15 (18)         20
Losers       240 (243)    2240 (2237)     2480
Total        245          2255            2500
Table 2: Contingency table for winning tickets sold by C in 2014
If H were true, we would expect the proportion of tickets sold by C that won to equal the proportion of tickets not sold by C that won. The expected frequencies assuming H were true, rounded to whole numbers, are shown in parentheses in Table 2 (eg 20 × 245/2500 = 1.96 for the winning tickets sold by C). We see that C sold three more winning tickets than expected under H. While we would not necessarily expect C to sell exactly two winning duck numbers, the question is whether a difference of three ducks is large enough to cast doubt on hypothesis H.
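The expected frequencies in parentheses follow the usual rule for a contingency table: row total × column total / grand total. A sketch with the Table 2 counts:

```python
# Table 2: rows are winners / losers, columns are sold by C / not sold by C
observed = [[5, 15],
            [240, 2240]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected frequency under H: (row total) * (column total) / (grand total)
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
for label, row in zip(("Winners", "Losers"), expected):
    print(label, [round(e, 2) for e in row])
```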
Fisher's exact test for 2 X 2 contingency tables is appropriate here. Applying this test to Table 2
gives a p-value of 0.039. This is the same as the probability obtained in Example 2(i), since the null
distribution of Fisher's exact test is identical to the hypergeometric distribution given by (2.7), and
the value of Fisher's test statistic for Table 2 is xt = 5. A p-value of 0.039 is significant at the
conventional level of 5%, indicating that C was a significantly “lucky” vendor of tickets, but should
it lead us to reject the null hypothesis H and conclude that the race was unfair? There are several
issues to consider before reaching such a conclusion.
The result is particularly sensitive because the test statistic is discrete: eg if C had sold 4 winners, the upper tail probability would be 0.125, which is not significant even at the 10% level.
An unusually small number of winners would also be evidence of an unfair race, so this should be a
two-sided test. The problem of defining a two-sided version of Fisher's exact test is controversial
and has been much discussed in the statistical literature – see Yates (1984), Hirji et al (1991),
Agresti (1992) and Meulepas (1998). Numerous approaches to this problem have been proposed
over the years. We consider some of the better known methods.
Gibbons and Pratt (1975) proposed adding the one-sided p-value to an attainable probability in the other tail which is as close as possible to the one-sided p-value in the liberal (GP1), conservative¹ (GP2) or either (GP3) direction (using Meulepas' labels). The smallest number of winning tickets C could sell is none, corresponding to a lower tail probability of 0.126, which leads to the following results: GP1 gives a two-sided p-value of 0.039, while GP2 and GP3 give a two-sided p-value of 0.165 (0.039 + 0.126).
The apparently significant result from GP1 may be disregarded on the grounds that there is no
attainable probability in the other tail that satisfies GP1, and because it's liberal. The results from
GP2 and GP3 may be disregarded on the grounds that, given a lower tail probability of 0.126, no
matter how large Fisher's test statistic, the two-sided p-value would not be significant, which seems
implausible. We dismiss these three methods, therefore, and consider only the following two
methods.
(a) Finney (1948), endorsed by Yates (1984), suggested simply doubling the one-sided p-value,
giving in this case a two-sided p-value of 0.078.
(b) Meulepas' (1998) more complicated calculation gives a two-sided p-value of 0.106.
¹ A conservative p-value tends to understate the evidence against H, whereas a liberal p-value tends to overstate it.
The p-values from these two methods are not significant at the 5% level, so we do not reject the null hypothesis H: there is no significant evidence that the race was unfair.
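The one-sided and two-sided p-values above can be reproduced from the hypergeometric null distribution (2.7). This sketch covers only the simpler rules: Finney's doubling, and our reading of the Gibbons and Pratt rules as described above (the attainable lower-tail probability nearest the one-sided p-value from below for GP1, from above for GP2). Meulepas' method is not attempted here.

```python
from math import comb

N, w, n, x_obs = 2500, 20, 245, 5      # Table 2: C sold 245 tickets, 5 won


def pmf(x):
    """Hypergeometric null distribution (2.7) of Fisher's test statistic."""
    return comb(n, x) * comb(N - n, w - x) / comb(N, w)


support = range(max(0, w + n - N), min(n, w) + 1)     # x = 0, 1, ..., 20
p_upper = sum(pmf(x) for x in support if x >= x_obs)  # one-sided p-value
p_upper_4 = sum(pmf(x) for x in support if x >= 4)    # had C sold 4 winners

# Attainable lower-tail probabilities P(X <= k)
lower_tails = [sum(pmf(x) for x in support if x <= k) for k in support]
below = [t for t in lower_tails if t <= p_upper]
above = [t for t in lower_tails if t >= p_upper]

p_finney = min(1.0, 2 * p_upper)                      # method (a): doubling
p_gp1 = p_upper + (max(below) if below else 0.0)      # liberal
p_gp2 = p_upper + min(above)                          # conservative

print(f"one-sided {p_upper:.3f}, with 4 winners {p_upper_4:.3f}")
print(f"Finney {p_finney:.3f}, GP1 {p_gp1:.3f}, GP2 {p_gp2:.3f}")
```

Because no attainable lower-tail probability is as small as the one-sided p-value, `below` is empty and GP1 simply returns the one-sided value, which is the objection raised against GP1 in the text.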
The hypergeometric null distribution of Fisher's exact test assumes fixed marginal totals. This
conditioning on the marginals is another source of controversy generally (see Yates (1984)), but is
particularly pertinent here. We know the row totals in Table 2 are not fixed – they depend on the
random variable W. We can remove the conditioning on the row totals by using the distribution
derived in §2.3. That is, use the distribution defined by (2.18), with substituted expressions (2.7),
(2.16) and (2.17), as the null distribution in Fisher's exact test. We may call this the quasi-hypergeometric contingency table null distribution, or QUAHC distribution for short. Table 3
compares the hypergeometric and QUAHC distributions. Probabilities for x > 7 are negligible. The
mean values, E[X], are included since they are required by Meulepas and other methods of
determining two-sided p-values.
x                 0      1      2      3      4      5      6      7      E[X]
Hypergeometric    0.126  0.276  0.286  0.186  0.086  0.030  0.008  0.002  1.96
QUAHC             0.121  0.270  0.286  0.191  0.090  0.032  0.009  0.002  2.00
Table 3: P(X = x) and E[X] for the hypergeometric and QUAHC distributions
Clearly, the two distributions are very similar, but QUAHC puts slightly more probability in its right tail and has a slightly larger mean. Using the more appropriate QUAHC null distribution, the one-sided p-value is 0.043, and the two-sided p-values by the above methods are (a) 0.086 and (b) 0.115. As before, we conclude that there is no significant evidence against H, and no reason to doubt that the race was fair.
3.2 Testing the distribution of winners
An alternative test is to consider the distribution of the 20 winning numbers. In 2014 these were,
106 250 289 605 815 893 921 923 966 1046
1055 1114 1351 1544 1612 1753 1907 1957 2051 2115
The dot plot of these numbers shows a fairly random scatter on the interval [1, 2500].
Figure 1: Dot plot of winning duck numbers in 2014
Given H, the winning numbers constitute a sample from a discrete uniform distribution on the interval [1, 2500]. We can test this using the chi-squared goodness-of-fit test. Split the interval into
four equal sub-intervals, [1, 625], [626, 1250], [1251, 1875], [1876, 2500]. The expected frequency
of winning numbers in each sub-interval is 5. The observed frequencies are 4, 8, 4, 4 respectively.
The chi-squared statistic is 2.4 on 3 degrees of freedom, corresponding to a p-value of 0.49. There is no evidence, therefore, to reject the hypothesis that the winning duck numbers follow a discrete uniform distribution. The 2014 race seems to be fair.
It should be noted, however, that this test is not very powerful² for only 20 ducks, which allow no more than 4 sub-intervals, since the chi-squared test is usually taken to require a minimum expected frequency of 5.
Although there were 20 winning numbers, the first 30 or so ducks were recorded. It would be
interesting to repeat the test for this larger sample, which would allow more sub-intervals.
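The goodness-of-fit calculation can be sketched in Python using only the standard library. For 3 degrees of freedom the chi-squared upper-tail probability has the closed form erfc(√(x/2)) + √(2x/π)·e^(−x/2), which the sketch uses instead of a statistics package; that formula holds only for 3 degrees of freedom, so a different number of sub-intervals would need a general chi-squared routine.

```python
import math

winners = [106, 250, 289, 605, 815, 893, 921, 923, 966, 1046,
           1055, 1114, 1351, 1544, 1612, 1753, 1907, 1957, 2051, 2115]
N, k = 2500, 4                       # duck numbers and equal sub-intervals
width = N // k                       # 625 numbers per sub-interval

observed = [0] * k
for number in winners:
    observed[(number - 1) // width] += 1   # [1, 625] -> 0, [626, 1250] -> 1, ...

expected = len(winners) / k                 # 5 per sub-interval under H
chi2 = sum((o - expected) ** 2 / expected for o in observed)


def chi2_sf_3df(x):
    """Upper-tail probability of a chi-squared variable with 3 degrees of freedom."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


p_value = chi2_sf_3df(chi2)                 # k - 1 = 3 degrees of freedom
print(observed, round(chi2, 2), round(p_value, 2))
```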
4. Summary
This analysis was inspired by Example 2: what are the odds of obtaining X = 5 winning tickets in a
sample of 245 tickets? A full and accurate answer to this question leads to some surprisingly
complicated mathematics. We have considered both the simple case of tickets sold from one stage
of selling, and the more complicated case of tickets sold from several stages. The major
complication, however, is taking into account the adjustment of the number of winners, W, to the
number of available prizes. Assuming a simple rule for awarding prizes, we derived the probability
distribution of W and hence that of X. This analysis could be reworked for other rules. We found
that the derived 'QUAHC' distribution of X is similar to the hypergeometric distribution. We used
both as null distributions in Fisher's exact test, which touched upon some long-standing
controversial issues in the literature. Two-sided versions of this test did not produce significant
evidence against the hypothesis that the duck race was fair. We also tested the fairness of the race
using chi-squared goodness-of-fit. This also did not lead us to doubt the fairness of the race, but
increasing the power of this test by recording a larger number of finishing ducks – say the first 100
– is recommended.
References
Agresti, A. (1992). A survey of exact inferences for contingency tables. Statistical Science, 7, 131-177.
Finney, D. J. (1948). The Fisher-Yates test of significance in 2 X 2 contingency tables. Biometrika, 35, 145-156.
Gibbons, J. D. and Pratt, J. W. (1975). P-values: Interpretation and methodology. American Statistician, 29, 20-25.
Hirji, K. F., Tan, S. and Elashoff, R. M. (1991). A quasi-exact test for comparing two binomial proportions. Statistics in Medicine, 10, 1137-1153.
Meulepas, E. (1998). A Two-Tailed P-Value for Fisher's Exact Test. Biometrical Journal, 40, 3-10.
Yates, F. (1984). Tests of significance for 2 X 2 contingency tables (with discussion). Journal of
the Royal Statistical Society, Series A, 147, 426-463.
A. J. Bertie
12th January 2015
² The power of a significance test is the probability that it rejects the null hypothesis when it is false.