10.2 The Multinomial Distribution

Ulrich Hoensch
Monday, April 08, 2013
The Multinomial Setting
A probability experiment consists of selecting an object at random
from exactly one of $k$ bins, where the probability of selecting the
object from the $i$th bin is $p_i$ (so that $p_1 + \dots + p_k = 1$).
Suppose the experiment is repeated $n$ times independently. Let $X_i$
be the random variable that gives the number of objects selected from
the $i$th bin.
In this situation, we say that the random vector $(X_1, X_2, \dots, X_k)$
has a multinomial distribution. Note that for $k = 2$, $X_1$ has a
binomial distribution with parameters $n$ and $p_1$, and $X_2 = n - X_1$.
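The setting can be illustrated by simulation. The following is a minimal Python sketch and not part of the original notes; the function name `simulate_multinomial` and the example probabilities are assumptions chosen for illustration.

```python
import random

def simulate_multinomial(n, probs):
    """Repeat the bin-selection experiment n times; return the counts (X_1, ..., X_k)."""
    k = len(probs)
    counts = [0] * k
    # Each trial selects exactly one bin i, with probability probs[i].
    for i in random.choices(range(k), weights=probs, k=n):
        counts[i] += 1
    return counts

random.seed(0)  # fixed seed so that repeated runs give the same draw
x = simulate_multinomial(10, [0.2, 0.5, 0.3])  # hypothetical p_1, p_2, p_3
print(x, sum(x))  # the counts always add up to n
```

Whatever the draw, the components of the returned vector sum to $n$, which is why the $X_i$ are not independent.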
Theorem 10.2.1
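If $(X_1, X_2, \dots, X_k)$ has a multinomial distribution, then for
nonnegative integers $x_1, \dots, x_k$ with $x_1 + \dots + x_k = n$,
\[
P(X_1 = x_1, \dots, X_k = x_k)
= \binom{n}{x_1\; x_2\; \cdots\; x_k}\, p_1^{x_1} p_2^{x_2} \cdots p_k^{x_k}
= \frac{n!}{x_1!\, x_2! \cdots x_k!}\, p_1^{x_1} p_2^{x_2} \cdots p_k^{x_k}.
\]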
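The formula in Theorem 10.2.1 translates directly into code. The sketch below is an illustration, not part of the original notes: the name `multinomial_pmf` is hypothetical, and exact rational arithmetic via `fractions.Fraction` is a design choice made here to avoid rounding.

```python
from fractions import Fraction
from math import factorial

def multinomial_pmf(xs, ps):
    """P(X_1 = xs[0], ..., X_k = xs[-1]) with n = sum(xs) and bin probabilities ps."""
    if len(xs) != len(ps):
        raise ValueError("xs and ps must have the same length")
    coeff = factorial(sum(xs))          # n!
    for x in xs:
        coeff //= factorial(x)          # exact: every partial quotient is an integer
    prob = Fraction(coeff)
    for x, p in zip(xs, ps):
        prob *= Fraction(p) ** x        # p_i^{x_i}
    return prob

# Usage: for k = 2 the formula reduces to a binomial probability.
half = Fraction(1, 2)
print(multinomial_pmf((2, 1), (half, half)))  # 3 * (1/2)^3
```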
Example
Suppose five observations are independently obtained from the
distribution with PDF
$f(x) = 6x(1 - x)$, $0 \le x \le 1$.
Find the probability that one observation lies in the interval
[0, 0.25), none in the interval [0.25, 0.50), three in the interval
[0.50, 0.75), and one in the interval [0.75, 1].
We have 4 bins, and the probability for each bin can be obtained
by integration. For example,
\[
p_1 = \int_0^{0.25} 6x(1 - x)\, dx = \frac{5}{32}.
\]
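The other bin probabilities follow from the same integral with different limits. As a check, here is a short Python sketch (an addition, not from the original notes) that uses the antiderivative $F(x) = 3x^2 - 2x^3$ of $f$; the names `cdf`, `edges`, and `bin_probs` are illustrative.

```python
from fractions import Fraction

def cdf(x):
    """Antiderivative of f(x) = 6x(1 - x) with F(0) = 0: F(x) = 3x^2 - 2x^3."""
    return 3 * x**2 - 2 * x**3

# Bin boundaries 0, 0.25, 0.50, 0.75, 1 as exact fractions.
edges = [Fraction(0), Fraction(1, 4), Fraction(1, 2), Fraction(3, 4), Fraction(1)]

# Probability of each bin = F(right edge) - F(left edge).
bin_probs = [cdf(b) - cdf(a) for a, b in zip(edges, edges[1:])]
print([str(p) for p in bin_probs])
```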
Thus,

Bin           [0, 0.25)   [0.25, 0.50)   [0.50, 0.75)   [0.75, 1]
Probability   5/32        11/32          11/32          5/32

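and
\[
P(X_1 = 1, X_2 = 0, X_3 = 3, X_4 = 1)
= \frac{5!}{1!\,0!\,3!\,1!} \left(\frac{5}{32}\right)^{1} \left(\frac{11}{32}\right)^{0} \left(\frac{11}{32}\right)^{3} \left(\frac{5}{32}\right)^{1}
\approx 0.0198.
\]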
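For readers who want the intermediate arithmetic, the value can be verified step by step (a worked check added here, using only the numbers above):
\[
\frac{5!}{1!\,0!\,3!\,1!} = \frac{120}{6} = 20,
\qquad
20 \cdot \left(\frac{5}{32}\right)^{2} \left(\frac{11}{32}\right)^{3}
= \frac{20 \cdot 25 \cdot 1331}{32^{5}}
= \frac{665500}{33554432}
\approx 0.0198.
\]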
Multinomial/Binomial Relationship
Theorem 10.2.2
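Suppose $X = (X_1, X_2, \dots, X_k)$ has a multinomial distribution with
parameters $n = X_1 + X_2 + \dots + X_k$ and $p_1, \dots, p_k$, where $p_i$
is the probability that a single trial selects bin $i$ (that is,
$p_i = P(X = e_i)$ when $n = 1$, where $e_i$ is the $i$th unit vector).
Then, the marginal distribution of $X_i$ is a binomial distribution
with parameters $n$ and $p_i$.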
Proof. Assume w.l.o.g. that i = 1.
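\[
\begin{aligned}
P(X_1 = x_1)
&= \sum_{x_2, \dots, x_k} \frac{n!}{x_1!\, x_2! \cdots x_k!}\, p_1^{x_1} p_2^{x_2} \cdots p_k^{x_k} \\
&= \frac{n!}{x_1!\,(n - x_1)!}\, p_1^{x_1} \sum_{x_2, \dots, x_k} \frac{(n - x_1)!}{x_2! \cdots x_k!}\, p_2^{x_2} \cdots p_k^{x_k}.
\end{aligned}
\]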
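By the multinomial theorem,
\[
\sum_{x_2, \dots, x_k} \frac{(n - x_1)!}{x_2! \cdots x_k!}\, p_2^{x_2} \cdots p_k^{x_k}
= (p_2 + \dots + p_k)^{n - x_1} = (1 - p_1)^{n - x_1},
\]
where the sum runs over all nonnegative integers $x_2, \dots, x_k$ with
$x_2 + \dots + x_k = n - x_1$. Hence
\[
P(X_1 = x_1) = \binom{n}{x_1}\, p_1^{x_1} (1 - p_1)^{n - x_1},
\]
which is the binomial probability with parameters $n$ and $p_1$.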
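Theorem 10.2.2 can also be checked numerically for a small case. The sketch below is an addition to the notes: it sums the joint probabilities over $x_2, \dots, x_k$ and compares the result with the binomial formula. The helper names and the parameter values ($n = 6$, $p = (1/2, 1/3, 1/6)$) are assumptions chosen only for illustration.

```python
from fractions import Fraction
from itertools import product
from math import comb, factorial

def multinomial_pmf(xs, ps):
    """Exact multinomial probability of the count vector xs."""
    coeff = factorial(sum(xs))
    for x in xs:
        coeff //= factorial(x)
    prob = Fraction(coeff)
    for x, p in zip(xs, ps):
        prob *= p ** x
    return prob

def marginal_x1(x1, n, ps):
    """Sum the joint PMF over all (x_2, ..., x_k) with x_2 + ... + x_k = n - x1."""
    total = Fraction(0)
    for rest in product(range(n - x1 + 1), repeat=len(ps) - 1):
        if sum(rest) == n - x1:
            total += multinomial_pmf((x1, *rest), ps)
    return total

n = 6
ps = (Fraction(1, 2), Fraction(1, 3), Fraction(1, 6))  # illustrative values
for x1 in range(n + 1):
    binomial = comb(n, x1) * ps[0] ** x1 * (1 - ps[0]) ** (n - x1)
    assert marginal_x1(x1, n, ps) == binomial
print("marginal of X_1 agrees with Binomial(n, p_1) for n =", n)
```

Because the arithmetic is exact, the comparison uses equality rather than a floating-point tolerance.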
Example
Suppose 32 independent selections from 4 bins resulted in the
following observed frequencies:

Bin          1    2    3    4
Obs. Freq.   10   5    10   7

Suppose we assume that a multinomial distribution with
$(p_1, p_2, p_3, p_4) = (5/32, 11/32, 11/32, 5/32)$ applies. We formulate
the null hypothesis
$H_0$: $p_1 = 5/32$, $p_2 = 11/32$, $p_3 = 11/32$, $p_4 = 5/32$.
We want to compute the probability that, under $H_0$, the observed
frequencies or more extreme frequencies occur. This is the
(exact) p-value for $H_0$.
This leads to the following table.

Bin            1    2    3    4
Obs. Freq.     10   5    10   7
Exp. Freq.     5    11   11   5
|Difference|   5    6    1    2

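The expected frequencies in the table are $n p_i$, the mean of the binomial marginal of $X_i$ under $H_0$, and the last row is the absolute difference between observed and expected frequency. A short Python sketch (an addition; variable names are illustrative) reproduces both rows:

```python
from fractions import Fraction

n = 32
observed = [10, 5, 10, 7]
# Bin probabilities under H0.
probs = [Fraction(5, 32), Fraction(11, 32), Fraction(11, 32), Fraction(5, 32)]

expected = [n * p for p in probs]                            # E[X_i] = n * p_i
abs_diff = [abs(o - e) for o, e in zip(observed, expected)]  # |observed - expected|

for i, (o, e, d) in enumerate(zip(observed, expected, abs_diff), start=1):
    print(f"bin {i}: observed={o}, expected={e}, |difference|={d}")
```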
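The range $R$ of the observed and more extreme frequencies consists of
the outcomes in which every count differs from its expected frequency
by at least the observed difference:
- $X_1$: 0 or 10 to 32
- $X_2$: 0 to 5 or 17 to 32
- $X_3$: 0 to 10 or 12 to 32
- $X_4$: 0 to 3 or 7 to 32
Note that we also need $X_1 + X_2 + X_3 + X_4 = 32$.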
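The exact p-value is
\[
p = \sum_{(x_1, x_2, x_3, x_4) \in R} \frac{32!}{x_1!\, x_2!\, x_3!\, x_4!}
\left(\frac{5}{32}\right)^{x_1} \left(\frac{11}{32}\right)^{x_2} \left(\frac{11}{32}\right)^{x_3} \left(\frac{5}{32}\right)^{x_4}
\approx 0.0008 = 0.08\%.
\]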
Thus, we reject the null hypothesis that the given multinomial
distribution applies. The p-value was computed using the following
Mathematica code.
(* multinomial probability of the outcome (x1, x2, x3, x4) under H0 *)
prob[x1_, x2_, x3_, x4_] :=
  PDF[MultinomialDistribution[x1 + x2 + x3 + x4,
    {5/32, 11/32, 11/32, 5/32}], {x1, x2, x3, x4}] // N;
(* all integer outcomes in the range R *)
solns = Solve[{Abs[x1 - 5] >= 5, Abs[x2 - 11] >= 6,
    Abs[x3 - 11] >= 1, Abs[x4 - 5] >= 2, x1 + x2 + x3 + x4 == 32,
    x1 >= 0, x2 >= 0, x3 >= 0, x4 >= 0}, {x1, x2, x3, x4},
   Integers];
Total[prob[x1, x2, x3, x4] /. solns]
0.000800296
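The same sum can be cross-checked outside Mathematica. The following Python sketch is an addition, not the author's code: it enumerates all outcomes with $x_1 + x_2 + x_3 + x_4 = 32$, keeps those satisfying the four deviation constraints used in the `Solve` call, and adds up their multinomial probabilities exactly. The names `in_region` and `exact_p_value` are hypothetical.

```python
from fractions import Fraction
from math import factorial

N = 32
PROBS = (Fraction(5, 32), Fraction(11, 32), Fraction(11, 32), Fraction(5, 32))  # H0
EXPECTED = (5, 11, 11, 5)   # n * p_i
MIN_DEV = (5, 6, 1, 2)      # observed |x_i - expected_i|

def multinomial_pmf(xs, ps):
    """Exact multinomial probability of the count vector xs."""
    coeff = factorial(sum(xs))
    for x in xs:
        coeff //= factorial(x)
    prob = Fraction(coeff)
    for x, p in zip(xs, ps):
        prob *= p ** x
    return prob

def in_region(xs):
    """True if every count deviates from its expected value at least as much as observed."""
    return all(abs(x - e) >= d for x, e, d in zip(xs, EXPECTED, MIN_DEV))

def exact_p_value():
    total = Fraction(0)
    for x1 in range(N + 1):
        for x2 in range(N - x1 + 1):
            for x3 in range(N - x1 - x2 + 1):
                xs = (x1, x2, x3, N - x1 - x2 - x3)  # enforces the sum constraint
                if in_region(xs):
                    total += multinomial_pmf(xs, PROBS)
    return total

p = exact_p_value()
print(float(p))
```

If the sketch mirrors the Mathematica constraints faithfully, the printed value should agree with the 0.000800296 reported above. Only 6545 outcomes satisfy the sum constraint, so brute-force enumeration is cheap here.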
Homework Problems for Section 10.2 (Points)
p.498-499: 10.2.2 (2), 10.2.8 (2).
Homework problems are due at the beginning of the class on
Monday, April 15, 2013.