*This research was sponsored by U.S. Army Research Office - Durham, under
Grant Number DAH-C04-74-C-0030.
t This research was sponsored by U.S. Air Force Office of Scientific Research,
under Grant Number 74-2701.
ON A MULTIVARIATE GENERALIZED OCCUPANCY MODEL
N.L. Johnson *
University of North Carolina at Chapel Hill
Samuel Kotz t
Temple University, Philadelphia
Institute of Statistics Mimeo Series No. 979
January" 1975
On A ~fultivariate Generalized Occupancy Model
by
N.L. Johnson *
University of North Carolina at Chapel Hill
and
Samuel Kotz**
Temple University, Philadelphia
O.
Introduction and Summary
Uppuluri and Carpenter (1971) (referred to below as [U-C]), have discussed a
generalized occupancy model related to the classical occupancy model discussed by
David and Barton (1962).
We first obtain Uppuluri and Carpenter's results by
elementary methods, and then discuss a natural multivariate generalization.
e-
1.
Univariate Case
Consider a population containing just m categories, of which b(sm) appear
with equal frequency, each being a proportion p
call these categories of Class I.
of the total population.
We
The remaining categories (a proportion
(1 - bp) of the total population) we will call categories of Class II.
We first evaluate the probability that in
r
independent trials, in each of
which an individual is chosen at random from the population, observed and returned to the population, there are observed exactly
j
of the b Class I cate-
gories.
We will denote by V
r
in r
the number of different Class I categories observed
independent trials, and by Rl
category (anyone among the
the number of trials in which a Class I
b in this Class) is observed.
Then
Rl
has
*This research was sponsored by U.S. Army Research Office - Durham, under
Grant Number DAH-C04-74-C-0030.
** This research was sponsored by U.S. Air Force Office of Scientific Research,
under Grant Number 74-2701.
2
a binomial distribution with parameters r, bp.
The distribution of Vr , given Rl is the same as that of the number of
different categories observed when there are just b equally likely categories
(i.e. each in proportion b- l ) and Rl trials. This is the classical occupancy
distribution (e.g. David and Barton (1962)). Hence
(1)
Averaging over the distribution of R
l
Pr[V
r
=
j] =
(~)E[~joRl/bRl]
J
•
Using the formula (for A an arbitrary constant)
E[ARl] = (1 - bp + bpA)r
we obtain
e-
Pr[V = j]
r
Since
~j f(x)
= (~)~j(l
J
f (-l)j-h(~)f(x+h)
=
h=O
Pr[Vr
=
(2)
- bp + poO)r .
f (-l)h(~)f(x+j-h)
(3)
, we can express (3) as
h=O
= j]
,I.:\:(,3) ,
or
:""" (3)"
(r)
, given that
1.J
[U-C] derive the conditionaZ probability p..
i «j) cate-
gories of Class I have already been observed, that after r further independent
trials are carried out, the total number of different Class I categories observed
(including the
i
already observed) will be
j.
To evaluate this we regard the
(b-i) so far unobserved Class I categories as Class II categories (each, of
course, still with proportion p) and all others (including Class II categories)
as Class II I .
We then need
the probability of observing (j -i) Class I' "
3
categories.
So we replace
b by
(b-i) and
j
by
(j-i) in (3), obtaining
(r) _ (~-~)~j-i{l _ (b-i)p + poO}r .
Pij ' ' - J-1
l.r
If we use the form (3)", we obtain
(4)
(4) ,
which agrees with the formula on page 320 of [U-C].
The methods we use are
much simpler than those used in [U-C] , which include the theory of absorbing
Markov chains and "bidiagonal' matrices.
Results obtained in [U-C] concerning a certain waiting-time distribution
can also be obtained quite simply.
to observe all b
observed.
Let
T denote the number of trials needed
categories of Class I, given that
i
have already been
We again introduce the Classes I', II' defined above, and note that
e-
b-i
T =
where
Th
L
(5)
Th
h=O
is the number of trials needed to observe some one of (b-i-h+l)
categories of Class I (each with proportion pl.
The Th's
are independent
and Th has a geometric distribution with parameter (b-i-h+l)p.
Hence
(6)
and
var(T )
h
= {I
- (b-i-h+l)p} (b_i_h+l)-2 p -2.
(7)
From (5), (6) and (7)
E[T]
b-i
= p-l L (b-i-h+l)-l
(8)
h=l
var(T)
=p
_2 b - i
L (b-i-h+l) -2
h=l
- p
_lb-i.
-1
I (b-l-h+l) .
h=l
These formulae agree with those on page 323 of [U-C].
(9)
Of course, the expected value of the waiting time could be obtained as
00
L r{Pr [Vr=b]
r=b
- Pr (Vr _l =b
n.
Feller (1962) and Harkness (1969) consider an extension of the classical
occupan~y
problem in which there is a probability (l-w) that an observation
falling in any Class I category will not be recorded.
(Tables of critical
points for the "extended" classical occupancy distribution are available in
Johnson, Kotz and Srinivasan (1974)).
here this simply means that
r, bwp
In the situation we have discussed
R has a binomial distribution with parameters
(in place of r, bp).
The formulae (3) - (9) will still apply, with
p replaced everywhere by wp.
2.
~-
Multivariate Generalization
The simple techniques employed in Section 1 can be applied to obtain some
results for a natural generalization.
for
Suppose we have bg categories of Class l(g)
g = 1,2, •.. ,k
and
Class lq, as before contains
each category of Class leg) has proportion Pg.
k
all the remaining categories, and has total proportion 1 k
L b P = P , say.
g=l g g
0
L R independent trials, R trials yield
g=O g
g
individuals in Class leg) (for g = 1,2, ..• ,k) and RO yield Class II individuals.
Suppose, now, that in r =
R1,RZ, ... ,R , thaLexaatZy
g
the bg Class leg) categories have been observed (g = 1,Z, .•. ,k) is
Then the conditional probability, given
jg
of
k
= g=l
IT [V
= j g IR]
r,g
g
k b
j
R
R
= IT (. g) 6 go g/b g
g=l J g
g
(10)
5
Here
V'
_r =(Vr, lJ""Vr, k)
categories observed, and
on
\
R, the V 's
r,g
Now
with Vr,g being the number of Class leg)
(Note that conditionaZZy
-
j' = (jl,j2, ••. ,jk)'
are mutually independent.)
RO,R1 , ••. ,Rk have a joint multinomial distribution, with parameters
k
Lb. P ), b1P1, ..• ,bkPk' Proceeding as in the univariate case
g=1 g g
(cf (2) - (3)) and noting that for arbitrary constants Al ,A2 ,··· ,Ak
r; (1 -
k R
E[ II A g]
g=l g
= (1
k
Lb
-
= (PO
k
P
g=l g g
+
k
+
Lb
(11)
P A )r
g=1 g g g
Lb P A )
g=1 g g g
r
(11) ,
we obtain
k
L PgoOg)T
(12)
g=1
where the difference operator
~g
operates only on
0g'
Noting that
we find that (12) can be put in the form
Pr[~r = i]
.
=
k
E(' -h )
~ (~g) r (_1)1
g=l J h=O
g
--
g
J
g.{
~ (~g)}(po
g
g=1
Noting further that
e·
we see that (12) or (12)' can also be expressed as
+
I
h P Jr.
g=l g g
(12)'
j
Pr[V =j]
",r '"
=
6
k
E(j -h ) k
b
k
f (_1)1 g gg=lII (bg-J. g , ~g,J. g- hgJ(po g=lr hgpg)r
(12)"
+
h=Q
Yet another form is
k
!r (-1) i
h
Pr[V =j] =
",r '"
h=Q
1
}r
r
bg
{k
II b . h · h Po +
(j -h ) p .
g=l g-J g , g,J g - g)
g=l g g g
g k (
(12)111
k
between (12) iii
r
PgPg' the parallelism between (12)" and (3)', and
g=l
and (3)" is apparent.
Remembering that PQ::;:l-
i ,i 2 , ... ,i k categories of Classes I(1),I(2), ... ,I(k)
l
respectively have already been observed, then as in Section 1, we obtain the
If we suppose that
conditional probability, that jl,j2, .•• ,jk
categories, in aZZ,of Classes
I(1),I(2), ... ,I(k) respectively will have been observed after
independent trials, by replacing
b
g
by (bg-i g)
and
jg
r
further
by (jg-i g)
in
(12), remembering the lihidden" b 's in PO' We obtain from (12) II
k
g
~-~
E(j -i -h ) k
b'
k
pI~) = r"'(-l)l g g g II (b . hg-~g· h ) (po + L (n -i )p J~ (13)
.~1iJ
h=Q
g=l g-J g , g,Jg-~g-' g
g=l g g g
An
extension, analogous to that described at the end of Section 1, is
obtained by supposing that (for g
= 1,2 •... ,k)
there is a probability,
that an observation of any category in Class I(g) is not recorded.
formulae (12)-(13) will still apply, with
k
remembering that now Po = 1 -
3.
Pg
(l-Wg~'
The
replaced by WgPg(g=1,2, ... ,k),
r b ~P .
g=l g g g
Expected Values
In the univariate case (Section 1) the probability that each of n
speoified categories of Class I are observed (at least once) in r
trials is
independent
7
1 -
n
r
n
r
. n
r
nn
r
(I)(I-p) + (2)(I-2p) -.. ·.+(-1) (I-np) = (-1) b. (I-poO)
(14)
(Note that np < 1.)
We define
if the t-th category in Class I is observed,
if not.
(15)
n 0t
E[ IT Z ]
t=I O-t
= (-1) nb.n (l-p
o
O)
r
(16)
where O-I,0-2"",O-n are positive integers.
In particular
-b.(I-poO)
r
=1
- (l-p)
r
(17)
whence
r
E[Zt] = 1 - (l-p)
(18.1)
var(Zt) = (l_p)r U_(l_p)r}
(18.2)
_: -b.(I-poO) r}2
cov(ZtZt') = b.2 (l-p oO) r {
=
Now Vr
=
(I-2p)
- (l-p)
2r
.
(18.3)
and so
E[V ]
r
var(V )
r
= b{l_(l_p)r}
= b(l_p)r{I_(l_p)r}
= b{(l_p)r
+
(19.1)
b(b-1){(1-2p)r _ (1_p)2r}
_ (1_2p)r} _ b2{(1_p)2r _ (1_2p)r}. (19.2)
Note that if there is no Class II, then p
occupancy problem. p cannot exceed b- 1 .
e·
r
In the multivariate case we define
= b- 1
and we have the classical
8
= {01 if the t-th category in Class leg) is observed,
Zgt
if not.
(20)
b
so that
V
r,g
,gz
L.
= t=l
gt
Then (cf. (19.1), (19.2))
E[V ]
r,g
= bg {1
- (l-p )r}
(21.1)
g
var(V ) = b {(l-p )r _ (1_2Pg)r} ~ b2{(1_p )2r _ (1-2p )r} (21.2)
r,g
g
g
g
g
g
Also, since (for (g,t)
~
(g' ,tV))
E[ZgtZg'tv1 = Pr[(Zgt=l)
n
(Zg't l = 1)1
= 1 - (l_P g)r _ (l_P ,)r
g
+
(l_P _P ,)r
g g
we have (for g t: g')
cov(Vr.g ,V,)
r,g = bgbg I cov(Z gt'Z g It')
= bgb g, { (l-Pg-P g ,) r
We now find the regression of V
on V
r,g'
r,g
Pr[V
b
= j IR 1 = (.g)~
r,g g g
J
g
j
- (l-P g) r (l-P g ,) r}
(21.3)
Since
goRg/bgRg
we have, using Bayes' theorem
(22)
Given Rg=r g , Vr,g , will be distributed as Vr _r gV , with PI
g'
g
by P ,(l_bgrg)-l. Hence
g
E[Vr,g ,R
I g=rg ] = bg,[l -
{1-Pg v(1-bgPg) -l}r-rg]
replaced
(23)
9
and E[Vr,g ,Ivr,g=j]
g
bution (22) .
is the expected value of (23) taken over the distri-
•
This is
bg' [1.
Thus the regression of V
r,g
E[V
.
on V
r,g'
is
Iv" ]
(24)
r,g' -r,g
REFERENCES
David, F.N. and Barton, D.E. (1962).
CombinatoriaZ Chanae, New York: Hafner.
Feller, W. (1968). An Introduation to ProbabiZity Theory and its
Vol. 1, New York: Wiley.
AppZiaations~
Harkness, W.L. (1969). The CZassiaaZ Oaaupanay ProbZem Revisited, Technical
Report No. 11, Department of Statistics, Pennsylvania State University.
Johnson, N.L., Kotz, S. and Srinivasan, R. (1974). Extended Oaaupanay
ProbabiZity Distribution CritiaaZ Points, Mimeo Series No. 934, Institute
of Statistics, University of North Carolina.
Uppuluri, V.R.R. and Carpenter, J.A. (1971). A generalization of the
classical occupancy problem, J. Math. AnaZ. AppZ. 54, 316-324.
© Copyright 2026 Paperzz