Weighted Distributions and Size-Biased Sampling with Applications

BIOMETRICS
34, 179-189
June, 1978
WeightedDistributionsandSizeBiased SamplingwithApplications
to WildlifePopulcztions
andHumanFamilies
G. P. PATIL
The Pennsylvania State University, University Park, Pennsylvania 16802, U.S.A.
C. R. RAO
Indian Statistical Institute, New Delhi 110029, India
Summary
When an investlgator records an observation by nature according to a certain stochastic
model the recordedobservation will not have the original distributionunless everJ observation is
given an equal chance of being recorded. A numberof papers have appeared during the last ten
years implicitly using the concepts of weighted and size-biased sampling distributions.
In this paper we examine some general models leading to weighted distributioslswith weight
functions not necessarily boundedby unity. The examples include. probability sampling in sasnple
surveys additive damage models visibility bias dependent on the nature of data collection and
two-stage sampling. Several important distributionsand their size-biased fornls are recorded. A
few theoremsare given on the inequalities between the mean values of two weighted distributions.
The results are applied to the analysis of data relating to human populations and wildlife
management.
For humanpopulations thefollowing is raised and discussed. Let us ascertainfrom each snale
student in a class the nusHberof brothers includinghimself and sisters he has and denote by k the
numberof students and by B and S the total numbers of brothersand sisters. What would be the
approximate values of B/(B + S) the ratio of brothers to the total numberof children asld (B +
S)/k the average number of children per family? It is shown that B/(B + S) will be an
overestimate of the proportion of boys among the childrenper family in the general population
which is about halB and similarly (B + S)/k is biased upwards as an estimate of the average
for the
number of children per family in the general population. Some suggestiosls are oJ%ered
estimation of these population parameters. Lastly for the purpose of estimating wildlife population density certain results are formulated within theframework of quadratsasnplinginvolving
visibility bias.
1. Introduction
When an investigatorrecordsan observationby natureaccordingto a certainstochastic
model, the recordedobservationwill not have the originaldistributionunlessevesy observation is given an equal chance of being recorded. For example, suppose that the original
observationX has f(x) as the pdf (which may be probability when X is discrete and
probabilitydensitywhenX is continuous),and that the probabilityof recordingthe observation x is O < w(x) < 1, then the pdf of XW,the recordedobservationis
fw(x)
=
w(x)
f(x)
( 1.1 )
Key Words. Weighted distribution; Probability sampling; Damage model; Quadrat sampling; Visibility
bias.
179
This content downloaded from 195.251.235.181 on Wed, 8 Oct 2014 07:39:47 AM
All use subject to JSTOR Terms and Conditions
BIOMETRICS,JUNE 1978
180
wherecois the normalizingfactorobtainedto makethe total probabilityequalto unity.Thus
comaybe referredto as the visibilityfactor. Note that
f if and only if w(x) is a constant.
Rao (1965) introduceddistributionsof the type (1.1) with an arbitrarynonnegative
weightfunctionw(x) which may exceedunityand gave practicalexampleswherewtx) = x or
xa are appropriate.He calleddistributionswith arbitraryw(x) weighteddistributions.In this
paperwe use this definitionof a weighteddistributionwith arbitraryw(x), of which( 1.1) is a
special case. The weighteddistributionwith w(x) = x is also called a sized biased distribu
tion. We shall show in Section 2 of this paperhow the weightw(x) = x occurs in a natural
way in many samplingproblems.
A numberof papershave appearedduringthe last ten yearsimplicitlyusingthe concepts
of weightedand size-biasedsamplingdistributions,the resultsof which are brieflysurveyed
(with relevantreferences)in a recentpaperby the authors(Patil and Rao 1977).A studyof
size-biasedsampling and related form-invariantweighteddistributionswas made by Patil
and Ord (1975).
In this paper, we examine some general models leading to weighteddistributionswith
weightfunctionsnot necessarilyboundedby unity.The resultsare appliedto the analysisof
data relatingto human populationsand wildlifemanagement.
fLt'
=
2. GeneralModelsLeadingto WeightedDistributions
2.1 ProbabilitySamplingin SampleSurveys
A well known exampleis what is calledpps (probabilityproportionalto size) samplingin
sample surveymethodologywhere the originalpdf of a variableis changedaccordingto a
given design of selection of samples to improve the efficiencyof estimatorsof unknown
parameters.
Let Y be a rv concomitantwith the main rv X understudyandpl(Y),p2(x: y),
and
p4(y
s x)
be the marginalpdf of Y, the conditionalpdf of X given Y, the marginalpdf of X
and the conditionalpdf of YgivenX respectively.Let us firstobservea valuey of a rv with
pdf
p3(X)
yttJ
g(Y
P1(Y)/ Jg(Y)
p1(S2)dy
(2.1)
whereg(y) is a chosen non-negativefunction.Then an observationis made on X using the
conditionaldistributionp2(x 4 y). The pdf of the resultingrv Xtt'is
tP2(XlA2)g(Y)pl0n)dy
=
p3(X
)
XV(X
)/
S
p3(X
)
w(x)dx
(2.2 )
Jg0')Pl02)dy
where
w(x)
=
JP4(Y
I x)
g(y) dy
(2.3)
which is a weighteddistributionof X with the weightw(x) not necessarilyboundedby unity.
Furtherw(x), in such a situation,may involve the unknownparametersof
Thus, for
the observedvalue xtv the appropriatepdf is the weightedpdf (2.2). In practiceP1(.Y)is
completelyknownandg(y) is chosento maximizethe efficiencyof inferenceon the unknown
parametersof interestin
p3(X).
p3(X).
This content downloaded from 195.251.235.181 on Wed, 8 Oct 2014 07:39:47 AM
All use subject to JSTOR Terms and Conditions
Thelimit of
ftt2(X)
as d Ois easily seen to be
WEIGHTED DISTRIBUTIONS AND APPLICATIONS
181
2.2 DaswlageModel of IRao (l965)
Supposethat we are samplingfroma pdf f(x), but while realizingan observationx it goes
through a 'damageprocess'with the result that we finally have an observationz from the
conditionaldistributionwithpdf c(z | X = x) which we may denote simplyby c(z | x). The
marginalpdf of the observedvalue z is
d
c(z
N
x) f(x) dx
(2.4)
which has the form of a mixtureof distributions.
Let us supposethat an observationis recordedonly when the originalvalue is unchanged
throughthe damageprocess.The pdf of such an observationxw is
c(x | x) f(x)/E(c(X
1
X)),
(2.5)
whichis a weighteddistributionwith w(x) = c(x 1 x), wherec(x: x) = e(Z = x: X -- x), the
conditionalprobabilitythat an observationx remainsunchanged.Examplesof suchdistributions are consideredby Rao (1965) and Rao and Rubin (1964). A truncateddistributionis a
special case of (2.5), where c(x 1 x) takes the value zero in a certainregion of the sample
space of X and unity in the complement.
2.3 Visibility bias
Let us considera discreterandomvariableX with pdf f(x). For instance,X may be the
numberof individualsin a groupor a colony in whichcasef(x) is the probabilitythat a group
consistsof x individuals.Let us supposethat a groupgets recordedonly when at least one of
the individualsin the group is sighted and each individualhas an independentchance d
of being sighted. Then the probabilitythat an observedgroup has x individualsis
ftV(X)=w(x)f(x)/E(w(X))
wherew(x) = [1-(1-d)X].
group.
(2.6)
It may be noted that E(w(X)is the probabilityof observinga
x f(x)/E(X)
(2.7)
which is thus an approximationto (2.6) when d is very small. The weight w(x) = x
correspondsto size biasedsamplingand the distribution(2.7) providesan appropriatemodel
in manypracticalsituations.See, for example,the analysisof data on sex ratio in Rao (1965)
and Section 4 of this paper.
The significanceof the weight function w(x) = 1 - (1 - d)x and the limitingvalue x
would be apparentfrom the followingexamples.
Exassaplel (Haldane 1938, Fisher 1934,Rao 1965,Neel and Schull 1966,etc.). If we wish
to studythe distributionof the numberX of albinochildren(or childrenwith a rareanomaly)
in familieswith pronenessto producesuchchildren,a convenientsamplingmethodis firstto
discoveran albino child and throughit obtain the albino count xtv of the familyto whichit
belongs.If the probabilityof detectingan albino is , then the probabilitythat a familywith
x albinosgets recordedis w(x) = 1 - (1 - IS)X,
assumingthe usualindependenceof Bernoulli
trials. In such a case xw has a weightedbinomial distributionwith the weight function as
definedabove.
Example 2 (Cook and Martin 1974).In aerialcensusdata collectedfor estimatingwildlife
populationdensity,visibilitybias is generallypresentbecauseof the failureto observesome
This content downloaded from 195.251.235.181 on Wed, 8 Oct 2014 07:39:47 AM
All use subject to JSTOR Terms and Conditions
BIOMETRICS, JUNE 1978
182
animals.Supposethat the animalsare found in groupsand group count X hadpdf f(x) and
the probabilityof sightingan animal is d. Conditionalon observingat least one animalin a
group, a completecount is made of the groupand the numberof animalsis recorded.If the
samplingprocessis such that each animalhas an independentchanced of beingsighted,then
the selectionprobabilityis w(x) = 1 - (1 - d)X.The observedgroup count XLt'
has thepdf
w(x)f(x)/E(w(X)).
In both these examples,the parameterd may be small, in whichcase the weightfunction
w(x) = x would be appropriate.The exact treatment of quadrat sampling for animal
populationsfor given d is given in Section S of this paper.
2.4 Two-Stage Sampling (A Limit Theorem)
First let us considera discreter.v. X such that
P(X = i) = lri, i = 1,
.
(2.8)
Supposethat naturehas produceda largesampleof size N from the distribution(2.8) and in
the sample the observationi occurs ni times, so that
f
N. Furtherlet us
supposethat we take a subsampleof size n from the finite set of N observationsby drawing
one observationat a time with replacementand givinga chanceproportionalto sliw(i) to the
observationi. Then the probabilitythat the subsampleconsists of r1 ones, r2 twos,
is
proportionalto
nl
[E11w(1)]rl
[n2w(2)]t2
[n1w(l)+ n2w(2)+
]n
As N
,
+
n2
+
=
[(n1/N)w(l)]tsl[(n2/N)w(2)]r2
[(Hl/N)w(l)
+
(n2/N)w(2)
+
(2 9)
W]n
the expression(2.9) tends in probabilityt.o
In the limit r1,r2,
[71Wt1)]t1[lr2w(2)]r2
(2 1O)
[X1w(l ) + lr2w(2) + * ]n
constitutea sampleof size n from the weighteddistribution
p(x'v = i) = w(i)lrz/Ew(i)lrz.
(2.11)
It is seen that in (2.11), the weight w(i) can be arbitrarilysubjectto the conditionthat a
chancemechanismexistsfor drawinga samplefromthe finiteset (x1, , XN) giving a chance
proportionalto w(xi) for xi Consider,for instance,the problemof estimatingthe probability
thata child inheritsa certain defect. For this purpose we may obtain the list of children
referredto a clinic and recordthe numberof defectiveand non-defectivechildrenin each of
thedistinct familiesto which they belong. In such a method of sampling,the chancethat a
familywith r defective children is brought to record is proportionalto r, if the children
referredto the clinic can be consideredas a randomsampleof all defectivechildrenin the
populationunder study.
A similarresultholds in the case of a continuousrv X withpdf f(x) when we subsample
froman original large sample (say of size N), giving a chance proportionalto w(x) to
observationx in the originalsample.In such a case as N , the subsampleof fixed size n
maybe consideredas a randomsampleon a rv XU'with the pdf
w(x) f(x)/E(w(X))
(2.12)
Examplesof such sampling arise if we want to determine particle size distributionby
choosinga sample of particles hit by random points selected in the space enclosing the
particles.
This content downloaded from 195.251.235.181 on Wed, 8 Oct 2014 07:39:47 AM
All use subject to JSTOR Terms and Conditions
WE1GHTED DISTRIBUTIONS AND APPLICATIONS
183
2.5 Examples
It may be worthwhile now to record some of the important distributionsand their
modifiedforms under siz.e-biaswith w(x) = x. It may be noted that in the case of discrete
distributionsXU'must necessarilyhave p(xw = O) _ o
3. Some Properties of Mean Values of Weighted Distributions
In this section, we prove a numberof theoremson the inequalitiesbetween the mean
values of two weighteddistributions.
Theorem 1. Let a non-negativerv X havepdf f(x) with E (X) < . Let xw havepdf fw(x)
= xf(x)/E (X). Then E(XW) - E(X) - V(X)/E(X), where V stands for variance, and
thereforeE(Xt0)> E(X) for non-degencrateX.
Proof: Straightforward.
Remark. The inequalityE(XW) > E(X) substantiatesthe intuitive result that the sizebiasedxw recordslargervaluesof X more often than theirnaturalfrequency,and its smaller
values less often.
TheoresH2. Let rvX havepdf f(x). Furtherlet the weight functionw(x) > OhaveE(w(X))
< . Let X2'be the w-weightedrv of X withpdf fW(x)= w(x)f(x)/E(w(X)). Then E(XW)>
E(X) if cov[X,w(X)] > Oand E(XW)< E(X) if cov [X,w(X)] < O.
TABlE1
Certain Basic Distributionsand their Size-BiasedForms
Size-biasedrv
pf(pdf)
Randomvariable(rv)
Binomial,B(n,p)
(X)px(l -p)n x
1 + B(n-l,p)
NegativeBinomial,NB(k,p)
(k + x-1
1 + NB(k+ l,p)
Poisson,Po(X)
e-XAt|x
Logarithmic
series,L(ov)
{-log(l -ov)}
Hypergeometric,
H(n,M,N)
( )M(X)(N-M)(n-X)/N(n)
1 + H(n-1 ,M-1 ,N-1 )
Binomialbeta, BB(n,a,^y)
( ),((a + x, zy+ n-x)/$(a,^y)
1 + BB(n-l,a,^y)
Negativebinomialbeta, NBB(k,a,^y)
(
Gamma,G(oe,k)
oekxk-le-6Yx/r(k)
G(oe,k+ 1)
Betafirstkind,B1(h,^y)
x6-l(l -x-l/
B1(6+ 1,8)
Betasecondkind,B2(6,^Y)
x6-l(l + x) /:(6
Pearsontype V, Pe(k)
x -k -1 exp(-x- l)/r(k)
Pe(k-1 )
Pareto,Pa(oe,^y)
zovzx-(T+1), x 2 oe
Pa(oe,oy-1 )
Lognormal,LN(,u,(T2)
(27r¢2)I/2exp-2(109 x
)
k
$
1 + Po(X)
oe/x
1 + NB(l,ov)
):(a + x,^y+ k)/,B(a,^y) 1 + NBB(k+ l,a,^y)
(05)/)
oY 6)
B2(6+ 1,^Y-6-1)
8
)
1
LN(y+ (J2tr2)
This content downloaded from 195.251.235.181 on Wed, 8 Oct 2014 07:39:47 AM
All use subject to JSTOR Terms and Conditions
184Proof:
BIOMETRICS,JUNE 1978
Straightforward.
Theoress1
3. LetX andxw be
definedas in Theorem2.
E(X'8)> E(X) if w(x) T in x
Furtherlet X be aonand E(Xtt')< E(X) if
negative.Then
w(x) 1 in x, whereT
meansdecreasing.
meansincreclsingand
Proof:Follows from
w(x) T x, which can b Theorem2 sincecOV[X,w(X)]
> 0 if w(x) T x
andthe reverseis trueif
easily demonstrated.
Theorestl
4. Let non-negativerv
X havepdf f(x). Let
E(wi(X))< O for i = 1, 2,
the weightfunctions
definingthe corresponding
as/(x)> 0
ThenE(XU'2
) > E(X'l'l
wi-weightedrv's of X deIlotedbyhave
) if r(x) = W2(X )/Wl(X
) T in x and
X"'i.
E(X'1'2
) < E(X'l'l) if r (x ) 1 in
Proof: Follows when one
x.
notes that XW2
is r-weightedrv of Xlt11@
Resnark.The resultin
Theorem
4
is
adecisivecriterion,
interestingin that the ratio of
and not any
the weightfunctionsis
think.
For example,let w1(x)= directinequalitybetweenthe weight
fUIlGtiOIlS, clS one may
x(x-1),
and
(x- 1) =
1/[1-(l/x)] i x, implyingthat w2(x)= x2,fromwhichr(x) = W2(X)/W1(X) =X/
E(X't'2)
< E(XU'l),
observes
w2(x)> w1(x).
and not the reverse
becauseOIlC
4. A NaturalExasnple
of WeightedBinomial
4.1
Distribution
Test of the Model
Let us ascertainfrom
each male student in a
himself,
class the
and sistershe has and
denote by k the numberof numberof brothers,including
numbers
of brothersand sisters.
studentsand by B
ratio
of brothersto the total What would be the approximatevalues of aIldS the total
B/(B
numberof children and (B
children
+ S)/k, the average -r 57),the
(a.n.c.) per family?It is
easily
numberof
seen
that
B/(B + S) will be an
proportion
of boys among the
overestimate
children
ill
of the
the
generalpopulationwhich is
similarly
(B + S)/k is biased
about
upwards
half,
as
and
an
population.
Surprisingly,when the number estiamteof the a.n.C.per family ill the general
of
make
boys
a fairly accurate
in the class is not very
small, you CaIl
of the relative
(B
+S).This was statedprediction
magllitudes
of B and S, and the
in the form of an
ratio B/
that
empiricaltheoremin
(B-k)/(B + s-k) is closer to
half than B/(B + 57)(see Rao (1977). It was observed
on
the
hypothesisthat the distribution
of b, the numberof Table 2). This was explained
number
of brothersand sisters,
brothersgiveIlb + s,
is a weightedbinomial
the
with weightproportional the total
observation,
i.e., writingC for the
to the size of
binomialcoeffieient,
p(b | b + s) = b C(b +
s,b)(.5)b+s/E(b)
=C(b+sl,b-1)(5)t2+5-1
(4.1)
Let
(bi, si), i = 1, , k be
the
observation
of
the
k
boys,
model
and B = Nbi,s = ESi.
(4.1)
Thell under
p(B-k = x | B + S-k
- T) = C(T, x)(.5)T.
To
test
the
hypothesisthatb has a weighted
(4.2)
distributionof the type (4.1), we
havecomputed
X2=(2x- T)2/Ton 1d.f.
from
thedata for each city
(43)
(see
plausibility
of the hypothesis.A Table 2). The chi-squarevalues are small,
model
(4.1),was carriedout by more detailedtest of the hypothesis,i.e. the indiccltingthe
validityof the
computingchi-squarevaluesfor
city.
The
chi-squarevalues were again
each familysize withill
small.
each
This content downloaded from 195.251.235.181 on Wed, 8 Oct 2014 07:39:47 AM
All use subject to JSTOR Terms and Conditions
WEIGHTED DISTRIBUTIONS AND APPLICATIONS
185
TABLE2
Estimatesof Sex Ratio and FamilySize from Data on /\4ale Respondents
Placeandyear
Delhi
Calcutta
Waltair
Hyderabad
Tirupati(students)
Tirupati(staff)
Poona
Tehran
Isphahan
Tokyo
Columbus
State College
CollegeStation
London&Bradford
:75
: 63
:69
: 74
: 75
: 76
: 75
: 75
:75
: 75
:75
:75
:76
:76
k
B
S
29
104
39
25
592
50
47
21
11
50
29
28
63
43
92
414
123
72
1902
172
125
65
45
90
65
80
152
80
66
312
88
53
1274
130
65
40
32
34
52
37
90
39
B
B-k
B + S B + S-k
.58
.57
.58
.58
.60
.57
.66
.62
.58
.73
.62
.68
.63
.67
.488
.498
.488
.470
.507
.484
.545
.500
.515
.540
.523
.584
.497
.487
x2
B+ S
k
6
.07
.04
1.09
.36
.50
.25
1.18
.19
.06
.49
2.91
2.53
.01
.02
5.45
6.96
5.41
5.00
5.36
6.04
4.04
5.00
7.00
2.48
4.00
4.18
3.84
2.77
4.25
5.30
4.36
3.46
4.24
4.20
3.15
3.18
5.70
2.25
2.79
3.21
3.04
2.15
* It is interestingto note that Tokyohas the smallestfamily size (numberof children).Amongthe Indiancities Poona has a
smallervalue, and it would be of interestto investigatethis phenomenon.
Note It has not been possibleto ascertainthe actual familysizes in the populationsof differentcitiesquoted in Table2 except
in the cases of Indiancities. In these cases the figures were close to 6, as predicted.
4.2 Estimation of Family Size
Let f(b, s) be the relativefrequencyof familieswith b brothersands sistersin the general
population.Then underthe assumptionmade in Section4.1 the probabilityof the observation (b, s) or (b, t) wheret = b + s coming into our recordis
fW(b,s) = b f(b, s)/E(B) = b p(t) p(b s t)/E(B)
(4.4)
where
p(b | t) = C(t b) 7Uh( 1 -
7r)t-h
(4.5)
Then
pW(t)= t p(t)/6, b = E(T)
where6 is the parameterof interest.Hence
(4.6)
Et(1/T)= 1/6
(4.7)
which shows that the harmonicmean of the observedti is an estimatorof b.
If the exact form of p(t) is not known then we may estimate6 by
6
= k/z( l /tz)
(4.8)
Theestimate6 of b is computedfor each city and given in Table2. It is seen that the observed
average(B + S)/k is largerthan the estimatedvalue in each case.
Sincep(b | t) given in (4.5) is independentof 6, we observethat underthe assumedmodel,
t is sufficientfor b. If the form of p(t) is known, we may use the likelihoodfunction
k
k
pW(ti ) = 6 - k H
rl
i=l
tiP(ti)
(4.9)
i=l
to estimateb. For instance,if p(t) is geometric,then the maximumlikelihoodestimateof 6 is
This content downloaded from 195.251.235.181 on Wed, 8 Oct 2014 07:39:47 AM
All use subject to JSTOR Terms and Conditions
BIOMETRICS, JUNE 1978
186
g
where T = tl +
2K + 2
(4.10)
+ tk. If p(t) is logarithmic,then
61
1-(1 - cv)log (1 - cv)l-lcr
(4.1 1)
wherecx= 1 - (k/T). The formulae(4.10) and(4.11) are not appropriatefor the presentdata
as p(t) does not seem to be eithergeometricor logarithmic,as claimedin some studies(see
Feller, p. 141).
5. QuadratSamplingwith VisibilityBias
For the purpose of estimatingwildlife population density, quadratsamplinghas been
found generallypreferable.Quadratsamplingis carriedout by first selectingat random a
numberof quadratsof fixedsize fromthe regionunderstudy and ascertainingthe numberof
animals in each. Following Cook and Martin (1974) we make the assumptiolasas given
below:
A1: Animalsoccurin groupswithineach quadratandthe numberof groupswithina
quadrathas a specifieddistribution.
A2: The numberof animalsin a group has a specifieddistribution.
A3: The numberof groupswithin a quadratand the numbersof animalswithinthe
groups are all independentlydistributed.
A4: The method of samplingis such that the probabilityof sighting(recording)a
group of x animals is w(x).
Let X andX't'be the rv'srepresentingthe numberof animalsin a groupin the population
and as ascertained.Similarly,let N and N2' be the rv's for the numberof groups within a
quadrat.It is clear that since the method of ascertainmentdoes not give equal chance of
selection to groupsof all sizes (unless w(x) is constant), the rv's X and
do not have the
same distribution,and so is the case with N and NW.The following theoremprovidesthe
basic resultsin quadratsamplingtheory.
Theorem5. Under the assumptionsA1 - A4 we have the following results.
X't'
(i)
P(N2'= sa1| N= n)
=
(
)6tzrtl(l
_
66))rl
r
where
U= E
w(x) P(X = x)
is the visibilityfactor (the probabilityof recordinga group).
(sn ) (
)
(
),
i.e.the visibilitybias inducesan additivedamagemodel on the true quadratfrequencywith
binomialsurvivaldistribution(see Rao 1965).(iii) The probabilitythat m observedgroupsin
a quadrathave xl,
, xm animalsis
(ii)
(
)
nEt
rn
P(Xl'v= X1,
, Xm&t)
= Xrn| NtV= m) =
pt
P(XU'i= xi)
i =1
This content downloaded from 195.251.235.181 on Wed, 8 Oct 2014 07:39:47 AM
All use subject to JSTOR Terms and Conditions
WEIGHTED DISTRIBUTIONS AND APPLICATIONS
187
where it may be noted,
P(XW = x ) = w(x ) P(X = x )/co .
(iv)
Let Sw = X1W+
+ XmwThen
p(sw = y) = m
E p(NW = m) p(sw = y | m)
=1
p(sw = y | m) =
,
w(x,)
. w(xm)P(X1= x,)
P(X
= xm)
Proof: Underthe assumptionsandnotationsusedwe havethe basicprobabilityequality
P(N = n, Nw = sn, Xlw=
xl,
, Xmw= xm Xm+l= Xm+l
= P(N = n) (m
) I|
j-l
From (5.1) summingout Xm+l,
P(N = n, Nw = m, Xlw = xl,
, Xn= Xn)
P(Xj= xj)w(xj) I| [1-w(xj)] P(Xj = Xj). (5.1)
j=m+
, Xn we have
, XmW
= xm)
= P(N = n) (
) com(l-co)n-m II P(Xj'V= Xj).
(5.2)
Then the results(i), (ii), and (iii) of the theoremfollow from (5.2). Summing(5.2) over n
frommtom,wehave
m
P(N = m, Xlw = xl,
, XmtV
= xm) = p(Nw = m) I|
P(X
j=l
from which the result(iv) follows.
Note 1: The expression(5.3) enablesus to write down thejoint likelihoodof the numbers
of groups observedin diSerentquadratsand the numbersof animals observedin all the
groups sighted.Thus, if ml,
, mkare the numbersof groupsin k quadratsand xfj is the
numberof animalsin theXthgroupof the ith quadrat,thejoint likelihoodis the productof
k
tI
P(NW= mi)
(5 4)
i = 1
and
k
mi
II
II
i=l
j=l
P(XxxW
= xev).
(5.5)
Results (ii) and (iii) of the theoremgive the methodsof computingthe individualterms in
(5.4) and(5.5) fromthe populationdistributionsof N andX andthe weightfunctionw(x). In
general,the unknownparametersare those occurringin the specifieddistributionsof N and
X and the additionalvisibilityfactorco(or p the probabilityof sightingan animal).All these
could be estimatedusing the productof (5.4) and (5.5) as the likelihood function.
Note 2. Cook and Martin(1974) considerthe specialcase where
N
X
PO(X),Poisson with parameterA,
(5.6)
ax SX/g(0),power seriesdistribution,
(5.7)
w(x) = 1-(1-d)X.
This content downloaded from 195.251.235.181 on Wed, 8 Oct 2014 07:39:47 AM
All use subject to JSTOR Terms and Conditions
(5.8)
N/1 po(r6)
6 = ),C;>
and xw aX fX/g(0),
0
=
0d
BIOMETRICS,JUNE 1978
188
It may be noted that whateverw(x) may be, Nw
co=
z
pO(6)6
sXw(x) 0 x/g(S) and xw
!
Acowhere
ax w(x) Sx/cog(S)
Thus, there are three parameters6, co and 0. Then the parameter6 is estimated from the
likelihood (5.4) and co, 0 from (5.5). Cook and Martin (1974) provided the necessary
computationsin such a case, choosing w(x) as in (5.8).
If N is not a Poisson variablethen the distributionof NU'involves coas an additional
parameter(see Rao 1965 and Sprott 1965), in which case the product of (5.4) and (5.5)
providethe joint likelihoodfor the estimationof all the unknownparameters.
In the special case where N and X are as distributedin (5.6) and (5.7) respectivelyand
w(x) = dx (i.e., when a group is observedif and only if all the animalsare sighted),
so that the parametersA, 0 and d are confoundedand are not individuallyestimable.In a
differentcontext,Kemp (1973, 1975)observesthe confoundingof 0 and d in the specialcase
of N being degenerateat one.
Re'sunle'
Quand un chercheur recueille une observation qui suit de par sa nature un nzodele stochastique, I'observation recueillie n'obe'iraa la distribution d'origine, que si chaque e'le'mentobservchance d'eAtrerecueilli. Un certain nombre d'articles sont apparus ces dix
able recJoitla seleAnle
dernieres annees, qui utilisent implicitement les concepts de distributions ponde'rees avec
echantillonnage d'efJectif biaise.
Dans cet article, nous exanlinons quelques modeles ge'neraux conduisant a des distributions ponde'reesnon ne'nessairementlimite'espar l'unite'.Les exemples comprennent. I'e'chantil
sur e'chantillons,les nlodeles de don1nlagesadditifs, le
Ionnage probabiliste.dans des enqueAtes
biais de visibilite'lie' a la nature du recueil des donne'eset l'echantillonnage a deux niveaux. On
rapporte plusieurs distributions importantes et leurs deformations provoquees par un biais
d'efectif On donne quelques the'orenlessur les ine'galite'sentre valeurs moyes?nesde deux distributions pondere'es. Les re'sultatssont appliques a l'analyse de donnees se rapportant a des
populations hunlaines et au controle des populations sauvages.
Pour des populations humaines, la question suivante est posee et discutee . denlandons a
y conlpris) et de soeurs
chaque etudiant d'une classe, de sexe male, le nombre de freres (lui meAme
de sa fratie. Soit h le nombred'etudiants, B et S le nombretotal defreres et de soeurs. Que seront
les valeurs approche'esde B/(B + S), rapport du nombre de freres au nombre total denfants, et
de (B + S)/h, nombre moyen d'enfants par famille ? On montre que B/(B + S), est une surestimation de la proportion de garcJonsparmi les enfants de la population gene'rale,proportion
eglale a I /2 environ.Et que de meme, (B + S)/h est biaise'vers les valeurs e'leve'esen tant qu'estinlation du noselbred'enfants par famille dans la population gene'rale. On propose quelques suggestions pour l'estimation de ces parametres de population.
Enfin, en vue d'estimer la densite des populations sauvages on formule quelques resultats
dans le cadre de l'e'chantillonnagequadratiqueavec biais de visibilite'.
References
Cook, R. C. and Martin, F. B. (1974). A model for quadrat sampling with visibility bias. Journal of the
American Statistical Association 69, 345-349.
Feller, W. (1968). An Introductionto Probability Theory and its Applications. \tolume 1, John Wiley and
Sons, Inc., New York.
Fisher, R. A. (1934). The effects of methods of ascertainment upon the estimation of frequellcies.
Annals of Eugenics 6, 13-25.
This content downloaded from 195.251.235.181 on Wed, 8 Oct 2014 07:39:47 AM
All use subject to JSTOR Terms and Conditions
WEIGHTED DISTRIBUTIONS AND APPLICATIONS
189
Haldane J. B. S. (1938). The estimationof the frequencyof recessiveconditionsin man. Annals of
Eugenics (London) 7, 255-262.
Kemp, C. D. (1973). An clementaryambiguityin accidenttheory.Accident Analysis and Prevention,
371-373.
Kemp,C. D. (1975). Models for visibilitybias in quadratsampling.Proceedings of the 40th Session of
the International Statistical Institute, Warsaw
Neel and Schull(1966). Human Heredity. Universityof ChicagoPress,Chicago,211-227.
Patil, G. P. and Ord, J. K. (1975). On size-biasedsampling and related form-invariantweighted
distributions.Sankhya, Series B 38, 48-61.
Patil, G. P. and Rao, C. R. (1977). Weighteddistributions:a surveyof their applications.In Applications of Statistics. P. R. Krishnaiah(ed.), North Holland PublishingCompany,383-405.
Rao, C. R. (1977).A naturalexampleof a weightedbinomialdistribution.American Statistician 31, 2426.
Rao, C. R. (1965). On discretedistributionsarisingout of methodsof ascertainment.In Class,cal and
Contagious Discrete Distributions. G. P. Patil (ed.), PergamonPress and StatisticalPublishing
Society,Calcutta,320-332.
Rao, C. R. and Rubin, H. (1964).On a characterization
of the Poissondistribution.Sankhya, Series A
25, 295-298.
Sprott,D. A. (1965). Somecommentson the questionof identifiabilityof parametersraisedby Rao. In
Classical and Contagious Discrete Distributions. G. P. Patil (ed.), StatisticalPublishingSociety,
Calcuttaand PergamonPress,333-336.
ReceivedDecesnber1976; R evisedAugust 1977
This content downloaded from 195.251.235.181 on Wed, 8 Oct 2014 07:39:47 AM
All use subject to JSTOR Terms and Conditions