Some Problems Connected With Statistical Inference*
by
D. R. Cox
Department of Biostatistics, School of Public Health
Institute of Statistics
Mimeo Series No. 151

* Invited address given at a joint meeting of the Institute of Mathematical Statistics and the Biometric Society, Princeton University, Princeton, N. J., April 20, 1956.
1. Introduction.
The aim of this paper is to make some general comments about
the nature of statistical inference.
Most of the points are implicit or explicit
in the literature or in current statistical practice.
2. Inferences and decisions.
For the present discussion a statistical inference
will be defined as a statement about statistical populations made from given
observations with measured uncertainty. An inference in general is an uncertain
conclusion.
Two things mark out statistical inferences.
First the information
on which they are based is statistical, i.e. consists of observations subject to
random fluctuations.
Secondly we explicitly recognize that our conclusion is
uncertain and attempt to measure, as objectively as possible, the uncertainty
involved.
A statistical inference carries us from observations to conclusions about the
populations sampled.
A scientific inference in the broader sense is usually
concerned with arguing from essentially descriptive facts about populations to
some deeper understanding of the process under investigation.
The more the statistical inference helps us with this latter process the better.
Statistical inferences involve the data, assumptions about the populations
sampled, a question about the populations, and very occasionally a distribution
of prior probability. No consideration of losses is usually involved directly
in the inference although these may affect the question asked.
Statistical decisions deal with the best action to take on the basis of statistical information.
Decisions are based on not only the considerations just
listed, but also on an assessment of the losses consequent on wrong decisions and
on prior information, as well as, of course, on a specification of the set of
possible decisions.
Current theories of decision do not give a direct measure
of the uncertainty involved in the decision.
An inference can be considered as answering the question: 'What are we really entitled to learn from these data?'. A decision, however, should be based on all the information available that bears on the point at issue, including for example the prior reasonableness of different explanations of a set of data. This information that is additional to the data is called prior knowledge.
Now the general idea that we should, in any application, ask ourselves what
are the possible courses of action to be taken, what the consequences of incorrect
action are, and what prior knowledge is available, is unquestionably of great
importance. Why, then, do we bother with inferences which go, as it were, only part of the way?
First, particularly in scientific problems, it seems of intrinsic interest
to be able to say what the data tell us, quite apart from the course of action
that we decide upon.
Secondly, even in problems where a clear-cut decision is
the sole object, it often happens that the assessment of losses and prior information is highly subjective, and therefore it may be advantageous to get clear the
relatively objective matter of what the data say, before embarking on the more
controversial issues.
A full discussion of this distinction between inferences and decisions will
not be attempted here.
Two further points are, however, worth making briefly.
First, some people have suggested that what is here called 'inference' should be considered as 'summarization of data'. This choice of words seems not to recognize that we are essentially concerned with the uncertainty involved in passing from the observations to the underlying populations.
Secondly, the distinction drawn here is between the applied problem of inference and the applied problem of decision-making; it is possible that a satisfactory set of techniques for inference could be constructed from the mathematical structure used in decision theory.
3. The sample space. Statistical methods work by referring the observations S to a sample space Σ of observations that might have been obtained. Over Σ one or more probability measures are defined and calculations in these probability distributions give our significance limits, confidence intervals, etc. Σ is usually taken to be the set of all possible samples having the same size and structure as the observations.
R. A. Fisher (see, for example, [7]) and G. A. Barnard, [2], have pointed out that Σ may have no direct counterpart in indefinite repetition of the experiment. For example if the experiment were repeated, it may be that the sample size would change. Therefore what happens when the experiment is repeated is not sufficient to determine Σ, and the correct choice of Σ may need careful consideration.
As a comment on this point, it may be helpful to see an example where the sample size is fixed, where a definite space Σ is determined by repetition of the experiment and yet where probability calculations over Σ do not seem relevant to statistical inference.
Suppose that we are interested in the mean θ of a normal population and that, by an objective randomization device, we draw either (i) with probability 1/2, one observation, x, from a normal population of mean θ and variance σ₁², or (ii) with probability 1/2, one observation, x, from a normal population of mean θ and variance σ₂², where σ₁, σ₂ are known, σ₁ ≫ σ₂, and where we know in any particular instance which population has been sampled.
More realistic examples can be given, for instance in terms of regression problems in which the frequency distribution of the independent variable is known. However the present example illustrates the point at issue in the simplest terms. (A similar example has been discussed from a rather different point of view in [ ].)
The sample space formed by indefinite repetition of the experiment is clearly defined and consists of two real lines Σ₁, Σ₂, each having probability 1/2, and conditionally on Σᵢ there is a normal distribution of mean θ and variance σᵢ².
Now suppose that we ask, in the Neyman-Pearson sense, for the test of the null hypothesis θ = 0, with size say 0.05, and with maximum power against the alternative θ = θ′, where θ′ ~ σ₁. Consider two tests. First, there is what we may call the conditional test, in which calculations of power and size are made conditionally within the particular distribution that is known to have been sampled.
This leads to the critical regions x > 1.64 σ₁ or x > 1.64 σ₂, depending on which distribution has been sampled.
This is not, however, the most powerful procedure over the whole sample space. An application of the Neyman-Pearson lemma shows that the best test depends slightly on θ′, σ₁, σ₂, but is very nearly of the following form. Take as the critical region

x > 1.28 σ₁, if the first population has been sampled;
x > 5 σ₂, if the second population has been sampled.
Qualitatively, we can achieve almost complete discrimination between θ = 0 and θ = θ′ when our observation is from Σ₂, and therefore we can allow the error rate to rise to very nearly 10% under Σ₁. It is intuitively clear, and can easily be verified by calculation, that this increases the power, in the region of interest, as compared with the conditional test. (The increase in power could be made striking by having an unequal division of probability between the two lines.)
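The comparison can be checked numerically. The sketch below is mine, not from the paper; it computes the exact size and power of the two tests, taking for illustration the assumed values σ₁ = 10, σ₂ = 1 and θ′ = 10, with the multipliers 1.64, 1.28 and 5 quoted above.

```python
# Exact sizes and powers of the conditional and unconditional tests in the
# two-line example (illustrative values assumed: sigma1 = 10, sigma2 = 1,
# alternative theta' = 10).
from math import erfc, sqrt

def upper_tail(z):
    """P(Z > z) for a standard normal Z."""
    return 0.5 * erfc(z / sqrt(2.0))

sigma1, sigma2, theta_alt = 10.0, 1.0, 10.0

def size_and_power(c1, c2):
    """Test rejects when x > c1 on line 1, x > c2 on line 2 (each line has prob 1/2)."""
    size = 0.5 * (upper_tail(c1 / sigma1) + upper_tail(c2 / sigma2))
    power = 0.5 * (upper_tail((c1 - theta_alt) / sigma1)
                   + upper_tail((c2 - theta_alt) / sigma2))
    return size, power

cond = size_and_power(1.64 * sigma1, 1.64 * sigma2)    # conditional test
uncond = size_and_power(1.28 * sigma1, 5.0 * sigma2)   # unconditional test
print(cond)    # overall size about 0.05, each line at the 5 per cent point
print(uncond)  # nearly the same size, but higher power at theta'
```

The two sizes agree to about three decimals, while the unconditional test shows the power gain claimed in the text.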
Now if the object of the analysis is to make statements by a rule with certain desirable long-run properties, the unconditional test just given is unexceptionable. If, however, our object is to say 'what we can learn from the data that we have', the unconditional test is surely unacceptable. Suppose that we know we have an observation from Σ₁. The unconditional test says that we can assign this a higher level of significance than we ordinarily do, because, if we were to repeat the experiment, we might sample some quite different distribution. But this fact seems irrelevant to the interpretation of an observation which we know came from a distribution with variance σ₁². That is, our calculations of power, etc. should be made conditionally within the distribution known to have been sampled, i.e. we should use the conditional test.
To sum up, in statistical inference the sample space Σ must not be determined solely by considerations of power, or by what would happen if the experiment were repeated indefinitely. If difficulties of the sort just explained are to be avoided, Σ should be taken to consist, so far as is possible, of observations similar to the observed set S, in all respects which do not give a basis for discrimination between the possible values of the unknown parameter θ of interest. Thus in the example, information as to whether it was Σ₁ or Σ₂ that was sampled tells us nothing about θ, and hence we make our inference conditionally on Σ₁ or Σ₂.
Fisher has formalized this notion in his concept of ancillary statistics [6], [7]. As originally put forward, this seems insufficiently general to deal with such situations as the 2 × 2 contingency table, and the following generalization is put forward.
Suppose that we wish to make an inference about parameters θ, with parameters φ regarded as nuisance parameters, and that exhaustive estimation of θ, φ can be based on functions a, s, t of the observations, such that

(i) the distribution of t given a depends only on θ;

(ii) the values of a, s give no information about θ, in that their joint distribution function P(a, s; θ, φ) is such that for any θ, φ, θ′ there exists φ′ such that

P(a, s; θ, φ) = P(a, s; θ′, φ′);   (*)

then inference about θ should be based on the distribution of t given a, i.e. the sample space should consider a as fixed and equal to its observed value.
To apply this definition we have to regard our observations as generated by a random process; the definition simply tells us how to cut down the sample space to those points relevant to the interpretation of the observations we have. The equation (*) is the formal expression of the condition that s (and a) give no basis for discrimination between different values of θ.
For example let r₁, r₂ be randomly drawn from Poisson distributions of means μ₁, μ₂ and let μ₁/μ₂ = ψ be the parameter of interest; that is, write the means as φψ, φ, where φ is a nuisance parameter. The likelihood of r₁, r₂ can be written

[e^{−φ(1+ψ)} {φ(1+ψ)}^a / a!] × [a! / {t! (a−t)!}] {ψ/(1+ψ)}^t {1/(1+ψ)}^{a−t},

where t = r₁, a = r₁ + r₂. The equation (*) is easily shown to be satisfied, telling us that a gives us no information about ψ. Therefore significance and confidence calculations are to be made conditionally on the observed value of a, as is the conventional procedure [12]. Note that if the populations for study were selected by a random procedure independent of ψ, the likelihood would be changed only by a factor independent of ψ and the final choice of sample space would be unchanged.
A method of inference that used only the values of the likelihood ratio would avoid these difficulties. Another important problem connected with the choice of the sample space concerns the possibility and desirability of making inferences within finite sample spaces obtained by permuting the observations. This matter will not be discussed here.
4. Interval estimation. Much controversy has centered on the distinction between fiducial and confidence estimation. Here follow five remarks, not about the mathematics, but about the general aims of the two methods.

(i) The fiducial argument leads to a distribution for the unknown parameter, whereas the method of confidence intervals, as usually formulated, gives only one interval at some preselected level of probability. This seems at first sight a distinct point in favor of the fiducial method. For when we write down a confidence interval such as (x̄ − 1.96 σ/√n, x̄ + 1.96 σ/√n), there is certainly a sense in which the unknown mean θ is likely to lie near the center of the interval, and rather unlikely to lie near the ends and in which, in this case, even if θ does lie outside the interval, it is probably not far outside. The usual theory of confidence intervals gives no direct expression of these facts.
Yet this seems to a large extent a matter of presentation; there seems no reason why we should not work with confidence distributions for the unknown parameter. These can either be defined directly, or can be introduced in terms of the set of all confidence intervals at different levels of probability. Statements made on the basis of this distribution, provided we are careful about their form, have a direct frequency interpretation. In applications it will often be enough to specify the confidence distribution by, for example, a pair of intervals, and this corresponds to the common practice of quoting say both the 95 per cent and the 99 per cent confidence intervals.
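For the simplest normal-theory case this construction is immediate. The sketch below is an assumed set-up, not from the paper: the confidence distribution of a normal mean with known σ has distribution function Φ(√n (θ − x̄)/σ), and a central 95 per cent interval is a pair of its quantiles carrying 95 per cent of the confidence mass.

```python
# Confidence distribution for a normal mean with known sigma (assumed example).
from math import erf, sqrt

def phi_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def confidence_cdf(theta, xbar, sigma, n):
    """Confidence distribution function H(theta) = Phi(sqrt(n) (theta - xbar) / sigma)."""
    return phi_cdf(sqrt(n) * (theta - xbar) / sigma)

xbar, sigma, n = 4.2, 1.0, 25        # hypothetical data summary
lo = xbar - 1.96 * sigma / sqrt(n)   # usual central 95 per cent interval
hi = xbar + 1.96 * sigma / sqrt(n)
mass = confidence_cdf(hi, xbar, sigma, n) - confidence_cdf(lo, xbar, sigma, n)
print(mass)  # confidence mass of the 95 per cent interval, close to 0.95
```

Quoting both the 95 and 99 per cent intervals then amounts to reporting two pairs of quantiles of this one distribution.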
If we consider that the object of interval estimation is to give a rule for making, on the basis of each set of data, a statement about the unknown parameter, a certain proportion of the statements to be correct in the long run, consideration of the confidence distribution may seem unnecessary and possibly invalid. The attitude taken here is that the object is to attach, on the basis of data S, a measure of uncertainty to different possible values of θ, showing what can be inferred about θ from the data. The frequency interpretation of the confidence intervals is the way by which the measure of uncertainty is given a concrete interpretation, rather than the direct object of the inference. It is then difficult to see an objection to the consideration of many confidence statements simultaneously.
As an example, consider the estimation of the mean θ of a normal population of unit variance, from a single observation x, when it is given that θ ≥ 0. The natural confidence distribution is, at θ = 0, a point probability Φ(−x), and, for θ > 0, a continuous distribution of density φ(θ − x), where φ and Φ denote the standard normal density and distribution function.
The frequency interpretation of this is that if we were to select always the set of θ values covered by some fixed part of the confidence distribution, e.g. the lower 95 per cent, then for any θ > 0 the true value is covered in 95 per cent of trials in the long run. For θ = 0, the corresponding frequency exceeds 95 per cent, or can be made equal to 95 per cent by a natural process of random selection. Such a random process to achieve exactly 95 per cent coverage is a process to explain the concrete meaning of the confidence distribution; it does not seem relevant to the actual use of the distribution in applications.
If the restriction is that θ > 0, there is a difficulty in that θ = 0 is an inadmissible parameter value and it is not sensible to attach confidence probability to a parameter value that cannot occur. This seems, however, a rather artificial matter and it seems reasonable to deal with it by the convention that θ = 0 is to stand for some very small positive value of θ.
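The frequency interpretation above is easy to verify by simulation. In the sketch below (my illustration, not from the paper), the lower 95 per cent of the confidence distribution is the set {0} alone when Φ(−x) ≥ 0.95, i.e. when x ≤ −1.645, and otherwise the interval [0, x + 1.645]; the simulated coverage is close to 95 per cent for θ > 0 and exceeds it at θ = 0.

```python
# Coverage of the lower 95 per cent of the confidence distribution for a
# normal mean restricted to theta >= 0, from one observation x ~ N(theta, 1).
import random

def lower95_covers(theta, x):
    """The lower 95 per cent set is {0} when x <= -1.645, else [0, x + 1.645]."""
    if theta == 0.0:
        return True          # the point mass at theta = 0 always lies in the lower part
    return theta <= x + 1.645

def coverage(theta, trials=200_000, seed=1):
    rng = random.Random(seed)
    hits = sum(lower95_covers(theta, rng.gauss(theta, 1.0)) for _ in range(trials))
    return hits / trials

print(coverage(2.0))   # close to 0.95 for a true mean well inside the range
print(coverage(0.0))   # exactly 1.0: theta = 0 is always covered
```

The excess coverage at θ = 0 is the feature the text proposes to reduce to exactly 95 per cent by random selection.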
(ii) It is sometimes claimed as an advantage of fiducial estimation that it is restricted to methods that use 'all the information in the data,' while confidence estimation includes any method giving the requisite frequency interpretation. This claim is lent some support by those accounts of confidence interval theory which use the words 'valid' or 'exact' for a method of calculating intervals that has, under a given mathematical set-up, an exact frequency interpretation, no matter how inadequate the intervals may be in telling us what can be learnt from the data. However, good accounts of the theory of confidence intervals stress equally the need to cover the true value with the required probability and the requirement of having the intervals as narrow as possible.
Very special importance, therefore, attaches to intervals based on exhaustive estimates. It is true that there are differences between the approaches in that the fiducial method takes the use of exhaustive estimates as a primary requirement, whereas in the theory of confidence intervals the use of exhaustive estimates is deduced from some other condition. There does not seem a major difference between the methods here.
(iii) The uniqueness of inferences obtained by the fiducial method has received much discussion recently, [9], [10], [13]. Uniqueness is important because, once the mathematical form of the populations is sufficiently well specified, it should be possible to give a single answer to the question 'what do the data tell us about θ'. Yet too much must not be made of this issue because of the doubts that always arise as to the appropriate theoretical specification.

The present position is that several cases are known where the fiducial method leads to non-unique answers and, so far as I know, none where the confidence intervals, based on exhaustive estimates, are not unique.
It is, of course, entirely possible that a way will be found of formulating fiducial calculations to make them unique. But it is clear that at present there is no ground for preferring the fiducial approach on these considerations.
(iv) If exhaustive estimation is possible for a group of parameters, fiducial inference will usually be possible about any one of them or any combination of them, since the joint fiducial distribution of all the parameters can be found and the unwanted parameters integrated out. Exact confidence estimation is in general possible only for restricted combinations of parameters. An example is the Behrens-Fisher problem, where exact fiducial inference is possible. The situation about confidence estimation in this case is not too clear but is probably that the procedure preferred by Welch, while giving a close approximation to an 'exact' system of confidence intervals, has frequency properties depending slightly on the nuisance parameters.
The possibility that no exact confidence interval solution to the problem exists was recognized by Welch; see also [14]. Somewhat academic is the point that these considerations are all with respect to those idealized conditions in which sufficient statistics exist; however, within this framework the fiducial technique is certainly the more powerful.
(v) The final consideration concerns the question of frequency verification. R. A. Fisher has repeatedly stated that the object of fiducial inference is not to make statements that will be correct with given frequency in the long run. One may agree with this in that one really wants to measure the uncertainty corresponding to different ranges of values for θ, and it is quite conceivable that one could construct a satisfactory measure of uncertainty that has not a frequency interpretation. Yet one must surely insist on some pretty clear-cut practical meaning for the measure of uncertainty, and this fiducial probability has never been shown to have. J. W. Tukey's recent work on fiducial probability and its frequency verification should be referred to here.

It seems, therefore, that with some shifts of emphasis, the theory of confidence intervals is adequate to deal with the problem of interval estimation.
5. Significance tests. Suppose now that we have a null hypothesis H₀ concerning the population or populations from which the data S were drawn and that we enquire 'what do the data tell us concerning the possible truth or falsity of H₀?' Adopt as a measure of consistency with the null hypothesis

prob (data showing as much or more evidence against H₀ as S; H₀).   (**)

That is, we calculate, at least approximately, the actual level of significance attained by the data under analysis, and use this as a measure of conformity with the null hypothesis. Significance tests are often used in practice in this way, although many formal accounts of the theory of tests suggest, implicitly or explicitly, quite a different procedure. Namely, we should, after considering the consequences of wrongly accepting and rejecting the null hypothesis, and the prior knowledge about the situation, fix a significance level in advance of the data. This is then used to form a rigid dividing line between samples for which we 'accept the null hypothesis' and those for which we 'reject the null hypothesis.' A decision-type test of this sort is clearly something quite different from the application contemplated here.
Two aspects of significance tests will be discussed briefly here. First there is the question of when significance tests are useful* and secondly there is the justification of (**) as a measure of conformity. The discussion is restricted to the testing of simple hypotheses about unknown parameters, with or without nuisance parameters. For example, if θ is the mean of a normal population, we consider tests of θ = 0, but not of θ ≤ 0.
Perhaps the most frequent type of application of significance tests, at any rate in technological work, is in situations where the null hypothesis is almost certainly false and where, moreover, we have no particular reason to think that it is even approximately true. For example, in the comparison of two alternative industrial processes we would usually be certain that an experiment of sufficient sensitivity would show there to be some real difference between the processes in whatever property is of interest. The significance test is concerned with whether we can, from the data under analysis, claim the existence of a difference. Or, to look at the matter slightly differently, the significance level tells us at what levels the confidence intervals for the true difference include only values with the same sign as the sample difference. This idea, that the significance level is concerned with the possibility that the true effect may be in the opposite direction from that observed, occurs in a different way in [5].
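The duality mentioned here can be made concrete (my numerical illustration, with an assumed normally distributed difference): the two-sided significance level p of a difference d with standard error s is exactly the level at which the central confidence interval, at confidence 1 − p, has one end at zero.

```python
# Significance level as the confidence level at which the interval for the
# true difference just excludes zero (assumed normal estimate d with s.e. s).
from math import erfc, sqrt

def two_sided_p(d, s):
    """Two-sided normal-theory significance level of the difference d."""
    return erfc(abs(d / s) / sqrt(2.0))

def central_interval(d, s, conf):
    """Interval d +/- z s with Phi(z) = (1 + conf)/2; z by bisection to stay
    dependency-free."""
    target = (1.0 + conf) / 2.0
    lo_z, hi_z = 0.0, 10.0
    for _ in range(60):
        mid = 0.5 * (lo_z + hi_z)
        if 0.5 * erfc(-mid / sqrt(2.0)) < target:   # Phi(mid) < target
            lo_z = mid
        else:
            hi_z = mid
    z = 0.5 * (lo_z + hi_z)
    return d - z * s, d + z * s

d, s = 2.5, 1.0                      # hypothetical difference and standard error
p = two_sided_p(d, s)                # about 0.012
lo, hi = central_interval(d, s, 1.0 - p)
print(p, lo, hi)                     # the 1 - p interval has its lower end at zero
```

Any interval at confidence below 1 − p contains only values with the sign of the observed difference, as the text states.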
Hardly ever is the answer to the significance test the only thing we should consider: whether or not significance is attained at an interesting level (say at least at the 10% level), some consideration should be given to whether differences that may exist are of practical importance, i.e. estimation should be considered as well as significance testing. A possible exception to this is in the analysis of very limited amounts of data. Here it can often be taken for granted that differences of practical importance are consistent with the data, the point of the statistical analysis being to see whether the direction of any effects has been reasonably well established.

* F. J. Anscombe [1] has recently given a very interesting discussion of this.
The problem dealt with by a significance test, as just considered, is different from that of deciding which of two treatments is to be recommended for future use. This cannot be tackled without careful consideration of the differences of practical importance, the losses consequent on wrong decisions and the prior knowledge.
Another type of application of significance tests is to situations where there is a definite possibility that the null hypothesis is very nearly true. (Exact truth of a null hypothesis is unlikely except in a genuine uniformity trial.) A full analysis of such a situation would involve consideration of what departure from the null hypothesis is considered of practical importance. However, it is often convenient to test the null hypothesis directly; if significant departure from it is obtained, consideration must then be given to whether the departure is of practical importance. Of course, we probably in any case will wish to examine the problem as one of estimation as well as of significance testing.
Consider now the choice of (**) as the quantity to measure significance. To use the definition, we need to order the points of the sample space in terms of the evidence they provide against the null hypothesis. There are two ways of doing this. The first, and most satisfactory, is the introduction, as in the usual development of the Neyman-Pearson theory, of the requirement of maximum sensitivity in the detection of certain types of departure from the null hypothesis. That is, we wish, in the simplest case, to maximize, if possible for all fixed θ,

prob (attaining significance at the α level; θ),

where θ represents a set-up which we desire to distinguish from the null hypothesis. This leads, in simple cases, to a unique specification of the significance probability (**).
A second method of ordering the sample points to determine (**), which leads to a unique answer for discrete distributions, involves the null hypothesis itself and uses no appeal to the notion of alternative hypotheses. We say that the sample point S₁ shows as much or more evidence against H₀ than the sample point S₂ if and only if

prob (S₁; H₀) ≤ prob (S₂; H₀),

and hence calculate (**) by summing over all points with probability, under the null hypothesis, less than or equal to that of the observed point. We may call this a test of pure consistency with the null hypothesis as opposed to the previous type, which we may call a test of specific discrimination. It is clear that if we are in a position to specify what type of alternative we wish to detect, it will be much better to use the first type of test. In some standard cases, the two methods give identical answers.
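The pure-consistency ordering is simple to state in code. The sketch below (example distribution assumed, not from the paper) computes (**) for a discrete distribution by summing the null probabilities of all points no more probable than the one observed.

```python
# Pure-consistency significance probability for a discrete null distribution:
# sum the null probabilities of all sample points whose probability is less
# than or equal to that of the observed point.
def pure_consistency_p(observed, null_probs):
    """null_probs maps each sample point to its probability under H0."""
    p_obs = null_probs[observed]
    return sum(p for p in null_probs.values() if p <= p_obs)

# hypothetical discrete null distribution on the points 0..4
h0 = {0: 0.60, 1: 0.25, 2: 0.10, 3: 0.04, 4: 0.01}
print(pure_consistency_p(3, h0))  # 0.04 + 0.01 = 0.05
```

No alternative hypothesis enters the calculation, which is exactly what distinguishes this from a test of specific discrimination.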
The next question to consider is why we sum over a whole set of sample points rather than work in terms only of the observed point. This has been discussed. The advantage of (**) is that it has a clear-cut physical interpretation in terms of the formal scheme of acceptance and rejection contemplated in the Neyman-Pearson theory. To obtain a measure depending only on the observed sample point, it seems necessary to take the likelihood ratio, for the observed point, of the null hypothesis versus some conventionally chosen alternative (see [3]), and the practical meaning that can be given to this is much less clear.
But consider a test of the following discrete null hypotheses H₀, H₀′:

Sample value      0     1     2     3     4
prob. under H₀   0.80  0.15  0.04  0.01  0.00
prob. under H₀′  0.75  0.15  0.05  0.05  0.00

and suppose that the alternatives are the same in both cases and are such that the probabilities (**) should be calculated by summing probabilities over the upper tails of the two distributions. Suppose further that the observation 2 is obtained; under H₀ the significance level is 0.05, while under H₀′ it is 0.10. Yet it is difficult to see why we should say that our observation is more connected with H₀′ than with H₀; this point has often been made before, [2], [9].
On the other hand, if we are really interested in the confidence interval type of problem, i.e. in covering ourselves against the possibility that the 'effect' is in the direction opposite to that observed, the use of the tail area seems more reasonable.
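With the probabilities as reconstructed in the table above, the point of the example can be checked directly: the observed value 2 has nearly equal probability under the two hypotheses (0.04 against 0.05), yet the summed upper-tail significance levels differ by a factor of two.

```python
# Tail-area check of the discrete example (probabilities as in the table above).
h0  = [0.80, 0.15, 0.04, 0.01, 0.00]   # prob. of sample values 0..4 under H0
h0p = [0.75, 0.15, 0.05, 0.05, 0.00]   # prob. under H0'

def upper_tail(probs, observed):
    """Summed probability of the observed value and everything above it."""
    return sum(probs[observed:])

print(upper_tail(h0, 2), upper_tail(h0p, 2))   # 0.05 under H0, 0.10 under H0'
```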
As noted in [3], the use of likelihood ratios rather than summed probabilities avoids difficulties connected with the choice of the sample space Σ. We are faced with a conflict between the mathematical and logical advantages of the likelihood ratio, and the desire to calculate quantities with a clear practical meaning in terms of what happens when the methods are used. Further discussion of this is necessary.
6. Other questions about populations. The preceding sections have dealt briefly with inference procedures for interval estimation and for significance tests. There are numerous other questions that may be asked about the populations sampled and it would be of value to have inference procedures for answering them. For example there are the problems of selection, e.g. that of choosing from a set of treatments a small group having desirable properties. Decision solutions of this and other similar problems are known; methods that measure the uncertainty connected with these situations do not seem available.
Again the problem of discrimination, i.e. of assigning an individual to one of two (or more) groups, is usually answered as a decision problem; that is, we specify a rule for classifying a new individual into its appropriate group (we may include a 'doubtful' group as one possible answer). An inference solution would measure the strength of evidence in favor of the individual being in one or other group. The natural way to do this seems to be to quote the (log) likelihood ratio for group I versus group II.
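As a minimal sketch of this inference solution (normal groups with known parameters assumed, my illustration, not from the paper), the quoted quantity is simply the difference of log densities at the new observation.

```python
# Log likelihood ratio for group I versus group II, for a new individual x,
# with each group taken as a normal population of known mean and s.d.
from math import log, pi

def normal_log_pdf(x, mean, sd):
    """Log density of N(mean, sd^2) at x."""
    return -0.5 * log(2.0 * pi) - log(sd) - 0.5 * ((x - mean) / sd) ** 2

def log_likelihood_ratio(x, mean1, sd1, mean2, sd2):
    """Positive values favor group I, negative values favor group II."""
    return normal_log_pdf(x, mean1, sd1) - normal_log_pdf(x, mean2, sd2)

# hypothetical groups: I ~ N(0, 1), II ~ N(3, 1); new individual at x = 1
print(log_likelihood_ratio(1.0, 0.0, 1.0, 3.0, 1.0))  # 1.5, mildly favoring group I
```

Unlike a classification rule, the number itself is reported, so the strength of evidence is measured rather than forced into a decision.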
7. The role of the assumptions. The most important general matter connected with inference not discussed so far concerns the role of the assumptions made in calculating significance, etc. Only a very brief account of this matter will be given here; I do not feel competent to give the question the searching discussion that it deserves.
Assumptions that we make, such as those concerning the form of the populations sampled, are always untrue, in the sense that, for example, enough observations from a population would surely show some systematic departure from say the normal form. There are two devices available for overcoming this difficulty:

(i) the idea of nuisance parameters, i.e. of inserting sufficient unknown parameters into the functional form of the population, so that a good approximation to the true population can be attained;
(ii) the idea of robustness (or stability), i.e. that we may be able to show that the answer to the significance test or estimation procedure would have been essentially unchanged had we started from a somewhat different population form. Or, to put it more directly, we may attempt to say how far the population would have to depart from the assumed form, to change the final conclusions seriously. This leaves us with a statement that has to be interpreted qualitatively in the light of prior information about distributional shape, plus the information, if any, to be gained from the sample itself. This procedure is frequently used in practical work, although rarely made explicit.
In inference for a single population mean, examples of (i) are, in order of complexity, to assume
(a) a normal population of unknown dispersion;
(b) a population given by the first two terms of an Edgeworth expansion;
(c) in the limit, an arbitrary population (distribution-free procedure).
The last procedure has obvious attractions, but it should be noted that it is not possible to give a firm basis for choice between numerous alternative methods without bringing in strong assumptions about the power properties required, and also that it often happens that no reasonable distribution-free method exists for the problem of interest. Thus if we are concerned with the difference between the means of two populations of different and unknown shapes and dispersions, no distribution-free method is known that is not palpably artificial. For these reasons, and others - for example the assumption of independence - distribution-free methods are not a full solution of the difficulty.
An artificial example of method (ii) is that if we were given a single observation from a normal population and asked to assess the significance of the difference from zero, we could plot the level attained against the population standard deviation &#963;. Then we could interpret this qualitatively in the light of whatever prior information about &#963; was available.
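To make this plot concrete: for an assumed normal population with known standard deviation sigma, the attained two-sided level of a single observation x is 2{1 - Phi(|x|/sigma)}. A minimal Python sketch follows; the observation x = 1.8 and the grid of sigma values are invented for illustration, not taken from the text.

```python
from statistics import NormalDist

def attained_level(x, sigma):
    """Two-sided significance level of a single observation x from an
    assumed normal population with mean mu and known s.d. sigma,
    testing mu = 0."""
    z = abs(x) / sigma
    return 2.0 * (1.0 - NormalDist().cdf(z))

x = 1.8  # a hypothetical single observation
for sigma in (0.5, 1.0, 2.0, 4.0):
    print(f"sigma = {sigma:3.1f}  attained level = {attained_level(x, sigma):.4f}")
```

Plotting these levels against sigma gives exactly the curve the text describes, to be read off against whatever prior knowledge of sigma is available.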
A less artificial example concerns the comparison of two sample variances. The ratio might be shown to be highly significant by the usual F test, and a rough calculation made to show that, provided that neither population kurtosis exceeded a certain value, significance at least at, say, the 1 per cent level would still occur.
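A rough sensitivity calculation of this kind can be mimicked by simulation. The sketch below is a minimal illustration, not the author's method: the sample size, replication count, and the contaminated-normal (longer-tailed) alternative are all invented. It estimates the upper 1 per cent point of the two-sample variance ratio under a normal population and under a longer-tailed one, showing how the critical point for the usual F test shifts with population shape.

```python
import random
import statistics

def var_ratio_upper_point(draw, n=10, reps=5000, q=0.99, seed=1):
    """Monte Carlo estimate of the upper q-point of the ratio of two
    sample variances, both samples of size n drawn from the population
    sampled by draw(rng)."""
    rng = random.Random(seed)
    ratios = []
    for _ in range(reps):
        a = [draw(rng) for _ in range(n)]
        b = [draw(rng) for _ in range(n)]
        ratios.append(statistics.variance(a) / statistics.variance(b))
    ratios.sort()
    return ratios[int(q * reps) - 1]

def normal(rng):
    return rng.gauss(0.0, 1.0)

def contaminated(rng):
    # longer-tailed population (positive kurtosis): 10% chance of tripled scale
    return rng.gauss(0.0, 3.0 if rng.random() < 0.1 else 1.0)

print("upper 1% point, normal populations:      ",
      round(var_ratio_upper_point(normal), 2))
print("upper 1% point, longer-tailed populations:",
      round(var_ratio_upper_point(contaminated), 2))
```

The longer-tailed population gives a markedly larger upper point, which is the familiar anti-robustness of the variance-ratio test to kurtosis; an observed ratio would have to exceed the larger value for the stated significance to survive the change of population form.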
The choice between methods (i) and (ii) depends on
(a) the extent to which our prior knowledge limits the population form;
(b) the amount of information in the data about the population characteristic that may be used as a nuisance parameter;
(c) the extent to which the final conclusion is sensitive to the particular population characteristic of interest.
Thus, in (a), if we have a good idea of the population form, we are probably not much interested in the fact that a distribution-free method has certain desirable properties for distributions quite unlike those we expect to encounter. To comment on (b), we would probably not wish to studentize with respect to a population characteristic about which hardly any information was contained in the sample, e.g. an estimate of variance with one or two degrees of freedom. In small-sample problems there is frequently little information about population shape contained in the data.
Finally there is consideration (c). If the final conclusion is very stable under changes of distributional form, it is usually convenient to take the most appropriate simple theoretical form as a basis for the analysis and to use method (ii).
Now it is very probable that in many instances investigation would show that the same answer would, for practical purposes, result from the alternative types of method we have been discussing. But suppose that in a particular instance there is disagreement, e.g. that the result of applying a t test were to differ materially from that of applying some distribution-free procedure. What would we do?
It seems to me that, even if we have no good reason for expecting a normal population, we would not be willing to accept the distribution-free answer unconditionally. A serious difference between the results of the two tests would usually indicate that the conclusion we draw about the population mean depends on the population shape in an important way, e.g. depends on the attitude we take to certain outlying observations in the sample. For a full discussion of the data it seems more satisfactory to state this and to assemble whatever evidence is available about distributional form, rather than simply to use the distribution-free approach. Distribution-free methods are, however, often very useful in small-sample situations where little is known about population form and where elaborate discussion of the results would be out of place.
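Such a disagreement is easy to exhibit numerically. The following sketch is an invented illustration (the data, including the single gross outlier, are made up): it compares the one-sample t statistic with an exact two-sided sign test for the median. The outlier drags the sample mean to zero and wipes out the t statistic, while the sign test, which looks only at the signs of the observations, remains clearly significant.

```python
from math import comb, sqrt
import statistics

def t_statistic(xs, mu0=0.0):
    """One-sample t statistic for testing mean = mu0."""
    n = len(xs)
    return (statistics.mean(xs) - mu0) / (statistics.stdev(xs) / sqrt(n))

def sign_test_p(xs, mu0=0.0):
    """Exact two-sided sign test p-value for median = mu0,
    using the binomial(n, 1/2) distribution of the sign count."""
    pos = sum(x > mu0 for x in xs)
    n = sum(x != mu0 for x in xs)
    k = min(pos, n - pos)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)

# invented sample: nine moderate positive values and one gross negative outlier
data = [0.8, 1.1, 0.9, 1.3, 0.7, 1.0, 1.2, 0.9, 1.1, -9.0]
print("t statistic :", round(t_statistic(data), 3))
print("sign test p :", round(sign_test_p(data), 4))
```

Here the two procedures point in opposite directions precisely because the conclusion depends on the attitude taken to the outlying observation, which is the situation the text says should be stated openly rather than resolved by defaulting to either test.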
Clearly much more discussion of these problems is needed.