REVISED VERSION
WHEN IS A DISTRIBUTION DETERMINED
BY ITS LETTER VALUES?
Donald St. P. Richards **
Department of Statistics
University of North Carolina
Chapel Hill, N.C.
27514
Rameshwar D. Gupta *
Division of Mathematics,
Computer Science & Engineering
University of New Brunswick
St. John, New Brunswick, Canada
ABSTRACT
Suppose that the differences between the successive letter values
of a continuous symmetric distribution form a geometric
progression. Under very simple regularity assumptions it is shown
that the distribution is uniquely determined. As an application,
a statistic based only on the letter values is constructed for
testing the hypothesis that a data set is drawn from any given,
completely specified distribution.
AMS 1980 Subject Classification: Primary 62G10, Secondary 62G25.
Key words and phrases: Letter values, hypothesis testing,
exploratory data analysis, Laplace distribution, uniform
distribution.
*Partially supported by the National Scientific and Engineering
Research Council, grant no. A-4850.
**Partially supported by a grant from the Research Council of
the University of North Carolina.
1. INTRODUCTION
In recent years, exploratory methods for analyzing data have
become widespread among statisticians. Witnesses to this trend
include the books [1], [2] and the large number of references
therein. There are several reasons for the popularity of the new
methods and we shall mention one of the most important. For
exploratory purposes, many of the summary statistics for a batch
of data are computed using simple sorting and counting rules.
Consequently, these summaries can be quite robust; that is, a
large variation in a small portion of the data set causes only a
small change in the value of the summary statistic. In general,
exploratory summaries are particularly useful for investigators
who need to quickly ascertain important information about data
sets, such as the presence of outliers.
On the other hand, classical summaries such as the sample mean
and variance, while markedly nonrobust, can be used to identify
the distribution underlying sets of data. For example, if the
mean of a sample of independent observations is normally
distributed then it is well known that the parent population is
necessarily normal. Similar results for exploratory summaries are
rare. Indeed, the purpose of this paper is to show that one class
of exploratory statistics, the letter values, may be used to
identify the parent population for certain continuous
distributions. As an important consequence, we show that the
letter values may also be used to test the hypothesis that a data
set is drawn from a given, completely specified distribution.
2. LETTER VALUES
Table: Relationship between letter values and tail areas
for continuous distributions

Label     Tag     Tail Area
$x_0$     M       $1/2$
$x_1$     F       $1/2^2$
$x_2$     E       $1/2^3$
$x_3$     D       $1/2^4$
$x_4$     C       $1/2^5$
If $F(x)$ is an absolutely continuous distribution function, then
the $k$-th upper letter value $x_k$ is the unique solution of the
equation

$$1 - F(x_k) = \frac{1}{2^{k+1}}, \qquad k = 0, 1, 2, \ldots \tag{1}$$

Thus, $x_0$ is the median, $x_1$ is called the upper fourth, $x_2$
the upper eighth, etc. Similarly the $k$-th lower letter value
$x_{-k}$ is the unique solution of the equation

$$F(x_{-k}) = \frac{1}{2^{k+1}}, \qquad k = 0, 1, 2, \ldots \tag{2}$$

In analogy with the upper fourth, $x_{-1}$ is called the lower
fourth, etc. The tags M (denoting median), F (denoting fourths),
E (denoting eighths), etc., are the classical labels for the
letter values, and are due to Tukey (cf. [2]).
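As an illustration of definitions (1) and (2), the following minimal Python sketch computes the theoretical letter values of a distribution from its quantile function. The helper names `tail_area` and `letter_values`, and the choice of the uniform distribution on (0, 1) as the example, are assumptions made here for illustration and do not come from the paper.

```python
from fractions import Fraction

def tail_area(k: int) -> Fraction:
    """Tail area 1/2**(k+1) attached to the k-th letter value (k = 0 is the median)."""
    return Fraction(1, 2 ** (k + 1))

def letter_values(quantile, depth: int):
    """Return (lower, upper) letter values x_{-k} and x_k for k = 0..depth.

    `quantile` is the inverse cdf p -> F^{-1}(p), 0 < p < 1, supplied by the
    caller (a hypothetical interface chosen for this sketch).
    """
    upper = [quantile(1.0 - float(tail_area(k))) for k in range(depth + 1)]
    lower = [quantile(float(tail_area(k))) for k in range(depth + 1)]
    return lower, upper

TAGS = "MFEDC"  # median, fourths, eighths, and so on

# Uniform distribution on (0, 1): the quantile function is the identity.
lower, upper = letter_values(lambda p: p, depth=4)
for k, tag in enumerate(TAGS):
    print(tag, tail_area(k), lower[k], upper[k])
```

Passing a different quantile function gives the letter values of any other continuous distribution whose inverse cdf is available.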
The original motivation for our work is a problem posed in
[1; p. 56]. There, the reader is asked to find the probability
density function of a continuous distribution whose letter values
are equally spaced; that is, the differences $x_{k+1} - x_k$ are
independent of $k$. Using (1) and (2), it can be shown that the
Laplace density function

$$f(x) = \frac{1}{2\sigma} \exp\left(-\frac{|x-\mu|}{\sigma}\right), \qquad -\infty < x < \infty, \tag{3}$$

where $\sigma > 0$, $-\infty < \mu < \infty$, has equally spaced letter values.
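The equal spacing can be checked numerically. The sketch below assumes the closed form obtained by solving (1) for the Laplace survival function, namely that the k-th upper letter value is mu + sigma * k * ln 2; the function names and the parameter values mu = 1, sigma = 2 are illustrative assumptions.

```python
import math

def laplace_sf(x, mu=0.0, sigma=1.0):
    """Survival function 1 - F(x) of the Laplace density (3), valid for x >= mu."""
    return 0.5 * math.exp(-(x - mu) / sigma)

def laplace_upper_letter_value(k, mu=0.0, sigma=1.0):
    """Solve 1 - F(x) = 2**-(k+1) for x >= mu, giving x_k = mu + sigma*k*ln 2."""
    return mu + sigma * k * math.log(2.0)

# Illustrative parameters (assumed for this sketch).
xs = [laplace_upper_letter_value(k, mu=1.0, sigma=2.0) for k in range(6)]
spacings = [b - a for a, b in zip(xs, xs[1:])]
print(spacings)  # every spacing should equal sigma * ln 2
```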
Another distribution with nicely spaced letter values is the
uniform distribution on $(-1,1)$; here, the spacings
$x_{k+1} - x_k$ form a geometric progression with common ratio
$r = 1/2$. In view of these two examples, it is natural to ask
whether the function (3) is the only probability density having
equally spaced letter values. More generally, if the spacings
$x_{k+1} - x_k$ form a geometric progression, what is the
underlying distribution?
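For the uniform example, solving (1) with 1 - F(x) = (1 - x)/2 gives x_k = 1 - 2^{-k}, and the common ratio 1/2 can be confirmed in a few lines. This is only a sketch; the function name is an assumption introduced for illustration.

```python
def uniform_upper_letter_value(k):
    """Upper letter value of the uniform distribution on (-1, 1).

    From (1): 1 - F(x) = (1 - x)/2 = 2**-(k+1), hence x_k = 1 - 2**-k.
    """
    return 1.0 - 2.0 ** (-k)

xs = [uniform_upper_letter_value(k) for k in range(8)]
spacings = [b - a for a, b in zip(xs, xs[1:])]
ratios = [t / s for s, t in zip(spacings, spacings[1:])]
print(ratios)  # successive spacing ratios
```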
Clearly some symmetry assumptions are needed; indeed, the
standard exponential density

$$f(x) = e^{-x}, \qquad x > 0, \tag{4}$$

has equally spaced upper letter values, while its lower letter
values are not even spaced in geometric progression. Below we
prove that if the density $f(x)$ is symmetric about the median
$x_0$, and has letter values which are spaced according to a
geometric progression, then under minimal regularity assumptions,
$f(x)$ is uniquely determined. Further, we shall derive the
explicit form of $f(x)$.

3. MAIN RESULT
Theorem: Let $X$ be a random variable with density function
$f(x)$. Assume that $f(x)$ is (i) continuous and symmetric about
the median $x_0$, (ii) differentiable everywhere on the range of
$X$, except possibly at $x_0$, and (iii) log-concave and monotonic
decreasing for $x > x_0$. If the spacings between the letter values
of $X$ form a geometric progression,

$$x_{k+1} - x_k = a r^k, \qquad k = 0, 1, 2, \ldots, \tag{5}$$

where $0 < r < 1$, $a > 0$, then up to location and scale,

$$f(x) = \begin{cases} \tfrac{1}{2}\alpha (1 - |x|)^{\alpha - 1}, & -1 < x < 1, \\ 0, & \text{elsewhere}, \end{cases} \tag{6}$$

where $r = 1/2^{1/\alpha}$, $1 \le \alpha < \infty$.

Before proving this result, let us note some implications. First,
the case $r = 1/2$ or $\alpha = 1$ corresponds to the uniform
distribution on $(-1,1)$. The theorem remains valid if the
hypothesis (iii) is replaced by a weaker concavity assumption for
$x > x_0$. To recover the case $r = 1$, let $Y = \alpha X$; then
the density function of $Y$ converges to the Laplace density (3)
(with $\mu = 0$, $\sigma = 1$) as $\alpha \to \infty$ or as
$r \to 1$.

Proof of the Theorem: Since $0 < r < 1$, then the sum
$\sum_{k=0}^{\infty} a r^k = a/(1-r)$. Hence, the range of $X$ is a
finite interval. By shifting and scaling $X$, we may assume
without loss of generality that the range of $X$ is $(-1,1)$, with
median $x_0 = 0$, so that $a = 1 - r$.
Next, let $F(x)$ be the cdf of $X$. By the mean value theorem,
there exists $u_k$ in the interval $(x_k, x_{k+1})$ such that

$$f(u_k) = \frac{F(x_{k+1}) - F(x_k)}{x_{k+1} - x_k} = \frac{b}{(2r)^k}, \qquad b = \frac{1}{4a}, \qquad k \ge 0,$$

the last equality following from (1) and (5). In particular,
$u_k \to 1$ as $k \to \infty$. Define, for $k \ge 0$, the numbers
$v_k$ by $u_k = 1 - e^{-v_k}$. Choose and fix an integer $n > 0$
and let $y = px + q$ be the straight line which passes through the
points $(v_n, \ln f(u_n))$ and $(v_{n+1}, \ln f(u_{n+1}))$. Define

$$g(x) = \ln f(1 - e^{-|x|}) - (p|x| + q), \qquad -\infty < x < \infty.$$

Then, $g(x)$ is symmetric about $x = 0$, satisfies
$g(v_n) = g(v_{n+1}) = 0$, and is differentiable everywhere except
possibly at $x = 0$.
Clearly, the function $h_1(x) = 1 - e^{-x}$ is concave on
$(0,\infty)$. Since $h_2(x) = \ln f(x)$ is concave and monotonic
decreasing then the composite function
$(h_2 \circ h_1)(x) = \ln f(1 - e^{-x})$ is concave on
$(0,\infty)$. Since

$$g'(x) = \begin{cases} [\ln f(1 - e^{-x})]' - p, & x > 0, \\ [\ln f(1 - e^{x})]' + p, & x < 0, \end{cases}$$

then $g(x)$ is also concave on $(0,\infty)$.
Now suppose that $g'(v_n) < 0$. Since $g(x)$ is concave, $g'(x)$
is decreasing, and therefore $g'(x) \le g'(v_n) < 0$ for all
$x > v_n$. In particular, $g(x)$ is strictly decreasing for
$x > v_n$, contradicting $g(v_{n+1}) = 0$. Thus $g'(v_n) \ge 0$,
$n = 0, 1, 2, \ldots$

Next, Rolle's theorem and the concavity of $g(x)$ show that
$g'(x)$ has exactly one zero, $w$, in $(v_n, v_{n+1})$. Since
$g'(w) = 0$ then $0 \ge g'(v_{n+1}) \ge 0$, so that
$g'(v_{n+1}) = 0$ for all $n > 0$. Again by concavity, we have
$g'(x) = 0$ for all $|x| > v_1$; that is, $g(x)$ is constant,

$$g(x) = g(v_1) = 0, \qquad |x| > v_1. \tag{7}$$

Hence, (7) implies

$$f(x) = c(1 - |x|)^{\alpha - 1}, \qquad u_1 \le |x| < 1, \tag{8}$$

where $c$ is a constant, and the log-concavity of $f(x)$ entails
$\alpha \ge 1$. Integrating (8) over $(x_2, 1)$ and $(x_3, 1)$,
applying (1) and simplifying, we even find that
$r = 1/2^{1/\alpha}$ and $c = \alpha/2$.

Finally, it remains to be shown that (8) also holds when
$|x| < u_1$. Since $g(x)$ is concave on $(0, v_1)$, then
$g'(x) \ge g'(v_1) = 0$, $0 < x < v_1$. That is, $g$ is monotone
increasing, and we even have $g(x) \le g(v_1) = 0$,
$0 < x < v_1$. By the preceding arguments, we know that

$$g(x) = \ln f(1 - e^{-x}) - \ln f_\alpha(1 - e^{-x}), \qquad x > 0,$$

where $f_\alpha(x)$ is the function in (6), so $g(x) \le 0$ is
equivalent to

$$f(x) \le f_\alpha(x), \qquad 0 < x < u_1. \tag{9}$$

If strict inequality holds in (9) at $x = t_0$, then the
continuity of $f(x)$ and $f_\alpha(x)$ at $t_0$ guarantees that the
strict inequality holds in an open neighborhood of $t_0$. Then by
integrating (9), we get

$$\int_0^{x_2} f(x)\,dx < \int_0^{x_2} f_\alpha(x)\,dx,$$

contradicting the definition of $x_2$ as the second upper letter
value. Therefore $f(x) = f_\alpha(x)$ on $(0,1)$ and by symmetry
and continuity on all of $(-1,1)$.

If the function $f(x)$ is assumed to be log-convex and monotone
increasing on $(0,1)$, then analogous arguments lead to the
conclusion $f = f_\alpha$ where $0 < \alpha < 1$. Alternatively,
note that the case when the ratio $r > 1$ can be analyzed using
similar methods. Even more is true: if $X$ is the random variable
in our result, and $Y$ is defined by

$$1 - |X| = \frac{1}{1 + |Y|},$$

then $Y$ has range $(-\infty, \infty)$ and the letter values of $Y$
are spaced according to a geometric progression with $r > 1$.
Then a theorem analogous to the main result can be obtained,
characterizing the density functions

$$\frac{\alpha}{2(1 + |x|)^{\alpha + 1}}, \qquad -\infty < x < \infty,$$

where $\alpha > 0$. We leave the precise details to the reader.
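The statements of this section can be checked numerically. The sketch below assumes the closed-form letter values obtained by integrating the two densities: 1 - F(x) = (1/2)(1 - x)^alpha on (0, 1) for the density (6), and 1 - H(x) = (1/2)(1 + x)^(-alpha) on (0, infinity) for the heavy-tailed family. The function names and the choice alpha = 3 are illustrative assumptions, not part of the paper.

```python
import math

def bounded_upper_letter_value(k, alpha):
    """Upper letter value of density (6): (1/2)(1 - x)**alpha = 2**-(k+1)."""
    return 1.0 - 2.0 ** (-k / alpha)

def heavy_upper_letter_value(k, alpha):
    """Upper letter value of alpha / (2 (1+|x|)**(alpha+1)):
    (1/2)(1 + x)**(-alpha) = 2**-(k+1)."""
    return 2.0 ** (k / alpha) - 1.0

def spacing_ratios(xs):
    """Ratios of successive spacings x_{k+1} - x_k."""
    d = [b - a for a, b in zip(xs, xs[1:])]
    return [t / s for s, t in zip(d, d[1:])]

def scaled_density(y, alpha):
    """Density of Y = alpha * X when X has density (6)."""
    return 0.5 * (1.0 - abs(y) / alpha) ** (alpha - 1.0) if abs(y) < alpha else 0.0

alpha = 3.0                      # illustrative value
r = 2.0 ** (-1.0 / alpha)        # common ratio stated in the Theorem
bounded = [bounded_upper_letter_value(k, alpha) for k in range(10)]
heavy = [heavy_upper_letter_value(k, alpha) for k in range(10)]
print(spacing_ratios(bounded)[:3], r)       # ratios should match r < 1
print(spacing_ratios(heavy)[:3], 1.0 / r)   # ratios should match 1/r > 1
print(scaled_density(1.0, 1e6), 0.5 * math.exp(-1.0))  # Laplace limit at y = 1
```

The first spacing of the bounded family equals 1 - r, in agreement with the normalization a = 1 - r used in the proof.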
4. TESTING FOR A COMPLETELY SPECIFIED DISTRIBUTION
For an application of our results, consider the problem of
testing the hypothesis $H_0$ that a random sample
$X_1, X_2, \ldots, X_n$ is drawn from a continuous population whose
distribution function $F(x)$ is completely specified. The
important case where $F(x) = \Phi(x)$, the standard normal
distribution function, has been reviewed in [1; p. 425 ff.].
For the $X_i$ to follow the distribution $F(x)$, it is necessary
and sufficient that the transformed data
$Y_1 = F(X_1), \ldots, Y_n = F(X_n)$ follow the uniform
distribution on $(0,1)$. If $F(x)$ satisfies the assumptions of
our main result, we can base a test of the hypothesis on the
letter values $x_1, x_2, \ldots$ of the transformed data. Thus, we
would fail to reject the null hypothesis if the spacings
$x_{k+1} - x_k$ approximate a geometric progression with
$r = 1/2$. This is a rule of thumb procedure for testing $H_0$.
One possible test statistic for testing $H_0$ is

$$R = \sum_{k=-N}^{N} \left( \frac{x_{k+1} - x_k}{x_k - x_{k-1}} - \frac{1}{2} \right)^2,$$

where $N$ is chosen to equal the smallest $k$ for which
$x_{k+1} - x_k$ is negligible. $H_0$ is to be rejected for large
values of $R$. In work now in progress we are performing
simulation studies of $R$ and other statistics, based only on the
letter values, for testing $H_0$. Results will appear elsewhere.
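A rough computational sketch of the procedure follows. It rests on several assumptions that are not in the paper: the sum is restricted to the upper letter values k = 1, ..., N as a one-sided simplification of R; the depth N is passed in by the caller rather than chosen by the "negligible spacing" rule; and sample letter values are taken as linearly interpolated empirical quantiles, which is only one of several common conventions. The names `empirical_quantile` and `r_statistic` are hypothetical.

```python
import math

def empirical_quantile(sorted_data, p):
    """Linearly interpolated sample quantile of already-sorted data, 0 < p < 1."""
    n = len(sorted_data)
    h = (n - 1) * p
    lo = math.floor(h)
    hi = min(lo + 1, n - 1)
    return sorted_data[lo] + (h - lo) * (sorted_data[hi] - sorted_data[lo])

def r_statistic(data, cdf, depth):
    """One-sided variant of R: sum over k = 1..depth of (spacing ratio - 1/2)**2.

    `cdf` is the completely specified F under H0; the transformed values
    F(X_i) should be uniform on (0, 1) when H0 holds.
    """
    y = sorted(cdf(x) for x in data)
    # Sample upper letter values x_0, ..., x_{depth+1} of the transformed data.
    x = [empirical_quantile(y, 1.0 - 2.0 ** -(k + 1)) for k in range(depth + 2)]
    total = 0.0
    for k in range(1, depth + 1):
        ratio = (x[k + 1] - x[k]) / (x[k] - x[k - 1])
        total += (ratio - 0.5) ** 2
    return total

# Deterministic illustration: an evenly spaced "sample" on [0, 1].
grid = [i / 1000 for i in range(1001)]
r_null = r_statistic(grid, lambda t: t, depth=3)      # hypothesized F is correct
r_alt = r_statistic(grid, lambda t: t * t, depth=3)   # hypothesized F is wrong
print(r_null, r_alt)
```

With the correct hypothesized distribution the statistic is essentially zero on this idealized sample, while the misspecified one gives a clearly positive value; how large a value should lead to rejection is exactly the question the simulation studies mentioned above would need to settle.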
Acknowledgement.
We are grateful to David Ruppert for bringing
to our attention an error in a previous version of this work.
REFERENCES
1. Hoaglin, D.C., Mosteller, F., and Tukey, J.W., eds.
   Understanding Robust and Exploratory Data Analysis.
   New York: Wiley, 1983.
2. Tukey, J.W. Exploratory Data Analysis.
   Reading: Addison-Wesley, 1977.