4 Invariant Statistical Decision Problems
4.1 Invariant decision problems
Let G be a group of measurable transformations from the sample space X into
itself. The group operation is composition. Note that a group must include
the identity transformation e and, for each g ∈ G, the inverse g⁻¹, such that
g⁻¹g = e. Consequently, all transformations in G are one-to-one.
Definition 30. The family of distributions Pθ , θ ∈ Θ, is said to be invariant
under the group G if for every g ∈ G and every θ ∈ Θ there exists a unique
θ′ ∈ Θ such that the distribution of g(X) is given by Pθ′ whenever the
distribution of X is given by Pθ .
This unique θ′ is denoted by ḡ(θ). The meaning of the definition is that
for every real-valued integrable function φ,

Eθ φ(g(X)) = Eḡ(θ) φ(X).
Definition 31. A parameter θ is said to be identifiable if distinct values of
θ correspond to different distributions.
If the family of distributions is invariant under G, then the uniqueness of θ′
implies that θ is identifiable.
Lemma 8. If a family of distributions Pθ , θ ∈ Θ, is invariant under G, then
Ḡ = {ḡ : g ∈ G} is a group of transformations of Θ into itself.
Definition 32. A decision problem, consisting of the game (Θ, A, L) and
the distributions Pθ over X , is said to be invariant under the group G if the
family of distributions is invariant and if the loss function is invariant under
G in the sense that for every g ∈ G and a ∈ A there exists a unique a′ ∈ A
such that

L(θ, a) = L(ḡ(θ), a′ ) for all θ ∈ Θ.

Denote the unique a′ by g̃(a).
Lemma 9. If a decision problem is invariant under a group G, then G̃ =
{g̃ : g ∈ G} is a group of transformations of A into itself.
Example 14. Consider the shift group in the normal estimation problem
with L(θ, a) = (a − θ)2 . Here gc (x) = x + c. Thus, ḡc (θ) = θ + c and
g̃c (a) = a + c.
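As a numerical sanity check, the following sketch verifies the identity
Eθ φ(gc (X)) = Eḡc (θ) φ(X) for this shift group by Monte Carlo; the particular
θ, c, and test function φ are arbitrary illustrative choices, not part of the theory.

```python
import numpy as np

# Monte Carlo check of E_theta[phi(g_c(X))] = E_{theta+c}[phi(X)] for the
# shift group g_c(x) = x + c acting on the N(theta, 1) family.
# theta, c, and phi below are arbitrary illustrative choices.
rng = np.random.default_rng(0)
theta, c = 1.5, 2.0
phi = lambda t: np.abs(t)              # any integrable test function

x = rng.normal(theta, 1.0, 10**6)      # X ~ P_theta = N(theta, 1)
y = rng.normal(theta + c, 1.0, 10**6)  # X ~ P_{gbar(theta)} = N(theta + c, 1)

print(phi(x + c).mean())  # E_theta[phi(g_c(X))]
print(phi(y).mean())      # E_{theta+c}[phi(X)]; agrees up to Monte Carlo error
```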
Example 15. Assume X ∼ B(n, θ). Let L(θ, a) = W (θ − a) for some
even function W . Let G = {e, g}, where g(x) = n − x. The distribution of
g(X) is B(n, 1 − θ). Thus, ḡ(θ) = 1 − θ and g̃(a) = 1 − a.
Example 16. Let X ∼ N (θ1 1, θ22 I) and Θ = {θ = (θ1 , θ2 ) : θ2 > 0}. Let
A = R and let L(θ, a) = (a − θ1 )2 /θ22 . Consider transformations of the
form gb,c (x) = bx + c1, where b ≠ 0. Then ḡb,c (θ) = (bθ1 + c, |b|θ2 ) and
g̃b,c (a) = ba + c.
Example 17. Consider again the situation in Example 16, but with A = {0, 1}
and the loss

L(θ, 0) = 1 if θ1 > 0, and 0 if θ1 ≤ 0;
L(θ, 1) = 1 if θ1 ≤ 0, and 0 if θ1 > 0.

For gb (x) = bx with b > 0, we have ḡb (θ) = bθ and g̃b (a) = a.
4.2 Invariant decision rules
Definition 33. Given an invariant decision problem, a non-randomized
decision rule d ∈ D is said to be invariant under G if for all x ∈ X and all
g ∈ G
d(g(x)) = g̃(d(x)).
A randomized decision rule is invariant if it is a mixture of invariant decision
rules.
Theorem 14. The risk of an invariant decision rule is constant over the
orbits of the group Ḡ.
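A minimal simulation of this fact for the shift group of Example 14: under
squared-error loss the invariant rule d(x) = x has (up to Monte Carlo error) the
same risk at every θ, since the whole real line is a single orbit of Ḡ. The θ
values below are arbitrary.

```python
import numpy as np

# Monte Carlo illustration of Theorem 14 for the shift group of Example 14:
# the invariant rule d(x) = x has the same risk at every theta, because the
# whole real line is a single orbit of G-bar. The theta values are arbitrary.
rng = np.random.default_rng(0)

for theta in [-3.0, 0.0, 2.5, 10.0]:
    x = rng.normal(theta, 1.0, 10**6)        # X ~ N(theta, 1)
    print(theta, np.mean((x - theta) ** 2))  # squared-error risk of d(x) = x; approx. 1.0
```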
4.3 Location and scale parameters
Definition 34. A real parameter θ ∈ Θ is said to be a location parameter
for the distribution of a random variable X if Fθ (x) is a function of x − θ
only.
Lemma 10. For the location parameter:
1. θ is a location parameter for the distribution of X if, and only if, the
distribution of X − θ is independent of θ.
2. If the distributions of X are absolutely continuous with density fθ (x),
then θ is a location parameter if, and only if, fθ (x) = f (x − θ) for
some density function f (x).
Example 18. The normal mean (with known variance), the Cauchy location α
(with known scale β), and the θ in U (θ, θ + 1) are all examples of location
parameters.
Definition 35. A real parameter θ ∈ Θ is said to be a scale parameter for
the distribution of a random variable X if Fθ (x) is a function of x/θ only.
Lemma 11. For the scale parameter:
1. θ is a scale parameter for the distribution of X if, and only if, the
distribution of X/θ is independent of θ.
2. If the distributions of X are absolutely continuous with density fθ (x),
then θ is a scale parameter if, and only if, fθ (x) = (1/θ)f (x/θ) for
some density function f (x).
Example 19. The θ in N (θµ, θ²) (with known µ), the Cauchy scale β (with
known α), the θ in U (0, θ), and the β in a Gamma distribution (with known α)
are all examples of scale parameters.
One can combine both definitions and get a location-scale family with
parameters (µ, σ). Note that the distribution of X − µ is independent of µ,
but the distribution of X/σ is not independent of σ.
Lemma 12. If every nonrandomized invariant rule is an equalizer rule (that
is, it has constant risk), then the nonrandomized invariant decision rules form
an essentially complete subclass of the class of all randomized invariant rules.
Assume Θ = A = R and let L(θ, a) = L(a − θ).
Theorem 15. In the problem of estimating a location parameter with loss
L(θ, a) = L(a − θ), if E0 L(X − b) exists and is finite for some b and if there
exists b0 such that

E0 L(X − b0 ) = inf_b E0 L(X − b),    (1)

(the infimum being over those b for which the expectation exists), then
d0 (x) = x − b0 is a best invariant rule. It has constant risk, equal to the
value in (1).
Example 20. If L(θ, a) = (a − θ)2 and X has a finite variance, then b0 =
E0 (X). If L(θ, a) = |a − θ| and X has a finite first moment, then b0 is the
median of X under P0 .
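The minimizing b0 can also be located numerically. The sketch below estimates
b ↦ E0 L(X − b) by Monte Carlo on a grid; taking P0 to be the exponential(1)
distribution is an arbitrary choice that makes the two minimizers (mean 1 and
median log 2 ≈ 0.693) visibly different.

```python
import numpy as np

# Locating b0 of Theorem 15 numerically: estimate b -> E_0 L(X - b) by Monte
# Carlo on a grid and take the grid minimizer. P_0 = exponential(1) is an
# arbitrary choice (mean 1, median log 2).
rng = np.random.default_rng(0)
x = rng.exponential(1.0, 10**5)        # a large sample from P_0
grid = np.linspace(0.0, 2.0, 201)

risk_sq = [np.mean((x - b) ** 2) for b in grid]    # L(t) = t^2
risk_abs = [np.mean(np.abs(x - b)) for b in grid]  # L(t) = |t|

print(grid[np.argmin(risk_sq)])    # approx. 1.0  = E_0(X)
print(grid[np.argmin(risk_abs)])   # approx. 0.69 = median of X under P_0
```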
Example 21. Let Pθ (X = θ + 1) = Pθ (X = θ − 1) = 0.5 and let

L(θ, a) = |a − θ| if |a − θ| ≤ 1, and 1 if |a − θ| > 1.

Then the best invariant rules are not admissible.
Example 22. In the multidimensional normal location estimation problem
the vector of means is the best invariant estimator. However, it is not admissible for dimension higher than 2, since the estimator:
d(X) = X̄ (1 − (k − 2)/‖X̄‖²)

is better (k is the dimension).
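A Monte Carlo comparison of the two estimators; for simplicity the sketch uses
a single observation X with identity covariance (standing in for X̄), and the
true mean vector θ is an arbitrary choice.

```python
import numpy as np

# Risk comparison for Example 22 with k = 5: the best invariant estimator
# (a single observation X ~ N(theta, I), standing in for X-bar) versus the
# shrinkage estimator d(X) = X (1 - (k - 2)/||X||^2). theta is arbitrary.
rng = np.random.default_rng(0)
k, reps = 5, 10**5
theta = np.full(k, 0.5)

x = rng.normal(theta, 1.0, (reps, k))
shrink = (1.0 - (k - 2) / np.sum(x**2, axis=1, keepdims=True)) * x

print(np.mean(np.sum((x - theta) ** 2, axis=1)))       # risk of X: approx. k = 5
print(np.mean(np.sum((shrink - theta) ** 2, axis=1)))  # strictly smaller for k >= 3
```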
However, typically the best invariant estimate is minimax.
Theorem 16. If an equalizer rule is extended Bayes, it is also a minimax
rule.
This leads to:
Theorem 17. Under the conditions of Theorem 15, if L is bounded below
and if for every ε > 0 there exists an N such that

E0 L(X − b) I{|X| ≤ N} ≥ inf_b E0 L(X − b) − ε    for all b,

then the best invariant rule is minimax.
An example of a location parameter is the normal mean:
Theorem 18. If X1 , . . . , Xn is a sample from a normal distribution with
mean θ and known variance σ 2 , then X̄ is a best invariant estimate of θ and
a minimax estimate of θ, provided that the loss function is a nondecreasing
function of |a − θ| and that E0 L(X̄) exists and is finite.
4.4 Estimation of a distribution function
Let X1 , . . . , Xn be a sample from a continuous distribution F . We estimate
the distribution with F̂ , which is continuous from the right. Two commonly
used loss functions are:
L1 (F, F̂ ) = sup_x |F (x) − F̂ (x)|,    (2)

and

L2 (F, F̂ ) = ∫ (F (x) − F̂ (x))² F (dx).    (3)
A sufficient statistic is (X(1) , . . . , X(n) ), the vector of order statistics. Let
us reduce the collection of rules further by considering only invariant decision
rules. Consider the group of transformations

G = {gψ : gψ (x(1) , . . . , x(n) ) = (ψ(x(1) ), . . . , ψ(x(n) ))},

where ψ ranges over the continuous, strictly increasing functions. Then
ḡψ (F )(x) = F (ψ⁻¹(x)) and g̃ψ (F̂ )(x) = F̂ (ψ⁻¹(x)) for both losses.
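The group action can be seen concretely with the empirical distribution
function, which is equivariant in exactly this sense: recomputing F̂ from the
transformed sample ψ(Xi ) is the same as evaluating the original F̂ at ψ⁻¹(x).
Here ψ(x) = eˣ is an arbitrary strictly increasing choice, with inverse log.

```python
import numpy as np

# Equivariance of the empirical CDF under the group of Section 4.4: for a
# continuous, strictly increasing psi, the ECDF of the transformed sample
# psi(X_i), evaluated at t, equals the original ECDF at psi^{-1}(t).
# psi(x) = exp(x) is an arbitrary choice, with inverse log.
rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, 100)

def ecdf(sample, t):
    # proportion of sample points <= each entry of t
    return np.mean(sample[:, None] <= t, axis=0)

t = np.linspace(0.5, 5.0, 5)   # evaluation points in the range of psi
print(ecdf(np.exp(x), t))      # ECDF of the transformed sample psi(X_i)
print(ecdf(x, np.log(t)))      # original ECDF at psi^{-1}(t); the two lines agree
```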
As part of the homework assignment you should prove that the invariant
decision rules have the form

F̂ (x) = Σ_{i=0}^{n} u_i I_{[X(i) , X(i+1) )} (x),
for −∞ = X(0) < X(1) < · · · < X(n) < X(n+1) = ∞. It turns out that for
the second loss function and for invariant rules

R(F, F̂ ) = E Σ_{i=0}^{n} ∫_{F (X(i) )}^{F (X(i+1) )} (t − u_i )² dt.

Since the F (X(i) ) are distributed as the order statistics of a uniform sample
on (0, 1), each summand can be minimized separately in u_i , and this yields
the minimizer u_i = (i + 1)/(n + 2).
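The minimizer can be checked numerically: simulate the uniform order
statistics U(i) = F (X(i) ), estimate the expected integral on a grid of u, and
compare the grid minimizer with (i + 1)/(n + 2). The choices of n, the number
of replications, and the grid are arbitrary.

```python
import numpy as np

# Numerical check that u_i = (i + 1)/(n + 2): simulate the uniform order
# statistics U_(1) < ... < U_(n) (the F(X_(i))), pad with U_(0) = 0 and
# U_(n+1) = 1, and minimize the estimated expected integral over a grid of u.
rng = np.random.default_rng(0)
n, reps = 4, 10**5
u_sorted = np.sort(rng.uniform(size=(reps, n)), axis=1)
bounds = np.hstack([np.zeros((reps, 1)), u_sorted, np.ones((reps, 1))])

grid = np.linspace(0.0, 1.0, 501)
for i in range(n + 1):
    a, b = bounds[:, i], bounds[:, i + 1]
    # closed form: integral_a^b (t - u)^2 dt = ((b - u)^3 - (a - u)^3) / 3
    risk = [np.mean(((b - u) ** 3 - (a - u) ** 3) / 3.0) for u in grid]
    print(i, grid[np.argmin(risk)], (i + 1) / (n + 2))  # the two should be close
```

For n = 4, for instance, the printed grid minimizers should all be close to
1/6, 2/6, . . . , 5/6.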