Composite Hypotheses and Generalized Likelihood Ratio Tests
Rebecca Willett, 2016
In many real-world problems it is difficult to specify probability distributions precisely. Our models for data may involve unknown parameters or other characteristics. Here are a few motivating examples.
Example: Unknown amplitudes/delays in wireless communications. We don't always know how many relays a signal will go through, how strong the signal will be at each receiver, the distance between relay stations, etc.
Example: Unknown signal amplitudes in functional brain imaging.
Example: Unknown expression levels in gene microarray experiments.
1 Composite Hypothesis Tests
We can represent uncertainty by specifying a collection of possible models for each hypothesis. The collections are indexed by a parameter.
H0 : X ∼ p0 (x|θ0 ), θ0 ∈ Θ0
H1 : X ∼ p1 (x|θ1 ), θ1 ∈ Θ1
• In general, the distributions p0 and p1 may have different parametric forms.
• The sets Θ0 and Θ1 represent the possible values for the parameters.
• If a set contains a single element (i.e., a single value for the parameter), then we have a simple hypothesis, as discussed in past lectures. When a set contains more than one parameter value, the hypothesis is called a composite hypothesis, because it involves more than one model.
The name is even clearer if we consider the following equivalent expression for the hypotheses above.
H0 : X ∼ p0 , p0 ∈ {p0 (x|θ0 )}θ0 ∈Θ0
H1 : X ∼ p1 , p1 ∈ {p1 (x|θ1 )}θ1 ∈Θ1
Example: Brain imaging
Recall the brain imaging problem.
H0 : X ∼ N (0, 1)
H1 : X ∼ N (µ, 1), µ > 0 but otherwise unknown; equivalently X ∼ p, p ∈ {N (µ, 1)}µ>0
In this example, H0 is simple and H1 is composite.
2 Uniformly Most Powerful Tests
Let us begin by considering special cases in which the usual likelihood ratio test is computable and optimal. Here is an example.
H0 : x1 , . . . , xn ∼ N (0, 1) iid
H1 : x1 , . . . , xn ∼ N (µ, 1) iid, µ > 0

Log LRT:

log Λ(x) = log [ ∏_{i=1}^n (1/√(2π)) e^{−(xᵢ−µ)²/2} / ∏_{i=1}^n (1/√(2π)) e^{−xᵢ²/2} ]
= Σ_{i=1}^n [ −(xᵢ − µ)²/2 + xᵢ²/2 ]
= µ Σ_{i=1}^n xᵢ − nµ²/2
Test statistic:

µ Σ_{i=1}^n xᵢ ≷_{H0}^{H1} γ′  ⟺  Σ_{i=1}^n xᵢ ≷_{H0}^{H1} γ′/µ = γ

We were able to divide both sides by µ since µ > 0. We do not need to know the exact value of µ in order to compute the test Σᵢ xᵢ ≷_{H0}^{H1} γ for any value of γ.
Let t = Σ_{i=1}^n xᵢ denote the test statistic. It is easy to determine its distribution(s) under each hypothesis (a composite family in the case of H1 ).

H0 : t ∼ N (0, n)
H1 : t ∼ N (nµ, n), µ > 0 unknown
Since the distribution of t under H0 is known, we can choose the threshold to control P_FA:

P_FA = Q(γ/√n)  ⇒  γ = √n Q^{−1}(P_FA)

This is the optimal detector (most powerful) according to the NP (Neyman–Pearson) lemma. Several ROC curves corresponding to different values of the unknown parameter µ > 0 are depicted below. We cannot know which curve we are operating on, but we can choose a threshold for a desired P_FA and the resulting P_D is the best possible (for the unknown value of µ). In such cases we say that the test is uniformly most powerful, that is, most powerful no matter what the value of the unknown parameter.
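The threshold calculation γ = √n Q^{−1}(P_FA) is easy to check numerically. The sketch below is not part of the original notes: the values n = 25 and P_FA = 0.05 are illustrative, Q is implemented with the complementary error function, its inverse is found by bisection, and the false alarm rate is verified by Monte Carlo.

```python
import math
import random

def Q(x):
    """Standard normal tail probability Q(x) = P(N(0,1) > x)."""
    return 0.5 * math.erfc(x / math.sqrt(2))

def Q_inv(p):
    """Invert the (decreasing) function Q by bisection."""
    lo, hi = -10.0, 10.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if Q(mid) > p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Illustrative values (not from the notes).
n, pfa = 25, 0.05
gamma = math.sqrt(n) * Q_inv(pfa)   # gamma = sqrt(n) * Q^{-1}(P_FA)

# Monte Carlo check: under H0, t = sum_i x_i with x_i ~ N(0,1), so t ~ N(0, n).
random.seed(0)
trials = 20000
false_alarms = sum(
    sum(random.gauss(0.0, 1.0) for _ in range(n)) > gamma for _ in range(trials)
)
print(false_alarms / trials)   # should be near the target P_FA = 0.05
```

Note that the threshold depends only on n and the desired P_FA, never on µ, which is exactly what makes the test computable under the composite H1.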
Figure 1: ROC for various µ > 0 for the simple case.
Definition: Uniformly Most Powerful Test
A uniformly most powerful (UMP) test is a hypothesis test which has the greatest power (i.e., greatest probability of detection) among all possible tests yielding a given false alarm rate, regardless of the underlying true parameter(s).
3 Two-sided Tests
To see how special the UMP condition is, consider the following simple generalization of the testing problems above.

H0 : x ∼ N (0, 1)
H1 : x ∼ N (µ, 1), µ ≠ 0
The log-likelihood ratio statistic is

log Λ(x) = −(x − µ)²/2 + x²/2 = µx − µ²/2

and the log-LRT has the form

µx − µ²/2 ≷_{H0}^{H1} γ′.

We can move the term µ²/2 to the other side and absorb it into the threshold, but this leaves us with a test of the form

µx ≷_{H0}^{H1} γ.

Since µ is unknown (and not necessarily positive), the test is uncomputable.
How can we proceed? Look at the two densities in the microarray experiment. Intuitively the test |x| ≷_{H0}^{H1} γ seems reasonable. This is called the Wald test. Its false alarm probability is

P_FA = 2Q(γ)  ⇒  γ = Q^{−1}(P_FA/2)

and its detection probability is

P_D = ∫_γ^∞ (1/√(2π)) e^{−(x−µ)²/2} dx + ∫_{−∞}^{−γ} (1/√(2π)) e^{−(x−µ)²/2} dx
    = ∫_{γ−µ}^∞ N (0, 1) dy + ∫_{−∞}^{−γ−µ} N (0, 1) dy    (substituting y = x − µ)
    = Q(γ − µ) + Q(γ + µ) .
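The Wald-test operating characteristics derived above can be sketched in a few lines. This is an illustration, not part of the notes: the target P_FA and the grid of µ values are arbitrary, and Q and its inverse are implemented from scratch rather than taken from a statistics library.

```python
import math

def Q(x):
    """Standard normal tail probability."""
    return 0.5 * math.erfc(x / math.sqrt(2))

def Q_inv(p):
    """Invert the decreasing function Q by bisection."""
    lo, hi = -10.0, 10.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if Q(mid) > p else (lo, mid)
    return 0.5 * (lo + hi)

pfa = 0.05                      # illustrative target false alarm rate
gamma = Q_inv(pfa / 2)          # Wald test threshold: P_FA = 2 Q(gamma)
for mu in (0.5, 1.0, 2.0, 3.0):
    pd = Q(gamma - mu) + Q(gamma + mu)   # detection probability from the derivation
    print(mu, round(pd, 3))
```

As expected, P_D grows with |µ| and collapses to P_FA as µ → 0, since Q(γ − 0) + Q(γ + 0) = 2Q(γ).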
Figure 2: The P_D depends on µ, which is unknown.
Model µ as a deterministic, but unknown, parameter. Estimate µ from the data and plug the estimate into the LRT. Under H1 the distribution is X ∼ N (µ, 1), so a natural estimate for µ is µ̂ = x, the observation itself. Plugging this into the likelihood ratio yields

Λ̂(x) = p(x|µ̂)/p(x|0) = exp(−(x − µ̂)²/2)/exp(−x²/2) = e^{x²/2}.

This is the generalized likelihood ratio. In effect, this compares the best-fitting model in the composite hypothesis H1 with the model of H0. Taking the log yields the test

log Λ̂(x) = x²/2 ≷_{H0}^{H1} γ,

which is equivalent to the Wald test.
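As a quick numerical sanity check of the plug-in construction (a sketch; the test point x = 1.7 is arbitrary):

```python
import math

def glr(x):
    """Generalized likelihood ratio for H0: N(0,1) vs H1: N(mu,1), mu unknown.
    The MLE of mu under H1 is mu_hat = x itself."""
    mu_hat = x
    num = math.exp(-((x - mu_hat) ** 2) / 2)   # p(x | mu_hat), up to the common (2*pi)^{-1/2}
    den = math.exp(-(x ** 2) / 2)              # p(x | 0), up to the same constant
    return num / den

x = 1.7                                             # arbitrary test point
assert math.isclose(math.log(glr(x)), x ** 2 / 2)   # log GLR = x^2/2, a monotone function of |x|
```

Because x²/2 is monotone in |x|, thresholding the log GLR is the same decision rule as thresholding |x|.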
4 The Generalized Likelihood Ratio Test (GLRT)
Consider a composite hypothesis test of the form
H0 : X ∼ p0 (x|θ0 ), θ0 ∈ Θ0
H1 : X ∼ p1 (x|θ1 ), θ1 ∈ Θ1
The parametric densities p0 and p1 need not have the same form.
The generalized likelihood ratio test (GLRT) is a general procedure for composite testing problems. The basic idea is to compare the best model in class H1 to the best in H0, which is formalized as follows.
Definition: Generalized Likelihood Ratio Test (GLRT)
The GLRT based on an observation x of X is

Λ̂(x) = max_{θ1∈Θ1} p1 (x|θ1 ) / max_{θ0∈Θ0} p0 (x|θ0 ) ≷_{H0}^{H1} γ,

or equivalently

log Λ̂(x) ≷_{H0}^{H1} γ.
Example: We observe a vector X ∈ Rⁿ and consider two hypotheses:

H0 : X ∼ N (0, σ²Iₙ)
H1 : X ∼ N (Hθ, σ²Iₙ),

where σ² > 0 is known, θ ∈ R^k is unknown, and H ∈ R^{n×k} is known and full rank. The mean vector Hθ is a model for a signal that lies in the k-dimensional subspace spanned by the columns of H (e.g., a narrowband subspace, polynomial subspace, etc.). In other words, the signal has the representation

s = Σ_{i=1}^k θᵢ hᵢ ,  H = [h₁ , . . . , h_k ].

The null hypothesis is that no signal is present (noise only).
The log likelihood ratio is

log Λ(x) = −(1/(2σ²)) [ (x − Hθ)ᵀ(x − Hθ) − xᵀx ]
         = −(1/(2σ²)) ( −2θᵀHᵀx + θᵀHᵀHθ ).

Thus we may consider the test

θᵀHᵀx ≷_{H0}^{H1} γ,

but this test is not computable without knowledge of θ. Recall that

H1 : x ∼ N (Hθ, σ²I), θ ∈ R^k  ⟺  H1 : x ∼ p1 , p1 ∈ {N (Hθ, σ²I)}_{θ∈R^k} .

We want to pick the p1 in {N (Hθ, σ²I)} that matches x the best:

p(x|θ, H1) = (2πσ²)^{−n/2} exp( −(1/(2σ²)) (x − Hθ)ᵀ(x − Hθ) ).
Find the θ that maximizes the likelihood of observing x:

θ̂ = arg min_{θ∈R^k} (x − Hθ)ᵀ(x − Hθ) = arg min_{θ∈R^k} ‖x − Hθ‖².

Setting the gradient with respect to θ to zero,

∂/∂θ ( xᵀx − 2θᵀHᵀx + θᵀHᵀHθ ) = 0
⇒ −2Hᵀx + 2HᵀHθ = 0
⇒ θ̂ = (HᵀH)^{−1}Hᵀx.

Plugging θ̂ into the test statistic θᵀHᵀx, we have

θ̂ᵀHᵀx = xᵀH(HᵀH)^{−1}Hᵀx ≷_{H0}^{H1} γ.

Recall that the projection matrix onto the subspace is defined as P_H := H(HᵀH)^{−1}Hᵀ, so

log Λ̂(x) = (1/(2σ²)) xᵀP_H x = (1/(2σ²)) ‖P_H x‖².

Thus the GLRT computes the energy in the signal subspace, and if the energy is large enough, then H1 is accepted. In other words, we are taking the projection of x onto the column space of H and measuring its energy. The expected value of this energy under H0 (noise only) is

E_{H0} ‖P_H X‖² = kσ²,

since a fraction k/n of the total noise energy nσ² falls into this subspace.
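The matched-subspace computation can be sketched end to end. The matrix H below (a constant column plus a linear ramp, i.e., a degree-1 polynomial subspace with n = 6, k = 2) is purely illustrative and not from the notes; the code forms θ̂ = (HᵀH)^{−1}Hᵀx, evaluates the subspace energy xᵀP_H x, and checks by simulation that its mean under H0 is ≈ kσ².

```python
import random

# Illustrative subspace (not from the notes): n = 6, k = 2,
# H spans constant + linear-ramp signals (a degree-1 polynomial subspace).
n, k, sigma2 = 6, 2, 1.0
H = [[1.0, float(i + 1)] for i in range(n)]

def Ht_times(v):
    """Compute H^T v (a length-k vector)."""
    return [sum(H[i][j] * v[i] for i in range(n)) for j in range(k)]

def inv2(M):
    """Inverse of a 2 x 2 matrix."""
    a, b, c, d = M[0][0], M[0][1], M[1][0], M[1][1]
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def subspace_energy(x):
    """||P_H x||^2 = x^T H (H^T H)^{-1} H^T x, via theta_hat = (H^T H)^{-1} H^T x."""
    Htx = Ht_times(x)
    HtH = [[sum(H[i][r] * H[i][c] for i in range(n)) for c in range(k)] for r in range(k)]
    HtH_inv = inv2(HtH)
    theta_hat = [sum(HtH_inv[r][c] * Htx[c] for c in range(k)) for r in range(k)]
    return sum(theta_hat[j] * Htx[j] for j in range(k))   # theta_hat^T (H^T x) = x^T P_H x

# Under H0 (noise only), E ||P_H X||^2 = k * sigma^2.
random.seed(1)
avg = sum(subspace_energy([random.gauss(0.0, 1.0) for _ in range(n)])
          for _ in range(5000)) / 5000
print(round(avg, 2))   # should be near k * sigma2 = 2
```

A useful sanity check: a vector that already lies in the subspace (e.g., the all-ones column of H) keeps all of its energy under the projection.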
The performance of the subspace energy detector can be quantified as follows. We choose a γ for the desired P_FA :

(1/σ²) xᵀP_H x ≷_{H0}^{H1} γ

What is the distribution of xᵀP_H x under H0 ? First use the decomposition P_H = UUᵀ, where U ∈ R^{n×k} has orthonormal columns spanning the column space of H, and let y := Uᵀx. Then

(1/σ²) xᵀP_H x = (1/σ²) xᵀUUᵀx = (1/σ²) yᵀy

y ∼ N (0, σ²UᵀU) ≡ N (0, σ²I_k)
yᵢ/σ ∼ N (0, 1) iid, i = 1, . . . , k
⇒ yᵀy/σ² ∼ χ²_k , chi-squared with k degrees of freedom.

Under H0 ,

(1/σ²) xᵀP_H x ∼ χ²_k  ⇒  P_FA = P(χ²_k > γ).

To calculate the tails of χ²_k distributions you can use software such as Matlab (chi2cdf(x,k), chi2inv(p,k)). Remember the mean of a χ²_k distribution is k, so we want to choose a γ bigger than k to produce a small P_FA.
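If chi2cdf/chi2inv are not at hand, the tail of χ²_k has a simple closed form when k is even (it equals a Poisson tail sum), which is enough to pick a threshold by bisection. The sketch below is illustrative and not from the notes (even k only; the values k = 4 and P_FA = 0.01 are arbitrary).

```python
import math

def chi2_tail(g, kdof):
    """P(chi2_kdof > g) for EVEN kdof, via the chi-squared/Poisson identity:
    for kdof = 2m, the tail equals exp(-g/2) * sum_{j=0}^{m-1} (g/2)^j / j!."""
    assert kdof % 2 == 0, "closed form used here holds for even degrees of freedom"
    half = g / 2.0
    return math.exp(-half) * sum(half ** j / math.factorial(j) for j in range(kdof // 2))

def chi2_threshold(pfa, kdof):
    """Solve P(chi2_kdof > gamma) = pfa by bisection (the tail is decreasing in gamma)."""
    lo, hi = 0.0, 500.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if chi2_tail(mid, kdof) > pfa else (lo, mid)
    return 0.5 * (lo + hi)

k = 4                                # illustrative subspace dimension
gamma = chi2_threshold(0.01, k)
print(round(gamma, 2), gamma > k)    # the threshold exceeds the mean k, as the notes suggest
```

For k = 2 the identity reduces to P(χ²_2 > γ) = e^{−γ/2}, so the threshold is γ = −2 ln P_FA, a handy closed-form check.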
5 Wilks' Theorem

Wilks' Theorem (1938)
Consider a composite hypothesis testing problem

H0 : X1 , X2 , . . . , Xn ∼ p(x|θ0 ) iid, where θ_{0,1} , . . . , θ_{0,ℓ} ∈ R are free parameters and θ_{0,ℓ+1} = a_{ℓ+1} , . . . , θ_{0,k} = a_k are fixed at the values a_{ℓ+1} , . . . , a_k
H1 : X1 , X2 , . . . , Xn ∼ p(x|θ1 ) iid, θ1 ∈ R^k all free parameters,

and the parametric density has the same form in each hypothesis.

In this case the family of models in H0 is a subset of those in H1 , and we say that the hypotheses are nested. (This is a key condition that must hold for this theorem.)

If the 1st and 2nd order derivatives of p(x|θᵢ ) with respect to θᵢ exist and if E[ ∂ log p(x|θᵢ )/∂θᵢ ] = 0 (which guarantees that the MLE θ̂ᵢ → θᵢ as n → ∞), then the generalized likelihood ratio statistic, based on an observation X = (X1 , . . . , Xn ),

Λ̂n (X) = max_{θ1} p(x|θ1 ) / max_{θ0} p(x|θ0 )        (1)

has the following asymptotic distribution under H0 :

2 log Λ̂n (X) →_D χ²_{k−ℓ} as n → ∞,

i.e., 2 log Λ̂n (X) converges in distribution to a chi-squared variable with k − ℓ degrees of freedom.

Proof: (Sketch) Under the conditions of the theorem, the log GLRT tends to the log GLRT in a Gaussian setting according to the Central Limit Theorem (CLT).
Example: Nested Condition

H0 : xᵢ ∼ N (0, 1) iid
H1 : xᵢ ∼ N (0, σ²) iid, i = 1, 2, . . . , n, σ² > 0 unknown

Log LR:

log Λ(x) = Σ_{i=1}^n [ −(1/2) log σ² − xᵢ²/(2σ²) + xᵢ²/2 ]

MLE of σ² :

σ̂² = (1/n) Σ_{i=1}^n xᵢ²

Log GLR under H0 :

log Λ̂(x) = Σ_{i=1}^n [ −(1/2) log σ̂² − xᵢ²/(2σ̂²) + xᵢ²/2 ] = (n/2) ( σ̂² − log σ̂² − 1 )

Here k = 1 and ℓ = 0, so by Wilks' theorem 2 log Λ̂(x) ∼ χ²_1 as n → ∞.
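Wilks' theorem is easy to check by simulation for this example: with σ̂² = (1/n)Σᵢ xᵢ², the statistic 2 log Λ̂ = n(σ̂² − log σ̂² − 1) should behave like a χ²_1 draw under H0 for large n. A sketch with illustrative sample sizes (not part of the notes):

```python
import math
import random

def two_log_glr(xs):
    """2 log GLR for H0: x_i ~ N(0,1) vs H1: x_i ~ N(0, sigma^2).
    With s = (1/n) sum x_i^2 (the MLE of sigma^2) it equals n*(s - log s - 1)."""
    n = len(xs)
    s = sum(x * x for x in xs) / n
    return n * (s - math.log(s) - 1.0)

# Simulate the statistic under H0; sample sizes are illustrative.
random.seed(0)
n, trials = 200, 4000
stats = [two_log_glr([random.gauss(0.0, 1.0) for _ in range(n)]) for _ in range(trials)]
mean = sum(stats) / trials
print(round(mean, 2))   # the mean of a chi2_1 variable is 1
```

The empirical mean should be close to 1, and the fraction of simulated statistics exceeding the 0.95 quantile of χ²_1 (≈ 3.84) should be close to 0.05.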