IMST 2008 / FIM XVI - Marquette University

Decision theoretic Bayesian hypothesis testing
with focus on skewed alternatives
By
Naveen K. Bansal
Ru Sheng
Marquette University
Milwaukee, WI 53051
Email: [email protected]
http://www.mscs.mu.edu/~naveen/
Probability Model: \(f(x;\theta)\)
\(H_0: \theta = 0\) (say) vs. \(H_{-1}: \theta < 0\), \(H_1: \theta > 0\)
Directional Error (Type III error):
Sarkar and Zhou (2008, JSPI)
Shaffer (2002, Psychological Methods)
Finner (1999, AS)
Lehmann (1952, AMS; 1957, AMS)
The main point of these works is that if the objective is to find the true direction of the alternative after rejecting the null, then the Type III error must also be controlled.
The Type III error is defined as P(false directional decision | the null is rejected). The traditional two-sided test does not control the directional error. For example, if the test rejects \(H_0\) when \(|t| > t_{\alpha/2}\) and concludes \(\theta > 0\) when \(t > t_{\alpha/2}\), a directional error occurs if in fact \(\theta < 0\).
Probability Model: \(f(x;\theta)\)
\(H_0: \theta = 0\) (say), \(H_{-1}: \theta < 0\), \(H_1: \theta > 0\)

Prior:
\[ \pi(\theta) = p_{-1}\,\pi_{-1}(\theta) + p_0\,\pi_0(\theta) + p_1\,\pi_1(\theta) \]
where
\[ \pi_{-1}(\theta) = g_{-1}(\theta)\, I(\theta < 0), \quad \pi_0(\theta) = I(\theta = 0), \quad \pi_1(\theta) = g_1(\theta)\, I(\theta > 0), \]
\(g_{-1}(\cdot)\) = density with support contained in \((-\infty, 0)\),
\(g_1(\cdot)\) = density with support contained in \((0, \infty)\).

Action space: \(A = \{-1, 0, 1\}\), with \(L(\theta, a)\) = loss for taking action \(a \in A\).
The Bayes risk for a decision rule \(\delta\) is given by
\[ r_\pi(\delta) = p_{-1}\, r_{-1}(\delta) + p_0\, r_0(\delta) + p_1\, r_1(\delta) \]
where
\[ r_{-1}(\delta) = \int_{\theta < 0} R(\theta, \delta)\, \pi_{-1}(\theta)\, d\theta, \quad r_1(\delta) = \int_{\theta > 0} R(\theta, \delta)\, \pi_1(\theta)\, d\theta, \quad r_0(\delta) = R(0, \delta). \]
For a fixed prior \(\pi\), decision rules can be compared by comparing the set
\[ S(\pi) = \{ (r_{-1}(\delta),\, r_0(\delta),\, r_1(\delta)) : \delta \in D^* \}, \]
where \(D^*\) is the class of all rules \(\delta\) for which \(R(0, \delta)\) (the Type I error) is the same.
[Figure: lower boundary of \(S(\pi)\) in the \((r_{-1}(\delta), r_1(\delta))\) plane for a fixed Type I error. The Bayes rule lies where a supporting line touches the boundary; the slope depends on \(p_{-1}\) and \(p_1\), so the Bayes rules for \(p_1 > p_{-1}\) and \(p_1 < p_{-1}\) sit at different points of the boundary.]
Consider the following two priors:
\[ \pi(\theta) = p_{-1}\,\pi_{-1}(\theta) + p_0\,\pi_0(\theta) + p_1\,\pi_1(\theta), \]
\[ \pi'(\theta) = p'_{-1}\,\pi_{-1}(\theta) + p_0\,\pi_0(\theta) + p'_1\,\pi_1(\theta), \]
with \(p_{-1} > p'_{-1}\) (i.e., \(H_{-1}\) is more likely under \(\pi\) than under \(\pi'\)).

Theorem 1: If \(p_{-1} > p'_{-1}\), and if \(\delta_B\) and \(\delta'_B\) are the Bayes rules under \(\pi\) and \(\pi'\) respectively, then
\[ r_{-1}(\delta_B) \le r_{-1}(\delta'_B) \quad \text{and} \quad r_1(\delta_B) \ge r_1(\delta'_B). \]

Remark: This theorem implies that if it is known a priori that \(H_{-1}\) is more likely than \(H_1\) (\(p_{-1} > p_1\)), then the average risk of the Bayes rule in the negative direction will be smaller than its average risk in the positive direction.
Relationship with the False Discovery Rates

Consider the "0-1" loss:
\[ L(\theta, -1) = \begin{cases} 0 & \theta < 0 \\ 1 & \theta \ge 0 \end{cases} \qquad L(\theta, 0) = \begin{cases} 0 & \theta = 0 \\ 1 & \theta \ne 0 \end{cases} \qquad L(\theta, 1) = \begin{cases} 0 & \theta > 0 \\ 1 & \theta \le 0 \end{cases} \]
The risk is then
\[ R(\theta, \delta) = \begin{cases} P_\theta(\delta \ne -1) & \theta < 0 \\ P_\theta(\delta \ne 0) & \theta = 0 \\ P_\theta(\delta \ne 1) & \theta > 0 \end{cases} \]
so that
\[ r_{-1}^B(\delta) = \int_{\theta < 0} P_\theta(\delta \ne -1)\, \pi_{-1}(\theta)\, d\theta = E[P(\delta \ne -1 \mid H_{-1})] = 1 - E[P(\delta = -1 \mid H_{-1})], \]
which is the expected false non-negative discovery rate in the Bayesian framework. Similarly, \(r_1^B(\delta)\) can be interpreted as the expected false non-positive discovery rate, and \(r_0^B(\delta)\) as the expected false non-discovery rate. \(r_\pi^B(\delta)\) is the weighted average of these false discoveries.
Multiple Hypotheses Problem

\(\{X_i,\ i = 1, 2, \ldots, m\}\) independent data with
\[ X_i \sim f(x_i; \theta_i), \quad i = 1, 2, \ldots, m, \]
\[ H_0^i: \theta_i = 0, \quad H_{-1}^i: \theta_i < 0, \quad H_1^i: \theta_i > 0, \]
\[ \theta_i \sim p_{-1}\,\pi_{-1}(\theta) + p_0\,\pi_0(\theta) + p_1\,\pi_1(\theta), \quad i = 1, 2, \ldots, m, \ \text{independent}, \]
with \(\delta_B^i \in \{-1, 0, 1\}\) the non-randomized Bayes rule for the \(i\)th problem.
An Application:
An experiment was conducted to study the effect of knocking down a sequence from non-coding genes. The hypothesis was that this knockdown would cause overexpression of the mRNAs of the coding genes. The implication is that this would show how the non-coding genes interact with coding genes; in other words, non-coding genes also play a part in protein synthesis.
Here it can be assumed that the effect of the knockdown on the coding genes (if there is any) would mostly be overexpression of mRNAs rather than underexpression. The objective would be to detect as many of the overexpressed genes as possible.
Table 1: Possible outcomes from m hypothesis tests

                  Accept H_0   Accept H_{-1}   Accept H_1   Total
H_0 is true           U             V_1            V_2       m_0
H_{-1} is true       T_1            S_1            W_2       m_1
H_1 is true          T_2            W_1            S_2       m_2
Total                R_0            R_1            R_2        m

\[ FDR_{-1} = E\left[\frac{V_1 + W_1}{R_1}\right], \qquad pFDR_{-1} = E\left[\frac{V_1 + W_1}{R_1} \,\Big|\, R_1 > 0\right], \]
\[ FDR_1 = E\left[\frac{V_2 + W_2}{R_2}\right], \qquad pFDR_1 = E\left[\frac{V_2 + W_2}{R_2} \,\Big|\, R_2 > 0\right] \]
(Storey, 2003, AS)
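These quantities are easy to estimate by simulation. A minimal sketch (my own; the three-part prior uses half-normal tails as in the talk's example, and a simple two-sided z-cutoff stands in for the directional rule \(\delta\)):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 10_000, 10
p = np.array([0.39, 0.39, 0.22])              # (p_{-1}, p_0, p_1), assumed

# True hypothesis labels and theta_i drawn from the three-part prior
h = rng.choice([-1, 0, 1], size=m, p=p)
half = np.abs(rng.normal(size=m))             # half-normal magnitudes
theta = np.where(h == -1, -half, np.where(h == 1, half, 0.0))
xbar = rng.normal(theta, 1 / np.sqrt(n))      # mean of n N(theta_i, 1) obs

# Illustrative directional rule: two-sided z-test with directional call
z = np.sqrt(n) * xbar
d = np.where(z < -1.96, -1, np.where(z > 1.96, 1, 0))

R1, R2 = np.sum(d == -1), np.sum(d == 1)
pFDR_m1 = np.sum((d == -1) & (h != -1)) / max(R1, 1)   # (V1 + W1) / R1
pFDR_p1 = np.sum((d == 1) & (h != 1)) / max(R2, 1)     # (V2 + W2) / R2
print(pFDR_m1, pFDR_p1)
```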
Theorem 2: Suppose that m hypothesis tests are based on test statistics \(T_1, T_2, \ldots, T_m\). Let \(\Gamma_{-1}\) and \(\Gamma_1\) be the regions of accepting \(H_{-1}^i\) and \(H_1^i\), respectively, for each \(i = 1, 2, \ldots, m\). Assume that
\[ T_i \mid H^i \overset{iid}{\sim} I(H_{-1}^i)\, F_{-1} + I(H_0^i)\, F_0 + I(H_1^i)\, F_1, \]
where \(F_{-1}, F_0, F_1\) are the marginal distributions of \(T_i\) with respect to \(\pi_{-1}, \pi_0,\) and \(\pi_1\) respectively. If \((H_{-1}^i, H_0^i, H_1^i)\), \(i = 1, 2, \ldots, m\), are independent and identical trinomial trials, then
\[ pFDR_{-1}(\Gamma_{-1}) = P(H_0 \mid T \in \Gamma_{-1}) + P(H_1 \mid T \in \Gamma_{-1}) \]
and
\[ pFDR_1(\Gamma_1) = P(H_0 \mid T \in \Gamma_1) + P(H_{-1} \mid T \in \Gamma_1). \]
This theorem and our previous discussion of the Bayes decision rule imply that, in the independent hypothesis testing setting, the Bayes rule obtained from the single test problem
\[ H_0: \theta = 0 \ \text{vs.} \ H_{-1}: \theta < 0, \ H_1: \theta > 0, \]
which minimizes the average expected false discoveries, can be used for the multiple hypotheses problem. The false discovery rates \(pFDR_{-1}\) and \(pFDR_1\) of this procedure can be obtained by computing the posterior probabilities:
\[ pFDR_{-1}(\delta_B) = P(H_0 \mid \delta_B = -1) + P(H_1 \mid \delta_B = -1), \]
\[ pFDR_1(\delta_B) = P(H_0 \mid \delta_B = 1) + P(H_{-1} \mid \delta_B = 1). \]
The prior probabilities \(p_{-1}\) and \(p_1\) determine whether the positive hypothesis \(H_1\) or the negative hypothesis \(H_{-1}\) will be more frequently correctly detected.
Computation of Bayes Rule:
\[ \delta_B(X) = \Big\{ i \in \{-1, 0, 1\} : E[L(\theta, i) \mid X] = \min_{j \in \{-1, 0, 1\}} E[L(\theta, j) \mid X] \Big\} \]
\[ \pi(\theta \mid X) = \pi(H_{-1} \mid X)\, \pi(\theta \mid H_{-1}, X) + \pi(H_0 \mid X)\, I(\theta = 0) + \pi(H_1 \mid X)\, \pi(\theta \mid H_1, X) \]
Here \(\pi(H_{-1} \mid X)\), \(\pi(H_0 \mid X)\), and \(\pi(H_1 \mid X)\) are the posterior probabilities of \(H_{-1}\), \(H_0\), and \(H_1\) respectively, and \(\pi(\theta \mid H_{-1}, X)\) and \(\pi(\theta \mid H_1, X)\) are the posterior densities of \(\theta\) under \(\pi_{-1}(\theta)\) and \(\pi_1(\theta)\) respectively.
The computation involves computing \(\pi(H_{-1} \mid X)\), \(\pi(H_0 \mid X)\), and \(\pi(H_1 \mid X)\), and computing \(E[L(\theta, -1) \mid H_1, X]\) and \(E[L(\theta, 1) \mid H_{-1}, X]\).
Choice of the loss function:
\[ L(\theta, 0) = \begin{cases} 0 & \theta = 0 \\ q_0(\theta) & \theta \ne 0 \end{cases} \qquad L(\theta, 1) = \begin{cases} 0 & \theta > 0 \\ q_1(\theta) & \theta \le 0 \end{cases} \qquad L(\theta, -1) = \begin{cases} 0 & \theta < 0 \\ q_{-1}(\theta) & \theta \ge 0 \end{cases} \]
For the "0-1" loss, \(q_0(\theta) = q_1(\theta) = q_{-1}(\theta) \equiv 1\).
A good choice would be
\[ q_0(\theta) = l_0\, (1 \wedge q(\theta)), \quad q_1(\theta) = l_1\, (1 \wedge q(\theta)), \quad q_{-1}(\theta) = l_{-1}\, (1 \wedge q(\theta)), \]
where \(l_0\), \(l_1\), and \(l_{-1}\) are some non-negative constants.
Here, \(q(\theta)\) reflects the magnitude of the departure from the null. One good choice of \(q(\theta)\) is the Kullback-Leibler divergence
\[ q(\theta) = E_\theta[\log(f(X, \theta) / f(X, 0))]. \]
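As a quick check (mine, not from the slides): in the \(N(\theta, 1)\) model of the later example, this choice gives
\[ q(\theta) = E_\theta\left[\log \frac{f(X, \theta)}{f(X, 0)}\right] = E_\theta\left[\theta X - \theta^2/2\right] = \theta^2/2, \]
which indeed grows with the departure of \(\theta\) from the null.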
If q( )  q ( )  q ( ) 1, then we have a "0 -1" loss.
1
1
Theorem 3 : Under t he "0 -1" loss, the Bayes rule is given by
 B  {i : pi f (X | Hi )  max( p j f (X | H j ), j  1,0,1)}
Here f (X | H j ) is the marginal density X under H j .
In other words, reject H
0
if
f (X | H ) p
f (X | H )
0  1 and
0 
f (X | H ) p
f (X | H )
1
0
1
Otherwise, select H
p
1.
p
0
( H ) if
1 1
p f (X | H )  () p f (X | H )
1
1
1
1
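The rule is just an argmax over weighted marginal likelihoods, so it is one line of code. A minimal sketch (function and argument names are mine):

```python
def bayes_rule(f_m1, f0, f1, p_m1, p0, p1):
    """Theorem 3's '0-1'-loss Bayes action: argmax_j p_j * f(X | H_j).

    f_m1, f0, f1 are the marginal densities of the observed data
    under H_{-1}, H_0, and H_1, e.g. from posterior_probs above.
    """
    scores = {-1: p_m1 * f_m1, 0: p0 * f0, 1: p1 * f1}
    return max(scores, key=scores.get)
```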
Example:
Probability Model: \(N(\theta, 1)\)
\[ H_0: \theta = 0 \ \text{vs.} \ H_{-1}: \theta < 0, \ H_1: \theta > 0 \]
Prior: \(\pi(\theta) = p_{-1}\,\pi_{-1}(\theta) + p_0\, I(\theta = 0) + p_1\,\pi_1(\theta)\), where
\(\pi_{-1}\): left-half \(N(0, 1)\) distribution,
\(\pi_1\): right-half \(N(0, 1)\) distribution.
Theorem 4: Under the "0-1" loss, the Bayes rule rejects \(H_0\) if
\[ T_{-1}(\bar{x}) > p_0 / p_{-1} \quad \text{or} \quad T_1(\bar{x}) > p_0 / p_1; \]
if \(H_0\) is rejected, select \(H_{-1}\) (\(H_1\)) if \(p_{-1}\, T_{-1}(\bar{x}) > (<)\ p_1\, T_1(\bar{x})\), where
\[ T_{-1}(\bar{x}) = \frac{2}{\sqrt{n+1}} \exp\left(\frac{n^2 \bar{x}^2}{2(n+1)}\right) \Phi\left(\frac{-n \bar{x}}{\sqrt{n+1}}\right) \]
and
\[ T_1(\bar{x}) = \frac{2}{\sqrt{n+1}} \exp\left(\frac{n^2 \bar{x}^2}{2(n+1)}\right) \Phi\left(\frac{n \bar{x}}{\sqrt{n+1}}\right). \]
Note that \(T_{-1}\) and \(T_1\) are monotonically decreasing and increasing functions, respectively. Thus the rejection region is equivalent to
\[ \bar{x} < T_{-1}^{-1}(p_0 / p_{-1}) \quad \text{or} \quad \bar{x} > T_1^{-1}(p_0 / p_1). \]
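These closed forms make the rule easy to implement. A minimal sketch (mine; \(\bar{x}\) is the mean of the \(n\) observations):

```python
import numpy as np
from scipy import stats

def T(xbar, n, sign):
    """T_{-1} (sign = -1) and T_1 (sign = +1) from Theorem 4."""
    return (2 / np.sqrt(n + 1)) \
        * np.exp(n**2 * xbar**2 / (2 * (n + 1))) \
        * stats.norm.cdf(sign * n * xbar / np.sqrt(n + 1))

def bayes_decision(xbar, n, p_m1, p0, p1):
    t_m1, t_p1 = T(xbar, n, -1), T(xbar, n, +1)
    if t_m1 <= p0 / p_m1 and t_p1 <= p0 / p1:
        return 0                                   # accept H_0
    return -1 if p_m1 * t_m1 > p1 * t_p1 else 1   # directional decision

# e.g. bayes_decision(-0.8, n=10, p_m1=0.39, p0=0.39, p1=0.22) -> -1
```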
Power Comparison:
When \(p_{-1} = p_1\), the rejection region is that of the UMP unbiased test.
Bayes test with \(p_{-1} = 0.39,\ p_0 = 0.39,\ p_1 = 0.22\): blue line;
UMP unbiased test: red dotted line.
[Figure: probability of rejection vs. mean value, for \(\theta \in (-0.5, 0.5)\).]
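Curves like these can be reproduced by Monte Carlo; a rough sketch (mine, reusing the bayes_decision function above and assuming n = 10, which the slides do not state):

```python
import numpy as np

rng = np.random.default_rng(1)
n, reps = 10, 5000
thetas = np.linspace(-0.5, 0.5, 21)
power = [np.mean([bayes_decision(rng.normal(th, 1 / np.sqrt(n)), n,
                                 p_m1=0.39, p0=0.39, p1=0.22) != 0
                  for _ in range(reps)])
         for th in thetas]
```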
Bayes test with \(p_{-1} = 0.22,\ p_0 = 0.39,\ p_1 = 0.39\): blue line;
UMP unbiased test: red dotted line.
[Figure: probability of rejection vs. mean value, for \(\theta \in (-0.5, 0.5)\).]
What if \(p_{-1}\), \(p_0\), and \(p_1\) are unknown?
One can consider a Dirichlet prior. It is easy to see that the only difference would be to replace \(p_{-1}\), \(p_0\), and \(p_1\) by \(E(p_{-1} \mid X)\), \(E(p_0 \mid X)\), and \(E(p_1 \mid X)\) respectively.
For the normal problem, we get the following result.
Theorem 5: Under the "0-1" loss, the Bayes rule rejects \(H_0\) if
\[ T_{-1}(\bar{x}) > \frac{E(p_0 \mid X)}{E(p_{-1} \mid X)} \quad \text{or} \quad T_1(\bar{x}) > \frac{E(p_0 \mid X)}{E(p_1 \mid X)}; \]
if \(H_0\) is rejected, select \(H_{-1}\) (\(H_1\)) if \(E(p_{-1} \mid X)\, T_{-1}(\bar{x}) > (<)\ E(p_1 \mid X)\, T_1(\bar{x})\), where \(T_{-1}\) and \(T_1\) are as in Theorem 4.
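The posterior means \(E(p_j \mid X)\) have no closed form here, but they can be approximated straightforwardly; one simple option (my sketch, not the authors' algorithm) is importance sampling over the Dirichlet prior, reusing the per-test marginal densities:

```python
import numpy as np

def posterior_mean_p(F, alpha=(1.0, 1.0, 1.0), draws=20_000, seed=2):
    """Approximate E(p | X) under a Dirichlet(alpha) prior on
    (p_{-1}, p_0, p_1).

    F is an (m, 3) array whose i-th row holds the marginal densities
    (f(x_i|H_{-1}), f(x_i|H_0), f(x_i|H_1)).  Draw p from the prior
    and weight by the mixture likelihood prod_i (F_i . p).
    """
    rng = np.random.default_rng(seed)
    P = rng.dirichlet(alpha, size=draws)           # (draws, 3)
    logw = np.log(F @ P.T).sum(axis=0)             # log-likelihood per draw
    w = np.exp(logw - logw.max())                  # stabilized weights
    return (w[:, None] * P).sum(axis=0) / w.sum()
```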
Another Approach:
Consider the prior
\[ \pi(\theta) = p_0\, I(\theta = 0) + (1 - p_0)\, \pi'(\theta), \]
where \(\pi'(\theta)\) is a skewed prior; for example, the skew-normal
\[ \pi'(\theta) = \frac{2}{\sigma}\, \phi\left(\frac{\theta}{\sigma}\right) \Phi\left(\frac{\lambda \theta}{\sigma}\right) I(\theta \ne 0). \]
Here \(\lambda\) reflects the skewness in the \(\theta\) values, and \(\sigma\) the dispersion in the \(\theta\) values.
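Under this prior the posterior null probability is again a one-dimensional integral. A minimal sketch (mine, assuming the same \(N(\theta, 1)\) sampling model with sample mean \(\bar{x}\); parameter names follow the skew-normal form above):

```python
import numpy as np
from scipy import stats, integrate

def post_null_prob(xbar, n, p0, lam, sigma):
    """pi(H_0 | x) under pi = p0*I(0) + (1 - p0)*skew-normal(lam, sigma)."""
    sd = 1 / np.sqrt(n)
    f0 = stats.norm.pdf(xbar, 0, sd)
    f1, _ = integrate.quad(
        lambda t: (2 / sigma) * stats.norm.pdf(t / sigma)
                  * stats.norm.cdf(lam * t / sigma)
                  * stats.norm.pdf(xbar, t, sd),
        -np.inf, np.inf)
    return p0 * f0 / (p0 * f0 + (1 - p0) * f1)
```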
Power Comparison of a skew-normal Bayes test with the UMP test
[Figure: probability of rejection vs. mean value, for \(\theta \in (-0.5, 0.5)\).]