Biostatistics 2063, Aug 28

Bayesian Inference and Bayesian Decision Analysis
Overview of Bayesian calculation:
A) Define the problem
Goal: Get the joint distribution of everything random & relevant.
Method: Define notation, factor joint distribution, calculate each part.
Notation:

  Role                              Symbol   Interpretation
  What is known                     X        Observed data; anything else that is known.
  Unknown, we don't care about it   λ        Nuisance parameter; choice of model; choice of prior.
  Unknown, we do care about it      θ        Parameter; future observable; asymptotic observable.

Joint distribution:
  [X, θ, λ] = [θ, λ] [X | θ, λ]
B) Apply Bayes “theorem”: Calculate the (posterior) distribution of the quantity of interest:
  Role                              Symbol   What to do with it
  What is known                     X        Condition on it!
  Unknown, we don't care about it   λ        Integrate it out!
  Unknown, we do care about it      θ        Learn from the posterior distribution! Make a decision!

(no nuisance λ):
  posterior  [θ | X]  =  [X | θ] [θ] / [X]          (model × prior / marginal)

(nuisance λ is present):
  posterior  [θ | X]  =  ∫ [X, θ, λ] dλ  /  ∫∫ [X, θ, λ] dθ dλ
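To make step B concrete, here is a small numerical sketch under an assumed illustrative model (mine, not from the notes): put θ and a nuisance λ on grids, compute the joint [X, θ, λ], and sum λ out to get the posterior [θ | X].

```python
# Sketch: posterior on a grid, with the nuisance parameter integrated (summed) out.
# Assumed model: X | θ, λ ~ Normal(θ, λ); prior θ ~ Normal(0, 1); λ uniform on a few SDs.
import numpy as np
from scipy.stats import norm

X = 1.3                                     # observed datum
theta = np.linspace(-4, 4, 401)             # grid for the parameter of interest
lam = np.array([0.5, 1.0, 2.0])             # grid for the nuisance (error SD)

prior = norm.pdf(theta, 0, 1)[:, None] * (np.ones(lam.size) / lam.size)   # [θ, λ]
model = norm.pdf(X, loc=theta[:, None], scale=lam)                        # [X | θ, λ]
joint = prior * model                                                     # [X, θ, λ]

posterior = joint.sum(axis=1)                            # integrate out λ
posterior /= posterior.sum() * (theta[1] - theta[0])     # divide by the marginal [X]
print("posterior mean of θ:", (theta * posterior).sum() * (theta[1] - theta[0]))
```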
[A note on the joint distribution of everything: we seem to be combining two different kinds of probability. (What are they?) Whether that's legitimate is subject to philosophical critique and defense. If you become especially interested, see the work of Abner Shimony, and of Lane and Sudderth (but not yet).]
C) Decision-making:
Goal: Given a posterior distribution [θ | X], choose the best action to take.
Method: compare the Bayes expected loss of the various actions:
C.1 Specify the loss function L(θ, a), defined for θ in Θ and a in A (action a, action space A), representing the loss ensuing if you take action a when the true state of nature is θ.
C.2 Calculate the Bayes expected loss:
  ρ(a | X) = E[L(θ, a) | X] = ∫ L(θ, a) [θ | X] dθ
C.3 Choose the Bayes action:
  aB = arg min over a in A of ρ(a | X)
All this is done after taking into account all current knowledge (both the current data and the prior).
An entire PLAN for making decisions is called a decision rule: for each possible data set X, the plan δ tells us what action δ(X) to take.
A decision rule δB that agrees with the Bayes action for each X is called a Bayes rule: each δB(X) is a Bayes action.
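A minimal sketch of steps C.1–C.3 and of a Bayes rule δB, using the two-state labels (S/H, P/N, T/W) of the example on the next page; the numbers and function names are illustrative, not from the notes.

```python
# Sketch: Bayes action = argmin over actions of the posterior expected loss;
# a Bayes rule is that choice made for every possible data set X.

def bayes_action(posterior, loss, actions):
    """posterior: dict θ -> Pr(θ | X); loss: dict (θ, a) -> L(θ, a)."""
    expected_loss = {a: sum(posterior[t] * loss[(t, a)] for t in posterior) for a in actions}
    return min(expected_loss, key=expected_loss.get)     # the Bayes action aB

loss = {("H", "T"): 1.0, ("S", "T"): 0.0,                # illustrative loss table L(θ, a)
        ("H", "W"): 0.0, ("S", "W"): 20.0}

posteriors = {"P": {"S": 0.09, "H": 0.91},               # assumed Pr(θ | X = P)
              "N": {"S": 0.001, "H": 0.999}}             # assumed Pr(θ | X = N)

# The Bayes rule: one Bayes action for each possible data set X.
delta_B = {X: bayes_action(post, loss, actions=("T", "W")) for X, post in posteriors.items()}
print(delta_B)                                           # {'P': 'T', 'N': 'W'} with these numbers
```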
Example: the unknown is in Θ = {S, H}, the data is in X = {P, N}, the action is in A = {T, W}.

  Θ = parameter space = {unknown states of nature}:   θ = S or θ = H
  X = sample space = {possible observed outcomes}:    X = P or X = N

Interpretation:
  θ = S   "patient is sick"        θ = H   "patient is healthy"
  X = P   "test is positive"       X = N   "test is negative"
  A = T   "decide to treat"        A = W   "decide to wait"
Prior = disease prevalence = Pr(θ = S)

Model     = { Sensitivity = Pr(X = P | θ = S),  Specificity = Pr(X = N | θ = H) }

Posterior = { Predictive value of P = Pr(θ = S | X = P),  Predictive value of N = Pr(θ = H | X = N) }

Factoring the joint both ways:
  [X, θ] = [θ] [X | θ]     (joint = prior × model; i.e., marg'l × cond'l)
  [X, θ] = [X] [θ | X]     (joint = normalizer × posterior; i.e., marg'l × cond'l)

The set product Θ × X has 4 points.
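A minimal numerical sketch of this 2×2 calculation (function and variable names are mine; the numbers match the worked example below):

```python
# Sketch: posterior (predictive values) from prior (prevalence) and model (sens, spec),
# built from the joint distribution on the 4 points of Θ × X.

def predictive_values(prevalence, sensitivity, specificity):
    """Return (PPV, NPV) = (Pr(S | P), Pr(H | N)) for a binary test."""
    joint = {
        ("S", "P"): prevalence * sensitivity,              # [θ] [X | θ]
        ("S", "N"): prevalence * (1 - sensitivity),
        ("H", "P"): (1 - prevalence) * (1 - specificity),
        ("H", "N"): (1 - prevalence) * specificity,
    }
    pr_P = joint[("S", "P")] + joint[("H", "P")]           # marginal [X = P]
    pr_N = joint[("S", "N")] + joint[("H", "N")]           # marginal [X = N]
    return joint[("S", "P")] / pr_P, joint[("H", "N")] / pr_N

print(predictive_values(prevalence=0.001, sensitivity=0.99, specificity=0.99))
# → roughly (0.090, 0.99999): even a "good" test has low PPV when prevalence is low.
```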
Loss table on A × Θ, where A = {T, W}:

                          Healthy ("null hypothesis")   Sick ("alternate hypothesis")
  T (treat the patient)   Loss(T, H) = LTH              Loss(T, S) = 0
  W (wait)                Loss(W, H) = 0                Loss(W, S) = LWS
Bayes expected loss:
E(Loss | P, T ) = LTH Pr(H | P) = LTH Pr(P | H) Pr(H) / Pr(P)
E(Loss | P, W ) = LWS Pr( S | P) = LWS Pr(P | S) Pr(S) / Pr(P)
Choose T over W ("the Bayes action is T") if E(Loss | P, W) > E(Loss | P, T), i.e.

  LWS Pr(P | S) Pr(S) / Pr(P)  >  LTH Pr(P | H) Pr(H) / Pr(P),

equivalently

  (LWS / LTH) · (Pr(P | S) / Pr(P | H)) · (Pr(S) / Pr(H))  >  1
  {loss ratio} · {likelihood ratio}     · {prior odds}     >  1
Example:
1) If prevalence = 0.001, sensitivity = 0.99, specificity = 0.99, loss ratio = 1, and a "P" test result is observed,
2) then prior odds ≈ 1000:1 against S and likelihood ratio ≈ 100,
3) so the Bayes action is W ("watchful waiting"; do nothing for now).
[What is the posterior odds?]
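A quick numerical sketch of this example, applying the loss-ratio × likelihood-ratio × prior-odds rule above:

```python
# Sketch: Bayes action after a positive ("P") result, via the product rule above.
prevalence, sensitivity, specificity = 0.001, 0.99, 0.99
loss_ratio = 1.0                                         # LWS / LTH

prior_odds = prevalence / (1 - prevalence)               # Pr(S) / Pr(H), about 1/999
likelihood_ratio = sensitivity / (1 - specificity)       # Pr(P | S) / Pr(P | H) = 99
posterior_odds = likelihood_ratio * prior_odds           # Pr(S | P) / Pr(H | P), about 0.099

bayes_action = "T" if loss_ratio * posterior_odds > 1 else "W"
print(posterior_odds, bayes_action)                      # about 0.099 and 'W'
```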
A test splits a group of patients into two, Neg and Pos; when is it a USEFUL test?
When will the Bayes action be different in the two groups? Answer: when

  1 / NPO  =  Pr(S | Neg) / Pr(H | Neg)  =  [Pr(S) Pr(Neg | S)] / [Pr(H) Pr(Neg | H)]
           <  LTreat,H / LWait,S
           ≤  [Pr(S) Pr(Pos | S)] / [Pr(H) Pr(Pos | H)]  =  Pr(S | Pos) / Pr(H | Pos)  =  PPO,

where the negative and positive predictive odds are NPO = Pr(H | Neg) / Pr(S | Neg) and PPO = Pr(S | Pos) / Pr(H | Pos).
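A sketch of this usefulness check (names and numbers are illustrative): compute the Bayes action separately in the Neg and Pos groups and see whether they differ.

```python
# Sketch: a test is USEFUL when the Bayes action differs between the Neg and Pos groups.

def bayes_action(odds_S, LTH, LWS):
    """T if expected loss of waiting exceeds expected loss of treating, given odds of S."""
    return "T" if LWS * odds_S > LTH else "W"

def test_is_useful(prevalence, sensitivity, specificity, LTH, LWS):
    prior_odds = prevalence / (1 - prevalence)
    ppo = prior_odds * sensitivity / (1 - specificity)            # Pr(S | Pos) / Pr(H | Pos)
    inv_npo = prior_odds * (1 - sensitivity) / specificity        # Pr(S | Neg) / Pr(H | Neg)
    return bayes_action(ppo, LTH, LWS) != bayes_action(inv_npo, LTH, LWS)

print(test_is_useful(0.001, 0.99, 0.99, LTH=1.0, LWS=1.0))    # False: wait in both groups
print(test_is_useful(0.001, 0.99, 0.99, LTH=1.0, LWS=50.0))   # True: treat after Pos only
```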
Setting: diagnostic or screening test with continuous values.
Goal: choose the threshold, or cutoff, for taking the action.
[Figure: densities of X under H and under S, with the cutoff at X = x0.]

Test result is dichotomized: "negative" = N = I{X < x0}, "positive" = P = I{X > x0}.

Loss table on A × Θ, where A = {T, W}:

                        Healthy ("null hypothesis")   Sick ("alternate hypothesis")
  T (take an action)    Loss(T, H) = LTH              Loss(T, S) = 0
  W (wait)              Loss(W, H) = 0                Loss(W, S) = LWS

"ROC curve" (receiver operating characteristic): plot the True Positive Rate against the
False Positive Rate as the cutoff x0 sweeps from X = +∞ down to X = −∞.

  "True Positive Rate"   =  Pr(Pos | S)  =  sens(x0)      =  1 − FS(x0)  =  1 − Type II error
  "False Positive Rate"  =  Pr(Pos | H)  =  1 − spec(x0)  =  1 − FH(x0)  =  Type I error
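A small sketch of the ROC calculation, assuming (purely for illustration) normal test-value distributions X | H ~ Normal(0, 1) and X | S ~ Normal(2, 1), with "positive" meaning X > x0:

```python
# Sketch: (FPR, TPR) pairs traced out as the cutoff x0 varies.
import numpy as np
from scipy.stats import norm

cutoffs = np.linspace(-4.0, 6.0, 201)            # candidate cutoffs x0
fpr = 1 - norm.cdf(cutoffs, loc=0, scale=1)      # Pr(Pos | H) = 1 - F_H(x0)
tpr = 1 - norm.cdf(cutoffs, loc=2, scale=1)      # Pr(Pos | S) = 1 - F_S(x0)

for x0, f, t in zip(cutoffs[::50], fpr[::50], tpr[::50]):
    print(f"x0 = {x0:5.2f}   FPR = {f:.3f}   TPR = {t:.3f}")
```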
Write π = Pr(θ = S) for the prevalence (the prior), and fH, fS for the densities of X under H and S. Here A denotes the act of treating (called T in the loss table above), so LAH plays the role of LTH. Then

  ρ(A | X) − ρ(W | X)  =  (LAH Pr(H | X) + 0) − (0 + LWS Pr(S | X))
                       =  LAH · fH(X)(1 − π) / [fH(X)(1 − π) + fS(X) π]
                          − LWS · fS(X) π / [fH(X)(1 − π) + fS(X) π].

This is positive if

  [LAH fH(X) (1 − π)] / [LWS fS(X) π]  >  1.

The optimal cutoff x0 satisfies

  [LAH fH(x0) (1 − π)] / [LWS fS(x0) π]  =  1.

On the ROC curve,

  Slope  =  [∂(1 − FS(x0)) / ∂x0] / [∂(1 − FH(x0)) / ∂x0]  =  fS(x0) / fH(x0)  =  LAH (1 − π) / (LWS π).

Common sense check?
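As a numerical common-sense check (same assumed normal densities as in the ROC sketch above; the prevalence and losses are made up):

```python
# Sketch: solve LAH * fH(x0) * (1 - π) = LWS * fS(x0) * π for the optimal cutoff x0.
from scipy.stats import norm
from scipy.optimize import brentq

pi, LAH, LWS = 0.10, 1.0, 5.0                    # illustrative prevalence and losses

def loss_difference(x):
    """ρ(A | X=x) − ρ(W | X=x), up to the common positive denominator."""
    return LAH * norm.pdf(x, 0, 1) * (1 - pi) - LWS * norm.pdf(x, 2, 1) * pi

x0 = brentq(loss_difference, -5, 10)             # root = optimal cutoff
print("optimal cutoff x0  =", round(x0, 3))
print("ROC slope there    =", norm.pdf(x0, 2, 1) / norm.pdf(x0, 0, 1))
print("LAH(1 - π)/(LWS π) =", LAH * (1 - pi) / (LWS * pi))    # should match the slope
```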
[ROC figure: the same curve with the optimal cutoff x0 marked; the cutoff runs from X = −∞ to X = +∞ along the curve, with horizontal axis 1 − FH(x0) and vertical axis 1 − FS(x0).]
(A cute aside: pick one S person and one H person. The X value for the S person should be larger, usually. What is the probability that this is true? Think about the ROC graph.)
EXERCISE: if the two distributions are normal, what is the likelihood ratio, if the variances are
the same? different?