Medical Informatics (A)

Medical Decision-Support Systems
Probabilistic Reasoning in
Diagnostic Systems
Yuval Shahar, M.D., Ph.D.
Reasoning Under Uncertainty
in Medicine
• Uncertainty is inherent to medical reasoning
– The relation of diseases to clinical and laboratory findings is probabilistic
– Patient data itself is often uncertain with respect
to value and time
– Patient preferences regarding outcomes vary
– Cost of interventions and therapy can change
Probability:
A Quick Introduction
• Probability function, range: [0, 1]
• Prior probability of A, P(A): with no new
information (e.g., no patient information)
• Posterior probability of A, P(A|B): the probability of A given new information B (e.g., laboratory tests)
• Conditional probability: P(B|A)
• Independence of A, B: P(B) = P(B|A)
• Conditional independence of B,C, given A:
P(B|A) = P(B|A & C)
– (e.g., two symptoms, given a specific disease)
Probabilistic Calculus
• P(not(A)) = 1-P(A)
• In general:
– P(A & B) = P(A) * P(B|A)
• If A, B are independent:
– P(A & B) = P(A) * P(B)
• If A, B are mutually exclusive:
– P(A or B) = P(A) + P(B)
• If A, B are not mutually exclusive, but are independent:
– P(A or B) = 1-P(not(A) & not(B)) = 1-(1-P(A))(1-P(B))
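A minimal Python sketch (not part of the original slides; the event probabilities are illustrative) that checks these rules numerically for two independent events:

```python
# Numeric check of the rules above for two independent events A and B
# (illustrative probabilities only).
p_a, p_b = 0.25, 0.5

p_not_a   = 1 - p_a                       # P(not(A)) = 1 - P(A)
p_a_and_b = p_a * p_b                     # independence: P(A & B) = P(A) * P(B)
p_a_or_b  = 1 - (1 - p_a) * (1 - p_b)     # P(A or B) = 1 - P(not(A) & not(B))

# For independent, non-mutually-exclusive events, inclusion-exclusion agrees:
assert p_a_or_b == p_a + p_b - p_a_and_b

print(p_not_a, p_a_and_b, p_a_or_b)       # 0.75 0.125 0.625
```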
Test Characteristics
Test result     Disease present        Disease absent         Total
Positive        True positive (TP)     False positive (FP)    TP+FP
Negative        False negative (FN)    True negative (TN)     FN+TN
Total           TP+FN                  FP+TN
Test Performance Measures
• The gold standard test: the procedure that defines presence or
absence of a disease (often, very costly)
• The index test: The test whose performance is examined
• True positive rate (TPR) = Sensitivity:
– P(Test is positive|patient has disease) = P(T+|D+)
– Ratio of the number of diseased patients with positive tests to the total
number of diseased patients: TP/(TP+FN)
• True negative rate (TNR) = Specificity
– P(Test is negative|patient has no disease) = P(T-|D-)
– Ratio of the number of nondiseased patients with negative tests to the
total number of nondiseased patients: TN/(TN+FP)
Test Predictive Values
• Positive predictive value (PV+) = P(D+|T+)
= TP/(TP+FP)
• Negative predictive value (PV-) = P(D-|T-)
= TN/(TN+FN)
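The measures on the last two slides follow directly from the 2x2 table. A minimal Python sketch, using illustrative counts rather than any data from the slides:

```python
# Test performance measures from the 2x2 table (hypothetical counts).
TP, FP, FN, TN = 90, 40, 10, 860

sensitivity = TP / (TP + FN)   # TPR = P(T+|D+), denominator = all diseased
specificity = TN / (TN + FP)   # TNR = P(T-|D-), denominator = all nondiseased
ppv = TP / (TP + FP)           # PV+ = P(D+|T+), denominator = all positive tests
npv = TN / (TN + FN)           # PV- = P(D-|T-), denominator = all negative tests

print(f"Sensitivity={sensitivity:.2f}  Specificity={specificity:.2f}  "
      f"PV+={ppv:.2f}  PV-={npv:.2f}")
```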
Lab Tests: What is “Abnormal”?
The Cut-off Value Trade-off
• Sensitivity and specificity depend on the cut off
value between what we define as normal and
abnormal
• Assume high test values are abnormal; then, moving the cut-off value higher increases FN results and decreases FP results (i.e., the test becomes more specific but less sensitive), and vice versa
• There is always a trade-off in setting the cut-off point
Receiver Operating Characteristic
(ROC) Curves: Examples
Receiver Operating Characteristic
(ROC) Curves: Interpretation
• ROC curves summarize the trade-off
between the TPR (sensitivity) and the false
positive rate (FPR) (1-specificity) for a
particular test, as we vary the cut-off
threshold
• The greater the area under the ROC curve,
the better (more sensitive, more specific)
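A short Python sketch of how an ROC curve is traced by sweeping the cut-off threshold; the test values and gold-standard labels below are made up for illustration, and the area under the curve is approximated with the trapezoid rule:

```python
# Sketch: trace an ROC curve by sweeping the cut-off threshold over
# hypothetical test values (higher value = more likely diseased).
values  = [1.2, 2.3, 2.9, 3.1, 3.8, 4.0, 4.5, 5.2, 5.9, 6.4]
disease = [0,   0,   0,   1,   0,   1,   1,   0,   1,   1  ]   # gold standard

def roc_points(values, disease):
    """Return (FPR, TPR) pairs, one per possible cut-off."""
    points = []
    thresholds = sorted(set(values)) + [float("inf")]
    for t in thresholds:
        tp = sum(1 for v, d in zip(values, disease) if v >= t and d == 1)
        fp = sum(1 for v, d in zip(values, disease) if v >= t and d == 0)
        fn = sum(1 for v, d in zip(values, disease) if v < t and d == 1)
        tn = sum(1 for v, d in zip(values, disease) if v < t and d == 0)
        points.append((fp / (fp + tn), tp / (tp + fn)))
    return sorted(points)

pts = roc_points(values, disease)
# Area under the curve via the trapezoid rule; 0.5 = useless test, 1.0 = perfect.
auc = sum((x2 - x1) * (y1 + y2) / 2 for (x1, y1), (x2, y2) in zip(pts, pts[1:]))
print(pts)
print("AUC =", round(auc, 3))
```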
Bayes Theorem
P(A & B) = P(A) P(B|A) = P(B) P(A|B),
therefore P(B|A) = P(B) P(A|B) / P(A)

P(disease | positive test) = P(D|T+)
= P(D) P(T+|D) / P(T+)
= P(D) P(T+|D) / [P(D) P(T+|D) + P(D-) P(T+|D-)]
= P(D) P(T+|D) / [P(D) * Sensitivity + (1 - P(D)) * FPR]
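The last form of the equation above translates directly into a small function. A minimal Python sketch with illustrative numbers (the prior, sensitivity, and specificity are not values from the slides):

```python
def post_test_probability(prior, sensitivity, specificity):
    """P(D|T+) via the last form above:
    P(D)*Sensitivity / (P(D)*Sensitivity + (1 - P(D))*FPR), FPR = 1 - specificity."""
    fpr = 1 - specificity
    return prior * sensitivity / (prior * sensitivity + (1 - prior) * fpr)

# Illustrative values only: even a fairly accurate test applied to a rare
# condition yields a modest post-test probability.
print(round(post_test_probability(prior=0.01, sensitivity=0.95, specificity=0.90), 3))  # 0.088
```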
Odds-Likelihood (Odds Ratio)
Form of Bayes Theorem
• Odds = P(A)/(1-P(A)); P = Odds/(1+Odds)
• Post-test odds = pre-test odds * likelihood ratio
P( D | T )
P( D) P(T  | D)

*
P( D  | T ) P( D ) P(T  | D )
Likelihood  ratio ( LR ) 
P(T  | D) TPR

P(T  | D ) FPR
Negative  likelihood  ratio ( LR ) 
P(T  | D) FNR

P(T  | D ) TNR
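The same update in odds-likelihood form, as a Python sketch; the sensitivity and specificity used to derive LR+ are illustrative:

```python
def odds(p):
    """Probability to odds: Odds = P / (1 - P)."""
    return p / (1 - p)

def prob(o):
    """Odds back to probability: P = Odds / (1 + Odds)."""
    return o / (1 + o)

def post_test_prob(pretest_prob, lr):
    """Post-test odds = pre-test odds * likelihood ratio."""
    return prob(odds(pretest_prob) * lr)

# Illustrative numbers: sensitivity 0.95, specificity 0.90 -> LR+ = TPR/FPR = 9.5
lr_plus = 0.95 / (1 - 0.90)
print(round(post_test_prob(0.01, lr_plus), 3))  # ~0.088, same as the direct Bayes form
```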
Application of Bayes Theorem
• Needs reliable pre-test probabilities
• Needs reliable post-test likelihood ratios
• Assumes one disease only (mutual exclusivity of
diseases)
• Can be used in sequence for several tests, but only if
they are conditionally independent given the disease;
then we use the post-test probability of Ti as the pretest probability for Ti+1 (Simple, or Naïve, Bayes)
P(D | T1 & T2 & ... & Tn) / P(D- | T1 & T2 & ... & Tn) = [P(D) / P(D-)] * Π(i=1..n) LRi
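A sketch of the sequential (naive Bayes) update above, multiplying the pre-test odds by one likelihood ratio per conditionally independent test; the pre-test probability and LR values are illustrative:

```python
from functools import reduce

def sequential_post_test_prob(pretest_prob, likelihood_ratios):
    """Naive-Bayes updating: multiply the pre-test odds by the LR of each
    (conditionally independent) test result, then convert back to a probability."""
    o = pretest_prob / (1 - pretest_prob)            # pre-test odds
    o = reduce(lambda acc, lr: acc * lr, likelihood_ratios, o)
    return o / (1 + o)                               # back to a probability

# Illustrative only: two positive tests with LR+ of 9.5 and 4.0
print(round(sequential_post_test_prob(0.01, [9.5, 4.0]), 3))  # 0.277
```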
Relation of Pre-Test and
Post-Test Probabilities
Example:
Computing Predictive Values
• Assume P(Down Syndrome):
– (A) 0.1% (age 30)
– (B) 2% (age 45)
• Assume amniocentesis with Sensitivity of
99%, Specificity of 99% for Down Syndrome
• PV+ = P(DS|Amnio+)
• PV- = P(DS-|Amnio-) = 99.999%
Predictive Values:
Down Syndrome
(A) P(DS) = 0.1%:
PV+ = 0.001 * 0.99 / (0.001 * 0.99 + 0.999 * 0.01)
    = 0.00099 / (0.00099 + 0.00999) ≈ 0.0901
PV- = P(DS-|Amnio-) ≈ 99.999%

(B) P(DS) = 2%:
PV+ = 0.02 * 0.99 / (0.02 * 0.99 + 0.98 * 0.01) ≈ 0.66891
PV- ≈ 0.99979
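A quick Python check that reproduces the figures above (sensitivity = specificity = 0.99); differences in the last decimal place are rounding:

```python
# Quick check of the predictive values above (sensitivity = specificity = 0.99).
def ppv(prior, sens, spec):
    return prior * sens / (prior * sens + (1 - prior) * (1 - spec))

def npv(prior, sens, spec):
    return (1 - prior) * spec / ((1 - prior) * spec + prior * (1 - sens))

for prior in (0.001, 0.02):
    print(f"P(DS)={prior}: PV+={ppv(prior, 0.99, 0.99):.5f}  PV-={npv(prior, 0.99, 0.99):.5f}")
# P(DS)=0.001: PV+=0.09016  PV-=0.99999
# P(DS)=0.02: PV+=0.66892  PV-=0.99979
```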
Example: de Dombal’s System (1972)
• Domain: Acute abdominal pain (7 possible diagnoses)
• Input: Signs and symptoms of patient
• Output: Probability distribution of diagnoses
• Method: Naïve Bayesian classification
• Evaluation: an eight-center study involving 250 physicians and
16,737 patients
• Results:
– Diagnostic accuracy rose from 46 to 65%
– The negative laparotomy rate fell by almost half
– Perforation rate among patients with appendicitis fell by half
– Mortality rate fell by 22%
• Results using survey data consistently better than the clinicians’
opinions and even the results using human probability estimates!
Decision Trees
• A convenient way to explicitly show the
order and relationships of possible
decisions, uncertain outcomes of decisions,
and outcome utilities
• Enable computation of the decision that
maximizes expected utility
Decision Trees Conventions
Decision node
Chance node
Information link
Influence link
A Generic Decision Tree
Decision Trees: an HIV Example
[Figure: decision tree for the HIV example, with a decision node and chance nodes]
Computation With Decision Trees
• Decision trees are “folded back” to the topmost (leftmost, or initial) decision
• Computation is performed by averaging
expected utility recursively over tree
branches from right to left (bottom up),
maximizing utility for every decision made
and assuming that this is the expected utility
for the subtree that follows the computed
decision
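A minimal sketch of this fold-back computation in Python; the tree structure, probabilities, and utilities are hypothetical and are not the HIV example from the slides:

```python
# A minimal "fold back" of a decision tree: a chance node averages expected
# utility over its branches, a decision node takes the maximum over its
# options, and a leaf returns its utility. Structure and numbers are hypothetical.

def expected_utility(node):
    kind = node["type"]
    if kind == "outcome":                          # leaf: utility is given
        return node["utility"]
    if kind == "chance":                           # average over branches
        return sum(p * expected_utility(child) for p, child in node["branches"])
    if kind == "decision":                         # choose the best option
        return max(expected_utility(child) for _, child in node["options"])
    raise ValueError(f"unknown node type: {kind}")

tree = {
    "type": "decision",
    "options": [
        ("treat", {"type": "chance", "branches": [
            (0.9, {"type": "outcome", "utility": 0.8}),
            (0.1, {"type": "outcome", "utility": 0.3}),
        ]}),
        ("do not treat", {"type": "chance", "branches": [
            (0.7, {"type": "outcome", "utility": 1.0}),
            (0.3, {"type": "outcome", "utility": 0.0}),
        ]}),
    ],
}

# Folding back right to left: max(0.9*0.8 + 0.1*0.3, 0.7*1.0 + 0.3*0.0) = 0.75
print(round(expected_utility(tree), 3))
```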
Influence Diagrams:
Node Conventions
Chance node
Decision node
Utility node
Link Semantics
in Influence Diagrams
Dependence link
Information link
Influence link
Influence Diagrams:
An HIV Example
The Structure of Influence Diagram Links
Belief Networks
(Bayesian/Causal Probabilistic/Probabilistic Networks, etc.)
Influence diagrams without decision and utility nodes
[Example belief network with nodes: Disease, Gender, Sinusitis, Fever, Runny nose, Headache]
Link Semantics in Belief Networks
[Figure: dependence and independence links, and conditional independence of B and C, given A]
Advantages of Influence
Diagrams and Belief Networks
• Excellent modeling tool that supports acquisition
from domain experts
– Intuitive semantics (e.g., information and influence links)
– Explicit representation of dependencies
– Very concise representation of large decision models
• “Anytime” algorithms available (using probability
theory) to compute the distribution of values at any
node given the values of any subset of the nodes
(e.g., at any stage of information gathering)
• Explicit support for value of information
computations
Disadvantages of Influence
Diagrams and Belief Networks
• Explicit representation of dependencies often
requires acquisition of joint probability
distributions (P(A|B,C))
• Computation is in general intractable (NP-hard)
• Order of decisions and relations between
decisions and available information might be
obscured
Value of Information (VI)
• We often need to decide what would be the next best
piece of information to gather (e.g., within a diagnostic
process); that is, what is the best next question to ask
(e.g., what would be the result of a urine culture?)
• The Value of Information (VI) of feature f is the
marginal expected utility of an optimal decision made
knowing f, compared to making it without knowing f
• The net value of information (NVI) of f = VI(f)-cost(f)
• NVI is highly useful for a hypothetico-deductive
diagnostic approach to decide what would be the next
information item, if any, to investigate
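A small Python sketch of a VI/NVI computation for one binary feature, under a hypothetical two-disease, two-action model (the diseases, actions, utilities, probabilities, and feature cost are all made up for illustration):

```python
# Sketch of value of information for one binary feature f (hypothetical numbers).
P_D = {"flu": 0.7, "strep": 0.3}                         # prior over diseases
P_f_given_D = {"flu": 0.2, "strep": 0.9}                 # P(f present | disease)
U = {("flu", "antibiotics"): 0.4, ("flu", "rest"): 1.0,  # U[(true disease, action)]
     ("strep", "antibiotics"): 1.0, ("strep", "rest"): 0.2}
ACTIONS = ["antibiotics", "rest"]

def best_eu(belief):
    """Expected utility of the best action under a belief over diseases."""
    return max(sum(belief[d] * U[(d, a)] for d in belief) for a in ACTIONS)

# Expected utility of deciding now, without observing f
eu_without = best_eu(P_D)

# Expected utility of observing f first: average over the two possible findings,
# updating the belief with Bayes' theorem in each case
eu_with = 0.0
for present in (True, False):
    p_obs = sum(P_D[d] * (P_f_given_D[d] if present else 1 - P_f_given_D[d]) for d in P_D)
    posterior = {d: P_D[d] * (P_f_given_D[d] if present else 1 - P_f_given_D[d]) / p_obs
                 for d in P_D}
    eu_with += p_obs * best_eu(posterior)

VI = eu_with - eu_without                                # marginal expected utility
cost_f = 0.05                                            # hypothetical cost, same utility units
print("VI =", round(VI, 3), " NVI =", round(VI - cost_f, 3))   # VI = 0.132  NVI = 0.082
```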
Examples of Successful Belief-Network Applications
• In clinical medicine:
– Pathological diagnosis at the level of a
subspecialized medical expert (Pathfinder)
– Endocrinological diagnosis (NESTOR)
• In bioinformatics:
– Recognition of meaningful sites and features in
DNA sequences
– Educated guess of tertiary structure of proteins
The Pathfinder Project
(Heckerman, Horvitz, Nathwani 1992)
• Task and domain: Diagnosis of lymph node biopsy,
an important medical problem
– Large difference between expert and general pathologist
opinions (almost 65%!)
• Problems in the domain include
– Misrecognition of features (information gathering)
– Misintegration of evidence (information processing)
• The Pathfinder project focused mainly on assistance
in information processing
• A Stanford/USC collaboration; eventually
commercialized as Intellipath, marketed by the ACP,
used as early as 1992 by at least 200 pathology sites
Pathfinder Domain
• More than 60 diseases
• More than 130 findings, such as:
– Microscopic
– Immunological
– Molecular biology
– Laboratory
– Clinical
• Commercial product extended to at least 10 more
medical domains
Pathfinder I/O behavior
• Input: set of <Feature, Instance> (<Fi, Ii>) pairs
(e.g., <NECROSIS, ABSENT>)
– Instances are mutually exclusive values of each feature
– Prior probability of each disease Dk is known
– P(F1I1, F2I2…FtIt | Dk,x) is in acquired knowledge base
• Output: P(Dk|F1I1, F2I2…FmIm,x)
– x = background knowledge (context)
• User can ask what is the next best (cost-effective)
feature to investigate or enter
- Probabilistic (decision-theoretic) hypothetico-deductive
approach
• Distribution of each Dk is updated dynamically
Pathfinder Methodology:
Probabilities and Utilities
• Decision-theoretic computation
• Bayesian approach: Probabilities represent beliefs of
experts (data can update beliefs)
• Utilities represented as a matrix of all diseases
• A matrix entry <Dj, Dk> encodes the (patient) utility
of diagnosing Dk when the patient really has Dj
• Since no therapeutic recommendations are made, the
model can use one representative patient (the expert),
with utilities expressed in micromorts and in willingness-to-pay
to avoid the risk of each outcome
Pathfinder Computation
• Normally we would use the general form of
Bayes Theorem:
P(Dk | F1I1, F2I2, ..., FtIt, x) =
P(F1I1, F2I2, ..., FtIt | Dk, x) P(Dk | x) / Σ_Dl P(F1I1, F2I2, ..., FtIt | Dl, x) P(Dl | x)
• But that involves an exponential number of
probabilities to be acquired and represented
Pathfinder 1: The Simple Bayes Version
• Assuming conditional independence of
features (Simple or Naïve Bayes):
P(F1I1, F2I2, ..., FtIt | Dk, x) = P(F1I1 | Dk, x) P(F2I2 | Dk, x) ... P(FtIt | Dk, x)
= Π_i P(FiIi | Dk, x)
• Assuming mutual exclusivity and exhaustiveness
of diseases, the overall computation is tractable:
P(Dk | F1I1, ..., FtIt, x) = Π_i P(FiIi | Dk, x) P(Dk | x) / Σ_Dl Π_i P(FiIi | Dl, x) P(Dl | x)
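A Python sketch of this Simple-Bayes computation over mutually exclusive diseases; the <NECROSIS, ABSENT> pair echoes the earlier slide, but the second feature, the disease names, and all probabilities are hypothetical:

```python
# Sketch of the Simple-Bayes (Pathfinder 1 style) computation: posterior over
# mutually exclusive diseases given conditionally independent feature-instance
# pairs. Diseases, features, and numbers are hypothetical.

priors = {"D1": 0.6, "D2": 0.3, "D3": 0.1}                 # P(Dk | x)
# P(feature = observed instance | Dk, x) for each observed <Feature, Instance>
likelihoods = {
    ("NECROSIS", "ABSENT"):   {"D1": 0.9, "D2": 0.4, "D3": 0.2},
    ("FOLLICLES", "PRESENT"): {"D1": 0.3, "D2": 0.8, "D3": 0.5},
}

def posterior(priors, likelihoods, observations):
    """P(Dk | observations, x) under conditional independence of features."""
    unnormalized = {}
    for d, p in priors.items():
        for obs in observations:
            p *= likelihoods[obs][d]                       # product of P(FiIi | Dk, x)
        unnormalized[d] = p
    z = sum(unnormalized.values())                         # denominator: sum over diseases
    return {d: p / z for d, p in unnormalized.items()}

obs = [("NECROSIS", "ABSENT"), ("FOLLICLES", "PRESENT")]
print(posterior(priors, likelihoods, obs))
```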
Pathfinder 2: The Belief Network Version
• Mutual exclusivity and exhaustiveness of diseases is
reasonable in lymph-node pathology
– Single disease per examined lymph node
– Large, exhaustive knowledge base
• Conditional independence is less reasonable and can
lead to erroneous conclusions
• The simple Bayes representation of Pathfinder 1 was
therefore enhanced into a belief network in Pathfinder 2,
which included explicit dependencies between different
features while still taking advantage of any explicit global
and conditional independencies
Decision-Theoretic Diagnosis
• Using the utility matrix and given observations f,
the expected diagnostic utility using f is averaged
over all diagnoses:
– EU(Dk(f)) = Σj P(Dj | f) U(Dj, Dk)
• Thus, Dx(f) = argmax_k [EU(Dk(f))]
• However, since the diagnosis is sensitive to the
utility model, Pathfinder does not recommend it,
only the probabilities P(Dk |f)
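A Python sketch of this expected-utility step with a small hypothetical utility matrix (here a 0/1 benign-vs-malignant structure, similar to the utility assignment mentioned two slides later); the disease names and the posterior are made up:

```python
# Sketch of the decision-theoretic step: pick the diagnosis Dk that maximizes
# expected utility under the posterior P(Dj | f) and a utility matrix
# U[true disease][diagnosed disease]. All numbers are hypothetical.

posterior = {"benign_A": 0.5, "benign_B": 0.3, "malignant_C": 0.2}   # P(Dj | f)
U = {   # U[Dj][Dk]: utility of diagnosing Dk when the true disease is Dj
    "benign_A":    {"benign_A": 1.0, "benign_B": 1.0, "malignant_C": 0.0},
    "benign_B":    {"benign_A": 1.0, "benign_B": 1.0, "malignant_C": 0.0},
    "malignant_C": {"benign_A": 0.0, "benign_B": 0.0, "malignant_C": 1.0},
}

def expected_utility(dx):
    """EU(Dk | f) = sum_j P(Dj | f) * U(Dj, Dk)."""
    return sum(p * U[dj][dx] for dj, p in posterior.items())

diagnoses = list(posterior)
best = max(diagnoses, key=expected_utility)
print({dx: round(expected_utility(dx), 2) for dx in diagnoses}, "->", best)
```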
Pathfinder: Gathering Information
• Next best feature to observe is recommended using a
myopic approximation, which considers only up to one
single feature to be observed
• The feature chosen maximizes EU given that a
diagnosis would be made after observing it
• Feature f is chosen that maximizes NVI(f)
• Although myopic approximation could backfire, in
practice it works well
– especially when U(Dj,Dk) is set to 0 if one of the diseases is
malignant and the other benign, and to 1 if they are both
malignant or both benign
Pathfinder 2:
Knowledge Acquisition
• To facilitate acquisition of multiple probabilities, a
Similarity Network model was developed
• Using similarity networks, an expert creates multiple
small belief networks, representing 2 or more diseases
that are difficult to distinguish
• The local belief networks are then unified into a global
belief network, preserving soundness
• The graphical interface also allows partitioning the
diseases into sets such that, relative to each set, a given
feature is independent, thus further assisting in the construction
Pathfinder 1 and 2: Evaluation
• Pathfinder 1 was compared to Pathfinder 2 using 53 cases,
a new user, and a thorough analysis of each case
– Diagnostic accuracy of PF2 is greater than that of PF1 (gold
standard: the main domain expert’s distribution and his
assessment on a scale of 1 to 10)
– Difference is due to better probabilistic representation (better
acquisition and inference)
– Cost of constructing PF2 rather than PF1 is justified by the
improvements (measure: the utility of the diagnosis)
– PF2 is at least as good as the main domain expert, with respect
to diagnostic accuracy