
Artificial Intelligence and Lisp #5
Causal Nets (continued)
Learning in Decision Trees and Causal Nets
Lab Assignment 3
Causal Nets

A causal net consists of:

A set of independent terms

A partially ordered set of dependent terms


An assignment of a dependency expression to
each dependent term (these may be decision trees).
The dependency expression for a term may use
independent terms, and also dependent terms that
are lower than the term at hand. This means the
dependency graph is acyclic.
Example of Causal Net
(Diagram: dependent terms Car moves, Headlights on, Main engine runs and
Starting motor runs, built on the independent terms Fuse OK, Battery
charged, Key turned, Clutch engaged and Gas in tank.)
Causal Nets II




A causal net is an acyclic graph where each node
(called a term) represents some condition in the
world (i.e., a feature) and each link indicates a
dependency relationship between two terms.

Terms in a causal net that do not have any
predecessor are called independent terms.

A dependence specification for a causal net is an
assignment, to each dependent term, of an
expression using its immediate predecessors
(sketched below).

A causal net is exhaustive iff all actual
dependencies are represented by links in it.
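As an illustration only (not the lab's Lisp framework), a minimal Python sketch of these definitions with hypothetical names: independent terms carry values directly, and each dependent term carries an expression over its immediate predecessors.

```python
# Hypothetical representation of a small causal net (illustration only).
# Each dependent term maps to (predecessors, expression), where the
# expression computes the term's value from its predecessors' values.

independent_terms = {"fuse-ok", "battery-charged"}

dependent_terms = {
    # headlights-on depends only on its immediate predecessors
    "headlights-on": (
        ("fuse-ok", "battery-charged"),
        lambda fuse_ok, battery_charged: fuse_ok and battery_charged,
    ),
}

def evaluate(term, assignment):
    """Evaluate a term given values for the independent terms."""
    if term in independent_terms:
        return assignment[term]
    predecessors, expression = dependent_terms[term]
    return expression(*(evaluate(p, assignment) for p in predecessors))

print(evaluate("headlights-on", {"fuse-ok": True, "battery-charged": False}))
# -> False
```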
Dependence specification for one of the terms
using discrete term values
Headlights-on
=
[fuse-ok?
   [battery-charged? true false]
   [battery-charged? false false]]
=
[fuse-ok?
   [battery-charged? true false]
   false]
=
[<fuse-ok battery-charged>?
   true false false false
   :range <<true true><true false>
          <false true><false false>> ]
Observations on previous slide


A decision tree, where the same term is used
throughout on each level, may become
unnecessarily large.

A decision tree can always be converted to an
equivalent tree of depth 1 by introducing
sequences of terms, and corresponding
sequences of values for the terms.
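A minimal sketch of this depth-1 conversion, using a hypothetical Python encoding of the trees (nested tuples rather than the course notation):

```python
from itertools import product

# A tree is either a leaf value, or (term, subtree_if_true, subtree_if_false).
# This hypothetical encoding stands in for the course's [term? ... ...] notation.
headlights_on = ("fuse-ok", ("battery-charged", True, False), False)

def collect_terms(tree):
    """All terms mentioned in the tree, outermost first, without duplicates."""
    if not isinstance(tree, tuple):
        return []
    term, yes, no = tree
    order = [term]
    for t in collect_terms(yes) + collect_terms(no):
        if t not in order:
            order.append(t)
    return order

def evaluate(tree, values):
    if not isinstance(tree, tuple):
        return tree
    term, yes, no = tree
    return evaluate(yes if values[term] else no, values)

def flatten(tree):
    """Equivalent depth-1 tree: one entry per combination of term values."""
    terms = collect_terms(tree)
    return {combo: evaluate(tree, dict(zip(terms, combo)))
            for combo in product([True, False], repeat=len(terms))}

print(flatten(headlights_on))
# {(True, True): True, (True, False): False, (False, True): False, (False, False): False}
```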
Main topic for first part of lecture





Given:

An exhaustive causal net equipped with
dependence specifications that may also use
probabilities

A specification of a priori probabilities for each
one of the independent terms (more exactly, a
probability distribution over its admitted values)

A value for one of the dependent terms

Desired: inferred probabilities for the
independent terms, alone or in combination,
based on this information.
Inverse operation (from previous lecture)
Consider this simple case first:
lights-are-on =
[noone-home? <70 30> <20 80>]

If it is known that lights-are-on is true,
what is the probability for noone-home ?

Possible combinations:

                         lights-are-on
                         true     false
  noone-home   true      0.70     0.30
               false     0.20     0.80

Suppose noone-home is true in 20% of overall cases; obtain:

                         lights-are-on
                         true     false
  noone-home   true      0.14     0.06
               false     0.16     0.64

Given lights-are-on, noone-home has 14/30 = 46.7%
probability. The probability estimate has changed from 20%
to 46.7% according to the additional information.
Redoing the example systematically
Probabilities conditional on noone-home:

                         lights-are-on
                         true     false
  noone-home   true      0.70     0.30
               false     0.20     0.80

Suppose noone-home is true in 20% of overall cases, i.e. the
a priori probability for noone-home is 0.20.

A priori (joint) probabilities:

                         lights-are-on
                         true     false
  noone-home   true      0.14     0.06
               false     0.16     0.64

Probabilities conditional on lights-are-on:

                         lights-are-on
                         true     false
  noone-home   true      14/30    6/70
               false     16/30    64/70
Bayes' Rule
P(A|E) = P(E|A)*P(A)/P(E)

A = noone-home, E = lights-are-on

Probabilities conditional on noone-home:

                         E = lights-are-on
                         true     false
  noone-home   true      0.70     0.30
               false     0.20     0.80

noone-home is true in 20% of overall cases:
the a priori probability for noone-home is 0.20

A priori (joint) probabilities:

                         lights-are-on
                         true     false
  noone-home   true      0.14     0.06
               false     0.16     0.64

Probabilities conditional on E = lights-are-on:

                         lights-are-on
                         true     false
  noone-home   true      14/30    6/70
               false     16/30    64/70

14/30 = 0.70*0.20/0.30
Bayes' Rule
P(A|E) = P(E|A)*P(A)/P(E)

Probabilities conditional on noone-home:

                         E = lights-are-on
                         true     false
  noone-home   true      0.70     0.30
               false     0.20     0.80

Known:
noone-home is true in 20% of overall cases:
P(A) = 0.20       P(~A) = 0.80
P(E|A) = 0.70     P(~E|A) = 0.30
P(E|~A) = 0.20    P(~E|~A) = 0.80

P(E) = P(E|A)*P(A) + P(E|~A)*P(~A) =
       0.70 * 0.20 + 0.20 * 0.80 = 0.30

P(A|E) = 0.70 * 0.20 / 0.30 = 14/30
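The same computation as a minimal Python sketch (illustration only, numbers taken from the slide):

```python
# Bayes' rule for the lights-are-on / noone-home example.
p_A = 0.20             # a priori probability of noone-home
p_E_given_A = 0.70     # P(lights-are-on | noone-home)
p_E_given_notA = 0.20  # P(lights-are-on | not noone-home)

# Total probability of the evidence E = lights-are-on
p_E = p_E_given_A * p_A + p_E_given_notA * (1 - p_A)   # 0.30

# Bayes' rule: P(A|E) = P(E|A) * P(A) / P(E)
p_A_given_E = p_E_given_A * p_A / p_E

print(round(p_E, 2), round(p_A_given_E, 3))   # 0.3 0.467
```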
Derivation of Bayes' Rule
To prove:
P(A|E) = P(E|A)*P(A)/P(E)
By the definition of conditional probability:
P(E&A) = P(E|A) * P(A)
P(A&E) = P(A|E) * P(E)
Since P(E&A) = P(A&E), the two right-hand sides are equal:
P(A|E)*P(E) = P(E|A)*P(A)
Dividing both sides by P(E) gives the rule.
By a similar proof (exercise!)
P(A|E&B) = P(E|A&B)*P(A|B)/P(E|B)
More than two term values
Probabilities conditional on A (number of people at home):

                   E = lights-are-on
                   true     false
  A = 0 home       0.70     0.30
  A = 1 home       0.20     0.80
  A = >1 home      0.05     0.95

P(A|E) = P(E|A)*P(A)/P(E)

Only difference: we need P(A) for each one of the
three possible outcomes for A, i.e., we need a probability
distribution over the possible values of A.
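A minimal sketch for the three-valued case (illustrative Python; the 0.20/0.50/0.30 prior below is made up for illustration, since the slide only gives the conditional probabilities):

```python
# Posterior distribution over a three-valued term, given E = lights-are-on.
values = ["0 home", "1 home", ">1 home"]
prior = {"0 home": 0.20, "1 home": 0.50, ">1 home": 0.30}      # assumed prior
p_E_given = {"0 home": 0.70, "1 home": 0.20, ">1 home": 0.05}  # from the table

# P(E) by total probability, then Bayes' rule for each value of A
p_E = sum(p_E_given[v] * prior[v] for v in values)
posterior = {v: p_E_given[v] * prior[v] / p_E for v in values}

print(round(p_E, 3))        # 0.255
print({v: round(p, 3) for v, p in posterior.items()})
# {'0 home': 0.549, '1 home': 0.392, '>1 home': 0.059}
```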
Two-level Decision Tree
dog-outside =
[noone-home?
   [dog-sick? <80 20> <70 30>]
   [dog-sick? <70 30> <30 70>] ]

or, with generic term names:

E =
[A?
   [B? 0.80 0.70]
   [B? 0.70 0.30] ]

E =
[<A B>? 0.80 0.70 0.70 0.30
  :range <<true true> <true false> <false true>
         <false false>> ]
Two-level Decision Tree
P(A) = 0.20

E =
[<A B>? 0.80 0.70 0.70 0.30 ]

P(A|E) = P(E|A)*P(A)/P(E), which means that
P(A&B|E) = P(E|A&B)*P(A&B)/P(E)
P(A|E) can be obtained as P(A&B|E) + P(A&~B|E)
P(E|A&B) = 0.80

1. What is P(A&B) ?
2. What is P(E) ? Before it was obtained as
   P(E) = P(E|A)*P(A) + P(E|~A)*P(~A)
Two-level Decision Tree
P(A) = 0.20
P(B) = 0.05  (a priori)

  <A B>    P(E|comb.)   a priori P(comb.)   P(E & comb.)
  <T T>      0.80             0.01             0.008
  <T F>      0.70             0.19             0.133
  <F T>      0.70             0.04             0.028
  <F F>      0.30             0.76             0.228
                                     P(E) =    0.397

P(A&B|E) = P(E|A&B)*P(A&B)/P(E)
P(A|E) can be obtained as P(A&B|E) + P(A&~B|E)
P(E|A&B) = 0.80
1. P(A&B) = 0.01
2. P(E) = 0.397
P(A&B|E) = 0.80 * 0.01 / 0.397 ~ 0.02
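A minimal sketch of the whole two-level computation (illustrative Python; the conditional probabilities are those of the flat decision tree above):

```python
from itertools import product

p_A, p_B = 0.20, 0.05                      # a priori, A and B independent
p_E_given = {                              # P(E | A,B) from the flat decision tree
    (True, True): 0.80, (True, False): 0.70,
    (False, True): 0.70, (False, False): 0.30,
}

def prior(a, b):
    """A priori probability of the combination, using independence of A and B."""
    return (p_A if a else 1 - p_A) * (p_B if b else 1 - p_B)

# P(E) as the sum of P(E & A & B) over all combinations
p_E = sum(p_E_given[c] * prior(*c) for c in product([True, False], repeat=2))

# Posterior over combinations, and P(A|E) = P(A&B|E) + P(A&~B|E)
posterior = {c: p_E_given[c] * prior(*c) / p_E
             for c in product([True, False], repeat=2)}
p_A_given_E = posterior[(True, True)] + posterior[(True, False)]

print(round(p_E, 3))                       # 0.397
print(round(posterior[(True, True)], 3))   # 0.02
print(round(p_A_given_E, 3))               # 0.355
```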
Two-level Decision Tree
Explanation of the second line in the table
P(A) = 0.20
P(B) = 0.05  (a priori)

  <A B>    P(E|comb.)   a priori P(comb.)   P(E & comb.)
  <T T>      0.80             0.01             0.008
  <T F>      0.70             0.19             0.133
  ...

P(E|A&~B) = 0.70
  conditional probability, given in the decision tree

P(A&~B) = P(A)*P(~B) = 0.20 * 0.95 = 0.19
  a priori probability (using independence)

P(E&A&~B) = P(E|A&~B)*P(A&~B) = 0.133
  a priori probability

P(E) = P(E&A&B) + P(E&A&~B) + P(E&~A&B) +
       P(E&~A&~B) = 0.397
  a priori probability
Review of assumptions made above

Given:

An exhaustive decision tree using probabilities, so that
P(A&B) = P(A) * P(B)
for each combination of independent terms

A specification of a priori probabilities for each one of the
independent terms (more exactly, a probability distribution
over its admitted values)

A value for one of the dependent terms

Desired: inferred probabilities for the independent terms,
alone or in combination, based on this information.
Inverse evaluation across causal net
(Diagram: the car causal net from before, with Main engine runs marked
as the observed feature.)

1. Remove irrelevant terms

(Diagram: the pruned net, keeping only the observed feature Main engine
runs and the terms it depends on: Starting motor runs, Battery charged,
Key turned, Gas in tank.)
Inverse evaluation across causal net




1. Remove irrelevant terms (both “sideways”
   and “upward”; also “downward” if a priori
   probabilities are available anyway) (sketched below)
2. Calculate a priori probabilities “upward” from
   the independent terms to the observed one
3. Calculate inferred probabilities “downward”
   from the observed term to combinations of
   independent ones
4. Add up probabilities for combinations of
   independent terms
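A minimal sketch of step 1 (illustrative Python; the predecessor links below are an assumed reading of the car diagram, only the headlights link is stated explicitly in the slides):

```python
# Step 1: keep only the observed term and the terms it (transitively)
# depends on; everything else is irrelevant for the inverse evaluation.
predecessors = {                # assumed immediate predecessors of each dependent term
    "car-moves": ["main-engine-runs", "clutch-engaged"],
    "main-engine-runs": ["starting-motor-runs", "gas-in-tank"],
    "starting-motor-runs": ["battery-charged", "key-turned"],
    "headlights-on": ["fuse-ok", "battery-charged"],
}

def relevant_terms(observed):
    """The observed term plus all its ancestors in the causal net."""
    keep, stack = set(), [observed]
    while stack:
        term = stack.pop()
        if term not in keep:
            keep.add(term)
            stack.extend(predecessors.get(term, []))
    return keep

print(sorted(relevant_terms("main-engine-runs")))
# ['battery-charged', 'gas-in-tank', 'key-turned',
#  'main-engine-runs', 'starting-motor-runs']
```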
Learning in Decision Trees and Causal Nets




Obtaining a priori probabilities for given terms
Obtaining conditional probabilities in a decision
tree with a given set of independent terms, based
on a set of observed cases
Choosing the structure of a decision tree using a
given set of terms (assuming there is a cost for
obtaining the value of a term)
Identifying the structure of a causal net using a
given set of terms

Choosing (the structure of) a decision tree
Also applicable for trees without probabilities






Given a set of independent variables A, B, ... and a large
number of instances of the values of these, together with
the value of E

Consider the decision tree for E having only the node A,
and similarly for B, C, etc.

Calculate P(E,A) and P(E,~A), and similarly for the other
alternative trees

Favor the choice that costs the least and that gives the
most information in the sense of information theory (the
“difference” between P(E,A) and P(E,~A) is as big as
possible) (see the sketch after this list)

Form subtrees recursively as long as it is worthwhile

This produces both structure and probabilities for the
decision tree
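A minimal sketch of the selection step, using information gain as the information-theoretic measure (illustrative Python; term costs are ignored and cases are (values, E) pairs):

```python
from math import log2
from collections import Counter

def entropy(labels):
    """Shannon entropy of the E values in a set of cases."""
    counts = Counter(labels)
    total = len(labels)
    return -sum(c / total * log2(c / total) for c in counts.values())

def information_gain(cases, term):
    """How much knowing 'term' reduces the uncertainty about E."""
    base = entropy([e for _, e in cases])
    remainder = 0.0
    for value in set(v[term] for v, _ in cases):
        subset = [e for v, e in cases if v[term] == value]
        remainder += len(subset) / len(cases) * entropy(subset)
    return base - remainder

# Tiny illustration: E correlates with A but not with B.
cases = [({"A": True,  "B": True},  True),
         ({"A": True,  "B": False}, True),
         ({"A": False, "B": True},  False),
         ({"A": False, "B": False}, False)]

best = max(["A", "B"], key=lambda t: information_gain(cases, t))
print(best, information_gain(cases, "A"), information_gain(cases, "B"))
# A 1.0 0.0
```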
Assessing the precision of the decision tree

Obtain a sufficiently large set of instances of the
problem

Divide it into a training set and a test set

Construct a decision tree using the training set

Evaluate the elements of the test set in the
decision tree and check how well predicted
values match actual values
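A minimal sketch of this train/test evaluation (illustrative Python, with synthetic cases and a trivial depth-1 tree learner standing in for the real one):

```python
import random
from collections import Counter

def learn_majority_tree(train, term):
    """Toy depth-1 'tree': predict the most common E value for each value of term."""
    groups = {}
    for values, e in train:
        groups.setdefault(values[term], []).append(e)
    return {v: Counter(es).most_common(1)[0][0] for v, es in groups.items()}

def accuracy(tree, term, test, default=False):
    """Fraction of test cases whose predicted E value matches the actual one."""
    hits = sum(tree.get(values[term], default) == e for values, e in test)
    return hits / len(test)

# Synthetic instances: E follows A about 90% of the time (illustration only).
random.seed(0)
cases = []
for _ in range(200):
    a = random.random() < 0.5
    e = a if random.random() < 0.9 else not a
    cases.append(({"A": a}, e))

random.shuffle(cases)
train, test = cases[:150], cases[150:]       # training set and test set

tree = learn_majority_tree(train, "A")
print(round(accuracy(tree, "A", test), 2))   # typically around 0.9 for this data
```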
Roll-dice scenario






Roll a number of 6-sided dice and register the
following independent variables:
The color of the dice (ten different colors)
The showing of the 'seconds' dial on the watch,
ranging from 0 to 59
The showing of another dice that is thrown at
the same time
A total of 3600 different combinations
Consider a training set where no combination
occurs more than once
Roll-dice scenario





Conclusion from this scenario:
It is important to have a way of determining
whether the size of the (remaining) training set
at a particular node in the decision tree being
designed is at all significant
This may be done by testing it against a null
hypothesis: could the training set at hand have
been obtained purely by chance? (see the sketch below)
It may also be done using human knowledge of
the domain at hand
Finally, it can be done using a test set
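A minimal sketch of such a null-hypothesis test (illustrative Python): a chi-square statistic for a candidate split at a node, compared against the 5% critical value.

```python
def chi_square_statistic(table):
    """Chi-square statistic for a contingency table (rows: branch, columns: E value)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat

# Counts of E = true / E = false in the two branches of a candidate split,
# for the 12 training cases that reach this node of the tree being designed.
table = [[5, 1],    # branch: term value true
         [2, 4]]    # branch: term value false

stat = chi_square_statistic(table)
# 3.84 is the 5% critical value of chi-square with 1 degree of freedom:
# with only 12 cases, this seemingly informative split is not significant.
print(round(stat, 2), stat > 3.84)   # 3.09 False
```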
Continuous-valued terms
and terms with a large number of discrete values

In order to use such a term in a decision tree, one must
aggregate the range of values into a limited
number of cases, for example by introducing
intervals (for value domains having a natural
ordering).
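A minimal sketch of such an aggregation (illustrative Python; the cut points and labels are made up for illustration):

```python
# Aggregate a continuous value into a small number of interval cases.
cuts = [(10.0, "cold"), (25.0, "mild")]   # upper bounds of the first intervals
top_label = "hot"                          # everything above the last cut

def discretize(value):
    for upper, label in cuts:
        if value < upper:
            return label
    return top_label

print([discretize(v) for v in (3.0, 18.5, 31.2)])   # ['cold', 'mild', 'hot']
```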



Identifying the structure of a causal net
This is very often done manually, using human
knowledge about the problem domain.
Other possibility: select or generate a number of
alternative causal nets, learn dependence
specifications (e.g. decision trees) for each of
them using training sets, and assess their
precision using test sets
There is much more to learn
about Learning in A.I.



Statistically oriented learning: major part of the field at
present. Based on Bayesian methods and/or on neural
networks
Logic-oriented learning: identifying compact
representations of observed phenomena and behavior
patterns in terms of logic formulas
Case-based learning: the agent maintains a case base
of previously encountered situations, the actions it
took then, and the outcome of those actions. New
situations are addressed by finding a similar case that
was successful and adapting the actions that were
used then.
Lab 3: Using Decision Trees and
Causal Nets – the Scenario





Three classes of terms (features): illnesses,
symptoms, and cures
Cures include both use of medicines and other
kinds of cures
One causal net can model the relation from illness
to symptom
Another causal net can model the relation from
current illness + cure to updated illness
Both of these make use of dependency
expressions that are probabilistic decision trees
Milestone 3a




The downloaded lab materials will contain the
causal net going from illness to symptom, but
without the dependency expressions
It will also contain operations for direct
evaluation and inverse evaluation of decision
trees
The task will be to define plausible dependency
expressions for this causal net, and to run test
examples on it.
This includes both test examples given by us,
and test examples that you write yourself.
Milestone 3b



Additional downloaded materials will contain a set of
terms for medicines and cures, but without the causal
net, and a generator for (animals with) illnesses
The first part of the task is to define a plausible causal
net and associated dependency expressions for the
step from cures to update of illnesses
The second part of the task is to run a combined
system where animals with illnesses are diagnosed
and treated, and the outcome is observed.
Updated Plan for the Course

Please check the course webpage where the
plan for lectures and labs has been modified.
Lab 2 has been given one more week in the
tutorial sessions, and labs 4 and 5 have been
swapped.