CS 188: Artificial Intelligence
Fall 2006
Lecture 15: Bayes Nets
10/19/2006
Dan Klein – UC Berkeley

Probabilistic Models

• A probabilistic model is a joint distribution over a set of variables
• Given a joint distribution, we can reason about unobserved variables given observations (evidence)
• General form of a query:

      P(Q | e_1, …, e_k)

  i.e., the "stuff you care about" (the query Q) conditioned on the "stuff you already know" (the evidence e_1, …, e_k)
• This kind of posterior distribution is also called the belief function of an agent which uses this model
Independence

• Two variables are independent if:

      ∀x, y: P(x, y) = P(x) P(y)

• This says that their joint distribution factors into a product of two simpler distributions
• Independence is a modeling assumption
  • Empirical joint distributions: at best "close" to independent
  • What could we assume for {Weather, Traffic, Cavity, Toothache}?
• Independence is like something from CSPs: what?

Example: Independence

• N fair, independent coin flips:

      P(X_1)        P(X_2)    …   P(X_n)
      H   0.5       H   0.5       H   0.5
      T   0.5       T   0.5       T   0.5

• How many parameters in the joint model?
• How many parameters in the independent model?
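The two parameter-count questions have concrete answers: 2^N − 1 free numbers for the full joint, N for the independent model. Here is a minimal Python sketch, not from the lecture, that computes both and numerically checks the factorization definition above:

import itertools

N = 100
print(2 ** N - 1)   # parameters in the full joint over N Boolean variables
print(N)            # parameters in the independent model: one P(heads) per coin

def is_independent(joint, tol=1e-9):
    """True if a two-variable joint P(x, y) factors into its marginals."""
    xs = {x for x, _ in joint}
    ys = {y for _, y in joint}
    px = {x: sum(joint[x, y] for y in ys) for x in xs}
    py = {y: sum(joint[x, y] for x in xs) for y in ys}
    return all(abs(joint[x, y] - px[x] * py[y]) <= tol
               for x, y in itertools.product(xs, ys))

# Two fair, independent coin flips:
coins = {(x, y): 0.25 for x in "HT" for y in "HT"}
print(is_independent(coins))   # True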
Example: Independence?

• Most joint distributions are not independent
• Most are poorly modeled as independent
• Example: compare this weather/temperature joint to the product of its marginals:

      P(T)              P(S)
      warm   0.5        sun    0.6
      cold   0.5        rain   0.4

      Actual joint P(T, S)        Product P(T) P(S)
      warm   sun    0.4           warm   sun    0.3
      warm   rain   0.1           warm   rain   0.2
      cold   sun    0.2           cold   sun    0.3
      cold   rain   0.3           cold   rain   0.2

Conditional Independence

• P(Toothache, Cavity, Catch)?
• If I have a cavity, the probability that the probe catches in it doesn't depend on whether I have a toothache:
  • P(catch | toothache, cavity) = P(catch | cavity)
• The same independence holds if I don't have a cavity:
  • P(catch | toothache, ¬cavity) = P(catch | ¬cavity)
• Catch is conditionally independent of Toothache given Cavity:
  • P(Catch | Toothache, Cavity) = P(Catch | Cavity)
• Equivalent statements:
  • P(Toothache | Catch, Cavity) = P(Toothache | Cavity)
  • P(Toothache, Catch | Cavity) = P(Toothache | Cavity) P(Catch | Cavity)
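These statements can be checked numerically. This lecture does not give the joint P(Cavity, Toothache, Catch); the numbers below are the standard textbook table for this domain, so treat them as an assumption:

# Joint P(Cavity, Toothache, Catch): standard textbook numbers (an
# assumption; not given in this lecture).
joint = {
    # (cavity, toothache, catch): probability
    (True,  True,  True):  0.108, (True,  True,  False): 0.012,
    (True,  False, True):  0.072, (True,  False, False): 0.008,
    (False, True,  True):  0.016, (False, True,  False): 0.064,
    (False, False, True):  0.144, (False, False, False): 0.576,
}

def prob(pred):
    """Sum the joint over all worlds satisfying the predicate."""
    return sum(p for world, p in joint.items() if pred(*world))

for cavity in (True, False):
    # P(catch | toothache, cavity-value) vs. P(catch | cavity-value)
    lhs = (prob(lambda cav, tooth, catch: cav == cavity and tooth and catch)
           / prob(lambda cav, tooth, catch: cav == cavity and tooth))
    rhs = (prob(lambda cav, tooth, catch: cav == cavity and catch)
           / prob(lambda cav, tooth, catch: cav == cavity))
    print(cavity, lhs, rhs)   # equal both times: 0.9 vs 0.9, then 0.2 vs 0.2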
Conditional Independence

• Unconditional (absolute) independence is very rare (why?)
• Conditional independence is our most basic and robust form of knowledge about uncertain environments:
  • Why?
• What about this domain:
  • Traffic
  • Umbrella
  • Raining
• What about fire, smoke, alarm?

The Chain Rule II

• Can always factor any joint distribution as an incremental product of conditional distributions:

      P(x_1, x_2, …, x_n) = ∏_i P(x_i | x_1, …, x_{i-1})

• This actually claims nothing…

The Chain Rule III

• Trivial decomposition:

      P(Rain, Traffic, Umbrella) = P(Rain) P(Traffic | Rain) P(Umbrella | Rain, Traffic)

• With conditional independence (Umbrella independent of Traffic given Rain):

      P(Rain, Traffic, Umbrella) = P(Rain) P(Traffic | Rain) P(Umbrella | Rain)

• What are the sizes of the tables we supply?
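To make the table-size question concrete, a small sketch assuming Boolean variables (the counting convention, one free number per row of each conditional table, is mine):

def cpt_params(num_parents):
    """Free parameters in a Boolean CPT: one per combination of parent values."""
    return 2 ** num_parents

# Trivial decomposition P(R) P(T|R) P(U|R,T): as many parameters as the joint.
print(cpt_params(0) + cpt_params(1) + cpt_params(2))   # 1 + 2 + 4 = 7 = 2^3 - 1
# With Umbrella independent of Traffic given Rain: P(R) P(T|R) P(U|R).
print(cpt_params(0) + cpt_params(1) + cpt_params(1))   # 1 + 2 + 2 = 5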
Graphical Models

• Models are descriptions of how (a portion of) the world works
• Models are always simplifications
  • May not account for every variable
  • May not account for all interactions between variables
• What do we do with probabilistic models?
  • We (or our agents) need to reason about unknown variables, given evidence
  • Example: explanation (diagnostic reasoning)
  • Example: prediction (causal reasoning)
  • Example: value of information
• Conditional independence is our most basic and robust form of knowledge about uncertain environments
• Graphical models help us manage independence

Bayes’ Nets: Big Picture

• Two problems with using full joint distributions for probabilistic models:
  • Unless there are only a few variables, the joint is WAY too big to represent explicitly
  • Hard to estimate anything empirically about more than a few variables at a time
• Bayes’ nets (more properly called graphical models) are a technique for describing complex joint distributions (models) using a bunch of simple, local distributions
  • We describe how variables locally interact
  • Local interactions chain together to give global, indirect interactions
  • For about 10 min, we’ll be very vague about how these interactions are specified
Graphical Model Notation

• Nodes: variables (with domains)
  • Can be assigned (observed) or unassigned (unobserved)
• Arcs: interactions
  • Similar to CSP constraints
  • Indicate “direct influence” between variables
  • For now: imagine that arrows mean causation
Example: Coin Flips

• N independent coin flips

      X_1    X_2    …    X_n

• No interactions between variables: absolute independence

Example: Traffic

• Variables:
  • R: It rains
  • T: There is traffic
• Model 1: independence

      R    T

• Model 2: rain causes traffic

      R → T

• Why is an agent using model 2 better?
Example: Traffic II

• Let’s build a causal graphical model
• Variables
  • T: Traffic
  • R: It rains
  • L: Low pressure
  • D: Roof drips
  • B: Ballgame
  • C: Cavity

Example: Alarm Network

• Variables
  • B: Burglary
  • A: Alarm goes off
  • M: Mary calls
  • J: John calls
  • E: Earthquake!

Bayes’ Net Semantics

• Let’s formalize the semantics of a Bayes’ net
• A set of nodes, one per variable X
• A directed, acyclic graph
• A conditional distribution for each node
  • A distribution over X, for each combination of values of its parents A_1 … A_n: P(X | A_1, …, A_n)
  • CPT: conditional probability table
  • Description of a noisy “causal” process

A Bayes net = Topology (graph) + Local Conditional Probabilities

Probabilities in BNs

• Bayes’ nets implicitly encode joint distributions
  • As a product of local conditional distributions
  • To see what probability a BN gives to a full assignment, multiply all the relevant conditionals together:

      P(x_1, x_2, …, x_n) = ∏_i P(x_i | parents(X_i))

  • Example: the coin-flip and traffic nets on the next slides
• This lets us reconstruct any entry of the full joint
• Not every BN can represent every full joint
  • The topology enforces certain conditional independencies
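The product above is easy to state in code. A minimal sketch, with a net representation of my own choosing; the CPT numbers match the traffic example a few slides below:

# Each node maps to (parents, CPT); CPTs are keyed by parent values
# followed by the node's own value.
net = {
    "R": ((), {(True,): 0.25, (False,): 0.75}),
    "T": (("R",), {(True, True): 0.75, (True, False): 0.25,
                   (False, True): 0.50, (False, False): 0.50}),
}

def joint_probability(net, assignment):
    """Multiply each variable's conditional given its parents' values."""
    p = 1.0
    for var, (parents, cpt) in net.items():
        key = tuple(assignment[q] for q in parents) + (assignment[var],)
        p *= cpt[key]
    return p

print(joint_probability(net, {"R": True, "T": True}))   # 0.25 * 0.75 = 3/16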
Example: Coin Flips

      X_1    X_2    …    X_n

      P(X_1)        P(X_2)        P(X_n)
      h   0.5       h   0.5       h   0.5
      t   0.5       t   0.5       t   0.5

• Only distributions whose variables are absolutely independent can be represented by a Bayes’ net with no arcs.

Example: Traffic

      R → T

      P(R)          P(T | r)      P(T | ¬r)
      r    1/4      t    3/4      t    1/2
      ¬r   3/4      ¬t   1/4      ¬t   1/2
Example: Alarm Network

      [diagram: the burglary alarm network over B, E, A, J, M, with a CPT for each node]
Example: Naïve Bayes

• Imagine we have one cause y and several effects x:

      P(y, x_1, …, x_n) = P(y) ∏_i P(x_i | y)

• This is a naïve Bayes model
• We’ll use these for classification later
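A minimal sketch of this factorization; the dictionary layout and toy numbers are my own, not from the lecture:

from math import prod

def naive_bayes_joint(p_y, p_x_given_y, y, xs):
    """P(y, x_1, ..., x_n) = P(y) * prod_i P(x_i | y)."""
    return p_y[y] * prod(p_x_given_y[i][y][x] for i, x in enumerate(xs))

p_y = {True: 0.2, False: 0.8}        # one Boolean cause y
p_x_given_y = [                      # two Boolean effects, P(x_i | y)
    {True: {True: 0.9, False: 0.1}, False: {True: 0.3, False: 0.7}},
    {True: {True: 0.6, False: 0.4}, False: {True: 0.5, False: 0.5}},
]
print(naive_bayes_joint(p_y, p_x_given_y, True, [True, False]))  # 0.2*0.9*0.4 = 0.072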
Example: Traffic II

• Variables
  • T: Traffic
  • R: It rains
  • L: Low pressure
  • D: Roof drips
  • B: Ballgame

      [diagram: L → R, R → T, R → D, B → T]

Size of a Bayes’ Net

• How big is a joint distribution over N Boolean variables? 2^N entries
• How big is an N-node net if nodes have at most k parents? O(N · 2^k)
• Both give you the power to calculate P(X_1, …, X_N)
• BNs: Huge space savings!
• Also easier to elicit local CPTs
• Also turns out to be faster to answer queries (next class)
Building the (Entire) Joint

• We can take a Bayes’ net and build the full joint distribution it encodes
  • Typically, there’s no reason to build ALL of it
  • But it’s important to know you could!
• To emphasize: every BN over a domain implicitly represents some joint distribution over that domain
Example: Traffic

• Basic traffic net
• Let’s multiply out the joint
      R → T

      P(R)          P(T | r)      P(T | ¬r)
      r    1/4      t    3/4      t    1/2
      ¬r   3/4      ¬t   1/4      ¬t   1/2

      Resulting joint P(R, T):
      r    t    3/16
      r    ¬t   1/16
      ¬r   t    6/16
      ¬r   ¬t   6/16
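For concreteness, a short sketch (mine, not from the slides) that multiplies out this same joint from the two CPTs:

from itertools import product

p_r = {True: 1/4, False: 3/4}
p_t_given_r = {True: {True: 3/4, False: 1/4},    # P(T | r)
               False: {True: 1/2, False: 1/2}}   # P(T | ¬r)

joint = {(r, t): p_r[r] * p_t_given_r[r][t]
         for r, t in product((True, False), repeat=2)}
for (r, t), p in sorted(joint.items(), reverse=True):
    print(f"r={r} t={t}: {p}")   # 3/16, 1/16, 6/16, 6/16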
Example: Reverse Traffic

• The same joint distribution, factored in the reverse direction: T → R
      P(T)          P(R | t)      P(R | ¬t)
      t    9/16     r    1/3      r    1/7
      ¬t   7/16     ¬r   2/3      ¬r   6/7

      Resulting joint P(R, T) (identical to before):
      r    t    3/16
      r    ¬t   1/16
      ¬r   t    6/16
      ¬r   ¬t   6/16
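The reverse CPTs can be derived mechanically from the joint; a small sketch, assuming the joint table from the previous slide:

# (r, t): P(r, t), the joint multiplied out on the previous slide
joint = {(True, True): 3/16, (True, False): 1/16,
         (False, True): 6/16, (False, False): 6/16}

p_t = {t: sum(p for (r, t2), p in joint.items() if t2 == t)
       for t in (True, False)}
p_r_given_t = {(r, t): joint[r, t] / p_t[t] for (r, t) in joint}

print(p_t[True], p_t[False])                               # 9/16, 7/16
print(p_r_given_t[True, True], p_r_given_t[True, False])   # 1/3, 1/7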
Causality?

• When Bayes’ nets reflect the true causal patterns:
  • Often simpler (nodes have fewer parents)
  • Often easier to think about
  • Often easier to elicit from experts
• Reverse causality?
• BNs need not actually be causal
  • Sometimes no causal net exists over the domain (especially if variables are missing)
  • E.g. consider the variables Traffic and Drips
  • End up with arrows that reflect correlation, not causation
• What do the arrows really mean?
  • Topology may happen to encode causal structure
  • Topology really encodes conditional independencies
Creating Bayes’ Nets

• So far, we talked about how any fixed Bayes’ net encodes a joint distribution
• Next: how to represent a fixed distribution as a Bayes’ net
  • Key ingredient: conditional independence
  • The exercise we did in “causal” assembly of BNs was a kind of intuitive use of conditional independence
  • Now we have to formalize the process
• After that: how to answer queries (inference)