CS 188: Artificial Intelligence
Fall 2006
Lecture 18: Decision Diagrams
10/31/2006
Dan Klein – UC Berkeley

Announcements
• Optional midterm
  • On Tuesday 11/21 in class unless conflicts
  • We will count the midterms and final as 1 / 1 / 2, and drop the lowest (or halve the final weight)
  • Will also run grades as if this midterm did not happen
  • You will get the better of the two grades
• Projects
  • 3.2 up today, grades on previous projects ASAP
  • Total of 3 checkpoints left (including 3.2)

Recap: Sampling
• Exact inference can be slow
• Basic idea: sampling
  • Draw N samples from a sampling distribution S
  • Compute an approximate posterior probability
  • Show this converges to the true probability P
• Benefits:
  • Easy to implement
  • You can get an (approximate) answer fast
• Last time:
  • Rejection sampling: reject samples disagreeing with evidence
  • Likelihood weighting: use evidence to weight samples

Likelihood Weighting
• Problem with rejection sampling:
  • If evidence is unlikely, you reject a lot of samples
  • You don't exploit your evidence as you sample
  • Consider P(B | A=true)
  [Diagram: Burglary → Alarm]
• Idea: fix evidence variables and sample the rest
  [Diagram: Burglary → Alarm, with Alarm fixed as evidence]
• Problem: sample distribution not consistent!
• Solution: weight by probability of evidence given parents
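The rejection-sampling problem above can be seen in a small sketch. This is a minimal illustration, not code from the lecture: the two-node network B → A and its CPT numbers are assumptions chosen so the evidence A=true is rare.

```python
import random

# Hypothetical CPTs for a two-node network Burglary -> Alarm
# (numbers are illustrative, not from the lecture).
P_B = 0.001                              # P(Burglary = true)
P_A_GIVEN_B = {True: 0.95, False: 0.01}  # P(Alarm = true | Burglary)

def sample_once(rng):
    """Draw one joint sample (B, A) by sampling in topological order."""
    b = rng.random() < P_B
    a = rng.random() < P_A_GIVEN_B[b]
    return b, a

def rejection_sample(n, rng):
    """Estimate P(B=true | A=true): keep only samples where the alarm rang."""
    kept, b_true = 0, 0
    for _ in range(n):
        b, a = sample_once(rng)
        if a:                # reject every sample disagreeing with evidence
            kept += 1
            b_true += b
    return b_true / kept, kept

rng = random.Random(0)
estimate, kept = rejection_sample(200_000, rng)
# Almost all samples are rejected because the alarm rarely rings,
# which is exactly the inefficiency the slide describes.
print(estimate, kept)
```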
Likelihood Weighting
• Sampling distribution if z sampled and e fixed evidence
• Now, samples have weights
• Together, weighted sampling distribution is consistent
[Diagram: Cloudy → Sprinkler, Cloudy → Rain, Sprinkler and Rain → WetGrass]
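A sketch of likelihood weighting on the Cloudy/Sprinkler/Rain/WetGrass network in the diagram. The CPT numbers here are assumptions for illustration (the slide's tables did not survive extraction): non-evidence variables are sampled, while the evidence WetGrass=true is fixed and its probability given the sampled parents becomes the sample's weight.

```python
import random

# Illustrative CPTs (assumed, not from the slides) for
# Cloudy -> {Sprinkler, Rain} -> WetGrass.
P_C = 0.5
P_S = {True: 0.1, False: 0.5}            # P(Sprinkler | Cloudy)
P_R = {True: 0.8, False: 0.2}            # P(Rain | Cloudy)
P_W = {(True, True): 0.99, (True, False): 0.90,
       (False, True): 0.90, (False, False): 0.00}  # P(Wet | S, R)

def weighted_sample(rng):
    """Sample non-evidence variables in topological order; fix the
    evidence WetGrass=true and multiply in its likelihood as a weight."""
    c = rng.random() < P_C
    s = rng.random() < P_S[c]
    r = rng.random() < P_R[c]
    w = P_W[(s, r)]       # weight: P(evidence | parents), not a coin flip
    return r, w

def lw_estimate(n, rng):
    """Estimate P(Rain = true | WetGrass = true) from weighted samples."""
    num = den = 0.0
    for _ in range(n):
        r, w = weighted_sample(rng)
        num += w * r
        den += w
    return num / den

rng = random.Random(0)
est = lw_estimate(100_000, rng)
print(round(est, 3))   # close to the exact posterior (~0.708 for these CPTs)
```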
Likelihood Weighting
• Note that likelihood weighting doesn't solve all our problems
  • Rare evidence is taken into account for downstream variables, but not upstream ones
• A better solution is Markov-chain Monte Carlo (MCMC), more advanced
• We'll return to sampling for robot localization and tracking in dynamic BNs
Decision Networks
• MEU: choose the action which maximizes the expected utility given the evidence
• Can directly operationalize this with decision diagrams
  • Bayes nets with nodes for utility and actions
  • Lets us calculate the expected utility for each action
• New node types:
  • Chance nodes (just like BNs)
  • Actions (rectangles, must be parents, act as observed evidence)
  • Utilities (depend on action and chance nodes)
[Diagram: decision network with chance node Weather, an action node, and a utility node]
Decision Networks
• Action selection:
  • Instantiate all evidence
  • Calculate posterior over parents of utility node
  • Set action node each possible way
  • Calculate expected utility for each action
  • Choose maximizing action
[Diagram: action Umbrella and chance node Weather feed into utility U; Weather also generates Report]

Example: Decision Networks
[Diagram: Umbrella → U ← Weather]

W     P(W)
sun   0.7
rain  0.3

A      W     U(A,W)
leave  sun   100
leave  rain  0
take   sun   20
take   rain  70
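The action-selection procedure above, run on the umbrella example (P(sun)=0.7, P(rain)=0.3, utilities from the table), can be sketched in a few lines:

```python
# Expected utility of each action: EU(a) = sum_w P(w) * U(a, w),
# using the prior over Weather and the utility table from the slide.
P_W = {"sun": 0.7, "rain": 0.3}
U = {("leave", "sun"): 100, ("leave", "rain"): 0,
     ("take", "sun"): 20, ("take", "rain"): 70}

def expected_utility(action):
    return sum(P_W[w] * U[(action, w)] for w in P_W)

eu = {a: expected_utility(a) for a in ("leave", "take")}
best = max(eu, key=eu.get)     # MEU: pick the maximizing action
print({a: round(v, 2) for a, v in eu.items()}, best)
```

With no report observed, EU(leave) = 0.7·100 + 0.3·0 = 70 beats EU(take) = 0.7·20 + 0.3·70 = 35, so the MEU action is to leave the umbrella.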
Example: Decision Networks
[Diagram: Umbrella → U ← Weather → Report]

W     P(W)
sun   0.7
rain  0.3

A      W     U(A,W)
leave  sun   100
leave  rain  0
take   sun   20
take   rain  70

R       P(R|sun)
clear   0.5
cloudy  0.5

R       P(R|rain)
clear   0.2
cloudy  0.8

Value of Information
• Idea: compute value of acquiring each possible piece of evidence
• Can be done directly from decision network
• Example: buying oil drilling rights
  • Two blocks A and B, exactly one has oil, worth k
  • Prior probabilities 0.5 each, mutually exclusive
  • Current price of each block is k/2
  • Probe gives accurate survey of A. Fair price?
[Diagram: action DrillLoc and chance node OilLoc feed into utility U]
• Solution: compute value of information
  = expected value of best action given the information minus expected value of best action without information
  • Survey may say "oil in A" or "no oil in A," prob 0.5 each
  = [0.5 × value of "buy A" given "oil in A"] + [0.5 × value of "buy B" given "no oil in A"] − 0
  = [0.5 × k/2] + [0.5 × k/2] − 0 = k/2
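The oil-rights arithmetic can be checked numerically. The concrete value k = 1000 is an assumption for illustration; the slide leaves k symbolic.

```python
# Value-of-information arithmetic for the oil-drilling example,
# with an assumed concrete k (the slide keeps k symbolic).
k = 1000.0            # total value of the oil (assumption: k = 1000)
price = k / 2         # current asking price of each block

# Without the survey, buying either block has expected profit
# 0.5*k - k/2 = 0, the same as not buying at all.
meu_without = max(0.5 * k - price, 0.0)

# With a perfect survey of A, we buy whichever block the survey
# indicates, earning k - k/2 either way.
meu_with = 0.5 * (k - price) + 0.5 * (k - price)

vpi = meu_with - meu_without
print(vpi)            # k/2: the fair price of the survey
```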
General Formula
• (VPI = value of perfect information)
• Current evidence E=e, possible utility inputs s
• Potential new evidence E': suppose we knew E' = e'
• BUT E' is a random variable whose value is currently unknown, so:
  • Must compute expected gain over all possible values

VPI Properties
• Nonnegative in expectation
• Nonadditive – consider, e.g., obtaining Ej twice
• Order-independent
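The formulas on this slide did not survive extraction; the standard definition, consistent with the bullets above (current evidence e, utility inputs s, candidate evidence E'), is:

```latex
% Expected utility of acting optimally on the current evidence e:
MEU(e) = \max_{a} \sum_{s} P(s \mid e)\, U(s, a)

% If we learned E' = e', we would instead act on the new posterior:
MEU(e, e') = \max_{a} \sum_{s} P(s \mid e, e')\, U(s, a)

% E' is currently unknown, so take the expectation over its values:
VPI_{e}(E') = \left( \sum_{e'} P(e' \mid e)\, MEU(e, e') \right) - MEU(e)
```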
VPI Scenarios
• Imagine actions 1 and 2, for which U1 > U2
• How much will information about Ej be worth?
  • Little – we're sure action 1 is better.
  • Little – info likely to change our action but not our utility
  • A lot – either could be much better

VPI Example
[Diagram: Umbrella → U ← Weather → Report]

Reasoning over Time
• Often, we want to reason about a sequence of observations
  • Speech recognition
  • Robot localization
  • User attention
• Need to introduce time into our models
  • Basic approach: hidden Markov models (HMMs)
  • More general: dynamic Bayes' nets

Markov Models
• A Markov model is a chain-structured BN
  • Each node is identically distributed (stationarity)
  • Value of X at a given time is called the state
• As a BN: X1 → X2 → X3 → X4
• Parameters: called transition probabilities, specify how the state evolves over time (also, initial probs)
Conditional Independence
X1 → X2 → X3 → X4
• Basic conditional independence:
  • Past and future independent given the present
  • Each time step only depends on the previous
  • This is called the (first order) Markov property

Example
• Weather:
  • States: X = {rain, sun}
  • Transitions (this is a CPT, not a BN!):

    X_t   X_t+1  P
    sun   sun    0.9
    sun   rain   0.1
    rain  sun    0.1
    rain  rain   0.9

  • Initial distribution: 1.0 sun
  • What's the probability distribution after one step?

Mini-Forward Algorithm
• Question: probability of being in state x at time t?
• Slow answer:
  • Enumerate all sequences of length t which end in x
  • Add up their probabilities
• Better answer: cached incremental belief update (forward simulation)
[Traces: P(X1), P(X2), P(X3), …, P(X∞), from an initial observation of sun and from an initial observation of rain]

Stationary Distributions
• If we simulate the chain long enough:
  • What happens?
  • Uncertainty accumulates
  • Eventually, we have no idea what the state is!
• Stationary distributions:
  • For most chains, the distribution we end up in is independent of the initial distribution
  • Called the stationary distribution of the chain
• Usually, can only predict a short time out
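The mini-forward algorithm above is a few lines of code. This sketch assumes the symmetric stay-0.9/switch-0.1 weather transitions suggested by the 0.9/0.1 numbers on the slide; starting from 1.0 sun it shows both the one-step update and convergence to the stationary distribution.

```python
# Mini-forward algorithm on the two-state weather chain
# (assumed transitions: stay with prob 0.9, switch with prob 0.1).
T = {"sun": {"sun": 0.9, "rain": 0.1},
     "rain": {"sun": 0.1, "rain": 0.9}}

def forward_step(belief):
    """One cached belief update:
    P(X_{t+1} = x2) = sum_x1 P(x2 | x1) * P(X_t = x1)."""
    return {x2: sum(T[x1][x2] * belief[x1] for x1 in belief) for x2 in T}

belief = {"sun": 1.0, "rain": 0.0}   # initial distribution: 1.0 sun
print(forward_step(belief))          # after one step: 0.9 sun, 0.1 rain

for _ in range(100):                 # simulate the chain long enough...
    belief = forward_step(belief)
print(belief)                        # ...uncertainty accumulates toward 0.5/0.5
```

Each step costs O(|X|^2) instead of enumerating all |X|^t sequences, which is the point of caching the incremental update.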
Web Link Analysis
• PageRank over a web graph
  • Each web page is a state
  • Initial distribution: uniform over pages
  • Transitions:
    • With prob. c, uniform jump to a random page (solid lines)
    • With prob. 1-c, follow a random outlink (dotted lines)
• Stationary distribution
  • Will spend more time on highly reachable pages
  • E.g. many ways to get to the Acrobat Reader download page!
  • Somewhat robust to link spam
  • Google 1.0 returned pages containing your keywords in decreasing rank; now all search engines use link analysis along with many other factors
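The random-surfer chain above can be iterated to its stationary distribution. The three-page graph and the teleport probability c = 0.15 are assumptions for illustration; only the transition rule (teleport with prob. c, random outlink with prob. 1-c) comes from the slide.

```python
# PageRank as the stationary distribution of the random-surfer chain.
c = 0.15                                 # teleport probability (assumed)
links = {"A": ["B"], "B": ["A", "C"], "C": ["A"]}   # hypothetical web graph
pages = sorted(links)
N = len(pages)

def step(rank):
    """One transition: with prob c jump uniformly, else follow an outlink."""
    new = {p: c / N for p in pages}      # uniform teleport mass
    for p in pages:
        share = (1 - c) * rank[p] / len(links[p])
        for q in links[p]:               # spread the rest over p's outlinks
            new[q] += share
    return new

rank = {p: 1.0 / N for p in pages}       # initial distribution: uniform
for _ in range(100):                     # power iteration to convergence
    rank = step(rank)
print(rank)   # highly reachable pages (here A, linked from B and C) rank highest
```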