CS 489 - Machine Learning - Bayesian network
Instructor: Renzhi Cao
Computer Science Department, Pacific Lutheran University, Spring 2017
Special appreciation to Ian Goodfellow, Yoshua Bengio, Aaron Courville, Michael Nielsen, Andrew Ng, Katie Malone, Sebastian Thrun, Ethem Alpaydin, Christopher Bishop, Geoffrey Hinton, Tom Mitchell.

Bayesian networks
• The Naive Bayes assumption of conditional independence is too restrictive, but estimating the full joint distribution is intractable without some independence assumptions.
• A Bayesian network describes conditional independence among subsets of variables.
• It allows combining prior knowledge about independences among variables with observed training data.

Bayesian networks
Definition: X is conditionally independent of Y given Z if the probability distribution governing X is independent of the value of Y given the value of Z; that is, if
  (∀ xi, yj, zk) P(X = xi | Y = yj, Z = zk) = P(X = xi | Z = zk)
Example: Two coins, a regular coin and a fake two-tailed coin (P(heads) = 0 for the fake coin). Choose a coin at random and toss it two times. Define:
  A = first coin toss results in heads
  B = second coin toss results in heads
  C = the regular coin has been selected
A and B are dependent: observing A (heads) tells us the regular coin was selected, which changes the probability of B.
Given C (the regular coin has been selected), A and B are independent:
  P(A | B, C) = P(A | C)
(See the enumeration sketch in the appendix at the end of these notes.)

Bayesian networks
• A simple, graphical notation for conditional independence assertions; it specifies a joint distribution in a structured form.
• Represent dependence/independence via a directed graph:
  – a set of nodes, one per random variable
  – a directed, acyclic graph (a link ≈ "directly influences")
  – a conditional distribution for each node given its parents
The full joint distribution is approximated by the graph-structured factorization
  p(X1, X2, ..., XN) = Π p(Xi | Pa(Xi))
where Pa(Xi) denotes the immediate parents of Xi in the graph. (A generic sketch of this factorization appears in the appendix.)

Simple practice
• Graph: A → C ← B. What is P(A, B, C)?
  p(A, B, C) = p(C | A, B) p(A) p(B)
A and B are (marginally) independent but become dependent once C is known.
"Explaining away" effect: given C, observing A makes B less likely.
e.g., the earthquake/burglary/alarm example (see the sketch in the appendix).

Simple practice
• Graph: A, B, C with no edges. What is P(A, B, C)?
  p(A, B, C) = p(A) p(B) p(C)

Simple practice
• Graph: B ← A → C. What is P(A, B, C)?
  p(A, B, C) = p(B | A) p(C | A) p(A)
B and C are conditionally independent given A.
e.g., A is a disease, and we model B and C as conditionally independent symptoms given A.

Simple practice
• Graph: A → B → C. What is P(A, B, C)?
  p(A, B, C) = p(C | B) p(B | A) p(A)
Markov dependence.

Bayesian networks
Properties of a Bayesian network:
• The graph must be acyclic (no directed cycles).
• Two components:
  – the graph structure (conditional independence assumptions)
  – the numerical probabilities (for each variable given its parents)

Bayesian networks
What is the relationship between a Bayesian net and Naive Bayes?
• Naive Bayes is a special case of a Bayesian net.

Bayesian networks
Why do we favor Bayesian networks?
• Representation cost: in the previous example we had five variables, F, A, S, H, N, so the full joint needs 2^5 − 1 = 31 probability statements. With a Bayesian network we only need 2 + 2 + 8 + 4 + 4 = 20.
• Efficient learning computation.
• Incorporation of domain knowledge.

Bayesian network learning
There are several cases:
• The network structure is known or unknown.
• Variable values might be fully observed / partly observed.
The parameters are the conditional probability tables we calculated in the Naive Bayes algorithm. What to do?
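
Appendix: code sketches

The two-coin example can be checked by brute-force enumeration. Below is a minimal Python sketch (my own illustration, not from the slides; variable names are invented) that builds the joint P(C, A, B) and verifies that A and B are marginally dependent but conditionally independent given C.

from itertools import product

def p_heads(regular):
    """P(heads) for one toss: 0.5 for the regular coin, 0 for the two-tailed fake."""
    return 0.5 if regular else 0.0

# Enumerate the joint distribution P(C, A, B); the coin is chosen uniformly.
joint = {}
for regular, a, b in product([True, False], repeat=3):
    p_a = p_heads(regular) if a else 1 - p_heads(regular)
    p_b = p_heads(regular) if b else 1 - p_heads(regular)
    joint[(regular, a, b)] = 0.5 * p_a * p_b

def prob(pred):
    """Probability of the event described by pred(c, a, b)."""
    return sum(p for outcome, p in joint.items() if pred(*outcome))

# Marginally, A and B are dependent:
p_a = prob(lambda c, a, b: a)                                          # 0.25
p_a_given_b = prob(lambda c, a, b: a and b) / prob(lambda c, a, b: b)  # 0.5
print(p_a, p_a_given_b)  # 0.25 != 0.5, so A and B are dependent

# Given C (regular coin selected), A and B are independent:
p_a_given_c = prob(lambda c, a, b: a and c) / prob(lambda c, a, b: c)
p_a_given_bc = (prob(lambda c, a, b: a and b and c)
                / prob(lambda c, a, b: b and c))
print(p_a_given_c, p_a_given_bc)  # both 0.5: P(A | B, C) = P(A | C)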
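
The factorization p(X1, ..., XN) = Π p(Xi | Pa(Xi)) can be implemented generically. In the sketch below (my own, not from the slides), a network is a list of (name, parents, cpt) triples over Boolean variables, where cpt maps a tuple of parent values to P(Xi = True | parents). It is demonstrated on the Markov chain A → B → C from the last practice slide, with invented numbers.

def joint_prob(network, assignment):
    """Probability of one complete assignment under the factored joint
    p(X1, ..., XN) = prod_i p(Xi | Pa(Xi))."""
    p = 1.0
    for name, parents, cpt in network:
        parent_vals = tuple(assignment[q] for q in parents)
        p_true = cpt[parent_vals]
        p *= p_true if assignment[name] else 1 - p_true
    return p

# The chain A -> B -> C: p(A, B, C) = p(A) p(B | A) p(C | B).
chain = [
    ("A", (), {(): 0.3}),
    ("B", ("A",), {(True,): 0.8, (False,): 0.1}),
    ("C", ("B",), {(True,): 0.6, (False,): 0.2}),
]
print(joint_prob(chain, {"A": True, "B": True, "C": False}))
# 0.3 * 0.8 * (1 - 0.6) = 0.096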
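
The "explaining away" slide can likewise be made concrete. The sketch below (my own illustration; the CPT numbers are invented for the example, not taken from the slides) encodes the collider A → C ← B as burglary (A), earthquake (B), alarm (C), builds the joint from p(A, B, C) = p(C | A, B) p(A) p(B), and shows that observing the earthquake lowers the probability of burglary once the alarm is known.

from itertools import product

P_A = 0.01   # P(burglary) -- illustrative number
P_B = 0.02   # P(earthquake)
P_C = {      # P(alarm = True | burglary, earthquake)
    (True, True): 0.95,
    (True, False): 0.94,
    (False, True): 0.29,
    (False, False): 0.001,
}

def joint(a, b, c):
    """Factored joint p(A, B, C) = p(A) p(B) p(C | A, B)."""
    pa = P_A if a else 1 - P_A
    pb = P_B if b else 1 - P_B
    pc = P_C[(a, b)] if c else 1 - P_C[(a, b)]
    return pa * pb * pc

def prob(pred):
    """Probability of the event described by pred(a, b, c)."""
    return sum(joint(a, b, c)
               for a, b, c in product([True, False], repeat=3)
               if pred(a, b, c))

# Explaining away: the earthquake "explains" the alarm, so the
# burglary becomes much less likely once the earthquake is observed.
p_burg_alarm = prob(lambda a, b, c: a and c) / prob(lambda a, b, c: c)
p_burg_alarm_quake = (prob(lambda a, b, c: a and b and c)
                      / prob(lambda a, b, c: b and c))
print(p_burg_alarm)        # ~0.58  P(burglary | alarm)
print(p_burg_alarm_quake)  # ~0.03  P(burglary | alarm, earthquake)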