Detecting Hidden Variables: A Structure-Based Approach
ABSTRACT: We examine how to detect hidden variables when learning probabilistic models. This problem is crucial for improving our
understanding of the domain and as a preliminary step that guides the learning procedure. A natural approach is to search for ``structural
signatures'' of hidden variables. We make this basic idea concrete, and show how to integrate it with structure-search algorithms. We
evaluate this method on several synthetic and real-life datasets, and show that it performs surprisingly well.
A Bayesian network represents a joint probability distribution over a set of
random variables using a directed acyclic graph (DAG):
$P(X_1, \ldots, X_n) = P(V)\,P(S)\,P(T \mid V) \cdots P(X \mid A)\,P(D \mid A, B)$
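The factorization can be illustrated on a small fragment of the example network. The sketch below uses the four entries of P(D | A, B) shown on the poster; the priors `P_A` and `P_B`, and the choice to treat A and B as independent root nodes, are hypothetical simplifications made only for this illustration.

```python
from itertools import product

# Hypothetical priors for the fragment (NOT from the poster).
P_A = {True: 0.1, False: 0.9}
P_B = {True: 0.3, False: 0.7}

# P(D = true | A, B): the four entries shown on the poster.
P_D_GIVEN_AB = {
    (True, True): 0.8,
    (True, False): 0.1,
    (False, True): 0.1,
    (False, False): 0.01,
}


def joint(a, b, d):
    """P(A=a, B=b, D=d) as the product of the local conditional probabilities."""
    p_d_true = P_D_GIVEN_AB[(a, b)]
    return P_A[a] * P_B[b] * (p_d_true if d else 1.0 - p_d_true)


# The factorized joint sums to one over all assignments.
total = sum(joint(a, b, d) for a, b, d in product([True, False], repeat=3))
# Marginal probability of D = true, obtained by summing out A and B.
p_d = sum(joint(a, b, True) for a, b in product([True, False], repeat=2))
print(f"sum over all assignments = {total:.4f}, P(D = true) = {p_d:.4f}")
```

Each factor only mentions a variable and its parents, which is why the number of parameters grows with the size of the families rather than with the full joint table.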
Hebrew University, {galel,noaml,nir}@huji.ac.il
Daphne Koller, Stanford University, [email protected]
Characterizing Hidden Variables
The following theorem helps us detect structural
signatures of hidden variables:
[Plot axis: log-loss on test data]
What is a Bayesian Network?
Gal Elidan, Noam Lotner, Nir Friedman
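[Figure: the Alarm network. Nodes shown: PVSAT, ANAPHYLAXIS, ARTCO2, EXPCO2, SAO2, TPR, HYPOVOLEMIA, LVFAILURE, CATECHOL, LVEDVOLUME, CVP, PCWP, STROEVOLUME, CO, HREKG, HR, HRSAT, HRBP, BP, ERRCAUTER, ERRBLOW, HISTORY]
[Plot legend: Hidden, Naive, Original]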
[Figure: example network nodes Visit to Asia, Smoking, Tuberculosis, Lung Cancer, Bronchitis, Abnormality in Chest]
[Figure labels: Parents of H; preserve I-Map; all parents connected to all children; Children of H]
P(D|¬A,B)=0.1
P(D|¬A,¬B)=0.01
[Plot panels: Alarm 1k, Alarm 10k]
[Figure label: HR is hidden and structure learned from data]
Real-life example: Stockdata
[Figure label: market trend, "Strong" vs. "Stationary"]
The FindHidden Algorithm
Search for semi-cliques by expansion of 3-clique seeds
Semi-Clique S with N nodes
Bayesian scoring metric:
$\mathrm{Score}(G : D) = \log P(G \mid D) = \log P(D \mid G) + \log P(G) + C$
where $P(D \mid G) = \int P(D \mid G, \theta)\,P(\theta \mid G)\,d\theta$
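For complete discrete data and Dirichlet parameter priors, the integral in the scoring metric has a well-known closed form. The sketch below computes it with a uniform Dirichlet prior (`alpha = 1`) and a structure prior passed in as `log_prior`; both choices, and all function and variable names, are assumptions made for illustration. The poster does not state which priors were used, and with hidden variables the score has to be approximated (the poster uses Structural EM for that).

```python
from collections import Counter
from math import lgamma


def log_marginal_likelihood(data, parents, arity, alpha=1.0):
    """log P(D | G) for complete discrete data.

    data: list of dicts mapping variable name -> value.
    parents: dict mapping each variable to a tuple of its parents in G.
    arity: dict mapping each variable to its number of values.
    alpha: Dirichlet hyperparameter per table entry (assumed uniform).
    """
    score = 0.0
    for child, pa in parents.items():
        r = arity[child]
        row_counts = Counter()   # N_ij: count of each parent configuration
        cell_counts = Counter()  # N_ijk: count of each (configuration, value)
        for row in data:
            config = tuple(row[p] for p in pa)
            row_counts[config] += 1
            cell_counts[(config, row[child])] += 1
        # Unobserved configurations and cells contribute zero to the sum.
        for n_ij in row_counts.values():
            score += lgamma(r * alpha) - lgamma(r * alpha + n_ij)
        for n_ijk in cell_counts.values():
            score += lgamma(alpha + n_ijk) - lgamma(alpha)
    return score


def bayesian_score(data, parents, arity, log_prior=0.0):
    """Score(G : D) up to the constant C: log P(D | G) + log P(G)."""
    return log_marginal_likelihood(data, parents, arity) + log_prior


# Toy data in which Y always copies X.
data = [{"X": 0, "Y": 0}] * 10 + [{"X": 1, "Y": 1}] * 10
arity = {"X": 2, "Y": 2}
dependent = bayesian_score(data, {"X": (), "Y": ("X",)}, arity)
independent = bayesian_score(data, {"X": (), "Y": ()}, arity)
print(dependent > independent)
```

On this toy data the structure with the edge X -> Y scores higher than the empty structure, since the extra parameters are justified by the perfect dependence between the two variables.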
Learning: Structural EM
Reference: network learned with no hidden variable.
Original: gold-standard model for the artificial datasets; best on test data.
Naive: a hidden variable that is a parent of all nodes; serves as a straw man.
Hidden: best FindHidden network; outperforms Naive and Reference, and exceeds Original on training data.
The efficient Frozen EM performs as well as the less efficient Flexible EM.
P(D|A, ¬B)=0.1
[Plot panel: Insurance 1k]
[Figure label: clique over children of H]
P(D|A,B) = 0.8
[Figure: example network nodes Dyspnea, X-Ray]
[Figure label: parents of H (not introducing new independencies)]
[Plot axis: score on training data]
[Figure label: HIDDEN (MARKET TREND), "Stationary"]
E-Step: Computation
[Figure: training data + current network over X1, X2, X3, H, Y1, Y2, Y3 yield expected counts]
Expected Counts: N(X1), N(X2), N(X3), N(H, X1, X2, X3), ...
M-Step: Score & Parameterize
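The E-step/M-step loop can be sketched for the simplest case: one binary hidden variable H with observed binary children and a fixed structure. This is only the parametric part of the procedure; the structure-changing M-step of Structural EM is not shown. The function name `em_step`, the toy data, and the starting parameters are all hypothetical.

```python
from math import log


def em_step(data, p_h, p_y_given_h):
    """One EM iteration for a binary hidden H with observed binary children.

    p_h: P(H = 1).
    p_y_given_h[i][h]: P(Y_i = 1 | H = h).
    Returns updated parameters and the log-likelihood of the INPUT parameters.
    """
    n_children = len(p_y_given_h)
    n_h = [0.0, 0.0]                                # expected counts N(H = h)
    n_hy = [[0.0, 0.0] for _ in range(n_children)]  # expected N(H = h, Y_i = 1)
    loglik = 0.0
    for row in data:
        # E-step: joint probability of the record with each value of H.
        joint = []
        for h in (0, 1):
            p = p_h if h == 1 else 1.0 - p_h
            for i, y in enumerate(row):
                q = p_y_given_h[i][h]
                p *= q if y == 1 else 1.0 - q
            joint.append(p)
        evidence = joint[0] + joint[1]
        loglik += log(evidence)
        # Accumulate expected counts weighted by the posterior P(H = h | row).
        for h in (0, 1):
            w = joint[h] / evidence
            n_h[h] += w
            for i, y in enumerate(row):
                if y == 1:
                    n_hy[i][h] += w
    # M-step: maximum-likelihood parameters from the expected counts.
    new_p_h = n_h[1] / len(data)
    new_p_y = [[n_hy[i][h] / n_h[h] for h in (0, 1)] for i in range(n_children)]
    return new_p_h, new_p_y, loglik


# Hypothetical toy data: two noisy clusters over three observed children.
data = [(1, 1, 1)] * 4 + [(0, 0, 0)] * 4 + [(1, 1, 0), (0, 0, 1)]
p_h, p_y = 0.5, [[0.3, 0.7], [0.4, 0.6], [0.3, 0.6]]
logliks = []
for _ in range(5):
    p_h, p_y, ll = em_step(data, p_h, p_y)
    logliks.append(ll)
print(logliks)
```

The recorded log-likelihoods never decrease from one iteration to the next, which is the standard guarantee of EM and a convenient sanity check for an implementation.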
[Figure: stock network nodes MICROSOFT, DELL, 3Com, COMPAQ]
[Figure label: FindHidden breaks clique]
Semi-clique condition: # neighbors ≥ N/2
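The search for semi-cliques by expansion of 3-clique seeds can be sketched as a greedy procedure. The sketch assumes that the "# neighbors" and "N/2" labels mean every node of a semi-clique S with N nodes has at least N/2 neighbors inside S, and that neighborhood is taken in the undirected skeleton of the learned network. The greedy expansion order and all names are illustrative choices, not the poster's implementation.

```python
from itertools import combinations


def is_semi_clique(nodes, adj):
    """True if every node has at least len(nodes) / 2 neighbors inside `nodes`."""
    n = len(nodes)
    return all(2 * len(adj[v] & nodes) >= n for v in nodes)


def find_semi_cliques(adj):
    """Grow each triangle (3-clique seed) greedily while the test still holds.

    adj: dict mapping each node to the set of its neighbors (undirected).
    Returns a set of frozensets, one per distinct semi-clique found.
    """
    found = set()
    for a, b, c in combinations(sorted(adj), 3):
        if not (b in adj[a] and c in adj[a] and c in adj[b]):
            continue  # not a triangle, so not a seed
        current = frozenset((a, b, c))
        grew = True
        while grew:
            grew = False
            for v in sorted(set(adj) - current):
                candidate = current | {v}
                if is_semi_clique(candidate, adj):
                    current = candidate
                    grew = True
                    break
        found.add(current)
    return found


# Hypothetical skeleton: a triangle 1-2-3, node 4 tied to 1 and 2, node 5 tied to 4.
adj = {1: {2, 3, 4}, 2: {1, 3, 4}, 3: {1, 2}, 4: {1, 2, 5}, 5: {4}}
semi_cliques = find_semi_cliques(adj)
print(semi_cliques)
```

In this example both triangle seeds expand to the same four-node set, while node 5 is rejected because it has only one neighbor in the candidate set.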
Propose a candidate network:
(1) Introduce H as a parent of all nodes in S
(2) Replace all incoming edges to S by edges to H
(3) Remove all inter-S edges
(4) Make all children of S children of H, provided the result is acyclic
Re-iterate with the best candidate
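The four steps above can be sketched on a network stored as a set of directed edges. The edge-set representation, the function names, and the decision to keep the original edges from S to its children in step (4) are assumptions; the poster does not spell out these details.

```python
def has_cycle(edges):
    """Detect a directed cycle by repeatedly removing nodes with no parents."""
    nodes = {n for edge in edges for n in edge}
    indegree = {n: 0 for n in nodes}
    for _, v in edges:
        indegree[v] += 1
    queue = [n for n in nodes if indegree[n] == 0]
    removed = 0
    while queue:
        u = queue.pop()
        removed += 1
        for a, b in edges:
            if a == u:
                indegree[b] -= 1
                if indegree[b] == 0:
                    queue.append(b)
    return removed != len(nodes)


def propose_candidate(edges, semi_clique, hidden="H"):
    """Build a candidate network with a new hidden node for one semi-clique."""
    s = set(semi_clique)
    new_edges = set()
    children_of_s = set()
    for u, v in edges:
        if u in s and v in s:
            continue                    # (3) drop edges inside S
        if v in s:
            new_edges.add((u, hidden))  # (2) outside parent of S now points to H
        else:
            new_edges.add((u, v))
            if u in s:
                children_of_s.add(v)
    new_edges |= {(hidden, x) for x in s}  # (1) H is a parent of every node in S
    for child in sorted(children_of_s):    # (4) add H -> child only if acyclic
        trial = new_edges | {(hidden, child)}
        if not has_cycle(trial):
            new_edges = trial
    return new_edges


# Hypothetical network: P -> X1, a clique over X1, X2, X3, and X3 -> C.
edges = {("P", "X1"), ("X1", "X2"), ("X2", "X3"), ("X1", "X3"), ("X3", "C")}
candidate = propose_candidate(edges, {"X1", "X2", "X3"})
print(sorted(candidate))
```

In the example the clique over X1, X2, X3 is replaced by a hidden parent H, the outside parent P is redirected to H, and the child C gains H as an additional parent.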
Applying the algorithm
EM was applied with Fixed structure, Frozen structure (modify only the semi-clique neighborhood), and Flexible structure.
Summary and Future Work
Representation: The I-map (the minimal structure that implies only independencies that hold in the marginal distribution) is typically complex.
Improve Learning: Detecting the approximate position of a hidden variable is a crucial pre-processing step for the EM algorithm.
Understanding: A true hidden variable improves the quality and “order” of the explanation.
[Figure labels: Structural EM, FindHidden]
We discussed the importance of hidden variables and implemented a natural idea for detecting them: searching for their structural signatures. FindHidden performed surprisingly well and proved extremely useful as a preliminary step to a learning algorithm.
We choose the best-scoring candidate produced by the SEM
[Figure label: EM adapts structure]
Further extensions:
Experiment with multi-valued hidden variables
Use additional information such as edge confidence
Detect hidden variables when the data is sparse
Explore hidden variables in Probabilistic Relational Models
Explore additional structural signatures
[Figure label: original network]
Why hidden variables?