Estimation of Distribution Algorithm based on
Probabilistic Grammar with Latent Annotations
Written by Yoshihiko Hasegawa and Hitoshi Iba
Summarized by Minhyeok Kim
Contents
• Introduction
– Two groups in GP-EDA
• PCFG-LA
– PCFG and PCFG-LA
– Probability of the annotated tree
– Probability of an observed tree
– Log-likelihood and update formula
– Assumptions
– Forward-backward probability
– P(T;Θ) by forward and backward probabilities
– Parameter update formula
– Initial parameters
• PAGE (Programming with Annotated Grammar Estimation)
• Experiment
– Royal Tree Problem
– DMAX Problem
• Conclusion
Two groups in GP-EDA
• Prototype-tree-based method
– It translates variable-length tree structures into fixed-length structures
• PCFG-based method
– It is considered well suited for expressing functions in GP
– Its production rules do not depend on ancestor nodes or sibling nodes
– It therefore cannot take the interactions among nodes into account
PCFG-LA (1/10)
- PCFG and PCFG-LA
• PCFG
– 0.7 VP → V NP
– 0.3 VP → V NP NP
• PCFG-LA
– PCFG + latent annotations: each non-terminal symbol carries a hidden annotation, and rule probabilities depend on the annotations
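As an illustration (the annotation values and probabilities below are invented for exposition, not taken from the paper), a latent annotation splits each non-terminal into several annotated variants, and every annotated rule gets its own probability:

$\beta(\mathrm{VP}[1] \to \mathrm{V}[1]\,\mathrm{NP}[1]) = 0.4, \qquad \beta(\mathrm{VP}[2] \to \mathrm{V}[1]\,\mathrm{NP}[2]) = 0.3, \ldots$

so a single observed rule such as VP → V NP is realized as a mixture over its annotated variants.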
PCFG-LA (2/10)
-Probability of the annotated tree
• The probability of the annotated tree (see the reconstructed formula below)
– T : derivation tree
– x_i : annotation of the i-th non-terminal (all non-terminals are numbered from the root)
– X = {x_1, x_2, ...}
– π(S[x]) : probability of S[x] at the root position
– β(r) : probability of annotated production rule r
– D_T[X] : multi-set of annotated rules used in tree T
– Θ : set of parameters, Θ = {π, β}
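From the definitions above, the complete-data probability factorizes as (reconstructed here in LaTeX, since the slide's original equation image is not preserved):

$P(T[X];\Theta) = \pi(S[x_1]) \prod_{r \in D_T[X]} \beta(r)$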
PCFG-LA (3/10)
-Probability of an observed tree
• The probability of an observed tree
– It is obtained by summing the annotated-tree probability over all annotations (see below)
– The parameters π and β have to be estimated with the EM algorithm
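Reconstructed from the same definitions, the observed-tree probability marginalizes out the annotations:

$P(T;\Theta) = \sum_X P(T[X];\Theta) = \sum_X \pi(S[x_1]) \prod_{r \in D_T[X]} \beta(r)$

Because X is hidden, this sum prevents a closed-form maximum-likelihood solution, which is why EM is used.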
PCFG-LA (4/10)
-Log-likelihood and update formula
• The difference in log-likelihood between parameters Θ' and Θ (sketched below)
• The update formula is obtained by maximizing Q(Θ'|Θ)
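A sketch of the standard EM bound behind this step (reconstructed, not copied from the slide):

$\log P(D;\Theta') - \log P(D;\Theta) \ge Q(\Theta'|\Theta) - Q(\Theta|\Theta), \qquad Q(\Theta'|\Theta) = \sum_{T \in D} \sum_X P(X|T;\Theta) \log P(T[X];\Theta')$

Maximizing Q(Θ'|Θ) with respect to Θ' therefore never decreases the log-likelihood.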
PCFG-LA (5/10)
-Assumptions
• Greibach normal form (GNF) is used instead of Chomsky normal form (CNF)
• To reduce the number of parameters, all non-terminal symbols on the right-hand side of a rule are assumed to share the same annotation (see the parameter count below)
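The parameter saving is easy to quantify: with h annotation values, a GNF rule $S \to g\,S \cdots S$ with n right-hand non-terminals has $h^{n+1}$ annotated versions $S[x] \to g\,S[y_1] \cdots S[y_n]$ in general, but only $h^2$ versions $S[x] \to g\,S[y] \cdots S[y]$ once all right-hand symbols share the annotation y.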
PCFG-LA (6/10)
-Forward-backward probability (1/2)
• Backward probability b_i^T(x)
– The probability that the subtree beneath the i-th non-terminal S[x] is generated
• Forward probability f_i^T(y)
– The probability that the tree above the i-th non-terminal S[y] is generated
PCFG-LA (7/10)
-Forward-backward probability (2/2)
• Backward probability
• Forward probability
– Both are computed recursively (a reconstruction is given below)
– ch(i,T) : function returning the set of non-terminal children indices of the i-th non-terminal in T
– pa(i,T) : returns the parent index of the i-th non-terminal in T
– g_i^T : terminal symbol of the CFG connected to the i-th non-terminal in T
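Under the shared-annotation assumption, the recursions take roughly the following form (a hedged reconstruction from the definitions above, not a verbatim copy of the paper's equations):

$b_i^T(x) = \sum_y \beta\big(S[x] \to g_i^T\, S[y] \cdots S[y]\big) \prod_{j \in ch(i,T)} b_j^T(y)$, with $b_i^T(x) = \beta(S[x] \to g_i^T)$ when the i-th non-terminal has no non-terminal children;

$f_1^T(y) = \pi(S[y])$ at the root, and otherwise $f_i^T(y) = \sum_x f_{pa(i,T)}^T(x)\, \beta\big(S[x] \to g_{pa(i,T)}^T\, S[y] \cdots S[y]\big) \prod_{j \in ch(pa(i,T),T),\, j \neq i} b_j^T(y)$.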
PCFG-LA (8/10)
-P(T;Θ) by forward and backward probabilities
• P(T;Θ)
– cover(g,T_i) : function returning the set of non-terminal indices at which the production rule generating g (without annotations) is rooted in T_i
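Combining both quantities at any non-terminal i recovers the tree likelihood, as in the standard inside-outside identity:

$P(T;\Theta) = \sum_x f_i^T(x)\, b_i^T(x) \quad \text{for any } i$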
PCFG-LA (9/10)
-Parameter update formula
• Parameter update formula
– Obtained by optimizing Q(Θ'|Θ) using the forward and backward probabilities (a sketch follows)
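A hedged sketch of the resulting update (reconstructed from the forward-backward quantities; the normalization makes β' sum to one over rules sharing an annotated left-hand side):

$\beta'\big(S[x] \to g\, S[y] \cdots S[y]\big) \propto \sum_{T \in D} \frac{1}{P(T;\Theta)} \sum_{i \in cover(g,T)} f_i^T(x)\, \beta\big(S[x] \to g\, S[y] \cdots S[y]\big) \prod_{j \in ch(i,T)} b_j^T(y)$

with an analogous update $\pi'(S[x]) \propto \sum_{T \in D} f_1^T(x)\, b_1^T(x) / P(T;\Theta)$ for the root distribution.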
PCFG-LA (10/10)
-Initial parameters
• The EM algorithm increases the log-likelihood monotonically from the initial parameters
• Initial parameters
– κ : random value uniformly distributed over [−log 3, log 3]
– γ(S → g S...S) : probability of the observed production rule (without annotations)
– β(S[x] → g S[x]...S[x]) = 0.
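A minimal Python sketch of this initialization, assuming (as is common for PCFG-LA) that the random perturbation $e^{\kappa}$ multiplies the observed rule probability γ before renormalization; the rule encoding and function name here are hypothetical:

```python
import math
import random
from collections import defaultdict

def init_beta(gamma, num_annotations):
    """Initialize annotated-rule probabilities beta from observed
    (annotation-free) rule probabilities gamma.

    gamma maps an unannotated rule, e.g. ("S", ("g", "S", "S")), to its
    observed probability gamma(S -> g S...S).  Each annotated copy
    (x, y) of a rule gets gamma * exp(kappa) with
    kappa ~ Uniform[-log 3, log 3], then rules sharing the same
    annotated left-hand side are renormalized.
    """
    beta = {}
    for (lhs, rhs), g in gamma.items():
        for x in range(num_annotations):        # annotation of the LHS
            for y in range(num_annotations):    # shared RHS annotation
                kappa = random.uniform(-math.log(3), math.log(3))
                beta[(lhs, x, rhs, y)] = g * math.exp(kappa)
    # Renormalize so probabilities sum to 1 for each annotated LHS.
    totals = defaultdict(float)
    for (lhs, x, rhs, y), p in beta.items():
        totals[(lhs, x)] += p
    return {k: p / totals[(k[0], k[1])] for k, p in beta.items()}
```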
PAGE
(Programming with Annotated Grammar Estimation)
• Flowchart (sketched in code below)
– Initialization of individuals
– Evaluation of individuals
– Selection of individuals
– Estimation of parameters
– Generation of new individuals
– Repeat from the evaluation step until termination
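A structural sketch of this flowchart in Python; every helper here (initialize_population, evaluate, truncation_select, estimate_pcfg_la, sample_trees) is a hypothetical stand-in for the corresponding component of PAGE, so this is an outline rather than a runnable implementation:

```python
def page(pop_size, num_generations, num_annotations):
    """Skeleton of the PAGE loop shown in the flowchart above."""
    population = initialize_population(pop_size)            # random trees
    for gen in range(num_generations):
        fitnesses = [evaluate(t) for t in population]       # evaluation
        parents = truncation_select(population, fitnesses)  # selection
        # Fit pi and beta of the annotated grammar to the selected trees
        # with the EM algorithm (the "estimation of parameters" step).
        theta = estimate_pcfg_la(parents, num_annotations)
        population = sample_trees(theta, pop_size)          # new individuals
    return max(population, key=evaluate)
```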
Experiment
-Royal Tree Problem
• Royal Tree Problem
– Each function has increasing arity
• a has arity 1, b has arity 2, …
– The perfect tree of a given level is composed of perfect trees one level lower
• The perfect tree of level c consists of a c node whose children are perfect trees of level b
– The level-d royal tree problem is used in these experiments
– Used to compare the performance of PAGE and PCFG-GP (see the structural sketch below)
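A small self-contained sketch of the royal tree structure described above (the full fitness function, with its bonus and penalty weights, is omitted):

```python
def perfect_tree(level):
    """Return the perfect royal tree of the given level as a nested list.

    Level 0 is the terminal 'x'; the perfect tree of level c is a node
    of arity c (function 'a' has arity 1, 'b' has arity 2, ...) whose
    children are perfect trees of the next lower level.
    """
    if level == 0:
        return "x"
    func = "abcdefghij"[level - 1]  # a, b, c, ... in order of arity
    return [func] + [perfect_tree(level - 1) for _ in range(level)]

# The level-d instance used in the experiments has a 4-ary root 'd':
print(perfect_tree(2))  # ['b', ['a', 'x'], ['a', 'x']]
```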
Experiment
-DMAX Problem
• DMAX problem
– MAX problem + deceptiveness
– Function set {+_m, ×_m}, terminal set {λ, 0.95}, with λ^r = 1 (λ is a complex r-th root of unity)
– Maximum depth 4, arity m = 5, power r = 3
– Used to show the superiority of PAGE over simple GP (see the evaluation sketch below)
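A small self-contained sketch of DMAX tree evaluation with the slide's settings (m = 5, r = 3); the tree encoding is hypothetical, and taking the real part of the evaluated value as the fitness is an assumption about a detail the slide does not spell out:

```python
import cmath
from functools import reduce
from operator import mul

M, R = 5, 3                          # arity m = 5, power r = 3 (slide values)
LAM = cmath.exp(2j * cmath.pi / R)   # complex cube root of unity: LAM**3 == 1

def evaluate(node):
    """Evaluate a DMAX tree: a node is either a terminal value
    (LAM or 0.95) or an m-ary ('+', children) / ('*', children) tuple."""
    if not isinstance(node, tuple):
        return node
    op, children = node
    vals = [evaluate(c) for c in children]
    return sum(vals) if op == "+" else reduce(mul, vals, 1)

# Adding five lambdas gives 5*lambda; multiplying three such sums returns
# to the real axis (since LAM**3 == 1), the source of the deceptiveness.
five_lam = ("+", [LAM] * M)
print(evaluate(("*", [five_lam, five_lam, five_lam, 0.95, 0.95])).real)
```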
Conclusion
• In the royal tree problem, we showed that the number of annotations greatly affects the search performance, and a larger annotation size offered better performance
• The results of the DMAX problem showed that PAGE is highly effective for problems with strong deceptiveness
• PAGE uses the EM algorithm, so it is more computationally expensive
• The performance of PAGE is much superior to that of the annotation-free algorithm
• It is important to balance these two contradicting factors (search performance versus computational cost), which will be examined in future work