Decision trees for hierarchical
multilabel classification
A case study in functional genomics
Work by
Hendrik Blockeel
Leander Schietgat
Jan Struyf
Katholieke Universiteit Leuven (Belgium)
Amanda Clare
University of Wales, Aberystwyth
Sašo Džeroski
Jožef Stefan Institute, Ljubljana (Slovenia)
Overview
Hierarchical Multilabel Classification
Predictive Clustering Trees for HMC
task description
the algorithm: Clus-HMC
Evaluation on yeast data sets
Hierarchical multilabel classification (HMC)
Classification: predict the class of unseen instances based on (classified) training examples
HMC:
  instance can belong to multiple classes
  classes are organised in a hierarchy
Example: toy hierarchy
Advantages:
  efficiency
  skewed class distributions
  hierarchical relationships
[Figure: toy class hierarchy. Top-level classes 1, 2, 3; class 2 has subclasses 2/1 and 2/2. The labels (1)-(5) index the classes in the order 1, 2, 2/1, 2/2, 3.]
Predictive clustering trees [Blockeel et al. 1998]
~ decision trees
  each node (including leaves) is a cluster
  tests in nodes are descriptions of clusters
Heuristic (sketched below):
  minimise intra-cluster variance
  maximise inter-cluster variance
Can be extended to perform HMC by instantiating:
  a distance measure d (quantifies similarity)
  a prediction function p (maps the cluster in a leaf onto a prediction)
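A minimal sketch of this heuristic in Python with NumPy (not the original Clus system; the helper names are mine): a candidate test is scored by the size-weighted intra-cluster variance of the class vectors (defined on the next slide) in the two subsets it induces, and the test with the lowest score is preferred.

import numpy as np

def weighted_variance(label_vectors, weights):
    # intra-cluster variance: mean squared weighted Euclidean
    # distance of the class vectors to their centroid
    v = np.asarray(label_vectors, dtype=float)
    if len(v) == 0:
        return 0.0
    centroid = v.mean(axis=0)
    return float(np.mean(np.sum(weights * (v - centroid) ** 2, axis=1)))

def split_score(left, right, weights):
    # size-weighted intra-cluster variance of a binary split (lower is better)
    n = len(left) + len(right)
    return (len(left) * weighted_variance(left, weights)
            + len(right) * weighted_variance(right, weights)) / n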
Instantiating d
Class labels are represented in a 0/1 vector, one component per class
Example (components (1)-(5) = classes 1, 2, 2/1, 2/2, 3):
  Si = {1, 2, 2/2} gives vi = [1,1,0,1,0]
  Sj = {2} gives vj = [0,1,0,0,0]
Distance between vectors is defined as the weighted component-wise Euclidean distance:
  d(x1, x2) = √(∑k wk · (v1,k − v2,k)²), with wk = w^depth(ck)
Here dEucl([1,1,0,1,0], [0,1,0,0,0]) = √(w + w²),
since the vectors differ in class 1 (depth 1) and class 2/2 (depth 2)
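A concrete sketch of this encoding and distance, in Python with NumPy (the helper names and the weight base value are illustrative choices, not taken from the slides):

import numpy as np

# toy hierarchy, components ordered (1)-(5) = classes 1, 2, 2/1, 2/2, 3
CLASSES = ["1", "2", "2/1", "2/2", "3"]
W0 = 0.75  # assumed example value for the weight base w (a parameter)

def class_vector(labels):
    # 0/1 vector v with v_k = 1 iff class c_k is in the label set
    return np.array([1.0 if c in labels else 0.0 for c in CLASSES])

def class_weights(w0=W0):
    # w_k = w0 ** depth(c_k); depth = number of '/'-separated levels
    return np.array([w0 ** len(c.split("/")) for c in CLASSES])

def d(v1, v2, w=None):
    # weighted Euclidean distance between two class vectors
    w = class_weights() if w is None else w
    return float(np.sqrt(np.sum(w * (v1 - v2) ** 2)))

vi = class_vector({"1", "2", "2/2"})  # -> [1, 1, 0, 1, 0]
vj = class_vector({"2"})              # -> [0, 1, 0, 0, 0]
print(d(vi, vj))  # sqrt(w0 + w0**2): differs in class 1 (depth 1) and 2/2 (depth 2)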
Instantiating p
Each leaf contains multiple classes (organised in a hierarchy)
Which classes to predict?
  binary classification: predict positive if the instance ends up in a leaf with at least 50% positives
  multilabel classification: skewed class distributions make a fixed 50% cutoff unsuitable
Threshold:
  an instance ending up in some leaf is predicted to belong to class ci if vi ≥ ti, with vi the proportion of instances in the leaf belonging to ci, and ti some threshold
  by varying the threshold, we obtain different points on the precision-recall curve
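A minimal sketch of the prediction function p under these assumptions (plain Python; the leaf proportions are made-up numbers for the toy hierarchy):

def predict(leaf_proportions, thresholds):
    # predict class c_i iff the leaf's proportion v_i >= threshold t_i
    return {c for c, v in leaf_proportions.items() if v >= thresholds[c]}

# made-up leaf: fraction of the leaf's training instances in each class
leaf = {"1": 0.80, "2": 0.55, "2/1": 0.10, "2/2": 0.45, "3": 0.05}

# sweeping one global threshold gives different prediction sets,
# i.e. different points on the precision-recall curve
for t in (0.3, 0.5, 0.7):
    print(t, predict(leaf, {c: t for c in leaf}))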
Clus-HMC algorithm
[Pseudocode of the tree-induction algorithm, with the stopping criterion highlighted]
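The pseudocode itself did not survive extraction; below is a hedged sketch of the standard top-down induction scheme Clus-HMC follows, reusing weighted_variance and split_score from the earlier sketch. It is a simplification under my own assumptions: in particular, the real stopping criterion is an F-test on the variance reduction, for which a crude relative-reduction check stands in here.

import numpy as np

def induce(instances, labels, weights, tests, min_size=5):
    # instances: NumPy array of feature rows; labels: NumPy array of
    # class vectors; tests: boolean functions of one instance
    var_here = weighted_variance(labels, weights)
    best = None
    for test in tests:
        mask = np.array([test(x) for x in instances])
        left, right = labels[mask], labels[~mask]
        if len(left) < min_size or len(right) < min_size:
            continue
        score = split_score(left, right, weights)
        if best is None or score < best[0]:
            best = (score, test, mask)
    # stopping criterion: require a minimal relative variance reduction
    # (the real Clus-HMC uses an F-test for significance instead)
    if var_here == 0 or best is None or best[0] > 0.95 * var_here:
        return {"leaf": labels.mean(axis=0)}  # class proportions -> prediction
    _, test, mask = best
    return {"test": test,
            "yes": induce(instances[mask], labels[mask], weights, tests, min_size),
            "no": induce(instances[~mask], labels[~mask], weights, tests, min_size)}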
Experiments in yeast functional genomics
Saccharomyces cerevisiae or baker's/brewer's yeast
Task: predict the function of yeast genes
MIPS FunCat hierarchy:
  1 METABOLISM
    1/1 amino acid metabolism
    1/2 nitrogen and sulfur metabolism
    …
  2 ENERGY
    2/1 glycolysis and gluconeogenesis
  …
12 data sets [Clare 2003]:
  Sequence (seq)
  Phenotype growth (pheno)
  Secondary structure (struc)
  Homology search (hom)
  Microarray data: cellcycle, church, derisi, eisen, gasch1, gasch2, spo, expr (all)
Experimental evaluation
Objectives:
  comparison with C4.5H [Clare 2003]
  evaluation of the improvement obtainable with HMC trees over single classification trees
Evaluation with precision-recall curves:
  precision = TP / predicted positives = TP / (TP + FP)
  recall = TP / actual positives = TP / (TP + FN)
  advantages: well suited to skewed class distributions
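For concreteness, a small self-contained sketch (plain Python, toy data) of how one precision-recall point is computed per threshold; sweeping the threshold traces the curve whose area (AUPRC) is used in the comparisons below:

def precision_recall(y_true, scores, threshold):
    # y_true: 0/1 ground truth per (instance, class) pair;
    # scores: the predicted proportions v_i for the same pairs
    tp = sum(t == 1 and s >= threshold for t, s in zip(y_true, scores))
    fp = sum(t == 0 and s >= threshold for t, s in zip(y_true, scores))
    fn = sum(t == 1 and s < threshold for t, s in zip(y_true, scores))
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# toy data: one PR point per threshold traces the curve
y_true = [1, 0, 1, 1, 0, 0, 1]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2, 0.1]
for t in (0.25, 0.5, 0.75):
    print(t, precision_recall(y_true, scores, t))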
Comparison with C4.5H
C4.5H = hierarchical multilabel extension of C4.5 [Clare 2003]
  designed by Amanda Clare
  heuristic: information gain, an adaptation of entropy (summed over all classes)
  prediction: most frequent set of classes + significance test
Clus-HMC:
  tuning: try different F-test significance levels on validation data, choose the one with the highest AUPRC (sketched below)
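A hedged sketch of this tuning loop in Python (the level grid and the dummy scorer are invented for illustration; a real run would grow a Clus-HMC tree per level and measure its validation AUPRC):

def tune_f_test(levels, train_and_score):
    # return the significance level whose tree scores the highest
    # validation AUPRC; train_and_score(level) must grow a tree at
    # that level and return its validation AUPRC (supplied by caller)
    return max(levels, key=train_and_score)

# usage with a dummy scorer standing in for actual training
print(tune_f_test([0.001, 0.005, 0.01, 0.05, 0.1],
                  lambda level: 1 - abs(level - 0.01)))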
Comparison between Clus-HMC and C4.5H
Average case
Comparison between Clus-HMC and C4.5H
Specific classes
[Scatter plot comparing Clus-HMC and C4.5H on individual classes, divided into quadrants I-IV]
25 wins for Clus-HMC (quadrant II), 6 losses (quadrant IV)
Comparing rules
e.g. predictions for class 40/3 in the "gasch1" data set

C4.5H: two rules
IF 29C_Plus1M_sorbitol_to_33C_Plus_1M_sorbitol___15_minutes <= 0.03 AND
   constant_0point32_mM_H202_20_min_redo <= 0.72 AND
   1point5_mM_diamide_60_min <= -0.17 AND
   steady_state_1M_sorbitol > -0.37 AND
   DBYmsn2_4__37degree_heat___20_min <= -0.67
THEN 40/3 (precision: 0.52, recall: 0.26)

IF Heat_Shock_10_minutes_hs_1 <= 1.82 AND
   Heat_Shock_30_minutes__hs_2 <= -0.48 AND
   29C_Plus1M_sorbitol_to_33C_Plus_1M_sorbitol___5_minutes > -0.1
THEN 40/3 (precision: 0.56, recall: 0.18)

Clus-HMC: most precise rule
IF Nitrogen_Depletion_8_h <= -2.74 AND
   Nitrogen_Depletion_2_h > -1.94 AND
   1point5_mM_diamide_5_min > -0.03 AND
   1M_sorbitol___45_min_ > -0.36 AND
   37C_to_25C_shock___60_min > 1.28
THEN 40/3 (precision: 0.97, recall: 0.15)
HMC vs. single classification
Method: one tree predicting all classes at once (Clus-HMC) vs. a separate tree per class (Clus-SC)
Average case
HMC vs. single classification
Specific classes
Numbers are AUPRC(Clus-HMC) − AUPRC(Clus-SC)
HMC performs better!
Conclusions
Use of precision-recall curves to
optimize the learned models and to
evaluate the results
Improvement over C4.5H
HMC compared to SC
Comparable predictive performance
Faster
Easier to interpret
References
Hendrik Blockeel, Luc De Raedt, Jan Ramon. Top-down induction of clustering trees. Proceedings of ICML 1998.
Amanda Clare. Machine Learning and Data Mining for Yeast Functional Genomics. Doctoral dissertation, University of Wales, Aberystwyth, 2003.
Jan Struyf, Sašo Džeroski, Hendrik Blockeel, Amanda Clare. Hierarchical multi-classification with predictive clustering trees in functional genomics. EPIA 2005.
Questions?