
Decision trees for hierarchical multilabel classification
A case study in functional genomics

Work by

Hendrik Blockeel, Leander Schietgat, Jan Struyf
Katholieke Universiteit Leuven (Belgium)

Amanda Clare
University of Aberystwyth (Wales)

Sašo Džeroski
Jozef Stefan Institute, Ljubljana (Slovenia)
Overview

- Hierarchical multilabel classification
- Predictive clustering trees for HMC
  - task description
  - the algorithm: Clus-HMC
- Evaluation on yeast data sets
Hierarchical multilabel classification (HMC)

- Classification: predict the class of unseen instances based on (classified) training examples
- HMC:
  - an instance can belong to multiple classes
  - classes are organised in a hierarchy
- Example: a toy hierarchy (figure below; a small code sketch of it follows)
- Advantages:
  - efficiency
  - skewed class distributions
  - hierarchical relationships

[Figure: toy hierarchy: top-level classes 1, 2, 3, with subclasses 2/1 and 2/2 under class 2; the classes are numbered (1) 1, (2) 2, (3) 2/1, (4) 2/2, (5) 3]
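To make the toy hierarchy concrete, here is a minimal Python sketch (not part of the original slides) that stores it as a parent map; the names PARENT, depth and close_upward are chosen for illustration. The upward closure enforces the hierarchical constraint that membership in 2/2 implies membership in 2.

```python
# Toy hierarchy from the slide: classes 1, 2, 3 at the top level,
# with 2/1 and 2/2 as subclasses of 2.
PARENT = {"1": None, "2": None, "3": None, "2/1": "2", "2/2": "2"}

def depth(c):
    """Depth of class c in the hierarchy (top level = 1)."""
    return 1 if PARENT[c] is None else 1 + depth(PARENT[c])

def close_upward(labels):
    """Add all ancestors, so that a label set respects the hierarchy."""
    closed = set()
    for c in labels:
        while c is not None:
            closed.add(c)
            c = PARENT[c]
    return closed

print(close_upward({"1", "2/2"}))  # {'1', '2', '2/2'} -- the set Si used below
```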
Predictive clustering trees [Blockeel et al. 1998]

- ~ decision trees
  - each node (including the leaves) is a cluster
  - tests in the nodes are descriptions of the clusters
- Heuristic:
  - minimise intra-cluster variance
  - maximise inter-cluster variance
- Can be extended to perform HMC by instantiating:
  - a distance measure d (quantifies similarity)
  - a prediction function p (maps the cluster in a leaf onto a prediction)
Instantiating d

- Class labels are represented as a 0/1 vector, one component per class
- Example (components ordered (1) 1, (2) 2, (3) 2/1, (4) 2/2, (5) 3):
  - Si = {1, 2, 2/2}  →  vi = [1,1,0,1,0]
  - Sj = {2}          →  vj = [0,1,0,0,0]
- Distance between vectors is defined as the component-wise Euclidean distance, weighted by class depth:

  d(x1, x2) = sqrt( Σk wk · (v1,k − v2,k)² ),  with wk = w^depth(ck)

- In the example, vi and vj differ only in class 1 (depth 1) and class 2/2 (depth 2), so (sketch below):

  dEucl([1,1,0,1,0], [0,1,0,0,0]) = sqrt(w + w²)
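A minimal, self-contained sketch of this distance for the toy hierarchy; the weight base W = 0.75 is only an illustrative choice, as the slides leave w unspecified.

```python
from math import sqrt

# Component order and depths match the slide:
# (1)=1, (2)=2, (3)=2/1, (4)=2/2, (5)=3
CLASSES = ["1", "2", "2/1", "2/2", "3"]
DEPTH = {"1": 1, "2": 1, "2/1": 2, "2/2": 2, "3": 1}
W = 0.75  # illustrative weight base; any 0 < w < 1 down-weights deep classes

def to_vector(labels):
    """0/1 class vector for a hierarchy-closed label set."""
    return [1.0 if c in labels else 0.0 for c in CLASSES]

def d(v1, v2):
    """Weighted Euclidean distance: sqrt(sum_k w^depth(c_k) * (v1_k - v2_k)^2)."""
    return sqrt(sum(W ** DEPTH[c] * (a - b) ** 2
                    for c, a, b in zip(CLASSES, v1, v2)))

vi = to_vector({"1", "2", "2/2"})   # [1,1,0,1,0]
vj = to_vector({"2"})               # [0,1,0,0,0]
# vi and vj differ in class 1 (depth 1) and class 2/2 (depth 2),
# so d(vi, vj) = sqrt(w + w^2), as on the slide.
assert abs(d(vi, vj) - sqrt(W + W ** 2)) < 1e-12
```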
Instantiating p

- Each leaf contains multiple classes (organised in a hierarchy)
- Which classes should the leaf predict?
  - binary classification: predict positive if the instance ends up in a leaf with at least 50% positives
  - multilabel classification: class distributions are skewed, so a fixed 50% cut-off is inappropriate
- Threshold:
  - an instance ending up in some leaf is predicted to belong to class ci if vi ≥ ti, with vi the proportion of instances in the leaf belonging to ci and ti some threshold
  - by varying the threshold, we obtain different points on the precision-recall curve (see the sketch after this list)
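A minimal sketch of the prediction function p and of how sweeping one global threshold t traces out precision-recall points. The micro-averaged counting is one plausible choice, and the leaf proportions below are made-up numbers for illustration only.

```python
def predict(leaf_proportions, t):
    """Predict every class whose proportion in the leaf is >= threshold t."""
    return {c for c, v in leaf_proportions.items() if v >= t}

def pr_point(examples, t):
    """One precision-recall point, micro-averaged over all instances;
    examples is a list of (leaf_proportions, true_label_set) pairs."""
    tp = fp = fn = 0
    for proportions, truth in examples:
        pred = predict(proportions, t)
        tp += len(pred & truth)
        fp += len(pred - truth)
        fn += len(truth - pred)
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Made-up leaf proportions and true labels:
examples = [({"1": 0.9, "2": 0.3, "2/2": 0.2}, {"1"}),
            ({"1": 0.1, "2": 0.8, "2/2": 0.6}, {"2", "2/2"})]
curve = [pr_point(examples, t) for t in (0.2, 0.4, 0.6, 0.8)]
print(curve)  # lower thresholds give higher recall, usually lower precision
```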
Clus-HMC algorithm

[Figure: pseudo code of the Clus-HMC algorithm, with its stopping criterion]
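Since the pseudo code itself is not reproduced here, the following is a simplified sketch of the top-down induction loop behind a predictive clustering tree: choose the test that maximises variance reduction on the class vectors, and stop when no candidate split helps enough. The real algorithm uses an F-test as its stopping criterion; the fixed min_reduction cut-off below is a stand-in, and all names are my own.

```python
import numpy as np

def variance(V):
    """Intra-cluster variance: mean squared distance of the class vectors
    in V to their mean (depth weights assumed folded into V's columns)."""
    return float(((V - V.mean(axis=0)) ** 2).sum(axis=1).mean())

def best_split(X, V, min_leaf):
    """Try every 'attribute <= value' test; return the one with maximal
    variance reduction, i.e. minimal weighted intra-cluster variance."""
    best = None
    for a in range(X.shape[1]):
        for t in np.unique(X[:, a])[:-1]:
            left = X[:, a] <= t
            if left.sum() < min_leaf or (~left).sum() < min_leaf:
                continue
            reduction = variance(V) - (left.mean() * variance(V[left])
                                       + (~left).mean() * variance(V[~left]))
            if best is None or reduction > best[0]:
                best = (reduction, a, t, left)
    return best

def induce(X, V, min_reduction=1e-3, min_leaf=5):
    """Top-down induction of one predictive clustering tree (sketch)."""
    split = best_split(X, V, min_leaf)
    if split is None or split[0] < min_reduction:   # stopping criterion
        return {"leaf": V.mean(axis=0)}             # per-class proportions
    _, a, t, left = split
    return {"test": (a, t),
            "yes": induce(X[left], V[left], min_reduction, min_leaf),
            "no":  induce(X[~left], V[~left], min_reduction, min_leaf)}
```

The leaf stores the mean class vector, i.e. the per-class proportions that the prediction function p thresholds, consistent with the previous slide.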
Experiments in yeast functional genomics

- Saccharomyces cerevisiae, or baker's/brewer's yeast
- 12 data sets [Clare 2003]:
  - Sequence (seq)
  - Phenotype growth (pheno)
  - Secondary structure (struc)
  - Homology search (hom)
  - Microarray data: cellcycle, church, derisi, eisen, gasch1, gasch2, spo, expr (all)
- Classes: the MIPS FunCat hierarchy (function of yeast genes), e.g.

  1 METABOLISM
    1/1 amino acid metabolism
    1/2 nitrogen and sulfur metabolism
    ...
  2 ENERGY
    2/1 glycolysis and gluconeogenesis
    ...
Experimental evaluation

- Objectives:
  - comparison with C4.5H [Clare 2003]
  - evaluation of the improvement obtainable with HMC trees over single classification trees
- Evaluation with precision-recall curves:
  - precision = TP / (TP + FP)
  - recall = TP / (TP + FN)
  - advantages, e.g. for the highly skewed class distributions in this task
Comparison with C4.5H

- C4.5H = hierarchical multilabel extension of C4.5 [Clare 2003]
  - designed by Amanda Clare
  - heuristic: information gain, an adaptation of entropy (summed over all classes)
  - prediction: most frequent set of classes + significance test
- Clus-HMC method
  - tuning: different F-tests on validation data, choose the F-test with the highest AUPRC (see the sketch below)
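A sketch of that selection step, assuming each candidate F-test significance level has already been used to train a tree and produce validation precision-recall points. The auprc helper uses a simple trapezoidal rule over recall, which is only an approximation of the true area; all names are illustrative.

```python
def auprc(points):
    """Approximate area under a precision-recall curve by the trapezoidal
    rule over recall; points are (precision, recall) pairs."""
    pts = sorted(points, key=lambda pr: pr[1])
    return sum((r2 - r1) * (p1 + p2) / 2.0
               for (p1, r1), (p2, r2) in zip(pts, pts[1:]))

def pick_f_test(curves_by_f):
    """curves_by_f maps a candidate F-test significance level to the
    validation PR points of the tree trained with that level; keep the
    level whose curve has the largest area."""
    return max(curves_by_f, key=lambda f: auprc(curves_by_f[f]))

# Hypothetical validation curves for two candidate levels:
print(pick_f_test({0.01: [(1.0, 0.1), (0.7, 0.4)],
                   0.05: [(0.9, 0.2), (0.6, 0.5)]}))
```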
Comparison between Clus-HMC and C4.5H

- Average case

Comparison between Clus-HMC and C4.5H

- Specific classes

[Figure: per-class comparison of Clus-HMC against C4.5H, divided into quadrants I-IV]
- 25 wins for Clus-HMC (quadrant II), 6 losses (quadrant IV)
Comparing rules

- e.g. predictions for class 40/3 in the "gasch1" data set

C4.5H: two rules

IF   29C_Plus1M_sorbitol_to_33C_Plus_1M_sorbitol____15_minutes <= 0.03 AND
     constant_0point32_mM_H202_20_min_redo <= 0.72 AND
     1point5_mM_diamide_60_min <= -0.17 AND
     steady_state_1M_sorbitol > -0.37 AND
     DBYmsn2_4__37degree_heat___20_min <= -0.67
THEN 40/3
(Precision: 0.52, Recall: 0.26)

IF   Heat_Shock_10_minutes_hs_1 <= 1.82 AND
     Heat_Shock_030inutes__hs_2 <= -0.48 AND
     29C_Plus1M_sorbitol_to_33C_Plus_1M_sorbitol___5_minutes > -0.1
THEN 40/3
(Precision: 0.56, Recall: 0.18)

Clus-HMC: most precise rule

IF   Nitrogen_Depletion_8_h <= -2.74 AND
     Nitrogen_Depletion_2_h > -1.94 AND
     1point5_mM_diamide_5_min > -0.03 AND
     1M_sorbitol___45_min_ > -0.36 AND
     37C_to_25C_shock___60_min > 1.28
THEN 40/3
(Precision: 0.97, Recall: 0.15)
HMC vs. single classification

- Method
- Average case

HMC vs. single classification

- Specific classes
- numbers are AUPRC(Clus-HMC) − AUPRC(Clus-SC)
- HMC performs better!
Conclusions

- Use of precision-recall curves to optimise the learned models and to evaluate the results
- Improvement over C4.5H
- HMC compared to SC:
  - comparable predictive performance
  - faster
  - easier to interpret
References

- Hendrik Blockeel, Luc De Raedt, Jan Ramon. Top-down induction of clustering trees (1998)
- Amanda Clare. Machine learning and data mining for yeast functional genomics. Doctoral dissertation (2003)
- Jan Struyf, Sašo Džeroski, Hendrik Blockeel, Amanda Clare. Hierarchical multi-classification with predictive clustering trees in functional genomics (2005)
Questions?