Exploratory learning

Hierarchical Semi-supervised Classification
with Incomplete Class Hierarchies
Bhavana Dalvi, Aditya Mishra, William W. Cohen
Semi-supervised Entity Classification
Everything
    Food
        Fruits
        Vegetables
    Animals
        Mammals
        Reptiles
Class constraints in the hierarchy
• Subset: e.g., Mammals is a subset of Animals
• Disjoint: e.g., Mammals and Reptiles are disjoint
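The Subset and Disjoint constraints on the example hierarchy can be written down mechanically. The sketch below is illustrative only: the `HIERARCHY` dictionary and the `derive_constraints` helper are hypothetical names, and treating every pair of siblings as disjoint is an assumption, since the slide only labels one example of each constraint.

```python
from itertools import combinations

# Example hierarchy from the slide, stored as parent -> list of children.
HIERARCHY = {
    "Everything": ["Food", "Animals"],
    "Food": ["Fruits", "Vegetables"],
    "Animals": ["Mammals", "Reptiles"],
}


def derive_constraints(hierarchy):
    """Return (subset, disjoint) constraint lists for a tree.

    subset:   (child, parent) pairs, one per edge of the tree.
    disjoint: pairs of siblings (assumed mutually exclusive here).
    """
    subset = [
        (child, parent)
        for parent, children in hierarchy.items()
        for child in children
    ]
    disjoint = [
        pair
        for children in hierarchy.values()
        for pair in combinations(children, 2)
    ]
    return subset, disjoint


subset, disjoint = derive_constraints(HIERARCHY)
```

Here `subset` contains `("Mammals", "Animals")` and `disjoint` contains `("Mammals", "Reptiles")`, matching the two labelled examples.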
Prior work
• Coupled Semi-Supervised Learning for Information Extraction, Carlson et al., WSDM 2010
• Automatic Gloss Finding for a Knowledge Base using Ontological Constraints, Dalvi et al., WSDM 2015
Challenge: Incomplete Class Hierarchies
• New clusters not in the given hierarchy: C8 (Beverages), C9 (Location)
GOAL: Do semi-supervised classification and ontology extension in a single unified framework.
Optimization Problem
Maximize over m (#clusters) and Params{C1 … Cm}:
    { Log Data Likelihood – Model Penalty }
subject to class constraints Zm
Solved by Expectation Maximization
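To make the trade-off in this objective concrete, here is a minimal sketch of penalized model selection over the number of clusters m. The slide does not specify the form of the model penalty; the linear, AIC-like penalty below is an assumption, and `penalized_score`, `select_num_clusters`, and the example log-likelihood values are hypothetical.

```python
def penalized_score(log_likelihood, num_clusters, params_per_cluster,
                    penalty_weight=1.0):
    """Log data likelihood minus a model penalty.

    Assumption: the penalty grows linearly with the number of free
    parameters (num_clusters * params_per_cluster), as in AIC-style criteria.
    """
    return log_likelihood - penalty_weight * num_clusters * params_per_cluster


def select_num_clusters(log_likelihoods, params_per_cluster):
    """Pick the m with the best penalized score.

    log_likelihoods: dict mapping a candidate m to the log data likelihood
    reached by a model with m clusters (e.g. after running EM).
    """
    return max(
        log_likelihoods,
        key=lambda m: penalized_score(log_likelihoods[m], m, params_per_cluster),
    )


# Illustrative numbers only: a fifth cluster improves the likelihood slightly,
# but not enough to pay for its extra parameters.
candidates = {3: -1200.0, 4: -1100.0, 5: -1095.0}
best_m = select_num_clusters(candidates, params_per_cluster=20)
```

With these made-up values the penalized scores are -1260, -1180, and -1195, so `best_m` is 4: adding a cluster is only worthwhile when the likelihood gain exceeds the penalty.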
Optimized Divide-And-Conquer (OptDAC) Strategy
• Class constraints: Mixed Integer Linear Program
• Missing classes: Soft Divide-And-Conquer method
Class constraints: Mixed Integer Linear Program
Max { likelihood of assignment – constraint violation penalty }
• Likelihood term: score of label assignment
• Penalty terms: subset constraint penalty, disjoint constraint penalty
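The shape of this objective can be illustrated on a toy example. The sketch below is not the MILP formulation from the talk: it swaps in brute-force enumeration over binary label vectors, which is only feasible for a handful of classes, and it assumes a unit penalty per violated constraint. The function `best_assignment`, the `scores` values, and the `penalty` parameter are all hypothetical.

```python
from itertools import product


def best_assignment(scores, subset, disjoint, penalty=1.0):
    """Maximize (sum of chosen label scores) - penalty * (violated constraints).

    scores:   dict class -> score of assigning that label (may be negative).
    subset:   list of (child, parent) pairs; choosing child without parent
              counts as one violation.
    disjoint: list of (a, b) pairs; choosing both counts as one violation.
    """
    classes = list(scores)
    best, best_value = None, float("-inf")
    for bits in product([0, 1], repeat=len(classes)):
        y = dict(zip(classes, bits))
        value = sum(scores[c] * y[c] for c in classes)
        value -= penalty * sum(1 for child, parent in subset if y[child] > y[parent])
        value -= penalty * sum(1 for a, b in disjoint if y[a] + y[b] > 1)
        if value > best_value:
            best, best_value = y, value
    return best, best_value


# Illustrative scores for one entity.
scores = {"Animals": 0.2, "Mammals": 0.9, "Reptiles": 0.4, "Food": -0.5}
subset = [("Mammals", "Animals"), ("Reptiles", "Animals")]
disjoint = [("Mammals", "Reptiles"), ("Animals", "Food")]
labels, value = best_assignment(scores, subset, disjoint)
```

In this example the entity is labelled Animals and Mammals: adding Reptiles would gain 0.4 but cost a disjointness penalty of 1.0, so it is left out. A real system would hand the same objective to an MILP solver instead of enumerating assignments.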
Missing classes: Soft Divide-And-Conquer
[Figure: class tree with numbered nodes, annotated "Near uniform?"; in the second build a new class 𝑪𝒏𝒆𝒘 is added to the tree]
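The "Near uniform?" test can be sketched as follows. The slide does not say how near-uniformity is measured; this sketch assumes a Jensen-Shannon divergence between the posterior over candidate classes and the uniform distribution, with a hypothetical threshold. The names `js_divergence_to_uniform`, `is_near_uniform`, and `assign_or_create` are illustrative.

```python
import math


def js_divergence_to_uniform(posterior):
    """Jensen-Shannon divergence (natural log) between posterior and uniform."""
    k = len(posterior)
    uniform = [1.0 / k] * k
    mixture = [(p + u) / 2 for p, u in zip(posterior, uniform)]

    def kl(a, b):
        # Terms with a == 0 contribute nothing; b is always > 0 here.
        return sum(x * math.log(x / y) for x, y in zip(a, b) if x > 0)

    return 0.5 * kl(posterior, mixture) + 0.5 * kl(uniform, mixture)


def is_near_uniform(posterior, threshold=0.05):
    """True when no candidate class stands out (threshold is an assumption)."""
    return js_divergence_to_uniform(posterior) < threshold


def assign_or_create(posterior, class_names, new_name="C_new"):
    """Return the best existing class, or new_name if the posterior is near uniform."""
    if is_near_uniform(posterior):
        return new_name
    best_index = max(range(len(posterior)), key=lambda i: posterior[i])
    return class_names[best_index]


children = ["Mammals", "Reptiles"]
confident = assign_or_create([0.95, 0.05], children)
undecided = assign_or_create([0.5, 0.5], children)
```

Here `confident` is `"Mammals"`, while `undecided` is `"C_new"` because the entity fits neither child better than the other. In a divide-and-conquer setting the same check would be applied at each node while descending the class tree.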
Results: 10% improvement in F1 scores
[Chart: macro-averaged seed class F1 (y-axis, 25 to 75) at Level = 2, 3, 4, comparing Flat ExploreEM with OptDAC ExploreEM]
Results: Ontology Extension
Datasets are made public
Four hierarchical entity classification datasets are publicly available at
http://rtw.ml.cmu.edu/wk/WebSets/hierarchical_ExploratoryLearning_WSDM2016/index.html
Thank You