Seed, Expand and Constrain: Three Principles
for Weakly-Supervised Image Segmentation
Alexander Kolesnikov and Christoph H. Lampert
IST Austria
Introduction and Main Contributions
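[Figure: the training data are images annotated only with image-level labels, e.g. {Plant, Chair}, {Person, Bike}, {Aeroplane}; weakly-supervised learning turns them into a segmentation CNN.]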
Main contributions:
• A new loss function for weakly-supervised semantic image segmentation.
• State-of-the-art performance on the PASCAL VOC 2012 segmentation challenge.
Experimental results
SEC significantly outperforms previously proposed techniques. Per-class IoU (%) on the PASCAL VOC 2012 test set:

Class        MIL+ILP+SP-sppxl† [4]   Region score pooling [2]   CCNN [3]   SEC (proposed)
background   74.7                    ≈74                        ≈71        83.5
aeroplane    38.8                    33.1                       24.2       56.4
bike         19.8                    21.7                       19.9       28.5
bird         27.5                    27.7                       26.3       64.1
boat         21.7                    17.7                       18.6       23.6
bottle       32.8                    38.4                       38.1       46.5
bus          40.0                    55.8                       51.7       70.6
car          50.1                    38.3                       42.9       58.5
cat          47.1                    57.9                       48.2       71.3
chair        7.2                     13.6                       15.6       23.2
cow          44.8                    37.4                       37.2       54.0
diningtable  15.8                    29.2                       18.3       28.0
dog          49.4                    43.9                       43.0       68.1
horse        47.3                    39.1                       38.2       62.1
motorbike    36.6                    52.4                       52.2       70.0
person       36.4                    44.4                       40.0       55.0
plant        24.3                    30.2                       33.8       38.4
sheep        44.5                    48.7                       36.0       58.0
sofa         21.0                    26.4                       21.6       39.9
train        31.5                    31.8                       33.4       38.4
tv/monitor   41.3                    36.3                       38.3       48.3
average      35.8                    38.0                       35.6       51.7

Three loss terms: Seed, Expand and Constrain
$D = \{(X_i, T_i)\}_{i=1}^{N}$ – training data, where each $X_i$ is an image and $T_i$ is the set of its labels.
$C$ – the set of all semantic categories.
$f_{u,c}(X; \theta)$ – probability score of class $c$ at location $u$, as predicted by a deep convolutional neural network (CNN) $f$ parametrized by $\theta$.
[Figure: schematic of the SEC approach. Components: Weak Localization, Seeding Loss, Downscale, Segmentation CNN, Global Weighted Rank-Pooling, Expansion Loss, CRF, Constrain-to-boundary Loss. Example classes: Person, Cow, Background.]

SEC objective:
$$\min_{\theta} \sum_{(X,T) \in D} \left[ L_{\text{seed}}(f(X;\theta), T) + L_{\text{expand}}(f(X;\theta), T) + L_{\text{constrain}}(X, f(X;\theta)) \right].$$
1 Seeding principle
• Classification networks may be used to provide object localization cues.
• The seeding loss encourages the network to be consistent with the localization cues:
$$L_{\text{seed}}(f(X), T, S_c) = -\frac{1}{\sum_{c \in T} |S_c|} \sum_{c \in T} \sum_{u \in S_c} \log f_{u,c}(X),$$
where $S_c$ is the set of locations that were identified as class $c \in C$ by the weak localization procedure.

[Figure: examples of successfully predicted segmentation masks. Panels: image, ground truth, prediction.]
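For illustration, the seeding loss can be sketched in NumPy. This is a minimal sketch under stated assumptions, not the released implementation: the function name `seeding_loss`, the dense `(H, W, C)` array layout, the binary seed mask and the small constant `eps` are hypothetical choices made for this example.

```python
import numpy as np


def seeding_loss(probs, seeds, eps=1e-12):
    """Seeding loss for one image (illustrative sketch).

    probs: (H, W, C) array of class probabilities f_{u,c}(X).
    seeds: (H, W, C) binary array; seeds[u, c] == 1 iff location u is in S_c.
           Assumed to be all-zero for classes c that are not in T.
    eps:   small constant to avoid log(0) (an assumption, not from the poster).
    """
    probs = np.asarray(probs, dtype=float)
    seeds = np.asarray(seeds, dtype=float)
    num_seeds = seeds.sum()              # sum over c in T of |S_c|
    if num_seeds == 0:                   # no localization cues: nothing to penalize
        return 0.0
    log_probs = np.log(probs + eps)
    return float(-(seeds * log_probs).sum() / num_seeds)


# Toy example: a 2x2 image with two classes and uniform predictions.
probs = np.full((2, 2, 2), 0.5)
seeds = np.zeros((2, 2, 2))
seeds[0, 0, 0] = 1                       # one seed location for class 0
seeds[1, 1, 1] = 1                       # one seed location for class 1
loss = seeding_loss(probs, seeds)        # equals -log(0.5), about 0.693
```

Locations without a seed do not contribute, so the loss only pulls the prediction towards the cues where the weak localization procedure is confident.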
2 Expansion principle
• The expansion loss incorporates prior knowledge about object sizes.
• The characteristic size of any class $c$ is controlled by a decay parameter $d_c$.
• We use decay $d_+$ for all classes that are present in the image and decay $d_-$ for all classes that are absent.
$I^c = \{i_1, \dots, i_n\}$ defines the descending order of the scores of class $c$: $f_{i_1,c}(X) \ge \dots \ge f_{i_n,c}(X)$.
$$G_c(f(X); d_c) = \frac{1}{Z(d_c)} \sum_{j=1}^{n} (d_c)^{j-1} f_{i_j,c}(X), \quad \text{where } Z(d_c) = \sum_{j=1}^{n} (d_c)^{j-1}.$$
$$L_{\text{expand}}(f(X), T) = -\frac{1}{|T|} \sum_{c \in T} \log G_c(f(X); d_+) - \frac{1}{|C \setminus T|} \sum_{c \in C \setminus T} \log\left(1 - G_c(f(X); d_-)\right).$$

[Figure: examples of typical failures. The recent paper [1] tackles some of them.]
[Figure (ablation study): effect of various loss terms. Panels: image, ground truth, and predictions with Lexpand, Lseed, Lseed + Lexpand, Lseed + Lconstrain and the full loss.]
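The global weighted rank-pooling $G_c$ and the expansion loss can be sketched as follows. This is a hedged illustration, not the authors' code: the names `gwrp` and `expansion_loss`, the `(n, C)` array layout, the constant `eps` and the decay values in the toy example are assumptions chosen for this sketch.

```python
import numpy as np


def gwrp(scores, decay):
    """Global weighted rank-pooling G_c for one class (illustrative sketch).

    scores: array with the scores f_{u,c}(X) of class c at all n locations.
    decay:  decay parameter d_c in [0, 1]; 0 gives max-pooling,
            1 gives average-pooling.
    """
    ranked = np.sort(np.ravel(scores))[::-1]           # descending order I^c
    weights = float(decay) ** np.arange(ranked.size)   # (d_c)^(j-1), j = 1..n
    return float((weights * ranked).sum() / weights.sum())


def expansion_loss(probs, present, d_plus, d_minus, eps=1e-12):
    """Expansion loss for one image.

    probs:   (n, C) array of class probabilities, one row per location.
    present: set of class indices T that are labelled in the image.
    eps:     small constant to avoid log(0) (an assumption of this sketch).
    """
    probs = np.asarray(probs, dtype=float)
    absent = [c for c in range(probs.shape[1]) if c not in present]
    loss = 0.0
    if present:   # classes in T should cover a sizeable part of the image
        loss -= np.mean([np.log(gwrp(probs[:, c], d_plus) + eps) for c in present])
    if absent:    # classes in C \ T should not appear anywhere
        loss -= np.mean([np.log(1.0 - gwrp(probs[:, c], d_minus) + eps) for c in absent])
    return float(loss)


# Toy example with three locations and two classes; class 0 is present.
# The decay values are illustrative only.
probs = np.array([[0.8, 0.2],
                  [0.5, 0.5],
                  [0.2, 0.8]])
loss = expansion_loss(probs, present={0}, d_plus=1.0, d_minus=0.0)
```

With `d_plus=1.0` the present class is average-pooled (0.5) and with `d_minus=0.0` the absent class is max-pooled (0.8), so the toy loss is -log(0.5) - log(0.2). Intermediate decay values interpolate between these two extremes.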
3 Constrain-to-boundary principle
• The constrain-to-boundary loss penalizes the CNN for producing non-smooth segmentation masks:
$$L_{\text{constrain}}(X, f(X)) = \mathrm{KL}(Q \,\|\, f) = \frac{1}{n} \sum_{u=1}^{n} \sum_{c \in C} Q_{u,c}(X, f(X)) \log \frac{Q_{u,c}(X, f(X))}{f_{u,c}(X)},$$
where $Q_{u,c}(X, f(X))$ is the output of the fully-connected CRF with unaries given by $f_{u,c}(X)$.
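Given the CRF output, the constrain-to-boundary loss is a mean KL divergence over locations. The sketch below covers only that last step and takes $Q$ as an input, since the fully-connected CRF itself is not implemented here. The function name `constrain_loss`, the `(n, C)` layout and the constant `eps` are assumptions of this example.

```python
import numpy as np


def constrain_loss(crf_probs, probs, eps=1e-12):
    """Mean KL(Q || f) over all n locations (illustrative sketch).

    crf_probs: (n, C) array Q_{u,c}(X, f(X)), assumed to come from a
               fully-connected CRF that is run elsewhere.
    probs:     (n, C) array of CNN probabilities f_{u,c}(X).
    eps:       small constant to avoid log(0) (an assumption of this sketch).
    """
    q = np.asarray(crf_probs, dtype=float)
    f = np.asarray(probs, dtype=float)
    # Sum over classes first, then average over locations.
    kl_per_location = (q * (np.log(q + eps) - np.log(f + eps))).sum(axis=1)
    return float(kl_per_location.mean())


# Toy example: the CRF is certain about class 0 at the single location,
# while the CNN is undecided, so the loss equals log(2).
q = np.array([[1.0, 0.0]])
f = np.array([[0.5, 0.5]])
loss = constrain_loss(q, f)
```

The loss vanishes when the CNN output already agrees with the CRF output, so only predictions that the CRF would change are penalized.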
[1] A. Kolesnikov and C. H. Lampert. Improving weakly-supervised object localization by micro-annotation. In BMVC, 2016.
[2] J. Krapac and S. Šegvić. Weakly-supervised semantic segmentation by redistributing region scores to pixels. In GCPR, 2016.
[3] D. Pathak, P. Krähenbühl, and T. Darrell. Constrained convolutional neural networks for weakly supervised segmentation. In ICCV, 2015.
[4] P. O. Pinheiro and R. Collobert. From image-level to pixel-level labeling with convolutional networks. In CVPR, 2015.
The code is publicly available: https://github.com/kolesman/SEC
European Conference on Computer Vision, 19-22 September 2016, Amsterdam, Netherlands
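Ablation study: effect of various loss terms.

Loss function         mIoU (val)
Lexpand               20.6
Lseed                 45.4
Lseed + Lexpand       44.3
Lseed + Lconstrain    50.4
all terms             50.7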