Adaptive Segmentation Based on a Learned Quality Metric I. Frosio1, E. Ratner2 1 NVIDIA, USA, 2 Lyrical Labs, USA Motivation: good / bad segmentation SLIC (Achanta, 2012) 2 Motivation: good / bad segmentation GRAPH-CUT (Felzenszwalb, 2004) 3 Motivation: good / bad segmentation ADAPTIVE GRAPH-CUT (our) 4 Motivation: good / bad segmentation > SLIC (Achanta, 2012) > GRAPH-CUT (Felzenszwalb, 2004) ADAPTIVE GRAPH-CUT (our) 5 Motivation: good / bad segmentation Achanta, 2012 (SLIC); Kaufhold, 2004: segmentation algorithms aggregate sets of perceptually similar pixels in an image. Felzenszwalb, 2004 (graph-cut): a segmentation algorithm should capture perceptually important groupings or regions, which often reflect global aspects of the image. 6 Motivation: segmentation & video compression Frame segmentation Segment motion estimation Encoding True block and sub-block motion vectors 7 Aim #1: use the human factor (aka segmentation quality metric) 8 Aim #2: automatic parameter tuning 9 Road map 3) … And put them together (autotuning). 2) … Learn a quality metric including the human factor (application needs) … 1) Pick a segmentation algorithm… 10 Graph-cut vj w(vi, vj)>>0 Graph: w(vi, vj)>0 Nodes: vi Edges: Weights: w(vi, vj)=0 11 Graph-cut Cm Internal difference: 12 Graph-cut Cm Difference between components: Cn 13 Graph-cut Ck Boundary predicate: 10 15 12 Cn 14 Graph-cut C1 Boundary predicate: 15 8 11 C2 15 Graph-cut C1 Boundary predicate: Observation scale ~ k C2 16 K = 100 K = 10,000 K=3 Graph-cut 17 Road map 3) … And put them together (autotuning). 2) … Learn a quality metric including the human factor… 1) Pick a segmentation algorithm… 18 (Weighted) symmetric uncertainty 4 bits ------------------ = 33% 7 bits + 5 bits Entropy based average 19 k vs. Uw vs. quality 160 x 120 image block 20 k vs. Uw vs. quality Training 160 x 120 blocks 320x240 rgb images K = [1, …, 10,000] visual inspection & classification 21 k vs. Uw vs. quality Training 160 x 120 blocks 640x480 rgb images K = [1, …, 10,000] visual inspection & classification 22 Learning the metric Uw = m log(k) + b 23 Road map 3) … And put them together (autotuning). 2) … Learn a quality metric including the human factor… 1) Pick a segmentation algorithm… 24 Automatic k selection 25 Automatic k selection 26 Automatic k selection 27 Automatic k selection 28 Automatic k selection 29 … and adaptivity k = k(x,y) 30 Road map 31 Results - Quality Adaptive graph-cut (ours) Graph-cut (Felzensswalb, 2004) * SLIC (Achanta, 2012) * * Same number of segments forced for each algorithm 32 Results 33 Results SLIC Graph-cut Adaptive graph-cut 34 Results 35 Results SLIC Graph-cut Adaptive graph-cut 36 Results: inter-class contrast (the higher the better) Sum of the contrasts among segments weighted by their areas (Chabrier, 2004) 320x240 640x480 0.2 0.18 0.18 0.16 0.16 0.14 0.14 0.12 0.12 0.1 0.1 SLIC Graph-cut Adaptive graphcut Inter class contrast 0.08 average 0.08 average 0.06 median 0.06 median 0.04 0.04 0.02 0.02 0 0 SLIC Graph-cut Adaptive graphcut Inter class contrast 37 Results: intra-class uniformity (the lower the better) Sum of the normalized standard deviation for each region (Chabrier, 2004) 320x240 640x480 14 45 40 12 35 10 30 8 6 4 25 average 20 average median 15 median 10 2 5 0 SLIC Graph-cut Adaptive graphcut Intra class uniformity 0 SLIC Graph-cut Adaptive graphcut Intra class uniformity 38 Results: contrast - uniformity ratio (the higher the better) 320x240 640x480 35 14 30 12 25 10 20 8 15 average 6 median SLIC Graph-cut 1000 * Inter / Intra Adaptive graphcut average median 10 4 5 2 0 0 SLIC Graph-cut Adaptive graphcut 1000 * Inter / Intra 39 Discussion LEARNED segmentation quality metric including the HUMAN FACTOR Iterative method AUTOMATICALLY ADAPTIVELY compute optimal scale parameter to and the 40 A more general approach (edge thresholding segmentation in YUV) 41 A more general approach (edge thresholding segmentation in YUV) Openboradcast encoding (x264) Lyricallabs encoding (adaptive segmentation) Show 42 A more general approach (edge thresholding segmentation in YUV) Openboradcast encoding (x264) Lyricallabs encoding (adaptive segmentation) Show 43 Open issues & improvements Resolution dependency (160x120 blocks) Learning: the Berkeley Segmentation Dataset Avoid iterations (see I. Frosio, SPIE EI 2015) 44 Questions 45
© Copyright 2026 Paperzz