EXPLOITING SEMANTIC INFORMATION AND DEEP MATCHING FOR OPTICAL FLOW
Min Bai∗, Wenjie Luo∗, Kaustav Kundu and Raquel Urtasun
{mbai, wenjie, kkundu, urtasun}@cs.toronto.edu
INTRODUCTION

• Overview: Estimating robust and accurate optical flow.
• Input:
  – Monocular temporal image pair
• Output:
  – Dense vector field linking each pixel in the first image with its corresponding pixel in the second image
• Our Contributions:
  – A novel optical flow technique for autonomous driving that exploits instance-level semantic information, assumes rigid body motion, and enforces epipolar constraints for each car
  – State-of-the-art results on the KITTI Flow 2015 benchmark
  – A novel siamese deep neural network pipeline for corresponding pixel matching and uncertainty estimation

EPIPOLAR FLOW ESTIMATION

• We use instance segmentation to separate possibly moving cars from the background.
• We estimate flow for each object and for the background independently by exploiting epipolar constraints (i.e., each object moves rigidly in 3D).
• We use the 8-point algorithm to estimate fundamental matrices from the matching network output.
• We decompose the flow into a linearized flow induced by the relative rotation plus a flow induced by the relative translation, which points directly toward or away from the epipole of the second image. Finding matches therefore becomes a 1D search problem.
• We compute the 1D disparities with semi-global matching (see the sketch after this list), minimizing

  E(d) = \sum_{p_k} C(p_k, d_{p_k}) + \sum_{(p_k, p_l) \in \mathcal{N}} S(d_{p_k}, d_{p_l}),
  \quad
  S(d_{p_k}, d_{p_l}) = \begin{cases} \lambda_1 & \text{if } |d_{p_k} - d_{p_l}| = 1 \\ \lambda_2 & \text{if } |d_{p_k} - d_{p_l}| > 1 \\ 0 & \text{otherwise} \end{cases}
• We smooth the resulting output with EpicFlow for foreground objects.
We use linearized epipolar interpolation and slanted plane fitting for
the background.
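To make the 1D search concrete, below is a minimal NumPy sketch of a single semi-global matching aggregation pass for the energy E(d) above. The function name, the cost-volume layout, and the default penalty values are illustrative assumptions, not the exact implementation used here.

import numpy as np

def aggregate_epipolar_costs(cost, lam1=0.5, lam2=2.0):
    """One forward SGM-style pass along an epipolar scanline.

    cost: (num_pixels, num_disparities) array of matching costs
          C(p_k, d_{p_k}) for pixels ordered along the 1D search line.
    lam1, lam2: smoothness penalties for |d - d'| == 1 and |d - d'| > 1.
    Returns the aggregated costs; argmin over the last axis gives the
    disparity estimate for this path.
    """
    n_pix, n_disp = cost.shape
    L = np.empty_like(cost, dtype=np.float64)
    L[0] = cost[0]
    for p in range(1, n_pix):
        prev = L[p - 1]
        best_prev = prev.min()
        # transition costs from the previous pixel's disparities
        same = prev                              # |d - d'| == 0: no penalty
        step = np.empty(n_disp)                  # |d - d'| == 1: lam1
        step[1:-1] = np.minimum(prev[:-2], prev[2:]) + lam1
        step[0] = prev[1] + lam1
        step[-1] = prev[-2] + lam1
        jump = best_prev + lam2                  # |d - d'| > 1: lam2
        # subtract best_prev to keep the aggregated values bounded
        L[p] = cost[p] + np.minimum(np.minimum(same, step), jump) - best_prev
    return L

# usage: disparities = aggregate_epipolar_costs(unary_costs).argmin(axis=1)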
DEEP MATCHING

• Training: We train our matching CNN with a small left-image patch and a larger right-image strip; the right-image strip is cropped both vertically and horizontally.
• Cost Function: We apply a softmax over all disparities within the search window and use a cross-entropy loss with reduced penalty for small offset errors (see the sketch at the end of this section):

  \min_w \; -\sum_{i=1}^{N} \sum_{s_i} p_i^{GT}(s_i) \log p_i(s_i, w),
  \quad
  p_i^{GT}(s_i) = \begin{cases} \lambda_1 & \text{if } s_i = s_i^{GT} \\ \lambda_2 & \text{if } |s_i - s_i^{GT}| = 1 \\ \lambda_3 & \text{if } |s_i - s_i^{GT}| = 2 \\ 0 & \text{otherwise} \end{cases}

• Inference: We apply the network to the entire image with a 400 × 200 search window, and keep only the top-K matches per pixel to save memory.
• Confidence map: We perform cost aggregation on the top-K volume. The matching score provides a confidence measure, which we threshold to generate the initial flow image.

(Figure: first image, initial unary flow image, and confidence map.)
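As a concrete illustration of the cost-function bullet above, here is a small NumPy sketch of the soft ground-truth distribution (probability mass at offsets 0, 1, and 2 from the true disparity) combined with a softmax cross-entropy over the search window. The lambda values and helper names are assumptions for illustration, not the exact settings used here.

import numpy as np

def soft_target(num_disps, s_gt, lams=(0.5, 0.2, 0.05)):
    """Soft ground-truth distribution p_i^GT over the search window:
    lam1 at the true disparity, lam2 at offset 1, lam3 at offset 2,
    zero elsewhere, normalized to sum to one. The lam values here are
    illustrative placeholders."""
    lam1, lam2, lam3 = lams
    p = np.zeros(num_disps)
    for offset, lam in [(0, lam1), (-1, lam2), (1, lam2), (-2, lam3), (2, lam3)]:
        idx = s_gt + offset
        if 0 <= idx < num_disps:
            p[idx] = lam
    return p / p.sum()

def matching_loss(scores, s_gt):
    """Cross-entropy between the softmax of the siamese network's
    matching scores over the search window and the soft target."""
    scores = scores - scores.max()                      # numerical stability
    log_probs = scores - np.log(np.exp(scores).sum())   # log-softmax
    return -(soft_target(scores.size, s_gt) * log_probs).sum()

# usage (hypothetical numbers): loss for one pixel and a 9-wide window
# print(matching_loss(np.random.randn(9), s_gt=4))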
EXPERIMENTS

• Quantitative results: KITTI Flow 2015 test set error rates, compared against other monocular approaches (lower is better):

                  Non-occluded px                 All px
  Method          Fl-bg     Fl-fg     Fl-all      Fl-bg     Fl-fg     Fl-all
  HS              30.49 %   50.59 %   34.13 %     39.90 %   53.59 %   42.18 %
  DeepFlow        16.47 %   31.25 %   19.15 %     27.96 %   35.28 %   29.18 %
  EpicFlow        15.00 %   29.39 %   17.61 %     25.81 %   33.56 %   27.10 %
  MotionSLIC       6.19 %   64.82 %   16.83 %     14.86 %   66.21 %   23.40 %
  DiscreteFlow     9.96 %   22.17 %   12.18 %     21.53 %   26.68 %   22.38 %
  SOF              8.11 %   23.28 %   10.86 %     14.63 %   27.73 %   16.81 %
  Ours             5.75 %   22.28 %    8.75 %      8.61 %   26.69 %   11.62 %
• Qualitative results: within each group, from top to bottom: input image, matching network output, external instance segmentations (×2), flow output, and error image.