– Supplementary Material –
Appearances can be deceiving: Learning visual tracking from few trajectory annotations
Santiago Manen¹, Junseok Kwon¹, Matthieu Guillaumin¹, and Luc Van Gool¹,²
¹ Computer Vision Laboratory, ETH Zurich
² ESAT - PSI / IBBT, K.U. Leuven
{smanenfr, kwonj, guillaumin, vangool}@vision.ee.ethz.ch
In this supplementary material, we present additional details for some aspects of our work. In Sec. 1, we detail the properties of the optimization problem that we solve for learning how to track, which led to the algorithm proposed in the paper. Then, in Sec. 2, we show all the plots on all the datasets.
1  Deceptiveness Learning Properties
In this section, we discuss the properties of the optimization problem in eq. (5)
in the main paper (Sect. 4.1):
\begin{aligned}
\underset{\tilde{\rho}}{\text{minimize}} \quad & \sum_{k=1}^{T} \tilde{\rho}_k \\
\text{subject to} \quad & \forall k \in [1, T]: \; 0 \le \tilde{\rho}_k \le 1, \\
& \text{trackingError}(\tilde{\rho}) \le \theta.
\end{aligned}
At first sight, this optimization problem might look very general; however, a closer look reveals some interesting properties:
– For all θ ≥ 0, there is always a solution to eq. (5): ρ̃ = 1. That is, the tracker relies entirely on the motion model, transferring exactly the only trajectory available, which is the ground-truth trajectory of the object being tracked. In that case, trackingError = 0.
– trackingError depends on the full trajectory, so in principle we need the full
track to compute it. However, at each frame k we can compute a lower bound
on the final trackingError by looking at the error up to frame k:
\text{trackingError}_k = \frac{1}{T} \sum_{i=1}^{k} \xi_i , \qquad (1)
where ξ_i is the center-location distance between the window at frame i and its corresponding ground truth. Since trackingError_k is monotonically increasing in k, we can stop tracking as soon as trackingError_k reaches θ (we say that the tracking has failed at step k).
– Failure at step k (and, more generally, trackingError_k) can only be recovered (resp. decreased) by increasing ρ̃ at a frame k′ < k. That is, there is an earlier deceptive region not yet accounted for.
– When ρ̃ is modified at frame k, the track after frame k is altered. Any previous knowledge about the value of ρ̃ after frame k should be discarded.
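The early-stopping property above can be sketched in code. The following is a minimal illustration, not the paper's implementation: `center_errors` is a hypothetical iterable yielding the center-location distance ξ_i at each frame, and T and θ come from the problem setup.

```python
def tracking_error_with_early_stop(center_errors, T, theta):
    """Accumulate trackingError_k = (1/T) * sum_{i<=k} xi_i and stop as
    soon as it reaches theta; since the partial sum is monotonically
    increasing, the final trackingError can only be larger."""
    error = 0.0
    for k, xi in enumerate(center_errors, start=1):
        error += xi / T
        if error >= theta:
            return error, k  # tracking has failed at step k
    return error, None  # full track stayed below theta
```

Because the partial sums are a lower bound on the final trackingError, returning at the first frame where the bound reaches θ never discards a feasible ρ̃.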
These properties led us to the algorithm proposed in Sec. 4.1 of the main paper (paragraph Optimization process).
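The properties above suggest a greedy loop, sketched below under assumptions: `run_tracker` is a hypothetical callable that tracks with deceptiveness ρ̃ and reports the failure frame (or None if trackingError(ρ̃) ≤ θ), and the choice of the earlier frame k′ is a placeholder, not the paper's exact selection rule.

```python
def learn_deceptiveness(run_tracker, T, step=0.1):
    """Greedy sketch: start from rho_tilde = 0 everywhere; whenever the
    tracker fails at frame k, increase rho_tilde at an earlier frame
    k' < k (an unaccounted deceptive region) and re-track, since the
    track after a modified frame is no longer valid."""
    rho = [0.0] * T
    while True:
        fail_k = run_tracker(rho)  # failure frame, or None on success
        if fail_k is None:
            return rho  # trackingError(rho) <= theta
        # Placeholder choice of an earlier deceptive frame k' < k.
        k_prime = max(0, fail_k - 1)
        rho[k_prime] = min(1.0, rho[k_prime] + step)
```

Note that each modification of ρ̃ at k′ invalidates the track after k′, which is why the loop re-runs the tracker from scratch rather than reusing the failed track.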
2  Complementary Plots
On the next page, we show plots that complement the ones shown in the main paper (Sec. 6). Fig. 1 refers to the annotation selection approach, Fig. 2 to the conditioning on an event model, and Fig. 3 to the learning of deceptiveness. They show the performance of our method compared to competitors or baselines on both datasets and both metrics, whereas the main paper showed only a selection of those.
[Figure: Center Location Error and Success Plot (AUC) vs. #Annotations on Hospedales1 and Hospedales3, comparing Random annots, Incremental AP (ours), and [9].]
Fig. 1: Annotation selection on both datasets and both metrics (cf. Fig. 7a).
[Figure: Center Location Error and Success Plot (AUC) vs. #Annotations on Hospedales1 and Hospedales3, comparing Fixed ρ 0.2 - One event, Fixed ρ 0.2 - All events, Learned ρ - One event, Learned ρ - All events, and [9].]
Fig. 2: Event modelling for both datasets and metrics (cf. Fig. 8b).
[Figure: Center Location Error and Success Plot (AUC) vs. #Annotations on Hospedales1 and Hospedales3, comparing Fixed ρ 0.2, Learned ρ̃, Learned ρ (ours), and [9].]
Fig. 3: Learning deceptiveness for both datasets and metrics (cf. Fig. 7b).