
Video Fingerprinting: Features for Duplicate and Similar Video Detection and Query-based Video Retrieval
Anindya Sarkar, Pratim Ghosh,
Emily Moxley and B. S. Manjunath
Presented by:
Anindya Sarkar
Vision Research Lab,
Department of Electrical & Computer Engineering,
University of California, Santa Barbara
January 30, 2008
Problem Definition:
• Duplicate video and similar video detection
– We represent each video compactly (as a fingerprint) for efficient storage and faster search, without compromising retrieval accuracy
• Query-based video retrieval
– Input: a short query video (1-2% of the length of the full video)
– Output: the actual “big” video from which the query was taken
July 31, 2017
Generation of Duplicate Videos
• Dataset: BBC rushes dataset, provided for the TRECVID 2007 video summarization task
• Operations performed:
– Image processing (per frame) based:
• Blurring using 3x3 and 5x5 windows
• Gamma correction by 20% and -20%
• Gaussian noise addition at SNR of -20, 0, 10, 20, 30 and 40 dB
• JPEG compression at QF = 10, 30, 50, 70 and 90
– Frame drop based errors:
• Frame drops of 20%, 40% and 60% of the original video, for both the random and the bursty case
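The frame-drop and noise operations above can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure used to build the dataset: a “bursty” drop is modelled as a single contiguous run of frames, SNR is taken as mean signal power over noise power, and all function names are hypothetical.

```python
import random

def random_drop(num_frames: int, drop_frac: float, seed: int = 0) -> list[int]:
    """Indices of frames kept after dropping drop_frac of them at random positions."""
    rng = random.Random(seed)
    dropped = set(rng.sample(range(num_frames), round(drop_frac * num_frames)))
    return [i for i in range(num_frames) if i not in dropped]

def bursty_drop(num_frames: int, drop_frac: float, seed: int = 0) -> list[int]:
    """Indices kept after dropping ONE contiguous burst (an assumed burst model)."""
    rng = random.Random(seed)
    n_drop = round(drop_frac * num_frames)
    start = rng.randint(0, num_frames - n_drop)  # burst covers [start, start + n_drop)
    return [i for i in range(num_frames) if not start <= i < start + n_drop]

def noise_sigma(pixels: list[float], snr_db: float) -> float:
    """Std. dev. of white Gaussian noise giving the requested SNR in dB,
    assuming SNR = mean signal power / noise power."""
    power = sum(p * p for p in pixels) / len(pixels)
    return (power / 10 ** (snr_db / 10)) ** 0.5

def add_noise(pixels: list[float], snr_db: float, seed: int = 0) -> list[float]:
    """Add zero-mean Gaussian noise at the given SNR to a flat list of pixel values."""
    rng = random.Random(seed)
    sigma = noise_sigma(pixels, snr_db)
    return [p + rng.gauss(0.0, sigma) for p in pixels]
```

For example, `random_drop(1000, 0.4)` keeps 600 frame indices; at -20 dB the noise standard deviation is ten times the RMS signal level.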
Interpretation of Similar videos
• Different takes of the same scene are considered as
“similar” videos
• These videos are similar in content
– However, because of human variability on the part of both the camera operator and the actors (camera angles, cuts and actor performance), such videos may look similar but are still different
• The BBC rushes dataset contains unedited footage of the different retakes, so it is ideally suited for generating similar videos
Keyframe based Video Fingerprint
• Pipeline: N frames in the actual video → video summarization and keyframe extraction → K keyframes → d-dimensional signature computed per keyframe → (K x d) video fingerprint
• Features used for fingerprint creation:
1. Compact Fourier Mellin Transform (CFMT)
2. Scale Invariant Feature Transform (SIFT)
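The keyframe pipeline above can be sketched as below. The slides do not say how summarization picks keyframes, so the hypothetical `uniform_keyframes` helper simply takes K evenly spaced frames as a stand-in; `feature` is any per-frame function (CFMT, SIFT, …) returning a d-dimensional vector.

```python
import numpy as np
from typing import Callable, Sequence

def uniform_keyframes(num_frames: int, k: int) -> np.ndarray:
    """Placeholder for video summarization: K evenly spaced frame indices (assumption)."""
    return np.round(np.linspace(0, num_frames - 1, k)).astype(int)

def keyframe_fingerprint(frames: Sequence[np.ndarray], k: int,
                         feature: Callable[[np.ndarray], np.ndarray]) -> np.ndarray:
    """Stack one d-dimensional signature per keyframe into a (K x d) fingerprint."""
    idx = uniform_keyframes(len(frames), k)
    return np.stack([feature(frames[i]) for i in idx])
```

Swapping in a real summarizer only changes which indices are returned; the (K x d) layout of the fingerprint stays the same.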
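Log-Polar Transformation
• Input: any 2-D matrix; coordinates (x, y) are measured from the origin at the centre of the in-circle
• First fix the values of M and N:
– M is the number of concentric circles
– N is the number of diverging radial lines
• R is the maximum radius of the in-circle; the radial and angular steps are
$$\Delta r = \frac{\log R}{M}, \qquad \Delta\theta = \frac{2\pi}{N}$$
• The log-polar sample (m, n), for m = 0, …, M-1 and n = 0, …, N-1, is taken at
$$x = e^{m\Delta r}\cos(n\Delta\theta), \qquad y = e^{m\Delta r}\sin(n\Delta\theta)$$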
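A minimal sketch of the log-polar resampling defined above, assuming the origin sits at the array centre and using nearest-neighbour lookup (the slides do not specify an interpolation scheme); `log_polar` is a hypothetical name.

```python
import numpy as np

def log_polar(img: np.ndarray, M: int, N: int) -> np.ndarray:
    """Nearest-neighbour resampling of a 2-D array onto an M x N log-polar grid."""
    h, w = img.shape
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0        # origin: centre of the in-circle
    R = min(cy, cx)                               # in-circle radius (must exceed 1)
    dr, dtheta = np.log(R) / M, 2 * np.pi / N     # Δr = log(R)/M, Δθ = 2π/N
    m = np.arange(M)[:, None]                     # ring index, 0..M-1
    n = np.arange(N)[None, :]                     # ray index, 0..N-1
    radius = np.exp(m * dr)                       # e^{mΔr}
    x = cx + radius * np.cos(n * dtheta)
    y = cy + radius * np.sin(n * dtheta)
    # round to the nearest pixel and clip to stay inside the array
    xi = np.clip(np.rint(x).astype(int), 0, w - 1)
    yi = np.clip(np.rint(y).astype(int), 0, h - 1)
    return img[yi, xi]                            # entry (m, n) of the log-polar image
```

With this grid, rotating the image about the origin by a multiple of Δθ shifts the output circularly along the n axis, and scaling shifts it (approximately) along the m axis; this is what makes a Fourier magnitude taken afterwards insensitive to rotation and scale.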
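CFMT FEATURE EXTRACTION
• Log-polar image (m = 0, …, M-1; n = 0, …, N-1) → |FFT| → keep the low-frequency block, with frequency indices -(K-1), …, K-1 along one axis and -(V-1), …, V-1 along the other, chosen to retain 50% of the AC energy → normalization & vectorization → PCA → quantization
• Here K and V are frequency cutoffs, not the number of keyframes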
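The CFMT steps above can be sketched with NumPy as follows. This is an illustration under assumptions, not the authors' implementation: the cutoffs are named `k` and `v` (to avoid clashing with K keyframes), the block is grown as a square until it holds the target share of AC energy, “normalization” is taken to be L2 normalization, and quantization is uniform to 256 levels. All function names are hypothetical.

```python
import numpy as np

def fourier_mellin_magnitude(lp: np.ndarray) -> np.ndarray:
    """Centred |FFT| of an (M x N) log-polar image; a circular shift of lp
    changes only the FFT phase, so this magnitude is unaffected by it."""
    return np.abs(np.fft.fftshift(np.fft.fft2(lp)))

def low_freq_block(mag: np.ndarray, k: int, v: int) -> np.ndarray:
    """Keep frequencies -(k-1)..(k-1) along axis 0 and -(v-1)..(v-1) along axis 1."""
    cy, cx = mag.shape[0] // 2, mag.shape[1] // 2      # DC position after fftshift
    return mag[cy - k + 1: cy + k, cx - v + 1: cx + v]

def ac_energy_fraction(mag: np.ndarray, k: int, v: int) -> float:
    """Share of the AC energy (DC term excluded) that lies inside the kept block."""
    energy = mag ** 2
    dc = energy[mag.shape[0] // 2, mag.shape[1] // 2]
    total_ac = energy.sum() - dc
    kept_ac = low_freq_block(energy, k, v).sum() - dc
    return float(kept_ac / total_ac) if total_ac > 0 else 1.0

def smallest_square_cutoff(mag: np.ndarray, frac: float = 0.5) -> int:
    """Smallest k (with v = k) whose block holds at least `frac` of the AC energy."""
    limit = min(mag.shape[0] // 2, mag.shape[1] // 2) + 1   # keeps slice starts >= 0
    for k in range(1, limit + 1):
        if ac_energy_fraction(mag, k, k) >= frac:
            return k
    return limit

def pca_quantize(feats: np.ndarray, d: int, levels: int = 256) -> np.ndarray:
    """L2-normalise rows, project onto the top-d principal axes, quantize uniformly."""
    X = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)   # rows of vt: principal axes
    Z = Xc @ vt[:d].T
    lo, hi = Z.min(), Z.max()
    return np.round((Z - lo) / (hi - lo + 1e-12) * (levels - 1)).astype(int)
```

In practice the cutoffs would be fixed once for the whole collection (for example from representative frames) so that every keyframe yields a vector of the same length before PCA.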
SIFT Feature
• Generally used for object recognition; hence, it can also be used as an image similarity measure
• Computing distances between SIFT features directly is computationally prohibitive because of the number of descriptor comparisons required
• Speed-up: quantize descriptors to a finite vocabulary (consisting of “words”)
– Each image is then represented as a weighted vector of word frequencies
• Straight vocabulary: created by clustering (image → descriptors → words); e.g. a 12-dimensional feature needs 12 clusters
• Vocabulary tree: created using hierarchical k-means on SIFT features
– Levels run from more general words (M=1 at the root, then M=3) to the most specific words (M=9)
– Each feature belongs to one “word” at each level
– Final vocabulary size = 3 + 9 = 12
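A compact sketch of a vocabulary tree built by hierarchical k-means, reproducing the 3 + 9 = 12 example above (branching factor 3, two levels below the root). The plain Lloyd's k-means, the node-numbering scheme and all function names are assumptions for illustration; each descriptor adds one count at every level it reaches.

```python
import numpy as np

def kmeans(X: np.ndarray, k: int, iters: int = 20, seed: int = 0) -> np.ndarray:
    """Plain Lloyd's k-means; returns k centres (empty clusters keep their old centre)."""
    rng = np.random.default_rng(seed)
    C = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(iters):
        labels = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                C[j] = X[labels == j].mean(axis=0)
    return C

def build_tree(X: np.ndarray, branch: int, depth: int, seed: int = 0):
    """Hierarchical k-means: each node splits its descriptors into `branch` children."""
    if depth == 0 or len(X) < branch:
        return None
    C = kmeans(X, branch, seed=seed)
    labels = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
    children = [build_tree(X[labels == j], branch, depth - 1, seed) for j in range(branch)]
    return {"centres": C, "children": children}

def tree_histogram(D: np.ndarray, tree, branch: int, depth: int) -> np.ndarray:
    """Count each descriptor once per level; length is branch + branch**2 + ..."""
    hist = np.zeros(sum(branch ** l for l in range(1, depth + 1)))
    for d in D:
        node, pos, offset = tree, 0, 0
        for level in range(1, depth + 1):
            if node is None:                      # subtree too small to be split further
                break
            j = int(((node["centres"] - d) ** 2).sum(axis=1).argmin())
            pos = pos * branch + j                # index of this node within its level
            hist[offset + pos] += 1
            offset += branch ** level             # skip past all nodes of this level
            node = node["children"][j]
    return hist
```

With `branch=3, depth=2` the first 3 entries are the general words and the remaining 9 the specific ones, giving the 12-dimensional vector of the example.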
Straight Vocabulary vs Vocabulary Tree
• Straight vocabulary:
– Does not consider the relationships between words
• That is, it ignores that certain words are closer to each other than to other words
– At a very coarse level (dictionary size ~10-20), additional words are more descriptive than the relationships among words; therefore, the straight vocabulary outperforms the vocabulary tree
• In our experiments, low-dimensional SIFT features obtained using a straight vocabulary perform much better as “fingerprints” than tree-based features
Non-keyframe based Video Fingerprint
• Features used for fingerprint creation: YCbCr histogram based feature
• The N frames of the video are divided into K windows, each with P = N/K consecutive frames
• For each of the K windows, a 125-dim YCbCr histogram is computed over its P frames, thus avoiding keyframe extraction; the whole color space is quantized into 125 bins (5 bins for each of Y, Cb and Cr)
• The K histograms form the (K x 125) video fingerprint
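A minimal sketch of the 125-bin YCbCr fingerprint above. It assumes the frames are already 8-bit YCbCr arrays of shape (N, H, W, 3), that each histogram is normalized to sum to 1, and that any leftover N - K·P frames are ignored; the function names are hypothetical.

```python
import numpy as np

BINS = 5  # 5 bins per channel -> 5**3 = 125 bins in total

def ycbcr_histogram(frames: np.ndarray) -> np.ndarray:
    """Normalised 125-bin histogram over P frames of shape (P, H, W, 3), uint8 YCbCr."""
    q = (frames.astype(np.int64) * BINS) // 256           # per-channel bin 0..4
    idx = q[..., 0] * BINS * BINS + q[..., 1] * BINS + q[..., 2]
    hist = np.bincount(idx.ravel(), minlength=BINS ** 3).astype(float)
    return hist / hist.sum()

def ycbcr_fingerprint(video: np.ndarray, k: int) -> np.ndarray:
    """(K x 125) fingerprint from K windows of P = N // K consecutive frames."""
    p = video.shape[0] // k          # leftover frames at the end are ignored (assumption)
    return np.stack([ycbcr_histogram(video[w * p:(w + 1) * p]) for w in range(k)])
```

Because every frame in a window contributes, no keyframe selection is needed, which is the point of this variant.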
Signature Distance Computation
• For two (K x d) fingerprints X and Y:
$$d(X, Y) = \sum_{i=1}^{K} \left\{ \min_{1 \le j \le K} \lVert X(i) - Y(j) \rVert_1 \right\}$$
where X(i) is the i-th feature vector of X
• Properties of this distance function:
– d(X, Y) = 0 is possible even if X ≠ Y
– d(X, Y) ≠ d(Y, X) in general
• Such a distance relation is called a “quasi-distance”
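The quasi-distance above, written directly with NumPy (`quasi_distance` is a hypothetical name). The toy fingerprints in the test illustrate both listed properties: X ≠ Y yet d(X, Y) = 0, while d(Y, X) = 2.

```python
import numpy as np

def quasi_distance(X: np.ndarray, Y: np.ndarray) -> float:
    """d(X, Y) = sum_i min_j ||X(i) - Y(j)||_1 for fingerprints with one row per keyframe."""
    # pairwise L1 distances between every row of X and every row of Y
    pair = np.abs(X[:, None, :] - Y[None, :, :]).sum(axis=2)
    return float(pair.min(axis=1).sum())
```

Because only the closest row of Y matters for each row of X, permuting the rows of Y leaves d(X, Y) unchanged, and duplicated or missing rows in Y affect it only through the nearest matches.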
Motivation Behind Distance Function
This closest-overlap based distance is robust to:
• Frame reordering: the temporal sequence need not be maintained between two signatures; e.g. a video consisting of a reordering of scenes from the same video is still regarded as a duplicate
• Frame drops: if frame drops occur or some video frames are corrupted by noise, the distance between duplicate videos should still be small
Experiments and Results
• We present precision-recall plots for both similarity and duplicate detection, over 3888 videos:
– CFMT for dimensions 36/24/20/12/4
– SIFT for dimensions 781/341/33/21/12
– CFMT vs. the best-performing SIFT for duplicate detection
– SIFT vs. the best-performing CFMT for similarity detection
• CFMT performs better for duplicate detection
• SIFT performs better for similarity detection
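The slides do not spell out how the precision-recall points are obtained. One common way, assumed here for illustration, is to sweep a distance threshold over pooled (query, candidate) distances: everything at or below the threshold counts as retrieved. `precision_recall` is a hypothetical name, and reporting precision 1.0 when nothing is retrieved is a convention.

```python
def precision_recall(distances: list[float], is_relevant: list[bool],
                     thresholds: list[float]) -> list[tuple[float, float]]:
    """(precision, recall) at each threshold, treating distance <= t as 'retrieved'."""
    total_relevant = sum(is_relevant)
    curve = []
    for t in thresholds:
        retrieved = [rel for dist, rel in zip(distances, is_relevant) if dist <= t]
        tp = sum(retrieved)                                   # relevant items retrieved
        precision = tp / len(retrieved) if retrieved else 1.0  # convention: empty set
        recall = tp / total_relevant if total_relevant else 0.0
        curve.append((precision, recall))
    return curve
```

Raising the threshold trades precision for recall, which traces out curves of the kind summarized below.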
[Figure: CFMT Signature Exact Retrieval on 3888 videos (Bursty Error). Precision-recall curves for different-dimensional CFMT (36, 24, 20, 12, 4) for duplicate detection; precision vs. recall.]
[Figure: CFMT Signature Similar Retrieval on 3888 videos (Bursty Error). Precision-recall curves for different-dimensional CFMT (36, 24, 20, 12, 4) for similarity detection; precision vs. recall.]
[Figure: Exact Retrieval on 3888 videos (Bursty Error) for SIFT dimensions 11111 to 12. Precision-recall curves for different-dimensional SIFT (11111, 781, 341, 33, 31, 21, 12) for duplicate detection; precision vs. recall.]
[Figure: Similar Retrieval on 3888 videos (Bursty Error) for SIFT dimensions 11111 to 12. Precision-recall curves for different-dimensional SIFT (11111, 781, 33, 31, 21, 12) for similarity detection; precision vs. recall.]
[Figure: Exact Retrieval on 3888 videos (Bursty Error), comparing 3 features. Precision-recall curves comparing different descriptors for duplicate detection (CFMT-36, SIFT-11111, YCbCr-125, CLD-18, CFMT-24, CFMT-20, CFMT-12, SIFT-31); precision vs. recall.]
[Figure: Similar Retrieval on 3888 videos (Bursty Error), comparing 3 features. Precision-recall curves comparing different descriptors for similarity detection (SIFT-33, CFMT-36, YCbCr-125); precision vs. recall.]
Full-length Video Retrieval with Clip Querying
• Generation of the short query:
– We put together 4 different scenes from a full-length video to create the input query
– Each individual scene is represented by 8 keyframes
– A single query therefore has 4 x 8 = 32 keyframes
– We experiment with different features for the query representation
• The repository holds the signatures of 65 full-length videos:
– The number of keyframes used to create each “large video” signature is varied from 1% to 4% of the video length
Algorithm
• Step 1: The input query signature X_query is a (32 x d) matrix
• Step 2: Its distance from every stored “large video” signature X_large is computed as follows:
$$\Delta(i) = \min_{j} \lVert X_{\mathrm{query}}(i) - X_{\mathrm{large}}(j) \rVert_1, \quad 1 \le i \le 32 \tag{1}$$
$$D(X_{\mathrm{query}}, X_{\mathrm{large}}) = \sum_{i=1}^{32} \Delta(i) / 32 \tag{2}$$
• Step 3: The best-matched video, i.e. the one with the smallest D, is returned
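Equations (1) and (2) and Step 3 can be sketched as follows; `query_distance`, `retrieve` and the dictionary-based repository are hypothetical. For a 32-row query, the mean of Δ(i) equals Σ Δ(i) / 32.

```python
import numpy as np

def query_distance(Xq: np.ndarray, Xl: np.ndarray) -> float:
    """D(X_query, X_large): average over query rows of the nearest L1 match."""
    pair = np.abs(Xq[:, None, :] - Xl[None, :, :]).sum(axis=2)  # ||Xq(i) - Xl(j)||_1
    delta = pair.min(axis=1)            # Δ(i), eq. (1)
    return float(delta.mean())          # Σ Δ(i) / 32 for a 32-row query, eq. (2)

def retrieve(Xq: np.ndarray, repository: dict[str, np.ndarray]) -> str:
    """Step 3: name of the stored large-video signature with the smallest D."""
    return min(repository, key=lambda name: query_distance(Xq, repository[name]))
```

Since each query row only needs its closest keyframe in the large signature, the four scenes in a query can come from anywhere in the full video and in any order.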
Retrieval results for 1% summary lengths for “large” videos
Video name | CFMT-36 | CFMT-20 | CFMT-12 | YCbCr-125 | SIFT-781 | SIFT-31 | SIFT-21
Query 1 | 1.00 | 1.01 | 1.00 | 7.92 | 1.01 | 3.83 | 13.26
Query 2 | 1.00 | 1.01 | 1.00 | 1.60 | 1.00 | 2.67 | 1.49
Query 3 | 1.03 | 1.36 | 1.03 | 1.71 | 1.00 | 1.00 | 2.15
Query 4 | 1.00 | 1.00 | 1.00 | 1.92 | 1.00 | 1.00 | 1.00
Retrieval results for 4% summary lengths for “large” videos
Video name | CFMT-36 | CFMT-20 | CFMT-12 | YCbCr-125 | SIFT-781 | SIFT-31 | SIFT-21
Query 1 | 1.00 | 1.09 | 1.23 | 1.78 | 1.00 | 2.52 | 3.94
Query 2 | 1.00 | 1.00 | 1.00 | 2.11 | 1.00 | 1.00 | 1.45
Query 3 | 1.00 | 1.21 | 1.59 | 4.70 | 1.00 | 1.41 | 8.44
Query 4 | 1.00 | 1.00 | 1.47 | 1.99 | 1.00 | 1.00 | 1.00
Conclusions
• CFMT features provide fast and accurate retrieval of duplicate videos
• SIFT features perform better for similar video detection
• Future work
– Expanding the domain of “similar” videos (videos that are not retakes yet are still similar?)
– The importance of an efficient summary for creating the video signature (strategic vs. random keyframes?)
Thanks for your patience.
Questions?