Features for Duplicate and Similar Video

Video Fingerprinting: Features for Duplicate
and Similar Video Detection and Query-based Video Retrieval
Anindya Sarkar, Pratim Ghosh,
Emily Moxley and B. S. Manjunath
Presented by:
Anindya Sarkar
Vision Research Lab,
Department of Electrical & Computer Engg,
University of California, Santa Barbara
January 30, 2008
Problem Definition:
• Duplicate video and similar video detection
– we represent a video compactly (fingerprint), for
efficient storage and faster search without
compromising the retrieval accuracy
• Query-based video retrieval
– Input: short length (1-2% of big video length) query
video
– Output: actual “big” video from which the query is
taken
July 31, 2017
Generation of Duplicate Videos
• Dataset: BBC rushes dataset, provided for the TRECVID2007 task of video summarization
• Operations performed:
– Image processing (per frame) based:
• Blurring using 3x3 and 5x5 window
• Gamma correction by 20% and -20%
• Gaussian noise addition at SNR of -20, 0, 10, 20, 30 and 40 dB
• JPEG compression at QF=10, 30, 50, 70 and 90
– Frame drop based errors:
• frame drops of 20%, 40% and 60% of the original video for both
random and bursty case.
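A minimal sketch of three of the operations listed above (gamma correction, Gaussian noise at a target SNR, and frame drops), using only the Python standard library. Several details are assumptions, not taken from the slides: frames are 2-D lists of pixel values in [0, 1], "gamma correction by 20%" is read as gamma = 1.2 (and -20% as gamma = 0.8), and a "bursty" drop is modelled as one contiguous run of dropped frames. All function names are hypothetical.

```python
import math
import random

def gamma_correct(frame, change):
    """Gamma-correct pixel values in [0, 1]; gamma = 1 + change is an assumed reading."""
    gamma = 1.0 + change
    return [[p ** gamma for p in row] for row in frame]

def add_gaussian_noise(frame, snr_db, rng):
    """Add zero-mean Gaussian noise whose power is set by the requested SNR in dB."""
    pixels = [p for row in frame for p in row]
    signal_power = sum(p * p for p in pixels) / len(pixels)
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [[p + rng.gauss(0.0, sigma) for p in row] for row in frame]

def drop_frames_random(frames, fraction, rng):
    """Drop the given fraction of frames at random positions."""
    n_drop = round(fraction * len(frames))
    dropped = set(rng.sample(range(len(frames)), n_drop))
    return [f for i, f in enumerate(frames) if i not in dropped]

def drop_frames_bursty(frames, fraction, rng):
    """Drop the given fraction of frames as one contiguous burst (assumed model)."""
    n_drop = round(fraction * len(frames))
    start = rng.randrange(len(frames) - n_drop + 1)
    return frames[:start] + frames[start + n_drop:]
```

Blurring and JPEG compression are left out because they are normally delegated to an image library.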
Interpretation of Similar videos
• Different takes of the same scene are considered as
“similar” videos
• These videos are similar in content
– However, due to human variability at both the cameraman
and actor level, (camera angles, cuts, and actor
performance), videos may look similar but are still different
• BBC rushes dataset has unedited footage of the
different retakes – hence, ideally suited for generation
of similar videos
Keyframe based Video Fingerprint
Features used for fingerprint creation:
1. Compact Fourier Mellin Transform (CFMT)
2. Scale Invariant Feature Transform (SIFT)
Fingerprint pipeline (diagram):
• N frames in the actual video
• Video summarization and key-frame extraction yield K key-frames
• A d-dimensional signature is computed per key-frame
• The video fingerprint is the resulting K x d matrix
Log-Polar Transformation
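• Input: any 2-D matrix; the log-polar grid is indexed by (m, n), with m = 0, ..., M-1 and n = 0, ..., N-1
• First fix the values of M and N
– M is the number of concentric circles
– N is the number of diverging radial lines
• R is the maximum radius of the in-circle, measured from the origin
• Step sizes: $\Delta r = \log(R)/M$, $\Delta\theta = 2\pi/N$
• Grid point (m, n) maps to the point (x, y):
$$x = e^{m \Delta r} \cos(n \Delta\theta), \qquad y = e^{m \Delta r} \sin(n \Delta\theta)$$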
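The mapping above can be sketched in a few lines of Python. The grid function follows the slide's formulas directly. The sampling function adds assumptions of its own: the image is square, the origin is placed at the image centre, R is half the image side, and nearest-neighbour lookup is used. The function names are hypothetical.

```python
import math

def log_polar_grid(R, M, N):
    """Return {(m, n): (x, y)} with x = e^(m*dr) cos(n*dtheta), y = e^(m*dr) sin(n*dtheta)."""
    dr = math.log(R) / M
    dtheta = 2 * math.pi / N
    grid = {}
    for m in range(M):
        radius = math.exp(m * dr)
        for n in range(N):
            grid[(m, n)] = (radius * math.cos(n * dtheta),
                            radius * math.sin(n * dtheta))
    return grid

def log_polar_sample(image, M, N):
    """Resample a square 2-D list onto an M x N log-polar grid (nearest neighbour)."""
    size = len(image)
    centre = (size - 1) / 2.0   # assumed origin: the image centre
    R = size / 2.0              # assumed in-circle radius
    out = [[0.0] * N for _ in range(M)]
    for (m, n), (x, y) in log_polar_grid(R, M, N).items():
        row = min(size - 1, max(0, round(centre + y)))
        col = min(size - 1, max(0, round(centre + x)))
        out[m][n] = image[row][col]
    return out
```

With R = 16 and M = 4, the radii are 1, 2, 4 and 8, so the rings are spaced evenly in log(radius) and stay inside the in-circle.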
CFMT FEATURE EXTRACTION
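Processing pipeline (diagram):
• Log-polar image indexed by (m, n), with m = 0, ..., M-1 and n = 0, ..., N-1
• |FFT|, with frequency indices running from -(K-1) to K-1 and from -(V-1) to V-1
• 50% A.C. energy
• Normalization & vectorization
• PCA
• Quantization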
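One property that makes the |FFT| step useful on a log-polar grid is that a circular shift along the angular index n (which is what a rotation by a multiple of ∆θ produces) leaves the DFT magnitude unchanged. The sketch below checks this with a direct 2-D DFT written in plain Python. It illustrates only the |FFT| stage; the energy-based selection, normalization, PCA and quantization stages are not reproduced, and the function names are hypothetical.

```python
import cmath

def dft2_magnitude(grid):
    """Magnitude of the 2-D DFT of an M x N grid (direct summation, for small grids)."""
    M, N = len(grid), len(grid[0])
    mags = []
    for k in range(M):
        row = []
        for v in range(N):
            total = 0j
            for m in range(M):
                for n in range(N):
                    phase = -2j * cmath.pi * (k * m / M + v * n / N)
                    total += grid[m][n] * cmath.exp(phase)
            row.append(abs(total))
        mags.append(row)
    return mags

def shift_angle(grid, steps):
    """Circularly shift every ring of the grid by `steps` angular samples."""
    return [row[-steps:] + row[:-steps] for row in grid]

grid = [[1, 2, 3, 4], [0, 1, 0, 2], [5, 1, 1, 0]]
original = dft2_magnitude(grid)
rotated = dft2_magnitude(shift_angle(grid, 1))
same = all(abs(a - b) < 1e-9
           for ra, rb in zip(original, rotated)
           for a, b in zip(ra, rb))
```

Here `same` evaluates to True: the shift changes only the phase of each coefficient, not its magnitude.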
SIFT Feature
• Generally used for object recognition – hence, can be
used as an image similarity measure
• Computing distances between SIFT features directly is
computationally prohibitive, because of the number of
descriptor comparisons involved
• Speed up – quantize descriptors to a finite vocabulary
(consisting of words)
– Each image is a weighted vector of the word frequencies
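A minimal sketch of the quantization idea: each descriptor is assigned to its nearest vocabulary word, and the image is summarized by the resulting word frequencies. The vocabulary is taken as given (it would normally come from clustering). Squared Euclidean distance for the nearest-word search and plain relative frequencies as the weighting are assumptions; the slides do not specify either. The function names are hypothetical.

```python
def nearest_word(descriptor, vocabulary):
    """Index of the vocabulary word closest to the descriptor (squared Euclidean)."""
    def sq_dist(word):
        return sum((a - b) ** 2 for a, b in zip(descriptor, word))
    return min(range(len(vocabulary)), key=lambda i: sq_dist(vocabulary[i]))

def word_histogram(descriptors, vocabulary):
    """Relative frequency of each vocabulary word among the image's descriptors."""
    counts = [0] * len(vocabulary)
    for descriptor in descriptors:
        counts[nearest_word(descriptor, vocabulary)] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(vocabulary)

# Toy example with 2-D "descriptors" and a 2-word vocabulary.
vocabulary = [(0.0, 0.0), (10.0, 10.0)]
descriptors = [(1.0, 1.0), (9.0, 9.0), (11.0, 10.0), (0.0, -1.0)]
signature = word_histogram(descriptors, vocabulary)
```

The signature length equals the vocabulary size, so comparing two images costs one vector comparison instead of a comparison between every pair of descriptors.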
• Straight vocabulary: created by clustering image descriptors into words, e.g. a 12-dimensional feature needs 12 clusters
• Vocabulary tree: created using hierarchical k-means on SIFT features
– Levels with M=1, M=3 and M=9 nodes, going from more general words to the most specific words
– Final vocabulary size = 3+9 = 12
– Each feature belongs to one “word” at each level
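The two-level tree in the diagram (3 general words, each with 3 more specific children) can be sketched as follows. The centroids are supplied by hand here rather than learned by hierarchical k-means. The greedy descent and the layout of the 12-entry histogram (3 level-1 counts followed by 9 level-2 counts) are assumptions made for illustration, and the function names are hypothetical.

```python
def closest(descriptor, centroids):
    """Index of the centroid nearest to the descriptor (squared Euclidean)."""
    def sq_dist(c):
        return sum((a - b) ** 2 for a, b in zip(descriptor, c))
    return min(range(len(centroids)), key=lambda i: sq_dist(centroids[i]))

def tree_histogram(descriptors, top_words, child_words):
    """Count one word per level for each descriptor.

    top_words: list of b level-1 centroids.
    child_words[i]: list of b level-2 centroids under top word i.
    Returns b + b*b counts (3 + 9 = 12 when b = 3).
    """
    b = len(top_words)
    hist = [0] * (b + b * b)
    for d in descriptors:
        i = closest(d, top_words)        # general word
        j = closest(d, child_words[i])   # specific word under that branch
        hist[i] += 1
        hist[b + i * b + j] += 1
    return hist

top = [(0.0,), (10.0,), (20.0,)]
children = [[(-1.0,), (0.0,), (1.0,)],
            [(9.0,), (10.0,), (11.0,)],
            [(19.0,), (20.0,), (21.0,)]]
hist = tree_histogram([(0.9,), (10.2,), (21.5,), (9.1,)], top, children)
```

Each descriptor contributes one count at every level, which is how the tree keeps the relationship between general and specific words that a straight vocabulary discards.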
Straight Vocabulary vs Vocabulary Tree
• Straight vocabulary:
– Does not consider relationship between words
• That is, ignores that certain words are closer to each other than other
words.
– At a very coarse level (dictionary size ~10-20), additional
words are more descriptive than the relationship among
words. Therefore, the straight vocabulary outperforms the vocabulary tree at these sizes.
• In our experiments, low-dimensional SIFT features,
obtained using straight vocabulary, perform much
better as “fingerprints” than tree-based features
Non-keyframe based Video Fingerprint
Features used for fingerprint creation: YCbCr histogram based feature
• The N frames of the video are divided into K windows of P consecutive frames each, where P = N/K
• For each of the K windows, a 125-dimensional histogram is computed in YCbCr space over its P frames, which avoids key-frame extraction
• The whole color space is quantized into 125 bins (5 bins for each of Y, Cb and Cr)
• Video fingerprint: K x 125
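A minimal sketch of this windowed histogram, under stated assumptions: each frame is a flat list of 8-bit (R, G, B) tuples, the RGB to YCbCr conversion is the full-range BT.601 (JFIF) variant, the 5 bins per channel are uniform over 0 to 255, and each histogram is normalized to sum to 1. None of these choices is specified in the slides, and the function names are hypothetical.

```python
def rgb_to_ycbcr(r, g, b):
    """Full-range BT.601 (JFIF) conversion for 8-bit RGB values (assumed variant)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr

def bin_index(value, bins=5):
    """Uniform bin over [0, 256), clamped to the valid range."""
    return min(bins - 1, max(0, int(value * bins / 256)))

def window_histogram(frames, bins=5):
    """Joint Y/Cb/Cr histogram (bins**3 entries) over all pixels of the given frames."""
    hist = [0] * (bins ** 3)
    count = 0
    for frame in frames:
        for r, g, b in frame:
            y, cb, cr = rgb_to_ycbcr(r, g, b)
            idx = (bin_index(y, bins) * bins + bin_index(cb, bins)) * bins + bin_index(cr, bins)
            hist[idx] += 1
            count += 1
    return [h / count for h in hist]

def video_fingerprint(frames, K, bins=5):
    """Split N frames into K windows of P = N // K frames; one histogram per window."""
    P = len(frames) // K
    return [window_histogram(frames[k * P:(k + 1) * P], bins) for k in range(K)]

# Toy video: 4 frames, each with one black and one white pixel.
frames = [[(0, 0, 0), (255, 255, 255)] for _ in range(4)]
fingerprint = video_fingerprint(frames, K=2)
```

The result has K rows of 125 values, matching the K x 125 fingerprint size on the slide.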
Signature Distance Computation
• For two (K x d) fingerprints, X and Y,
$$d(X, Y) = \sum_{i=1}^{K} \min_{1 \le j \le K} \| X(i) - Y(j) \|_1$$
where X(i) = ith feature vector of X
• Properties of this distance function:
– $d(X, Y) = 0$ is possible even if $X \ne Y$
– $d(X, Y) \ne d(Y, X)$
• Such a distance relation is called a “quasi-distance”
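The distance and its two properties can be checked with a short sketch. Fingerprints are represented as lists of K feature vectors; the function names are hypothetical.

```python
def l1(a, b):
    """L1 norm of the difference between two feature vectors."""
    return sum(abs(p - q) for p, q in zip(a, b))

def quasi_distance(X, Y):
    """Sum over the rows of X of the L1 distance to the closest row of Y."""
    return sum(min(l1(x, y) for y in Y) for x in X)

X = [[0, 0], [0, 0]]
Y = [[0, 0], [5, 5]]

d_xy = quasi_distance(X, Y)   # every row of X has an exact match in Y
d_yx = quasi_distance(Y, X)   # the row [5, 5] has no close match in X
```

Here `d_xy` is 0 although X and Y differ, and `d_yx` is 10, so the function is neither a metric nor symmetric. Reordering the rows of Y leaves `d_xy` unchanged, because each row of X is matched independently.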
Motivation Behind Distance Function
This closest-overlap based distance is robust to:
• Frame reordering: the temporal sequence may not be
maintained between 2 signatures, e.g. a video consisting of a
reordering of scenes from the same video is still regarded as
a duplicate
• Frame drops: if frame drops occur or some video frames are
corrupted by noise, the distance between duplicate videos
should still be small
Experiments and Results
• We present precision-recall plots for both similarity
and duplicate detection, over 3888 videos
– CFMT for dimensions 36/24/20/12/4
– SIFT for dimensions 781/341/33/21/12
– CFMT vs best performing SIFT for duplicate detection
– SIFT vs best performing CFMT for similarity detection
• CFMT performs better for duplicate detection
• SIFT performs better for similarity detection
[Figure: CFMT Signature Exact Retrieval on 3888 videos (Bursty Error). Precision-recall curves for different dimensional CFMT (36, 24, 20, 12 and 4) for duplicate detection. Axes: PRECISION and RECALL.]
[Figure: CFMT Signature Similar Retrieval on 3888 videos (Bursty Error). Precision-recall curves for different dimensional CFMT (36, 24, 20, 12 and 4) for similarity detection. Axes: PRECISION and RECALL.]
[Figure: Exact Retrieval on 3888 videos (Bursty Error) for SIFT dimensions 11111 to 12. Precision-recall curves for different dimensional SIFT (11111, 781, 341, 33, 31, 21 and 12) for duplicate detection. Axes: PRECISION and RECALL.]
[Figure: Similar Retrieval on 3888 videos (Bursty Error) for SIFT dimensions 11111 to 12. Precision-recall curves for different dimensional SIFT (11111, 781, 33, 31, 21 and 12) for similarity detection. Axes: PRECISION and RECALL.]
[Figure: Exact Retrieval on 3888 videos (Bursty Error) - comparing 3 features. Precision-recall curves comparing different descriptors for duplicate detection. Curves: CFMT-36, SIFT-11111, YCbCr-125, CLD-18, CFMT-24, CFMT-20, CFMT-12 and SIFT-31. Axes: PRECISION and RECALL.]
[Figure: Similar Retrieval on 3888 videos (Bursty Error) - comparing 3 features. Precision-recall curves comparing different descriptors for similarity detection. Curves: SIFT-33, CFMT-36 and YCbCr-125. Axes: PRECISION and RECALL.]
Full-length Video Retrieval with Clip Querying
• Generation of the small-length query:
– We put together 4 different scenes from a full-length video to
create our input query
– Each individual scene is represented by 8 keyframes
– For a single query, we have 4x8=32 keyframes
– We experiment with different features for query
representation
• The repository holds full-length video signatures (65 videos):
– The number of keyframes used to create the signature of a
“large video” is varied from 1% to 4% of the video length
Algorithm
• Step 1: Input query signature $X_{query}$ is a (32 x d) matrix
• Step 2: Its distance from all the stored “large video”
signatures ($X_{large}$) is computed, as shown
below:
$$\Delta(i) = \min_{j} \| X_{query}(i) - X_{large}(j) \|_1, \quad 1 \le i \le 32 \qquad (1)$$
$$D(X_{query}, X_{large}) = \sum_{i=1}^{32} \Delta(i) / 32 \qquad (2)$$
• Step 3: The best matched video is returned
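The three steps can be sketched as a small ranking routine. The repository is modelled here as a dictionary from a video name to its signature (a list of feature vectors); that layout and the function names are assumptions made for illustration. The average is taken over the number of query rows, which is 32 in the setup described above.

```python
def l1(a, b):
    """L1 norm of the difference between two feature vectors."""
    return sum(abs(p - q) for p, q in zip(a, b))

def query_distance(X_query, X_large):
    """Equations (1) and (2): mean over query rows of the closest-row L1 distance."""
    deltas = [min(l1(q, f) for f in X_large) for q in X_query]
    return sum(deltas) / len(deltas)

def retrieve(X_query, repository):
    """Return video names ordered from best to worst match."""
    return sorted(repository,
                  key=lambda name: query_distance(X_query, repository[name]))

# Toy repository with two "large" videos and a 2-row query.
repository = {
    "video_a": [[0, 0], [1, 1], [2, 2]],
    "video_b": [[10, 10], [11, 11]],
}
query = [[1, 1], [2, 2]]
ranking = retrieve(query, repository)
```

The first entry of `ranking` is the best matched video; returning the full ordering also makes it easy to inspect how close the runner-up is.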
Retrieval results for 1% summary lengths for “large” videos
Video name  CFMT-36  CFMT-20  CFMT-12  YCbCr-125  SIFT-781  SIFT-31  SIFT-21
Query 1     1.00     1.01     1.00     7.92       1.01      3.83     13.26
Query 2     1.00     1.01     1.00     1.60       1.00      2.67     1.49
Query 3     1.03     1.36     1.03     1.71       1.00      1.00     2.15
Query 4     1.00     1.00     1.00     1.92       1.00      1.00     1.00
Retrieval results for 4% summary lengths for “large” videos
Video name  CFMT-36  CFMT-20  CFMT-12  YCbCr-125  SIFT-781  SIFT-31  SIFT-21
Query 1     1.00     1.09     1.23     1.78       1.00      2.52     3.94
Query 2     1.00     1.00     1.00     2.11       1.00      1.00     1.45
Query 3     1.00     1.21     1.59     4.70       1.00      1.41     8.44
Query 4     1.00     1.00     1.47     1.99       1.00      1.00     1.00
Conclusions
• CFMT features provide quick and accurate retrieval for
duplicate videos
• SIFT features perform better for similar video
detection
• Future work
– Expanding the domain of “similar” videos (non-retakes that are
still similar?)
– Importance of an efficient summary for creating the video signature
(strategic keyframes vs. random keyframes?)
Thanks for your patience.
Questions?
