Video Fingerprinting: Features for Duplicate and Similar Video Detection and Query-based Video Retrieval
Anindya Sarkar, Pratim Ghosh, Emily Moxley and B. S. Manjunath
Presented by: Anindya Sarkar
Vision Research Lab, Department of Electrical & Computer Engg, University of California, Santa Barbara
January 30, 2008

Problem Definition
• Duplicate video and similar video detection
– We represent a video compactly (as a fingerprint) for efficient storage and faster search, without compromising the retrieval accuracy
• Query-based video retrieval
– Input: a short query video (1-2% of the length of the big video)
– Output: the actual "big" video from which the query is taken

July 31, 2017

Generation of Duplicate Videos
• Dataset: BBC rushes dataset, provided for the TRECVID 2007 task of video summarization
• Operations performed:
– Image processing (per frame) based:
  • Blurring using 3x3 and 5x5 windows
  • Gamma correction by 20% and -20%
  • Gaussian noise addition at SNR of -20, 0, 10, 20, 30 and 40 dB
  • JPEG compression at QF = 10, 30, 50, 70 and 90
– Frame drop based errors:
  • Frame drops of 20%, 40% and 60% of the original video, for both the random and the bursty case

Interpretation of Similar Videos
• Different takes of the same scene are considered "similar" videos
• These videos are similar in content
– However, due to human variability at both the cameraman and actor level (camera angles, cuts, and actor performance), the videos may look similar but are still different
• The BBC rushes dataset has unedited footage of the different retakes
– Hence, it is ideally suited for generation of similar videos

Keyframe based Video Fingerprint
[Diagram: N frames in the actual video → video summarization and key-frame extraction → K key-frames → d-dimensional signature computed per key-frame → K x d video fingerprint]
Features used for fingerprint creation:
1. Compact Fourier Mellin Transform
2. Scale Invariant Feature Transform

Log-Polar Transformation
• Applies to any 2-D matrix, with radius and angle measured from the origin
• First fix the values of M and N
– M is the number of concentric circles
– N is the number of diverging radial lines
• R is the maximum radius of the in-circle
• Step sizes: $\Delta r = \log(R)/M$, $\Delta\theta = 2\pi/N$
• Sample $(m, n)$, with $m = 0, \dots, M-1$ and $n = 0, \dots, N-1$, maps to the point $(x, y)$:
$$x = e^{m\Delta r}\cos(n\Delta\theta), \qquad y = e^{m\Delta r}\sin(n\Delta\theta)$$

CFMT Feature Extraction
[Diagram: log-polar samples (m = 0..M-1, n = 0..N-1) → |FFT| (axes labelled -(K-1)..K-1 and -(V-1)..V-1) → 50% A.C. energy → normalization & vectorization → PCA → quantization]

SIFT Feature
• Generally used for object recognition
– Hence, it can be used as an image similarity measure
• Distance between SIFT features
– The number of descriptor comparisons makes it computationally prohibitive
• Speed-up
– Quantize descriptors to a finite vocabulary (consisting of words)
– Each image is then a weighted vector of the word frequencies

Straight Vocabulary and Vocabulary Tree
• Straight vocabulary: created by clustering, e.g. a 12-dimensional feature needs 12 clusters
[Diagram: image → descriptors → words]
• Vocabulary tree: created using hierarchical k-means on SIFT features
[Diagram: tree levels M=1 (more general words), M=3, M=9 (most specific words)]
– Final vocabulary size = 3 + 9 = 12
– Each feature belongs to one "word" at each level

Straight Vocabulary vs Vocabulary Tree
• Straight vocabulary:
– Does not consider the relationship between words
  • That is, it ignores that certain words are closer to each other than other words
– At a very coarse level (dictionary size ~10-20), additional words are more descriptive than the relationship among words. Therefore, it outperforms the vocabulary tree.
• In our experiments, low-dimensional SIFT features, obtained using a straight vocabulary, perform much better as "fingerprints" than tree-based features

Non-keyframe based Video Fingerprint
Features used for fingerprint creation: YCbCr histogram based feature
[Diagram: the N frames are split into K windows of P = N/K frames each → video fingerprint extraction → K x 125 video fingerprint]
• The 125-dimensional histogram is computed in YCbCr space from P consecutive frames, thus avoiding key-frame extraction
• The whole color space is quantized into 125 bins (5 bins for each of Y, Cb and Cr)

Video Fingerprint Signature Distance Computation
• For two (K x d) fingerprints, X and Y,
$$d(X, Y) = \sum_{i=1}^{K} \left\{ \min_{1 \le j \le K} \| X(i) - Y(j) \|_1 \right\}$$
where $X(i)$ is the $i$-th feature vector of X
• Properties of this distance function:
– $d(X, Y) = 0$ is possible even if $X \ne Y$
– $d(X, Y) \ne d(Y, X)$
• Such a distance relation is called a "quasi-distance"

Motivation Behind Distance Function
This closest-overlap based distance is robust to:
• Frame reordering: for 2 signatures, the temporal sequence may not be maintained between them, e.g. a video consisting of a reordering of scenes from the same video is still regarded as a duplicate
• Frame drops: if frame drops occur or some video frames are corrupted by noise, the distance between duplicate videos should still be small

Experiments and Results
• We present precision-recall plots for both similarity and duplicate detection, over 3888 videos
– CFMT for dimensions 36/24/20/12/4
– SIFT for dimensions 781/341/33/21/12
– CFMT vs best performing SIFT for duplicate detection
– SIFT vs best performing CFMT for similarity detection
• CFMT performs better for duplicate detection
• SIFT performs better for similarity detection

[Figure: CFMT Signature Exact Retrieval on 3888 videos (Bursty Error). Precision-recall curves for different dimensional CFMT for duplicate detection. Curves: 36, 24, 20, 12, 4. Axes: RECALL (x), PRECISION (y).]

[Figure: CFMT Signature Similar Retrieval on 3888 videos (Bursty Error). Precision-recall curves for different dimensional CFMT for similarity detection. Curves: 36, 24, 20, 12, 4. Axes: RECALL (x), PRECISION (y).]

[Figure: Exact Retrieval on 3888 videos (Bursty Error) for SIFT dim 11111 to 12. Precision-recall curves for different dimensional SIFT for duplicate detection. Curves: 11111, 781, 341, 33, 31, 21, 12. Axes: RECALL (x), PRECISION (y).]

[Figure: Similar Retrieval on 3888 videos (Bursty Error) for SIFT dimensions 11111 to 12. Precision-recall curves for different dimensional SIFT for similarity detection. Curves: 11111, 781, 33, 31, 21, 12. Axes: RECALL (x), PRECISION (y).]

[Figure: Exact Retrieval on 3888 videos (Bursty Error) - comparing 3 features. Precision-recall curves comparing different descriptors for duplicate detection. Curves: CFMT-36, SIFT-11111, YCbCr-125, CLD-18, CFMT-24, CFMT-20, CFMT-12, SIFT-31. Axes: RECALL (x), PRECISION (y).]
[Figure: Similar Retrieval on 3888 videos (Bursty Error) - comparing 3 features. Precision-recall curves comparing different descriptors for similarity detection. Curves: SIFT-33, CFMT-36, YCbCr-125. Axes: RECALL (x), PRECISION (y).]

Full-length Video Retrieval with Clip Querying
• Generation of the small-length query:
– We put together 4 different scenes from a full-length video to create our input query
– Each individual scene is represented by 8 keyframes
– For a single query, we have 4 x 8 = 32 keyframes
– We experiment with different features for query representation
• The repository holds full-length video signatures (65 videos):
– The number of keyframes used to create the signature for a "large video" is varied from 1% to 4% of the video length

Algorithm
• Step 1: The input query signature $X_{query}$ is a (32 x d) matrix
• Step 2: Its distance from each of the stored "large video" signatures ($X_{large}$) is computed as:
$$\Delta(i) = \min_{j} \| X_{query}(i) - X_{large}(j) \|_1, \quad 1 \le i \le 32 \qquad (1)$$
$$D(X_{query}, X_{large}) = \sum_{i=1}^{32} \Delta(i) / 32 \qquad (2)$$
• Step 3: The best matched video is returned

Retrieval results for 1% summary lengths for "large" videos

Video name  CFMT-36  CFMT-20  CFMT-12  YCbCr-125  SIFT-781  SIFT-31  SIFT-21
Query 1     1.00     1.01     1.00     7.92       1.01      3.83     13.26
Query 2     1.00     1.01     1.00     1.60       1.00      2.67     1.49
Query 3     1.03     1.36     1.03     1.71       1.00      1.00     2.15
Query 4     1.00     1.00     1.00     1.92       1.00      1.00     1.00

Retrieval results for 4% summary lengths for "large" videos

Video name  CFMT-36  CFMT-20  CFMT-12  YCbCr-125  SIFT-781  SIFT-31  SIFT-21
Query 1     1.00     1.09     1.23     1.78       1.00      2.52     3.94
Query 2     1.00     1.00     1.00     2.11       1.00      1.00     1.45
Query 3     1.00     1.21     1.59     4.70       1.00      1.41     8.44
Query 4     1.00     1.00     1.47     1.99       1.00      1.00     1.00

Conclusions
• CFMT features provide quick/accurate retrieval for duplicate videos
• SIFT features perform better for similar video detection
• Future work
– Expanding the domain of "similar" videos (non-retakes yet still similar?)
– Importance of an efficient summary to create the video signature (strategic keyframes vs random keyframes?)

Thanks for your patience. Questions?
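
Backup: sketch of the signature distance and clip-query retrieval

The closest-overlap distance and the three-step retrieval algorithm from the earlier slides can be illustrated with a minimal NumPy sketch. This is not the authors' code. The function names (`pairwise_l1`, `quasi_distance`, `query_distance`, `retrieve`) and the dictionary-based repository are hypothetical, and the L1 norm is an assumption read from the subscript in the distance formulas.

```python
import numpy as np


def pairwise_l1(X, Y):
    """Matrix of L1 distances between every row of X and every row of Y."""
    # Broadcasting gives shape (rows of X, rows of Y, d) before the sum.
    return np.abs(X[:, None, :] - Y[None, :, :]).sum(axis=2)


def quasi_distance(X, Y):
    """Closest-overlap distance: sum over i of min over j of ||X(i) - Y(j)||_1."""
    return float(pairwise_l1(X, Y).min(axis=1).sum())


def query_distance(X_query, X_large):
    """Eqs. (1)-(2): mean over query rows of the distance to the closest row."""
    delta = pairwise_l1(X_query, X_large).min(axis=1)  # Delta(i) for each query row
    return float(delta.mean())


def retrieve(X_query, repository):
    """Step 3: return the name of the stored signature with the smallest D.

    `repository` is assumed to be a dict mapping video name -> signature
    matrix (a hypothetical storage layout, chosen only for illustration).
    """
    scores = {name: query_distance(X_query, sig) for name, sig in repository.items()}
    best = min(scores, key=scores.get)
    return best, scores


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Two made-up "large video" signatures with 40 keyframes of dimension 12.
    repo = {"video_a": rng.random((40, 12)), "video_b": rng.random((40, 12))}
    # A 32-keyframe query built from shuffled rows of video_b.
    query = repo["video_b"][rng.permutation(40)[:32]]
    best, scores = retrieve(query, repo)
    print(best, scores)
```

Because each query row is matched to its closest stored row independently, shuffling rows or dropping some of them leaves the matched rows at distance zero, which is the robustness to frame reordering and frame drops claimed for this distance. The same independence is why the measure is asymmetric: extra rows in one signature only add cost when that signature is the first argument.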