Vision-Friendly Video Coding Towards the New Standard H.265/HPVC

Manoranjan Paul
Research Fellow, School of Computer Engineering, Nanyang Technological University
Faculty Member, Charles Sturt University, Australia (from January 2011)

25 December 2010, AUST, Bangladesh
http://sites.google.com/site/manoranjanpaulpersonalweb/

© 2010 Manoranjan Paul. All Rights Reserved.

Outline
• Personal Information in Brief
• Video Compression and Video Coding
• Video/Image Quality Assessment
• Computer Vision & Video Coding
• Eye Tracking Technology & Visual Attention
• Abnormal Event Detection
• New Research Areas
• Conclusions

Other People Involved in the Research Works
• Prof. Michael Frater, UNSW
• Prof. John Arnold, UNSW
• Prof. Laurence Dooley, Monash
• A/Prof. Manzur Murshed, Monash
• A/Prof. Weisi Lin, NTU
• A/Prof. Chiew Tong Lau, NTU
• A/Prof. Bu-Sung Lee, NTU
• Dr. Chenwei Deng, NTU
• Dr. Mahfuzul Haque, Monash
• Dr. Golam Sorwar, SCU
• Dr. Fan Zhang
• Anmin Liu, NTU
• Zhouye Gu, NTU

Personal Information
• Education:
  – PhD, Monash University (2005). Thesis title: "Block-based very low bit rate video coding techniques using pattern templates", nominated for the Mollie Holman Gold Medal, 2006.
  – Bachelor of Computer Science and Engineering (4-year honours degree with research project), Bangladesh University of Engineering and Technology, Dhaka, Bangladesh (1997). Thesis supervisor: Professor Chowdhury Mofizur Rahman.
• Employment:
  – Current positions:
    • Faculty Member, Charles Sturt University, Australia (from January 2011)
    • Research Fellow, Nanyang Technological University, Singapore (world rank 69), CI: Professor Weisi Lin
  – Previous positions:
    • Research Fellow and Lecturer (05/06–03/09), Monash University (world rank 45), under an ARC Discovery grant, CI: Professor Manzur Murshed
    • Research Fellow, under an ARC DP project, ADFA, The University of New South Wales (world rank 47), CI: Professor Michael Frater
    • Assistant Lecturer (11/01–04/05), Monash University, Australia
    • Assistant Professor (10/00–10/01), Ahsanullah University of Science and Technology, Bangladesh
    • Lecturer (09/97–09/00), Ahsanullah University of Science and Technology, Bangladesh

Personal Information (cont.)
• Research:
  – Published 50+ international articles
  – Delivered a keynote speech at the IEEE ICCIT conference, 2010
  – Organized a special session on "Video Coding" at IEEE ISCAS 2010
  – Supervised 5 PhD students and examined 5 PhD and MS theses
  – Editor of (i) the International Journal of Engineering and Industries (IJEI) and (ii) special issues (2008, '09, '10, & '11) of the Journal of Multimedia
  – Served as a program committee member for 15 international conferences
• Research quality:
  – Published 7 IEEE Transactions papers in TIP (3), TCSVT (3), and TMM (1)
    • IEEE TIP: IF 4.6, top-ranked journal in image processing, JCR rank Q1
    • IEEE TCSVT: IF 4.3, top-ranked journal in video technology, JCR rank Q1
    • IEEE TMM: IF 2.9, top-ranked journal in multimedia, JCR rank Q1
  – Published 25+ IEEE conference papers, e.g., ICIP (6), ICASSP (4), MMSP (3), ICPR (1), ISCAS (1)
  – An ARC DP grant was awarded from my PhD work
  – Obtained $80,000 in competitive grant money
  – Reviewer for IEEE TIP, IEEE TCSVT, IEEE TMM, IEEE SPL, and IEEE CL
  – Publications h-index is 7 (according to Google)
Why Video Compression and Video Coding
Original "Ice" video sequence: Frame 46, Frame 61
• A sequence of frames displayed in quick succession gives the impression of moving pictures, i.e., video
• One second of raw CIF video requires 25 × 352 × 288 × 8 × 3 bits = 59,400 kilobits
• Whereas YouTube supports only 384 kilobits per second
• Thus, we need to compress the video data by roughly 150–200 times
• To compress the data we need video coding
• Video coding standards: Motion JPEG, MPEG-2, MPEG-4, H.264, H.265

Video Coding Steps (H.264)
First frame of the Tennis video, and the frame difference
• The frame encoding format is IPPP…, IBBP…, or IBPBP…
• A P-frame is processed in small blocks; a macroblock is a 16×16-pixel block
• For more compression, each block is matched against already coded frames: motion estimation
• The 16×16 block can be further divided into 16×8, 8×16, 8×8, 8×4, 4×8, and 4×4 sub-blocks
• The best mode is then selected using the bit rate R, distortion D, and Lagrangian parameter λ: J(m_i) = D(m_i) + λ·R(m_i)
Figure: Block-matching technique. A macroblock at (k, l) in the current frame is matched against candidates at (k+u, l+v) inside a (2d+1+M)-wide search window of the reference frame, yielding the motion vector (u, v)

Limitations of Variable Block Size ME and MC
• A frame (picture) of a video is divided into 16×16-pixel blocks for coding
• Each block may have different motions; thus block sizes from 16×16 down to 4×4 are used to estimate motion
Observations:
• At low bit rates, the largest block size (16×16) is selected more than 60–70% of the time
• At higher bit rates, 16×16 accounts for only 20–30% while the smaller 8×8 block accounts for 40–50%
• The variable block size ME & MC of H.264 is therefore not as effective as expected at low bit rates
(a) Frame #2 of the Miss America sequence; (b) the different block sizes; (c) & (d) block-size selections for Frame #2 at low and high bit rates, respectively

Exploitation of Intra-Block Temporal Redundancy by Pattern Templates
Fig.: An example of how pattern-based coding can exploit intra-block temporal redundancy to improve coding efficiency

Pattern-Based Video Coding (PVC) at Low Bit Rates
Published in IEEE Transactions on Image Processing 2010 and IEEE TCSVT 2005
• A set of pre-defined pattern templates is used
• A block has foreground and background; pattern matching can separate them
• Encoding the foreground and skipping the background provides compression
• Motion estimation using only the moving regions also provides computational efficiency
Observations:
• PVC provides 1.0–2.0 dB PSNR improvement, or 10–20% more compression, compared to H.264 at low bit rates
• PVC provides 20–30% computational reduction compared to H.264

Arbitrary-Shaped Pattern Templates
Published in IEEE SPL 2007
Pattern templates are generated from the moving-region distributions of different video sequences by clustering the moving regions and keeping the 64 highest-magnitude pixel positions in the final template.
• Pre-defined pattern templates sometimes cannot approximate object shapes properly
• Arbitrary-shaped pattern templates generated from the video itself approximate object shapes better, and thus provide better coding performance
• ASPVC provides 0.5 dB PSNR, or 5% more compression, compared to PVC
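The block-matching search used in the coding steps above can be sketched in a few lines of Python. This is an illustrative full-search matcher with a SAD cost, not the H.264 implementation; the function name and the toy frame in the example are invented:

```python
import numpy as np

def full_search_me(cur_block, ref_frame, k, l, d):
    """Full-search block matching: find the motion vector (u, v)
    minimising the SAD between the block at (k, l) in the current
    frame and candidate blocks inside a (2d+1)-wide displacement
    window of the reference frame."""
    n = cur_block.shape[0]
    best_sad, best_mv = np.inf, (0, 0)
    for u in range(-d, d + 1):
        for v in range(-d, d + 1):
            r, c = k + u, l + v
            if r < 0 or c < 0 or r + n > ref_frame.shape[0] or c + n > ref_frame.shape[1]:
                continue  # candidate block falls outside the frame
            cand = ref_frame[r:r + n, c:c + n]
            sad = np.abs(cur_block.astype(int) - cand.astype(int)).sum()
            if sad < best_sad:
                best_sad, best_mv = sad, (u, v)
    return best_mv, best_sad

# A block copied from (10, 12) and searched from (8, 8) is found
# at displacement (2, 4) with zero SAD.
rng = np.random.default_rng(0)
ref = rng.integers(0, 256, (64, 64))
mv, sad = full_search_me(ref[10:26, 12:28], ref, 8, 8, 8)
```

Real encoders replace this exhaustive scan with fast search patterns, but the cost function and window are the same idea.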
Singular Value Decomposition (SVD): DCT vs. 2D-SVD
• DCT: original image → DCT coefficients (coefficient distribution of the 120th frame after DCT)
• 2D-SVD: original image → 2D-SVD coefficients (coefficient distribution of the 120th frame after SVD)
• 2D-SVD yields fewer high-amplitude coefficients (dark spots in the coefficient matrix)

Hybrid video codecs (MPEG-1, -2, -4, H.263/H.264)
• Advantages: exploit temporal redundancy by motion estimation; high compression
• Disadvantages: high computational complexity; inter-frame dependency; no random frame access; error propagation

Intra-frame video codecs (Motion JPEG, Motion JPEG 2000)
• Advantages: low computational complexity; inter-frame independence; random access to each frame; facilitate video editing; no error propagation
• Disadvantage: low compression

The Framework of the SVD Video Codec
Published in IEEE ICIP 2010
Pipeline: the input frames A_i (i = 1…n) are normalized and the mean frame A_mean is subtracted; each frame is split into 16×16 macroblock groups B_i; 2D-SVD produces a 16×16 coefficient matrix M_i and eigenvector matrices (U_l, U_r) for every macroblock group; the coefficients are quantized and entropy coded, the mean frame is JPEG-compressed, and per-frame head files carry the group information.
Results:
• SVD coding is 100 times faster than the H.264 video coding standard
• Suitable for mobile video phones and surveillance video coding
• Image quality gain of 5.0 dB compared to Motion JPEG 2000
• Better for random access and noisy transmission
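The energy-compaction idea behind the SVD codec can be illustrated with a small rank-truncation sketch. This is a generic 1-frame illustration, not the 2D-SVD macroblock-group pipeline above, and the gradient test block is invented:

```python
import numpy as np

def svd_truncate(block, r):
    """Approximate an image block by keeping only its r largest
    singular values. This is the best rank-r approximation in the
    least-squares sense, which is why smooth content needs very
    few SVD coefficients."""
    U, s, Vt = np.linalg.svd(block, full_matrices=False)
    return U[:, :r] @ np.diag(s[:r]) @ Vt[:r, :]

# A smooth 16x16 gradient block is rank 2 (one outer product plus a
# constant), so keeping just r = 2 components reconstructs it almost
# exactly.
x = np.arange(16.0)
block = np.outer(x, x) + 5.0
approx = svd_truncate(block, 2)
err = np.abs(block - approx).max()
```

Natural image blocks are not exactly low rank, but their singular values decay quickly, so truncation plus quantization gives compression.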
Super-High Definition (4096×2160) Video Coding
Published in IEEE ICIP 2010
Figure: Rate-distortion (PSNR vs. bits per pixel) comparisons of H.264 FRExt, Motion JPEG 2000, and the modified H.264 variants (BS, BS+SR, BS+SR+SKIP) on super-high-definition sequences
• Does H.264 outperform JPEG 2000 at super-high definition? Answer: no
• Answer: yes, if we
  • increase the block size to 32×32 instead of 16×16
  • increase the search range to 128 instead of 32
  • extend the skip-block definition to 32×32 instead of 16×16

Optimal Compression Plane (OCP) for Video Coding
Accepted in IEEE ICIP 2010 and IEEE Transactions on Image Processing 2010*
An adaptive compression-plane transform is applied per processing unit (PPU): the encoder determines the optimal compression plane (XY, TX, or TY) and the PPU size for the source video, then codes the transformed data with standard video coders.
Figure: Different compression planes: XY, TX, and TY
Figure: Rate-distortion performance using JPEG 2000 and H.264 for the Mobile and Tempete video sequences; the OCP curves (N = 32, N = 128) outperform plain XY coding

Occlusion Handling using Multiple Reference Frames (MRFs)
• To encode the current frame, a video encoder searches previously encoded frames for the best-matched block
• The MRFs technique (up to 16 reference frames in H.264) outperforms coding with a single reference frame when there is:
  • repetitive motion
  • uncovered background
  • non-integer pixel displacement
  • lighting change, etc.
Examples from the Silent video sequence: Frame 18 (uncovered background), Frame 18 (light changed), Frame 30 (repetitive motion)
Disadvantages:
• Requires k (the number of reference frames) times the computation
• Requires k times more memory in the encoder and decoder
• No improvement if the relevant features are missing: what happens if the cycle of features exceeds the k reference frames?

Dual Reference Frames: A Subset of MRFs
• The dual (short-term and long-term) reference frame technique has gained popularity over MRFs
• Short-term reference (STR) frame: the immediately previous frame, used for the foreground
• Long-term reference (LTR) frame: used for the stable background
  • One of the previously coded frames is selected as the LTR
  • Selecting and updating the LTR frame is complicated
  • It cannot capture uncovered background most of the time
  • It cannot capture a stable background
  • It may not be effective for implicit background/foreground referencing
• To capture uncovered background, repetitive motion, etc., we need a ground-truth background; in a changing environment it is almost impossible to obtain one, so we model a Most Common Frame of a Scene (McFIS) from the video frames

McFIS: The Most Common Frame of a Scene
Published in IEEE ICASSP-10, ICIP-10, ISCAS-10
Observations:
• A true background can capture a major portion of a scene
• The difference between the current frame and the background yields the objects
• Coding only the foreground provides coding efficiency
Original "Ice" video sequence → background generation

McFIS for Background Areas
Accepted in IEEE Transactions on Image Processing 2010* & ICASSP-10
• Instead of MRFs, a single McFIS is enough
• A McFIS makes it possible to capture a whole cycle of features
• The immediately previous frame is used for moving areas and the McFIS for background regions
Less computation (60% saving) in ME&MC is required using McFIS Improvement around 1.0dB PSNR compared to the existing methods Implicit foreground (STR) and background (LTR, i.e., McFIS) referencing where black regions are referenced from the STR and other regions are referenced using McFIS © 2010 Manoranjan Paul. All Rights Reserved. Rate-distortion performance © 2010 Manoranjan Paul. All Rights Reserved. Scene Change Detection by McFIS • • • • Two mixture video sequences created A simple SCD detection algorithm using the McFIS has been developed (i.e., SAD between McFIS and Current frame) Compare our results with Ding et al. IET IP 2008 The proposed SCD algorithm is better © 2010 Manoranjan Paul. All Rights Reserved. Adaptive GOP Determination by SCD • • • • • I-frame requires two to three times more bits than P or B-frame if a sequence does not contain any scene changes or extremely high motion activity, insertion of I-frames reduces the coding performance Optimal I-frame insertion using adaptive GOP and SCD improves coding performance We have compared our results with other two latest algorithms The proposed AGOP method outperforms existing methods NUMBER OF I-FRAMES FOR MIXED VIDEO A AND B OF 700 FRAMES Number of I-frames Mixed Video A Mixed Video B Methods QPs SCD AGOP SCD AGOP 40 10 0 10 0 Proposed 28 10 0 10 0 Algorithm 20 10 0 10 0 40 10 0 10 11 Ding’s 28 10 0 10 4 Algorithm 20 10 0 10 4 40 9 21 40 21 Matsuoka’s 28 9 21 40 21 Algorithm 20 10 21 39 21 © 2010 Manoranjan Paul. All Rights Reserved. McFIS as an I-frame Accepted in IEEE Trans. on Cir. & Sys. 
for Video Technology 2010* & IEEE ISCAS-10 • • • A frame being the first frame is not the best I-frame An ideal I-frame should have the best similarity with other frames, thus better for P/B-frames in terms of bits and image quality Insertion of more than one I-frame within a scene degrades the coding performance Conventional Frame Format Proposed More background area using McFIS Computational Time Reductions © 2010 Manoranjan Paul. All Rights Reserved. McFIS for Reducing Quality Fluctuations • • Fluctuation of PSNRs Rate-Distortion performance • • Fluctuation of Bits per frame Rate-Distortion performance © 2010 Manoranjan Paul. All Rights Reserved. Better I-frames are generated for coding efficiency More consistent bit per frame and image quality by the proposed method The proposed method improved 1.5dB PSNR Better for Scene Change Detection Coded Videos using Different Schemes Original Video © 2010 Manoranjan Paul. All Rights Reserved. H.264 Mode Selection by Distortion Only published in IEEE Transactions on Multimedia 2009 H.264 mode selection • J LM D ( RMV RH RDCT ) mn arg min ( J LM (mi )) R(mi ) RT mi New H.264 mode selection J Dist D m ( RMV RH ) • • The H.264 supports a number of modes such as intra, skip, direct, 16×16 16×8, 8×16, 8×8, 8×4, 4×8, and 4×4 to encode a 16×16 block The H.264 requires around 10 time more computational times compared to the H.263 Thus, limited powered devices (mobile phone, PDA, etc.) can not use H.264 • • • Mode selection using only bits from motion vectors and headers with entropy bits provides 12% reductions of computational time Modified Lagrangian Multiplier is also derived Only 0.1dB PSNR may be degraded © 2010 Manoranjan Paul. All Rights Reserved. 
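The Lagrangian decision rule above (pick the mode minimising J = D + λR) can be sketched as follows. The mode names, distortions, and rates are made-up numbers for illustration, and in H.264 λ is actually derived from the quantisation parameter rather than chosen freely:

```python
def select_mode(modes, lam):
    """Lagrangian mode selection: each candidate mode m has a
    distortion D(m) and a rate R(m); choose the mode minimising
    J(m) = D(m) + lam * R(m)."""
    return min(modes, key=lambda m: m["D"] + lam * m["R"])

# Invented candidates: SKIP is cheap but distorted, 4x4 is accurate
# but expensive. A large lambda (low bit rate) favours cheap modes,
# a small lambda (high bit rate) favours accurate ones.
candidates = [
    {"name": "SKIP",  "D": 900.0, "R": 1.0},
    {"name": "16x16", "D": 400.0, "R": 40.0},
    {"name": "4x4",   "D": 150.0, "R": 180.0},
]
low_rate_mode  = select_mode(candidates, lam=20.0)["name"]  # "SKIP"
high_rate_mode = select_mode(candidates, lam=0.5)["name"]   # "4x4"
```

This trade-off is also why the slides report 16×16 dominating at low bit rates: when λ is large, rate savings outweigh distortion.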
Mode Selection Using Phase Correlation
• Phase correlation provides the relative motion between two blocks
• Different peaks correspond to different kinds of motion; a motion-inconsistency metric δ relates the peaks to object movement
• We exploit this object motion for direct mode selection from a binary matrix, for example:
  • one big peak → no motion
  • one small peak → single motion
  • two peaks → multiple motions

Direct Mode Selection for Efficient Coding
Published in IEEE Transactions on Image Processing 2010
Mode selection by H.264 vs. mode selection by our method
• Time saving of 60–90% compared to H.264
• Rate-distortion performance is comparable to, and sometimes better than, H.264
• Can be applied to other fast mode-selection schemes

Video/Image Quality Assessment
Published in IEEE ICME 2010 and IEEE Transactions on CSVT 2010
Figure: Block diagram of the pixel-domain just-noticeable-distortion (JND) model
Figure: Image decomposition; (a) original Barbara image, (b) structural part, and (c) texture part

Environment (Objects, Background, Shadow, etc.) Modeling
How do we extract the active regions from a surveillance video stream?
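A minimal sketch of this extraction idea, subtracting a background estimate from the current frame and thresholding the difference into a foreground mask. The running-average background update, the learning rate, and the threshold are illustrative assumptions, not the proposed modeling method:

```python
import numpy as np

def update_background(bg, frame, alpha=0.05):
    """Running-average background estimate:
    bg <- (1 - alpha) * bg + alpha * frame.
    A small learning rate alpha absorbs slow illumination changes
    without swallowing moving objects into the background."""
    return (1.0 - alpha) * bg + alpha * frame

def foreground_mask(bg, frame, thresh=25.0):
    """Pixels whose difference from the background exceeds the
    threshold are labelled as moving foreground."""
    return np.abs(frame - bg) > thresh

# Static 'road' background with a bright 'car' block in the new frame.
bg = np.full((8, 8), 100.0)
frame = bg.copy()
frame[2:4, 2:4] = 200.0          # the moving object
mask = foreground_mask(bg, frame)  # True exactly on the 2x2 block
```

A single running average cannot handle multi-modal backgrounds (waving leaves, shadows), which is what motivates the Gaussian-mixture modeling described next.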
Background Subtraction
Current frame − Background = Moving foreground
Challenges:
• Background initialization is not a practical approach in the real world
• The background is dynamic due to illumination variation, local motion, camera displacement, and shadow

Modeling using Gaussian Mixtures
Each pixel intensity x is modeled as a mixture of Gaussians P(x) with parameters (µ, σ²); separate modes capture, e.g., sky and cloud, road and shadow, floor and shadow, leaves, and moving cars or walking people.

Moving Object Detection
• The K per-pixel models (ωk, µk, σk), updated with learning rate α, are ordered by ω/σ; e.g., road (65%), shadow (20%), car (15%)
• The first B models are taken as background: B = argmin_b (Σ_{k=1..b} ωk > T), where T is the minimum portion of the data in the environment accounted for by the background (e.g., T = 70%)
• A new pixel value X_t matches a model when |X_t − µ| < τσ for a matching threshold τ

Dynamic Background Modeling
Published in IEEE ICPR-08, AVSS-08, and MMSP-08
Observations:
• A predefined background/foreground ratio does not work in all cases
• Background generation using the mean produces a tailing effect
• Existing modeling creates unnecessary background models
Contributions:
• Make the model environment-independent so it works in any environment
• Emphasize recent changes to avoid the tailing effect
• New criteria avoid multiple models for the same environment component
Original "Ice" video sequence → background generation

Object Detections - 1
Columns: first frame, test frame, ground truth, Sfm, Lf, Pf
Sequences: (1) PETS2000; (2) PETS2006-B1; (3) PETS2006-B2; (4) PETS2006-B3; and (5) PETS2006-B4
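The per-pixel mixture bookkeeping described above can be sketched for a single pixel. The ω/σ ordering and the B = argmin_b (Σωk > T) rule follow the slides; the component values and the matching threshold τ = 2.5 are invented for illustration:

```python
def background_models(models, T=0.7):
    """Order mixture components by omega/sigma (most stable first)
    and keep the first B whose weights sum past T, per
    B = argmin_b (sum_{k<=b} omega_k > T)."""
    ordered = sorted(models, key=lambda m: m["w"] / m["sigma"], reverse=True)
    total, background = 0.0, []
    for m in ordered:
        background.append(m)
        total += m["w"]
        if total > T:
            break
    return background

def matches(model, x, tau=2.5):
    """A pixel value matches a component if it lies within tau
    standard deviations of the component mean."""
    return abs(x - model["mu"]) < tau * model["sigma"]

# Invented components for one road pixel.
models = [
    {"name": "road",   "w": 0.65, "mu": 100.0, "sigma": 5.0},
    {"name": "shadow", "w": 0.20, "mu": 60.0,  "sigma": 8.0},
    {"name": "car",    "w": 0.15, "mu": 200.0, "sigma": 12.0},
]
bg = background_models(models, T=0.7)           # road + shadow
is_fg = not any(matches(m, 205.0) for m in bg)  # 205 fits only 'car'
```

A value of 205 matches none of the background components, so it is declared moving foreground, exactly the behaviour the slides describe for a passing car.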
Object Detections - 2
Columns: first frame, test frame, ground truth, Sfm, Lf, Pf
Sequences: (6) Bootstrap; (7) Camouflage; (8) Foreground Aperture; (9) Light Switch; (10) Moved Object; (11) Time of Day; and (12) Waving Tree

Object Detections - 3
Columns: first frame, test frame, ground truth, Sfm, Lf, Pf
Sequences: (13) Football; and (14) Walk

Eye Tracking Technology & Visual Attention: Recent Research
Figure: Three types of eye trackers
Figure: Different heat maps for normal (left) and abnormal (right) driving
Figure: Original image, with the eye movements and fixations recorded when observers were asked for free observation, asked to determine the age of the people, and asked to determine the people's positions (left to right)
Figure: Heat map from an eye tracker (left) and a saliency map (right)

Ongoing Project: Abnormal Event Detection using Vision-Friendly Video Coding
This project contributes to:
• Interactive 3D video technology
• Eye-tracking technology
• Visual attention modeling
• Abnormal event detection

New Research Areas
• 3D Video
  – 3D video coding
  – Multi-view video coding
  – Depth estimation
  – Free-view videos
  – Distributed video coding
Conclusions
• Improve compression and video quality
  – Pattern-based (regular and arbitrary-shaped) video coding
  – Super-high-definition (SHD) video coding
  – Video coding for uncovered background and occlusions
  – Optimal compression plane (OCP) selection
  – Singular-value-decomposition-based video coding
• Reduce computational complexity in video coding
  – Phase-correlation-based direct mode selection
  – Simplified Lagrangian function for low-cost mode selection
• Image decomposition and quality assessment
  – Separating the structure and texture areas of an image
• Background modeling & object detection
  – Object detection in challenging environments
• Panic-driven event detection
  – Event detection using low-level features
• Eye-tracking technology & visual attention modeling
  – Human-centric region-of-interest selection

Thank you!
Email: [email protected]
http://sites.google.com/site/manoranjanpaulpersonalweb/