Computer Graphics and Visualization
Master's Thesis Defense
Improving JPEG Compression with Regression Tree Fields
Nico Schertler
22.09.2014

Motivation
Scale of storage units: kB = 10^3 B, MB = 10^6 B, GB = 10^9 B, TB = 10^12 B, PB = 10^15 B.
7 PB of photo storage every month (2013).
42 PB of transferred data in 2012.
20 PB for satellite, aerial and street-level images (2012).

Why Reduce Image Storage Size?
Reduce costs: for data centers, maintenance, energy, ...
Reduce environmental impact: due to cooling, energy consumption, ...
Reduce transfer time: for uploads and downloads, back-ups, ...

Agenda
Theoretical Fundamentals
  JPEG
  RTFs
General Idea
Preliminary Considerations
Optimization of various degrees of freedom
  Coding
  RTF Loss Function
  Prediction Strategy
Conclusions

JPEG

JPEG - Overview
Pipeline: RGB image → color space transformation → Y'CbCr image → downsampling → discrete cosine transform → DCT coefficients → quantization → entropy coding → JPEG stream.
Codec for still images.
Standardized in 1992.
Includes both lossless and lossy compression.
Best suited for smooth images.
JPEG 2000 is based on the discrete wavelet transform.

JPEG - Color Space Transformation
RGB to Y'CbCr.
Closer match to human perception.
$$\begin{pmatrix} Y' \\ C_b \\ C_r \end{pmatrix} = \begin{pmatrix} 0.299 & 0.587 & 0.114 & 0 \\ -0.168736 & -0.331264 & 0.5 & 128 \\ 0.5 & -0.418688 & -0.081312 & 128 \end{pmatrix} \begin{pmatrix} R \\ G \\ B \\ 1 \end{pmatrix}$$
Figure: RGB input and the resulting Y, Cb and Cr channels.

JPEG - Discrete Cosine Transform
Tile the image into blocks of 8x8 pixels.
Transform from the spatial domain to the frequency domain using the DCT.
Results in real-valued coefficients in [−8192, 8128].
Figures: DCT base functions, coefficient images.

JPEG - Quantization
Quantize coefficients to integers via
$$q(v) = \mathrm{round}\left(\frac{v}{c}\right)$$
Quantization factors $c$ depend on channel and frequency.

JPEG - Entropy Coding
Encode the image block-wise.
Re-order coefficients in zig-zag order.
Encode using a combination of run-length encoding and Huffman code.
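As a rough illustration of the stages above, the following sketch applies the color matrix to one pixel, quantizes with q(v) = round(v / c), and generates the zig-zag order for an 8x8 block. It is a minimal sketch, not the reference codec: the function names are made up for this example, the run-length step is a simplification (each non-zero value is paired with the number of zeros before it, trailing zeros collapse into an "EOB" marker), and Huffman coding is left out.

```python
# Illustrative sketch of JPEG stages named on the slides (pure Python).
# All function names are assumptions made for this example.

def rgb_to_ycbcr(r, g, b):
    """Apply the RGB -> Y'CbCr matrix from the slide to one pixel (0..255)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = -0.168736 * r - 0.331264 * g + 0.5 * b + 128
    cr = 0.5 * r - 0.418688 * g - 0.081312 * b + 128
    return y, cb, cr


def quantize(v, c):
    """q(v) = round(v / c). Python rounds exact halves to the nearest even
    integer, which may differ from a particular encoder."""
    return round(v / c)


def zigzag_indices(n=8):
    """Return the (row, col) pairs of an n x n block in zig-zag order."""
    cells = [(i, j) for i in range(n) for j in range(n)]

    def key(cell):
        i, j = cell
        diagonal = i + j
        # Odd anti-diagonals are walked with increasing row,
        # even ones with increasing column.
        return (diagonal, i if diagonal % 2 else j)

    return sorted(cells, key=key)


def run_length(values):
    """Simplified run-length step: (zeros_before, value) pairs plus "EOB"."""
    pairs, run = [], 0
    for v in values:
        if v == 0:
            run += 1
        else:
            pairs.append((run, v))
            run = 0
    if run:
        pairs.append("EOB")  # remaining values are all zero
    return pairs


if __name__ == "__main__":
    print(rgb_to_ycbcr(255, 0, 0))
    print(zigzag_indices()[:10])
    print(run_length([quantize(v, 16) for v in [90.0, -35.0, 4.0, 0.0, 20.0, 3.0, 1.0]]))
```

Sorting all cells by anti-diagonal keeps the zig-zag rule visible in one place; a real encoder would typically use a precomputed 64-entry table instead.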

Regression Tree Fields

Gaussian Conditional Random Fields
Interpret each pixel as a node in an undirected graph.
Assign a random variable to each node.
Figure labels: random variable, observation, factor.
$$\pi(Y = y) = \sqrt{\frac{|Q|}{(2\pi)^n}} \exp\left(-\frac{1}{2}(y - \mu)^T Q (y - \mu)\right) \propto \prod_f \phi_f(Y_f), \qquad \sum_f \psi_f(Y_f) = E(y)$$

Regression Trees
Represents a function $f: D \to R$.
Feature test: $feat: D \to \{true, false\}$.
With thresholded response: $r: D \to \mathbb{R}$, $feat: i \mapsto r(i) \geq t$.
Leaf nodes specify results $\in R$.
Figure: example tree with the tests $x \geq 0.5$, $(x - 0.25)^2 \geq 0.01$ and $x \geq 0.75$, and leaves 1 to 4.

Regression Tree Fields
Based on a conditional random field.
Local energies are expressed in a quadratic form:
$$E(y, x) = \frac{1}{2} y^T \Theta(x)\, y - y^T \theta(x)$$
Functions $\Theta$ (matrix) and $\theta$ (vector) are represented by a regression tree.
Factors are grouped into factor types, which share the same tree for $\Theta$ and $\theta$.

RTF Training
Optimization with respect to a loss function.
Start with a single leaf (containing all factors).
Optimize parameters with respect to the loss function.
Sample a number of feature tests and choose the one with the greatest loss gradient.
Copy parameters to the new children.
Optimize all leaf nodes.
Sample feature tests.
...
Figure: growing tree with leaf parameters labeled Θ0, θ0, Θ10, θ10, Θ20, θ20.

Improving JPEG with RTFs

General Idea
Figure (processing stages): original image, transformed images, RTF model, quantization, entropy coding.

Predictive Dependencies
From which sources should a coefficient image be predicted?
Calculate all 2-permutations and evaluate with PSNR:
$$PSNR_{dB} = 10 \log_{10} \frac{max^2}{MSE}$$

Predictive Dependencies
Figure: PSNR for each pair of source image and target image (indices 0 to 63), color scale from 15 dB to 50 dB, for 1, 3 and 5 factor types.

Quantization and Entropy Coding
Quantization with constant factor:
$$quantize_f(d) = \mathrm{round}\left(\frac{d}{f}\right), \qquad f(bitdepth) = 2^{14 - bitdepth}$$
Separate RLE and Huffman encoding.
Optimized stream order.
Quality evaluated with PSNR.

Quantization and Entropy Coding
Figure: PSNR [dB] over rate [byte / px] for bit depths 2 to 16 (non-diff), compared with JPEG.

Loss - Distance Functions
$$MSE(d) = d^2$$
$$MAD(d) = \sqrt{d^2 + \epsilon}$$
LogDist(d) = log 2 d + 1 2 + ε
$$Lorentzian(d) = \log\left(1 + \frac{1}{2} d^2\right)$$

Loss - Entropy
Information entropy is a lower bound for encoding data:
$$H = -\sum_{s \in S} f_s \log_2 f_s$$
$$f_b(P) = \frac{1}{n} \cdot \left|\{px \in P \mid quantize(diff_{px}) = b\}\right|$$
Transfer to continuous frequency calculation:
$$f_b(P) = \frac{1}{n} \sum_{p \in P} \int_{b} w\left(x - \frac{g_p - pr_p}{q}\right) dx$$
Figure labels: weight, sample, bin.

Loss - Entropy
Loss function should be twice continuously differentiable.
Use a C1-continuous window function (Hann window):
$$w(x) = \begin{cases} \frac{1}{e}\left(1 + \cos\frac{2\pi x}{e}\right) & -\frac{e}{2} \leq x \leq \frac{e}{2} \\ 0 & \text{otherwise} \end{cases}$$

Loss - Comparison
Figure: PSNR [dB] over rate [byte / px] for the loss functions MSE, MAD, LogDist, Lorentzian, Entropy, Entropy Linked and Entropy Linked (1 RTF per channel), compared with JPEG.

Prediction Strategies
Figure: PSNR [dB] over rate [byte / px] in two panels ("Prediction Strategies" and "... with Additional Data") for the strategies C0 To All (Separate RTFs), Row-wise, 11 RTFs, C0 To All (1 RTF per channel), C0 To All (1 RTF Overall) and Predict Plain Chroma, compared with JPEG.

Scalability
Sample image has 748 pixels.
At least depth-8 trees are necessary to produce usable predictions (≘ 384 leaves max).
Experiments with more pixels have shown:
A single model cannot be applied to several images.
If the leaf : pixel ratio drops below 1:2, predictions become unusable.
The RTF model does not infer common image characteristics but outsources data into the trees.

Conclusions
Good predictions can improve compression performance significantly.
RTF models are not suitable for predictions in the frequency domain.
Future work:
Usage of different image representations and machine learning models.
Application to other media (e.g. point clouds).
Optimization of entropy loss.

DCT
1D:
$$f(x) = \frac{1}{N} c_0 + \frac{2}{N} \sum_{k=1}^{N-1} c_k \cos\left(\frac{\pi}{N} k \left(x + \frac{1}{2}\right)\right)$$
$$c_k = \sum_{x=0}^{N-1} f(x) \cos\left(\frac{\pi}{N} k \left(x + \frac{1}{2}\right)\right)$$
2D:
$$f(x, y) = \frac{1}{4} \sum_{u=0}^{7} \sum_{v=0}^{7} C_u C_v\, c_{u,v} \cos\frac{(2x + 1) u \pi}{16} \cos\frac{(2y + 1) v \pi}{16}, \qquad C_u = \begin{cases} \frac{1}{\sqrt{2}} & u = 0 \\ 1 & \text{otherwise} \end{cases}$$

Predictive Dependencies
Figure: results for the Y and Cb channels with 1, 3 and 5 factor types.

Maximum Tree Depth
Figure: PSNR [dB] over rate [byte / px] in two panels ("Maximum Tree Depth" and "... with Additional Data") for maximum tree depths 1 to 10, compared with JPEG.
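
The 1D transform pair from the DCT slide can be checked numerically. The sketch below is a direct, unoptimized transcription of the two sums; the function names are assumptions made for this example, and the sample values are arbitrary. Its main use is to confirm that the inverse reproduces the input.

```python
import math


def dct_1d(f):
    """Forward transform: c_k = sum_x f(x) * cos(pi/N * k * (x + 1/2))."""
    n = len(f)
    return [
        sum(f[x] * math.cos(math.pi / n * k * (x + 0.5)) for x in range(n))
        for k in range(n)
    ]


def idct_1d(c):
    """Inverse: f(x) = c_0/N + (2/N) * sum_{k>=1} c_k * cos(pi/N * k * (x + 1/2))."""
    n = len(c)
    return [
        c[0] / n
        + 2.0 / n * sum(c[k] * math.cos(math.pi / n * k * (x + 0.5)) for k in range(1, n))
        for x in range(n)
    ]


if __name__ == "__main__":
    samples = [52.0, 55.0, 61.0, 66.0, 70.0, 61.0, 64.0, 73.0]  # arbitrary values
    coefficients = dct_1d(samples)
    restored = idct_1d(coefficients)
    # Largest round-trip error; expected to be at floating-point noise level.
    print(max(abs(a - b) for a, b in zip(samples, restored)))
```

Each transform costs O(N^2) operations here, which is fine for N = 8 but not how production codecs compute it.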