
Computer Graphics and Visualization
Master's Thesis Defense
Improving JPEG Compression with Regression Tree Fields
Nico Schertler
22.09.2014
Motivation
[Figure: scale from kB (10³ B) over MB (10⁶ B), GB (10⁹ B) and TB (10¹² B) to PB (10¹⁵ B)]
7 PB of photo storage every month (2013)
42 PB of transferred data in 2012
20 PB for satellite, aerial and street-level images (2012)
Why Reduce Image Storage Size?
Reduce Costs
for data centers, maintenance, energy ...
Reduce Environmental Impact
due to cooling, energy consumption …
Reduce Transfer Time
for uploads and downloads, back-ups …
Agenda
Theoretical Fundamentals
JPEG
RTFs
General Idea
Preliminary Considerations
Optimization of various degrees of freedom
Coding
RTF Loss Function
Prediction Strategy
Conclusions
JPEG
JPEG – Overview
Codec for still images, standardized in 1992.
Includes both lossless and lossy compression.
Best suited for smooth, continuous-tone images such as photographs.
The later JPEG 2000 standard is based on the discrete wavelet transform instead.
[Figure: JPEG pipeline: RGB Image → Color Space Transformation → Y′CbCr Image → Downsampling → Discrete Cosine Transform → DCT Coefficients → Quantization → Entropy Coding → JPEG Stream]
JPEG – Color Space Transformation
RGB to Y′CbCr
Closer match to human perception: luma (Y′) is separated from chroma (Cb, Cr), and the eye resolves luma detail better than color detail.
$$\begin{pmatrix} Y' \\ C_b \\ C_r \end{pmatrix} = \begin{pmatrix} 0.299 & 0.587 & 0.114 & 0 \\ -0.168736 & -0.331264 & 0.5 & 128 \\ 0.5 & -0.418688 & -0.081312 & 128 \end{pmatrix} \begin{pmatrix} R \\ G \\ B \\ 1 \end{pmatrix}$$
[Figure: RGB input and the resulting Y, Cb and Cr channels]
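A minimal Python sketch of this conversion, assuming 8-bit RGB input; the matrix values are the ones on this slide, with the last column acting as the offset for the homogeneous 1. Results are left unclamped on purpose: pure red yields Cr = 255.5, so a real encoder has to clamp to [0, 255].

```python
# Sketch of JPEG's RGB -> Y'CbCr conversion; the matrix is the one on the slide.
# Each row maps (R, G, B, 1) to one output channel; the last column is the offset.
RGB_TO_YCBCR = [
    [0.299, 0.587, 0.114, 0.0],           # Y'
    [-0.168736, -0.331264, 0.5, 128.0],   # Cb
    [0.5, -0.418688, -0.081312, 128.0],   # Cr
]


def rgb_to_ycbcr(r, g, b):
    """Convert one 8-bit RGB pixel to an unclamped (Y', Cb, Cr) tuple of floats."""
    vec = (r, g, b, 1.0)
    return tuple(sum(m * v for m, v in zip(row, vec)) for row in RGB_TO_YCBCR)


print(rgb_to_ycbcr(255, 255, 255))  # white: approximately (255, 128, 128)
_, _, cr = rgb_to_ycbcr(255, 0, 0)
print(cr)  # 255.5 for pure red: outside [0, 255], so an encoder must clamp
```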
JPEG – Discrete Cosine Transform
Tile the image into blocks of 8×8 pixels.
Transform each block from the spatial domain to the frequency domain using the DCT.
This results in real-valued coefficients in [−8192, 8128].
[Figure: DCT base functions and coefficient images]
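A direct (slow, O(N⁴) per block) Python sketch of the 8×8 DCT pair, using the normalization of the 2D formula on the DCT backup slide at the end of the deck; the level shift by 128 follows the JPEG standard, and the pixel values are arbitrary. With this scaling, the DC coefficient of a level-shifted 8-bit block lies in [−1024, 1016]. The range stated on this slide is exactly eight times that, which suggests (an inference, not something the slides state) that the implementation used keeps an extra factor of 8 in its output. Real codecs use fast factorized DCTs instead.

```python
import math


def _c(u):
    # Normalization factor C_u: 1/sqrt(2) for u = 0, otherwise 1.
    return 1 / math.sqrt(2) if u == 0 else 1.0


def dct_8x8(block):
    """Forward 2D DCT of an 8x8 block (list of 8 rows of 8 numbers)."""
    out = [[0.0] * 8 for _ in range(8)]
    for u in range(8):
        for v in range(8):
            s = 0.0
            for x in range(8):
                for y in range(8):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / 16)
                          * math.cos((2 * y + 1) * v * math.pi / 16))
            out[u][v] = 0.25 * _c(u) * _c(v) * s
    return out


def idct_8x8(coeffs):
    """Inverse 2D DCT: f(x, y) = 1/4 sum_u sum_v C_u C_v c_uv cos(...) cos(...)."""
    out = [[0.0] * 8 for _ in range(8)]
    for x in range(8):
        for y in range(8):
            s = 0.0
            for u in range(8):
                for v in range(8):
                    s += (_c(u) * _c(v) * coeffs[u][v]
                          * math.cos((2 * x + 1) * u * math.pi / 16)
                          * math.cos((2 * y + 1) * v * math.pi / 16))
            out[x][y] = 0.25 * s
    return out


# Level-shift 8-bit samples to [-128, 127] before the transform, as JPEG does.
pixels = [[(x * 8 + y) * 4 % 256 for y in range(8)] for x in range(8)]
shifted = [[p - 128 for p in row] for row in pixels]
coeffs = dct_8x8(shifted)
restored = idct_8x8(coeffs)
print(round(coeffs[0][0], 3))  # DC coefficient of the example block
```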
JPEG – Quantization
Quantize coefficients to integers via
$$q(v) = \operatorname{round}\left(\frac{v}{c}\right)$$
Quantization factors $c$ depend on channel and frequency.
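A small sketch of q(v) = round(v/c) and the decoder's inverse. The coefficients and factors are hypothetical example values, and rounding halves away from zero is an assumption (Python's built-in round() rounds halves to even). Each reconstructed coefficient is off by at most c/2, which is why larger factors for high frequencies discard more detail.

```python
import math


def quantize(v, c):
    """q(v) = round(v / c), rounding halves away from zero."""
    t = v / c
    return int(math.copysign(math.floor(abs(t) + 0.5), t))


def dequantize(q, c):
    """Decoder-side reconstruction: multiply back by the factor."""
    return q * c


# Hypothetical coefficients and factors; real JPEG uses an 8x8 table per channel.
coeffs = [-415.4, 30.2, -61.0, 12.5, 0.4]
factors = [16, 11, 10, 16, 24]
quantized = [quantize(v, c) for v, c in zip(coeffs, factors)]
restored = [dequantize(q, c) for q, c in zip(quantized, factors)]
print(quantized)  # [-26, 3, -6, 1, 0]
print(restored)   # [-416, 33, -60, 16, 0]
```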
JPEG – Entropy Coding
Encode the image block by block.
Reorder the coefficients of each block in zig-zag order.
Encode using a combination of run-length encoding and Huffman coding.
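A sketch of the zig-zag scan followed by a simplified run-length step that turns AC coefficients into (zero run, value) pairs with an end-of-block marker. The actual JPEG format additionally splits runs longer than 15, codes values by magnitude category and Huffman-codes the resulting symbols; all of that is omitted here, and the block contents are made up.

```python
def zigzag_order(n=8):
    """Return (row, col) pairs of an n x n block in JPEG zig-zag order."""
    order = []
    for s in range(2 * n - 1):  # s = row + col indexes the anti-diagonal
        lo, hi = max(0, s - n + 1), min(s, n - 1)
        # Even diagonals run bottom-left to top-right, odd ones the other way.
        rows = range(hi, lo - 1, -1) if s % 2 == 0 else range(lo, hi + 1)
        order.extend((r, s - r) for r in rows)
    return order


def run_length(values):
    """Simplified RLE: (zero_run, value) pairs, then an 'EOB' marker
    (appended unconditionally here) standing for the trailing zeros."""
    pairs, run = [], 0
    for v in values:
        if v == 0:
            run += 1
        else:
            pairs.append((run, v))
            run = 0
    pairs.append("EOB")
    return pairs


block = [[0] * 8 for _ in range(8)]
block[0][0], block[0][1], block[1][0], block[2][2] = -26, -3, 1, 2
scan = [block[r][c] for r, c in zigzag_order()]
dc, ac = scan[0], scan[1:]  # JPEG codes the DC coefficient separately
print(dc, run_length(ac))   # -26 [(0, -3), (0, 1), (9, 2), 'EOB']
```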
Regression Tree Fields
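Gaussian Conditional Random Fields
Interpret each pixel as a node in an undirected graph.
Assign a random variable to each node.
[Figure: graph of random variables, observations and factors]
$$\pi(Y = y) = \sqrt{\frac{|Q|}{(2\pi)^n}}\, \exp\left(-\frac{1}{2}(y - \mu)^T Q\, (y - \mu)\right) = \prod_f \phi_f(Y_f) \propto \exp\left(-\sum_f \psi_f(Y_f)\right), \qquad \sum_f \psi_f(Y_f) = E(y)$$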
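A minimal two-node sketch, with assumed factor weights, of why a sum of quadratic factor energies defines exactly such a Gaussian: expanding E(y) gives ½ yᵀQy − yᵀB, so Q is the precision matrix and μ = Q⁻¹B. The check at the end confirms that density ratios depend only on energy differences. The names A, W and B are illustrative, not from the thesis.

```python
import math

# Hypothetical 2-node graph: a unary factor on each node plus one pairwise factor.
A, W = 2.0, 1.0      # assumed unary and pairwise weights
B = (1.0, -0.5)      # assumed linear (observation-dependent) terms


def energy(y):
    """E(y) = sum_f psi_f(y_f): two unary energies plus one pairwise energy."""
    unary = sum(0.5 * A * y[i] ** 2 - B[i] * y[i] for i in range(2))
    pairwise = 0.5 * W * (y[0] - y[1]) ** 2
    return unary + pairwise


# Expanding E gives 1/2 y^T Q y - y^T B, so Q is the precision and mu = Q^-1 B.
Q = [[A + W, -W], [-W, A + W]]
det_q = Q[0][0] * Q[1][1] - Q[0][1] * Q[1][0]
mu = ((Q[1][1] * B[0] - Q[0][1] * B[1]) / det_q,
      (Q[0][0] * B[1] - Q[1][0] * B[0]) / det_q)


def density(y):
    """pi(Y=y) = sqrt(|Q| / (2 pi)^n) exp(-1/2 (y - mu)^T Q (y - mu)), n = 2."""
    d = (y[0] - mu[0], y[1] - mu[1])
    quad = sum(d[i] * Q[i][j] * d[j] for i in range(2) for j in range(2))
    return math.sqrt(det_q / (2 * math.pi) ** 2) * math.exp(-0.5 * quad)


# Density ratios depend only on energy differences: pi(a) / pi(b) = exp(E(b) - E(a)).
a, b = (0.3, -1.2), (1.5, 0.4)
print(density(a) / density(b), math.exp(energy(b) - energy(a)))
```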
Regression Trees
Represents a function $f: D \to R$
Feature test: $feat: D \to \{true, false\}$
[Figure: example tree with root test $x \ge 0.5$, inner tests $(x - 0.25)^2 \ge 0.01$ and $x \ge 0.75$, and leaves 1 to 4]
With thresholded response: $r: D \to \mathbb{R}$, $feat: i \mapsto r(i) \ge t$
Leaf nodes specify results $\in R$
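The example tree from this slide as a Python sketch. The figure did not survive extraction, so which branch counts as "true" and where leaves 1 to 4 hang are assumptions; only the three feature tests and the thresholded-response form feat(i) = r(i) ≥ t come from the slide.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Leaf:
    result: float  # leaf nodes specify the result


@dataclass
class Node:
    response: Callable[[float], float]  # r: D -> R
    threshold: float                    # t in feat(i) = r(i) >= t
    if_false: "Leaf | Node"
    if_true: "Leaf | Node"


def evaluate(tree, x):
    """Follow feature tests from the root until a leaf is reached."""
    while isinstance(tree, Node):
        tree = tree.if_true if tree.response(x) >= tree.threshold else tree.if_false
    return tree.result


# Assumed layout: "false" branches are listed first, leaves numbered left to right.
example = Node(lambda x: x, 0.5,
               if_false=Node(lambda x: (x - 0.25) ** 2, 0.01, Leaf(1), Leaf(2)),
               if_true=Node(lambda x: x, 0.75, Leaf(3), Leaf(4)))

print([evaluate(example, x) for x in (0.05, 0.25, 0.6, 0.9)])  # [2, 1, 3, 4]
```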
Regression Tree Fields
Based on a conditional random field.
Local energies are expressed in a quadratic form:
$$E(y, x) = \frac{1}{2} y^T \Theta(x)\, y - y^T \theta(x)$$
Functions $\Theta$ (matrix) and $\theta$ (vector) are represented by a regression tree.
Factors are grouped into factor types, which share the same tree for $\Theta$ and $\theta$.
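A minimal sketch of RTF-style inference on a 1D signal. Each factor type's regression tree is replaced by a hypothetical stand-in function that maps observations to a local (Θ, θ); the local terms are summed into one global quadratic energy, whose minimizer (the mean of the Gaussian) solves Θy = θ. The unary/pairwise factor types, the weights and the single feature test are assumptions for illustration. Here the global Θ is positive definite because every unary weight is positive.

```python
def unary_params(obs):
    """Hypothetical stand-in for the unary factor type's regression tree:
    one feature test (|obs| < 10) selects a leaf holding (Theta, theta)."""
    a = 4.0 if abs(obs) < 10 else 1.0
    return [[a]], [a * obs]


def pairwise_params(obs_i, obs_j):
    """Hypothetical pairwise factor type: a single leaf with smoothness weight w."""
    w = 1.0
    return [[w, -w], [-w, w]], [0.0, 0.0]


def assemble(obs):
    """Sum the local quadratic energies into one global Theta and theta."""
    n = len(obs)
    big = [[0.0] * n for _ in range(n)]
    small = [0.0] * n
    factors = [((i,), unary_params(obs[i])) for i in range(n)]
    factors += [((i, i + 1), pairwise_params(obs[i], obs[i + 1])) for i in range(n - 1)]
    for nodes, (local_big, local_small) in factors:
        for a, ia in enumerate(nodes):
            small[ia] += local_small[a]
            for b, ib in enumerate(nodes):
                big[ia][ib] += local_big[a][b]
    return big, small


def solve(m, v):
    """Gaussian elimination with partial pivoting for m y = v."""
    n = len(v)
    aug = [row[:] + [v[i]] for i, row in enumerate(m)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    y = [0.0] * n
    for r in range(n - 1, -1, -1):
        y[r] = (aug[r][n] - sum(aug[r][c] * y[c] for c in range(r + 1, n))) / aug[r][r]
    return y


obs = [1.0, 2.0, 30.0, 3.0]   # hypothetical observations with one outlier
Theta, theta = assemble(obs)
y_map = solve(Theta, theta)   # argmin of 1/2 y^T Theta y - y^T theta
print([round(v, 3) for v in y_map])  # the outlier's weak leaf lets it be smoothed
```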
RTF Training
Optimization with respect to a loss function:
Start with a single leaf containing all factors, with parameters $\Theta_0, \theta_0$.
Optimize its parameters with respect to the loss function.
Sample a number of feature tests and choose the one with the greatest loss gradient.
Copy the parameters to the new children ($\Theta_{10}, \theta_{10}$ and $\Theta_{20}, \theta_{20}$).
Optimize all leaf nodes.
Sample feature tests.
…
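A toy analogue of this training loop, not the thesis implementation: leaves hold a single scalar instead of (Θ, θ), the loss is the squared error, and feature tests are random thresholds on a 1D input. A split is scored by the squared norm of the loss gradient with respect to the two children's parameters right after they inherit the parent's value; afterwards all leaves are re-optimized, which for the squared loss is simply the mean. The function names are hypothetical.

```python
import random


def leaf_gradient(theta, targets):
    """dL/dtheta of L = sum (theta - y)^2 over one leaf's samples."""
    return sum(2 * (theta - y) for y in targets)


def best_split(xs, ys, theta, rng, n_tests=20):
    """Sample random thresholds; keep the one whose two children, both starting
    from the parent's theta, have the largest squared loss-gradient norm."""
    best = None
    for _ in range(n_tests):
        t = rng.uniform(min(xs), max(xs))
        left = [y for x, y in zip(xs, ys) if x < t]
        right = [y for x, y in zip(xs, ys) if x >= t]
        if not left or not right:
            continue
        score = leaf_gradient(theta, left) ** 2 + leaf_gradient(theta, right) ** 2
        if best is None or score > best[0]:
            best = (score, t)
    return None if best is None else best[1]


def train(xs, ys, depth, seed=0):
    """Grow the tree level by level; returns leaves as (sample indices, theta)."""
    rng = random.Random(seed)
    # Start with a single leaf holding all samples, already at its optimum.
    leaves = [(list(range(len(xs))), sum(ys) / len(ys))]
    for _ in range(depth):
        grown = []
        for idx, theta in leaves:
            t = best_split([xs[i] for i in idx], [ys[i] for i in idx], theta, rng)
            if t is None:
                grown.append(idx)
                continue
            grown += [[i for i in idx if xs[i] < t], [i for i in idx if xs[i] >= t]]
        # Optimize all leaves; for the squared loss the optimum is the mean.
        leaves = [(idx, sum(ys[i] for i in idx) / len(idx)) for idx in grown]
    return leaves


xs = [i / 20 for i in range(20)]
ys = [0.0 if x < 0.5 else 1.0 for x in xs]


def loss(leaves):
    return sum((theta - ys[i]) ** 2 for idx, theta in leaves for i in idx)


for d in range(3):
    print(d, round(loss(train(xs, ys, d)), 4))
```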
Improving JPEG with RTFs
General Idea
[Figure: Original Image → Transformed Images → RTF Model → Quantization → Entropy Coding]
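A sketch of the residual scheme this pipeline suggests, under two assumptions: that only the quantized difference between the true coefficients and the RTF prediction is stored (as quantize(diff) on the entropy slide hints), and that the decoder can reproduce the same prediction. The values and the factor f are hypothetical. Accurate predictions make most residuals zero, which is what lowers the entropy of the stream.

```python
def encode_residuals(truth, prediction, f):
    """Store only the quantized prediction error round((g - pr) / f)."""
    return [round((g - p) / f) for g, p in zip(truth, prediction)]


def decode(prediction, residuals, f):
    """Decoder: the same prediction plus the dequantized residual."""
    return [p + r * f for p, r in zip(prediction, residuals)]


truth = [100, 103, 98, 250]        # hypothetical coefficient values
prediction = [101, 101, 99, 180]   # hypothetical RTF output; the last one is poor
f = 4
res = encode_residuals(truth, prediction, f)
print(res)                         # [0, 0, 0, 18]
print(decode(prediction, res, f))  # [101, 101, 99, 252]
# Python's round() rounds halves to even; the error bound f / 2 holds either way.
```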
Predictive Dependencies
From which sources should a coefficient image be predicted?
Calculate all 2-permutations (ordered source/target pairs of coefficient images) and evaluate each with the PSNR:
$$PSNR\,[\mathrm{dB}] = 10 \log_{10} \frac{max^2}{MSE}$$
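A sketch of the PSNR metric and of the number of candidate pairs. It assumes 64 coefficient images per channel, as the 0 to 63 axes of the dependency plot suggest; the psnr helper and its max_value parameter are illustrative names.

```python
import math
from itertools import permutations


def psnr(reference, estimate, max_value):
    """PSNR [dB] = 10 log10(max^2 / MSE); infinite for a perfect match."""
    mse = sum((a - b) ** 2 for a, b in zip(reference, estimate)) / len(reference)
    return math.inf if mse == 0 else 10 * math.log10(max_value ** 2 / mse)


# Every ordered (source, target) pair of the 64 coefficient images of one channel.
pairs = list(permutations(range(64), 2))
print(len(pairs))                     # 4032
print(psnr([0, 0], [1, 1], 255))      # MSE = 1 with max = 255: about 48.13 dB
```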
Predictive Dependencies
[Figure: PSNR (15 dB to 50 dB) for predicting each target coefficient image (0 to 63) from each source coefficient image (0 to 63), with 1, 3 and 5 factor types]
Quantization and Entropy Coding
Quantization with a constant factor:
$$quantize_f(d) = \operatorname{round}\left(\frac{d}{f}\right), \qquad f(bitdepth) = 2^{14 - bitdepth}$$
Separate RLE and Huffman encoding.
Optimized stream order.
Quality evaluated with PSNR.
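Presumably the constant 14 comes from the coefficient range: [−8192, 8128] spans just under 2¹⁴ values, so dividing by 2^(14−bitdepth) leaves roughly 2^bitdepth quantization levels. A short check with hypothetical helper names; note that rounding adds one extra level at small bit depths, because the top of the range rounds up.

```python
def quant_factor(bitdepth):
    """f(bitdepth) = 2^(14 - bitdepth)."""
    return 2 ** (14 - bitdepth)


def quantize(d, f):
    """quantize_f(d) = round(d / f) (Python rounds halves to even)."""
    return round(d / f)


def levels(bitdepth, lo=-8192, hi=8128):
    """Number of distinct quantized values for coefficients in [lo, hi]."""
    f = quant_factor(bitdepth)
    return quantize(hi, f) - quantize(lo, f) + 1


for b in (2, 8, 14):
    print(b, quant_factor(b), levels(b))
```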
Quantization and Entropy Coding
[Figure: PSNR [dB] vs. rate [byte/px] for non-differential coding with 2 to 16 bit, compared with JPEG]
Loss – Distance Functions
$$MSE(d) = d^2$$
$$MAD(d) = \sqrt{d^2 + \epsilon}$$
$$LogDist(d) = \log_2\left(\sqrt{d^2 + \epsilon} + 1\right)$$
$$Lorentzian(d) = \log\left(1 + \frac{1}{2} d^2\right)$$
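Python versions of the four distance functions on this slide, reading MAD as √(d² + ε) and LogDist as log₂(√(d² + ε) + 1). The slide gives no value for ε, so EPS = 1e-6 is an assumption, as is the natural logarithm in the Lorentzian. The printed values show the design trade-off: MSE grows quadratically, MAD linearly, LogDist and Lorentzian only logarithmically, so the latter two penalize outliers least.

```python
import math

EPS = 1e-6  # assumed value of epsilon; the slides do not specify it


def mse(d):
    return d * d


def mad(d):
    # Smooth stand-in for |d|: epsilon keeps it differentiable at d = 0.
    return math.sqrt(d * d + EPS)


def log_dist(d):
    return math.log2(math.sqrt(d * d + EPS) + 1)


def lorentzian(d):
    return math.log(1 + 0.5 * d * d)


for d in (0.0, 1.0, 10.0, 100.0):
    print(d, mse(d), round(mad(d), 4), round(log_dist(d), 4), round(lorentzian(d), 4))
```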
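Loss – Entropy
Information entropy is a lower bound for encoding data:
$$H = -\sum_{s \in S} f_s \log_2 f_s$$
$$f_b(P) = \frac{1}{n} \cdot \left|\left\{p_x \in P \mid quantize(diff_{p_x}) = b\right\}\right|$$
Transfer to a continuous frequency calculation:
$$f_b(P) = \frac{1}{n} \sum_{p \in P} \int_b w\left(x - \frac{g_p - pr_p}{q}\right) dx$$
[Figure: each sample spreads its weight over neighboring bins]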
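A sketch illustrating the lower bound: compute the hard frequencies of some hypothetical quantized residuals, their entropy H, and a Huffman code for them. The average Huffman code length always lies in [H, H + 1). JPEG additionally limits Huffman code lengths to 16 bits, which is omitted here.

```python
import heapq
import math
from collections import Counter


def frequencies(symbols):
    """f_s: relative frequency of each quantized symbol."""
    counts = Counter(symbols)
    return {s: c / len(symbols) for s, c in counts.items()}


def entropy(freqs):
    """H = -sum_s f_s log2 f_s, in bits per symbol."""
    return -sum(f * math.log2(f) for f in freqs.values() if f > 0)


def huffman_code_lengths(freqs):
    """Code length per symbol from the standard Huffman merge procedure."""
    if len(freqs) == 1:
        return {s: 1 for s in freqs}
    # The unique counter i breaks ties so dicts are never compared.
    heap = [(f, i, {s: 0}) for i, (s, f) in enumerate(freqs.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        f1, _, a = heapq.heappop(heap)
        f2, _, b = heapq.heappop(heap)
        merged = {s: depth + 1 for s, depth in {**a, **b}.items()}
        heapq.heappush(heap, (f1 + f2, counter, merged))
        counter += 1
    return heap[0][2]


# Hypothetical quantized residuals: a good predictor makes most of them zero.
residuals = [0] * 12 + [1] * 2 + [-1] * 2 + [3]
freqs = frequencies(residuals)
lengths = huffman_code_lengths(freqs)
avg_len = sum(freqs[s] * lengths[s] for s in freqs)
print(round(entropy(freqs), 4), round(avg_len, 4))
```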
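Loss – Entropy
The loss function should be twice continuously differentiable.
Use a C¹-continuous window function (Hann window):
$$w(x) = \begin{cases} \frac{1}{e}\left(1 + \cos\frac{2\pi x}{e}\right) & -\frac{e}{2} \le x \le \frac{e}{2} \\ 0 & \text{otherwise} \end{cases}$$
Because $f_b$ integrates $w$ over a bin, a C¹ window makes $f_b$ twice continuously differentiable with respect to the prediction.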
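A sketch of the continuous frequency calculation with this Hann window. hann_cdf is the closed-form integral of w, so each bin integral over [b − ½, b + ½] is a difference of two CDF values. Reading g_p as the ground truth, pr_p as the prediction and q as the quantization factor is an assumption, as is treating the window width e as a free parameter. With e = 1 and integer residuals, each sample lands entirely in its own bin, reproducing the hard frequencies; wider windows spread it over neighboring bins.

```python
import math


def hann(x, e):
    """w(x) = (1/e)(1 + cos(2 pi x / e)) on [-e/2, e/2], else 0; integrates to 1."""
    return (1 + math.cos(2 * math.pi * x / e)) / e if abs(x) <= e / 2 else 0.0


def hann_cdf(x, e):
    """Closed-form integral of w from -infinity to x."""
    if x <= -e / 2:
        return 0.0
    if x >= e / 2:
        return 1.0
    return 0.5 + x / e + math.sin(2 * math.pi * x / e) / (2 * math.pi)


def soft_frequencies(truth, pred, q, e, bins):
    """f_b = 1/n sum_p integral over [b - 1/2, b + 1/2] of w(x - (g_p - pr_p) / q)."""
    n = len(truth)
    freqs = {}
    for b in bins:
        total = 0.0
        for g, pr in zip(truth, pred):
            t = (g - pr) / q  # scaled residual of sample p
            total += hann_cdf(b + 0.5 - t, e) - hann_cdf(b - 0.5 - t, e)
        freqs[b] = total / n
    return freqs


def soft_entropy(freqs):
    return -sum(f * math.log2(f) for f in freqs.values() if f > 0)


g = soft_frequencies([0.3, 2.7, -1.2], [0, 0, 0], 1, 2.0, range(-6, 7))
print(round(sum(g.values()), 9), round(soft_entropy(g), 4))
```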
Loss – Comparison
[Figure: PSNR [dB] vs. rate [byte/px] for the losses MSE, MAD, LogDist, Lorentzian, Entropy, Entropy Linked and Entropy Linked (1 RTF per channel), compared with JPEG]
Prediction Strategies
[Figure: PSNR [dB] vs. rate [byte/px] for the prediction strategies Row-wise, 11 RTFs, C0 To All (Separate RTFs), C0 To All (1 RTF per channel), C0 To All (1 RTF Overall) and Predict Plain Chroma, compared with JPEG; second panel: ... with Additional Data]
Scalability
The sample image has 748 pixels.
Trees of at least depth 8 are necessary to produce usable predictions (corresponding to at most 384 leaves).
Experiments with more pixels have shown:
A single model cannot be applied to several images.
If the leaf-to-pixel ratio drops below 1:2, predictions become unusable.
Instead of inferring common image characteristics, the RTF model offloads the image data into its trees.
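A back-of-the-envelope check of these numbers, assuming the 1:2 threshold carries over unchanged to larger images (an extrapolation, not a measured result): the sample's 384 leaves for 748 pixels sit just above the threshold, and a Full HD image would need over a million leaves. The min_leaves helper is a hypothetical name.

```python
import math


def min_leaves(pixels, ratio=0.5):
    """Fewest leaves that keep the leaf-to-pixel ratio at or above `ratio`."""
    return math.ceil(pixels * ratio)


print(f"{384 / 748:.3f}")       # 0.513: the sample image is just above 1:2
print(min_leaves(748))          # 374
print(min_leaves(1920 * 1080))  # 1036800 for a Full HD image (extrapolated)
```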
Conclusions
Good predictions can improve compression performance significantly.
RTF models are not suitable for predictions in the frequency domain.
Future work:
Usage of different image representations and machine learning models.
Application to other media (e.g. point clouds)
Optimization of entropy loss
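DCT
1D:
$$f(x) = \frac{1}{N} c_0 + \frac{2}{N} \sum_{k=1}^{N-1} c_k \cos\left[\frac{\pi}{N} k \left(x + \frac{1}{2}\right)\right], \qquad c_k = \sum_{x=0}^{N-1} f(x) \cos\left[\frac{\pi}{N} k \left(x + \frac{1}{2}\right)\right]$$
2D:
$$f(x, y) = \frac{1}{4} \sum_{u=0}^{7} \sum_{v=0}^{7} C_u C_v c_{u,v} \cos\frac{(2x+1)u\pi}{16} \cos\frac{(2y+1)v\pi}{16}, \qquad C_u = \begin{cases} \frac{1}{\sqrt{2}} & u = 0 \\ 1 & \text{otherwise} \end{cases}$$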
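A direct Python transcription of the 1D pair on this slide, checking that the two formulas invert each other (they are the unnormalized DCT-II and its inverse). The input signal is an arbitrary example; note that c₀ is simply the sum of the samples.

```python
import math


def dct_1d(f):
    """c_k = sum_x f(x) cos(pi/N * k * (x + 1/2)) for k = 0..N-1."""
    n = len(f)
    return [sum(f[x] * math.cos(math.pi / n * k * (x + 0.5)) for x in range(n))
            for k in range(n)]


def idct_1d(c):
    """f(x) = c_0 / N + (2/N) sum_{k>=1} c_k cos(pi/N * k * (x + 1/2))."""
    n = len(c)
    return [c[0] / n + 2 / n * sum(c[k] * math.cos(math.pi / n * k * (x + 0.5))
                                   for k in range(1, n))
            for x in range(n)]


signal = [52, 55, 61, 66, 70, 61, 64, 73]
coeffs = dct_1d(signal)
restored = idct_1d(coeffs)
print(coeffs[0])  # 502.0, the sum of the samples
```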
Predictive Dependencies
[Figure: predictive dependencies for the Y and Cb channels with 1, 3 and 5 factor types]
Maximum Tree Depth
[Figure: PSNR [dB] vs. rate [byte/px] for maximum tree depths 1 to 10, compared with JPEG; second panel: ... with Additional Data]