Pyramid Vector Quantization

Mozilla

What is Pyramid Vector Quantization?
● A Vector Quantizer
● That has a simple algebraic structure
● To perform gain-shape quantization

Motivation

Why Vector Quantization?
● 3 classic advantages (Lookabaugh et al. 1989):
  – Space-filling advantage: VQ codepoints tile space more efficiently
    ● Example: 2-D, squares vs. hexagons
    ● Maximum possible gain for large dimension: 1.53 dB
  – Shape advantage: VQ can use more points where the PDF is higher
    ● 1.14 dB gain for a 2-D Gaussian, 2.81 dB for high dimension
    ● Can be mitigated with entropy coding
  – Memory advantage: exploit statistical dependence between vector components
    ● Transform coefficients are not strongly correlated
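
The space-filling numbers above come from the normalized second moment G of each cell shape, which is the mean squared error per dimension for a cell of unit volume. The sketch below assumes the standard textbook values: G = 1/12 for the square (scalar) cell, 5/(36√3) for the hexagon, and 1/(2πe) as the limit approached in high dimension. From these it computes the gains in dB. The names are illustrative only.

```python
import math

# Normalized second moment G of a quantizer cell: MSE per dimension for a
# cell of unit volume. Lower G means the cell shape fills space better.
G_SQUARE = 1.0 / 12.0                              # cube (scalar quantizer)
G_HEXAGON = 5.0 / (36.0 * math.sqrt(3.0))          # hexagonal cell, 2-D
G_SPHERE_LIMIT = 1.0 / (2.0 * math.pi * math.e)    # limit as dimension -> infinity


def gain_db(g_reference, g_cell):
    """Space-filling gain, in dB, of a cell relative to a reference cell."""
    return 10.0 * math.log10(g_reference / g_cell)


hex_gain = gain_db(G_SQUARE, G_HEXAGON)
limit_gain = gain_db(G_SQUARE, G_SPHERE_LIMIT)
print(f"hexagons vs. squares (2-D): {hex_gain:.3f} dB")
print(f"large-dimension limit:      {limit_gain:.3f} dB")
```

The hexagon buys only about 0.17 dB in 2-D. The 1.53 dB limit is approached only as the dimension grows.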

Why Vector Quantization?
● Important: the space-filling advantage applies even when values are totally uncorrelated
● Another important advantage:
  – Can have codebooks with less than 1 bit per dimension

Why Algebraic VQ?
● Trained VQ is impractical for high rates and large dimensions
  – High dimension → large LUTs, lots of memory
  – Codebook size is exponential in bitrate
  – No codebook structure → slow search
● “Algebraic” VQ solves these problems
  – Structured codebook: no LUTs, fast search
● Space-filling lattice for arbitrary dimension is unknown: have to approximate
● PVQ is asymptotically optimal for Laplacian sources

Why Gain-Shape Quantization?
● Separate “gain” (energy) from “shape” (spectrum)
  – Vector = Magnitude × Unit Vector (point on sphere)
● Potential advantages
  – Can give each piece different rate allocations
  – Preserve energy (contrast) instead of low-passing
    ● Scalar can only add energy by coding ±1’s
  – Implicit activity masking
    ● Can derive quantization resolution from the explicitly coded energy
  – Better representation of coefficients

How it Works (High-Level)

Simple Case: PVQ without a Predictor
● Scalar quantize gain
● Place K unit pulses in N dimensions
  – Up to N = 1024 dimensions for large blocks
  – Only has N − 1 degrees of freedom
● Normalize to unit norm
● K is derived implicitly from the gain
● Can also code K and derive gain
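
The steps above can be sketched as a small gain-shape quantizer. This is only a sketch. It uses a plain greedy pulse search, which is one common choice but not necessarily the encoder's actual search. It also takes K as an explicit argument, whereas the slides derive K from the gain. The names `pvq_search` and `pvq_quantize` are hypothetical.

```python
import math


def pvq_search(x, k):
    """Greedily place k unit pulses to maximize (x . y) / ||y||."""
    ax = [abs(v) for v in x]
    y = [0] * len(x)
    xy = 0.0  # running value of |x| . y
    yy = 0.0  # running value of y . y
    for _ in range(k):
        best_i, best_score = 0, -1.0
        for i, a in enumerate(ax):
            # Squared correlation over energy if one more pulse went to i.
            score = (xy + a) ** 2 / (yy + 2 * y[i] + 1)
            if score > best_score:
                best_i, best_score = i, score
        xy += ax[best_i]
        yy += 2 * y[best_i] + 1
        y[best_i] += 1
    # Give each pulse the sign of the corresponding input coefficient.
    return [yi if xi >= 0 else -yi for xi, yi in zip(x, y)]


def pvq_quantize(x, q, k):
    """Gain-shape quantization without a predictor; returns (x_hat, y)."""
    g = math.sqrt(sum(v * v for v in x))
    g_hat = q * round(g / q)          # scalar-quantize the gain
    if g_hat == 0 or k == 0:
        return [0.0] * len(x), [0] * len(x)
    y = pvq_search(x, k)              # K unit pulses in N dimensions
    norm_y = math.sqrt(sum(v * v for v in y))
    # Normalize the pulse vector to unit norm, then scale by the quantized gain.
    return [g_hat * v / norm_y for v in y], y


x_hat, y = pvq_quantize([3.0, -1.0, 0.5], q=1.0, k=4)
print(y, x_hat)
```

The pulse vector y always satisfies Σ|y_i| = K. The reconstruction always has norm exactly ĝ, which is the energy-preservation property from the gain-shape slide.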

Codebook for N=3 and different K
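
The size of the PVQ codebook V(N, K) is the number of integer vectors of length N whose absolute values sum to K. It obeys the standard counting recurrence V(N, K) = V(N − 1, K) + V(N, K − 1) + V(N − 1, K − 1), with V(N, 0) = 1 and V(0, K > 0) = 0. The sketch below tabulates it; the function names are hypothetical. For N = 3 it gives 6, 18, 38 and 66 codewords for K = 1 to 4. It also illustrates the earlier sub-1-bit point: N = 16, K = 1 has 32 codewords, which is 5 bits over 16 dimensions.

```python
import math


def pvq_codebook_size(n, k):
    """Number of integer vectors of length n whose absolute values sum to k."""
    row = [1] + [0] * k              # row[j] = V(0, j)
    for _ in range(n):
        new = [1] + [0] * k          # V(i, 0) = 1
        for j in range(1, k + 1):
            new[j] = row[j] + new[j - 1] + row[j - 1]
        row = new
    return row[k]


def bits_per_dimension(n, k):
    """Rate of a fixed-length index into the codebook, per dimension."""
    return math.log2(pvq_codebook_size(n, k)) / n


for k in range(1, 5):
    print(k, pvq_codebook_size(3, k))
print(bits_per_dimension(16, 1))
```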

PVQ vs. Scalar Quantization

PVQ with a Predictor
● Video provides us with useful predictors
● We want to treat vectors in the direction of the prediction as “special”
  – They are much more likely!
● Subtracting and coding the residual would lose energy preservation
● Solution: align the codebook axes with the prediction, and treat one dimension differently

2-D Projection Example
● Input + Prediction
● Compute Householder reflection
● Apply reflection
● Compute & code angle
● Code other dimensions
[Figure: input and prediction vectors in 2-D, with the angle θ between them]
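
The reflection steps can be sketched with the usual Householder construction. This sketch assumes the codebook axis m is the largest-magnitude component of the prediction r, with s the sign of r[m]. The function names are hypothetical. After reflection, the prediction lands on axis m with value −s‖r‖, so θ can be read from z[m]. The remaining coordinates of z are the "other dimensions", and their norm is ‖x‖ sin θ. The prediction must be nonzero, and the input must also be nonzero for θ to be defined.

```python
import math


def householder_vector(r):
    """Return (v, m, s) such that reflecting by v maps r onto axis m.

    m is the index of the largest-magnitude component of r and s is its
    sign; after reflection r becomes -s * ||r|| * e_m. Requires r != 0.
    """
    m = max(range(len(r)), key=lambda i: abs(r[i]))
    s = 1.0 if r[m] >= 0 else -1.0
    norm_r = math.sqrt(sum(c * c for c in r))
    v = list(r)
    v[m] += s * norm_r
    return v, m, s


def reflect(x, v):
    """Apply H = I - 2 v v^T / (v^T v) to x (preserves the norm of x)."""
    vv = sum(c * c for c in v)
    k = 2.0 * sum(a * b for a, b in zip(x, v)) / vv
    return [a - k * b for a, b in zip(x, v)]


def angle_to_prediction(x, r):
    """Reflect x into the prediction-aligned frame; return (theta, z, m)."""
    v, m, s = householder_vector(r)
    z = reflect(x, v)
    g = math.sqrt(sum(c * c for c in x))
    # cos(theta) = (x . r) / (|x| |r|) = -s * z[m] / |x|
    cos_t = max(-1.0, min(1.0, -s * z[m] / g))
    return math.acos(cos_t), z, m


theta, z, m = angle_to_prediction([-4.0, 3.0], [3.0, 4.0])
print(theta, z, m)
```

A Householder reflection is used here rather than a general rotation because it is cheap to apply, needs only the vector v, and is its own inverse. The decoder can therefore undo it with the same call.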

What does this accomplish?
● Creates another “intuitive” parameter, θ
  – “How much like the predictor are we?”
  – θ = 0 → use predictor exactly
● θ determines how many pulses go in the “prediction” direction
  – K (and thus bitrate) for the remaining N − 1 dimensions is adjusted down
● Remaining N − 1 dimensions have N − 2 degrees of freedom (no redundancy)
  – Can repeat for more predictors

Details...

Band Structure
● DC coded separately with scalar quantization
● AC coefficients grouped into bands
  – Gain, θ, etc., signaled separately for each band
  – Layout ad hoc for now
● Scan order in each band optimized for decreasing average variance
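
The band bookkeeping can be sketched as follows. The actual band layout is not given here because it is "ad hoc for now". The band membership and per-position variances in the usage example are therefore hypothetical placeholders, not trained values. The function names are hypothetical too. DC is pulled out first. Within each band, positions are sorted by decreasing average variance to form that band's scan order.

```python
def scan_order_by_variance(band_positions, variances):
    """Order one band's (row, col) positions by decreasing average variance.

    variances maps (row, col) -> average variance measured on training data.
    """
    return sorted(band_positions, key=lambda p: variances[p], reverse=True)


def split_bands(block, band_orders):
    """Separate DC from AC and gather each band's coefficients in scan order."""
    dc = block[0][0]
    ac_bands = [[block[r][c] for r, c in order] for order in band_orders]
    return dc, ac_bands


# Hypothetical 4x4 example: one band covering three low-frequency positions.
variances = {(0, 1): 5.0, (1, 0): 7.0, (1, 1): 2.0}
order = scan_order_by_variance([(0, 1), (1, 0), (1, 1)], variances)
block = [[10, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11], [12, 13, 14, 15]]
print(order, split_bands(block, [order]))
```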

Band Structure
[Figure: band layouts for 4x4, 8x8, and 16x16 blocks]
● Scan order is possibly over-fit...

To Predict or Not to Predict...
● θ > π/2 → prediction not helping
  – Could code large θ’s, but doesn’t seem that useful
  – Need to handle zero predictors anyway
● Current approach: code a “noref” flag
  – Currently jointly code up to 4 flags at once, with a fixed order-0 probability per band (5% of keyframe rate)
  – Patches in review cut this down a lot
    ● Force noref=1 when the predictor is zero in keyframes
    ● Separate probabilities for each block size
    ● Adapt the probabilities
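
A simplified version of the noref decision, together with joint coding of the flags, might look like the sketch below. It uses only the geometric test: θ ≥ π/2 is equivalent to x · r ≤ 0, and a zero predictor always gets noref=1. A real encoder may instead make this choice by rate-distortion cost. The bit packing is one hypothetical way to turn up to 4 flags into a single symbol for an entropy coder. The function names are hypothetical as well.

```python
def choose_noref(x, r):
    """Return 1 to skip the prediction (noref), 0 to use it.

    Simplified rule: noref when the predictor is all zeros, or when
    theta >= pi/2, i.e. when x . r <= 0.
    """
    if all(c == 0 for c in r):
        return 1
    dot = sum(a * b for a, b in zip(x, r))
    return 1 if dot <= 0 else 0


def pack_noref_flags(flags):
    """Combine up to 4 noref flags into one symbol in [0, 15] for joint coding."""
    if len(flags) > 4:
        raise ValueError("at most 4 flags per symbol")
    symbol = 0
    for i, f in enumerate(flags):
        symbol |= (f & 1) << i
    return symbol


print(choose_noref([1.0, 1.0], [1.0, 0.0]), pack_noref_flags([1, 0, 1, 1]))
```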

Quantization Matrix
● Simple approach (what we’re doing now)
  – Separate quantization resolution for each band
  – Keep flat quantization within bands
● Advanced approach?
  – Scaling after normalization is complicated
    ● Unit pulses no longer “unit” (how to sum to K?)
    ● Householder reflection scrambles things further
  – Better(?): pre-scale the vector by quantization factors
    ● Effects on energy preservation?
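
The pre-scaling idea, and why it raises the energy-preservation question, can be shown in a few lines. The weights here are hypothetical per-coefficient quantization factors, and the function names are hypothetical. PVQ would run on the pre-scaled vector, so it preserves energy only in that domain. After unscaling, two decoded shapes with the same gain can come out with different energies.

```python
import math


def prescale(x, weights):
    """Divide each coefficient by its quantization weight before PVQ."""
    return [v / w for v, w in zip(x, weights)]


def unscale(z, weights):
    """Multiply decoded coefficients back by the same weights."""
    return [v * w for v, w in zip(z, weights)]


def norm(v):
    return math.sqrt(sum(c * c for c in v))


weights = [1.0, 2.0]   # hypothetical per-coefficient quantization factors
z_a = [1.0, 0.0]       # two decoded shapes with the same gain (1.0)
z_b = [0.0, 1.0]       # in the pre-scaled domain
print(norm(unscale(z_a, weights)), norm(unscale(z_b, weights)))
```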

Quantization Matrix Example
[Figure: flat quantizer (base Q=35) vs. quantizer adjusted per band (base Q=23)]
● Metrics: +15% PSNR, +12% SSIM, -18% PSNR-HVS

Activity Masking
● Goal: use better resolution in flat areas
  – Low contrast → low energy (gain)
  – Derivations in doc/video_pvq.lyx, doc/theoretical_results.lyx
    ● Currently wrong/incomplete, working on updates...

Activity Masking
● Step 1: Compand gain (g)
  – Goal: $Q \propto g^{2\alpha}$ (x264 uses α = 0.173; we start with 1/6)
  – Quantize $\hat{g} = (Q_g \hat{h})^{\beta}$, encode $\hat{h}$
    ● $\beta = 1/(1 - 2\alpha)$
    ● $Q_g = (Q/\beta)^{\beta}$
  – Offset steps so at least one value of ĥ gives the same gain as the prediction
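
Step 1 can be checked numerically. This sketch treats Q_g as a free parameter and does not compute it from Q. It also omits the offset that aligns one ĥ value with the prediction's gain. The function names are hypothetical. With ĝ = (Q_g ĥ)^β and β = 1/(1 − 2α), the local step dĝ/dĥ = β Q_g (Q_g ĥ)^(β − 1) is proportional to ĝ^(2α). The ratio (dĝ/dĥ)/ĝ equals β/ĥ, which Step 2 uses as Q_θ.

```python
import math


def beta_from_alpha(alpha):
    """beta = 1 / (1 - 2 alpha); alpha = 1/6 gives beta = 1.5."""
    return 1.0 / (1.0 - 2.0 * alpha)


def expand_gain(h_hat, q_g, beta):
    """Decoder side: g_hat = (Q_g * h_hat)^beta."""
    return (q_g * h_hat) ** beta


def compand_gain(g, q_g, beta):
    """Encoder side: invert the expansion and round to an integer h_hat."""
    return round(g ** (1.0 / beta) / q_g)


def gain_step(h_hat, q_g, beta):
    """Local step size d(g_hat)/d(h_hat) = beta * Q_g * (Q_g * h_hat)^(beta - 1)."""
    return beta * q_g * (q_g * h_hat) ** (beta - 1.0)


alpha = 1.0 / 6.0
beta = beta_from_alpha(alpha)
q_g = 2.0  # arbitrary value for illustration
for h in range(1, 5):
    g_hat = expand_gain(h, q_g, beta)
    # step / g_hat^(2 alpha) stays constant (= beta * Q_g) as h grows
    print(h, g_hat, gain_step(h, q_g, beta) / g_hat ** (2 * alpha))
```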
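
Activity Masking (cont’d)
● Step 2: Choose θ resolution
  – Polar coordinates:
    ● $\hat{g}\sqrt{(\cos\theta - \cos\vartheta)^2 + (\sin\theta - \sin\vartheta)^2} = d\hat{g}/d\hat{h}$
    ● $\sqrt{(\cos\theta - \cos\vartheta)^2 + (\sin\theta - \sin\vartheta)^2} = \sqrt{2 - 2\cos(\theta - \vartheta)} \approx \operatorname{arcdistance}(\theta, \vartheta) = \lvert\theta - \vartheta\rvert$
  – At least for small $\theta - \vartheta$
  – $Q_\theta = (d\hat{g}/d\hat{h})/\hat{g} = \beta/\hat{h}$
    ● Make sure $Q_\theta$ evenly divides $\pi/2$
    ● When $\hat{g}$ is small, force $Q_\theta = \pi/2$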
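
Step 2 can be sketched as follows. The "small gain" cutoff is a hypothetical parameter. Snapping Q_θ by rounding to the nearest whole number of steps across [0, π/2] is an assumption; the slides only require an even division. The function names are hypothetical. The `chord` helper checks the polar-coordinate identity above.

```python
import math


def chord(theta, theta_q):
    """Distance on the unit circle between angles theta and theta_q."""
    return math.hypot(math.cos(theta) - math.cos(theta_q),
                      math.sin(theta) - math.sin(theta_q))


def theta_resolution(h_hat, beta, small_h=1):
    """Q_theta = beta / h_hat, snapped so that it evenly divides pi/2."""
    if h_hat <= small_h:              # hypothetical "small gain" threshold
        return math.pi / 2
    steps = max(1, round((math.pi / 2) / (beta / h_hat)))
    return (math.pi / 2) / steps


def quantize_theta(theta, q_theta):
    """Index coded in the bitstream; the decoded angle is index * q_theta."""
    return round(theta / q_theta)


q = theta_resolution(10, 1.5)
print(q, quantize_theta(0.5, q), chord(0.3, 0.2))
```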

Activity Masking (cont’d)
● Step 3: Choose K
  – $D = (g - \hat{g})^2 + g\hat{g}\,(D_\theta + \sin\theta\,\sin\vartheta\,D_{pvq})$
    ● $D_\theta = 2 - 2\cos(\theta - \vartheta)$ = distortion due to θ quantization
    ● $D_{pvq}$ = distortion due to PVQ on the last N − 1 dimensions
  – Distortion due to scalar quantizing the gain: $(d\hat{g}/d\hat{h})^2/12$
  – High-rate distortion due to PVQ: $(N-1)^2/(24K^2)$
    ● Derived experimentally; far too high at low rate
    ● (N − 2) DOF → should be (N − 2) times the gain distortion
  – Assume $g = \hat{g}$, $\theta = \vartheta$, and solve for K:
    ● $K = \frac{\hat{h}}{\beta}\sin\vartheta\,\frac{N-1}{\sqrt{2(N-2)}} \approx \frac{\hat{h}}{\beta}\sin\vartheta\,\sqrt{N/2}$
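
The K formula can be checked by substituting it back into the balance it was solved from. The check below assumes dĝ/dĥ = ĝβ/ĥ, which is the relation behind Q_θ = β/ĥ. It verifies that ĝ² sin²ϑ (N − 1)²/(24K²) equals (N − 2)(dĝ/dĥ)²/12, and that the √(N/2) form is close for moderate N. The function names are hypothetical, and the integer rounding in `choose_k` is an assumption. N must be at least 3.

```python
import math


def k_exact(h_hat, beta, theta_q, n):
    """K = (h/beta) sin(theta) (N - 1) / sqrt(2 (N - 2)); needs n >= 3."""
    return (h_hat / beta) * math.sin(theta_q) * (n - 1) / math.sqrt(2 * (n - 2))


def k_approx(h_hat, beta, theta_q, n):
    """Large-N form: K ~ (h/beta) sin(theta) sqrt(N / 2)."""
    return (h_hat / beta) * math.sin(theta_q) * math.sqrt(n / 2)


def choose_k(h_hat, beta, theta_q, n):
    """Round to a whole number of pulses (rounding rule is an assumption)."""
    return max(0, round(k_exact(h_hat, beta, theta_q, n)))


h, beta, th, n = 20.0, 1.5, 0.5, 16
k = k_exact(h, beta, th, n)
g_hat = 7.0  # any gain works: both sides of the balance scale with g_hat^2
d_pvq = g_hat ** 2 * math.sin(th) ** 2 * (n - 1) ** 2 / (24 * k ** 2)
d_gain = (g_hat * beta / h) ** 2 / 12
print(k, k_approx(h, beta, th, n), d_pvq, (n - 2) * d_gain)
```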
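
Loss Robustness
● $K \approx (\hat{h}/\beta)\sin\vartheta\,\sqrt{N/2}$
  – ĥ is offset by the companded reference gain, so it can be wrong if there are losses
  – But if K is wrong, we’ll decode the wrong number of pulses, totally desyncing the bitstream
● Remove the dependence on ĥ
  – $\sin\vartheta \approx \vartheta \Rightarrow \hat{h}\sin\vartheta \approx \hat{h}\vartheta = \hat{h}Q_\theta(\vartheta/Q_\theta) = \beta(\vartheta/Q_\theta)$, so $K \approx (\vartheta/Q_\theta)\sqrt{N/2}$
  – $\vartheta/Q_\theta$ is the index encoded in the bitstream
    ● Since $Q_\theta$ is not exact, we can’t cap $\vartheta \le \pi/2$ in the bitstream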
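
The loss-robust rule derives K from the coded θ index alone. The sketch below compares it with the gain-dependent form, using the unsnapped Q_θ = β/ĥ. The two agree for small angles. At larger ϑ, sin ϑ < ϑ, so the index-only K comes out somewhat larger. The function names and the rounding rule are hypothetical.

```python
import math


def k_loss_robust(theta_index, n):
    """K from the coded theta index only: K ~ (theta / Q_theta) * sqrt(N / 2)."""
    return max(0, round(theta_index * math.sqrt(n / 2)))


def k_from_gain(h_hat, beta, theta_index, n):
    """Gain-dependent version for comparison, with Q_theta = beta / h_hat."""
    q_theta = beta / h_hat
    theta_q = theta_index * q_theta
    return max(0, round((h_hat / beta) * math.sin(theta_q) * math.sqrt(n / 2)))


print(k_loss_robust(3, 16), k_from_gain(20.0, 1.5, 3, 16))
```

The decoder can compute `k_loss_robust` even if the reference gain, and therefore ĥ, was corrupted by a loss. This keeps the pulse count, and with it the bitstream parsing, in sync.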

Inter-band Masking
● ĝ is per-band, but traditional activity masking is per-block
  – Could just sum ĝ over all bands
  – Actual model is that energy in one band masks energy in another
    ● Lower bands appear to mask higher ones, but not the other way around
    ● Still very early... not much is tuned
  – $\rho = \left(\hat{g}_h^2 / (\hat{g}_h^2 + \eta\,\hat{g}_l^2)\right)^{\alpha}$, where η controls the amount of masking
    ● $\rho\hat{h}$ is then used instead of ĥ to derive $Q_\theta$ and K
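
The masking factor is a direct transcription of the formula for ρ. This sketch takes `g_l` as the gain of the masking lower band; how several lower bands would be combined is not specified here. Returning 1.0 when both gains are zero is an assumption. The function names are hypothetical.

```python
def masking_factor(g_h, g_l, eta, alpha):
    """rho = (g_h^2 / (g_h^2 + eta * g_l^2))^alpha, with 0 <= rho <= 1."""
    denom = g_h * g_h + eta * g_l * g_l
    if denom == 0:
        return 1.0  # assumption: no energy anywhere means no masking
    return (g_h * g_h / denom) ** alpha


def masked_index(h_hat, g_h, g_l, eta, alpha):
    """rho * h_hat, used in place of h_hat when deriving Q_theta and K."""
    return masking_factor(g_h, g_l, eta, alpha) * h_hat


print(masking_factor(1.0, 1.0, 1.0, 1.0), masked_index(10.0, 1.0, 1.0, 1.0, 1.0))
```

More energy in the lower band makes ρ smaller. That lowers ρĥ, which coarsens Q_θ = β/(ρĥ) and reduces K. This is the intended effect: the higher band is coded less precisely where it is masked.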

Calibration
● Activity masking always increases rate
  – Scale the base quantizer in each band to reduce rate
    ● $Q = Q_0 L^{(1/\beta - 1)}$
    ● L is the maximum luma value
  – Just an approximation, but seems to work okay
  – AM currently disabled for chroma
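
Because β = 1/(1 − 2α), the exponent 1/β − 1 equals −2α. One reading of the calibration, offered here as an interpretation and not stated on the slide, is this: if the gain-domain step is taken as Q·ĝ^(2α), then at the maximum gain ĝ = L the step is exactly Q_0. The sketch checks that reading. It assumes L = 255 (8-bit video) purely for illustration, and the function names are hypothetical.

```python
import math


def calibrated_q(q0, luma_max, beta):
    """Q = Q0 * L^(1/beta - 1)."""
    return q0 * luma_max ** (1.0 / beta - 1.0)


def gain_domain_step(q, g_hat, alpha):
    """Assumed step size Q * g_hat^(2 alpha) (interpretation of Q ~ g^(2 alpha))."""
    return q * g_hat ** (2.0 * alpha)


alpha = 1.0 / 6.0
beta = 1.0 / (1.0 - 2.0 * alpha)
q = calibrated_q(23.0, 255.0, beta)   # L = 255 assumes 8-bit luma
print(q, gain_domain_step(q, 255.0, alpha))
```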

Activity Masking Example
[Figure: no activity masking (base Q=23)]

Activity Masking Example
[Figure: activity masking (base Q=23)]

Open Issues
● Better entropy coding
  – Everything is order-0
  – Take advantage of correlation in gain/θ/noref/etc.
● Better RDO
  – Currently iterating over a small range of gains, θs
  – Rate estimates very approximate
● Reducing overhead of the loss-robust case
● Noise injection/folding
● Bit-exact implementation, tuning, etc.