Pyramid Vector Quantization
Mozilla

What is Pyramid Vector Quantization?
● A Vector Quantizer
● That has a simple algebraic structure
● To perform gain-shape quantization

Motivation

Why Vector Quantization?
● 3 classic advantages (Lookabaugh et al. 1989):
  – Space filling advantage: VQ codepoints tile space more efficiently
    ● Example: 2-D, squares vs. hexagons
    ● Maximum possible gain for large dimension: 1.53 dB
  – Shape advantage: VQ can use more points where PDF is higher
    ● 1.14 dB gain for 2-D Gaussian, 2.81 dB for high dimension
    ● Can be mitigated with entropy coding
  – Memory advantage: exploit statistical dependence between vector components
    ● Transform coefficients are not strongly correlated
● Important: Space advantage applies even when values are totally uncorrelated
● Another important advantage
  – Can have codebooks with less than 1 bit per dimension

Why Algebraic VQ?
● Trained VQ impractical for high rates, large dimensions
  – High dimension → large LUTs, lots of memory
    ● Exponential in bitrate
  – No codebook structure → slow search
● “Algebraic” VQ solves these problems
  – Structured codebook: no LUTs, fast search
● Space-filling lattice for arbitrary dimension unknown: have to approximate
  – PVQ asymptotically optimal for Laplacian sources

Why Gain-Shape Quantization?
● Separate “gain” (energy) from “shape” (spectrum)
  – Vector = Magnitude × Unit Vector (point on sphere)
● Potential advantages
  – Can give each piece different rate allocations
    ● Preserve energy (contrast) instead of low-passing
    ● Scalar can only add energy by coding ±1’s
  – Implicit activity masking
    ● Can derive quantization resolution from the explicitly coded energy
  – Better representation of coefficients

How it Works (High-Level)

Simple Case: PVQ without a Predictor
● Scalar quantize gain
● Place K unit pulses in N dimensions
  – Up to N = 1024 dimensions for large blocks
  – Only has N-1 degrees of freedom
● Normalize to unit norm
● K is derived implicitly from the gain
● Can also code K and derive gain

Codebook for N=3 and different K
[Figure]

PVQ vs. Scalar Quantization
[Figure]

PVQ with a Predictor
● Video provides us with useful predictors
● We want to treat vectors in the direction of the prediction as “special”
  – They are much more likely!
● Subtracting and coding the residual would lose energy preservation
● Solution: align the codebook axes with the prediction, and treat one dimension differently

2-D Projection Example
[Figure: 2-D diagram with labels Input, Prediction, θ]
● Input + Prediction
● Compute Householder Reflection
● Apply Reflection
● Compute & code angle
● Code other dimensions

What does this accomplish?
● Creates another “intuitive” parameter, θ
  – “How much like the predictor are we?”
  – θ = 0 → use predictor exactly
● θ determines how many pulses go in the “prediction” direction
  – K (and thus bitrate) for remaining N-1 dimensions adjusted down
● Remaining N-1 dimensions have N-2 degrees of freedom (no redundancy)
● Can repeat for more predictors

Details...

Band Structure
● DC coded separately with scalar quantization
● AC coefficients grouped into bands
  – Gain, theta, etc., signaled separately for each band
  – Layout ad-hoc for now
● Scan order in each band optimized for decreasing average variance

Band Structure
[Figure: band layouts for 4x4, 8x8, 16x16]
● Scan order is possibly over-fit...

To Predict or Not to Predict...
● θ > π/2 → Prediction not helping
  – Could code large θ’s, but doesn’t seem that useful
  – Need to handle zero predictors anyway
● Current approach: code a “noref” flag
  – Currently jointly code up to 4 flags at once, with fixed order-0 probability per band (5% of KF rate)
  – Patches in review cut this down a lot
    ● Force noref=1 when predictor is zero in keyframes
    ● Separate probabilities for each block size
    ● Adapt the probabilities

Quantization Matrix
● Simple approach (what we’re doing now)
  – Separate quantization resolution for each band
    ● Keep flat quantization within bands
● Advanced approach?
  – Scaling after normalization complicated
    ● Unit pulses no longer “unit” (how to sum to K?)
    ● Householder reflection scrambles things further
  – Better(?): Pre-scale vector by quantization factors
  – Effects on energy preservation?
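
Looking back at the prediction step (Householder reflection, then the angle θ), here is a minimal sketch of how the gain and θ could be computed. It is an illustration under stated assumptions, not the codec's implementation: the function names are hypothetical, and choosing the largest-magnitude component of the prediction as the "special" axis is an assumption made here for numerical stability.

```python
import math

def householder_vector(r):
    """Return (v, m, s) such that reflecting r across v puts it on axis m."""
    norm_r = math.sqrt(sum(c * c for c in r))
    if norm_r == 0.0:
        # Zero predictors have no direction; the slides handle them via "noref".
        raise ValueError("zero predictor: signal noref instead")
    # Assumption: use the largest-magnitude component as the special axis.
    m = max(range(len(r)), key=lambda i: abs(r[i]))
    s = 1.0 if r[m] >= 0.0 else -1.0
    v = list(r)
    v[m] += s * norm_r  # v = r + s*||r||*e_m, so r reflects onto -s*||r||*e_m
    return v, m, s

def reflect(v, x):
    """Apply the Householder reflection x - 2 v (v.x)/(v.v)."""
    scale = 2.0 * sum(a * b for a, b in zip(v, x)) / sum(a * a for a in v)
    return [xi - scale * vi for xi, vi in zip(x, v)]

def gain_and_theta(x, r):
    """Return (gain, theta, z): input norm, angle to the prediction, reflected input."""
    v, m, s = householder_vector(r)
    z = reflect(v, x)
    gain = math.sqrt(sum(c * c for c in x))
    if gain == 0.0:
        return 0.0, 0.0, z
    # The reflected prediction lies along -s*e_m, so cos(theta) = -s*z[m]/gain.
    cos_theta = max(-1.0, min(1.0, -s * z[m] / gain))
    return gain, math.acos(cos_theta), z
```

In this sketch, the N-1 components of `z` other than index `m` are what would then be normalized and coded with PVQ. Because a reflection is its own inverse, a decoder could apply `reflect` with the same `v` to map the reconstruction back.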
Quantization Matrix Example
[Figure: Flat Quantizer (base Q=35) vs. Adjusted Per-Band (base Q=23)]
● Metrics: +15% PSNR, +12% SSIM, -18% PSNR-HVS

Activity Masking
● Goal: Use better resolution in flat areas
  – Low contrast → low energy (gain)
  – Derivations in doc/video_pvq.lyx, doc/theoretical_results.lyx
    ● Currently wrong/incomplete, working on updates...

Activity Masking
● Step 1: Compand gain (g)
  – Goal: $Q \propto g^{2\alpha}$ (x264 uses α = 0.173, we start with 1/6)
  – Quantize $\hat{g} = (Q_g \hat{h})^{\beta}$, encode ĥ
    ● $\beta = 1/(1 - 2\alpha)$
    ● $Q_g = (Q/\beta)^{\beta}$
  – Offset steps so at least one value of ĥ gives same gain as the prediction

Activity Masking cont'd.
● Step 2: Choose θ resolution
  – Polar coordinates:
    ● $\hat{g} \sqrt{(\cos\theta - \cos\vartheta)^2 + (\sin\theta - \sin\vartheta)^2} = d\hat{g}/d\hat{h}$
    ● $\sqrt{(\cos\theta - \cos\vartheta)^2 + (\sin\theta - \sin\vartheta)^2} = \sqrt{2 - 2\cos(\theta - \vartheta)} \approx \mathrm{arcdistance}(\theta, \vartheta) \approx \theta - \vartheta$
      – At least for small θ − ϑ
  – $Q_\theta = (d\hat{g}/d\hat{h})/\hat{g} = \beta/\hat{h}$
    ● Make sure $Q_\theta$ evenly divides π/2
    ● When ĝ is small, force $Q_\theta = \pi/2$

Activity Masking cont'd.
● Step 3: Choose K
  – $D = (g - \hat{g})^2 + g\hat{g}\,(D_\theta + \sin\theta \sin\vartheta \, D_{pvq})$
    ● $D_\theta = 2 - 2\cos(\theta - \vartheta)$ = distortion due to θ quantization
    ● $D_{pvq}$ = distortion due to PVQ on last N − 1 dimensions
  – Distortion due to scalar quantizing gain: $(d\hat{g}/d\hat{h})^2/12$
  – High-rate distortion due to PVQ: $(N - 1)^2/(24K^2)$
    ● Derived experimentally, far too high at low rate
    ● (N − 2) DOF → should be (N − 2) times gain distortion
  – Assume g = ĝ, θ = ϑ, solve for K...
    ● $K = (\hat{h}/\beta) \sin\vartheta \, (N - 1)/\sqrt{2(N - 2)} \approx (\hat{h}/\beta) \sin\vartheta \, \sqrt{N/2}$

Loss Robustness
● $K \approx (\hat{h}/\beta) \sin\vartheta \, \sqrt{N/2}$
  – ĥ is offset by the companded reference gain, so can be wrong if there are losses
  – But if K is wrong, we’ll decode the wrong number of pulses, totally desyncing the bitstream
● Remove dependence on ĥ
  – $\sin\vartheta \approx \vartheta \rightarrow \hat{h} \sin\vartheta \approx \hat{h}\vartheta = \hat{h} Q_\theta (\vartheta/Q_\theta) = \beta\,(\vartheta/Q_\theta)$, since $Q_\theta = \beta/\hat{h}$
  – So $K \approx (\vartheta/Q_\theta) \sqrt{N/2}$
  – $(\vartheta/Q_\theta)$ is the index encoded in the bitstream
    ● Since $Q_\theta$ not exact, can’t cap ϑ ≤ π/2 in bitstream

Inter-band Masking
● ĝ is per-band, but traditional activity masking is per-block
  – Could just sum ĝ over all bands
  – Actual model is that energy in one band masks energy in another
    ● Lower bands appear to mask higher, but not other way around
    ● Still very early... not much is tuned
  – $\rho = \left(\hat{g}_h^2/(\hat{g}_h^2 + \eta\,\hat{g}_l^2)\right)^{\alpha}$, η controls amount of masking
    ● ρĥ then used to derive $Q_\theta$ and K instead of ĥ

Calibration
● Activity masking always increases rate
  – Scale base quantizer in each band to reduce rate
    ● $Q = Q_0 L^{(1/\beta - 1)}$
    ● L is the maximum luma value
  – Just an approximation, seems to work okay
  – AM currently disabled for chroma

Activity Masking Example
[Figure: No activity masking (base Q=23) vs. activity masking (base Q=23)]

Open Issues
● Better entropy coding
  – Everything order-0
  – Take advantage of correlation in gain/θ/noref/etc.
● Better RDO
  – Currently iterating over small range of gains, θs
  – Rate estimates very approximate
● Reducing overhead of loss-robust case
● Noise injection/folding
● Bit-exact implementation, tuning, etc.
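
As a rough numerical companion to the activity-masking steps (companding exponent, θ resolution, and the choice of K), the sketch below evaluates the formulas from the slides. All function names are hypothetical. The rule used to make Qθ evenly divide π/2 (rounding the step count to the nearest integer, minimum 1) is an assumption, since the slides do not say how this is done. Rounding K to an integer is left out for the same reason. The loss-robust form follows from substituting Qθ = β/ĥ into K ≈ (ĥ/β) sin ϑ √(N/2) with sin ϑ ≈ ϑ.

```python
import math

def companding_beta(alpha):
    """beta = 1/(1 - 2*alpha); the slides start with alpha = 1/6."""
    return 1.0 / (1.0 - 2.0 * alpha)

def theta_step(h_hat, beta):
    """Q_theta = beta/h_hat, adjusted so that it evenly divides pi/2."""
    half_pi = math.pi / 2.0
    if h_hat <= 0.0:
        return half_pi
    raw = beta / h_hat
    # Assumption: round the number of steps to the nearest integer (at least 1).
    # This also yields Q_theta = pi/2 when the quantized gain index is small.
    steps = max(1, round(half_pi / raw))
    return half_pi / steps

def pulses_high_rate(h_hat, beta, theta, n):
    """K = (h_hat/beta) * sin(theta) * (N-1)/sqrt(2(N-2)); requires n > 2."""
    return (h_hat / beta) * math.sin(theta) * (n - 1) / math.sqrt(2.0 * (n - 2))

def pulses_loss_robust(theta_index, n):
    """K ~ (theta/Q_theta) * sqrt(N/2), which no longer depends on h_hat."""
    return theta_index * math.sqrt(n / 2.0)
```

For small angles and large N the two K estimates should agree closely, for example `pulses_high_rate(100.0, 1.5, 0.03, 1024)` against `pulses_loss_robust(2, 1024)`, where 0.03 is twice the unrounded step β/ĥ = 0.015.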