
Lossy Distributed Source Coding
John MacLaren Walsh, Ph.D.
Multiterminal Information Theory, Spring Quarter, 2012
1 Lossy Distributed Source Coding Problem
[Figure: each of M encoders observes a source block X_m^N and sends a message S_m \in \{1, \ldots, 2^{N R_m}\} to a single decoder. The decoder also observes side information Y^N and produces reconstructions \hat{Z}_1^N, \ldots, \hat{Z}_L^N satisfying \frac{1}{N} \sum_{n=1}^N E[d(Z_{\ell,n}, \hat{Z}_{\ell,n})] \le D_\ell for each \ell \in \{1, \ldots, L\}. The sources are IID across time: (X_{1,n}, \ldots, X_{M,n}, Y_n, Z_{1,n}, \ldots, Z_{L,n}) \sim p_{X_1, \ldots, X_M, Y, Z_1, \ldots, Z_L}.]

Figure 1: The general multiterminal lossy source coding problem.
The general multiterminal lossy source coding problem depicted in Fig. 1 is not solved; however, we have inner and outer bounds. As we will discuss in Section 1.3, these inner and outer bounds do not match even on very simple variants of this structure.
Important special cases of this problem include the case when L = M and Z_m = X_m (the lossy variant of Slepian-Wolf), as well as the case L = 1, which is known as the CEO problem.
1.1 Berger Tung Inner Bound
The Berger Tung inner bound uses the familiar random binning technique. A series of conditional distributions for a collection of message random variables U_1, \ldots, U_M and reconstructions \hat{Z}_1, \ldots, \hat{Z}_L is created according to the Markov chain relationships (X_{\setminus m}, U_{\setminus m}, Y, Z_{1:L}) \leftrightarrow X_m \leftrightarrow U_m and (X_{1:M}, Z_{1:L}) \leftrightarrow (Y, U_{1:M}) \leftrightarrow \hat{Z}_{1:L}, which also obey the distortion constraints

    E[d_\ell(Z_\ell, \hat{Z}_\ell)] \le D_\ell, \quad \ell \in \{1, \ldots, L\}    (1)
Encoder m generates a 2^{N \hat{R}_m} \times N dimensional matrix with elements IID according to p_{U_m}, where the ith row is denoted by U_m^N(i). Each codeword is randomly assigned to one of 2^{N R_m} bins B_m(j), j \in \{1, \ldots, 2^{N R_m}\}. The bins and codebooks are shared with the decoder. Encoder m operates by selecting a codeword that is jointly typical with its observations, (U_m^N(i), X_m^N) \in T^N(U_m, X_m), then transmitting the bin index j_m of the bin to which this codeword was assigned. If there was no such codeword, it simply transmits the first bin index.
The decoder operates by looking in the bins for (i_1, \ldots, i_M) \in B_1(j_1) \times \cdots \times B_M(j_M) giving a collection of codewords jointly typical with each other and the side information: (U_1^N(i_1), \ldots, U_M^N(i_M), Y^N) \in T^N(U_1, \ldots, U_M, Y). If it can find no such collection of codewords, it selects i_m = 1 for all m \in \{1, \ldots, M\}.
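To make the random binning encoder concrete, here is a minimal sketch in Python, assuming toy binary alphabets, an illustrative test channel p_{U|X}, small illustrative rates, and a crude strong-typicality test; none of these specifics come from the notes:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy instance: binary X and U, illustrative rates.
N, R_hat, R = 12, 0.75, 0.5
n_codewords, n_bins = 2 ** int(N * R_hat), 2 ** int(N * R)

p_x = np.array([0.5, 0.5])
p_u_given_x = np.array([[0.85, 0.15],
                        [0.15, 0.85]])
p_u = p_x @ p_u_given_x
p_xu = p_x[:, None] * p_u_given_x              # target joint type of (X, U)

# Codebook: 2^{N R_hat} rows drawn IID from p_U; random bin assignments.
codebook = rng.choice(2, size=(n_codewords, N), p=p_u)
bin_of = rng.integers(n_bins, size=n_codewords)

def jointly_typical(x, u, eps=0.25):
    """Crude strong-typicality test: empirical joint type close to p_xu."""
    counts = np.zeros((2, 2))
    for a, b in zip(x, u):
        counts[a, b] += 1
    return bool(np.all(np.abs(counts / len(x) - p_xu) <= eps * p_xu))

def encode(x):
    for i in range(n_codewords):
        if jointly_typical(x, codebook[i]):
            return bin_of[i]        # transmit only the bin index j_m
    return 0                        # no typical codeword: send the first bin index

x = rng.choice(2, size=N, p=p_x)    # observed source block X_m^N
print("transmitted bin index:", encode(x))
```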
The following error events are useful in analyzing the rates necessary to achieve the desired distortions:
• E_m^1: There does not exist an i_m \in \{1, \ldots, 2^{N \hat{R}_m}\} such that (U_m^N(i_m), X_m^N) \in T^N(U_m, X_m).

• E^2: For every m \in \{1, \ldots, M\} there exists an i_m \in \{1, \ldots, 2^{N \hat{R}_m}\} such that (U_m^N(i_m), X_m^N) \in T^N(U_m, X_m); however, (U_{1:M}^N(i), X_{1:M}^N, Y^N, Z_{1:L}^N) \notin T^N(U_{1:M}, X_{1:M}, Y, Z_{1:L}).

• E_A^3: For every m \in \{1, \ldots, M\} there exists an i_m \in \{1, \ldots, 2^{N \hat{R}_m}\} such that (U_m^N(i_m), X_m^N) \in T^N(U_m, X_m) and (U_{1:M}^N(i), X_{1:M}^N, Y^N, Z_{1:L}^N) \in T^N(U_{1:M}, X_{1:M}, Y, Z_{1:L}). However, for the set A \subseteq \{1, \ldots, M\}, (U_A^N(i_A'), U_{A^c}^N(i_{A^c}), Y^N) \in T^N(U_A, U_{A^c}, Y) for some i_m' \ne i_m, \forall m \in A, with i_m' \in B_m(j_m) for each m \in A.
The first error event involves the same style of analysis as in the rate distortion proof:
    P[E_m^1] = \prod_{i=1}^{2^{N \hat{R}_m}} P[(U_m^N(i), X_m^N) \notin T^N(U_m, X_m)] = \left(1 - P[(U_m^N(i), X_m^N) \in T^N(U_m, X_m)]\right)^{2^{N \hat{R}_m}}    (2)

             \le \left(1 - (1-\epsilon) 2^{-N(I(U_m; X_m) + c)}\right)^{2^{N \hat{R}_m}} \le 1 - (1-\epsilon) + \exp\left(-2^{N \hat{R}_m} 2^{-N(I(X_m; U_m) + c)}\right)    (3)
From this we observe that the probability of error will be arbitrarily small as N → ∞ provided that
    \hat{R}_i > I(X_i; U_i)    (4)
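As a quick numerical sanity check on the covering bound (3), the following snippet evaluates the exponential term for rates on either side of the threshold, with hypothetical values I(X_m; U_m) = 0.5 bits and slack c = 0.02 (neither comes from the notes):

```python
import math

# The exp(...) term of bound (3): exp(-2^{N R_hat} * 2^{-N (I + c)}).
I, c = 0.5, 0.02
for R_hat in (0.45, 0.55):               # just below and just above I + c
    for N in (50, 100, 200):
        term = math.exp(-(2.0 ** (N * R_hat)) * (2.0 ** (-N * (I + c))))
        print(f"R_hat = {R_hat}, N = {N:3d}: exp term = {term:.3e}")
```

The term collapses to 0 with growing N only above the threshold, mirroring the requirement (4).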
Error event E^2 has a probability that is arbitrarily small as N → ∞ via an extension of the Markov Lemma. Finally, turning our attention to the third collection of error events, we have
    P[E_A^3] = \sum_{i_A' : i_m' \ne i_m \forall m \in A} P\left[(U_A^N(i_A'), U_{A^c}^N(i_{A^c}), Y^N) \in T^N(U_A, U_{A^c}, Y), \; i_m' \in B_m(j_m) \; \forall m \in A \,\middle|\, (U_{1:M}^N(i), X_{1:M}^N, Y^N) \in T^N\right]

             = \sum_{i_A' : i_m' \ne i_m \forall m \in A} P\left[(U_A^N(i_A'), U_{A^c}^N(i_{A^c}), Y^N) \in T^N(U_A, U_{A^c}, Y) \,\middle|\, (U_{1:M}^N(i), X_{1:M}^N, Y^N) \in T^N\right] P\left[i_m' \in B_m(j_m) \; \forall m \in A\right]

             \le \sum_{i_A' : i_m' \ne i_m \forall m \in A} 2^{-N(I(U_A; (Y, U_{A^c})) - c)} \prod_{m \in A} 2^{-N R_m} \le 2^{-N(I(U_A; (Y, U_{A^c})) - c)} \prod_{m \in A} 2^{-N R_m} (2^{N \hat{R}_m} - 1)

             \le 2^{N\left(\sum_{m \in A} (\hat{R}_m - R_m) - I(U_A; (Y, U_{A^c})) + c\right)}    (5)

This can be made arbitrarily small as N → ∞ if

    \sum_{i \in A} (\hat{R}_i - R_i) - I(U_A; (Y, U_{A^c})) < 0    (6)
Putting the two sets of inequalities together, we find that the requirements on the rates are
    \sum_{i \in A} R_i > I(X_A; U_A | Y, U_{A^c})    (7)
When the union of this region over all possible choices of auxiliary variables U_{1:M} obeying the constraints is taken, the result may not be convex, but any two achievable strategies can be time shared to obtain any convex combination of their points in (R, D) space; hence we can take the convex hull. One way to express this convex hull in the math is to introduce a time sharing variable Q, as shown in the text.
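To illustrate how the rate constraints (7) are evaluated for a concrete choice of test channels, the sketch below builds a hypothetical M = 2 instance — binary alphabets, a randomly drawn p(x1, x2, y), and two illustrative test channels, none of which come from the notes — and computes the lower bound I(X_A; U_A | Y, U_{A^c}) on the sum rate of each nonempty A:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical toy instance with M = 2 encoders and binary alphabets.
p_xxy = rng.random((2, 2, 2))
p_xxy /= p_xxy.sum()                          # p(x1, x2, y)
p_u1 = np.array([[0.8, 0.2], [0.2, 0.8]])     # test channel p(u1|x1)
p_u2 = np.array([[0.7, 0.3], [0.3, 0.7]])     # test channel p(u2|x2)

# Joint pmf p(x1, x2, y, u1, u2) under the chains U_m -- X_m -- (everything else).
# Axes: x1 = 0, x2 = 1, y = 2, u1 = 3, u2 = 4.
p = np.einsum('abc,ad,be->abcde', p_xxy, p_u1, p_u2)

def H(pm):
    q = pm[pm > 0]
    return float(-(q * np.log2(q)).sum())

def marg(keep):
    return p.sum(axis=tuple(sorted(set(range(5)) - set(keep))))

def cond_mi(A, B, C):
    """I(A; B | C) = H(A,C) + H(B,C) - H(A,B,C) - H(C); arguments are axis sets."""
    return H(marg(A | C)) + H(marg(B | C)) - H(marg(A | B | C)) - H(marg(C))

# Constraint: sum_{i in A} R_i > I(X_A; U_A | Y, U_{A^c}) for each nonempty A.
for name, XA, UA in [("{1}", {0}, {3}), ("{2}", {1}, {4}), ("{1,2}", {0, 1}, {3, 4})]:
    UAc = {3, 4} - UA
    print(f"A = {name}: sum-rate lower bound {cond_mi(XA, UA, {2} | UAc):.4f} bits")
```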
1.2 Berger Tung Outer Bound
The Berger Tung outer bound involves the same rate and distortion expressions as the Berger Tung inner bound, but changes the message Markov chain conditions to (X_{\setminus m}, Y, Z_{1:L}) \leftrightarrow X_m \leftrightarrow U_m. (Note that the other messages U_{\setminus m} are absent from the conditioning.)
1.3 Tightness & Other Bounds
As noted in the text, both the Berger Tung inner and the Berger Tung outer bounds have cases where they are not tight. There is a gap between the Berger Tung inner bound and the Berger Tung outer bound for the scalar Gaussian squared error CEO problem, and the Berger Tung inner bound was shown to give the rate distortion region in this case (Oohama '05 and Prabhakaran, Tse, Ramchandran '04). As discussed in Section 12.3, the Berger Tung inner bound is also tight for two Gaussian sources and squared error distortions. Wagner and Anantharam have given a tighter outer bound than the Berger Tung outer bound (2008). These problems are an area of active work in the community.
2 Multiple Descriptions Problem
[Figure: a single encoder observes X^N and sends two descriptions, S_1 \in \{1, \ldots, 2^{N R_1}\} and S_2 \in \{1, \ldots, 2^{N R_2}\}. A decoder receiving only S_1 produces \hat{X}_1^N with \frac{1}{N} \sum_{n=1}^N E[d(X_n, \hat{X}_{1,n})] \le D_1, a decoder receiving both S_1 and S_2 produces \hat{X}_0^N with \frac{1}{N} \sum_{n=1}^N E[d(X_n, \hat{X}_{0,n})] \le D_0, and a decoder receiving only S_2 produces \hat{X}_2^N with \frac{1}{N} \sum_{n=1}^N E[d(X_n, \hat{X}_{2,n})] \le D_2.]

Figure 2: The still generally unsolved multiple descriptions problem.
Surprisingly, this problem is not solved. We presented a pair of achievable regions and discussed when one of them was
tight.
2.1 El Gamal Cover Region
The El Gamal Cover region splits each of the two descriptions S_1, S_2 up into two parts, S_1 = (S_{10}, S_{11}) and S_2 = (S_{20}, S_{22}). The parts S_{10} and S_{20} are discarded by the decoders receiving only one description, and are concatenated into S_0 = (S_{10}, S_{20}) by the decoder receiving both descriptions.
• Codebook Generation: Generate a 2^{N R_{11}} \times N matrix with elements IID according to p_{X_1}, and denote the ith row of this matrix by X_1^N(i). Similarly, generate a 2^{N R_{22}} \times N matrix with elements IID according to p_{X_2}, and denote the jth row of this matrix by X_2^N(j). For each (i, j) \in \{1, \ldots, 2^{N R_{11}}\} \times \{1, \ldots, 2^{N R_{22}}\}, generate a 2^{N R_0} \times N matrix, with kth row X_0^N(i, j, k), and with elements distributed according to the conditional distribution

    \prod_{n=1}^N p_{X_0 | X_1, X_2}(x_{0,n}(i, j, k) \mid x_{1,n}(i), x_{2,n}(j))    (8)

Share these matrices with the encoder and decoders.
• Encoder: Select (i, j, k) such that (X_1^N(i), X_2^N(j), X_0^N(i, j, k), X^N) \in T^N(X_1, X_2, X_0, X). Split the bits in the index k up into two parts k = (k_1, k_2), and transmit the messages S_1 = (i, k_1) and S_2 = (j, k_2).

• Decoder: The decoder receiving only S_1 = (i, k_1) discards k_1 and reproduces X_1^N(i). The decoder receiving only S_2 = (j, k_2) discards k_2 and reproduces X_2^N(j). The decoder receiving both S_1, S_2 reproduces X_0^N(i, j, k).
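The split of the index k across the two descriptions is plain bit slicing; the following tiny sketch (with a hypothetical 8-bit index and an even split, neither of which comes from the notes) shows the encoder-side split and the decoder-side recombination:

```python
# Illustrative: k carries 8 bits (N * R0 with N = 8, R0 = 1), split evenly.
bits_total, bits_k1 = 8, 4
k = 0b10110011

k1 = k >> (bits_total - bits_k1)               # high bits ride along with S1 = (i, k1)
k2 = k & ((1 << (bits_total - bits_k1)) - 1)   # low bits ride along with S2 = (j, k2)

k_rec = (k1 << (bits_total - bits_k1)) | k2    # decoder receiving both descriptions
assert k_rec == k                              # recovers k and reads X0^N(i, j, k)
```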
Error events:

• E_x: X^N \notin T^N(X)

• E_1: for every i, (X_1^N(i), X^N) \notin T^N(X_1, X)

• E_2: for every j, (X_2^N(j), X^N) \notin T^N(X_2, X)

• E_{12}: for every (i, j), (X_1^N(i), X_2^N(j), X^N) \notin T^N(X_1, X_2, X)

• E_0: there exist i, j such that (X_1^N(i), X_2^N(j), X^N) \in T^N(X_1, X_2, X), but for every k \in \{1, \ldots, 2^{N R_0}\}, (X_1^N(i), X_2^N(j), X_0^N(i, j, k), X^N) \notin T^N(X_1, X_2, X_0, X)
Properties of the typical set imply P[Ex ] → 0 as N → ∞.
A now standard proof, bounding the probability that independent sequences with marginals matching a joint distribution wind up in the jointly typical set, shows that if R_{11} > I(X_1; X) and if R_{22} > I(X_2; X), then P[E_1] and P[E_2] go to 0 as N → ∞.
The novel idea in the proof is the bounding of P[E_{12}] via a clever application of Chebyshev's inequality. In this vein, for each x \in \mathcal{X}^N, define the random set

    G_x = \left\{ (i, j) \in \{1, \ldots, 2^{N R_{11}}\} \times \{1, \ldots, 2^{N R_{22}}\} \,\middle|\, (X_1^N(i), X_2^N(j), x) \in T^N(X_1, X_2, X) \right\}    (9)

and denote, naturally, the cardinality of this set by |G_x|.
Noting that the event E_{12} is equivalent to the event |G_{X^N}| = 0, and applying Chebyshev's inequality

    P\left[ |X - E[X]| \ge \lambda \sqrt{var(X)} \right] \le \frac{1}{\lambda^2},

which implies, by selecting \lambda = E[X]/\sqrt{var(X)}, that P[|X - E[X]| \ge E[X]] \le \frac{var(X)}{(E[X])^2},    (10)

we have

    P[E_{12} \mid E_x^c] = P\left[ |G_{X^N}| = 0 \mid E_x^c \right] \le P\left[ \left| |G_{X^N}| - E[|G_{X^N}|] \right| \ge E[|G_{X^N}|] \right] \le \frac{var(|G_{X^N}|)}{(E[|G_{X^N}|])^2}    (11)
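The second-moment step in (11) can be given a quick numerical illustration: if |G_x| is modeled, purely for intuition, as a Binomial(K, q) count of typical (i, j) pairs (K and q are illustrative numbers, not from the notes), the bound var/E^2 controls the probability of an empty G_x:

```python
import numpy as np

rng = np.random.default_rng(4)

# Model |G_x| as a Binomial(K, q) count of "typical" (i, j) pairs.
K, q, trials = 1000, 0.004, 200000
G = rng.binomial(K, q, size=trials)
mean, var = K * q, K * q * (1 - q)

print("empirical P[|G| = 0]       :", (G == 0).mean())
print("second-moment bound var/E^2:", var / mean**2)
```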
The mean and the variance can be bounded by viewing the cardinality as a sum of indicator functions. For x \in T^N(X),

    E[|G_x|] = \sum_{i,j} P\left[ (X_1^N(i), X_2^N(j), x) \in T^N(X_1, X_2, X) \right] = \sum_{i,j} \; \sum_{x_1, x_2 : (x_1, x_2, x) \in T^N(X_1, X_2, X)} p_{X_1^N(i)}(x_1) \, p_{X_2^N(j)}(x_2)    (12)

             \ge (1-\epsilon) 2^{N(R_{11} + R_{22} - H(X_1) - H(X_2) + H(X_1, X_2|X) - c_1)}    (13)
while the variance can be bounded via (abbreviating p_{ij} := P[(X_1^N(i), X_2^N(j), x) \in T^N(X_1, X_2, X)], and p_{ij,i'j'} for the probability that both the (i, j) and the (i', j') codeword pairs are jointly typical with x)

    var(|G_x|) = \sum_{i,j,i',j'} p_{ij,i'j'} - \sum_{i,j,i',j'} p_{ij} \, p_{i'j'}    (14)

Since codeword pairs with both i' \ne i and j' \ne j are independent, p_{ij,i'j'} = p_{ij} p_{i'j'} for those terms and they cancel, leaving

    var(|G_x|) = \sum_{i,j,i' \ne i} \left( p_{ij,i'j} - p_{ij} \, p_{i'j} \right) + \sum_{i,j,j' \ne j} \left( p_{ij,ij'} - p_{ij} \, p_{ij'} \right) + \sum_{i,j} \left( p_{ij} - p_{ij}^2 \right)    (15)

Conditioning on the shared codeword and using standard typicality counting for x \in T^N(X),

    p_{ij,i'j} \le p_{ij} \max_{x_2 : (x_2, x) \in T^N(X_2, X)} P\left[ (X_1^N(i'), x_2, x) \in T^N(X_1, X_2, X) \right] \le p_{ij} \, 2^{N(-H(X_1) - H(X_2|X) + H(X_1, X_2|X) + c_2)}

    p_{ij,ij'} \le p_{ij} \max_{x_1 : (x_1, x) \in T^N(X_1, X)} P\left[ (x_1, X_2^N(j'), x) \in T^N(X_1, X_2, X) \right] \le p_{ij} \, 2^{N(-H(X_1|X) - H(X_2) + H(X_1, X_2|X) + c_2)}    (16)

Together with p_{ij} \le 2^{N(-H(X_1) - H(X_2) + H(X_1, X_2|X) + c_1)} and p_{ij} \ge (1-\epsilon) 2^{N(-H(X_1) - H(X_2) + H(X_1, X_2|X) - c_1)}, and dropping the positive diagonal contribution \sum_{i,j} p_{ij} \le 2^{N(R_{11} + R_{22} - H(X_1) - H(X_2) + H(X_1, X_2|X) + c_1)}, which is negligible next to (E[|G_x|])^2 once (22) below holds, this gives

    var(|G_x|) \le \sum_{i,j,i' \ne i} 2^{N(-H(X_1) - H(X_2) + H(X_1, X_2|X) + c_1)} \, 2^{N(-H(X_1) - H(X_2|X) + H(X_1, X_2|X) + c_2)}
               + \sum_{i,j,j' \ne j} 2^{N(-H(X_1) - H(X_2) + H(X_1, X_2|X) + c_1)} \, 2^{N(-H(X_1|X) - H(X_2) + H(X_1, X_2|X) + c_2)}    (17)
               - 2^{N(R_{11} + R_{22})} \left( (2^{N R_{11}} - 1) + (2^{N R_{22}} - 1) + 1 \right) (1-\epsilon)^2 \, 2^{2N(-H(X_1) - H(X_2) + H(X_1, X_2|X) - c_1)}    (18)

so that, absorbing the typicality constants,

    var(|G_x|) \le 2^{2N(-H(X_1) - H(X_2) + H(X_1, X_2|X) + c_1)} \left( \sum_{i,j,i' \ne i} 2^{N I(X_2; X)} + \sum_{i,j,j' \ne j} 2^{N I(X_1; X)} - (1-\epsilon)^2 2^{-4Nc_1} \, 2^{2N(R_{11} + R_{22})} \left( 2^{-N R_{11}} + 2^{-N R_{22}} - 2^{-N(R_{11} + R_{22})} \right) \right)    (19)

               = 2^{2N(R_{11} + R_{22} - H(X_1) - H(X_2) + H(X_1, X_2|X) + c_1)} \left( 2^{-N R_{22}} (1 - 2^{-N R_{11}}) 2^{N I(X_2; X)} + 2^{-N R_{11}} (1 - 2^{-N R_{22}}) 2^{N I(X_1; X)} - (1-\epsilon)^2 2^{-4Nc_1} \left( 2^{-N R_{11}} + 2^{-N R_{22}} - 2^{-N(R_{11} + R_{22})} \right) \right)    (20)
Substituting the bounds (13) and (20) into our bound (11), we have

    P\left[ \left| |G_{X^N}| - E[|G_{X^N}|] \right| \ge E[|G_{X^N}|] \right] \le (1-\epsilon)^{-2} 2^{4Nc_1} \left( 2^{-N R_{22}} (1 - 2^{-N R_{11}}) 2^{N I(X_2; X)} + 2^{-N R_{11}} (1 - 2^{-N R_{22}}) 2^{N I(X_1; X)} - (1-\epsilon)^2 2^{-4Nc_1} \left( 2^{-N R_{11}} + 2^{-N R_{22}} - 2^{-N(R_{11} + R_{22})} \right) \right)    (21)
This approaches 0 as N → ∞ for a fixed, but small enough, \epsilon, since R_{11} > I(X_1; X) and R_{22} > I(X_2; X). Then selecting

    R_{11} + R_{22} > H(X_1) + H(X_2) - H(X_1, X_2|X)    (22)

in (13) will ensure that the associated E[|G_x|] grows large rather than shrinking to zero as N → ∞.
Finally, considering the last error event,

    P[E_0] \le P\left[ \forall k \in \{1, \ldots, 2^{N R_0}\} : (X_1^N(i), X_2^N(j), X_0^N(i, j, k), X^N) \notin T^N(X_1, X_2, X_0, X) \,\middle|\, (X_1^N(i), X_2^N(j), X^N) \in T^N(X_1, X_2, X) \right]

           \le \prod_k \left( 1 - P\left[ (X_1^N(i), X_2^N(j), X_0^N(i, j, k), X^N) \in T^N(X_1, X_2, X_0, X) \,\middle|\, (X_1^N(i), X_2^N(j), X^N) \in T^N(X_1, X_2, X) \right] \right)    (23)

           \le \left( 1 - (1-\epsilon) 2^{-N(I(X_0; X | X_1, X_2) + c)} \right)^{2^{N R_0}}    (24)

           \le 1 - (1-\epsilon) + \exp\left( -2^{N R_0} 2^{-N(I(X_0; X | X_1, X_2) + c)} \right)    (25)

which can be made arbitrarily small if

    R_0 > I(X_0; X | X_1, X_2)    (26)
Putting these requirements together, we have

    R_{11} > I(X_1; X)    (27)
    R_{22} > I(X_2; X)    (28)
    R_{11} + R_{22} > H(X_1) + H(X_2) - H(X_1, X_2|X)    (29)
    R_0 > I(X_0; X | X_1, X_2)    (30)
which implies, for the rates R_1 = R_{01} + R_{11} and R_2 = R_{02} + R_{22} (with R_0 = R_{01} + R_{02} the split of the index k), the region

    R_1 \ge I(X_1; X)    (31)
    R_2 \ge I(X_2; X)    (32)
    R_1 + R_2 \ge I(X_1; X_2) + I((X_0, X_1, X_2); X)    (33)
where the last inequality was derived through

    H(X_1) + H(X_2) - H(X_1, X_2|X) + I(X_0; X | X_1, X_2) = I(X_1; X_2) + I((X_0, X_1, X_2); X)    (34)
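The identity (34) holds for any joint distribution; the following short check (with a randomly generated pmf over four binary variables, purely for illustration) verifies it numerically:

```python
import numpy as np

rng = np.random.default_rng(3)

# Random joint pmf over (X, X0, X1, X2), all binary; axes 0..3 in that order.
p = rng.random((2, 2, 2, 2))
p /= p.sum()

def H(pm):
    """Entropy in bits of a pmf array."""
    q = pm[pm > 0]
    return float(-(q * np.log2(q)).sum())

def marg(keep):
    """Marginal pmf over the axes in `keep`."""
    return p.sum(axis=tuple(sorted(set(range(4)) - set(keep))))

lhs = (H(marg({2})) + H(marg({3}))                       # H(X1) + H(X2)
       - (H(marg({0, 2, 3})) - H(marg({0})))             # - H(X1,X2|X)
       + (H(marg({1, 2, 3})) + H(marg({0, 2, 3}))        # + I(X0;X|X1,X2)
          - H(p) - H(marg({2, 3}))))
rhs = (H(marg({2})) + H(marg({3})) - H(marg({2, 3}))     # I(X1;X2)
       + H(marg({0})) + H(marg({1, 2, 3})) - H(p))       # + I((X0,X1,X2);X)
print(lhs, rhs)  # identical up to floating point rounding
```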
Adding a time sharing variable Q to convexify the region, we have

    R_1 \ge I(X_1; X | Q)    (35)
    R_2 \ge I(X_2; X | Q)    (36)
    R_1 + R_2 \ge I(X_1; X_2 | Q) + I((X_0, X_1, X_2); X | Q)    (37)
2.2 Cases where EGC is Tight
• No Excess Rate: Regarding the decoder obtaining both descriptions, it is clear that it cannot require less total rate than a decoder receiving a single message and required to meet this distortion, i.e., R_1 + R_2 \ge R(D_0). When the rates are selected so that this bound is tight, that is, for those rates such that R_1 + R_2 = R(D_0), the El Gamal Cover region gives the rate distortion region (Ahlswede).

• Gaussian Squared Error: For X_n scalar Gaussian distributed and d the squared error distortion, the EGC region is tight.
2.3 Zhang Berger Region
The Zhang Berger region grows the El Gamal Cover region by allowing a certain part of each of the two descriptions to be identical, with the remaining parts of the descriptions generated conditionally on this common part. The achievability construction is as follows:
• Codebook Generation: Generate a 2^{N R_c} \times N matrix with elements IID according to p_U, and denote the ith row of this matrix by U^N(i). For each i \in \{1, \ldots, 2^{N R_c}\}, generate a 2^{N R_{11}} \times N matrix whose rows X_1^N(i, j) are IID and distributed according to

    \prod_{n=1}^N p_{X_1|U}(x_{1,n}(i, j) \mid u_n(i))    (38)

and a 2^{N R_{22}} \times N matrix whose rows X_2^N(i, k) are IID and distributed according to

    \prod_{n=1}^N p_{X_2|U}(x_{2,n}(i, k) \mid u_n(i))    (39)

For each (i, j, k), generate a 2^{N R_0} \times N matrix whose rows X_0^N(i, j, k, \ell) are IID according to

    \prod_{n=1}^N p_{X_0|X_1,X_2,U}(x_{0,n}(i, j, k, \ell) \mid x_{1,n}(i, j), x_{2,n}(i, k), u_n(i))    (40)

Share the codebook matrices with the encoders and decoders. (A small numerical sketch of this conditional codebook generation appears after this list.)
• Encoder: Find (i, j, k, \ell) \in \{1, \ldots, 2^{N R_c}\} \times \{1, \ldots, 2^{N R_{11}}\} \times \{1, \ldots, 2^{N R_{22}}\} \times \{1, \ldots, 2^{N R_0}\} such that (U^N(i), X_1^N(i, j), X_2^N(i, k), X_0^N(i, j, k, \ell), X^N) \in T^N(U, X_1, X_2, X_0, X). The bits of the index \ell are broken up into two parts \ell = (\ell_1, \ell_2), and the two descriptions that are sent are S_1 = (i, j, \ell_1) and S_2 = (i, k, \ell_2).
• Decoder: The decoder receiving only S_1 discards \ell_1 and produces X_1^N(i, j). The decoder receiving only S_2 discards \ell_2 and produces X_2^N(i, k). The decoder receiving both descriptions produces X_0^N(i, j, k, \ell).
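Here is a minimal numerical sketch of the conditional (superposition) codebook generation in (38), assuming toy binary alphabets, a small block length, and illustrative rates and test channels; none of these specifics come from the notes:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical small instance: binary alphabets, short block, toy rates.
N, R_c, R_11 = 6, 0.5, 0.5
n_cloud, n_sat = 2 ** int(N * R_c), 2 ** int(N * R_11)

p_u = np.array([0.5, 0.5])                      # p_U
p_x1_given_u = np.array([[0.9, 0.1],            # p_{X1|U}(. | u = 0)
                         [0.1, 0.9]])           # p_{X1|U}(. | u = 1)

# Cloud centers U^N(i): rows drawn IID from p_U.
U = rng.choice(2, size=(n_cloud, N), p=p_u)

# Satellite codewords X1^N(i, j): for each cloud center i, draw each symbol
# conditionally IID from p_{X1|U}(. | u_n(i)), matching (38).
X1 = np.empty((n_cloud, n_sat, N), dtype=int)
for i in range(n_cloud):
    for n in range(N):
        X1[i, :, n] = rng.choice(2, size=n_sat, p=p_x1_given_u[U[i, n]])

print("one cloud center:      ", U[0])
print("one of its satellites: ", X1[0, 0])
```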
While we did not go into the details of the proof, we noted that this rate region obtains some extra points that are not in the
El Gamal Cover region for the binary source Hamming distortion case.
3 Successive Refinements
[Figure: an encoder observes X^N and sends a description S_1 \in \{1, \ldots, 2^{N R_1}\} to a first decoder, which produces \hat{X}_1^N with \frac{1}{N} \sum_{n=1}^N E[d(X_n, \hat{X}_{1,n})] \le D_1. A second decoder receives both S_1 and a refinement S_2 \in \{1, \ldots, 2^{N R_2}\} and produces \hat{X}_0^N with \frac{1}{N} \sum_{n=1}^N E[d(X_n, \hat{X}_{0,n})] \le D_0.]

Figure 3: The problem of successive refinement of information.
In successive refinements, we neglect one of the two decoders receiving an individual description in the multiple descriptions
problem, and we obtain the problem shown in Fig. 3. The El Gamal Cover inner bound for the multiple descriptions problem
can be specialized to this case by selecting X2 = 0. The region so obtained is
    R_1 \ge I(X_1; X)    (41)
    R_1 + R_2 \ge I((X_1, X_0); X)    (42)

3.1 EGC Tightness: Converse Proof
Successive refinements represent a special case of the more general multiple descriptions problem for which the El Gamal
Cover region is tight. We presented the proof in class. Suppose that R1 and R2 are the rates of the successive descriptions
achieving distortions D1 and D0 , respectively. Following the normal rate distortion converse
    N R_1 \ge H(\hat{X}_1^N) \ge H(\hat{X}_1^N) - H(\hat{X}_1^N | X^N) = I(\hat{X}_1^N; X^N) = H(X^N) - H(X^N | \hat{X}_1^N)
          = \sum_{n=1}^N \left( H(X_n) - H(X_n | \hat{X}_1^N, X^{n-1}) \right) \ge \sum_{n=1}^N \left( H(X_n) - H(X_n | \hat{X}_{1,n}) \right) = \sum_{n=1}^N I(X_n; \hat{X}_{1,n})    (43)

    N(R_1 + R_2) \ge H(\hat{X}_1^N, \hat{X}_0^N) \ge H(\hat{X}_1^N, \hat{X}_0^N) - H(\hat{X}_1^N, \hat{X}_0^N | X^N) = I(X^N; (\hat{X}_0^N, \hat{X}_1^N)) = H(X^N) - H(X^N | \hat{X}_0^N, \hat{X}_1^N)
          = \sum_{n=1}^N \left( H(X_n) - H(X_n | \hat{X}_0^N, \hat{X}_1^N, X^{n-1}) \right) \ge \sum_{n=1}^N \left( H(X_n) - H(X_n | \hat{X}_{0,n}, \hat{X}_{1,n}) \right) = \sum_{n=1}^N I(X_n; (\hat{X}_{0,n}, \hat{X}_{1,n}))    (44)
Rearranging, we obtain the following inequalities:

    R_1 \ge \frac{1}{N} \sum_{n=1}^N I(X_n; \hat{X}_{1,n})    (45)
    R_1 + R_2 \ge \frac{1}{N} \sum_{n=1}^N I(X_n; (\hat{X}_{0,n}, \hat{X}_{1,n}))    (46)
    D_1 \ge \frac{1}{N} \sum_{n=1}^N E[d(X_n, \hat{X}_{1,n})]    (47)
    D_0 \ge \frac{1}{N} \sum_{n=1}^N E[d(X_n, \hat{X}_{0,n})]    (48)

showing that the vector (R_1, R_2, D_1, D_0) is elementwise superior to a point in the convex hull of the rate distortion region given by (41) and (42).
3.2 Successive Refinability of a Source and Distortion Measure
Regarding each of the two decoders individually in Figure 3, it is quite clear that they cannot require less rate than a decoder which must only reproduce the source at their associated distortion. Hence, if R(D) is the rate distortion function for the source X and the distortion metric d(\cdot, \cdot), the rates in the successive refinements problem must obey the bounds R_1 \ge R(D_1) and R_1 + R_2 \ge R(D_0). A source is successively refinable if these bounds are achievable. That is, for a successively refinable source, the minimum average amount of extra information per symbol necessary to refine a description of the source at distortion D_1 to a description of the source at distortion D_0 is R(D_0) - R(D_1). A source is successively refinable if and only if for every D_0 \le D_1 there is a conditional distribution p_{\hat{X}_0, \hat{X}_1 | X} = p_{\hat{X}_0 | X} \, p_{\hat{X}_1 | \hat{X}_0} (i.e., obeying the Markov chain X \leftrightarrow \hat{X}_0 \leftrightarrow \hat{X}_1) with p_{\hat{X}_1 | X} and p_{\hat{X}_0 | X} attaining the rate distortion bounds: I(X; \hat{X}_1) = R(D_1) and I(X; \hat{X}_0) = R(D_0).
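As a worked example, the scalar Gaussian source with squared error, which is classically known to be successively refinable, has R(D) = \frac{1}{2} \log_2(\sigma^2/D); the snippet below (with illustrative numbers, not from the notes) computes the coarse rate and the incremental refinement rate R(D_0) - R(D_1):

```python
import math

# Scalar Gaussian source, squared error: R(D) = 0.5 * log2(sigma^2 / D).
sigma2 = 1.0
D1, D0 = 0.25, 0.05            # coarse then fine target distortions, D0 <= D1

def R(D):
    return 0.5 * math.log2(sigma2 / D)

print("R(D1)         =", R(D1))          # rate of the coarse description
print("R(D0)         =", R(D0))          # rate needed for the fine distortion
print("R(D0) - R(D1) =", R(D0) - R(D1))  # extra rate to refine, per source symbol
```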