Error Exponents for Block Markov Superposition
Encoding with Varying Decoding Latency
Glenn J. Bradford and J. Nicholas Laneman
Department of Electrical Engineering
University of Notre Dame, Notre Dame, Indiana 46556
Email: {gbradfor, jnl}@nd.edu
Abstract—Block Markov superposition encoding has been used
on a number of channels to enable transmitter cooperation,
including the decode-and-forward (DF) relaying scheme on the
full-duplex relay channel. We analyze the error performance of
DF with regular encoding and sliding window decoding as the
window size of the decoder is allowed to grow. Specifically, we use
Gallager’s random coding exponent to analyze the behavior of
DF in the finite block length regime where the error probability
cannot be made arbitrarily small for a fixed rate and block length.
Although using a larger decoding window may not result in a
better achievable rate in the infinite block length regime, doing so
for finite block lengths enables a higher rate of transmission for
a given error probability. In particular, these rate enhancements
can lead to a larger range of operating scenarios in which relaying
can outperform direct transmission.
I. INTRODUCTION
The fundamental analysis of a number of network communication problems relies on coding strategies that employ
block Markov superposition encoding, e.g., the full-duplex
relay channel [1] and the multiple-access channel (MAC) with
generalized feedback [2]. Such encoding allows nodes in a network to cooperatively send information, potentially enlarging
the rate region beyond that possible with independent transmissions. The available channel uses are divided into blocks, and
in each block new information is sent to the cooperating node
superimposed on information from the previous block, which
is being sent jointly to the destination. Several variations on
the approach exist (see [3] for a summary), of which we focus
on Carleial’s regular encoding and sliding window decoding
[2].
In regular encoding, codebooks for the new and old information in a given block of channel uses are the same size.
A sliding window decoder uses a window of L contiguous
blocks to decode a given chunk of information, the window
starting with the first block in which the information appears.
We use Gallager’s random coding exponent [4] to assess the
finite block length performance of decode-and-forward (DF)
using regular encoding/sliding window decoding on the relay
channel. In particular, we are interested in the behavior of the
decoding error probability as we vary the size of the sliding
window and thus the decoding delay for a given chunk of
information. Even though a window size of two is sufficient to
achieve the full rate of DF in the infinite block length regime,
we will see that allowing the window to be larger when the
block length is finite can result in superior probability of error
decay and thus a better rate-reliability tradeoff.
Others have examined error exponents for block Markov
superposition coding on the relay channel [5] but have not
addressed the relationship between performance and sliding
window size. Error exponents for half-duplex relays have also
been investigated for several relaying schemes [6]–[8] and are
relevant to the case at hand, but such schemes do not employ
block Markov superposition coding. The work addressing the
error exponents of multihop networks [9]–[12] is also closely
related, as the number of blocks into which the channel uses
are divided affects both the rate and reliability of a given
scheme. We will see that there are a number of connections
between the block Markov error exponents and those of the
MAC [13]–[16].
In the sequel, bold variables will denote vectors, with $\mathbf{1}$ being the all-ones vector. Capital letters in calligraphic script will denote sets (e.g., $\mathcal{X}$), the complement of which will be denoted by a superscript $c$ (e.g., $\mathcal{X}^c$). $\mathcal{N}(\mu, \sigma^2)$ will denote a Gaussian random variable with mean $\mu$ and variance $\sigma^2$.
II. SYSTEM MODEL
The memoryless relay channel $(\mathcal{X} \times \mathcal{U},\, p(y,v \mid x,u),\, \mathcal{Y} \times \mathcal{V})$ [1] has two input alphabets at the source and relay, $\mathcal{X}$ and $\mathcal{U}$, two output alphabets at the destination and relay, $\mathcal{Y}$ and $\mathcal{V}$, and channel transition probability $p(\cdot,\cdot \mid \cdot,\cdot)$. The alphabets can be either discrete or continuous. We are allowed a total of D channel uses to send a message W representing $DR_{\mathrm{eff}}$ nats of information, where $R_{\mathrm{eff}}$ is the effective rate of transmission in
nats per channel use. Block Markov superposition encoding is
the foundation for proving the DF achievable rate of the relay
channel. We focus on Carleial’s regular encoding and sliding
window decoding [2] for which the encoding and decoding
processes can be described as follows [3].
1) Encoding: The D channel uses are split into B blocks of equal length N. Next, the message W is split into B − 1 submessages, each of which represents NR nats of information, where $R = \frac{B}{B-1} R_{\mathrm{eff}}$. Fix an input distribution $p(x|u)p(u)$.

In each block b of N channel uses, randomly generate $M = \exp(NR)$ codewords for the relay of length N, $\mathbf{u}_b(i_1)$, $i_1 \in \{1, \ldots, M\}$, according to $p(\mathbf{u}) = \prod_{k=1}^{N} p(u_k)$. For each $\mathbf{u}_b(i_1)$, generate M codewords for the source, $\mathbf{x}_b(i_2|i_1)$, $i_2 \in \{1, \ldots, M\}$, according to $p(\mathbf{x}|\mathbf{u}) = \prod_{k=1}^{N} p(x_k|u_k)$.
This method of codebook generation results in a superposition code with cloud centers and satellites. Block Markov encoding dictates that in block b, message $W_{b-1}$ selects the cloud center and message $W_b$ selects the satellite. Thus the source transmits $\mathbf{x}_b(W_b|W_{b-1})$ and the relay $\mathbf{u}_b(\tilde{W}_{b-1})$, where $\tilde{W}_{b-1}$ is the relay's estimate of $W_{b-1}$. Note that $W_0 := W_B := 1$. Figure 1 graphically depicts the mapping from messages to codewords.

Fig. 1. Block Markov superposition encoding: in block b the relay sends the cloud center $\mathbf{u}_b(W_{b-1})$ and the source the satellite $\mathbf{x}_b(W_b|W_{b-1})$, from $\mathbf{u}_1(1)$, $\mathbf{x}_1(W_1|1)$ in the first block through $\mathbf{u}_B(W_{B-1})$, $\mathbf{x}_B(1|W_{B-1})$ in the last.
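As a concrete, purely illustrative sketch of this codebook construction (not part of the original scheme), the following generates the cloud centers and satellites for one block under the Gaussian input distribution used later in Sec. IV, $U \sim \mathcal{N}(0,1)$ and $X|U \sim \mathcal{N}(\sqrt{\alpha}U, 1-\alpha)$. The function name and interface are ours.

```python
import numpy as np

def generate_block_codebooks(N, R, alpha, rng):
    """Random superposition codebooks for one block of length N.

    cloud[i1]         : relay codeword u_b(i1), drawn i.i.d. N(0, 1)
    satellite[i1, i2] : source codeword x_b(i2 | i1), superimposed on u_b(i1)
    """
    M = int(np.ceil(np.exp(N * R)))            # M = exp(NR) codewords
    cloud = rng.standard_normal((M, N))        # cloud centers, p(u) = N(0, 1)
    innov = rng.standard_normal((M, M, N))     # independent satellite innovations
    satellite = np.sqrt(alpha) * cloud[:, None, :] + np.sqrt(1 - alpha) * innov
    return cloud, satellite

rng = np.random.default_rng(0)
cloud, sat = generate_block_codebooks(N=8, R=0.3, alpha=0.25, rng=rng)
print(cloud.shape, sat.shape)                  # (M, N) and (M, M, N)
```

Storing all $M^2$ satellites is exponential in NR, so this is feasible only for toy parameters; a practical implementation would regenerate codewords from a shared seed.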
2) Decoding: Maximum likelihood (ML) decoding is performed at both relay and destination. Because the relay must cooperate in the next block, it is forced to estimate message $W_b$ after receiving the observations of block b. Its ML decoding rule is

$$\tilde{W}_b = \arg\max_{i}\; p(\mathbf{v}_b \mid \mathbf{x}_b(i \mid \tilde{W}_{b-1}),\, \mathbf{u}_b(\tilde{W}_{b-1})) \qquad (1)$$
where its estimate of the previous message, W̃b−1 , is used
in the decoding. Note that it is not obvious ML decoding is
optimal at the relay.
At the destination, an ML sliding window decoder of
length L jointly decodes (JD) messages (Wb , . . . , Wb+L−1 )
by choosing the indices î = (î0 , . . . , îL−1 ) that maximize the
likelihood of the L blocks of observations,
$$L^{\mathrm{JD}}(\mathbf{i}) = \prod_{l=0}^{L-1} p(\mathbf{y}_{b+l} \mid \mathbf{x}_{b+l}(i_l \mid i_{l-1}),\, \mathbf{u}_{b+l}(i_{l-1})) \qquad (2)$$
where i−1 := Ŵb−1 , but only keeps the estimate of message
Wb , i.e., Ŵb = î0 . Subsequent messages are decoded after
the reception of future blocks, a distinction that will prove
important. Individual ML decoding of Wb could result in better
performance, but the analysis is more difficult [14]. Note that
(2) does not account for error propagation at the relay.
The structure of the superposition code gives us the flexibility to treat the satellite in the last block of the window
as noise (NS). In this case, the ML decoding rule is slightly
modified, with one less message decoded, î = (î0 , . . . , îL−2 ),
and a likelihood function of
$$L^{\mathrm{NS}}(\mathbf{i}) = p(\mathbf{y}_{b+L-1} \mid \mathbf{u}_{b+L-1}(i_{L-2})) \cdot \prod_{l=0}^{L-2} p(\mathbf{y}_{b+l} \mid \mathbf{x}_{b+l}(i_l \mid i_{l-1}),\, \mathbf{u}_{b+l}(i_{l-1})). \qquad (3)$$
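To make the window structure of the JD rule (2) concrete, the sketch below gives a brute-force rendering for the AWGN model of Sec. IV, where maximizing the likelihood reduces to minimizing Euclidean distance. It reuses the hypothetical codebooks from the encoding sketch above; like (2), it does not model error propagation at the relay, and the $O(M^L)$ search is purely illustrative.

```python
import itertools
import numpy as np

def jd_window_decode(y_blocks, clouds, sats, prev_hat, gSD, gRD):
    """Brute-force sliding-window JD rule (2) on the AWGN relay channel.

    y_blocks : list of the L received blocks y_b, ..., y_{b+L-1}
    clouds[l], sats[l] : codebooks for block b+l (see encoding sketch)
    prev_hat : i_{-1} := previously decided message estimate
    Returns only the estimate of W_b, i.e., the first index of the best tuple.
    """
    L, M = len(y_blocks), clouds[0].shape[0]
    best_dist, w_hat = np.inf, None
    for i in itertools.product(range(M), repeat=L):   # all candidate (i_0,...,i_{L-1})
        prev, dist = prev_hat, 0.0
        for l in range(L):
            # block b+l carries satellite x(i_l | i_{l-1}) over cloud u(i_{l-1})
            mean = np.sqrt(gSD) * sats[l][prev, i[l]] + np.sqrt(gRD) * clouds[l][prev]
            dist += np.sum((y_blocks[l] - mean) ** 2)
            prev = i[l]
        if dist < best_dist:
            best_dist, w_hat = dist, i[0]             # keep only the estimate of W_b
    return w_hat
```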
3) Problem Statement: Our interest lies in the finite block length regime, in which the error probability cannot be made arbitrarily small for fixed rate and number of channel uses. Under the DF scheme with regular encoding/sliding window decoding, define the probability of decoding error averaged over all codebooks and codewords as $\epsilon = \Pr[\hat{W} \neq W]$. Define $E(R_{\mathrm{eff}})$ as the exponential decay of $\epsilon$ with D,

$$E(R_{\mathrm{eff}}) := -\lim_{D\to\infty} \frac{\log(\epsilon)}{D}. \qquad (4)$$

We are interested in determining a lower bound on $E(R_{\mathrm{eff}})$ and thus a rate-reliability tradeoff for the relay channel using the techniques of [4]. Furthermore, we are interested in how this rate-reliability tradeoff is affected by the sliding window size employed at the destination. Although L = 2 is sufficient for proof of the DF achievable rate, we will see that by allowing a larger window size we can potentially obtain a larger error exponent and thus superior probability of error decay.

III. PROBABILITY OF ERROR ANALYSIS

To begin, following closely the analysis of [1] we upper bound $\epsilon$ in terms of the decoding error probabilities at the relay and destination in each block. Define the events for the incorrect decoding of message $W_b$ at the relay and destination as $F_b = \{\tilde{W}_b \neq W_b\}$ and $G_b = \{\hat{W}_b \neq W_b\}$, respectively, and their union as $H_b = F_b \cup G_b$. Then we have the following.
$$\epsilon = \Pr\left[\bigcup_{b=1}^{B-1} G_b\right] \qquad (5)$$
$$\le \Pr\left[\bigcup_{b=1}^{B-1} H_b\right] \qquad (6)$$
$$\le \sum_{b=1}^{B-1} \Pr[H_b \cap H_{b-1}^c]. \qquad (7)$$
The probability in the sum of (7) is that of incorrectly decoding
Wb at either the relay or destination given that Wb−1 was
correctly decoded at both nodes. This probability can be
further bounded as
$$\Pr[H_b \cap H_{b-1}^c] = \Pr[(F_b \cup G_b) \cap H_{b-1}^c] \qquad (8)$$
$$= \Pr[F_b \cap H_{b-1}^c] + \Pr[F_b^c \cap G_b \cap H_{b-1}^c] \qquad (9)$$
$$\le \Pr[F_b \mid F_{b-1}^c] + \Pr[G_b \mid F_b^c \cap G_{b-1}^c] \qquad (10)$$
$$= \epsilon_R + \epsilon_D \qquad (11)$$
where the definitions of ǫR and ǫD should be clear from the
context. Similar to (4), we define ER (R) and ED (R) as the
exponential decay of ǫR and ǫD , respectively, but with respect
to N, the codeword length in a given block. Recall that $R = \frac{B}{B-1} R_{\mathrm{eff}}$. Continuing the bounding of $\epsilon$ starting at (7),
$$\epsilon \le (B-1)(\epsilon_R + \epsilon_D) \qquad (12)$$
$$\le (B-1)\left[\exp(-N E_R(R)) + \exp(-N E_D(R))\right] \qquad (13)$$
$$\le 2(B-1)\exp(-N \min[E_R(R), E_D(R)]), \qquad (14)$$
which gives a lower bound on the error exponent of the relay channel employing DF of

$$E(R_{\mathrm{eff}}) \ge \frac{1}{B}\min\{E_R(R), E_D(R)\} - \frac{\log 2(B-1)}{D}. \qquad (15)$$

Note that if D is large relative to B, the last term of (15) vanishes. What remains is to determine $E_R(R)$ and $E_D(R)$.
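In numeric form, bounds (14) and (15) are straightforward; the minimal sketch below assumes the per-block exponents $E_R(R)$ and $E_D(R)$ have already been evaluated at $R = \frac{B}{B-1}R_{\mathrm{eff}}$ (the helper names are ours).

```python
import numpy as np

def epsilon_upper_bound(E_R, E_D, B, N):
    """Bound (14): average error probability from per-block exponents."""
    return 2 * (B - 1) * np.exp(-N * min(E_R, E_D))

def exponent_lower_bound(E_R, E_D, B, D):
    """Bound (15): E(R_eff) >= (1/B) min{E_R, E_D} - log(2(B-1))/D."""
    return min(E_R, E_D) / B - np.log(2 * (B - 1)) / D
```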
A. Derivation of ER (R)
We seek to lower bound the exponential decay of ǫR ,
the probability that the relay incorrectly decodes Wb when
Wb−1 is perfectly known. By the symmetry of the random
codebook generation, we can focus on Wb = 1 for all b. Event
$\{F_b \mid F_{b-1}^c\}$ occurs if for some index $i \neq 1$ the likelihood
in (1) with W̃b−1 = Wb−1 is greater than that of the true
index i = 1. There are M − 1 such indices. The superposition
structure of the codebook prevents a direct application of the
coding theorem [4, Thm. 5.6.1] to bound ǫR , as xb (i|1) and
xb (1|1) both depend on ub (1).
This dependence can be accounted for by following a
similar technique employed for the MAC [13]: first condition
on a given realization of the cloud center ub (1); apply the
coding theorem of [4, Thm. 5.6.1] for input distribution p(x|u)
to upper bound ǫR when ub (1) is the cloud center; and then
finally take the expectation of the bound on ǫR over the
distribution on the cloud centers, p(u). The resulting upper
bound for ǫR can be rearranged to determine a lower bound
on ER (R), given by
$$E_R(R) \ge \max_{0\le\rho\le 1,\; p(x,u)} \left[E_{01}(\rho, p(x,u)) - \rho R\right] \qquad (16)$$

where

$$E_{01}(\rho, p(x,u)) = -\log \sum_{\mathcal{U},\mathcal{V}} p(u)\left[\sum_{\mathcal{X}} p(x \mid u)\, p(v \mid x,u)^{\frac{1}{1+\rho}}\right]^{1+\rho}. \qquad (17)$$
The optimization in (16) is over all possible joint input
distributions p(x, u) and a tilting parameter ρ.
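For finite alphabets, (16)-(17) can be evaluated directly. The sketch below grids $\rho$ over $[0,1]$ for a fixed input distribution rather than also optimizing over $p(x,u)$ as in (16); the toy binary channel at the end is a hypothetical stand-in for the (generally continuous) relay channel.

```python
import numpy as np

def E01(rho, p_u, p_x_u, p_v_xu):
    """Exponent (17) for finite alphabets: p_u[u], p_x_u[u,x], p_v_xu[u,x,v]."""
    inner = np.einsum('ux,uxv->uv', p_x_u, p_v_xu ** (1.0 / (1.0 + rho)))
    return -np.log(np.sum(p_u[:, None] * inner ** (1.0 + rho)))

def ER_lower_bound(R, p_u, p_x_u, p_v_xu, grid=101):
    """Bound (16) for a fixed input distribution: grid search over rho."""
    return max(E01(rho, p_u, p_x_u, p_v_xu) - rho * R
               for rho in np.linspace(0.0, 1.0, grid))

# Toy binary example (hypothetical numbers): the relay sees x through a BSC(0.1).
p_u = np.array([0.5, 0.5])
p_x_u = np.array([[0.5, 0.5], [0.5, 0.5]])
p_v_xu = np.array([[[0.9, 0.1], [0.1, 0.9]],
                   [[0.9, 0.1], [0.1, 0.9]]])
print(ER_lower_bound(0.1, p_u, p_x_u, p_v_xu))   # positive exponent at R = 0.1
```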
B. Derivation of ED (R)
At the destination, the analysis is more involved due to the
sliding window decoder. Recall that ǫD is the probability Wb is
incorrectly decoded given that Wb and Wb−1 were correctly
decoded at the relay and destination, respectively. Although
the JD decoder estimates messages (Wb , . . . , Wb+L−1 ), we are
only concerned with the probability that it incorrectly decodes
$W_b$. That is, the maximizing set of indices of (2) has $\hat{i}_0 \neq 1$, where we again assume $W_b = 1$ for all b. This error event can be divided into $2^{L-1}$ disjoint events, which we label $t \in \{1, \ldots, 2^{L-1}\}$, each of which entails incorrectly decoding $W_b$
and some unique combination of errors in decoding the other
L−1 messages. This partitioning of the error is reminiscent of
the treatment of the MAC channel in [13] except, as in [14],
we are only concerned with the decoding outcomes for one of
the messages.
Error event t can be represented by a unique set of indices
T ⊆ {1, . . . , L − 1}, where l ∈ T implies the event results in
îl 6= 1. Now ǫD can be bounded by the sum of the probabilities
of each possible error event
$$\epsilon_D = \sum_t \Pr[\text{type } t \text{ error occurs}] \qquad (18)$$
$$\le \sum_t \Pr[L(\mathbf{i} \text{ of type } t) \ge L(\mathbf{1})], \qquad (19)$$
and we define $\epsilon_t := \Pr[L(\mathbf{i} \text{ of type } t) \ge L(\mathbf{1})]$. If L is small relative to N, then $E_D(R)$ is well approximated by the slowest exponential decay among the error event probabilities

$$E_D(R) \ge \min_t\left[E_t^{\mathrm{JD}}(R)\right] - \frac{\log(2^{L-1})}{N} \qquad (20)$$

where $E_t^{\mathrm{JD}}(R)$ is the exponential decay of $\epsilon_t$ under the decoding rule (2).
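Bound (20) is a minimum over all $2^{L-1}$ error types. As a sketch, assuming the per-type exponents have been precomputed into a dictionary keyed by the set $\mathcal{T}$ (our convention, not the paper's):

```python
import numpy as np
from itertools import chain, combinations

def ED_lower_bound(exponent_of_type, L, N):
    """Bound (20): worst per-type exponent over all subsets T of {1,...,L-1},
    minus the union penalty log(2^(L-1))/N. exponent_of_type maps a
    frozenset T to a precomputed value of E_t^JD(R)."""
    all_T = chain.from_iterable(combinations(range(1, L), k) for k in range(L))
    return min(exponent_of_type[frozenset(T)] for T in all_T) \
        - (L - 1) * np.log(2) / N
```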
We now proceed to lower bound $E_t^{\mathrm{JD}}(R)$ for a given error type t by again applying [4, Thm. 5.6.1]. A decoding error will occur whenever the likelihood (2) for some $\mathbf{i}$ of error type t is larger than that of the correct message sequence $\mathbf{i} = \mathbf{1}$. For a type t error, there are $(M-1)^T$, with $T = |\mathcal{T}|+1$, such possible
i. Furthermore, referring to Fig. 1 we see that, if in block l
the error type decodes both messages correctly, the error event
and correct message sequences have the same codewords,
xb+l (1|1) and ub+l (1). Thus the likelihood contribution of
block l, p(yb+l |xb+l , ub+l ), is identical for both and can be
dropped from the product in (2).
The coding theorem [4, Thm. 5.6.1] can now be applied to the remaining terms in (2) to upper bound $\epsilon_t$. If t incorrectly decodes the cloud center in block l, i.e., $\hat{i}_{l-1} \neq 1$, then the
codewords of the error sequence in that block were generated independently of the true codewords. As in Sec. III-A,
when t differs only in the satellite codeword in block l, the
corresponding codewords in l are no longer independent from
the true codewords. Again, we must condition on the cloud
centers of all such blocks before applying [4, Thm. 5.6.1]
and then take the expectation over the cloud centers on which
we have conditioned. After applying the coding theorem and
rearranging the result to expose the exponential dependence on
N , the random coding bound on EtJD (R) can be summarized
by the following proposition.
Proposition 1. Let $\mathcal{T} \subseteq \{1, \ldots, L-1\}$ be the subset representing error event t under the JD decoding rule (2). The exponential decay of probability for this event can be lower bounded as

$$E_t^{\mathrm{JD}}(R) \ge \max_{\rho,\; p(x,u)} \left[\,|Q_0| E_0 + |Q_1| E_{01} - T\rho R\,\right], \qquad (21)$$

where $T = |\mathcal{T}| + 1$, $Q_0 = \{l : l-1 \in \mathcal{T}\cup\{0\} \setminus \{L-1\}\}$, and $Q_1 = \{l : l \in \mathcal{T}\cup\{0\},\; l-1 \notin \mathcal{T}\cup\{0\}\}$. Furthermore,

$$E_0(\rho, p(x,u)) = -\log \sum_{\mathcal{Y}} \left[\sum_{\mathcal{U},\mathcal{X}} p(x,u)\, p(y \mid x,u)^{\frac{1}{1+\rho}}\right]^{1+\rho} \qquad (22)$$

$$E_{01}(\rho, p(x,u)) = -\log \sum_{\mathcal{U},\mathcal{Y}} p(u)\left[\sum_{\mathcal{X}} p(x \mid u)\, p(y \mid x,u)^{\frac{1}{1+\rho}}\right]^{1+\rho}. \qquad (23)$$
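The index sets in Prop. 1 are pure bookkeeping, which a short sketch makes explicit (our helper, with the convention that message index 0, i.e., $W_b$, is always in error):

```python
def jd_error_type_sets(T_set, L):
    """Prop. 1 bookkeeping. S = T ∪ {0} are the wrongly decoded messages.
    Q0: blocks whose cloud center is wrong (contribute E0);
    Q1: blocks where only the satellite is wrong (contribute E01)."""
    S = set(T_set) | {0}
    Q0 = {l for l in range(L) if (l - 1) in (S - {L - 1})}
    Q1 = {l for l in range(L) if l in S and (l - 1) not in S}
    return len(S), Q0, Q1

print(jd_error_type_sets({1, 3}, L=4))   # -> (3, {1, 2}, {0, 3})
```

In the printed example the bound (21) reads $2E_0 + 2E_{01} - 3\rho R$: two blocks with wrong cloud centers and two blocks where only the satellite is wrong.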
When the destination uses the NS decoding rule (3), it does
not attempt to decode message Wb+L−1 but rather treats the
satellite codeword in block b + L − 1 as noise. This leads to
a slightly different result for EtNS (R).
Proposition 2. Let $\mathcal{T} \subseteq \{1, \ldots, L-2\}$ be the subset representing error event t under the NS decoding rule (3). The exponential decay in probability for this event can be lower bounded as

$$E_t^{\mathrm{NS}}(R) \ge \max_{\rho,\; p(x,u)} \left[\,|Q_0| E_0 + |Q_1| E_{01} + Q_2 E_{02} - T\rho R\,\right], \qquad (24)$$

where $T$, $Q_1$, $E_0$, and $E_{01}$ are as in Prop. 1. Furthermore, $Q_0 = \{l : l-1 \in \mathcal{T}\cup\{0\} \setminus \{L-2\}\}$,

$$Q_2 = \begin{cases} 1, & \text{if } L-2 \in \mathcal{T}\cup\{0\} \\ 0, & \text{otherwise} \end{cases} \qquad (25)$$

and

$$E_{02}(\rho, p(u)) = -\log \sum_{\mathcal{Y}} \left[\sum_{\mathcal{U}} p(u)\, p(y \mid u)^{\frac{1}{1+\rho}}\right]^{1+\rho}. \qquad (26)$$

Fig. 2. Error exponents at destination ($E_D(R)$) for joint decoding of message $W_{b+L-1}$ (JD) and treating it as noise (NS) on the AWGN channel with $\gamma_{SD} = \gamma_{RD} = 7$ dB.
IV. EXAMPLE: AWGN RELAY CHANNEL
For illustration, we apply the derived results to the additive
white Gaussian noise (AWGN) relay channel. Assuming a unit
power constraint at both source and relay, the input-output
relationship can be expressed as
$$Y = \sqrt{\gamma_{SD}}\, X + \sqrt{\gamma_{RD}}\, U + Z \qquad (27)$$
$$V = \sqrt{\gamma_{SR}}\, X + Z_R \qquad (28)$$

with the noise terms $Z, Z_R \sim \mathcal{N}(0,1)$ and $\gamma_{SD}$, $\gamma_{SR}$, and $\gamma_{RD}$ being the signal-to-noise ratios on the source-destination, source-relay, and relay-destination links, respectively. The
input distributions are chosen such that $U \sim \mathcal{N}(0,1)$ and $X|U \sim \mathcal{N}(\sqrt{\alpha}\,U,\, 1-\alpha)$. The parameter $0 \le \alpha < 1$
controls the amount of cooperation between source and relay.
Independent and identically distributed Gaussian inputs are
suboptimal when evaluating random coding exponents on the
AWGN channel but are often used as they provide convenient
expressions [12].
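For reference, one block of the channel (27)-(28) can be simulated as follows (a sketch; the signature is ours):

```python
import numpy as np

def awgn_relay_channel(x, u, gSD, gRD, gSR, rng):
    """One block of the AWGN relay channel (27)-(28); Z, Z_R ~ N(0, 1)."""
    y = np.sqrt(gSD) * x + np.sqrt(gRD) * u + rng.standard_normal(x.shape)
    v = np.sqrt(gSR) * x + rng.standard_normal(x.shape)
    return y, v

rng = np.random.default_rng(1)
y, v = awgn_relay_channel(np.zeros(8), np.zeros(8), 1.0, 1.0, 10 ** 1.5, rng)
```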
We first investigate the effect of varying the sliding window
size L on ED (R) when jointly decoding Wb+L−1 (JD) and
treating it as noise (NS). Figure 2 plots ED (R) versus R for
the AWGN channel in which γSD = γRD = 7 dB and α =
0. It is clear that by increasing L we obtain superior error
decay for both JD and NS decoding schemes at the expense of
additional decoding delay and complexity. From Fig. 1, we see
that the presence of Wb+1 in block b + 1 causes interference
in the decoding of Wb . By including block b + 2 (L = 3)
in the decoding window, we get more information about the
structure of the interference induced by Wb+1 and are better
able to eliminate this interference. Continuing to increase L
leads to this effect being repeated. More information about
Wb+2 means Wb+1 is more reliably decoded, which in turn
means Wb is more reliably decoded, and so forth.
As L increases (L = 3, 4), Fig. 2 indicates that two distinct error events are dominant at low and high rates, respectively.
At low rates this event is that of correctly decoding all messages except for Wb , i.e., T = ∅. At high rates this event is that
of incorrectly decoding all messages, i.e., T = {1, . . . , L − 1}
for JD. For the former case, the error event codewords differ
from those of the true message sequence only in blocks b and
b + 1. Allowing L to grow will not change the number of
blocks in which the codewords of the correct and incorrect
sequence differ and thus has no effect on EtJD (or EtNS ) for
this t. For the latter case, the error event codewords differ from
those of the true message sequence in all blocks. Allowing L
to grow will thus increase the distance between the codewords
and lead to an improved EtJD (or EtNS ) for this t.
These two constraints are reminiscent of the MAC error
exponent [13] and diversity-multiplexing tradeoff [16], where
the single-user and sum-rate terms dominate over the low and
high rate regions, respectively. In Fig. 2 we see that the first
error event (T = ∅) limits our performance as it does not
change with L. It is only advantageous to increase L when
we are operating at a rate R such that this bound is not active.
Finally, we note that JD has a superior bound on the second
dominant error event at higher rates and NS a superior bound
at lower rates. JD and NS have identical bounds for the first
dominant error event.
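Using the hypothetical index-set helper from after Prop. 1, the two dominant types' exponent coefficients for, e.g., L = 4 can be read off directly:

```python
# Dominant error types for JD with L = 4, via jd_error_type_sets from Sec. III-B.
L = 4
for T_set in [set(), {1, 2, 3}]:               # T = {} and T = {1, ..., L-1}
    T, Q0, Q1 = jd_error_type_sets(T_set, L)
    print(sorted(T_set), '->', f'{len(Q0)}*E0 + {len(Q1)}*E01 - {T}*rho*R')
# []        -> 1*E0 + 1*E01 - 1*rho*R   (fixed as L grows)
# [1, 2, 3] -> 3*E0 + 1*E01 - 4*rho*R   (improves as L grows)
```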
Fig. 3. Effective rate of transmission ($R_{\mathrm{eff}}$) versus total channel uses (D) for direct transmission and DF relaying (NS L = 2, 3, 4, 5) with $\epsilon \le 10^{-4}$ on an AWGN channel with $\gamma_{SD} = \gamma_{RD} = 0$ dB, $\gamma_{SR} = 15$ dB.

Now we return to the original problem, that of the rate-reliability tradeoff for the relay channel with a sliding window
decoder. We wish to know the maximum rate Reff we can
send information for a fixed number of channel uses D and
constraint on the allowable decoding error probability ǫ. We
use (13) to bound ǫ in terms of ER (R) and ED (R), which
are in turn bounded by (16) and (20), respectively. Recall that
the evaluation of bound (20) requires the use of either Prop. 1
or 2. We have the freedom to optimize over the cooperation
parameter α, the number of blocks B, and the tilting parameter
ρ. Given that D = N B is fixed, N must be large enough to
satisfy the constraint on ǫ, but B should be as large as possible
to reduce the rate penalty, $R_{\mathrm{eff}} = \frac{B-1}{B} R$. For certain network
geometries, allowing a larger decoding window size L enables
a smaller N to obtain the same ǫ and thus enables a larger B
and increased Reff .
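This optimization can be sketched as a grid search. Here `E_min_of_R`, a callable returning $\min\{E_R(R), E_D(R)\}$ after optimizing $\alpha$ and $\rho$, is assumed precomputed and is our notational device, not the paper's:

```python
import numpy as np

def best_effective_rate(D, eps_max, E_min_of_R, R_grid, B_max=12):
    """Grid-search sketch: for each block count B (N = D // B) and rate R,
    accept (B, R) if bound (14) meets the target error probability, and
    return the best achievable effective rate R_eff = (B-1)/B * R."""
    best = 0.0
    for B in range(2, B_max + 1):
        N = D // B                               # equal-length blocks
        for R in R_grid:
            if 2 * (B - 1) * np.exp(-N * E_min_of_R(R)) <= eps_max:
                best = max(best, (B - 1) / B * R)
    return best
```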
Figure 3 plots Reff versus D using NS decoding with L =
2, 3, 4, 5 for one such network geometry where γSD = γRD = 0
dB, γSR = 15 dB, and ǫ ≤ 10−4 . Direct transmission with
twice the power, γSD = 3 dB, is also plotted. Note that NS
with L = 2 is the scheme used to prove the achievable rate for
DF in the infinite block length regime, but is inferior to direct
transmission up to D = 2000. However, allowing just one
additional block in the window, L = 3, results in relaying
being preferable to direct transmission at a much smaller
number of channel uses, D = 500. From this D onward, NS
L = 5 dominates all other schemes. At D = 2000, it gives
approximately a 17 percent increase in Reff over both direct
transmission and NS L = 2 relaying.
Fig. 4. Optimum number of blocks B versus total channel uses (D) for DF relaying (NS L = 2, 3, 4, 5) with $\epsilon \le 10^{-4}$ on an AWGN channel with $\gamma_{SD} = \gamma_{RD} = 0$ dB, $\gamma_{SR} = 15$ dB.

Figure 4 plots the optimum B for each scheme as a function of D, giving an indication of the mechanism by which larger window sizes are able to provide higher effective rates. Roughly speaking, from the plot it can be seen that a larger window size typically has a larger optimum number of blocks.
A larger L enables a higher R and/or shorter N for the same
ǫD , both of which lead to a higher Reff . Note that the optimum
B is not monotonic due to the integer nature of B.
REFERENCES

[1] T. Cover and A. El Gamal, "Capacity theorems for the relay channel," IEEE Trans. Inf. Theory, Sep. 1979.
[2] A. Carleial, "Multiple-access channels with different generalized feedback signals," IEEE Trans. Inf. Theory, Nov. 1982.
[3] G. Kramer, M. Gastpar, and P. Gupta, "Cooperative strategies and capacity theorems for relay networks," IEEE Trans. Inf. Theory, Sep. 2005.
[4] R. Gallager, Information Theory and Reliable Communication. Wiley, 1968.
[5] Q. Li and C. N. Georghiades, "On the error exponent of the wideband relay channel," in Proc. EUSIPCO, Sep. 2006.
[6] E. Yilmaz, R. Knopp, and D. Gesbert, "Error exponents for backhaul-constrained parallel relay networks," in Proc. IEEE PIMRC, Sep. 2010.
[7] H. Q. Ngo, T. Quek, and H. Shin, "Amplify-and-forward two-way relay networks: Error exponents and resource allocation," IEEE Trans. Commun., Sep. 2010.
[8] P. Deng, Y. Liu, G. Xie, X. Wang, B. Tang, and J. Sun, "Error exponents for two-hop Gaussian multiple source-destination relay channels," Science China Information Sciences, 2012.
[9] M. Sikora, J. Laneman, M. Haenggi, D. Costello, and T. Fuja, "Bandwidth- and power-efficient routing in linear wireless networks," IEEE Trans. Inf. Theory, Jun. 2006.
[10] W. Zhang and U. Mitra, "Multihopping strategies: An error-exponent comparison," in Proc. IEEE ISIT, Jun. 2007.
[11] N. Wen and R. Berry, "Reliability constrained packet-sizing for linear multi-hop wireless networks," in Proc. IEEE ISIT, Jul. 2008.
[12] O. Oyman, "Reliability bounds for delay-constrained multi-hop networks," 2008. Available: http://arxiv.org/abs/0810.5098
[13] R. Gallager, "A perspective on multiaccess channels," IEEE Trans. Inf. Theory, Mar. 1985.
[14] L. Weng, S. Pradhan, and A. Anastasopoulos, "Error exponent regions for Gaussian broadcast and multiple-access channels," IEEE Trans. Inf. Theory, Jul. 2008.
[15] T. Guess and M. Varanasi, "Error exponents for maximum-likelihood and successive decoders for the Gaussian CDMA channel," IEEE Trans. Inf. Theory, Jul. 2000.
[16] D. Tse, P. Viswanath, and L. Zheng, "Diversity-multiplexing tradeoff in multiple-access channels," IEEE Trans. Inf. Theory, Sep. 2004.