A Short Course on Polar Coding: Theory and Applications

Prof. Erdal Arıkan
Electrical-Electronics Engineering Department, Bilkent University, Ankara, Turkey

Center for Wireless Communications, University of Oulu, 23-25 May 2016

Table of Contents
L1: Information theory review
L2: Gaussian channel
L3: Algebraic coding
L4: Probabilistic coding
L5: Channel polarization
L6: Polar coding
L7: Origins of polar coding
L8: Coding for bandlimited channels
L9: Polar codes for selected applications
Lecture 1 – Information theory review

Objective
◮ Establish notation
◮ Review the channel coding theorem

Reference for this part: T. Cover and J. Thomas, Elements of Information Theory, 2nd ed., Wiley: 2006.
Notation and conventions - I
◮ Upper case letters X, U, Y, ... denote random variables
◮ Lower case letters x, u, y, ... denote realization values
◮ Script letters 𝒳, 𝒴, ... denote alphabets
◮ X^N = (X_1, ..., X_N) denotes a vector of random variables
◮ X_i^j = (X_i, ..., X_j) denotes a sub-vector of X^N
◮ Similar notation applies to realizations: x^N and x_i^j

Notation and conventions - II
◮ P_X(x) denotes the probability mass function (PMF) of a discrete rv X; we also write X ~ P_X(x)
◮ Likewise, we use the standard notation P_{X,Y}(x, y), P_{X|Y}(x|y) to denote the joint and conditional PMFs of pairs of discrete rvs
◮ For simplicity, we drop the subscripts and write P(x), P(x, y), etc., when there is no risk of ambiguity

Entropy
The entropy of X ~ P(x) is defined as
    H(X) = Σ_{x ∈ 𝒳} P(x) log (1/P(x))
◮ H(X) is a non-negative concave function of the PMF P_X
◮ H(X) = 0 iff X is deterministic
◮ H(X) ≤ log |𝒳| with equality iff P_X is uniform over 𝒳

Binary entropy function
For X ~ Bern(p), i.e., X = 1 with prob. p and X = 0 with prob. 1 − p, the entropy is given by
    H(X) = H(p) ≜ −p log₂(p) − (1 − p) log₂(1 − p)
(Figure: plot of H(p) versus p, rising from 0 at p = 0 to 1.0 at p = 0.5 and falling back to 0 at p = 1.)
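These definitions are easy to check numerically. Below is a small Python sketch (an illustration added here, not part of the lecture) that computes H(X) for an arbitrary PMF and the binary entropy function:

```python
import numpy as np

def entropy(pmf):
    """Entropy in bits of a discrete PMF given as an array of probabilities."""
    p = np.asarray(pmf, dtype=float)
    p = p[p > 0]                      # terms with P(x) = 0 contribute nothing
    return float(-np.sum(p * np.log2(p)))

def binary_entropy(p):
    """H(p) = -p log2 p - (1-p) log2 (1-p)."""
    return entropy([p, 1.0 - p])

print(binary_entropy(0.5))                  # 1.0, the maximum log|X| for a binary alphabet
print(binary_entropy(0.11))                 # about 0.5
print(entropy([0.25, 0.25, 0.25, 0.25]))    # 2.0 = log2(4), uniform over four symbols
```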
Joint entropy
◮ Joint entropy of (X, Y) ~ P(x, y):
    H(X, Y) = Σ_{(x,y) ∈ 𝒳×𝒴} P(x, y) log (1/P(x, y))
◮ Conditional entropy of X given Y:
    H(X|Y) = H(X, Y) − H(Y) = Σ_{(x,y) ∈ 𝒳×𝒴} P(x, y) log (1/P(x|y))
◮ H(X|Y) ≥ 0 with equality iff X is a function of Y
◮ H(X|Y) ≤ H(X) with equality iff X and Y are independent

Fano’s inequality
◮ For any pair of jointly distributed rvs (X, Y) over a common alphabet 𝒳, the “probability of error”
    Pe ≜ Pr(X ≠ Y)
  satisfies
    H(X|Y) ≤ H(Pe) + Pe log(|𝒳| − 1) ≤ 1 + log |𝒳|.
• Thus, if H(X|Y) is bounded away from zero, so is Pe.

Chain rule
◮ For any pair of rvs (X, Y),
    H(X, Y) = H(X) + H(Y|X)
    H(X, Y) = H(Y) + H(X|Y)
◮ H(X, Y) ≤ H(X) + H(Y) with equality iff X and Y are independent.

Chain rule - II
For any random vector X^N = (X_1, ..., X_N),
    H(X^N) = H(X_1) + H(X_2|X_1) + · · · + H(X_N|X^{N−1}) = Σ_{i=1}^N H(X_i|X^{i−1}) ≤ Σ_{i=1}^N H(X_i)
with equality iff X_1, ..., X_N are independent.
Mutual information
◮ For any (X, Y) ~ P(x, y), the mutual information between them is defined as
    I(X; Y) = H(X) − H(X|Y).
◮ Alternatively,
    I(X; Y) = H(Y) − H(Y|X)
  or
    I(X; Y) = H(X) + H(Y) − H(X, Y)

Mutual information bounds
We have
    0 ≤ I(X; Y) ≤ min{H(X), H(Y)}
with
◮ I(X; Y) = 0 iff X and Y are independent
◮ I(X; Y) = min{H(X), H(Y)} iff X is a function of Y or vice versa

Conditional mutual information
◮ For any three-part ensemble (X, Y, Z) ~ P(x, y, z), the mutual information between X and Y conditional on Z is defined as
    I(X; Y|Z) = H(X|Z) − H(X|YZ)
◮ Alternatively,
    I(X; Y|Z) = H(Y|Z) − H(Y|XZ) = H(X|Z) + H(Y|Z) − H(X, Y|Z)
◮ Examples exist for both I(X; Y|Z) < I(X; Y) and I(X; Y|Z) > I(X; Y)

Chain rule of mutual information
For any ensemble (X^N, Y) ~ P(x_1, ..., x_N, y), we have
    I(X^N; Y) = I(X_1; Y) + I(X_2; Y|X_1) + · · · + I(X_N; Y|X^{N−1}) = Σ_{i=1}^N I(X_i; Y|X^{i−1})
Data processing theorem
If X → Y → Z form a Markov chain, i.e., if P(z|yx) = P(z|y) for all x, y, z, then
    I(X; Z) ≤ I(X; Y).
Proof: Use the chain rule to expand I(X; YZ) in two different ways.
◮ I(X; YZ) = I(X; Y) + I(X; Z|Y) = I(X; Y) by the Markov property
◮ I(X; YZ) = I(X; Z) + I(X; Y|Z) ≥ I(X; Z)

Discrete memoryless channels (DMC)
◮ A DMC is a conditional probability assignment {W(y|x) : x ∈ 𝒳, y ∈ 𝒴} for two discrete alphabets 𝒳, 𝒴.
◮ We write W : 𝒳 → 𝒴 or simply W to denote a DMC
◮ 𝒳 is called the channel input alphabet
◮ 𝒴 is called the channel output alphabet
◮ W is called the channel transition probability matrix

Channel coding
Channel coding is an operation to achieve reliable communication over an unreliable channel. It has two parts.
◮ An encoder that maps messages to codewords
◮ A decoder that maps channel outputs back to messages

Block code
Given a channel W : 𝒳 → 𝒴, a block code with length N and rate R is such that
◮ the message set consists of integers {1, ..., M = 2^{NR}}
◮ the codeword for each message m is a sequence x^N(m) of length N over 𝒳
◮ the decoder operates on channel output blocks y^N over 𝒴^N and produces estimates m̂ of the transmitted message m
◮ the performance is measured by the probability of frame (block) error, also called the frame error rate (FER), defined as
    Pe = Pr(m̂ ≠ m)
  where m is the transmitted message, assumed equiprobable over the message set, and m̂ denotes the decoder output.

Channel capacity
The capacity C(W) of a DMC W : 𝒳 → 𝒴 is defined as the maximum of I(X; Y) over all probability assignments of the form
    P_{X,Y}(x, y) = Q(x) W(y|x)
where Q is an arbitrary probability assignment over the channel input alphabet 𝒳, or briefly,
    C(W) = max_{Q(x)} I(X; Y).

Channel coding theorem
For any fixed rate R < C(W) and ε > 0, there exist block coding schemes with rate R and Pe < ε, provided the code block length N can be chosen as large as desired.
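As a concrete illustration of C(W) = max_Q I(X; Y), the short Python sketch below (my own example; the binary symmetric channel and the grid search are choices made here, not part of the lecture) evaluates I(X; Y) over a grid of input distributions Q and confirms that the maximum is attained by the uniform input, giving C = 1 − H(ε):

```python
import numpy as np

def mutual_information(Q, W):
    """I(X;Y) in bits for input PMF Q (length |X|) and channel matrix W (|X| x |Y|)."""
    Pxy = Q[:, None] * W                  # joint PMF P(x,y) = Q(x) W(y|x)
    Py = Pxy.sum(axis=0)                  # output marginal
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = np.where(Pxy > 0, Pxy * np.log2(Pxy / (Q[:, None] * Py)), 0.0)
    return terms.sum()

eps = 0.11                                 # BSC crossover probability (example value)
W = np.array([[1 - eps, eps],
              [eps, 1 - eps]])             # W(y|x) for x, y in {0, 1}

qs = np.linspace(0.001, 0.999, 999)        # grid of input distributions Q = (q, 1-q)
caps = [mutual_information(np.array([q, 1 - q]), W) for q in qs]
print(max(caps))                                       # ~0.50, attained near q = 0.5
print(1 + eps*np.log2(eps) + (1-eps)*np.log2(1-eps))   # closed form 1 - H(eps) ~ 0.50
```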
Lecture 2 – Additive White Gaussian Noise (AWGN) channel

Objective: Review the basic AWGN channel
Topics
◮ Discrete-time and continuous-time Gaussian channel
◮ Signaling over a Gaussian channel
◮ The union bound

Reference for this part: David Forney, Lecture Notes for Course 6.452 Principles of Digital Communication II, Spring 2005. Available online: http://ocw.mit.edu.
Discrete-time (DT) AWGN channel
The input at time i is a real number x_i; the output is given by
    y_i = x_i + z_i
where the noise sequence {z_i} over the entire time frame is i.i.d. Gaussian ~ N(0, σ²).

Capacity of the DT-AWGN channel
If a block code {x^N(m) : 1 ≤ m ≤ M} is employed subject to a “power constraint”
    Σ_{i=1}^N x_i²(m) ≤ NP,    1 ≤ m ≤ M,
the capacity is given by
    C = (1/2) log₂(1 + P/σ²) bits.

Continuous-time (CT) AWGN channel
This is a waveform channel whose output is given by
    y(t) = x(t) + w(t)
where x(t) is the channel input and w(t) is white Gaussian noise with power spectral density N₀/2.

Capacity of the CT-AWGN channel
If signaling over the CT-AWGN channel is restricted to waveforms {x(t)} that are time-limited to [0, T], band-limited to [−W, W], and power-limited to P, i.e.,
    ∫₀^T x²(t) dt ≤ PT,
then the capacity is given by
    C[b/s] = W log₂(1 + P/(N₀ W)) bits/sec.
DT model for the CT-AWGN model
◮ By Nyquist theory, each use of the CT-AWGN channel with signals of duration T and bandwidth W gives rise to 2WT independent DT-AWGN channels.
◮ It is customary to use the DT channels in pairs of “in-phase” and “quadrature” components of a complex number.
◮ Accordingly, the capacity of the two-dimensional (2D) DT-AWGN channels derived from a CT-AWGN channel is given by
    C_2D = log₂(1 + Es/N₀)  bits/2D or bits/Hz
  where Es ≜ P/W is the signal energy per 2D (J/2D or J/Hz).

Signal-to-Noise Ratio
◮ Primary parameters in an AWGN channel are: signal bandwidth W (Hz), signal power P (Watt), noise power spectral density N₀/2 (Joule/Hz).
◮ Capacity equals C[b/s] = W log₂(1 + P/(N₀ W)).
◮ Define SNR = P/(N₀ W) to write C[b/s] = W log₂(1 + SNR).
◮ Writing SNR = (P/2W)/(N₀/2), SNR can be interpreted as the signal energy per real dimension divided by the noise energy per real dimension.
◮ For 2D complex signalling, one may write SNR = (P/W)/N₀ and interpret SNR as the signal energy per 2D divided by the noise energy per 2D.
Signal energy per 2D: Es
◮ Definition: Es ≜ P/W (joules)
◮ Es can be interpreted as the signal energy per two dimensions.
◮ For 2D (complex) signalling, Es is the signal energy.
◮ For 1D (real) signalling, Es/2 is the energy per signal.
◮ Note that SNR = Es/N₀, and one may write
    C[b/2D] = log₂(1 + Es/N₀)

Spectral efficiency ρ and data rate R
◮ ρ is defined as the number of bits per two dimensions sent over the AWGN channel. Units: bits/two-dimensions or b/2D.
◮ R is defined as the number of bits per second sent over the AWGN channel. Units: bits/sec or b/s.
◮ Since there are W 2D dimensions per second (W 2D/s), we have R = ρW.
◮ Since ρ = R/W, the units of ρ can also be expressed as b/s/Hz (bits per second per Hertz).
Normalized SNR
◮ Shannon’s law says that for reliable communication one has to have
    ρ < log₂(1 + SNR),  or equivalently  SNR > 2^ρ − 1.
◮ This motivates the definition
    SNRnorm ≜ SNR/(2^ρ − 1).
◮ The Shannon limit now reads SNRnorm > 1 (0 dB).
◮ The value of SNRnorm (in dB) for an operational system measures the “gap to capacity”, indicating how much room there is for improvement.

Another measure of signal-to-noise ratio: Eb/N₀
◮ The energy per bit is defined as
    Eb ≜ Es/ρ,
  and the signal-to-noise ratio per information bit as
    Eb/N₀ ≜ Es/(ρN₀) = SNR/ρ.
◮ Shannon’s limit written in terms of Eb/N₀ becomes
    Eb/N₀ > (2^ρ − 1)/ρ.
◮ The function (2^ρ − 1)/ρ is an increasing function of ρ > 0 and, as ρ → 0, approaches ln 2 ≈ 0.69 (−1.59 dB), which is called the ultimate Shannon limit on Eb/N₀.
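To make these limits concrete, here is a small Python sketch (my own illustration, not from the notes) that evaluates the Shannon limit on Eb/N₀ for a few spectral efficiencies:

```python
import numpy as np

def shannon_limit_ebno_db(rho):
    """Minimum Eb/N0 (in dB) for reliable communication at spectral efficiency rho b/2D."""
    return 10 * np.log10((2**rho - 1) / rho)

for rho in [0.001, 0.5, 1, 2, 4, 8]:
    print(f"rho = {rho:6.3f} b/2D  ->  Eb/N0 > {shannon_limit_ebno_db(rho):6.2f} dB")
# As rho -> 0 the limit approaches 10*log10(ln 2) = -1.59 dB, the ultimate Shannon limit.
# For rho = 2 (uncoded 2-PAM / 4-QAM) the limit is 10*log10(3/2) = 1.76 dB.
```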
Power-limited and band-limited regimes
◮ Operation over an AWGN channel is classified as “power-limited” if SNR ≪ 1 and “band-limited” if SNR ≫ 1.
◮ The Shannon limit on the spectral efficiency can be approximated as
    ρ < log₂(1 + SNR) ≈ SNR log₂ e  for SNR ≪ 1;   ≈ log₂ SNR  for SNR ≫ 1.
◮ In the power-limited regime, the Shannon limit on ρ is doubled by doubling the SNR (a 3 dB increase); in the band-limited case, doubling the SNR increases the Shannon limit by only 1 b/2D.

Band-limited regime
(Figure: Capacity and Bandwidth Tradeoff — capacity (b/s) versus SNR (dB) for W = 1 and W = 2, with the band-limited regime indicated.)
◮ Doubling the bandwidth almost doubles the capacity in the deep band-limited regime.
◮ Doubling the bandwidth has small effect if the SNR is low (power-limited regime).
Power-limited regime
(Figure: Capacity and Bandwidth Tradeoff — capacity (b/s) versus W (dBHz) for P/N₀ = 1 and P/N₀ = 2, with the power-limited regime indicated.)
◮ Doubling the SNR almost doubles the capacity in the deep power-limited regime.
◮ Doubling the SNR increases the capacity by not more than 1 b/2D in the band-limited regime.

Signal constellations
◮ An N-dimensional signal constellation with size M is a set A = {a₁, ..., a_M} ⊂ R^N, where each element a_j = (a_{j1}, ..., a_{jN}) ∈ R^N is called a signal point.
◮ The average energy of the constellation is defined as
    E(A) = (1/M) Σ_{j=1}^M ||a_j||² = (1/M) Σ_{j=1}^M Σ_{i=1}^N a_{ji}².
◮ The minimum squared distance d²_min(A) is defined as
    d²_min(A) = min_{i ≠ j} ||a_i − a_j||².
◮ The average number of nearest neighbors K_min(A) is defined as the average number of nearest neighbors (at distance d_min(A)).

Signal constellation parameters
Some important derived parameters for each constellation are:
◮ Bit rate (nominal spectral efficiency): ρ = (2/N) log₂ M (b/2D)
◮ Average energy per two dimensions: Es = (2/N) E(A) (J/2D)
◮ Average energy per bit: Eb = E(A)/log₂(M) = Es/ρ (J/b)
◮ Energy-normalized figures of merit such as d²_min(A)/E(A), d²_min(A)/Es, or d²_min(A)/Eb, which are independent of scale.

Uncoded 2-PAM
A = {−α, +α}
◮ N = 1, M = 2, ρ = 2
◮ E(A) = α², Es = 2α², Eb = α²
◮ SNR = Es/N₀ = 2α²/N₀, SNRnorm = SNR/3
◮ dmin = 2α, Kmin = 1, d²min/Es = 2
◮ Probability of bit error:
    Pb(E) = Q(√SNR) = ∫_{√SNR}^∞ (1/√(2π)) e^{−u²/2} du
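A quick numeric check of the 2-PAM expression (a standalone sketch; the target values and the identity Q(x) = 0.5 erfc(x/√2) used with SciPy are my additions):

```python
import numpy as np
from scipy.special import erfc

def Q(x):
    """Gaussian tail function Q(x)."""
    return 0.5 * erfc(x / np.sqrt(2))

def pam2_ber(ebno_db):
    """Bit error probability of uncoded 2-PAM; with rho = 2, SNR = 2 Eb/N0."""
    ebno = 10 ** (ebno_db / 10)
    return Q(np.sqrt(2 * ebno))

for ebno_db in [0, 5, 9.6]:
    print(f"Eb/N0 = {ebno_db:5.1f} dB  ->  Pb = {pam2_ber(ebno_db):.2e}")
# Pb reaches roughly 1e-5 near Eb/N0 = 9.6 dB, the value quoted on the next slide.
```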
Uncoded 2-PAM performance
(Figure: Pb(E) versus Eb/N₀ (dB) for uncoded 2-PAM, with the Shannon limit at ρ = 2 and the ultimate Shannon limit marked; the gap of about 7.8 dB is indicated as the potential coding gain.)
◮ Spectral efficiency: ρ = 2 b/2D
◮ Shannon limit: Eb/N₀ > (2^ρ − 1)/ρ = 3/2 (1.76 dB)
◮ Target Pb(E) = 10⁻⁵ is achieved at Eb/N₀ = 9.6 dB
◮ Potential coding gain is 9.6 − 1.76 = 7.84 dB
◮ Ultimate coding gain is 9.6 − (−1.59) ≈ 11.2 dB as ρ → 0

Uncoded M-PAM
◮ Signal set: A = α{±1, ±3, ..., ±(M − 1)}
◮ Parameters:
  ◮ ρ = 2 log₂ M b/2D
  ◮ E(A) = α²(M² − 1)/3 J/D
  ◮ Es = 2E(A) = 2α²(M² − 1)/3 J/2D
  ◮ SNR = Es/N₀ = 2α²(M² − 1)/(3N₀)
  ◮ SNRnorm = SNR/(2^ρ − 1) = 2α²/(3N₀)
◮ Probability of symbol error, Ps(E), is given by
    Ps(E) = (2(M − 1)/M) Q(α/σ) ≈ 2Q(α/σ) = 2Q(√(3 SNRnorm))
  where σ = √(N₀/2).
Uncoded M-PAM performance
(Figure: Ps(E) versus SNRnorm (dB) for uncoded M-PAM, M ≫ 1, with the Shannon limit marked.)
◮ This curve is valid for any M-PAM with M ≫ 1.
◮ Target Ps(E) = 10⁻⁵ is achieved at SNRnorm = 8.1 dB.
◮ Shannon limit is SNRnorm = 0 dB.

Uncoded 4-QAM
A = {(−α, −α), (−α, α), (α, −α), (α, α)}.
Parameters:
◮ N = 2
◮ M = 4
◮ ρ = 2
◮ E(A) = 2α²
◮ Es = 2α²
◮ Eb = α²
◮ dmin = 2α
◮ Kmin = 2
◮ d²min/Es = 2
Uncoded M × M-QAM
◮ The signal constellation is A = A_{M-PAM} × A_{M-PAM}
◮ Parameters:
  ◮ ρ = log₂ M² = 2 log₂ M b/2D
  ◮ E(A) = 2α²(M² − 1)/3 J/2D
  ◮ Es = E(A) = 2α²(M² − 1)/3 J/2D
  ◮ SNR = Es/N₀ = 2α²(M² − 1)/(3N₀)
  ◮ SNRnorm = SNR/(2^ρ − 1) = 2α²/(3N₀)
◮ Probability of symbol error, Ps(E), is given by (see notes)
    Ps(E) ≈ 4Q(√(3 SNRnorm))

Uncoded QAM performance
(Figure: Ps(E) versus SNRnorm (dB) for uncoded QAM, with the Shannon limit marked.)
◮ Curve valid for M × M-QAM with M ≫ 1.
◮ Target Ps(E) = 10⁻⁵ achieved at SNRnorm = 8.4 dB.
◮ Gap to Shannon limit is 8.4 dB.

Cartesian product constellations
◮ Given a constellation A, define a new constellation A′ as the K-th Cartesian power of A:
    A′ = A^K = A × A × · · · × A
◮ E.g., 4-QAM is the second Cartesian power of 2-PAM.
◮ The parameters of A′ are related to those of A as follows:
  ◮ N′ = KN
  ◮ M′ = M^K
  ◮ E(A′) = K E(A)
  ◮ E′s = Es
  ◮ E′b = Eb
  ◮ ρ′ = ρ
  ◮ d′min = dmin
  ◮ K′min = K Kmin

MAP and ML decision rules
◮ Consider transmission over an AWGN channel using a constellation A = {a₁, ..., a_M}. Suppose in each use of the system a signal a_j ∈ A is selected with probability p(a_j) and sent over the channel.
◮ Given the channel output y, the receiver needs to make a decision â on which of the signal points a_j was sent. There are various decision rules.
◮ The Maximum A-Posteriori Probability (MAP) rule sets
    â_MAP = argmax_{a ∈ A} [p(a|y)] = argmax_{a ∈ A} [p(a) p(y|a)/p(y)].
◮ The Maximum Likelihood (ML) rule sets
    â_ML = argmax_{a ∈ A} [p(y|a)].
◮ ML and MAP rules are equivalent for the important special case where p(a_j) = 1/M for all j.
Minimum Distance decision rule
◮ Given an observation y, the Minimum Distance (MD) decision rule is defined as
    â_MD = argmin_{a ∈ A} ||y − a||.
◮ On an AWGN channel the ML rule is equivalent to the MD rule. This is because on an AWGN channel, with input-output relation y = a + n, the transition probability density is given by
    p(y|a) = (1/(πN₀)^{N/2}) e^{−||y−a||²/N₀}.
◮ Thus, the ML rule â_ML = argmax_{a ∈ A} [p(y|a)] simplifies to
    â_ML = argmin_{a ∈ A} ||y − a||.

Decision regions
◮ Consider a decision rule for a given N-dimensional constellation A with size M. Let R_j ⊂ R^N be the set of observation points y ∈ R^N which are decided as a_j.
◮ For a complete decision rule, the decision regions partition the observation space:
    R^N = ∪_{j=1}^M R_j;   R_j ∩ R_i = ∅, i ≠ j.
◮ Conversely, any partition of R^N into M regions defines a decision rule for N-dimensional signal constellations of size M.

Probability of decision error
◮ Let E be the decision error event. For a receiver with decision regions R_j, the conditional probability of E given that a_j is sent is
    Pr(E|a_j) = Pr(y ∉ R_j | a_j),
  while the average probability of error equals
    Pr(E) = Σ_{j=1}^M p(a_j) Pr(E|a_j).
◮ The MAP rule minimizes Pr(E).

Decision regions under the MD decision rule
◮ Under the MD decision rule, the decision regions are given by
    R_j = {y ∈ R^N : ||y − a_j||² ≤ ||y − a_i||² for all i ≠ j}
◮ The regions R_j are also called the Voronoi regions.
◮ Each region R_j is the intersection of M − 1 pairwise decision regions R_{ji} defined as
    R_{ji} = {y ∈ R^N : ||y − a_j||² ≤ ||y − a_i||²}.
  In other words, R_j = ∩_{i≠j} R_{ji}.
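A minimal sketch of the MD rule in Python (the 4-QAM constellation and noise level are my own example choices):

```python
import numpy as np

def md_decide(y, constellation):
    """Minimum-distance decision: return the index of the closest signal point to y."""
    d2 = np.sum((constellation - y) ** 2, axis=1)   # squared Euclidean distances
    return int(np.argmin(d2))

alpha = 1.0
qam4 = np.array([(-alpha, -alpha), (-alpha, alpha), (alpha, -alpha), (alpha, alpha)])

rng = np.random.default_rng(0)
sent = 3                                             # transmit (alpha, alpha)
N0 = 0.5
y = qam4[sent] + rng.normal(scale=np.sqrt(N0 / 2), size=2)  # AWGN, variance N0/2 per dimension
print(md_decide(y, qam4))                            # usually 3 at this noise level
```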
Probability of error under MD rule on AWGN
◮ Under the MD decision rule, on any additive AWGN channel we have
    Pr(E|a_j) = 1 − ∫_{R_j} p(y|a_j) dy = 1 − ∫_{R_j − a_j} p_N(n) dn
◮ This probability of error is invariant under rigid motions. (Proof is left as an exercise.) (Is this true for any additive noise?)
◮ Likewise, Pr(E) is invariant under rigid motions.
◮ Under any rigid motion (translation or rotation) of a constellation A, the Voronoi regions also move in the same way.
◮ If the mean m = (1/M) Σ_j a_j of a constellation A is not zero, we may translate it by −m to reduce the mean energy from E(A) to E(A) − ||m||² without changing Pr(E).

Probability of decision error for some constellations
◮ One can express exact error probabilities for M-PAM and (M × M)-QAM in terms of the Q function, where
    Q(x) = ∫_x^∞ (1/√(2π)) e^{−u²/2} du.   (Exercise)
◮ For 2-PAM:
    Pr(E) = Q(√(2Eb/N₀))
◮ For 4-QAM:
    Pr(E) = 1 − (1 − Q(√(2Eb/N₀)))² ≈ 2Q(√(2Eb/N₀)).
◮ However, for general constellations it becomes impractical to determine the exact error probability. Often one uses bounds and approximations instead of the exact forms.

Pairwise error probabilities
We consider MD decision rules and AWGN channels here.
◮ The pairwise error probability Pr(a_j → a_i) is defined as the probability that, conditional on a_j being transmitted, the received point y is closer to a_i than to a_j. In other words,
    Pr(a_j → a_i) = Pr(||y − a_i|| ≤ ||y − a_j|| | a_j)
◮ Recalling the pairwise error regions
    R_{ji} = {y ∈ R^N : ||y − a_j||² ≤ ||y − a_i||²},
  it can be shown that
    Pr(a_j → a_i) = (1/√(πN₀)) ∫_{d(a_i,a_j)/2}^∞ e^{−x²/N₀} dx = Q(||a_i − a_j||/√(2N₀)).

The union bound
◮ The conditional probability of error is bounded (under the MD decision rule on an AWGN channel) as
    Pr(E|a_j) ≤ Σ_{i≠j} Pr(a_j → a_i) = Σ_{i≠j} Q(||a_i − a_j||/√(2N₀)).
◮ This leads to
    Pr(E) ≤ (1/M) Σ_{j=1}^M Σ_{i≠j} Q(||a_i − a_j||/√(2N₀)).
◮ One may also use the approximation
    Pr(E) ≈ K_min(A) Q(d_min(A)/√(2N₀)).
◮ The union bound is tight at sufficiently high SNR.
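The union bound and the nearest-neighbor approximation are easy to evaluate numerically. A small sketch (the 4-QAM constellation and N₀ values are my own illustration):

```python
import numpy as np
from scipy.special import erfc

def Q(x):
    return 0.5 * erfc(x / np.sqrt(2))

def union_bound(constellation, N0):
    """Average union bound on Pr(E) under MD decoding on an AWGN channel."""
    M = len(constellation)
    total = 0.0
    for j in range(M):
        for i in range(M):
            if i != j:
                d = np.linalg.norm(constellation[i] - constellation[j])
                total += Q(d / np.sqrt(2 * N0))
    return total / M

def nn_approximation(constellation, N0):
    """Kmin(A) * Q(dmin / sqrt(2 N0)) approximation."""
    M = len(constellation)
    dists = [np.linalg.norm(constellation[i] - constellation[j])
             for j in range(M) for i in range(M) if i != j]
    dmin = min(dists)
    kmin = sum(abs(d - dmin) < 1e-9 for d in dists) / M   # average number of nearest neighbors
    return kmin * Q(dmin / np.sqrt(2 * N0))

alpha = 1.0
qam4 = np.array([(-alpha, -alpha), (-alpha, alpha), (alpha, -alpha), (alpha, alpha)])
for N0 in [1.0, 0.5, 0.2]:
    print(N0, union_bound(qam4, N0), nn_approximation(qam4, N0))
# At high SNR (small N0) the two agree closely, illustrating that the bound is tight there.
```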
Lecture 3 – Algebraic coding

Objective: Introduce the rationale for coding, discuss some important algebraic codes
Topics
◮ Why coding?
◮ Some important algebraic codes
  ◮ Reed-Muller codes
  ◮ Reed-Solomon codes
  ◮ BCH codes
Motivation for coding
◮ Simple constellations such as PAM and QAM are far from delivering Shannon’s promise. They have a large gap to the Shannon limit.
◮ Signaling schemes such as orthogonal, bi-orthogonal, and simplex achieve Shannon capacity when one can expand the bandwidth indefinitely; however, after a certain point they become impractical both in terms of complexity per bit and bandwidth limitations.
◮ Shannon’s proof shows that in the power-limited regime, the key to achieving capacity is to begin with a simple 1D or 2D constellation A, consider Cartesian powers A^N of increasingly high order, and select a subset A′ ⊂ A^N to improve the minimum distance of the constellation at the expense of spectral efficiency.

Coding and modulation
(Block diagram: binary data → channel encoder → modulator → channel → demodulator → channel decoder → binary data, with a binary interface between the coding and modulation layers.)
Coding and Modulation
◮ Design codes in a finite field F, taking advantage of the algebraic structure to simplify encoding and decoding.
◮ Algebraic codes typically map a binary data sequence u^K ∈ F₂^K into a codeword x^N ∈ F_{2^m}^N for some m ≥ 1.
◮ Modulation maps F_{2^m} into a signal set A ⊂ R^n for some n ≥ 1 (typically n = 1, 2).
◮ For example, if A = {−α, +α}, one may use the mapping 0 → +α and 1 → −α.
◮ Goal: Design codes that have large minimum Hamming distances in F₂^N (Hamming metric) and modulate them to have correspondingly large Euclidean distances.

Spectral efficiency with coding and modulation
◮ For a typical 2D signal set A ⊂ R² (such as a QAM scheme) and a binary code of rate K/N, the spectral efficiency is
    ρ = log₂ |A| · (K/N)  (b/2D)
◮ Thus, coding reduces the spectral efficiency of the uncoded constellation by a factor of K/N.
◮ It is hoped that coding will make up for the deficit in spectral efficiency by improving the distance profile of the signal set.
Binary block codes
Definition
A binary block code of length n is any subset C ⊂ {0, 1}ⁿ of the set of all binary n-tuples of length n.

Definition
A code C is called linear if C is a subspace of the vector space F₂ⁿ.

Generators of a binary linear block code
◮ Let C ⊂ F₂ⁿ be a binary linear code. Since C is a vector space, it has a dimension k and there exists a set of basis vectors G = {g₁, ..., g_k} that generate C in the sense that
    C = { Σ_{j=1}^k a_j g_j : a_j ∈ F₂, 1 ≤ j ≤ k }.
◮ Such a code C is called an (n, k) binary linear code. The set G is called the set of generators of C.
◮ An encoder for a code C with generators G can be implemented as a matrix multiplication x = aG, where G is the generator matrix whose i-th row is g_i, a ∈ F₂^k is the information word, and x is the codeword.
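A minimal sketch of encoding by generator matrix over F₂ (the particular (7,4) generator matrix below is my own illustration):

```python
import numpy as np

def encode(a, G):
    """Encode information word a (length k) with generator matrix G (k x n) over F2."""
    return np.mod(a @ G, 2)

# Example generator matrix of a (7, 4) binary linear code in systematic form [I | P].
G = np.array([[1, 0, 0, 0, 1, 1, 0],
              [0, 1, 0, 0, 1, 0, 1],
              [0, 0, 1, 0, 0, 1, 1],
              [0, 0, 0, 1, 1, 1, 1]])

a = np.array([1, 0, 1, 1])
print(encode(a, G))          # the codeword x = aG (mod 2)
```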
The Hamming weight
Definition
For x ∈ F₂ⁿ, the Hamming weight of x is defined as
    w_H(x) = number of ones in x
The Hamming weight has the following properties:
◮ Non-negativity: w_H(x) ≥ 0 with equality iff x = 0.
◮ Symmetry: w_H(−x) = w_H(x).
◮ Triangle inequality: w_H(x + y) ≤ w_H(x) + w_H(y).

The Hamming distance
Definition
For x, y ∈ F₂ⁿ, the Hamming distance between x and y is defined as
    d_H(x, y) = w_H(x − y)
The Hamming distance has the following properties for any x, y, z ∈ F₂ⁿ:
◮ Non-negativity: d_H(x, y) ≥ 0 with equality iff x = y.
◮ Symmetry: d_H(x, y) = d_H(y, x).
◮ Triangle inequality: d_H(x, y) ≤ d_H(x, z) + d_H(z, y).
Thus, the Hamming distance is a metric in the mathematical sense of the word, and the space F₂ⁿ with this metric is called the Hamming space.

Distance invariance
Theorem
The set of Hamming distances d_H(x, y) from any codeword x ∈ C to all codewords y ∈ C is independent of x, and is equal to the set of Hamming weights w_H(y) of all codewords y ∈ C.
Proof.
The set of distances from x is {d_H(x, y) : y ∈ C} = {w_H(x + y) : y ∈ C}, i.e., the set of weights of the translate x + C. But x + C = C for a linear code (why?). Taking x = 0, we obtain the proof.

Minimum distance
Definition
The code minimum distance d of a code C is defined as the minimum of d_H(x, y) over all x, y ∈ C with x ≠ y.
Remark
For a linear code, the minimum distance d equals the minimum of w_H(x) over all non-zero codewords x ∈ C.
Remark
We refer to an (n, k) code with minimum distance d as an (n, k, d) code. For example, an (n, 1) repetition code has d = n and is an (n, 1, n) code.
Euclidean Images of Binary Codes
Binary codes C are mapped to signal constellations by the mapping s : F₂ⁿ → Rⁿ, which takes x → s so that
    s_i = +α if x_i = 0,   s_i = −α if x_i = 1.

Minimum distances
◮ When a code C is mapped to a signal constellation s(C) by the mapping s defined above, Hamming distances translate to Euclidean distances as follows:
    ||s(x) − s(y)||² = 4α² d_H(x, y)
◮ Thus, the minimum code distance translates to a minimum signal distance of
    d²_min(s(C)) = 4α² d_H(C) = 4α² d.

Nominal coding gain, union bound
◮ When a code C is mapped to a signal constellation s(C), the nominal coding gain of the constellation is given by
    γ_c(s(C)) = d²_min(s(C))/(4Eb) = kd/n
◮ Every signal has the same number of nearest neighbors K_min(x) = N_d.
◮ Union bound:
    Pb(E) ≈ K_b(s(C)) Q(√(γ_c(s(C)) · 2Eb/N₀)) = (N_d/k) Q(√(2dR · Eb/N₀))
  where R = k/n is the code rate.

Decision rules
◮ Minimum distance (MD) decoding. Given a received vector r ∈ Rⁿ, find the signal point s(x) over all x ∈ C such that ||r − s(x)||² is minimized.
◮ Hard-decision decoding. Given a received vector r ∈ Rⁿ, quantize r into y ∈ F₂ⁿ and find the codeword x ∈ C closest to y in the Hamming metric.
◮ Erasure-and-error decoding. Map the received word r into a word y ∈ {0, 1, ?}ⁿ and find the codeword x closest to y, ignoring the erased coordinates (where y_k = ?).
◮ Generalized minimum distance (GMD) decoding. Apply erasure-and-error decoding by erasing successively s = d − 1, d − 3, ... positions, using the reliability metric |r_k| to prioritize erasure locations. Pick the best candidate.
Hard-decision decoding
Hard decisions are obtained by the mapping r → y such that
    y = 0 if r > 0,   y = 1 if r ≤ 0.

Performance of some early codes
(Figure: performance of some well-known codes under hard-decision decoding.)
◮ Performance is limited both by the short block lengths and by hard-decision decoding.

Reed-Muller codes (Reed, 1954), (Muller, 1954)
◮ For every m ≥ 0 and 0 ≤ r ≤ m, there exists an RM code RM(r, m) of length n = 2^m.
◮ Define the RM codes with extreme parameters as follows:
  ◮ RM(m, m) ≜ {0, 1}ⁿ with (n, k, d) = (2^m, 2^m, 1).
  ◮ RM(0, m) ≜ {0ⁿ, 1ⁿ} with (n, k, d) = (2^m, 1, n).
  ◮ RM(−1, m) ≜ {0ⁿ} with (n, k, d) = (2^m, 0, ∞).
◮ Define the remaining RM codes for m ≥ 1 and 0 ≤ r ≤ m recursively by
    RM(r, m) = {(u, u + v) | u ∈ RM(r, m − 1), v ∈ RM(r − 1, m − 1)}.
◮ This construction of RM codes is called the Plotkin construction.

Generator matrices of RM codes
◮ Let
    U₁ = [ 1 0 ; 1 1 ],    U_m = [ U_{m−1} 0 ; U_{m−1} U_{m−1} ],  m ≥ 2.
◮ For any m ≥ 1, the matrix U_m has C(m, r) rows of Hamming weight 2^{m−r}, 0 ≤ r ≤ m.
◮ The generator matrix of RM(r, m) is the submatrix of U_m consisting of the rows of Hamming weight 2^{m−r} or greater.
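A short sketch of the generator-matrix construction just described (plain NumPy; the RM(1, 3) check at the end is my own example):

```python
import numpy as np

def U(m):
    """The 2^m x 2^m matrix U_m built from U_1 = [[1,0],[1,1]] by the recursive block construction."""
    Um = np.array([[1, 0], [1, 1]], dtype=int)
    for _ in range(m - 1):
        Z = np.zeros_like(Um)
        Um = np.block([[Um, Z], [Um, Um]])
    return Um

def rm_generator(r, m):
    """Generator matrix of RM(r, m): rows of U_m with Hamming weight >= 2^(m-r)."""
    Um = U(m)
    keep = Um.sum(axis=1) >= 2 ** (m - r)
    return Um[keep]

G = rm_generator(1, 3)          # RM(1, 3): an (8, 4, 4) code
print(G.shape)                  # (4, 8), i.e. k = 1 + C(3, 1) = 4
```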
Properties of RM codes
◮ RM(r, m) is a binary linear block code with parameters (n, k, d) = (2^m, Σ_{i=0}^r C(m, i), 2^{m−r}).
◮ The dimensions satisfy the relation k(r, m) = k(r, m − 1) + k(r − 1, m − 1).
◮ The codes are nested: RM(r − 1, m) ⊂ RM(r, m).
◮ The minimum distance of RM(r, m) is d = 2^{m−r} if r ≥ 0.
◮ The number of nearest neighbors is given by
    N_d = 2^r ∏_{0 ≤ i ≤ m−r−1} (2^{m−i} − 1)/(2^{m−r−i} − 1).

Tableaux of RM codes
(Figure credit: Forney and Costello, Proc. IEEE, June 2007.)

Coding gains of various RM codes
◮ RM(m − 1, m) are single parity-check codes with nominal coding gain 2k/n, which goes to 2 (3 dB) as n → ∞. However, N_d = 2^m(2^m − 1)/2 and K_b = 2^{m−1}, which limits the coding gain.
◮ RM(m − 2, m) are Hamming codes extended by an overall parity check. These codes have d = 4. The nominal coding gain is 4k/n, which goes to 6 dB as n → ∞. The actual coding gain is severely limited since N_d = 2^m(2^m − 1)(2^m − 2)/24 and K_b → ∞.
◮ RM(1, m) (first-order RM codes) have parameters (2^m, m + 1, 2^{m−1}). They have a nominal coding gain of (m + 1)/2, which goes to infinity. These codes can achieve the Shannon limit as m → ∞. RM(1, m) generates the bi-orthogonal signal set of dimension 2^m and size 2^{m+1}.

Reed-Muller coding gains
(Figure/table of the coding gains of various RM codes.)
Decoding algorithms for RM codes
◮ Majority-logic decoding (Reed, 1954): a form of successive-cancellation (SC) decoding. Sub-optimal but fast.
◮ Soft-decision SC decoding (Schnabl-Bossert, 1995): superior to Reed’s algorithm, but slower.
◮ ML decoding using trellis representations: feasible for small code sizes.

Linear codes over finite fields
◮ An (n, k) linear code C over a finite field F_q is a k-dimensional subspace of the vector space F_q^n = (F_q)^n of all n-tuples over F_q. For q = 2, this reduces to our previous definition of binary linear codes.
◮ As a linear subspace, C has k linearly independent codewords (g₁, ..., g_k) that generate C, in the sense that
    C = { Σ_{j=1}^k a_j g_j : a_j ∈ F_q, 1 ≤ j ≤ k }.
  Thus C has q^k distinct codewords.
Reed-Solomon (RS) codes
◮ Introduced by Irving S. Reed and Gustave Solomon in 1960
◮ Can be defined over any finite field F_q
◮ An (n, k) RS code over F_q exists for any 0 ≤ k ≤ n ≤ q
◮ Encoding: Given k data symbols (f₀, ..., f_{k−1}) over F_q,
  ◮ form the polynomial f(z) = f₀ + f₁z + · · · + f_{k−1}z^{k−1}
  ◮ evaluate f(z) at each field element β_i, 1 ≤ i ≤ q, namely, compute f(β_i) = Σ_{j=0}^{k−1} f_j β_i^j, to obtain the code symbols (f(β₁), ..., f(β_q))
  ◮ truncate if necessary to obtain a code of length n < q

Properties of RS codes
◮ Maximum distance separable (MDS): an (n, k) RS code has d_min = n − k + 1, meeting the Singleton bound with equality
◮ Typically constructed over F_q with q = 2^m, with each symbol consisting of m bits
◮ Very effective at correcting burst errors confined to a small number of symbols
◮ Major applications: consumer electronics, outer code in concatenated coding schemes
◮ Decoding is usually by hard-decision:
  ◮ the Berlekamp-Massey algorithm can correct any pattern of t ≤ (n − k)/2 errors
  ◮ the Sudan-Guruswami (1999) algorithm can go beyond this bound
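A small sketch of RS encoding by polynomial evaluation. For readability it works over a prime field F_q (q = 7) rather than the F_{2^m} used in practice; the data symbols are my own example:

```python
# Reed-Solomon encoding by polynomial evaluation over a prime field F_q.
q = 7                                   # field size (prime, so arithmetic is just mod q)
k = 3                                   # number of data symbols
data = [2, 5, 1]                        # (f0, f1, f2), i.e. f(z) = 2 + 5z + z^2

def f(z):
    return sum(coef * pow(z, j, q) for j, coef in enumerate(data)) % q

codeword = [f(beta) for beta in range(q)]   # evaluate at all q field elements -> length n = q
print(codeword)                             # an (n, k) = (7, 3) RS codeword; d_min = n - k + 1 = 5
```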
RS code application: G.975 optical transmission standard
◮ The ITU-T G.975 standard (year 2000) for long-distance submarine optical transmission systems specified the RS(255,239) code as the forward error correction (FEC) method.
◮ In bits, this is a (2040, 1912) code with rate R = 0.9373.
◮ This RS code has d_min = 17 (in bytes) and can correct any pattern of 8 byte errors.
◮ The BER requirement in this application is 10⁻¹².
◮ Data throughputs of 1 - 100 Gbps are supported.
◮ G.975 RS codes continue to serve but are lately being superseded by more powerful proprietary solutions (“3rd generation FEC”) that use soft-decision decoders and provide better coding gains with higher redundancy.

Performance of RS(255,239) code: BER performance under hard-decision decoding
(Figure: BER versus Eb/N₀ (dB) for RS(255,239) and for uncoded transmission.)
Performance of RS(255,239) code
(Figure: output BER versus input BER for the RS(255,239) code.)

RS coding with concatenation
Over memoryless channels such as the AWGN channel, powerful codes may be obtained by concatenating an inner code consisting of q = 2^m codewords or signal points with an outer code over F_q.
The inner code is typically a binary block or convolutional code. The outer code is typically an RS code.
Interleaving
In a concatenated coding scheme an error in the inner code appears as a burst of errors to the outer code. To make the symbol errors made by the inner decoder look memoryless, “interleaving” is used. A two-dimensional array is prepared where outer coding is applied on the rows and inner coding is applied on the columns.
When an error occurs in the inner code, a column is affected, which appears only as a single symbol error in the outer code.

RS concatenated code application: NASA standard
◮ In the 1970s NASA used an RS/CC concatenated code
◮ The inner code is a convolutional code with rate 1/2 and 64 states
◮ The outer code is an RS(255,223) code over F₂₅₆
◮ The scheme has an overall code rate of 0.437 and a coding gain of 7.3 dB at BER 10⁻⁶

Performance of NASA concatenated code
(Figure credit: Forney and Costello, Proc. IEEE, June 2007.)
Lecture 4 – Probabilistic approach to coding

Objective: Review codes based on random-looking structures
Topics
◮ Convolutional codes
◮ Turbo codes
◮ Low-density parity-check (LDPC) codes

Convolutional codes
◮ Introduced by Peter Elias in 1955
◮ In the example, a data sequence, represented by a polynomial u(D), is multiplied by fixed generator polynomials to obtain two codeword polynomials
    y₁(D) = g₁(D)u(D),   y₂(D) = g₂(D)u(D)

State diagram representation
◮ For an encoder with memory ν, the number of states is 2^ν.
◮ For the above example, the state diagram is shown in the figure.
◮ Code performance improves with the size of the state diagram, but decoding complexity also increases.

Trellis representation
Including time in the state, we obtain the trellis diagram representation.
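A minimal sketch of a rate-1/2 convolutional encoder as polynomial multiplication over F₂ (the generator polynomials 1 + D + D² and 1 + D², the classic ν = 2 example, are my own illustrative choice):

```python
def conv_encode(u, g1, g2):
    """Rate-1/2 convolutional encoding: y1 = g1*u, y2 = g2*u over F2.
    u, g1, g2 are lists of polynomial coefficients (index = power of D)."""
    def polymul(a, b):
        out = [0] * (len(a) + len(b) - 1)
        for i, ai in enumerate(a):
            for j, bj in enumerate(b):
                out[i + j] ^= ai & bj
        return out
    return polymul(u, g1), polymul(u, g2)

g1 = [1, 1, 1]        # g1(D) = 1 + D + D^2
g2 = [1, 0, 1]        # g2(D) = 1 + D^2
u = [1, 0, 1, 1]      # data sequence u(D) = 1 + D^2 + D^3
y1, y2 = conv_encode(u, g1, g2)
print(y1, y2)         # the two codeword polynomials; interleave them for the transmitted stream
```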
Maximum-Likelihood decoding of convolutional codes
◮ ML decoding is equivalent to finding a shortest path from the beginning to the end of the trellis.
◮ This is a dynamic programming problem, with complexity exponential in the encoder memory.
◮ The trellis is usually truncated to make the search tractable.

Decoder error events
Errors occur when a path diverging from the correct path appears more likely to the ML decoder.
dfree is defined as the minimum Hamming distance between any two distinct paths through the trellis.

Union bound
The union bound for a rate R convolutional code is
    Pb ≈ Kb Q(√(γc · 2Eb/N₀))
where
◮ Kb is the average density of errored bits on an error path of weight dfree
◮ γc = dfree · R is the nominal coding gain.
The union bound is tight at high SNR.

Union bound example
Rate-1/2 convolutional code with 64 states (ν = 6).
(Figure: BER versus Eb/N₀ comparing the theoretical upper bound for ML decoding, an unquantized ML-decoding simulation, and uncoded transmission.)
Effective coding gain: γeff
The effective coding gain for a coding system on an AWGN channel with 2-PAM modulation is defined as
    γeff ≜ (Eb/N₀)|uncoded 2-PAM − (Eb/N₀)|coded
where the Eb/N₀ values (in dB) are those required to achieve a target BER.

Best known convolutional codes
Rate-1/2 binary convolutional codes
    ν    dfree   γc     γc (dB)   Kb    γeff (dB)
    1    3       1.5    1.8       1     1.8
    2    5       2.5    4.0       1     4.0
    3    6       3      4.8       2     4.6
    4    7       3.5    5.2       4     4.8
    5    8       4      6.0       5     5.6
    6    10      5      7.0       46    5.6
    6    9       4.5    6.5       4     6.1
    7    10      5      7.0       6     6.7
    8    12      6      7.8       10    7.1
◮ ν = log₂(number of states)
◮ γeff calculated at Pb = 10⁻⁶

Best known convolutional codes
Rate-1/3 binary convolutional codes
    ν    dfree   γc      γc (dB)   Kb    γeff (dB)
    1    5       1.67    2.2       1     2.2
    2    8       2.67    4.3       3     4.0
    3    10      3.33    5.2       6     4.7
    4    12      4       6.0       12    5.3
    5    13      4.33    6.4       1     6.4
    6    15      5       7.0       11    6.3
    7    16      5.33    7.3       1     7.3
    8    18      6       7.8       5     7.4

Rate-1/4 binary convolutional codes
    ν    dfree   γc      γc (dB)   Kb    γeff (dB)
    1    7       1.75    2.4       1     2.4
    2    10      2.5     4.0       2     3.8
    3    13      3.25    5.1       4     4.7
    4    16      4       6.0       8     5.6
    5    18      4.5     6.5       6     6
    6    20      5       7.0       37    6.0
    7    22      5.5     7.4       2     7.2
    8    24      6       7.8       2     7.6
Performance of convolutional codes
(Figure: performance of convolutional codes with binary antipodal signalling; from Clark & Cain, Springer, 1981.)

Tailbiting convolutional codes
◮ To eliminate the overhead due to truncation, one may use a tailbiting convolutional code.
◮ Look at the final state and start the encoder in that state.

Application: WiMAX Standard
◮ The IEEE 802.16e (WiMAX) standard specifies a mandatory tailbiting convolutional code with rate 1/2 and the generator polynomials
    g₁(D) = 1 + D + D² + D³ + D⁷,    g₂(D) = 1 + D² + D³ + D⁶ + D⁷.
◮ Codes of various other rates are obtained by puncturing this code.
    Rate            1/2     2/3       3/4          5/6
    dfree           10      6         5            4
    Punc. pat. x    1       10        101          10101
    Punc. pat. y    1       11        110          11010
    Enc. output     x1y1    x1y1y2    x1y1y2x3     x1y1y2x3y4x5

WiMAX convolutional code and modulation options
    Modulation   Rate   Payload options (bytes)    Spect. eff. (b/2D)
    QPSK         1/2    6, 12, 18, 24, 30, 36      1
    QPSK         3/4    9, 18, 27, 36              1.5
    16-QAM       1/2    12, 24, 36                 2
    16-QAM       3/4    18, 36                     3
    64-QAM       1/2    18, 36                     3
    64-QAM       2/3    27                         4
    64-QAM       3/4    36                         4.5
Effect of length on performance
◮ BER performance is insensitive to code length.
  (Figure: BER versus Eb/No for the rate-1/2 QPSK tailbiting code with payloads of 6, 12, 18, 24, 30, and 36 bytes, decoding depth 6.)
◮ FER performance deteriorates with code length.
  (Figure: FER versus Eb/No for the same configurations.)
(Simulations by Iterative Solutions Coded Modulation Library, 2007)
Turbo codes
◮ Invented in the early 1990s by Claude Berrou.
◮ Created by concatenating two (or more) codes with an interleaver between the codes
◮ At least one of the encoders is systematic
◮ Each constituent code has its own decoder
◮ Decoders exchange soft information with each other in an iterative manner

Turbo code with parallel concatenation of convolutional codes
The convolutional codes are in recursive systematic form to facilitate the exchange of soft information.
(Figure credit: Forney and Costello, Proc. IEEE, June 2007.)
Turbo decoder
The turbo decoder for a parallel concatenated turbo code uses two separate decoders that exchange soft information.
(Figure credit: Forney and Costello, Proc. IEEE, June 2007.)

Turbo code performance
Turbo codes improved the state-of-the-art by a wide margin!
(Figure credit: Forney and Costello, Proc. IEEE, June 2007.)

WiMAX Convolutional Turbo Codes (CTC)
IEEE 802.16e (WiMAX) specifies a CTC with constituent codes of rate 2/3 (“duobinary”).

WiMAX CTC Adaptive Modulation and Coding (AMC)
WiMAX CTC offers a number of AMC options with various payload sizes.
    Modulation   Rate   Spect. eff. (b/2D)   Payload options (bytes)
    QPSK         1/2    1                    12, 24, 36, 48, 60, 72, 96, 108, 120
    QPSK         3/4    1.5                  9, 18, 27, 36, 45, 54
    16-QAM       1/2    2                    24, 48, 72, 96, 120
    16-QAM       3/4    3                    18, 36, 54
    64-QAM       1/2    3                    36, 72, 108
    64-QAM       2/3    4                    36, 72
    64-QAM       3/4    4.5                  36, 72
    64-QAM       5/6    5                    36, 72
WiMAX CTC performance: QPSK, Rate 1/2
The figure shows the WiMAX CTC performance at rate 1/2 with QPSK (4-QAM) modulation and payloads ranging from 6 to 120 bytes. (The Shannon limit is Eb/No = 0.188 dB.)
(Figure: BER versus Eb/No. Simulations by Iterative Solutions Coded Modulation Library, 2007.)

WiMAX CTC performance vs spectral efficiency
The figure shows the WiMAX CTC performance as the spectral efficiency ranges over 1, 1.5, 2, 3, 4, 4.5, 5 b/2D.
(Figure: BER versus Eb/No for (120,60) QPSK, (72,54) QPSK, (120,60) 16-QAM, (72,54) 16-QAM, (108,54) 64-QAM, (72,48) 64-QAM, (72,54) 64-QAM, and (72,60) 64-QAM on the AWGN channel. Simulations by Iterative Solutions Coded Modulation Library, 2007.)

CCSDS (space telemetry) turbo code standard (1999)
(Figure: CCSDS turbo encoder block diagram.)

CCSDS turbo code payload and frame size options
The CCSDS turbo code supports a wide range of payload and frame sizes as shown in the table (all lengths are in bits). Note that there are 8 bits of termination.
CCSDS turbo code performance
◮ The CCSDS turbo code provides a performance leap over the previous standard
◮ ... but has an error floor
(Figure credit: Forney and Costello, Proc. IEEE, June 2007.)

Low-Density Parity-Check (LDPC) codes
Invented in the 1960s by Robert Gallager. The codewords are defined as solutions of the equation
    x Hᵀ = 0
where H is a sparse parity-check matrix (an example is shown in the figure).

Belief Propagation (BP) decoding algorithm
◮ Gallager gave a low-complexity decoding algorithm based on passing log-likelihood ratios (LLRs) or “beliefs” along the branches of a graph.
◮ The BP decoding algorithm converges after a number of iterations that is roughly logarithmic in the code block length.
◮ The BP algorithm is well-suited to parallel implementation, which makes LDPC codes preferable in applications requiring high throughput and low latency.

LDPC performance
Rate-1/2, length-10⁷ LDPC codes with symbol degree bound d_ℓ.
(Figure credit: Forney and Costello, Proc. IEEE, June 2007.)
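A tiny sketch of the defining relation x Hᵀ = 0 over F₂ (the small parity-check matrix below is my own toy example, not the one shown on the slide):

```python
import numpy as np

# Toy sparse parity-check matrix H of a length-6, rate-1/2 code (illustrative only).
H = np.array([[1, 1, 0, 1, 0, 0],
              [0, 1, 1, 0, 1, 0],
              [1, 0, 1, 0, 0, 1]])

def is_codeword(x, H):
    """x is a codeword iff x H^T = 0 over F2."""
    return not np.any(np.mod(x @ H.T, 2))

print(is_codeword(np.array([1, 1, 0, 0, 1, 1]), H))   # True: all three checks are satisfied
print(is_codeword(np.array([1, 0, 0, 0, 0, 0]), H))   # False: the first check fails
```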
Application: WiMAX LDPC codes
◮ WiMAX offers a number of LDPC code alternatives:
    Rate    Length
    5/6     2304
    3/4     2304
    2/3     2304
    1/2     2304
    5/6     576
    3/4     576
    2/3     576
    1/2     576
◮ These codes may require a maximum of 30 - 100 iterations for best performance.
◮ LDPC codes are not very suitable for rate adaptation.

WiMAX LDPC performance
The figure shows the performance of WiMAX LDPC coding and modulation options (“max-log-map” decoding).
(Figure: BER versus Eb/No for r = 5/6, 3/4, 2/3, 1/2 at lengths 2304 and 576. Simulations by Iterative Solutions Coded Modulation Library, 2007.)

WiMAX LDPC performance
The figure shows the effect of the maximum number of iterations on the LDPC code performance (“max-log-map”).
(Figure: BER versus Eb/No for r = 1/2, L = 576 with at most 30 and at most 100 iterations. Simulations by Iterative Solutions Coded Modulation Library, 2007.)

WiMAX LDPC performance
The figure shows the effect of the “min-sum” approximation on the LDPC code performance.
(Figure: BER versus Eb/No for r = 1/2, L = 2304 with max-log-map and with min-sum decoding. Simulations by Iterative Solutions Coded Modulation Library, 2007.)
WiMAX LDPC/CTC performance comparison
The figure shows the relative performance of WiMAX LDPC and CTC codes.
(Figure: FER versus Eb/No for Turbo (576,288), Turbo (960,480), LDPC (2304,1162), and LDPC (576,288). Simulations by Iterative Solutions Coded Modulation Library, 2007.)

WiMAX CTC/CC performance comparison
The figure shows the relative performance of WiMAX CTC and WiMAX CC codes.
(Figure: BER versus Eb/No for CC(12,6) QPSK, CC(72,36) QPSK, CTC(12,6) QPSK, and CTC(72,36) QPSK. Simulations by Iterative Solutions Coded Modulation Library, 2007.)

Summary
◮ Turbo and LDPC codes solve the coding problem for most engineering purposes.
◮ Convolutional codes still have a place for very short payloads (up to 100 bits) that need to be protected well (control channels).
◮ LDPC codes perform better at long block lengths where high reliability and high throughput are required (optical channels, video channels).
◮ Turbo codes are superior for applications where packet sizes are moderate and the reliability requirement is not too high (voice applications).
◮ Algebraic codes (RS and BCH in particular) have a role as outer codes in concatenated schemes.
Lecture 5 – Channel polarization

Objective: Explain channel polarization
Topics:
◮ Channel codes as polarizers of information
◮ Low-complexity polarization by channel combining and splitting
◮ The main polarization theorem
◮ Rate of polarization

The channel
Let W : 𝒳 → 𝒴 be a binary-input discrete memoryless channel with
◮ input alphabet: 𝒳 = {0, 1},
◮ output alphabet: 𝒴,
◮ transition probabilities: W(y|x), x ∈ 𝒳, y ∈ 𝒴.

Symmetry assumption
Assume that the channel has “input-output symmetry.”
Examples:
◮ BSC(ε): the binary symmetric channel, which delivers the input correctly with probability 1 − ε and flips it with probability ε.
◮ BEC(ε): the binary erasure channel, which delivers the input correctly with probability 1 − ε and erases it (output “?”) with probability ε.

Capacity
For channels with input-output symmetry, the capacity is given by
    C(W) = I(X; Y),   with X ~ uniform on {0, 1}.
Use base-2 logarithms: 0 ≤ C(W) ≤ 1.
The main idea
◮ The channel coding problem is trivial for two types of channels:
  ◮ Perfect: C(W) = 1
  ◮ Useless: C(W) = 0
◮ Transform ordinary W into such extreme channels

The method: aggregate and redistribute capacity
(Diagram: N copies of the original channel W (all alike, “uniform”) are combined into a vector channel W_vec, which is then split into N new, polarized bit-channels W₁, ..., W_N.)

Combining
◮ Begin with N copies of W; use a 1-1 mapping
    G_N : {0, 1}^N → {0, 1}^N
  to create a vector channel
    W_vec : U^N → Y^N
(Diagram: U₁, ..., U_N enter G_N, producing X₁, ..., X_N, each sent through its own copy of W to give Y₁, ..., Y_N.)

Conservation of capacity
The combining operation is lossless:
◮ Take U₁, ..., U_N i.i.d. uniform on {0, 1}
◮ then X₁, ..., X_N are i.i.d. uniform on {0, 1}
◮ and
    C(W_vec) = I(U^N; Y^N) = I(X^N; Y^N) = N C(W)
Splitting
◮ C(W_vec) = I(U^N; Y^N) = Σ_{i=1}^N I(U_i; Y^N, U^{i−1}) = Σ_{i=1}^N C(W_i)
◮ Define the bit-channels
    W_i : U_i → (Y^N, U^{i−1})
(Diagram: the i-th bit-channel has input U_i and sees the entire channel output Y^N together with the past inputs U^{i−1}.)

Polarization is commonplace
◮ Polarization is the rule, not the exception
◮ A random permutation G_N : {0, 1}^N → {0, 1}^N is a good polarizer with high probability
◮ Equivalent to Shannon’s random coding approach

Random polarizers: stepwise, isotropic
(Figure: bit-channel capacities produced by a random polarizer, plotted against the bit-channel index.)
Isotropy: any redistribution order is as good as any other.

The complexity issue
◮ Random polarizers lack structure and are too complex to implement
◮ Need a low-complexity polarizer
◮ May sacrifice the stepwise, isotropic properties of random polarizers in return for less complexity
Basic module for a low-complexity scheme
Combine two copies of W through G₂ (X₁ = U₁ ⊕ U₂, X₂ = U₂) and split to create two bit-channels
    W₁ : U₁ → (Y₁, Y₂)
    W₂ : U₂ → (Y₁, Y₂, U₁)

The first bit-channel W₁
    W₁ : U₁ → (Y₁, Y₂),   with U₂ treated as a uniformly random input;
    C(W₁) = I(U₁; Y₁, Y₂)

The second bit-channel W₂
    W₂ : U₂ → (Y₁, Y₂, U₁);
    C(W₂) = I(U₂; Y₁, Y₂, U₁)

Capacity conserved but redistributed unevenly
◮ Conservation:
    C(W₁) + C(W₂) = 2C(W)
◮ Extremization:
    C(W₁) ≤ C(W) ≤ C(W₂)
  with equality iff C(W) equals 0 or 1.
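To see the conservation and extremization numerically, here is a brute-force sketch (my own illustration) that computes C(W₁) and C(W₂) for a BSC(p) by enumerating the joint distribution of (U₁, U₂, Y₁, Y₂):

```python
import numpy as np
from itertools import product

def polar_step_bsc(p):
    """Return (C(W1), C(W2)) for the basic transform applied to two copies of a BSC(p)."""
    P = {}   # joint PMF of (u1, u2, y1, y2) with U1, U2 i.i.d. uniform
    for u1, u2, z1, z2 in product((0, 1), repeat=4):
        x1, x2 = u1 ^ u2, u2
        y1, y2 = x1 ^ z1, x2 ^ z2
        pr = 0.25 * (p if z1 else 1 - p) * (p if z2 else 1 - p)
        key = (u1, u2, y1, y2)
        P[key] = P.get(key, 0.0) + pr

    def H(group):   # entropy of the marginal obtained by grouping outcomes with `group`
        m = {}
        for k, pr in P.items():
            m[group(k)] = m.get(group(k), 0.0) + pr
        return -sum(pr * np.log2(pr) for pr in m.values() if pr > 0)

    c1 = H(lambda k: k[0]) + H(lambda k: k[2:]) - H(lambda k: (k[0],) + k[2:])   # I(U1; Y1 Y2)
    c2 = H(lambda k: k[1]) + H(lambda k: (k[0],) + k[2:]) - H(lambda k: k)       # I(U2; Y1 Y2 U1)
    return c1, c2

p = 0.11
c = 1 + p * np.log2(p) + (1 - p) * np.log2(1 - p)    # C(W) = 1 - H(p) for the BSC
c1, c2 = polar_step_bsc(p)
print(c1, c2, c1 + c2, 2 * c)    # c1 < c < c2 and c1 + c2 = 2 C(W)
```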
Extremality of BEC
    H(U₁|Y₁Y₂) ≤ H(X₁|Y₁) + H(X₂|Y₂) − H(X₁|Y₁) H(X₂|Y₂)
with equality iff W is a BEC.

Extremality of BSC (Mrs. Gerber’s lemma)
Let H⁻¹ : [0, 1] → [0, 1/2] be the inverse of the binary entropy function H(p) = −p log(p) − (1 − p) log(1 − p), 0 ≤ p ≤ 1/2. Then
    H(U₁|Y₁Y₂) ≥ H( H⁻¹(H(X₁|Y₁)) ∗ H⁻¹(H(X₂|Y₂)) )
with equality iff W is a BSC. (Here a ∗ b = a(1 − b) + b(1 − a) denotes binary convolution.)

Notation
The two channels created by the basic transform
    (W, W) → (W₁, W₂)
will also be denoted as
    W⁻ = W₁   and   W⁺ = W₂
Likewise, we write W⁻⁻, W⁻⁺ for the descendants of W⁻, and W⁺⁻, W⁺⁺ for the descendants of W⁺.

For the size-4 construction ...
◮ ... duplicate the basic transform,
◮ ... obtain a pair of W⁻ and W⁺ each,
◮ ... apply the basic transform on each pair,
◮ ... decode in the indicated order,
◮ ... and obtain the four new bit-channels: U₁ sees W⁻⁻, U₃ sees W⁺⁻, U₂ sees W⁻⁺, and U₄ sees W⁺⁺.

Overall size-4 construction
(Diagram: the size-4 circuit with inputs U₁, ..., U₄, two layers of XORs, four copies of W, and outputs Y₁, ..., Y₄.)

“Rewire” for standard-form size-4 construction
(Diagram: the same circuit redrawn in standard form.)

Size 8 construction
(Diagram: inputs U₁, ..., U₈ pass through three layers of XORs to produce X₁, ..., X₈, which are sent through eight copies of W to give Y₁, ..., Y₈.)
The first bit channel W⁻
The first bit channel W⁻ is a BEC.
(Channel diagram: a BEC with erasure probability ε⁻ = 2ε − ε².)

Polarization of a BEC W
Polarization is easy to analyze when W is a BEC.
If W is a BEC(ε), then so are W⁻ and W⁺, with erasure probabilities
    ε⁻ ≜ 2ε − ε²   and   ε⁺ ≜ ε²
respectively.
The second bit channel W⁺
The second bit channel W⁺ is a BEC.
(Channel diagram: a BEC with erasure probability ε⁺ = ε².)

Polarization for BEC(1/2): N = 16
(Figure: capacities of the 16 bit-channels versus bit-channel index.)
24/26
Polarization for BEC( 21 ): N = 32
Polarization for BEC( 12 ): N = 64
Capacity of bit channels
1
0.9
0.9
0.8
0.8
0.7
0.7
0.6
0.6
Capacity
Capacity
Capacity of bit channels
1
0.5
0.5
0.4
0.4
0.3
0.3
0.2
0.2
0.1
0.1
N=32
0
5
10
15
20
25
N=64
0
30
10
20
Bit channel index
L5: Channel polarization
40
50
Recursive method
24/26
L5: Channel polarization
Recursive method
24/26
Polarization for BEC( 12 ): N = 256
Capacity of bit channels
Capacity of bit channels
1
0.9
0.9
0.8
0.8
0.7
0.7
0.6
0.6
Capacity
1
0.5
0.5
0.4
0.4
0.3
0.3
0.2
0.2
0.1
0.1
N=128
0
20
40
60
80
100
N=256
0
120
Bit channel index
L5: Channel polarization
60
Bit channel index
Polarization for BEC( 21 ): N = 128
Capacity
30
Recursive method
50
100
150
200
250
Bit channel index
24/26
L5: Channel polarization
Recursive method
24/26
Polarization for BEC( 21 ): N = 512
Polarization for BEC( 12 ): N = 1024
Capacity of bit channels
1
0.9
0.9
0.8
0.8
0.7
0.7
0.6
0.6
Capacity
Capacity
Capacity of bit channels
1
0.5
0.5
0.4
0.4
0.3
0.3
0.2
0.2
0.1
0.1
N=512
0
50
100
150
200
250
300
350
400
450
N=1024
0
500
Bit channel index
L5: Channel polarization
100
200
300
400
500
600
700
800
900
Recursive method
24/26
L5: Channel polarization
Recursive method
Polarization martingale
Theorem (Polarization, A. 2007)
1
The bit-channel capacities {C (Wi )} polarize: for any
δ ∈ (0, 1), as the construction size N grows
no. channels with C (Wi ) > 1 − δ
−→ C (W )
N
C (W ++ )
C (W2 )
C (W +− )
C (W )
1000
Bit channel index
24/26
1
1−δ
and
C (W −+ )
no. channels with C (Wi ) < δ
−→ 1 − C (W )
N
C (W1 )
Theorem (Rate of polarization, √
A. and Telatar (2008))
C (W −− )
0
1
L5: Channel polarization
2
3
4
5
Recursive method
6
7
Above theorem holds with δ ≈ 2−
8
25/26
L5: Channel polarization
N.
Recursive method
δ
0
26/26
Lecture 6 – Polar coding

Objective: Introduce polar coding
Topics
◮ Code construction
◮ Encoding
◮ Decoding
◮ Performance
Polar code example: W = BEC(1/2), N = 8, rate 1/2

The bit-channel capacities I(Wi) are computed and ranked; the K = 4 best indices carry data and the rest are frozen (a code-construction sketch reproducing this table is given below):

    I(Wi)    Rank   Assignment
    0.0039    8     frozen  U1
    0.1211    7     frozen  U2
    0.1914    6     frozen  U3
    0.6836    4     data    U4
    0.3164    5     frozen  U5
    0.8086    3     data    U6
    0.8789    2     data    U7
    0.9961    1     data    U8

[Figure: the size-8 encoder; the frozen inputs U1, U2, U3, U5 are set to 0 while U4, U6, U7, U8 carry data; the transform output is sent over eight copies of W, producing Y1–Y8]
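A minimal sketch (not from the slides) that reproduces the table above: apply the BEC erasure recursion for n = 3, convert to capacities I(Wi) = 1 − Zi, and select the K = 4 best indices as data positions. Function and variable names are illustrative.

```python
# Sketch: rate-1/2 polar code construction for W = BEC(1/2), N = 8.

def bec_bit_channel_erasure_probs(eps, n):
    z = [eps]
    for _ in range(n):
        z = [w for e in z for w in (2*e - e*e, e*e)]
    return z

eps, n, K = 0.5, 3, 4
caps = [1 - z for z in bec_bit_channel_erasure_probs(eps, n)]   # I(W_i) = 1 - Z_i for a BEC
order = sorted(range(len(caps)), key=lambda i: caps[i], reverse=True)
data_set = set(order[:K])                                        # the K best indices carry data
for i, c in enumerate(caps):
    print(f"U{i+1}: I(W_{i+1}) = {c:.4f}  ->", "data" if i in data_set else "frozen")
# Expected data positions: U4, U6, U7, U8, matching the table above.
```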
Encoding complexity

Theorem
Encoding complexity for polar coding is O(N log N).

Proof:
◮ The polar coding transform can be represented as a graph with N[1 + log(N)] variables.
◮ The graph has 1 + log(N) levels with N variables at each level.
◮ Computation begins at the source level and can be carried out level by level.
◮ Space complexity O(N), time complexity O(N log N).

Encoding: an example
[Figure: the size-8 encoder with frozen inputs set to 0 and free inputs carrying data bits; the encoded values propagate level by level through the XOR stages to the channel inputs]
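The level-by-level computation in the proof can be written as a short recursion. The sketch below (assumed, not the slides' exact graph-based schedule) implements one common form of the transform, x = u F^(⊗n) with F = [[1,0],[1,1]] and the bit-reversal permutation omitted; the complexity argument is the same: N/2 XORs per level over log2(N) levels.

```python
import numpy as np

def polar_transform(u):
    """Recursive polar transform x = u F^{(x)n}; runs in O(N log N) binary operations."""
    u = np.asarray(u, dtype=np.uint8)
    N = len(u)
    if N == 1:
        return u.copy()
    first = polar_transform(u[:N // 2] ^ u[N // 2:])   # (u_a xor u_b) F^{(x)(n-1)}
    second = polar_transform(u[N // 2:])               # u_b F^{(x)(n-1)}
    return np.concatenate([first, second])

# Example: N = 8 with the frozen bits set to 0 and data bits on the positions
# of the N = 8 example above (U4, U6, U7, U8).
u = np.zeros(8, dtype=np.uint8)
u[[3, 5, 6, 7]] = [1, 1, 0, 1]
print(polar_transform(u))
```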
Encoding: an example (continued)

[Figures: the same size-8 encoder shown step by step with the frozen bits (0) and the data bits (U4, U6, U7, U8) = (1, 1, 0, 1) filled in; the intermediate XOR values are computed level by level until the full codeword at the channel inputs is obtained]

Successive Cancellation Decoding (SCD)

Theorem
The complexity of successive cancellation decoding for polar codes is O(N log N).

Proof: Given below.
SCD: Exploit the x = |a | a + b| structure

[Figure: the size-8 encoder redrawn to expose the x = |a | a + b| structure, with intermediate variables b1–b4 and a1–a4 feeding the channel inputs x1–x8]

First phase: treat a as noise, decode (u1, u2, u3, u4)

[Figure: in the first phase the variables a1–a4 are treated as noise and (u1, u2, u3, u4) are decoded from (y1, …, y8)]

End of first phase

[Figure: after the first phase the decisions û1–û4 determine the partial sums b̂1–b̂4]

Second phase: treat b̂ as known, decode (u5, u6, u7, u8)

[Figure: in the second phase b̂1–b̂4 are treated as known side information and (u5, u6, u7, u8) are decoded from (y1, …, y8) and b̂]

First phase in detail

[Figure: the first-phase decoding problem, with a1–a4 acting as additive noise on the lower channel inputs]

Equivalent channel model

[Figure: the equivalent channel model seen by b1–b4 in the first phase]
First, second, third, and fourth copies of W−

[Figures: the equivalent model contains four copies of W−; the i-th copy connects bi to the output pair (yi, yi+4) through two uses of W, with the interfering a-variables treated as noise]

Decoding on W−

[Figure: the four copies of W− carry (u1, u2, u3, u4) through a size-4 transform; copy i has output (yi, yi+4). The inner transform again has the b = |t | t + w| structure, so the same decomposition is applied recursively]

Decoding on W−−

[Figure: combining the four copies of W− pairwise creates two copies of W−−, with intermediate variables w1, w2 and t1, t2]
Decoding on W−−−

[Figure: the two copies of W−− combine into a single copy of W−−−, with input u1 and output (y1, …, y8)]

Compute
    L−−− ≜ W−−−(y1, …, y8 | u1 = 0) / W−−−(y1, …, y8 | u1 = 1)
and set û1 = 0 if u1 is frozen; û1 = 0 if L−−− > 1; û1 = 1 otherwise.

Decoding on W−−+

[Figure: with û1 known, the two copies of W−− also define a single copy of W−−+, with input u2 and output (y1, …, y8, û1)]

Compute
    L−−+ ≜ W−−+(y1, …, y8, û1 | u2 = 0) / W−−+(y1, …, y8, û1 | u2 = 1)
and set û2 = 0 if u2 is frozen; û2 = 0 if L−−+ > 1; û2 = 1 otherwise.

A small code sketch of the two single-step channel combinations and the decision rule follows below.
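The two likelihood-ratio computations have simple closed forms when expressed as log-likelihood ratios. The sketch below is an assumed illustration (function names are mine, not from the slides): `llr_minus` is the single-step combination for a "minus" channel, `llr_plus` the combination for a "plus" channel given the earlier decision, and `decide` is the slides' decision rule. A full SC decoder applies these two updates recursively over the log2(N) levels, which is where the O(N log N) complexity comes from.

```python
import numpy as np

def llr_minus(l1, l2):
    """LLR of U1 through W-: 'check-node' combination of two channel LLRs,
    corresponding to W-(y1, y2 | u1) = (1/2) * sum_{u2} W(y1 | u1 ^ u2) W(y2 | u2)."""
    return 2.0 * np.arctanh(np.tanh(l1 / 2.0) * np.tanh(l2 / 2.0))

def llr_plus(l1, l2, u1_hat):
    """LLR of U2 through W+, once U1 has been decided (successive cancellation)."""
    return l2 + (1 - 2 * u1_hat) * l1

def decide(llr, frozen):
    """Frozen bits are set to 0; otherwise threshold the log-likelihood ratio."""
    return 0 if (frozen or llr > 0) else 1
```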
Complexity of successive cancellation decoding

◮ Let CN be the complexity of decoding a code of length N.
◮ The decoding problem of size N for W is reduced to two decoding problems of size N/2, one for W− and one for W+.
◮ So CN = 2 C_(N/2) + kN for some constant k.
◮ This gives CN = O(N log N).

Performance of polar codes

Probability of Error (A. and Telatar (2008))
For any binary-input symmetric channel W, the probability of frame error for polar coding at rate R < C(W) and using codes of length N is bounded as

    Pe(N, R) ≤ 2^(−N^0.49)

for sufficiently large N.

More refined versions of this result have been given by S. H. Hassani, R. Mori, T. Tanaka, and R. L. Urbanke (2011).
Construction complexity

Polar codes can be constructed in time O(N poly(log N)).
This result has been developed in a sequence of papers by
◮ R. Mori and T. Tanaka (2009)
◮ I. Tal and A. Vardy (2011)
◮ R. Pedarsani, S. H. Hassani, I. Tal, and E. Telatar (2011)

Gaussian approximation

◮ Trifonov (2011) introduced a Gaussian approximation technique for constructing polar codes.
◮ Dai et al. (2015) studied various refinements of the Gaussian approximation for polar code construction.
◮ These methods work extremely well, although a satisfactory explanation of why they work is still missing.
Polar coding summary

Given W, N = 2^n, and R < I(W), a polar code can be constructed such that it has
◮ construction complexity O(N poly(log N)),
◮ encoding complexity ≈ N log N,
◮ successive-cancellation decoding complexity ≈ N log N,
◮ frame error probability Pe(N, R) = o(2^(−√N + o(√N))).

Example of Gaussian approximation: polar code construction and performance estimation

[Figure: FER vs Es/N0 for a Polar(65536, 61440, 8) code with BPSK, together with the Gaussian-approximation prediction, the threshold SNR at the target FER, the BPSK Shannon limit, and the ultimate Shannon limit; the gap to ultimate capacity is 3.42 dB and the gap to BPSK capacity is 1.06 dB]
Performance improvement for polar codes

◮ Concatenation to improve minimum distance
◮ List decoding to improve SC decoder performance

Concatenation

    Method                                            Ref
    Block turbo coding with polar constituents        AKMOP (2009)
    Generalized concatenated coding with polar inner  AM (2009)
    Reed-Solomon outer, polar inner                   BJE (2010)
    Polar outer, block inner                          SH (2010)
    Polar outer, LDPC inner                           EP (ISIT'2011)

AKMOP: A., Kim, Markarian, Özgür, Poyraz
GCC: A., Markarian
BJE: Bakshi, Jaggi, and Effros
SH: Seidl and Huber
EP: Eslami and Pishro-Nik
Overview of decoders for polar codes

◮ Successive cancellation decoding: a depth-first search method with complexity roughly N log N
  ◮ Sufficient to prove that polar codes achieve capacity
  ◮ Equivalent to an earlier algorithm by Schnabl and Bossert (1995) for RM codes
  ◮ Simple, but not powerful enough to challenge LDPC and turbo codes at short to moderate lengths
◮ List decoding: a breadth-first search algorithm with limited branching (known as "beam search" in AI)
  ◮ First proposed by Tal and Vardy (2011) for polar codes
  ◮ List decoding was used earlier by Dumer and Shabunov (2006) for RM codes
  ◮ Complexity grows as O(LN log N) for a list size L; hardware implementation becomes problematic as L grows due to sorting and memory management
◮ Sphere decoding ("British Museum" search with branch and bound; starts decoding from the opposite side)

List decoder for polar codes

◮ First produce L candidate decisions
◮ Pick the most likely word from the list
◮ Complexity O(LN log N)
Polar code performance

Successive cancellation decoder
[Figure: FER vs Es/N0 for P(2048,1024) with 4-QAM under successive cancellation decoding (L = 1, no CRC)]

Improvement by list decoding: List-32
[Figure: FER vs Es/N0 for the same code with list sizes L = 1 and L = 32]

Improvement by list decoding: List-1024
[Figure: FER vs Es/N0 with list sizes L = 1, 32, and 1024]

Comparison with ML bound
[Figure: the list-decoder FER curves compared against the ML bound for P(2048,1024), 4-QAM]

Introducing CRC improves performance at high SNR
[Figure: adding a 16-bit CRC to the L = 32 list decoder, compared with the CRC-free curves and the ML bound]

Comparison with dispersion bound
[Figure: the list-32 + CRC-16 decoder for P(2048,1024) compared against the dispersion bound for blocklength 2048, rate 1/2]

Polar codes vs WiMAX Turbo Codes

Better performance obtained with List-32 + CRC
[Figure: FER vs Es/N0 for P(1024,512) with 4-QAM under L = 1, L = 32, and L = 32 + CRC-16 decoding, compared with the WiMAX CTC (960,480) and the dispersion bound for (1024,512)]

Polar codes vs WiMAX LDPC Codes

Comparable performance obtained with List-32 + CRC
[Figure: FER vs Es/N0 for P(2048,1024) with 4-QAM under L = 1, L = 32, and L = 32 + CRC-16 decoding, compared with the WiMAX LDPC(2304,1152) code (max 100 iterations) and the dispersion bound for (2048,1024)]
Polar Codes vs DVB-S2 LDPC Codes

LDPC (16200,13320), Polar (16384,13421). Rates = 0.82. BPSK-AWGN channel.
[Figure: frame error rate of the list decoder for the polar code (N = 16384, R = 37/45) with L = 1, L = 32, and L = 32 + CRC, compared with the DVB-S2 16200-bit rate-37/45 LDPC code]

Polar codes vs IEEE 802.11ad LDPC codes

Park (2014) gives the following performance comparison.
[Figure: Park's FER comparison of polar codes and the IEEE 802.11ad LDPC code]
(Park's result on LDPC conflicts with reference IEEE 802.11-10/0432r2. Whether there exists an error floor as shown needs to be confirmed independently.)

Source: Youn Sung Park, "Energy-Efficient Decoders of Near-Capacity Channel Codes," PhD Dissertation, The University of Michigan, 2014.
Summary of performance comparisons

◮ The successive cancellation decoder is the simplest, but it is inherently sequential, which limits throughput
◮ The BP decoder improves throughput and, with careful design, performance
◮ List decoding is more complex but significantly improves performance at low SNR
◮ Adding CRC to list decoding improves performance significantly at high SNR with little extra complexity
◮ Overall, polar codes under list-32 decoding with CRC offer performance comparable to codes used in present wireless standards

Implementation performance metrics

Implementation performance is measured by
◮ Chip area (mm²)
◮ Throughput (Mbits/sec)
◮ Energy efficiency (nJ/bit)
◮ Hardware efficiency (Mb/s/mm²)
Successive cancellation decoder comparisons

[Table: implementation comparison of polar decoders [1]-[3] (SC and BP architectures) with block length 1024, rate 0.5: technology (90 nm and 65 nm CMOS), core area, supply voltage, clock frequency, power, throughput, energy per bit, and hardware efficiency (Mb/s/mm²); selected figures normalized to 45 nm according to the ITRS roadmap]

[1] Y.-Z. Fan and C.-Y. Tsui, "An efficient partial-sum network architecture for semi-parallel polar codes decoder implementation," IEEE Transactions on Signal Processing, vol. 62, no. 12, pp. 3165-3179, June 2014. (Throughput 730 Mb/s calculated by technology conversion metrics.)
[2] G. Sarkis, P. Giard, A. Vardy, C. Thibeault, and W. J. Gross, "Fast polar decoders: Algorithm and implementation," IEEE Journal on Selected Areas in Communications, vol. 32, no. 5, pp. 946-957, May 2014.
[3] Y. S. Park, "Energy-efficient decoders of near-capacity channel codes," PhD Dissertation, The University of Michigan, October 2014. http://deepblue.lib.umich.edu/handle/2027.42/108731 (Performance at 4 dB SNR with average number of iterations 6.57.)

BP decoder comparisons

[Table: implementation comparison of belief-propagation and SC polar decoder architectures [1]-[4]: decoding type and scheduling (SCD with folded partial-sum network, specialized SC, BP circular unidirectional, BP circular unidirectional with reduced complexity, BP all-on fully parallel), block lengths 1024 (rate 0.5) and 16384 (rate 0.9), CMOS and Altera Stratix IV technologies, core area, supply, frequency, power, iterations, throughput (obtained by disabling the BP early-stopping rules for fair comparison), energy efficiency, energy efficiency per iteration, and area efficiency]

[1] O. Dizdar and E. Arıkan, arXiv:1412.3829, 2014.
[2] Y. Fan and C.-Y. Tsui, "An efficient partial-sum network architecture for semi-parallel polar codes decoder implementation," IEEE Transactions on Signal Processing, vol. 62, no. 12, pp. 3165-3179, June 2014.
[3] C. Zhang, B. Yuan, and K. K. Parhi, "Reduced-latency SC polar decoder architectures," arXiv, 2011.
[4] A. D. G. Biroli, G. Masera, and E. Arıkan, "High-throughput belief propagation decoder architectures for polar codes," submitted 2015.
Lecture 7 – Origins of polar coding

Objective: Relate polar codes to the probabilistic approach in coding

Topics
◮ Sequential decoding and cutoff rates
◮ Methods for boosting the cutoff rate
  ◮ Pinsker's scheme
  ◮ Massey's scheme
◮ Polar coding as a method to boost the cutoff rate to capacity
Goals

◮ Show how polar coding originated from attempts to boost the cutoff rate of sequential decoding
◮ In particular, focus on the two papers:
  ◮ Pinsker (1965), "On the complexity of decoding"
  ◮ Massey (1981), "Capacity, cutoff rate, and coding for a direct-detection optical channel"

Outline

◮ Sequential decoding
◮ Pinsker's scheme
◮ Massey's scheme
◮ Polarization

Pointwise search: 2-D or 2 x 1-D?

A basic fact about search
◮ An item is placed at random in a 2-D square grid with M bins: (X, Y) uniform over {1, …, √M}².
◮ Loss models:
  ◮ Correlated loss model: X, Y both forgotten with probability ǫ
  ◮ Independent loss model: X, Y each forgotten independently with probability ǫ
◮ 2-D search
  ◮ May ask "Is (X, Y) = (x, y)?" Receive a "Yes/No" answer.
  ◮ The number of questions until finding (X, Y) is a RV: GXY
◮ 1-D search
  ◮ May ask "Is X = x?" or "Is Y = y?" Again receive a "Yes/No" answer.
  ◮ The number of questions until finding X and Y: GX + GY

Search complexities

◮ Correlated loss
  ◮ 2-D search:  E[GXY] = (1 − ǫ)·1 + ǫ·M/2
  ◮ 1-D search:  E[GX] + E[GY] = 2[(1 − ǫ)·1 + ǫ·√M/2]
◮ Independent loss
  ◮ 2-D search:  E[GXY] = (1 − ǫ)² + 2ǫ(1 − ǫ)·√M/2 + ǫ²·M/2
  ◮ 1-D search:  E[GX] + E[GY] = 2[(1 − ǫ)·1 + ǫ·√M/2]

Which type of search is better for minimizing complexity?
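A quick numerical sketch (assumed, not from the slides) evaluates the closed-form expectations above and shows the qualitative behavior summarized on the next slide: under correlated loss the 2-D search cost blows up once M is on the order of 1/ǫ, while the 1-D search stays cheap until M is on the order of 1/ǫ².

```python
import numpy as np

def search_costs(M, eps):
    """Expected question counts from the formulas above (closed form, no simulation)."""
    sqM = np.sqrt(M)
    corr_2d = (1 - eps) + eps * M / 2                                # E[G_XY], correlated loss
    ind_2d = (1 - eps)**2 + 2*eps*(1 - eps)*sqM/2 + eps**2 * M / 2   # E[G_XY], independent loss
    one_d = 2 * ((1 - eps) + eps * sqM / 2)                          # E[G_X] + E[G_Y], both models
    return corr_2d, one_d, ind_2d

for M in (1e3, 1e6, 1e9):
    c2, o1, i2 = search_costs(M, 1e-3)
    print(f"M={M:.0e}: corr 2-D {c2:.1f}, 1-D {o1:.1f}, indep 2-D {i2:.1f}")
```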
Search complexities, cutoff

                      Correlated loss        Independent loss
    E[GXY]            O(1) if M = o(1/ǫ)     O(1) if M = o(1/ǫ²)
    E[GX] + E[GY]     O(1) if M = o(1/ǫ²)    O(1) if M = o(1/ǫ²)

◮ "Cutoff": search complexity is not O(1), but grows with M
◮ The 1-D search cutoff is better than the 2-D search cutoff under the correlated loss model
◮ The cutoffs are the same under the independent loss model

Search complexity: Conclusions drawn

In order to reduce the complexity of pointwise search for an object under noisy observations,
◮ define object features and search feature by feature;
◮ define the features such that the observation noise across them has positive correlation.
Convolutional codes, Sequential decoding, ...

◮ Convolutional codes were invented by P. Elias (1955)
◮ Sequential decoding by J. M. Wozencraft (1957)
◮ Fano's algorithm (1963), US Patent 3,457,562 (1969)
◮ SD enjoyed popularity in the 1960s
  ◮ First coding system in space
◮ Viterbi algorithm (1967)
◮ SD lost ground to the Viterbi algorithm in the 1970s and never recovered

Sequential decoding: the algorithm

SD is a search algorithm for the correct path in a tree code.
◮ SD visits nodes at level N in a certain order
◮ It forgets what it saw beyond level N upon backtracking

Sequential decoding: the metric

SD uses a "metric" to distinguish the correct path from the incorrect ones.

Fano's metric:
    Γ(y^n, x^n) = log [ P(y^n | x^n) / P(y^n) ] − nR,
where n is the path length, x^n the candidate path, y^n the received sequence, and R the code rate.

Sequential decoding: the cutoff rate

◮ Rules of the game: pointwise, no "look-ahead"
◮ SD achieves arbitrarily reliable communication at constant average complexity per bit at rates below a (computational) cutoff rate Rcomp
◮ For a channel with transition probabilities W(y|x), Rcomp equals

    R0 ≜ max_Q − log Σ_y [ Σ_x Q(x) √W(y|x) ]²

◮ Achievability: Wozencraft (1957), Reiffen (1962), Fano (1963), Stiglitz and Yudkin (1964)
◮ Converse: Jacobs and Berlekamp (1967)
◮ Refinements: Wozencraft and Jacobs (1965), Savage (1966), Gallager (1968), Jelinek (1968), Forney (1974), Arıkan (1986)

R0 as an error exponent

◮ Let GN be the number of nodes searched (visited) at level N until the correct node is found; let R be the code rate
◮ There exist codes such that
    E[GN] ≤ 1 + 2^(−N(R0 − R))
◮ For any code of rate R,
    E[GN] ≳ 1 + 2^(−N(R0 − R))
◮ Random coding exponent, (N, R) codes:  Pe ≤ 2^(−N Er(R))
◮ Union bound:  Pe ≤ 2^(−N(R0 − R)),  i.e.,  Er(R) ≥ R0 − R
R0 as a figure of merit

◮ For a while, R0 appeared as a realistic goal
  ◮ "The author does not know of any channel for which Rcomp is less than ½C, but no definite lower bound to Rcomp has yet been found." — Wozencraft and Jacobs, Principles of Communication Engineering, 1965
◮ A figure of merit in the design of modulation schemes
  ◮ Wozencraft and Kennedy, "Modulation and demodulation for probabilistic coding," IT Trans., 1966
  ◮ Massey, "Coding and modulation in digital communications," Zürich, 1974
◮ Fano (1963) wrote: [quotation reproduced on the slide]
◮ An example came in 1980 that showed R0 could be arbitrarily small as a fraction of C
◮ But in fact a paradoxical result had already come from Pinsker (1965) that showed the "flaky" nature of R0
◮ Forney gives a first-hand account of this situation in his 1995 Shannon Lecture

R0 vs C
[Figure: comparison of R0 and C]
Boosting the cutoff rate

◮ Goal: finding SD schemes with Rcomp larger than R0
◮ R0 is a fundamental limit if one follows the rules of the game:
  ◮ Single searcher
  ◮ No look-ahead
◮ To boost the cutoff rate, change one or both of these rules:
  ◮ Use multiple sequential decoders
  ◮ Provide look-ahead

Pinsker's scheme (1965)

◮ Block coding just below capacity: K/N ≈ C(W)
◮ N large, block error rate small: Pe ∼ 2^(−O(N))
◮ Each SD sees a memoryless BSC with R0 near 1
◮ Boosts the cutoff rate to capacity
[Figure: Pinsker's scheme, combining an inner block code operating just below capacity with a bank of outer sequential decoders]

A scheme that doesn't work

[Figure: a candidate combining/splitting scheme built around the channel]

Equivalent scheme

[Figure: the equivalent derived vector channel]
Cutoff rate = R0 (derived vector channel): no improvement in cutoff rate.
A conservation law for the cutoff rate

◮ "Parallel channels" theorem (Gallager, 1965):
    R0 (derived vector channel) ≤ N R0(W)
◮ "Cleaning up" the channel by pre-/post-processing can only hurt R0
◮ This shows that boosting the cutoff rate requires more than one sequential decoder

Channel splitting to boost cutoff rate (Massey, 1981)

◮ Begin with a quaternary erasure channel (QEC)
◮ Relabel the inputs
◮ Split the QEC into two binary erasure channels (BEC)
◮ The BECs are fully correlated: erasures occur jointly
[Figures: the QEC, its inputs relabeled by two bits, and the QEC redrawn as two BECs whose erasures occur jointly]
Capacity, cutoff rate for one QEC vs two BECs

    C(QEC) = 2(1 − ǫ)            C(BEC) = 1 − ǫ
    R0(QEC) = log 4/(1 + 3ǫ)     R0(BEC) = log 2/(1 + ǫ)

Cutoff rate improvement by splitting

[Figure: capacity and cutoff rate (bits) vs erasure probability ǫ: the capacity of the QEC, the cutoff rate of the QEC (ordinary coding), the cutoff rate of the BEC, and the sum cutoff rate after splitting (independent coding of the two BECs); the sum cutoff rate of the two BECs exceeds the cutoff rate of the QEC]
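A one-line numerical check (a sketch, not from the slides) of the formulas above, in bits: splitting the QEC into two fully correlated BECs never decreases the total cutoff rate.

```python
import numpy as np

eps = np.linspace(0.0, 1.0, 11)
r0_qec = np.log2(4.0 / (1.0 + 3.0 * eps))        # one sequential decoder on the QEC
r0_split = 2.0 * np.log2(2.0 / (1.0 + eps))      # independent decoders on the two BECs
print(np.round(r0_split - r0_qec, 4))            # nonnegative; zero only at eps = 0 and eps = 1
```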
Why does Massey's scheme work?

◮ Why do we have 2 R0(BEC) ≥ R0(QEC)?
◮ Let GN denote the number of guesses at level N until finding the correct node
◮ Because the erasures are correlated, GN(QEC) = GN(BEC1) GN(BEC2) = GN(BEC1)²: the joint decoder has quadratic complexity
◮ Thus,
    E[GN(QEC)] = E[GN(BEC1)²] ≥ (E[GN(BEC1)])²
◮ The second moment of GN(BEC) becomes exponentially large already at a rate below R0(BEC).

Comparison of Pinsker's and Massey's schemes

Pinsker
◮ Construct a superchannel by combining independent copies of a given DMC W
◮ Split the superchannel into correlated subchannels
◮ Ignore correlations between the subchannels; encode and decode them independently
◮ Can be used universally
◮ Can achieve capacity
◮ Not practical

Massey
◮ Split the given DMC W into correlated subchannels
◮ Ignore correlations between the subchannels; encode and decode them independently
◮ Applicable only to specific channels
◮ Cannot achieve capacity
◮ Practical

Prescription for a new scheme

◮ Consider small constructions
◮ Retain independent encoding for the subchannels
◮ Do not ignore correlations between subchannels at the expense of capacity
◮ This points to multi-level coding and successive cancellation decoding
Notation

◮ Let V : F2 = {0, 1} → Y be an arbitrary binary-input memoryless channel
◮ Let (X, Y) be an input-output ensemble for channel V with X uniform on F2
◮ The (symmetric) capacity is defined as

    I(V) ≜ I(X; Y) = Σ_{y∈Y} Σ_{x∈F2} (1/2) V(y|x) log [ V(y|x) / ((1/2)V(y|0) + (1/2)V(y|1)) ]

◮ The (symmetric) cutoff rate is defined as

    R0(V) ≜ R0(X; Y) = − log Σ_{y∈Y} [ Σ_{x∈F2} (1/2) √V(y|x) ]²
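Both definitions are easy to evaluate numerically for any binary-input channel given as a transition matrix. The sketch below (an illustration, with names of my choosing; logarithms in base 2) evaluates them for a BEC, where the result should match I = 1 − ǫ and R0 = log2(2/(1 + ǫ)).

```python
import numpy as np

def symmetric_capacity_and_cutoff(V):
    """I(V) and R0(V) in bits for a binary-input channel, V[x][y] = V(y|x), uniform input."""
    V = np.asarray(V, dtype=float)
    q = 0.5 * (V[0] + V[1])                       # output distribution under uniform input
    with np.errstate(divide='ignore', invalid='ignore'):
        terms = 0.5 * V * np.log2(np.where(V > 0, V / q, 1.0))
    I = np.nansum(terms)
    R0 = -np.log2(np.sum((0.5 * (np.sqrt(V[0]) + np.sqrt(V[1])))**2))
    return I, R0

eps = 0.3
bec = [[1 - eps, 0, eps],      # outputs: 0, 1, erasure
       [0, 1 - eps, eps]]
print(symmetric_capacity_and_cutoff(bec))   # approx (0.7, 0.621 = log2(2/1.3))
```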
Basic module for a low-complexity scheme

Combine two copies of W through the transform G2
[Figure: U1 and U2 enter the 2x2 transform; X1 = U1 ⊕ U2 and X2 = U2 are sent over two copies of W, producing Y1 and Y2]
and split to create two bit-channels

    W1 : U1 → (Y1, Y2)
    W2 : U2 → (Y1, Y2, U1)

The first bit-channel W1

[Figure: W1 treats U2 as a uniformly random, unknown input; its output is (Y1, Y2)]
    C(W1) = I(U1; Y1, Y2)

The second bit-channel W2

[Figure: W2 sees both channel outputs and U1 as side information]
    C(W2) = I(U2; Y1, Y2, U1)

The 2x2 transformation is information lossless

◮ With independent, uniform U1, U2,
    I(W−) = I(U1; Y1 Y2),    I(W+) = I(U2; Y1 Y2 U1).
◮ Thus,
    I(W−) + I(W+) = I(U1 U2; Y1 Y2) = 2 I(W),
◮ and I(W−) ≤ I(W) ≤ I(W+).
The 2x2 transformation "creates" cutoff rate

With independent, uniform U1, U2,
    R0(W−) = R0(U1; Y1 Y2),    R0(W+) = R0(U2; Y1 Y2 U1).

Theorem (2005)
Correlation helps create cutoff rate:
    R0(W−) + R0(W+) ≥ 2 R0(W),
with equality iff W is a perfect channel, I(W) = 1, or a pure noise channel, I(W) = 0. Cutoff rates start polarizing:
    R0(W−) ≤ R0(W) ≤ R0(W+).

Cutoff Rate Polarization

Theorem (2006)
The cutoff rates {R0(Ui; Y^N U^(i−1))} of the channels created by the recursive transformation converge to their extremal values, i.e.,
    (1/N) #{ i : R0(Ui; Y^N U^(i−1)) ≈ 1 } → I(W)
and
    (1/N) #{ i : R0(Ui; Y^N U^(i−1)) ≈ 0 } → 1 − I(W).
Remark: {I(Ui; Y^N U^(i−1))} also polarize.
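For a BEC the 2005 theorem can be checked with a few lines of arithmetic, since (as noted in Lecture 5) W− and W+ are again BECs with erasure probabilities 2ǫ − ǫ² and ǫ², and R0(BEC(e)) = log2(2/(1 + e)). The following minimal sketch (assumed, not from the slides) evaluates the cutoff-rate gain of one 2x2 step.

```python
import numpy as np

def r0_bec(e):
    return np.log2(2.0 / (1.0 + e))

for eps in np.linspace(0.1, 0.9, 9):
    gain = r0_bec(2*eps - eps**2) + r0_bec(eps**2) - 2.0 * r0_bec(eps)
    print(f"eps = {eps:.1f}:  R0(W-) + R0(W+) - 2 R0(W) = {gain:.4f}")   # always >= 0
```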
Sequential decoding with successive cancellation

◮ Use the recursive construction to generate N bit-channels with cutoff rates R0(Ui; Y^N U^(i−1)), 1 ≤ i ≤ N
◮ Encode the bit-channels independently using convolutional coding
◮ Due to polarization, rate loss is negligible if one does not use the "bad" bit-channels
◮ Decode the bit-channels one by one using sequential decoding and successive cancellation
◮ The achievable sum cutoff rate is
    Σ_{i=1}^{N} R0(Ui; Y^N U^(i−1)),
  which approaches N I(W) as N increases.

Final step: Doing away with sequential decoding

◮ The rate of polarization is strong enough that a vanishing frame error rate can be achieved even if the "good" bit-channels are used uncoded
◮ The resulting system has no convolutional encoding and no sequential decoding, only successive cancellation decoding
Polar coding

Theorem (2007)
To communicate at rate R < I(W):
◮ pick N and K = NR good indices i such that I(Ui; Y^N U^(i−1)) is high,
◮ let the transmitter set Ui to be uncoded binary data for the good indices, and set Ui to random but publicly known values for the rest,
◮ let the receiver decode the Ui successively: U1 from Y^N; Ui from Y^N Û^(i−1).

Polar coding complexity and performance

With the particular one-to-one mapping described here and with successive cancellation decoding,
◮ polarization codes are 'I(W) achieving',
◮ encoding complexity is N log N,
◮ decoding complexity is N log N,
◮ probability of error decays like 2^(−√N) (with E. Telatar, 2008).
Lecture 8 – Coding for bandlimited channels

Objective: To discuss coding for bandlimited channels in general and with polar coding in particular

Topics
◮ Bit-interleaved coded modulation (BICM)
◮ Multi-level coding and modulation (MLCM)
◮ Lattice coding
◮ Direct polarization approach
The AWGN Channel

The AWGN channel is a continuous-time channel
    Y(t) = X(t) + N(t)
such that the input X(t) is a random process bandlimited to W, subject to an average power constraint X²(t) ≤ P, and N(t) is white Gaussian noise with power spectral density N0/2.

Capacity

Shannon's formula gives the capacity of the AWGN channel as
    C = W log2(1 + P/(W N0))  (bits/s).

Signal Design Problem

The continuous-time and real-number interface of the AWGN channel is inconvenient for digital communications.
◮ Need to convert from continuous to discrete time
◮ Need to convert from real numbers to a binary interface

Discrete Time Model

An AWGN channel of bandwidth W gives rise to 2W independent discrete-time channels per second with input-output mapping
    Y = X + N
◮ X is a random variable with mean 0 and energy E[X²] ≤ P/(2W)
◮ N is Gaussian noise with zero mean and energy N0/2
◮ It is customary to normalize the signal energies to joules per two dimensions and define
    Es = P/W  (Joules/2D)
  as the signal energy (per two dimensions)
◮ One defines the signal-to-noise ratio as Es/N0

Capacity

The capacity of the discrete-time AWGN channel is given by
    C = (1/2) log2(1 + Es/N0)  (bits/D),
achieved by i.i.d. Gaussian inputs X ∼ N(0, Es/2) per dimension.
Signal Design Problem

◮ Now we need a digital interface instead of real-valued inputs.
◮ Select a subset A ⊂ R^n as the "signal set" or "modulation alphabet".
◮ Finding a signal set with good Euclidean distance properties and other desirable features is the "signal design" problem.
◮ Typically, the dimension n is 1 or 2.

Separation of coding and modulation

◮ Each constellation A has a capacity CA (bits/D), which is a function of Es/N0.
◮ The spectral efficiency ρ (bits/D) has to satisfy
    ρ < CA(Es/N0).
◮ The spectral efficiency is the product of two terms:
    ρ = R × log2(|A|) / dim(A),
  where R (dimensionless) is the rate of the FEC.
◮ For a given ρ, there are many choices w.r.t. R and A.

Cutoff rate: A simple measure of reliability

Each constellation A has a cutoff rate R0,A (bits/D), a function of Es/N0, such that through random coding one can guarantee the existence of coding and modulation schemes with probability of frame error
    Pe < 2^(−N[R0,A(Es/N0) − ρ])
at the operating Es/N0, where N is the frame length in modulation symbols.
Sequential decoding and cutoff rate

◮ Sequential decoding (Wozencraft, 1957) is a decoding algorithm for convolutional codes that can achieve spectral efficiencies as high as the cutoff rate at constant average complexity per decoded bit.
◮ The difference between cutoff rate and capacity at high Es/N0 is less than 3 dB.
◮ This was regarded as the solution of the coding and modulation problem in the early 70s, and interest in the problem waned. (See Forney's 1995 Shannon Lecture for this story.)
◮ Polar coding grew out of attempts to improve the cutoff rate of channels by simple combining and splitting operations.

M-ary Pulse Amplitude Modulation

◮ A 1-D signal set with A = {±α, ±3α, …, ±(M − 1)α}
◮ Average energy: Es = 2α²(M² − 1)/3 (Joules/2D)
◮ Consider the capacity and cutoff rate of this constellation

Capacity of M-PAM
[Figure: capacity (bits) vs Es/N0 for PAM-2 through PAM-1024, together with the Shannon capacity; M-PAM is good enough from a capacity viewpoint]

Cutoff rate of M-PAM
[Figure: cutoff rate (bits) vs Es/N0 for PAM-2 through PAM-128, together with the Shannon limit; M-PAM is satisfactory also in terms of cutoff rate]
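The cutoff-rate curves above can be reproduced from a closed form. For a Gaussian channel with uniform inputs, the inner Bhattacharyya integral evaluates to exp(−(si − sj)²/(8σ²)), so R0 of M-PAM reduces to a finite double sum. The sketch below is an assumed illustration (names are mine) that normalizes α = 1 and uses the slides' conventions Es in Joules/2D and SNR = Es/N0.

```python
import numpy as np

def pam_cutoff_rate(M, es_n0_db):
    """Cutoff rate (bits per 1-D symbol) of M-PAM with uniform inputs on the AWGN channel:
    R0 = -log2( (1/M^2) * sum_{i,j} exp(-(s_i - s_j)^2 / (8*sigma^2)) )."""
    s = np.arange(-(M - 1), M, 2.0)           # levels ±1, ±3, ..., ±(M-1), alpha = 1
    es_2d = 2 * np.mean(s**2)                 # Es = 2*alpha^2*(M^2 - 1)/3 (Joules/2D)
    n0 = es_2d / 10**(es_n0_db / 10)          # from SNR = Es/N0
    sigma2 = n0 / 2                           # 1-D noise variance
    d2 = (s[:, None] - s[None, :])**2
    return -np.log2(np.mean(np.exp(-d2 / (8 * sigma2))))

for M in (2, 4, 8, 16):
    print(M, round(pam_cutoff_rate(M, 10.0), 3))
```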
Conventional approach

Given a target spectral efficiency ρ and a target error rate Pe at a specific Es/N0,
◮ select M large enough so that the M-PAM capacity is close enough to the Shannon capacity at the given Es/N0,
◮ apply coding external to modulation to achieve the desired Pe.

Such separation of coding and modulation was first challenged successfully by Ungerboeck (1981). However, with the advent of powerful codes at affordable complexity, there is a return to the conventional design methodology.

How does it work in practice?

WiMAX CTC Codes: Fixed Spectral Efficiency, Different Modulation
[Figure: FER vs Es/N0 for CTC(576,432) with 16-QAM and CTC(864,432) with 64-QAM; the spectral efficiency is 3 b/2D and it takes 144 symbols to carry the payload in both cases; the gap to Shannon is about 3 dB at FER 1E-3, and coding provides a gain of 4.8 dB over uncoded transmission. Theory and practice don't match here!]
Why change modulation instead of just the code rate?

◮ Suppose we fix the modulation as 64-QAM and wish to deliver data at spectral efficiencies 1, 2, 3, 4, 5 b/2D.
◮ We would need a coding scheme that works well at rates 1/6, 1/3, 1/2, 2/3, 5/6.
◮ The difficulty here is practical: it is a challenge to have a coding scheme that works well over all rates from 0 to 1.
◮ The inability to deliver high-quality coding over a wide range of rates forces one to change the order of modulation.

Alternative: Fixed code, variable modulation

WiMAX: the same rate-3/4 code with different-order QAM modulations
[Figure: FER vs Es/N0 for CTC(576,432) with 4-QAM, 16-QAM, and 64-QAM (spectral efficiencies 1.5, 3, and 4.5 b/2D); the gap to the Shannon limit widens slightly with increasing modulation order, but in general there is good agreement]
Polar coding and modulation

Polar codes can be applied to modulation in several different ways:
◮ Direct polarization
◮ Multi-level techniques
◮ Polar lattices
◮ BICM

Direct Method

Idea: Given a system with q-ary modulation, treat it as an ordinary q-ary input memoryless channel and apply a suitable polarization transform.

A theory of q-ary polarization exists:
◮ Şasoğlu, E., E. Telatar, and E. Arıkan, "Polarization for arbitrary discrete memoryless channels," IEEE ITW 2009.
◮ Sahebi, A. G. and S. S. Pradhan, "Multilevel polarization of polar codes over arbitrary discrete memoryless channels," IEEE Allerton, 2011.
◮ Park, W.-C. and A. Barg, "Polar codes for q-ary channels," IEEE Trans. Inform. Theory, 2013.
◮ ...
The difficulty with the direct approach is the complexity of decoding. G. Montorsi's ADBP is a promising approach for reducing the complexity here.

Multi-Level Modulation (Imai and Hirakawa, 1977)

Represent (if possible) each channel input symbol as a vector X = (X1, X2, …, Xr); then the capacity can be written as a sum of capacities of smaller channels by the chain rule:

    I(X; Y) = I(X1, X2, …, Xr; Y) = Σ_{i=1}^{r} I(Xi; Y | X1, …, Xi−1).

This splits the original channel into r parallel channels, which are encoded independently and decoded using successive cancellation decoding; a numerical illustration of the decomposition is sketched below.

Polarization is a natural complement to MLM.
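The chain-rule layer capacities can be estimated by Monte Carlo. The sketch below is an assumed illustration for 8-PAM with natural labeling (b1 = MSB) on a real AWGN channel; the labeling and the layer decoding order used in the slides may differ, and the function name is mine. Each term I(b_i; Y | b_1..b_{i−1}) is estimated as 1 minus the conditional entropy of the transmitted bit given the observation and the already-known layers.

```python
import numpy as np

def mlc_layer_capacities(snr_db, n_samples=200_000, seed=0):
    """Monte Carlo estimate of I(b_i; Y | b_1..b_{i-1}) for 8-PAM, natural labeling, AWGN."""
    rng = np.random.default_rng(seed)
    levels = np.array([-7., -5., -3., -1., 1., 3., 5., 7.])
    labels = np.array([[(k >> (2 - j)) & 1 for j in range(3)] for k in range(8)])
    Es = np.mean(levels**2)                 # average 1-D symbol energy
    N0 = Es / 10**(snr_db / 10)             # SNR defined as Es/N0
    sigma = np.sqrt(N0 / 2)
    bits = rng.integers(0, 2, size=(n_samples, 3))
    idx = bits @ np.array([4, 2, 1])
    y = levels[idx] + sigma * rng.standard_normal(n_samples)
    lik = np.exp(-(y[:, None] - levels[None, :])**2 / (2 * sigma**2))
    caps = []
    for i in range(3):
        # keep only constellation points consistent with the already-known bits b_1..b_{i-1}
        consistent = np.ones((n_samples, 8), dtype=bool)
        for j in range(i):
            consistent &= labels[None, :, j] == bits[:, j, None]
        num = np.sum(lik * (consistent & (labels[None, :, i] == bits[:, i, None])), axis=1)
        den = np.sum(lik * consistent, axis=1)
        p = np.clip(num / den, 1e-12, 1.0)        # P(correct b_i | y, b_1..b_{i-1})
        caps.append(1.0 + np.mean(np.log2(p)))    # I = H(b_i) - H(b_i | Y, b_<i)
    return caps

caps = mlc_layer_capacities(15.0)
print([round(c, 3) for c in caps], "sum =", round(sum(caps), 3))
```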
Polar coding with multi-level modulation

Already a well-studied subject:
◮ Arıkan, E., "Polar Coding," Plenary Talk, ISIT 2011.
◮ Seidl, M., Schenk, A., Stierstorfer, C., and Huber, J. B., "Polar-coded modulation," IEEE Trans. Comm., 2013.
◮ Seidl, M., Schenk, A., Stierstorfer, C., and Huber, J. B., "Multilevel polar-coded modulation," IEEE ISIT 2013.
◮ Ionita, Corina, et al., "On the design of binary polar codes for high-order modulation," IEEE GLOBECOM, 2014.
◮ Beygi, L., Agrell, E., Kahn, J. M., and Karlsson, M., "Coded modulation for fiber-optic networks," IEEE Sig. Proc. Mag., 2014.
◮ ...

Example: 8-PAM as 3 bit channels

◮ PAM signals selected by three bits (b1, b2, b3)
◮ Three layers of binary channels created
◮ Each layer encoded independently
◮ Layers decoded in the order b3, b2, b1

Polarization across layers by natural labeling
[Figure: the 8-PAM constellation {−7, −5, −3, −1, 1, 3, 5, 7} labeled by three bits under natural labeling; b1 selects the half, b2 the quarter within it, and b3 the individual point]
Performance comparison: Polar vs. Turbo

Turbo code
◮ WiMAX CTC
◮ Duobinary, memory 3
◮ QAM over AWGN channel
◮ Gray mapping
◮ BICM
◮ Simulator: "Coded Modulation Library"

Polar code
◮ Standard construction
◮ Successive cancellation decoding
◮ Multi-level PAM/QAM over AWGN channel
◮ Natural mapping

[Figure: layer capacities of the three 8-PAM bit levels, their sum, and the Shannon limit, plotted against SNR. Most coding work needs to be done at the least significant bits.]
Multi-layering jump-starts polarization

[Figure: layer capacities of the three 8-PAM bit levels vs SNR; the layers are already well separated, which gives the binary polarization process a head start]

4-QAM, Rate 1/2
[Figure: FER vs Eb/N0 for Polar(512,256) and Polar(1024,512) with 4-QAM, compared with CTC(480,240) and CTC(960,480)]

16-QAM, Rate 3/4
[Figure: FER vs Eb/N0 for Polar(512,384) with 16-QAM, compared with CTC(192,144), CTC(384,288), and CTC(576,432)]
64-QAM, Rate 5/6
[Figure: FER vs Eb/N0 for Polar(768,640) and Polar(384,320) with 64-QAM, compared with CTC(576,480)]

Complexity comparison: 64-QAM, Rate 5/6

Average decoding time in milliseconds per codeword (ms/cw):

    Eb/N0            10 dB   11 dB
    CTC(576,432)     6.23    1.83
    Polar(768,640)   0.92    1.01
    Polar(384,320)   0.48    0.53

Polar codes show a complexity advantage against CTC codes.

Both decoders were implemented as MATLAB mex functions. The polar decoder is a successive cancellation decoder; the CTC decoder is a public-domain decoder (CML). Profiling was done with the MATLAB Profiler. The iteration limit for the CTC decoder was 10; the average number of iterations was 10 at 10 dB and 3.3 at 11 dB. The CTC decoder used a linear approximation to log-MAP while the polar decoder used exact log-MAP.
Lattices and polar coding

Yan, Ling, and Liu explored the connection between lattices and polar coding:
◮ Yan, Yanfei, and Cong Ling, "A construction of lattices from polar codes," IEEE ITW 2012.
◮ Yan, Yanfei, Ling Liu, Cong Ling, and Xiaofu Wu, "Construction of capacity-achieving lattice codes: Polar lattices," arXiv preprint arXiv:1411.0187, 2014.

Yan et al. used Barnes-Wall lattice constructions such as
    BW16 = RM(1, 4) + 2 RM(3, 4) + 4 Z^16
as a template for constructing polar lattices of the type
    P16 = P(1, 4) + 2 P(3, 4) + 4 Z^16,
and demonstrated by simulations that polar lattices perform better.
BICM

BICM [Zehavi, 1991], [Caire, Taricco, Biglieri, 1998] is the dominant technique in modern wireless standards such as LTE. As in MLM, BICM splits the channel input symbols into a vector X = (X1, X2, …, Xr), but strives to do so such that

    I(X; Y) = I(X1, X2, …, Xr; Y) = Σ_{i=1}^{r} I(Xi; Y | X1, …, Xi−1) ≈ Σ_{i=1}^{r} I(Xi; Y).

BICM vs Multi Level Modulation

Why has BICM won over MLM and other techniques in practice?
◮ MLM is provably capacity-achieving; BICM is suboptimal, but the rate penalty is tolerable.
◮ MLM has to do delicate rate-matching at individual layers, which is difficult with turbo and LDPC codes.
◮ BICM is well-matched to the iterative decoding methods used with turbo and LDPC codes.
◮ MLM suffers extra latency due to multi-stage decoding (mitigated in part by the lack of need for protecting the upper layers by long codes).
◮ With MLM, the overall code is split into shorter codes, which weakens performance (one may mix and match the block lengths of each layer to alleviate this problem).

BICM and Polar Coding

This subject, too, has been studied in connection with polar codes:
◮ Mahdavifar, H., El-Khamy, M., Lee, J., and Kang, I., "Polar Coding for Bit-Interleaved Coded Modulation," IEEE Trans. Veh. Tech., 2015.
◮ Afser, H., N. Tirpan, H. Delic, and M. Koca, "Bit-interleaved polar-coded modulation," Proc. IEEE WCNC, 2014.
◮ Chen, Kai, Kai Niu, and Jia-Ru Lin, "An efficient design of bit-interleaved polar coded modulation," IEEE PIMRC 2013.
◮ ...
Lecture 9 – Polar codes for selected applications

Objective: Review the literature on polar coding for selected applications

Topics
◮ 60 GHz wireless
◮ Optical access networks
◮ 5G
  ◮ Ultra reliable low latency communications (URLLC)
  ◮ Machine type communications (MTC)
  ◮ 5G channel coding at Gb/s throughput

Millimeter Wave 60 GHz Communications

◮ 7 GHz of bandwidth available (57-64 GHz allocated in the US)
◮ Free-space path loss (4πd/λ)² is high at λ = 5 mm, but is compensated by large antenna arrays
◮ Propagation range is limited severely by O2 absorption; cells are confined to rooms
◮ The recent IEEE 802.11ad Wi-Fi standard operates in the 60 GHz ISM band and uses an LDPC code with block length 672 bits and rates 1/2, 5/8, 3/4, 13/16
◮ Two papers study polar coding for 60 GHz applications:
  ◮ Z. Wei, B. Li, and C. Zhao, "On the polar code for the 60 GHz millimeter-wave systems," EURASIP JWCN, 2015.
  ◮ Youn Sung Park, "Energy-Efficient Decoders of Near-Capacity Channel Codes," PhD Dissertation, The University of Michigan, 2014.

Wei et al. compare polar codes with the LDPC codes used in the standard using a nonlinear channel model.
[Figure: simulation setup from Wei, Li, and Zhao, EURASIP JWCN, 2015]
Millimeter Wave 60 GHz Communications

[Figures: FER comparisons of polar codes and the IEEE 802.11ad LDPC codes over the nonlinear 60 GHz channel model. Source: Wei, Li, and Zhao, "On the polar code for the 60 GHz millimeter-wave systems," EURASIP JWCN, 2015.]

Polar codes vs IEEE 802.11ad LDPC codes

Park (2014) gives the following performance comparison.
[Figure: Park's FER comparison of polar codes and the IEEE 802.11ad LDPC code]
(Park's result on LDPC conflicts with reference IEEE 802.11-10/0432r2. Whether there exists an error floor as shown needs to be confirmed independently.)

In terms of implementation complexity and throughput, Park (2014) gives the following figures.
[Table: Park's implementation comparison of polar and LDPC decoders in terms of complexity and throughput]

Source: Youn Sung Park, "Energy-Efficient Decoders of Near-Capacity Channel Codes," PhD Dissertation, The University of Michigan, 2014.
Optical access/transport network

◮ 10-100 Gb/s at 1E-12 BER
◮ OTU4 (100 Gb/s Ethernet) and ITU G.975.1 standards use Reed-Solomon (RS) codes
◮ The challenge is to provide high reliability at low hardware complexity

Polar codes for optical access/transport

There have been some studies of polar codes for optical transmission:
◮ A. Eslami and H. Pishro-Nik, "A practical approach to polar codes," ISIT 2011. (Considers a polar-LDPC concatenated code and compares it with OTU4 RS codes.)
◮ Z. Wu and B. Lankl, "Polar codes for low-complexity forward error correction in optical access networks," ITG-Fachbericht 248: Photonische Netze - 05, 06.05.2014, Leipzig. (Compares polar codes with G.975.1 RS codes.)
◮ L. Beygi, E. Agrell, J. M. Kahn, and M. Karlsson, "Coded modulation for fiber-optic networks," IEEE Sig. Proc. Mag., Mar. 2014. (Coded modulation for optical transport.)

Comparison of polar codes with G.975.1 RS codes
[Figures: error-rate comparison of polar codes with the G.975.1 RS codes. Source: Z. Wu and B. Lankl, above reference.]
Coded modulation for fiber-optic communication

The main reference for this part is the paper:
L. Beygi, E. Agrell, J. M. Kahn, and M. Karlsson, "Coded modulation for fiber-optic networks," IEEE Sig. Proc. Mag., Mar. 2014.
◮ Data rates 100 Gb/s and beyond
◮ BER 1E-15
◮ Channel model: self-interfering nonlinear distortion, additive Gaussian noise

Coded modulation: BICM approach
Split the 2^q-ary channel into q bit channels and decode them independently.

Coded modulation: Multi-level approach
Split the 2^q-ary channel into q bit channels and decode them successively.

Coded modulation: TCM approach
Split the 2^q-ary channels into two classes and encode the low-order channels using a trellis hand-crafted for large Euclidean distance and decoded by ML.

Coded modulation: q-ary coding
No splitting; 2^q-ary processing applied; too complex.

Coded modulation: Polar approach
Split the 2^q-ary channel into "good", "mediocre", and "bad" bit channels; apply coding only to the mediocre channels.

Coded modulation: performance comparison
[Figures for each approach and for the performance comparison are from Beygi, L., et al., "Coded modulation for fiber-optic networks," IEEE Sig. Proc. Mag., Mar. 2014.]
Outline

◮ What is 5G?
◮ Technology proposals for 5G
◮ Polar coding for 5G

What is 5G?

Andrews et al.³ answer this question as follows.
◮ It will not be an incremental advance over 4G.
◮ It will be characterized by
  ◮ very high frequencies and massive bandwidths with very large numbers of antennas,
  ◮ extreme base station and device connectivity,
  ◮ universal connectivity between 5G new air interfaces, LTE, WiFi, etc.

³ Andrews et al., "What will 5G be?" JSAC 2014

Technical requirements for 5G

Again, according to Andrews et al., 5G will have to meet the following requirements (not all at once):
◮ Data rates compared to 4G
  ◮ Aggregate: 1000 times more capacity/km² compared to 4G
  ◮ Cell-edge: 100-1000 Mb/s/user with 95% guarantee
  ◮ Peak: 10s of Gb/s/user
◮ Round-trip latency: some applications (tactile Internet, two-way gaming, virtual reality) will require 1 ms latency, compared to the 10-15 ms that 4G can provide
◮ Energy and cost: link energy consumption should remain the same as data rates increase, meaning that a 100-times more energy-efficient link is required
◮ Number of devices: 10,000 more low-rate devices for M2M communications, along with traditional high-rate users

Key technology ingredients for 5G

It is generally agreed that the 1000x aggregate data rate increase will be possible through a combination of three types of gains:
◮ Densification of network access nodes
◮ Increased bandwidth (move to mm waves)
◮ Increased spectral efficiency through new communication techniques:
  ◮ advanced MIMO
  ◮ improved multi-access
  ◮ better interference management
  ◮ improved coding and modulation schemes
Summary

◮ With list decoding and CRC, polar codes deliver performance comparable to the LDPC and Turbo codes used in present wireless standards
◮ The state of the art in coding is already close to theoretical limits for low-order modulation, leaving little margin for improvement
◮ The biggest asset of polar coding compared to the state of the art is its universal, flexible, and versatile nature
  ◮ Universal: the same hardware can be used with different code lengths, rates, channels
  ◮ Flexible: the code rate can be adjusted readily to any number between 0 and 1
  ◮ Versatile: can be used in multi-terminal coding scenarios

Outlook

◮ There is need for new FEC techniques as we move to 5G scenarios that call for very high spectral efficiencies and advanced multi-user and multi-antenna techniques
◮ Extensive research is needed before any FEC method can be declared a winner for 5G scenarios; the field is wide open for introducing new techniques
◮ It is likely that the winner will emerge based on a trade-off between overall communication performance under a diverse set of application scenarios and a number of implementation metrics such as complexity and energy efficiency