A geometric proof of the polarization property

Ilya Dumer

arXiv:1706.06764v1 [cs.IT] 21 Jun 2017

Abstract—We analyze one version of successive cancellation (SC) decoding that uses two random functions of the transmitted symbols: their likelihoods and the variations of their posterior probabilities. The first function increases its expected value on the upgrading channels, while the second does so on the degrading channels. We show that both quantities can be bounded by sin θ and cos θ of another random variable θ that ranges from 0 to π/2. We then present a simple proof that the expected value of sin θ cos θ tends to 0 in the consecutive iterations of the SC algorithm. This proves the polarization property of SC decoding.

Index terms: Polar codes; Reed-Muller codes; successive cancellation decoding; polarization.

I. INTRODUCTION

In this paper, we analyze one algorithm of successive cancellation (SC) decoding and give an elementary proof of its polarizing behavior. This SC algorithm was first applied in [1] to the general Reed-Muller codes RM(r, m) and showed that these codes yield vastly different output bit error rates (BER) for different information bits. This disparity was then addressed in [1] by eliminating some information bits with the highest BERs. Simulation results of [2] showed that the optimal selection of the eliminated (frozen) bits drastically improves decoding of the original codes RM(r, m). However, the analytical tools used in these and later publications [3]-[4] do not reveal the polarization properties of the bit-frozen subcodes of codes RM(r, m) or their capacity-reaching performance. A major breakthrough in this area was achieved by E. Arikan [5], who proved that the optimal bit-frozen subcodes of the full codes RM(m, m) - now well known as polar codes - achieve the channel capacity of any symmetric memoryless channel as m → ∞.
Paper [5] also proposes a new analytical technique, which reveals some novel properties of generic recursive processing, such as bit polarization. This technique also yields the capacity-achieving subcodes originating from codes RM(r, m) of rate R → 1, such as codes with lim r/m > 1/2. The goal of this paper is to apply some geometric concepts and simplify the original Arikan's proof [5] of the polarization properties of SC decoding. Some other proofs are also presented in [6] and [7]. The proof presented below does not involve stochastic processes or information theory: it only relies on the fact that sin θ, cos θ, and sin θ cos θ are concave functions for any angle θ ∈ [0, π/2].

To do so, we give some well-known introductory material in Sections II and III. In Section II, we describe polynomial codes based on the Plotkin (u, u + v) construction. Then in Section III, we describe conventional SC decoding using two different random variables (rv). For any received symbol y, one rv is the likelihood h = P(0|y)/P(1|y). The other rv g = P(0|y) − P(1|y) measures the variation between the posterior probabilities of the transmitted symbols 0 and 1. We show that the quantities g and h can be recalculated respectively as the products g1 g2 and h1 h2 on the degrading and upgrading channels of SC decoding. In Sections IV and V, we proceed with the Bhattacharyya parameter Z and the expectation G of the variables g. As pointed out by E. Arikan [10], parameter G is also studied in statistics as the variational distance [11]. Section IV presents some new inequalities for the parameters G and Z. Our goal is to show that lim GZ = 0 for almost all sequences of m successive channel transformations as m → ∞ (except for a fraction of them that declines exponentially in m). To do so, in Section V we re-consider parameters G and Z in terms of a single rv θ ∈ [0, π/2].

I. Dumer is with the College of Engineering, University of California, Riverside, CA 92521, USA; email: [email protected]
We first show that these two parameters can be bounded from above by the expectations of sin θ and cos θ. We then proceed with some new inequalities, which reduce the polarization problem to a much simpler Problem A.

Problem A. Let an angle θ ∈ [0, π/2] be equally likely transformed into one of two complementary angles: θ^(0) = arcsin(sin²θ) or θ^(1) = arccos(cos²θ). Consider all sequences ξ ∈ F_2^m of m consecutive random transformations. Prove that the angles θ^(ξ) tend to 0 or π/2 for most sequences ξ as m → ∞.

In Fig. 1, we give a preliminary illustration of Problem A and depict the three-step transformations of the angle θ = π/4 into the angles θ^(000) and θ^(111).

Fig. 1. Three-step transformations of the angle θ = π/4 (curves for the original θ and the angles θ^(0), θ^(00), θ^(000), θ^(1), θ^(11), θ^(111)).

In Section VI, we address Problem A and show that almost all sequences ξ of m → ∞ transformations satisfy the condition sin θ^(ξ) cos θ^(ξ) → 0 for any original angle θ. This solves the polarization problem.

II. RECURSIVE PLOTKIN CONSTRUCTION

RM codes and polar codes can be designed using polynomial constructions. Consider any boolean polynomial f(x) ≡ f(x_1, ..., x_m) for any x ∈ F_2^m. We also consider sequences (paths) ξ = (a_1, ..., a_m) ∈ F_2^m and define the monomials

x^ξ ≡ x_1^{a_1} · ... · x_m^{a_m}

Then any polynomial f(x) is decomposed as follows:

f(x) = Σ_{a_1=0,1} x_1^{a_1} f_{a_1}(x_2, ..., x_m) = ... = Σ_{a_1,...,a_ℓ} x_1^{a_1} · ... · x_ℓ^{a_ℓ} f_{a_1,...,a_ℓ}(x_{ℓ+1}, ..., x_m) = ... = Σ_ξ f_ξ x^ξ    (1)

Any step ℓ = 1, ..., m − 1 ends with the incomplete paths ξ_1^ℓ ≡ (a_1, ..., a_ℓ) that decompose the polynomial f(x) with respect to the first ℓ variables. Finally, step m defines each bit f_ξ associated with a path ξ and its monomial x^ξ.

Fig. 2. Decomposition of RM(4, 4).
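Problem A is easy to explore numerically. The sketch below (Python; the helper names `step` and `transform` are illustrative, not from the paper) applies the two complementary transformations to θ = π/4 and averages sin θ cos θ over all 2^m sequences ξ; the average stays below (0.87)^m · r(π/4), matching the bound proven in Section VI.

```python
import math
from itertools import product

def step(theta, a):
    """One transformation of Problem A: bit a = 0 sends theta to
    arcsin(sin^2 theta), bit a = 1 sends it to arccos(cos^2 theta)."""
    if a == 0:
        return math.asin(math.sin(theta) ** 2)
    return math.acos(math.cos(theta) ** 2)

def transform(theta, bits):
    """Apply a whole sequence xi = (a_1, ..., a_m) of transformations."""
    for a in bits:
        theta = step(theta, a)
    return theta

theta0 = math.pi / 4
print(transform(theta0, (0, 0, 0)))   # theta^(000): close to 0
print(transform(theta0, (1, 1, 1)))   # theta^(111): close to pi/2

# Average of sin(theta) * cos(theta) over all 2^m sequences xi
for m in (1, 5, 10):
    mean = sum(
        math.sin(t) * math.cos(t)
        for bits in product((0, 1), repeat=m)
        for t in [transform(theta0, bits)]
    ) / 2 ** m
    print(m, mean)  # stays below 0.5 * 0.87**m
```

Three steps already polarize θ = π/4 to about 0.063 and π/2 − 0.063, the two endpoints of Fig. 1.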
Codes RM(r, m) consist of the maps f(x): F_2^m → F_2. Here we take all polynomials f(x) of degree at most r and all vectors x ∈ F_2^m, which form the positions of our code. Each map generates a codeword

c = c(f) = Σ_ξ f_ξ c(x^ξ)

Here any vector c(x^ξ) has weight 2^{m−w(ξ)}, where w(ξ) is the Hamming weight of ξ. Note that for a_1 = 0, 1, the two polynomials x_1^{a_1} f_{a_1}(x_2, ..., x_m) generate the codewords (c_0, c_0) and (0, c_1). Then the codewords c = (c_0, c_0 + c_1) of code RM(r, m) are formed by the RM codes {c_0} and {c_1} of length 2^{m−1}. This is an instance of the Plotkin (u, u + v) construction. Similarly, we may further decompose codes RM(r, m) using the Plotkin construction in each step ℓ = 2, ..., m − 1. The Plotkin construction is also equivalent to Arikan's 2 × 2 kernel [5].

Decomposition (1) is shown in Fig. 2 for the code RM(4, 4). Each decomposition step ℓ = 1, ..., 4 is marked by the splitting monomial x_ℓ^{a_ℓ}. For example, the path ξ = 0110 gives the information bit f_{0110} associated with the monomial x^ξ ≡ x_2 x_3.

Now consider some subset of k paths

T = {ξ(i), i = 1, ..., k} ⊂ F_2^m

Then we encode k information bits via their paths and obtain the codewords c(T) = Σ_{ξ∈T} f_ξ c(x^ξ). These codewords form a linear code C(m, T).

Fig. 3 presents such a code C(m, T).

Fig. 3. Subcode C(m, T) of code RM(5, 5).

Here we use all paths ξ′ bounded on the left by the path ξ = 11000 (red dashed line) and all paths η′ bounded by the path η = 01110 (blue dashed line). These two paths generate the monomials x_1 x_2 and x_2 x_3 x_4. All paths ξ′ have weights w(ξ′) ≤ 2 and form the code RM(2, 5). Similarly, the paths η′ have weights w(η′) ≤ 3 in the variables x_2, ..., x_5. Thus, the paths η′ generate a repeated code RM(3, 4). In turn, code C(m, T) is the sum of the codes generated by the boundaries ξ and η. Construction C(m, T) also leads to polar codes, which use subsets T ⊂ F_2^m optimized for the recursive SC decoding. This algorithm is considered in the next section.
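The recursive structure of the construction can be made concrete in a few lines. The following sketch (Python; the helper `plotkin_encode` is hypothetical, not from the paper) encodes a set of path-indexed bits f_ξ into a codeword of length 2^m via c = (c_0, c_0 + c_1), using 0/1 arithmetic over F_2 (the paper switches to ±1 symbols only for transmission in Section III).

```python
import itertools

def plotkin_encode(f, m):
    """Encode path-indexed bits f: {0,1}^m -> {0,1} recursively via
    the Plotkin (u, u + v) construction: c = (c0, c0 + c1), where c0
    comes from the paths with a1 = 0 and c1 from those with a1 = 1."""
    if m == 0:
        return [f[()]]
    c0 = plotkin_encode({xi[1:]: b for xi, b in f.items() if xi[0] == 0}, m - 1)
    c1 = plotkin_encode({xi[1:]: b for xi, b in f.items() if xi[0] == 1}, m - 1)
    return c0 + [x ^ y for x, y in zip(c0, c1)]

# Example with m = 3: the single path xi = (0,0,0) is the monomial 1,
# so its codeword c(x^xi) is all-ones, of weight 2^(m - w(xi)) = 8.
m = 3
f = {xi: 0 for xi in itertools.product((0, 1), repeat=m)}
f[(0, 0, 0)] = 1
print(plotkin_encode(f, m))  # [1, 1, 1, 1, 1, 1, 1, 1]
```

One can check that every single path ξ produces a codeword of weight 2^{m−w(ξ)}, as stated above.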
III. SC DECODING

Recursive decoding of the Plotkin construction. Below, we consider transmission over a discrete memoryless channel W with inputs ±1. To do so, we also use the symbols (−1)^a for any binary input a = 0, 1. In particular, the all-zero codeword 0^n is mapped onto 1^n. The Plotkin construction then has the form c = (u, uv), where the vector uv is the component-wise product of the vectors u and v with symbols ±1.

For any received symbol y, we will define three interrelated quantities: the posterior probability (PP) q that a symbol c = 1 is transmitted, the offset g, and the likelihood h. These quantities are defined as follows:

q = q(y) = Pr{c = 1 | y},  g = 2q − 1,  h = q/(1 − q)    (2)

For example, let W be a binary symmetric channel BSC(ε) with transition error probability p = (1 − ε)/2, where ε ∈ [0, 1]. Then any output y = ±1 gives the quantities

g(y) = εy,  h(y) = (1 + εy)/(1 − εy)    (3)

Let c = (c_j) be any vector of even length µ. We use the notation j_ℓ and j_r for the positions of the left and right halves. Now let y = (y_j) be the received vector corrupted by noise. We then use the vectors q = (q_j), g = (g_j) and h = (h_j) with symbols defined in (2). The following recursive algorithm of [2]-[4] performs SC decoding of the information bits in the recursive (u, uv) constructions, such as codes RM(r, m) or their bit-frozen subcodes C(m, T). This algorithm is also identical to the conventional SC decoder of [5].

We first wish to derive the vector v in the (u, uv) construction. To do so, we first find the PP

q_j^(1) ≡ Pr{v_j = 1 | q_{j_ℓ}, q_{j_r}}

for each symbol v_j of the vector v of length n/2. The vector q^(1) of PP q_j^(1) represents the corrupted version of the vector v. Simple recalculations show that the corresponding offsets g_j^(1) = 2q_j^(1) − 1 can be recalculated as

g_j^(1) = g_{j_ℓ} g_{j_r}    (4)

Here the indices j, j_ℓ and j_r run through the same set 1, ..., n/2 (since the newly defined vectors g^(1) have length n/2). We may now decode the vector g^(1) into some vector ṽ ∈ RM(r − 1, m − 1) of length n/2.

Given the vector ṽ, note that the two symbols y_{j_ℓ} and y_{j_r} represent two corrupted versions of the symbol u_j in the (u, uv) construction. Then the symbol u_j has the likelihoods h_{j_ℓ} and h̃_{j_r} = (h_{j_r})^{ṽ_j} in the left and right halves, which gives its overall likelihood

h_j^(0) = h_{j_ℓ} h̃_{j_r}    (5)

We can now decode the vector h^(0) ≡ (h_j^(0)) into some vector ũ ∈ RM(r, m − 1). Observe also that recalculations (4) degrade the original channel, whereas recalculations (5) upgrade it.

In the general setting, recalculations (4) and (5) form the level ℓ = 1 of SC decoding. We then apply these recalculations to the new vectors q^(1) and q^(0), which represent the corrupted versions of the vectors v and u, and proceed similarly at any level ℓ = 2, ..., m. Any current path ξ = ξ_1^ℓ receives a PP-vector q^(ξ) of length µ = 2^{m−ℓ}. Then we process the v-extension (ξ, 1) using recalculations (4) on the two halves of the vector g^(ξ):

g_j^(ξ,1) = g_{j_ℓ}^(ξ) g_{j_r}^(ξ)    (6)

By recursion, we now assume that the path (ξ, 1) returns its current output ṽ = ṽ^(ξ) to the node ξ. Similarly, we use recalculations (5) with the likelihoods h_{j_ℓ}^(ξ) and h̃_{j_r}^(ξ) for the u-extension (ξ, 0):

h_j^(ξ,0) = h_{j_ℓ}^(ξ) h̃_{j_r}^(ξ)    (7)

Then the current vector h^(ξ,0) can be decoded into some vector ũ^(ξ). Thus, the v-extensions (marked with ones in Fig. 2) always precede the u-extensions in each decoding step. Finally, the last step gives the likelihood q_ξ = Pr{f_ξ = 0 | y} of one information bit f_ξ on the path ξ. We then choose the more reliable value of the bit f_ξ. Thus, the decoder recursively retrieves every information symbol f_ξ, moving back and forth along the paths of Fig. 2 or Fig. 3. It is easy to verify [1] that the overall complexity has the order of n log n.

Recursive decoding of polar codes. Any subcode C(m, T) ⊂ RM(m, m) with k paths ξ(1), ..., ξ(k) is decoded similarly. Here we simply drop all frozen paths ξ ∉ T, which give the information bits f_ξ ≡ 0. This gives the following algorithm.

Algorithm Ψ(m, T) for code C(m, T).
Given: a vector q = (q_j) of PP.
Take i = 1, ..., k and ℓ = 1, ..., m.
For a path ξ(i) = (a_1(i), ..., a_m(i)) in step ℓ do:
  Apply recalculations (6) if a_ℓ(i) = 1.
  Apply recalculations (7) if a_ℓ(i) = 0.
  Output the information bit f_{ξ(i)} if ℓ = m.

The above algorithm can be extended to the SC list decoding that tracks the L most probable code candidates throughout the process and has complexity of order Ln log n. Simulation results of [2]-[4] show that the optimized bit-frozen subcodes substantially outperform the original RM codes in SC list decoding and require much smaller lists. SC list decoding can also be combined with precoding techniques, which can further reduce the output BERs, as shown in [8].

IV. RANDOM VARIABLES AND THEIR TRANSFORMATIONS IN SC DECODING

Consider a code C(m, T) = C(m, ξ) defined by a single path ξ = (a_1, ..., a_m) and let it be used over a discrete memoryless symmetric (DMS) channel W. We now consider a codeword 1^n transmitted over this path and assume that all preceding (frozen) paths give the correct outputs ṽ^(ξ) = 1 in the recursive recalculations (6) and (7). Then for every prefix ξ = (a_1, ..., a_ℓ), we can simplify recalculations (7) as follows:

h_j^(ξ,0) = h_{j_ℓ}^(ξ) h_{j_r}^(ξ)    (8)

Recalculations (6) and (8) essentially form a new DMS channel W^(ξ): X → Y^(ξ) that outputs a random variable (rv) h^(ξ) or g^(ξ) starting from the original rv g_j or h_j. Following [5], [9], we consider the compound channel W^(ξ) as an ensemble of some number k of binary symmetric channels W^(ξ)(t) = BSC(β_t, ε_t) that have the transition error probabilities p_t = (1 − ε_t)/2 and occur with some probability distribution {β_t}, where Σ_{t=1}^k β_t = 1. We use the notation

W^(ξ) = ∪_{t=1}^k BSC(β_t, ε_t)

Here the new parameters k, ε_t and β_t depend on a specific path ξ. We will use the expectation of the offsets ε_t over the distribution {β_t}:

G^(ξ) = E(ε_t) = Σ_{t=1}^k β_t ε_t    (9)

Recall from (3) that for any BSC(β_t, ε_t), the symbols y give the offsets g(y) = yε_t. Also, recalculations (6) use the products g_{j_ℓ}^(ξ) g_{j_r}^(ξ) of independent rvs in all steps ℓ. Thus, any degrading channel W^(ξ,1): X → Y^(ξ,1) can be considered as the ensemble of the new BSC channels

W^(ξ,1) = ∪_{t,s} BSC(β_{t,s}, ε_t ε_s)    (10)

where t, s = 1, ..., k and β_{t,s} = β_t β_s. Then

G^(ξ,1) = Σ_{t,s} β_{t,s} (ε_t ε_s) = [G^(ξ)]²

Next, consider the Bhattacharyya parameter [5]:

Z^(ξ) = Σ_{y∈Y^(ξ)} [W^(ξ)(y|0)]^{1/2} [W^(ξ)(y|1)]^{1/2}    (11)

For the BSC(β_t, ε_t), we obtain the Bhattacharyya parameters

z_t = [(1 + ε_t)/2] [(1 − ε_t)/(1 + ε_t)]^{1/2} + [(1 − ε_t)/2] [(1 + ε_t)/(1 − ε_t)]^{1/2} = (1 − ε_t²)^{1/2}    (12)

Thus, the compound channel W^(ξ) gives

Z^(ξ) = E(z_t) = Σ_t β_t (1 − ε_t²)^{1/2}    (13)

For a BSC(β_t, ε_t) with z_t = (1 − ε_t²)^{1/2}, we will also use the alternative notation BSC(β_t ▷ z_t). Similarly to (10), it is also easy to verify that the upgrading channel W^(ξ,0): X → Y^(ξ,0) forms the ensemble

W^(ξ,0) = ∪_{t,s} BSC(β_{t,s} ▷ z_t z_s)    (14)

This gives an important Arikan's identity [5], [12]:

Z^(ξ,0) = Σ_{t,s} β_{t,s} z_t z_s = [Z^(ξ)]²    (15)

We will now relate the parameters G^(ξ) and Z^(ξ).

Lemma 1: For any channel W^(ξ),

1 − G^(ξ) ≤ Z^(ξ) ≤ (1 − [G^(ξ)]²)^{1/2}    (16)

Proof. Note that (1 − x²)^{1/2} is a concave function. Also, (1 − x²)^{1/2} ≥ 1 − x for any x ∈ [0, 1]. Then the lower bound in (16) follows from the definitions (9) and (13). We also apply the Jensen inequality to (13) to obtain the upper bound. □

Consider an ensemble of 2^ℓ equiprobable paths ξ = (a_1, ..., a_ℓ). Our main goal is to prove that for ℓ → ∞, most paths ξ (with the exception of a vanishing fraction) achieve polarization, so that

(G^(ξ), Z^(ξ)) → (0, 1) or (G^(ξ), Z^(ξ)) → (1, 0)    (17)
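The squaring identities for G and Z and the two-sided bound of Lemma 1 are easy to confirm numerically. The sketch below (Python; the helper names `degrade`, `upgrade`, `G`, `Z` are illustrative, not from the paper) represents a compound channel as a list of (β_t, ε_t) pairs and checks these relations on a random ensemble of BSCs.

```python
import math
import random

def degrade(channel):
    """W -> W^(xi,1): offsets multiply as in (10), so G squares."""
    return [(b1 * b2, e1 * e2) for b1, e1 in channel for b2, e2 in channel]

def upgrade(channel):
    """W -> W^(xi,0): Bhattacharyya parameters z_t multiply as in (14);
    the product z_t z_s is converted back to an offset (1 - z^2)^(1/2)."""
    out = []
    for b1, e1 in channel:
        for b2, e2 in channel:
            z = math.sqrt(1 - e1 ** 2) * math.sqrt(1 - e2 ** 2)
            out.append((b1 * b2, math.sqrt(1 - z ** 2)))
    return out

def G(channel):
    """Expected offset, eq. (9)."""
    return sum(b * e for b, e in channel)

def Z(channel):
    """Expected Bhattacharyya parameter, eq. (13)."""
    return sum(b * math.sqrt(1 - e ** 2) for b, e in channel)

# A random compound channel: k BSCs with offsets eps_t and weights beta_t
random.seed(1)
k = 4
W = [(1 / k, random.random()) for _ in range(k)]

assert math.isclose(G(degrade(W)), G(W) ** 2)        # G^(xi,1) = [G^(xi)]^2
assert math.isclose(Z(upgrade(W)), Z(W) ** 2)        # eq. (15)
assert 1 - G(W) <= Z(W) <= math.sqrt(1 - G(W) ** 2)  # Lemma 1, eq. (16)
```

The same checks pass for any weights β_t summing to 1 and any offsets ε_t ∈ [0, 1], which is exactly the content of (11), (15), and Lemma 1.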
To prove (17), we introduce a single function

U^(ξ) = G^(ξ) Z^(ξ)    (18)

Lemma 2: For any channel W^(ξ), the asymptotic equalities (17) hold for ℓ → ∞ if and only if U^(ξ) → 0.

Proof. The "only if" part follows from the definition (18). The "if" part follows from (16). Indeed, G^(ξ) + Z^(ξ) ≥ 1 and G^(ξ), Z^(ξ) ≤ 1. One of these two quantities tends to 0 if U^(ξ) → 0. Then we obtain the asymptotic equality G^(ξ) + Z^(ξ) → 1. This gives (17). □

V. POLARIZATION PARAMETERS IN POLAR COORDINATES

Given any DMS channel W^(ξ) = ∪_{t=1}^k BSC(β_t, ε_t), we now define the angular parameters θ_t and their mean θ = θ^(ξ):

θ_t = arccos ε_t = arcsin z_t,  θ = E(θ_t) = Σ_t β_t θ_t    (19)

We have the following important lemmas.

Lemma 3: For any compound channel W^(ξ), the parameters G^(ξ) and Z^(ξ) satisfy the relations

G^(ξ) = Σ_t β_t cos θ_t ≤ cos θ
Z^(ξ) = Σ_t β_t sin θ_t ≤ sin θ    (20)

Proof. We first rewrite the equalities (9) and (13) in the angular form using the parameters θ_t:

G^(ξ) = E(ε_t) = Σ_t β_t cos θ_t ≤ cos(Σ_t β_t θ_t) = cos θ
Z^(ξ) = E(z_t) = Σ_t β_t sin θ_t ≤ sin(Σ_t β_t θ_t) = sin θ

Here we apply the Jensen inequality for the concave functions sin x and cos x with 0 ≤ x ≤ π/2. □

Lemma 4: For the channels W^(ξ,1) and W^(ξ,0),

U^(ξ,1) ≤ cos²θ (1 − cos⁴θ)^{1/2}    (21)
U^(ξ,0) ≤ sin²θ (1 − sin⁴θ)^{1/2}    (22)

Proof. Consider the channel W^(ξ,1) defined in (10). According to (13) and (19),

U^(ξ,1) = [Σ_{t,s} β_{t,s} ε_t ε_s] [Σ_{i,j} β_{i,j} (1 − ε_i² ε_j²)^{1/2}]

Here all indices i, j, t, s run from 1 to k. Note that

Σ_{t,s} β_{t,s} ε_t ε_s = E(ε_t ε_s) = E²(ε_t) ≤ cos²θ

Also, (1 − ε_i² ε_j²)^{1/2} is a concave function of the variable x = ε_i ε_j. Then

Σ_{i,j} β_{i,j} (1 − ε_i² ε_j²)^{1/2} ≤ [1 − [E(ε_i ε_j)]²]^{1/2} = [1 − E⁴(ε_i)]^{1/2}

This proves (21). Similarly,

U^(ξ,0) = [Σ_{t,s} β_{t,s} z_t z_s] [Σ_{i,j} β_{i,j} (1 − z_i² z_j²)^{1/2}]

Then we obtain (22) by repeating the previous case. □

VI. PROOF OF POLARIZATION PROPERTY

The following theorem proves the polarization property and also solves Problem A of Section I.

Theorem 1: The paths ξ = (a_1, ..., a_ℓ) of any length ℓ satisfy the inequality

E U^(ξ) ≤ (0.87)^ℓ    (23)

Corollary 1: Most paths ξ = (a_1, ..., a_ℓ), except a fraction (0.87)^{ℓ/2} of them, satisfy the inequality U^(ξ) < (0.87)^{ℓ/2} and yield the polarization property U^(ξ) → 0 as ℓ → ∞.

Proof. Consider the ensemble N of equiprobable paths ξ = (a_1, ..., a_ℓ). For each ξ, the bit a_{ℓ+1} takes the values 0 and 1 equally likely. Then the quantity U^(ξ,a_{ℓ+1}) has the mean value

E U^(ξ,a_{ℓ+1}) = [U^(ξ,0) + U^(ξ,1)]/2

For every ξ, we can now consider a random angle Θ ∈ [0, π/2] that replaces the angle θ = θ^(ξ) and equally likely takes two complementary angles θ^(0) and θ^(1), such that

Θ = θ^(0) = arcsin(sin²θ^(ξ)) if a_{ℓ+1} = 0
Θ = θ^(1) = arccos(cos²θ^(ξ)) if a_{ℓ+1} = 1

This is the setting of Problem A of Section I. Now the upper bounds (20), (21), and (22) give the inequalities

U^(ξ) ≤ r(θ),  E U^(ξ,a_{ℓ+1}) ≤ E r(Θ)

where

r(θ) = sin θ cos θ
E r(Θ) = [sin²θ (1 − sin⁴θ)^{1/2} + cos²θ (1 − cos⁴θ)^{1/2}]/2

Next, note that E r(Θ)/r(θ) < 0.87 for any θ ∈ [0, π/2], with the maximum at θ = π/4, as seen in Fig. 4. In turn, this implies that for the paths ξ of length ℓ, the function r(θ^(ξ)) has the expected value

E r(θ^(ξ)) < (0.87)^ℓ r(π/4)

which completes the proof. □

Fig. 4. The ratio E r(Θ)/r(θ) of the functions r(θ) = sin θ cos θ and E r(Θ) = E(sin Θ cos Θ).

Remarks. A slightly stronger version of Corollary 1 shows that almost all paths (except a vanishing fraction) satisfy the inequality U^(ξ) ≤ c^{ℓ/2}, where we can take any c > 1/2 as ℓ → ∞. Indeed, it is easy to verify that E r(Θ)/r(θ) → 2^{−1/2} as r(θ) → 0, which in turn holds for almost all paths ξ of length ℓ → ∞. More precise arguments, which use the concave functions r^λ(θ) of a vanishing degree λ > 0, show that E U^(ξ) has the order below c^ℓ, where c > 1/2. Finally, the above technique can be used to obtain fast polarization of the order

log₂ E U^(ξ) < −2^{ℓ/2−f(ℓ)}

where f(ℓ) is any function such that f(ℓ)ℓ^{−1/2} → ∞.
However, the proof of this fact is more involved and does not simplify the similar results of the papers [13]-[15].

Acknowledgment. The author thanks E. Arikan and I. Tal for helpful comments.

REFERENCES

[1] I. Dumer, "Recursive decoding of Reed-Muller codes," Proc. 37th Allerton Conf. on Commun., Cont., and Comp., Monticello, IL, USA, 1999, pp. 61-69 (http://arxiv.org/abs/1703.05303).
[2] I. Dumer and K. Shabunov, "Recursive constructions and their maximum likelihood decoding," Proc. 38th Allerton Conf. on Commun., Cont., and Comp., Monticello, IL, USA, 2000, pp. 71-80 (http://arxiv.org/abs/1703.05302).
[3] I. Dumer and K. Shabunov, "Near-optimum decoding for subcodes of Reed-Muller codes," 2001 IEEE Intern. Symp. Info. Theory, Washington, DC, USA, June 24-29, 2001, p. 329.
[4] I. Dumer and K. Shabunov, "Soft-decision decoding of Reed-Muller codes: recursive lists," IEEE Trans. Info. Theory, vol. 52, no. 3, pp. 1260-1266, 2006.
[5] E. Arikan, "Channel polarization: a method for constructing capacity-achieving codes for symmetric binary-input memoryless channels," IEEE Trans. Info. Theory, vol. 55, no. 6, pp. 3051-3073, 2009.
[6] V. Guruswami and P. Xia, "Polar codes: speed of polarization and polynomial gap to capacity," IEEE Trans. Info. Theory, vol. 61, no. 1, pp. 3-16, 2015.
[7] M. Alsan and E. Telatar, "A simple proof of polarization and polarization for non-stationary memoryless channels," IEEE Trans. Info. Theory, vol. 62, no. 9, pp. 4873-4878, 2016.
[8] I. Tal and A. Vardy, "List decoding of polar codes," IEEE Trans. Info. Theory, vol. 61, no. 5, pp. 2213-2226, 2015.
[9] S. B. Korada, "Polar codes for channel and source coding," Ph.D. thesis, Ecole Polytechnique Federale de Lausanne, 2009.
[10] E. Arikan, private communication, March 2017.
[11] J. Duchi, "Lecture notes for Statistics 311/Electrical Engineering 377," Stanford University, 2016, https://stanford.edu/class/stats311/Lectures/full notes.pdf.
[12] T. S. Jayram and E. Arikan, "A note on some inequalities used in channel polarization and polar coding," to appear in IEEE Trans. Info. Theory, 2017.
[13] E. Arıkan and E. Telatar, "On the rate of channel polarization," Proc. IEEE Intern. Symp. Info. Theory (ISIT 2009), Seoul, South Korea, 2009, pp. 1493-1495.
[14] S. B. Korada, E. Sasoglu, and R. Urbanke, "Polar codes: characterization of exponent, bounds, and constructions," IEEE Trans. Info. Theory, vol. 56, no. 12, pp. 6253-6264, 2010.
[15] I. Tal, "A simple proof of fast polarization," https://arxiv.org/abs/1704.07179, April 2017.