An Iterative Geometric Mean Decomposition Algorithm for MIMO Communications Systems

Chiao-En Chen, Member, IEEE, Yu-Cheng Tsai, and Chia-Hsiang Yang, Member, IEEE

Abstract—This paper presents an iterative geometric mean decomposition (IGMD) algorithm for multiple-input multiple-output (MIMO) wireless communications. In contrast to the conventional geometric mean decomposition (GMD) algorithm, the proposed IGMD does not require an explicit $K$th root computation in the preprocessing stage but relies on a carefully constructed iterative procedure that generates the GMD in its limit. We prove analytically that the proposed IGMD is guaranteed to converge to the exact GMD under certain sufficient conditions, and propose three different constructions satisfying these conditions. Both numerical simulations and complexity analysis of the proposed IGMD have been conducted and compared with the conventional GMD. Simulation results show that the new IGMD algorithm effectively reduces the complexity overhead and hence is more advantageous for low-complexity implementations.

Index Terms—Geometric mean decomposition (GMD), MIMO, QR, SVD, Tomlinson-Harashima precoding (THP)

I. INTRODUCTION

Multiple-input multiple-output (MIMO) communications [1, 2] have continued to be one of the key technologies of next-generation wireless systems because of their potential to provide higher data rates and better reliability than conventional single-input single-output (SISO) systems. When the channel state information (CSI) is available at both the transmitter and the receiver, it is well known that a closed-loop gain can be further acquired by jointly designing the precoder and the equalizer. Among these closed-loop transceiver design schemes, the singular-value-decomposition (SVD)-based linear transceiver decomposes the MIMO channel into multiple parallel subchannels and is known to achieve the channel capacity if proper power allocation [3] is applied.
However, because the signal-to-noise ratio (SNR) varies across subchannels, the bit error rate (BER) performance is dominated by the subchannel with the worst SNR. Consequently, without sophisticated bit-allocation schemes, a fundamental trade-off between the BER and the capacity cannot be avoided in this type of design [4, 5].

In addition to the SVD-based linear design, a geometric-mean-decomposition (GMD)-based nonlinear transceiver design has also been proposed [4, 6]. With the help of the GMD [6, 7], the MIMO channel is decomposed into multiple subchannels with identical SNR, and hence the simple identical bit allocation can be used for all subchannels. It has also been shown that the GMD-based transceiver under the zero-forcing (ZF) constraint asymptotically achieves both the optimal BER and the capacity at sufficiently high SNR. Because of these good properties, various extensions and generalizations of GMD-based transceivers have been proposed in the literature [5, 8]–[13].

This work was supported by the National Science Council (NSC), Taiwan, R.O.C. under Grant Number NSC 102-2221-E-194-009-MY2. Chiao-En Chen is with the Department of Electrical Engineering and the Department of Communications Engineering, National Chung Cheng University, Chiayi, Taiwan, R.O.C. (e-mail: [email protected]). Yu-Cheng Tsai and Chia-Hsiang Yang are with the Electronics Engineering Department, National Chiao Tung University, Hsinchu, Taiwan, R.O.C.

As the GMD is the core of many advanced MIMO transceiver designs, the associated implementation issues have begun to draw researchers' attention [14, 15]. In [14], a scaled GMD algorithm was proposed to simplify the detection logic. In [15], the authors presented a constant-throughput GMD implementation that also supports hardware sharing between the precoding and signal detection modules. In this paper, a new implementation issue of the GMD algorithm is addressed.
It is noted that the existing GMD algorithms require the computation of the geometric mean (GM) $\bar{\sigma}$ of all the positive singular values in the pre-processing stage. This requires the capability of computing the $K$th root $\sqrt[K]{A}$ of some positive real number $A$ and hence results in additional complexity overhead. In this paper, we propose¹ an iterative GMD (IGMD) algorithm based on the successive approximation method. The advantages of the proposed IGMD algorithm as well as our main contributions are summarized as follows.

1) A new algorithm for computing the geometric mean decomposition is proposed. Unlike the conventional GMD algorithm, the proposed algorithm has a regular structure that simplifies the control logic and, from the hardware implementation perspective, accommodates different signal dimensions more easily. Another important feature of the proposed IGMD algorithm is that it does not require the explicit $K$th root computation of the geometric mean $\bar{\sigma}$ but relies on a carefully constructed iterative procedure that generates the GMD in its limit. The proposed algorithm substantially reduces the complexity overhead and hence is more advantageous than the conventional GMD for applications with limited computing capability.

2) We prove analytically that the proposed IGMD algorithm always converges to the exact GMD in its limit under certain sufficient conditions. From the sufficient condition, we propose three different constructions: IGMD-AM (arithmetic mean), IGMD-GM (geometric mean), and IGMD-HM (harmonic mean), and verify their convergence numerically. We also find that the convergence behaviour of the proposed IGMD depends not only on the topological property of the mapping (to be designed) but also on how the algorithms are initialized. The mean-square-error (MSE) and error rate performance using different initializations such as QR factorization, QR factorization with V-BLAST (Vertical-Bell Laboratories Layered Space-Time) [16, 17] sorting, singular value decomposition (SVD), and interleaved-SVD have been studied through numerical simulations.

3) The proposed IGMD algorithm under various constructions and initializations is implemented using CORDIC (COordinate Rotation DIgital Computer) arithmetic [18] and compared with the conventional GMD from the complexity perspective. The complexity of the building blocks in the conventional GMD and the proposed IGMD algorithms has been analyzed, and the overall performance versus complexity trade-off has been simulated.

¹ Although we have focused on the GMD problem of a point-to-point MIMO channel in this paper, the proposed IGMD algorithm can be easily extended to other GMD applications such as the BD (block diagonal)-GMD [8] precoder design in a multi-user MIMO scenario.

The rest of this article is organized as follows. Section II reviews the geometric mean decomposition as well as its conventional implementation algorithm. Section III presents the proposed IGMD algorithm and states the convergence target; the sufficient condition under which the proposed IGMD converges to the exact GMD is proved analytically in Section IV, where we also explicitly show three different constructions of the proposed IGMD that satisfy this condition. Section V studies the MSE and error rate performance of the different constructions of the proposed IGMD algorithm. A detailed computational complexity analysis of the IGMD algorithm is also provided in comparison with the conventional GMD. Finally, Section VI concludes the paper.

Notations: Throughout this paper, matrices and vectors are set in boldface, with uppercase letters for matrices and lowercase letters for vectors. The superscripts $T$ and $H$ denote the transpose and conjugate transpose of a matrix, respectively.
We use $\mathrm{diag}\{x_1, \cdots, x_K\}$ to represent the diagonal matrix with diagonal elements $\{x_1, \cdots, x_K\}$, $[\mathbf{X}]_{p,q}$ to represent the $(p,q)$th component of $\mathbf{X}$, and $[\mathbf{X}]_{m:n,p:q}$ to represent the submatrix formed by the consecutive $m$th to $n$th rows and $p$th to $q$th columns of $\mathbf{X}$. We use the expression $A := B$ to denote the in-place update operation in which the value in $A$ is updated by the value of $B$.

II. REVIEW ON THE STANDARD GEOMETRIC MEAN DECOMPOSITION ALGORITHM

In this section, we review the main results of the GMD and its implementation. It has been shown in [6, 7] that, given an $N \times M$ matrix $\mathbf{H}$ of rank $K$, there exist semi-unitary matrices $\mathbf{Q} \in \mathbb{C}^{N \times K}$ and $\mathbf{S} \in \mathbb{C}^{M \times K}$, and an upper triangular matrix $\mathbf{R} \in \mathbb{R}^{K \times K}$ such that

$$\mathbf{H} = \mathbf{Q}\mathbf{R}\mathbf{S}^H, \tag{1}$$

where the diagonal elements of $\mathbf{R}$ are all identical and equal to the geometric mean $\bar{\sigma}$ of the positive singular values of $\mathbf{H}$, that is,

$$[\mathbf{R}]_{i,i} = \bar{\sigma} = \sqrt[K]{A} = \left(\prod_{j=1}^{K} \sigma_j\right)^{1/K}, \quad \text{for all } i = 1, \cdots, K, \tag{2}$$

where $\sigma_1 \geq \sigma_2 \geq \cdots \geq \sigma_K > 0$. The decomposition of $\mathbf{H}$ in (1) is referred to as the QRS decomposition [6] or the GMD [7] in the literature. An efficient implementation procedure [19] that computes $\mathbf{Q}$, $\mathbf{R}$, and $\mathbf{S}$ from $\mathbf{H}$ is described as follows.

Step 1. The algorithm starts with the SVD of $\mathbf{H}$, given by

$$\mathbf{H} = \mathbf{U}\boldsymbol{\Sigma}\mathbf{V}^H, \tag{3}$$

where $\mathbf{U} \in \mathbb{C}^{N \times K}$ and $\mathbf{V} \in \mathbb{C}^{M \times K}$ are both semi-unitary and $\boldsymbol{\Sigma} = \mathrm{diag}\{\sigma_1, \cdots, \sigma_K\}$. The algorithm then sets $\mathbf{Q} := \mathbf{U}$, $\mathbf{S} := \mathbf{V}$, and $\mathbf{R} := \boldsymbol{\Sigma}$ for initialization, and computes the geometric mean $\bar{\sigma}$ of the positive singular values of $\mathbf{H}$ via (2). The algorithm then starts from $k := 1$.

Step 2. At stage $k$, where $k$ ranges from 1 to $K-1$, the algorithm performs the following procedure. The algorithm first checks the $(k,k)$th element of $\mathbf{R}$, denoted as $R_{k,k}$. If $R_{k,k} \geq \bar{\sigma}$, then the algorithm chooses some $p > k$ such that $R_{p,p} \leq \bar{\sigma}$; otherwise the algorithm chooses some $p > k$ such that $R_{p,p} \geq \bar{\sigma}$. After $p$ has been determined, the algorithm swaps $R_{p,p}$ with $R_{k+1,k+1}$, $\mathbf{Q}_{:,p}$ with $\mathbf{Q}_{:,k+1}$, and $\mathbf{S}_{:,p}$ with $\mathbf{S}_{:,k+1}$.
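This can be achieved by setting

$$\mathbf{R} := \mathbf{P}^{(k)T}\mathbf{R}\mathbf{P}^{(k)}, \tag{4}$$
$$\mathbf{Q} := \mathbf{Q}\mathbf{P}^{(k)}, \tag{5}$$
$$\mathbf{S} := \mathbf{S}\mathbf{P}^{(k)}, \tag{6}$$

where $\mathbf{P}^{(k)}$ is the associated permutation matrix.

Step 3. Construct the $2 \times 2$ matrices $\boldsymbol{\Theta}_L^{(k)}$ and $\boldsymbol{\Theta}_R^{(k)}$ as

$$\boldsymbol{\Theta}_L^{(k)} = \frac{1}{\bar{\sigma}} \begin{bmatrix} cR_{k,k} & sR_{k+1,k+1} \\ -sR_{k+1,k+1} & cR_{k,k} \end{bmatrix}, \tag{7}$$
$$\boldsymbol{\Theta}_R^{(k)} = \begin{bmatrix} c & -s \\ s & c \end{bmatrix}, \tag{8}$$

such that

$$\boldsymbol{\Theta}_L^{(k)} \begin{bmatrix} R_{k,k} & 0 \\ 0 & R_{k+1,k+1} \end{bmatrix} \boldsymbol{\Theta}_R^{(k)} = \begin{bmatrix} \bar{\sigma} & \star \\ 0 & \dfrac{R_{k,k}R_{k+1,k+1}}{\bar{\sigma}} \end{bmatrix}. \tag{9}$$

Here $\star$ represents some (generally nonzero) number whose value is irrelevant. It is straightforward to verify that (9) can always be achieved by choosing

$$c = \sqrt{\frac{\bar{\sigma}^2 - R_{k+1,k+1}^2}{R_{k,k}^2 - R_{k+1,k+1}^2}}, \quad s = \sqrt{1 - c^2}. \tag{10}$$

Step 4. Construct $\mathbf{G}_L^{(k)}$ and $\mathbf{G}_R^{(k)}$ from the identity matrix $\mathbf{I}_K$ with the submatrices $[\mathbf{G}_L^{(k)}]_{k:k+1,k:k+1}$ and $[\mathbf{G}_R^{(k)}]_{k:k+1,k:k+1}$ replaced by $\boldsymbol{\Theta}_L^{(k)}$ and $\boldsymbol{\Theta}_R^{(k)}$, respectively. Update $\mathbf{R}$, $\mathbf{Q}$, and $\mathbf{S}$ as

$$\mathbf{R} := \mathbf{G}_L^{(k)}\mathbf{R}\mathbf{G}_R^{(k)}, \tag{11}$$
$$\mathbf{Q} := \mathbf{Q}\mathbf{G}_L^{(k)T}, \tag{12}$$
$$\mathbf{S} := \mathbf{S}\mathbf{G}_R^{(k)}. \tag{13}$$

Step 5. If $k = K-1$, then the algorithm terminates; otherwise, the algorithm updates $k := k+1$ and goes back to Step 2.

It follows that the matrices $\mathbf{Q}$, $\mathbf{R}$, and $\mathbf{S}$ generated by the above-mentioned QRS algorithm can be explicitly expressed as

$$\mathbf{Q} = \mathbf{U}\mathbf{P}^{(1)}\mathbf{G}_L^{(1)T} \cdots \mathbf{P}^{(K-1)}\mathbf{G}_L^{(K-1)T}, \tag{14}$$
$$\mathbf{S} = \mathbf{V}\mathbf{P}^{(1)}\mathbf{G}_R^{(1)} \cdots \mathbf{P}^{(K-1)}\mathbf{G}_R^{(K-1)}, \tag{15}$$
$$\mathbf{R} = \mathbf{G}_L^{(K-1)}\mathbf{P}^{(K-1)T} \cdots \mathbf{G}_L^{(1)}\mathbf{P}^{(1)T}\, \boldsymbol{\Sigma}\, \mathbf{P}^{(1)}\mathbf{G}_R^{(1)} \cdots \mathbf{P}^{(K-1)}\mathbf{G}_R^{(K-1)}. \tag{16}$$

Note that the aforementioned GMD algorithm has to compute the geometric mean $\bar{\sigma}$ of all the positive singular values as in (2), and hence requires the capability of computing the $K$th root of $A$. For special cases where $K = 2^L$ with $L$ being some positive integer, it is possible to decompose the computation of $\bar{\sigma}$ into successive geometric mean computations of two numbers:

$$\bar{\sigma} = \sqrt{\cdots\sqrt{\sqrt{\sigma_1\sigma_2}\,\sqrt{\sigma_3\sigma_4}}\;\cdots\;\sqrt{\sqrt{\sigma_{K-3}\sigma_{K-2}}\,\sqrt{\sigma_{K-1}\sigma_K}}},$$

where the square root operation can be carried out efficiently using CORDIC-based computing [18]. On the other hand, for general cases where $K \neq 2^L$, computing $\sqrt[K]{A}$ requires considerably more effort. One way of computing $\sqrt[K]{A}$ is to first transform $A$ into the logarithmic domain and then convert it back after dividing by $K$.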
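The two routes just described can be illustrated with a minimal floating-point sketch, given here only for orientation. The function names `geometric_mean_pow2` and `geometric_mean_log` are hypothetical, and the standard-library calls `math.sqrt`, `math.log`, and `math.exp` merely stand in for the CORDIC and logarithm/exponential hardware units discussed in the text; no fixed-point effects are modelled.

```python
import math


def geometric_mean_pow2(sigmas):
    """Geometric mean by repeated pairwise square roots (requires K = 2**L)."""
    vals = list(sigmas)
    k = len(vals)
    if k == 0 or (k & (k - 1)) != 0:
        raise ValueError("the number of values must be a power of two")
    # Each pass halves the list by replacing neighbours with their geometric mean.
    while len(vals) > 1:
        vals = [math.sqrt(vals[i] * vals[i + 1]) for i in range(0, len(vals), 2)]
    return vals[0]


def geometric_mean_log(sigmas):
    """Geometric mean for any K: average in the log domain, then exponentiate."""
    vals = list(sigmas)
    return math.exp(sum(math.log(v) for v in vals) / len(vals))


if __name__ == "__main__":
    print(geometric_mean_pow2([8.0, 2.0, 4.0, 1.0]))
    print(geometric_mean_log([8.0, 2.0, 4.0, 1.0, 3.0, 0.5, 6.0]))  # K = 7
```

The pairwise version needs only multiplications and square roots, whereas the second function needs a logarithm and an exponential.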
The logarithmic-domain approach requires a large look-up table and a piecewise polynomial (including linear) approximation device to realize both the logarithmic and exponential functions, and hence calls for a large amount of memory to ensure the accuracy of such a high dynamic-range computation. Another possible way of finding $\bar{\sigma}$ for the general case is to compute it iteratively using a Newton-type $K$th-root algorithm [20], which often requires a good starting point to ensure a reasonable rate of convergence. A computationally efficient $K$th-root algorithm that can be viewed as a slight modification of Newton's algorithm using the technique of binary approximation is described in [21].

It is worth noting that the accuracy of $\bar{\sigma}$ plays an important role in the conventional GMD algorithm. This is because the conventional GMD algorithm proceeds in a sequential fashion, and hence any numerical error in $\bar{\sigma}$ not only causes numerical errors at each stage but also propagates and accumulates through all the later stages. As a consequence, $\bar{\sigma}$ has to be computed with sufficient accuracy for the conventional GMD algorithm to function properly, which results in non-negligible complexity overhead.

III. PROPOSED ITERATIVE GEOMETRIC MEAN DECOMPOSITION ALGORITHM

To mitigate the complexity overhead in the pre-processing stage of the conventional GMD algorithm, a new iterative GMD algorithm is proposed in this section. The main idea of the proposed IGMD is to design the planar rotations, similar to those used in the conventional GMD, so that the spread of the diagonal elements of the updated $\mathbf{R}$ matrix is gradually reduced as the algorithm proceeds. With proper design, it can be shown that the proposed IGMD achieves the exact GMD in the limit without the computation of $\bar{\sigma}$, and hence avoids the requirement of the $K$th-root computation. As will be elaborated in Section V, this feature brings performance advantages for applications with limited computing capability.
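The proposed iterative GMD algorithm is described as follows.

Initialization: Given the matrix $\mathbf{H} \in \mathbb{C}^{N \times M}$ of rank $K \leq \min(N, M)$, the algorithm starts with some general orthogonal decomposition of $\mathbf{H}$,

$$\mathbf{H} = \tilde{\mathbf{U}}\tilde{\mathbf{R}}\tilde{\mathbf{V}}^H, \tag{17}$$

where $\tilde{\mathbf{U}} \in \mathbb{C}^{N \times K}$ and $\tilde{\mathbf{V}} \in \mathbb{C}^{M \times K}$ are both semi-unitary, and $\tilde{\mathbf{R}} \in \mathbb{C}^{K \times K}$ is upper-triangular. For general $N$, $M$, and $K$, one can always choose the SVD for initialization. For special cases where $\mathbf{H}$ has full column rank with $M = K$, other orthogonal decompositions such as the QR decomposition [22] can also be used. The algorithm initializes by setting $\mathbf{Q} := \tilde{\mathbf{U}}$, $\mathbf{S} := \tilde{\mathbf{V}}$, $\mathbf{R} := \tilde{\mathbf{R}}$, and starts with iteration index $\ell := 1$.

Iteration: In each iteration, the algorithm performs $K-1$ stages of operations with the stage index $k$ ranging from 1 to $K-1$. At stage $k$, the algorithm first computes the SVD of the $2 \times 2$ submatrix of $\mathbf{R}$

$$\mathbf{R}_{k:k+1,k:k+1} = \begin{bmatrix} R_{k,k} & R_{k,k+1} \\ 0 & R_{k+1,k+1} \end{bmatrix} = \mathbf{U}_\gamma^{(k)}\boldsymbol{\Sigma}_\gamma^{(k)}\mathbf{V}_\gamma^{(k)H}, \tag{18}$$

where the singular matrices $\mathbf{U}_\gamma^{(k)} \in \mathbb{C}^{2 \times 2}$ and $\mathbf{V}_\gamma^{(k)} \in \mathbb{C}^{2 \times 2}$ are both unitary, and $\boldsymbol{\Sigma}_\gamma^{(k)} = \mathrm{diag}\{\sigma_{\gamma,1}^{(k)}, \sigma_{\gamma,2}^{(k)}\}$ is a diagonal matrix with singular values $\sigma_{\gamma,1}^{(k)}$ and $\sigma_{\gamma,2}^{(k)}$. Without loss of generality, we assume $\sigma_{\gamma,2}^{(k)} \leq \sigma_{\gamma,1}^{(k)}$. After the singular values are obtained, carefully designed planar rotations are applied to obtain an upper triangular matrix with positive diagonal elements $\Omega(R_{k,k}, R_{k+1,k+1})$ and $R_{k,k}R_{k+1,k+1}/\Omega(R_{k,k}, R_{k+1,k+1})$, where $\Omega$ is a continuous mapping from $(0,\infty) \times (0,\infty)$ to $(0,\infty)$ with a desired property to be discussed in detail shortly. In matrix notation, we then have

$$\boldsymbol{\Phi}_L^{(k)}\boldsymbol{\Sigma}_\gamma^{(k)}\boldsymbol{\Phi}_R^{(k)} = \begin{bmatrix} \Omega(R_{k,k}, R_{k+1,k+1}) & \star \\ 0 & \dfrac{R_{k,k}R_{k+1,k+1}}{\Omega(R_{k,k}, R_{k+1,k+1})} \end{bmatrix}. \tag{19}$$

Note that the planar rotations $\boldsymbol{\Phi}_L^{(k)}$ and $\boldsymbol{\Phi}_R^{(k)}$ applied in (19) always exist as long as $[\sigma_{\gamma,1}^{(k)}, \sigma_{\gamma,2}^{(k)}]^T$ multiplicatively majorizes $[\Omega(R_{k,k}, R_{k+1,k+1}),\, R_{k,k}R_{k+1,k+1}/\Omega(R_{k,k}, R_{k+1,k+1})]^T$, or equivalently when $\sigma_{\gamma,2}^{(k)} \leq \Omega(R_{k,k}, R_{k+1,k+1}) \leq \sigma_{\gamma,1}^{(k)}$ holds [23, 24].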
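As a numerical sanity check of the existence condition, the following minimal sketch computes the singular values of one $2 \times 2$ block in closed form and builds a pair of rotations whose $c$ and $s$ take the same form as (10), with $\bar{\sigma}$ replaced by $\Omega$ and the two diagonal entries replaced by the two singular values. It assumes a real-valued block with positive diagonal; the helper names `singular_values_2x2`, `rotations`, and `matmul` are hypothetical and not part of the paper.

```python
import math


def singular_values_2x2(a, b, d):
    """Singular values (s1 >= s2) of the real block [[a, b], [0, d]] with a, d > 0."""
    t = a * a + b * b + d * d                      # trace of A^T A
    disc = math.sqrt(max(t * t - 4.0 * (a * d) ** 2, 0.0))
    s1 = math.sqrt((t + disc) / 2.0)
    s2 = a * d / s1                                # s1 * s2 = |det| = a * d
    return s1, s2


def rotations(s1, s2, omega):
    """Rotations Phi_L, Phi_R such that Phi_L diag(s1, s2) Phi_R is upper
    triangular with diagonal (omega, s1 * s2 / omega)."""
    if not (s2 <= omega <= s1):
        raise ValueError("omega must lie between the two singular values")
    if s1 == s2:
        c = 1.0                                    # block is already balanced
    else:
        c = math.sqrt(max(omega ** 2 - s2 ** 2, 0.0) / (s1 ** 2 - s2 ** 2))
    s = math.sqrt(max(1.0 - c * c, 0.0))
    phi_l = [[c * s1 / omega, s * s2 / omega],
             [-s * s2 / omega, c * s1 / omega]]
    phi_r = [[c, -s], [s, c]]
    return phi_l, phi_r


def matmul(x, y):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


if __name__ == "__main__":
    a, b, d = 3.0, 1.0, 0.5
    s1, s2 = singular_values_2x2(a, b, d)
    omega = (a + d) / 2.0                          # arithmetic mean as an example
    phi_l, phi_r = rotations(s1, s2, omega)
    print(matmul(matmul(phi_l, [[s1, 0.0], [0.0, s2]]), phi_r))
```

Because the singular values of a triangular block bracket its diagonal entries, any $\Omega$ that returns a value between $R_{k,k}$ and $R_{k+1,k+1}$ passes the range check in `rotations`.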
It is easy to verify that the matrices $\boldsymbol{\Phi}_L^{(k)}$ and $\boldsymbol{\Phi}_R^{(k)}$ can be constructed as

$$\boldsymbol{\Phi}_L^{(k)} = \frac{1}{\Omega(R_{k,k}, R_{k+1,k+1})} \begin{bmatrix} c\sigma_{\gamma,1}^{(k)} & s\sigma_{\gamma,2}^{(k)} \\ -s\sigma_{\gamma,2}^{(k)} & c\sigma_{\gamma,1}^{(k)} \end{bmatrix}, \tag{20}$$
$$\boldsymbol{\Phi}_R^{(k)} = \begin{bmatrix} c & -s \\ s & c \end{bmatrix}, \tag{21}$$

where

$$c = \sqrt{\frac{\Omega(R_{k,k}, R_{k+1,k+1})^2 - \big(\sigma_{\gamma,2}^{(k)}\big)^2}{\big(\sigma_{\gamma,1}^{(k)}\big)^2 - \big(\sigma_{\gamma,2}^{(k)}\big)^2}}, \quad s = \sqrt{1 - c^2}. \tag{22}$$

Combining the relations in (18) and (19), we then obtain

$$\boldsymbol{\Theta}_L^{(k)}\mathbf{R}_{k:k+1,k:k+1}\boldsymbol{\Theta}_R^{(k)} = \begin{bmatrix} \Omega(R_{k,k}, R_{k+1,k+1}) & \star \\ 0 & \dfrac{R_{k,k}R_{k+1,k+1}}{\Omega(R_{k,k}, R_{k+1,k+1})} \end{bmatrix}, \tag{23}$$

where $\boldsymbol{\Theta}_L^{(k)} = \boldsymbol{\Phi}_L^{(k)}\mathbf{U}_\gamma^{(k)H}$ and $\boldsymbol{\Theta}_R^{(k)} = \mathbf{V}_\gamma^{(k)}\boldsymbol{\Phi}_R^{(k)}$. Because $\boldsymbol{\Theta}_L^{(k)}$ and $\boldsymbol{\Theta}_R^{(k)}$ are both products of unitary matrices, they are unitary matrices as well.

After $\boldsymbol{\Theta}_L^{(k)}$ and $\boldsymbol{\Theta}_R^{(k)}$ are obtained, $\mathbf{G}_L^{(k)}$ and $\mathbf{G}_R^{(k)}$ are constructed from the identity matrix $\mathbf{I}_K$ with the submatrices $[\mathbf{G}_L^{(k)}]_{k:k+1,k:k+1}$ and $[\mathbf{G}_R^{(k)}]_{k:k+1,k:k+1}$ replaced by $\boldsymbol{\Theta}_L^{(k)}$ and $\boldsymbol{\Theta}_R^{(k)}$, respectively. The matrices $\mathbf{R}$, $\mathbf{Q}$, and $\mathbf{S}$ are finally updated as

$$\mathbf{R} := \mathbf{G}_L^{(k)}\mathbf{R}\mathbf{G}_R^{(k)}, \tag{24}$$
$$\mathbf{Q} := \mathbf{Q}\mathbf{G}_L^{(k)T}, \tag{25}$$
$$\mathbf{S} := \mathbf{S}\mathbf{G}_R^{(k)}. \tag{26}$$

It is clear that $\mathbf{R}$ remains upper-triangular, whereas $\mathbf{Q}$ and $\mathbf{S}$ both remain semi-unitary after (24)-(26) are performed at the end of each stage. If the stage index $k$ is smaller than $K-1$, the algorithm sets $k := k+1$ and performs the procedure (18)-(26). Otherwise, the algorithm sets the iteration index $\ell := \ell+1$ and starts a new iteration unless the prescribed number of iterations is attained.

For convenience of subsequent discussion, we denote by $\mathbf{Q}^{[\ell]}$, $\mathbf{R}^{[\ell]}$, and $\mathbf{S}^{[\ell]}$ the updated $\mathbf{Q}$, $\mathbf{R}$, and $\mathbf{S}$, respectively, at the end of the $(K-1)$th stage in the $\ell$th iteration. It is then easy to verify that the following relations hold for the proposed IGMD algorithm:

$$\mathbf{Q}^{[\ell+1]} = \mathbf{Q}^{[\ell]}\mathbf{G}_L^{(1)T} \cdots \mathbf{G}_L^{(K-1)T}, \tag{27}$$
$$\mathbf{S}^{[\ell+1]} = \mathbf{S}^{[\ell]}\mathbf{G}_R^{(1)} \cdots \mathbf{G}_R^{(K-1)}, \tag{28}$$
$$\mathbf{R}^{[\ell+1]} = \mathbf{G}_L^{(K-1)} \cdots \mathbf{G}_L^{(1)}\mathbf{R}^{[\ell]}\mathbf{G}_R^{(1)} \cdots \mathbf{G}_R^{(K-1)}, \tag{29}$$

for all $\ell = 0, 1, \cdots$. Here $\mathbf{Q}^{[0]}$, $\mathbf{S}^{[0]}$, and $\mathbf{R}^{[0]}$ are defined as $\tilde{\mathbf{U}}$, $\tilde{\mathbf{V}}$, and $\tilde{\mathbf{R}}$, respectively. The planar rotation matrices $\{\mathbf{G}_L^{(k)}\}_{k=1}^{K-1}$ and $\{\mathbf{G}_R^{(k)}\}_{k=1}^{K-1}$ clearly also depend on the iteration index $\ell$, but the dependency is not denoted explicitly in (27)-(29) for simplicity as long as no confusion results.

Unlike the conventional GMD, which requires the swapping in (4)-(6) that depends on the value of $\bar{\sigma}$, the proposed IGMD algorithm does not require the computation of $\bar{\sigma}$ and always operates on the diagonal elements of $\mathbf{R}$ with consecutive indices at each stage. The IGMD algorithm therefore has a more regular structure that simplifies the control logic and, from the implementation perspective, adapts more easily to problems of different dimensions. These advantages rely on the careful design of $\Omega$. In the following section, we show that it is possible to design the mapping $\Omega$ such that

$$\lim_{\ell \to \infty} \big[\mathbf{R}^{[\ell]}\big]_{k,k} = \bar{\sigma} = \left(\prod_{i=1}^{K} \big[\tilde{\mathbf{R}}\big]_{i,i}\right)^{1/K}, \tag{30}$$

for all $k = 1, \cdots, K$. The exact GMD is therefore achieved when the algorithm converges.

IV. DESIGN OF MAPPING $\Omega$

For ease of the following discussion, we introduce several new notations. For a given $\tau > 0$, we define a subset $\mathcal{A}(\tau) \subset \mathbb{R}^K$:

$$\mathcal{A}(\tau) = \left\{ \mathbf{x} \in \mathbb{R}^K \,\middle|\, \mathbf{x} > \mathbf{0},\ \prod_{k=1}^{K} x_k = \tau \right\},$$

where $\mathbf{x} = [x_1, \cdots, x_K]^T$. We also define continuous mappings $T^{(j)} : \mathcal{A}(\tau) \to \mathcal{A}(\tau)$, $j = 1, \cdots, K-1$, given by

$$T^{(j)}\left( \begin{bmatrix} \mathbf{x}_{1:j-1} \\ x_j \\ x_{j+1} \\ \mathbf{x}_{j+2:K} \end{bmatrix} \right) = \begin{bmatrix} \mathbf{x}_{1:j-1} \\ \Omega(x_j, x_{j+1}) \\ \dfrac{x_j x_{j+1}}{\Omega(x_j, x_{j+1})} \\ \mathbf{x}_{j+2:K} \end{bmatrix}. \tag{31}$$

If we denote the vector on the main diagonal of $\mathbf{R}^{[\ell]}$ as $\mathbf{r}^{[\ell]}$, then the diagonal vectors of $\mathbf{R}^{[\ell+1]}$ and $\mathbf{R}^{[\ell]}$ can be related from (24) and (29) using the new notations

$$\mathbf{r}^{[\ell+1]} = T^{(K-1)}\Big(\cdots T^{(2)}\Big(T^{(1)}\big(\mathbf{r}^{[\ell]}\big)\Big)\cdots\Big) \tag{32}$$
$$= T\big(\mathbf{r}^{[\ell]}\big) = T^{\ell+1}\big(\mathbf{r}^{[0]}\big), \tag{33}$$

where

$$T(\mathbf{x}) = T^{(K-1)} \circ T^{(K-2)} \circ \cdots \circ T^{(2)} \circ T^{(1)}(\mathbf{x}), \tag{34}$$

$\mathbf{r}^{[0]} = \mathrm{diag}\{\tilde{\mathbf{R}}\}$, and $T^{\ell+1}(\mathbf{x})$ is the $(\ell+1)$-fold repeated composition of $T(\mathbf{x})$. With these new notations, the main result for designing $\Omega$ is given by the following proposition.

Proposition 1: If the mapping $\Omega : (0,\infty) \times (0,\infty) \to (0,\infty)$ satisfies the property

$$\Omega(z_1, z_2) + \frac{z_1 z_2}{\Omega(z_1, z_2)} \leq z_1 + z_2, \tag{35}$$

for all $z_1, z_2 > 0$, with equality when $z_1 = z_2$, then

$$\lim_{\ell \to \infty} \mathbf{r}^{[\ell]} = \bar{\sigma}\mathbf{1}. \tag{36}$$

Proof: Given $\mathbf{r}^{[0]} = [r_1^{[0]}, \cdots, r_K^{[0]}]^T = \mathrm{diag}\{\tilde{\mathbf{R}}\}$, we let $\tau = \prod_{k=1}^{K} r_k^{[0]}$, and consider the function $F : \mathcal{A}(\tau) \to (0,\infty)$, $F(\mathbf{x}) = \sum_{k=1}^{K} x_k$. From the arithmetic mean-geometric mean (AM-GM) inequality, we have

$$F(\mathbf{x}) = \sum_{k=1}^{K} x_k \geq K\left(\prod_{k=1}^{K} x_k\right)^{1/K} = K\tau^{1/K} = K\bar{\sigma}, \tag{37}$$

in which the absolute minimum of $F(\mathbf{x})$ is attained in $\mathcal{A}(\tau)$ at $\mathbf{x} = \bar{\sigma}\mathbf{1}$. In addition, we have the following inequality if the condition (35) of Proposition 1 holds:

$$F\big(T^{(j)}(\mathbf{x})\big) = \sum_{k=1}^{j-1} x_k + \Omega(x_j, x_{j+1}) + \frac{x_j x_{j+1}}{\Omega(x_j, x_{j+1})} + \sum_{k=j+2}^{K} x_k \leq \sum_{k=1}^{K} x_k = F(\mathbf{x}), \tag{38}$$

with equality achieved when $x_j = x_{j+1}$. It follows readily that $T(\mathbf{x})$, which is a composite mapping of $T^{(1)}, \cdots, T^{(K-1)}$, satisfies

$$F(T(\mathbf{x})) \leq F(\mathbf{x}), \tag{39}$$

with equality when $x_1 = x_2 = \cdots = x_K$. Consequently, $y^{[\ell-1]} = F(T(\mathbf{r}^{[\ell-1]}))$ is a monotonically decreasing sequence in $(0,\infty)$, and hence is guaranteed to converge to the greatest lower bound $K\bar{\sigma}$ [25]. As $F$ is continuous, we then have

$$\lim_{\ell \to \infty} F\big(T(\mathbf{r}^{[\ell-1]})\big) = F\Big(\lim_{\ell \to \infty} T\big(\mathbf{r}^{[\ell-1]}\big)\Big) = K\bar{\sigma}, \tag{40}$$

which is attained when $\lim_{\ell \to \infty} T(\mathbf{r}^{[\ell-1]}) = \bar{\sigma}\mathbf{1}$. As a result, we have $\lim_{\ell \to \infty} \mathbf{r}^{[\ell]} = \lim_{\ell \to \infty} T(\mathbf{r}^{[\ell-1]}) = \bar{\sigma}\mathbf{1}$, which completes the proof.

There exist potentially many functions that satisfy condition (35). The geometric mean $\Omega_{GM}(z_1, z_2) = \sqrt{z_1 z_2}$ clearly satisfies Proposition 1, as

$$\Omega_{GM}(z_1, z_2) + \frac{z_1 z_2}{\Omega_{GM}(z_1, z_2)} = 2\sqrt{z_1 z_2} \leq z_1 + z_2, \tag{41}$$

because of the AM-GM inequality. In addition to $\Omega_{GM}(z_1, z_2)$, the arithmetic mean $\Omega_{AM}(z_1, z_2) = (z_1 + z_2)/2$ is another choice that also satisfies Proposition 1. This can be observed by squaring both sides of the AM-GM inequality:

$$4z_1 z_2 \leq (z_1 + z_2)^2 \;\Leftrightarrow\; (z_1 + z_2)^2 + 4z_1 z_2 \leq 2(z_1 + z_2)^2 \;\Leftrightarrow\; \frac{z_1 + z_2}{2} + \frac{2z_1 z_2}{z_1 + z_2} \leq z_1 + z_2. \tag{42}$$

As a result, $\Omega_{AM}(z_1, z_2) + \frac{z_1 z_2}{\Omega_{AM}(z_1, z_2)} \leq z_1 + z_2$, with equality if and only if $z_1 = z_2$.

Note that $\frac{z_1 z_2}{\Omega_{AM}(z_1, z_2)}$ is simply the harmonic mean (HM) function $\Omega_{HM}(z_1, z_2) = 2z_1 z_2/(z_1 + z_2)$, which satisfies $\Omega_{AM}(z_1, z_2) = z_1 z_2/\Omega_{HM}(z_1, z_2)$. It is then clear that $\Omega_{HM}(z_1, z_2)$ also satisfies Proposition 1 from the same relation obtained in (42).

Based on $\Omega_{AM}$, $\Omega_{GM}$, and $\Omega_{HM}$, we can then construct three different types of IGMD algorithms: IGMD-AM, IGMD-GM, and IGMD-HM, respectively. As these mappings are highly nonlinear, it is very difficult to compare the convergence speed of the proposed algorithm analytically for these three constructions. Hence we resort to computer simulations as shown in Section V and leave the more challenging theoretical analysis to future work. In fact, as will be observed from the simulation results, the convergence behaviour of the different constructions depends not only on the topological property of the mapping but also on how the algorithms are initialized.

V. SIMULATION RESULTS AND COMPLEXITY COMPARISON

In this section, we present simulation results for the three different types of the proposed IGMD algorithm. Throughout the simulations, we assume a standard $K \times K$ i.i.d. (independent and identically distributed) Rayleigh fading channel in which every element of the channel matrix $\mathbf{H}$ is modelled as a zero-mean circularly symmetric complex Gaussian random variable with unit variance. To highlight the applicability of the proposed algorithm in the challenging $K \neq 2^L$ case, we choose $K = 7$ in most of the simulations. Each simulation point in the figures is averaged over $10^4$ channel realizations.

A. Convergence of the proposed IGMD

Figures 1(a) and 1(b) show the MSE of the diagonal elements of $\mathbf{R}$ using SVD and QR factorization as initialization, respectively. For SVD initialization, it is possible to exploit the degrees of freedom in the ordering of the singular values to enable more efficient averaging in each stage. Based on this idea, we propose to use an interleaved-SVD (intrlv-SVD) so that a large $R_{k,k}$ tends to be averaged with a small $R_{k+1,k+1}$ and vice versa.
To be more specific, we use the following factorization:

$$\mathbf{H} = \tilde{\tilde{\mathbf{U}}}\tilde{\tilde{\boldsymbol{\Sigma}}}\tilde{\tilde{\mathbf{V}}}^H \in \mathbb{C}^{7 \times 7}, \tag{43}$$

where $\tilde{\tilde{\boldsymbol{\Sigma}}} = \mathrm{diag}\{[\sigma_1, \sigma_7, \sigma_2, \sigma_6, \sigma_3, \sigma_5, \sigma_4]\}$, and $\tilde{\tilde{\mathbf{U}}}$ and $\tilde{\tilde{\mathbf{V}}}$ are the corresponding left and right singular matrices, respectively. For QR initialization, we also propose to use V-BLAST ordering (VBQR) [16, 17] to speed up convergence. This idea is motivated by the fact that the diagonal elements of $\tilde{\mathbf{R}}$ in VBQR generally have less spread [26] than those in QR. Hence, using VBQR as initialization generally requires fewer iterations to achieve the same MSE than using standard QR initialization, as shown in Fig. 1(b).

From Figs. 1(a) and 1(b), the simulation results show that the IGMD-HM achieves the same MSE with the smallest number of iterations, followed by the IGMD-GM and then the IGMD-AM, when QR, VBQR, or SVD is used as initialization. On the other hand, when intrlv-SVD is used, the IGMD-AM achieves the same MSE with the smallest number of iterations, followed by the IGMD-GM, and finally the IGMD-HM.

In the second simulation setting, we investigate the error rate performance of the proposed IGMD applied to a $7 \times 7$ GMD-based ZF Tomlinson-Harashima precoded (ZFTHP) MIMO system [4] with 16-quadrature amplitude modulation. Figs. 2(a) and 2(b) show the error rate of the proposed IGMD using QR and VBQR, respectively. By comparing Figs. 2(a) and 2(b), it is observed that VBQR provides a better initialization for the proposed IGMD and results in faster convergence. At the first iteration, the error rates of the IGMD-AM and the IGMD-HM appear to be similar. For iteration numbers greater than 1, the IGMD-GM and the IGMD-HM both outperform the IGMD-AM and perform very close to the exact GMD after four iterations.
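The diagonal recursion (32)-(34) and the interleaved ordering used in (43) can be explored with a small scalar sketch, since by (31) the diagonal of $\mathbf{R}$ evolves independently of the off-diagonal entries. The code below is illustrative only: the names `OMEGAS`, `sweep`, `interleave`, and `mse_to_gm` are hypothetical, the starting values are arbitrary rather than drawn from a Rayleigh channel, and the error measure (mean squared deviation from the geometric mean) is an assumed stand-in for the MSE plotted in Fig. 1.

```python
import math

# The three mappings of Section IV.
OMEGAS = {
    "AM": lambda a, b: (a + b) / 2.0,
    "GM": lambda a, b: math.sqrt(a * b),
    "HM": lambda a, b: 2.0 * a * b / (a + b),
}


def sweep(x, omega):
    """One IGMD iteration on the diagonal: apply T^(1), ..., T^(K-1) in order."""
    x = list(x)
    for j in range(len(x) - 1):
        w = omega(x[j], x[j + 1])
        # The pair (x_j, x_{j+1}) becomes (omega, product / omega), as in (31).
        x[j], x[j + 1] = w, x[j] * x[j + 1] / w
    return x


def interleave(sorted_desc):
    """Reorder [s1 >= ... >= sK] as [s1, sK, s2, s(K-1), ...]."""
    lo, hi, out = 0, len(sorted_desc) - 1, []
    while lo <= hi:
        out.append(sorted_desc[lo])
        lo += 1
        if lo <= hi:
            out.append(sorted_desc[hi])
            hi -= 1
    return out


def mse_to_gm(x):
    """Mean squared deviation of the entries from their geometric mean."""
    gm = math.exp(sum(math.log(v) for v in x) / len(x))
    return sum((v - gm) ** 2 for v in x) / len(x)


if __name__ == "__main__":
    sigma = [4.0, 3.0, 2.0, 1.5, 1.0, 0.5, 0.25]   # arbitrary example, K = 7
    for label, start in (("SVD", sigma), ("intrlv-SVD", interleave(sigma))):
        for name, omega in OMEGAS.items():
            r = list(start)
            for _ in range(4):
                r = sweep(r, omega)
            print(label, name, mse_to_gm(r))
```

Each sweep preserves the product of the entries and does not increase their sum, which is the mechanism used in the proof of Proposition 1.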
Fig. 1. MSE comparison of the diagonal elements of R under the proposed IGMD using (a) SVD and interleaved-SVD and (b) QR and VB-QR as initialization. (Axes: MSE versus iterations; curves for IGMD-AM, IGMD-GM, and IGMD-HM under each initialization.)

Fig. 2. BER performance of the proposed iterative GMD algorithm in a 7 × 7 MIMO ZFTHP system using (a) QR and (b) VB-QR as initialization. (Axes: BER versus Eb/N0 in dB; curves for ZFTHP with the plain initialization, IGMD-ZFTHP with AM, GM, and HM at 1, 2, and 4 iterations, and GMD-ZFTHP.)

Figures 3(a) and 3(b) show the error rate of the proposed IGMD using standard SVD and interleaved SVD, respectively. When standard SVD is used, the IGMD-GM performs the best, followed by the IGMD-HM and then the IGMD-AM, for sufficiently high SNR. In contrast, when the interleaved SVD is used, the IGMD-HM performs much worse than the IGMD-AM and the IGMD-GM. For most of the SNR region of practical interest in this setting, the IGMD-AM is comparable to the IGMD-GM for iteration numbers greater than 1. From Fig. 2(b) and Fig. 3(b), it is also observed that the proposed IGMD-intrlv-SVD-GM and IGMD-intrlv-SVD-AM achieve even better error rates than the IGMD-VBQR-GM and IGMD-VBQR-HM after four iterations.

B. Complexity Comparison

To highlight the complexity advantages, we compare the proposed IGMD algorithm with the conventional GMD algorithm. The $K$th root algorithm required in the conventional GMD is implemented as in Algorithm 1 [21], in which only elementary functions including comparison, addition, bit shifting, and multiplication are required. For general cases ($A \neq 1$), the output $y$ in Algorithm 1 converges to $\sqrt[K]{A}$ by iteratively narrowing the search range bounded by $M$. The number of iterations $n$ determines the number of binary digits of accuracy.

To make a quantitative comparison of computational complexity, the multiplications involved are taken into account. A typical 32-bit fixed-point representation ({sign, integer, fraction} = {1, 4, 27}) for the channel matrix $\mathbf{H}$ is adopted. For hardware implementation, the dynamic range of the datapaths needs to be taken into consideration. As a first-order estimate, an $N \times M$-bit multiplier can be regarded as $M$ $N$-bit adders or as $N$ $M$-bit adders, and an $N$-bit adder can be treated as $N/16$ 16-bit adder(s) [27, 28]. Hence, the number of atomic 16-bit equivalent additions is used as our complexity metric for fair comparison.

Table I shows the required complexity of the building blocks in the conventional GMD and the proposed IGMD algorithms. In the $K$th root algorithm shown in Algorithm 1, the dynamic range of the multiplication increases drastically for calculating $A = \prod_{i=1}^{K} \sigma_i$ and $(y + M)^K$.
For example, multiplications with output word lengths ranging from 64 to 224 bits are required to retain full precision for $K = 7$. After some proper truncation in word lengths, it follows that $1674 \times (n+1)$ 16-bit additions are required for calculating $A = \prod_{i=1}^{K} \sigma_i$ and $(y + M)^K$ with $n$ iterations. Additional diagonal swap computations, caused by the irregular data-dependent control flow, are also required in the conventional GMD.

Fig. 3. BER performance of the proposed iterative GMD algorithm in a 7 × 7 MIMO ZFTHP system using (a) SVD and (b) interleaved-SVD as initialization. (Axes: BER versus Eb/N0 in dB; curves for SVD, IGMD-ZFTHP with AM, GM, and HM at 1, 2, and 4 iterations, and GMD-ZFTHP.)

TABLE I
COMPLEXITY IN TERMS OF NUMBER OF 16-BIT EQUIVALENT ADDITIONS

Arithmetic function              Conventional   AM       HM       GM
A = σ1 σ2 ··· σK                 1674           −        −        −
Kth root algorithm [21]          1674n          −        −        −
2×2 SVDs/planar rotations        2016           2808ℓ    2808ℓ    2808ℓ
Diagonal swap operations         5124           −        −        −
CORDIC-based sqrt                −              −        −        198ℓ

n: number of iterations performed in the Kth root algorithm.
ℓ: number of iterations performed in the IGMD algorithm.

Algorithm 1: Kth root algorithm using binary approximation [21]
Input: A, K, n
Output: y
  M = 1;
  if A < 1 then
    while A ≤ M^K do
      M = M/2;
    end
    y = M;
  else if A > 1 then
    while A ≥ M^K do
      M = M × 2;
    end
    y = M/2;
  end
  for i = 1 : n do
    M = M/2;
    if (y + M)^K ≤ A then
      y = y + M;
    end
  end
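For readers who want to experiment with Algorithm 1, a minimal floating-point transcription is sketched below. It is not the fixed-point implementation evaluated in the paper: the function name `kth_root_binary` is hypothetical, Python floats replace the truncated word lengths, and returning 1 for $A = 1$ is an assumption, since the listing leaves that case unspecified.

```python
def kth_root_binary(A, K, n):
    """Approximate the K-th root of A > 0 with n binary refinement steps."""
    if A <= 0:
        raise ValueError("A must be positive")
    M = 1.0
    if A < 1:
        # Halve M until M**K drops below A, so the root lies in [M, 2M].
        while A <= M ** K:
            M /= 2.0
        y = M
    elif A > 1:
        # Double M until M**K exceeds A, so the root lies in [M/2, M).
        while A >= M ** K:
            M *= 2.0
        y = M / 2.0
    else:
        return 1.0  # assumption: A == 1 is not covered by the listing
    for _ in range(n):
        M /= 2.0
        # Keep the trial increment only if it does not overshoot the root.
        if (y + M) ** K <= A:
            y += M
    return y


if __name__ == "__main__":
    print(kth_root_binary(10.0, 7, 20), 10.0 ** (1.0 / 7.0))
```

Every refinement step evaluates a $K$th power, which is where the growth in multiplier word length discussed above comes from.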
For the $2 \times 2$ SVDs and planar rotations required in all GMD algorithms, we implement them efficiently using CORDICs, for which only constant multiplications are necessary for the scaling operations. Through canonic signed digit (CSD) [29] coding, the scaling operation by 0.60725 (000000.101001̄001̄0) for CORDIC can be efficiently realized by shift-and-add operations. It follows that 2808 16-bit equivalent additions are required for the $2 \times 2$ SVDs and planar rotations in each iteration. For the GM-based IGMD algorithm, additional computations for calculating the square root are necessary. It can be shown that a CORDIC-based square root requires 198 16-bit equivalent additions per iteration. Consequently, under our proposed implementation, the IGMD-GM has slightly higher complexity than the other IGMDs, whereas the IGMD-AM and the IGMD-HM have essentially the same complexity.

Fixed-point simulations have been conducted to evaluate the MSE performance of the GMD algorithms with respect to computational complexity. Instead of directly implementing the IGMD-HM by constructing the planar rotations so that the upper-left element is updated by $\Omega_{HM}$, we implement it by constructing the planar rotations so that the lower-right element is updated by $\Omega_{AM}$; the upper-left element then equals $z_1 z_2/\Omega_{AM} = \Omega_{HM}$. Through this novel implementation, only shift-and-add operations are required in computing $\Omega_{AM}$, which is computationally more efficient than the direct implementation, which requires square root and division operations in computing $\Omega_{HM}$. In Figs. 4(a) and 4(b), the MSE performance versus complexity of both QR-based (QR and VB-QR) and SVD-based (SVD and interleaved SVD) GMD algorithms has been simulated under a $7 \times 7$ i.i.d. Rayleigh fading channel. For fair comparison, we also implement the conventional GMD algorithm with two different initializations, namely the GMD-QR and the GMD-SVD, respectively.
It is observed that, in this simulation scenario, both VB-QR and interleaved SVD provide substantial performance improvement over their counterparts. The proposed IGMD algorithms are also observed to provide considerable performance advantages over the conventional GMD algorithms when operated in the low-complexity region.

[Figure 4: MSE versus complexity, panels (a) and (b). Curves in (a): GMD-QR; IGMD-QR-AM, -GM, -HM; IGMD-VBQR-AM, -GM, -HM. Curves in (b): GMD-SVD; IGMD-SVD-AM, -GM, -HM; IGMD-SVD-Intrlv-AM, -GM, -HM.]
Fig. 4. Complexity comparison under 7 × 7 i.i.d. Rayleigh fading channel: (a) QR-based and (b) SVD-based.

[Figure 5: MSE versus complexity, panels (a) and (b), with the same curves as in Fig. 4.]
Fig. 5. Complexity comparison under 4 × 4 i.i.d. Rayleigh fading channel: (a) QR-based and (b) SVD-based.

In Figs. 5(a) and 5(b), the same complexity comparison is performed under a 4 × 4 i.i.d. Rayleigh fading channel. As in the 7 × 7 i.i.d. Rayleigh fading channel, the proposed IGMD algorithms outperform the conventional GMDs in the low-complexity region. In addition, the proposed IGMDs appear to have a larger performance gain when the problem dimension is smaller. This is because the spread on the diagonal of R is generally smaller in lower-dimensional problems, so the proposed IGMDs require only a small number of iterations to achieve the desired precision. In Figs. 6(a) and 6(b), the complexity of the proposed algorithms is compared in a 7 × 7 correlated Rayleigh fading channel.
Uniform linear arrays with an antenna spacing of 0.3λ are considered at both the transmit and receive sides, and the correlated channel is generated using the typical Kronecker model. It can be observed from the figures that the proposed IGMDs become less efficient in the correlated channel when compared with the conventional GMDs. This is because the diagonal of R generally has a larger spread when the channel is more correlated, and hence it generally takes more iterations for the proposed IGMDs to achieve the desired precision.

[Figure 6: MSE versus complexity, panels (a) and (b). Curves in (a): GMD-QR; IGMD-QR-AM, -GM, -HM; IGMD-VBQR-AM, -GM, -HM. Curves in (b): GMD-SVD; IGMD-SVD-AM, -GM, -HM; IGMD-SVD-Intrlv-AM, -GM, -HM.]
Fig. 6. Complexity comparison under 7 × 7 correlated Rayleigh fading channel: (a) QR-based and (b) SVD-based.

From the complexity analysis, it is also observed that one common characteristic of the SVD-, QR-, and VB-QR-based IGMDs is that the HM usually performs best, followed by the GM and then the AM. This characteristic is related to the fact that the SVD, QR, and VB-QR initializations all tend to produce larger elements in the upper-left corner than in the lower-right corner of the diagonal of R. Because of the AM-HM inequality, the AM mapping also generates a larger element in the upper-left corner and a smaller element in the lower-right corner. When these initializations are applied to IGMD-AM, larger elements are therefore discouraged from being averaged with smaller elements as the algorithm proceeds, and hence the AM construction is expected to take more iterations to achieve the same performance. This characteristic also explains why IGMD-HM tends to perform better when the SVD, QR, and VB-QR initializations are used. For the interleaved-SVD initialization, the performance of the IGMDs is very difficult to characterize, as it depends on both the mapping and the interleave pattern. In addition, it is observed that although the proposed interleave pattern provides significant gain in the 7 × 7 i.i.d. Rayleigh fading channel, it does not always outperform the plain SVD in other scenarios. This suggests that a good interleave pattern depends not only on the type of algorithm but also on the data (channel matrix). A more rigorous treatment of the performance characterization is beyond the scope of this article and is left for future research.

VI. CONCLUSION

A new algorithm for computing the geometric mean decomposition is proposed. We prove analytically that the proposed IGMD is guaranteed to converge to the exact GMD under certain sufficient conditions and present three different constructions. The proposed IGMD algorithm does not require computing the Kth root at the pre-processing stage and has a regular structure that is easily scalable to different problem dimensions. These advantages lead to a more efficient hardware design when operated in the low-complexity regime and have been verified through extensive numerical simulations.

ACKNOWLEDGMENT

The authors would like to thank Prof. Chiu-Chu Melissa Liu for the helpful discussion and the anonymous reviewers for their valuable suggestions.

REFERENCES

[1] E. Telatar, “Capacity of multi-antenna Gaussian channels,” Europ. Trans. Telecommu., vol. 10, no. 6, pp. 585–595, Nov.-Dec. 1999.
[2] G. J. Foschini, “Layered space-time architecture for wireless communication in a fading environment when using multiple antennas,” Bell Labs Tech. Journal, vol. 1, no. 2, pp. 41–59, 1996.
[3] G. G. Raleigh and J. M. Cioffi, “Spatio-temporal coding for wireless communication,” IEEE Trans. Commun., vol. 46, no. 3, pp. 357–366, Mar. 1998.
[4] Y. Jiang, J. Li, and W. Hager, “Joint transceiver design for MIMO communications using geometric mean decomposition,” IEEE Trans. Signal Process., vol. 53, no. 10, pp. 3791–3803, Oct. 2005.
[5] Y. Jiang, J. Li, and W. W. Hager, “Uniform channel decomposition for MIMO communications,” IEEE Trans. Signal Process., vol. 53, no. 11, pp. 4283–4294, Nov. 2005.
[6] J.-K. Zhang, A. Kavčić, and K. M. Wong, “Equal-diagonal QR decomposition and its application to precoder design for successive-cancellation detection,” IEEE Trans. Inf. Theory, vol. 51, no. 1, pp. 154–172, Jan. 2005.
[7] Y. Jiang, W. Hager, and J. Li, “The geometric mean decomposition,” Linear Algebra and its Applications, vol. 396, pp. 373–384, Feb. 2005.
[8] S. Lin, W. W. L. Ho, and Y.-C. Liang, “Block diagonal geometric mean decomposition (BD-GMD) for MIMO broadcast channels,” IEEE Trans. Wireless Commun., vol. 7, no. 7, p. 2778, Jul. 2008.
[9] F.-S. Tseng and W.-R. Wu, “Joint source/relay precoders design in amplify-and-forward relay systems: A geometric mean decomposition approach,” in Proc. IEEE Int. Conf. Acoust., Speech, and Signal Process. (ICASSP), Taipei, Apr. 2009, pp. 2641–2644.
[10] F. Liu, L. Jiang, and C. He, “Advanced joint transceiver design for block diagonal geometric mean decomposition based multiuser MIMO system,” IEEE Transactions Vehicular Technology, vol. 59, no. 2, pp. 692–703, Feb. 2010.
[11] C.-C. Weng and P. P. Vaidyanathan, “Block diagonal GMD for zero-padded MIMO frequency selective channels,” IEEE Trans. Signal Process., vol. 59, no. 2, pp. 713–727, Feb. 2011.
[12] C.-H. Liu and P. P. Vaidyanathan, “Generalized geometric mean decomposition and DFE transceiver design - Part I: Design and complexity,” IEEE Trans. Signal Process., vol. 60, no. 6, pp. 3112–3123, Jun. 2012.
[13] C.-H. Liu and P. P. Vaidyanathan, “Generalized geometric mean decomposition and DFE transceiver design - Part II: Performance analysis,” IEEE Trans. Signal Process., vol. 60, no. 6, pp. 3124–3133, Jun. 2012.
[14] W. C. Kan and G. E. Sobelman, “MIMO transceiver design based on a modified geometric mean decomposition,” in Proc. IEEE Int. Symp. Circuits Syst., New Orleans, LA, May 2007, pp. 677–680.
[15] W.-D. Chen and Y.-T. Hwang, “A constant throughput geometric mean decomposition scheme design for wireless MIMO precoding,” IEEE Trans. Veh. Tech., vol. 62, no. 5, pp. 2080–2090, Jun. 2013.
[16] P. W. Wolniansky, G. J. Foschini, G. D. Golden, and R. A. Valenzuela, “V-BLAST: An architecture for realizing very high data rates over the rich-scattering wireless channel,” in Proc. URSI International Symposium on Signals Systems and Electronics, 1998, pp. 295–300.
[17] G. D. Golden, G. J. Foschini, R. A. Valenzuela, and P. W. Wolniansky, “Detection algorithm and initial laboratory results using V-BLAST space time communications architecture,” Electronic Letters, vol. 35, no. 1, pp. 14–16, Jan. 1999.
[18] P. K. Meher, J. Vallas, T.-B. Juang, K. Sridharan, and K. Maharatna, “50 years of CORDIC: Algorithms, architectures, and applications,” IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 56, no. 9, pp. 1893–1907, Sep. 2009.
[19] P. P. Vaidyanathan, S.-M. Phoong, and Y.-P. Lin, Signal Processing and Optimization for Transceiver Systems. New York: Cambridge University Press, 2010.
[20] K. E. Atkinson, An Introduction to Numerical Analysis, 2nd ed. New York: Wiley, 1989.
[21] Rosettacode.org, “Nth root,” http://rosettacode.org/wiki/Nth root, accessed: 2014-08-23.
[22] G. H. Golub and C. F. V. Loan, Matrix Computations, 3rd ed. Baltimore and London: The Johns Hopkins University Press, 1996.
[23] H. Weyl, “Inequalities between two kinds of eigenvalues of a linear transformation,” Proc. of the National Academy of Sciences of the United States of America, vol. 35, no. 7, pp. 408–411, 1949.
[24] A. W. Marshall and I. Olkin, Inequalities: Theory of Majorization and Its Applications. New York: Academic, 1991.
[25] W. Rudin, Principles of Mathematical Analysis, 3rd ed. McGraw-Hill, 1976.
[26] C.-T. Lin and W.-R. Wu, “QRD-based antenna selection for ML detection of spatial multiplexing MIMO systems: Algorithms and applications,” IEEE Trans. Veh. Tech., vol. 60, no. 7, pp. 3178–3191, Sep. 2011.
[27] J. Rabaey, A. Chandrakasan, and B. Nikolić, Digital Integrated Circuits: A Design Perspective, 2nd ed. Prentice-Hall, 2003.
[28] N. Weste and D. Harris, CMOS VLSI Design: A Circuits and Systems Perspective, 4th ed. Addison-Wesley, 2010.
[29] K. K. Parhi, VLSI Digital Signal Processing Systems: Design and Implementation. New York: Wiley, 1999.

Chiao-En Chen (M’05) was born in Kaohsiung, Taiwan, in 1976. He received the B.Sc. and M.Sc. degrees in Electrical Engineering from National Taiwan University, Taipei, Taiwan, in 1998 and 2000, respectively. From 2003 to 2008, he was with the Electrical Engineering Department at the University of California, Los Angeles, where he received his Ph.D. degree. In 2008, he joined both the Department of Electrical Engineering and the Department of Communications Engineering at National Chung Cheng University, Chiayi, Taiwan, where he is currently an associate professor. His research interests include statistical signal processing and multiple-input-multiple-output (MIMO) communications. Dr. Chen was a co-recipient of the Best Paper Award in IEEE WCNC 2012, and a co-author of the book “Detection and Estimation for Communication and Radar Systems,” published by Cambridge University Press, 2013.

Yu-Cheng Tsai received the B.S. degree from the Department of Electrical Engineering, National Chung Hsing University, Taichung, Taiwan. He is currently pursuing the M.S. degree in Electronics Engineering from National Chiao Tung University, Hsinchu, Taiwan. His research interests include signal processing algorithm development and VLSI design for wireless baseband processing.

Chia-Hsiang Yang (S’07-M’10) received his B.S. and M.S. degrees from the National Taiwan University, Taiwan, in 2002 and 2004, respectively, both in Electrical Engineering.
He received his Ph.D. degree from the Department of Electrical Engineering of the University of California, Los Angeles in 2010. He then joined the faculty of the Electronics Engineering Department at the National Chiao Tung University, Taiwan, as an Assistant Professor. His current research interests include energy-efficient integrated circuits and architectures for biomedical and communication signal processing. Dr. Yang was a winner of the DAC/ISSCC Student Design Contest in 2010. He received the 2010-2011 Distinguished Ph.D. Dissertation in Circuits & Embedded Systems Award from the Department of Electrical Engineering, University of California, Los Angeles. In 2013, he was a co-recipient of the ISSCC Distinguished-Technical-Paper Award.