954 IEEE JOURNAL ON SELECTED AREAS IN COMMUNICATIONS, VOL. 27, NO. 6, AUGUST 2009

Turbo Coded Multiple-Antenna Systems for Near-Capacity Performance

Yeong-Luh Ueng, Chia-Jung Yeh, Mao-Chao Lin, and Chung-Li Wang

Abstract—For a turbo coded BLAST (Bell LAbs Space-Time architecture) system with N_t transmit antennas and N_r receive antennas, there is a significant gap between its detection threshold and the capacity when N_t > N_r. In this paper, we show that by introducing a convolutional interleaver with block delay between the BLAST mapper and the turbo encoder, the threshold can be improved. Near-capacity thresholds can be achieved for some cases. To take advantage of the low detector complexity of the Alamouti STBC (space-time block code), we also investigate an STBC system, which is the concatenation of the Alamouti STBC with a turbo trellis coded modulation. By using a proper labelling and adding a convolutional interleaver with block delay to such an STBC system, we achieve both lower error floors and lower thresholds.

Index Terms—Iterative decoding, iterative detection, multiple-input multiple-output (MIMO), turbo codes, turbo principle, Bell Labs Space-Time architecture (BLAST), space-time block code (STBC).

I. INTRODUCTION

Multi-input multi-output (MIMO) systems with N_t transmit antennas and N_r receive antennas are attractive for their capability to achieve higher data rates. In [1], a MIMO system called BLAST (Bell LAbs Space-Time architecture) was proposed to provide spatial multiplexing for achieving higher data rates. The BLAST mapper can be serially concatenated with an outer channel encoder to obtain time diversity in the fast fading channel [2]-[9]. In [5], the channel code used is a turbo code and the resultant scheme is a turbo coded BLAST system. It was noted in [5] that, in the fast fading channel, the extrinsic information transfer (EXIT) [32] curve of the MIMO detector is not flat for the case of N_t > N_r even if Gray mapping is employed.
However, the EXIT curve of a turbo code is almost a horizontal line, and the code EXIT curve is therefore poorly matched to the detector EXIT curve. Hence, the decoding thresholds are distant from the capacities for the case of N_t > N_r [5]. For either irregular low-density parity-check (LDPC) [38] or irregular repeat accumulate (IRA) [39] coded BLAST systems, there is room for arranging the degree distributions of variable or check nodes to match the EXIT curve of the MIMO detector, and hence such BLAST systems can achieve near-capacity performance [8][9]. However, the LDPC codes or IRA codes optimized for multiple-antenna systems with N_t > N_r usually have many degree-2 nodes. Hence, for short block sizes, such LDPC coded BLAST (multiple-antenna) systems have high error floors [31].

Manuscript received 30 September 2008; revised 16 January 2009. This work was supported by National Science Council of the R.O.C. under grants NSC 95-2221-E-007-035 and NSC 96-2221-E-002-092.
Yeong-Luh Ueng is with the Department of Electrical Engineering and the Institute of Communications Engineering, National Tsing Hua University, Hsinchu, Taiwan, R.O.C.
Chia-Jung Yeh is with the Graduate Institute of Communication Engineering, National Taiwan University, Taipei, Taiwan, R.O.C.
Mao-Chao Lin is with the Department of Electrical Engineering, National Taiwan University, Taipei, Taiwan, R.O.C.
Chung-Li Wang is with the Department of Electrical and Computer Engineering, University of California, Davis, CA, USA.
Digital Object Identifier 10.1109/JSAC.2009.090813.

Unlike the approaches used in [8][9], in this paper we use an alternative approach to design coded BLAST systems for near-capacity performance. We introduce a convolutional interleaver [37] with block delay between the BLAST mapper and the binary turbo encoder of the turbo coded BLAST system in [5].
We show that the EXIT curves of the detector and the turbo decoder in the proposed system can match well for the cases of N_t = 2 with N_r = 1 or 2. Hence, near-capacity performance can be achieved. For some other cases, our system can still achieve better decoding thresholds than the system in [5].

In the delay diversity scheme of [33][34], copies of the same symbols are transmitted through multiple antennas at different times to provide spatial diversity to combat fading for reliable communication. In [44][45], channel coding is integrated into the delay diversity scheme to provide coding gain in addition to the diversity gain. The delay schemes in [33][34][44][45] can be viewed as a convolutional interleaver with symbol delay rather than block delay. Moreover, the multiple-antenna systems in [33][34][44][45] are not designed to achieve near-capacity performance.

For MIMO systems, space-time block codes (STBC) [10][11] and space-time trellis codes (STTC) [12] can also be used to provide spatial diversity. For STBC, there is a subclass called orthogonal STBC, which has the advantage of low detector complexity. The orthogonal STBC with N_t = 2 is the famous Alamouti code [10]. Orthogonal STBC can be concatenated with a bandwidth-efficient outer code such as trellis coded modulation (TCM) [13], bit-interleaved coded modulation (BICM) [14]-[18], turbo TCM (TTCM) [19]-[23], or LDPC-based coded modulation [24] to enhance its coding gain or diversity gain [25]-[31][42]. It was demonstrated in [31][43] that for the MIMO detector consisting of an Alamouti STBC decoder and a demapper of Gray-labelled modulation, the associated EXIT curve is almost flat. Hence, it is possible to design an STBC system which is the concatenation of a turbo trellis coded modulation and the Alamouti STBC [29] using Gray-labelled modulation to achieve a good decoding threshold, since the EXIT curves of the MIMO detector and the binary turbo decoder can match well.
In a way similar to that for turbo coded BLAST, we propose to insert a convolutional interleaver with block delay between the binary turbo encoder and the signal mapper of the turbo coded Alamouti STBC system in [29]. However, unlike the scheme in [29], mixed labeling [16] instead of Gray labeling is employed in the signal mapper of the proposed system so that a lower error floor can be obtained. The detector EXIT curve of Alamouti STBC with Gray labeling is close to horizontal, while the detector EXIT curve of Alamouti STBC with mixed labeling is not. However, the block delay operation in the proposed system can make the detector EXIT curve close to horizontal even if mixed labeling is used. Hence, the EXIT curves of the detector and the turbo decoder in the proposed system can match well.

0733-8716/09/$25.00 © 2009 IEEE

In [28], a convolutional interleaver is applied to the Alamouti STBC in conjunction with a tail-biting TCM. The tail-biting TCM design in [28] is not suitable for constructing multiple-antenna systems with good thresholds at long code lengths.

We can apply the turbo principle to the proposed turbo coded multiple-antenna systems by iteratively performing decoding and detection between the turbo decoder and the MIMO detector. For the turbo coded BLAST system, the MIMO detector is a BLAST demapper, while for the turbo coded STBC system, the MIMO detector consists of a signal demapper and an STBC decoder. The soft output of the turbo decoder can be used to update the log-likelihood ratio (LLR) output of the MIMO detector. Because of the delay elements introduced in the proposed systems, adjacent turbo codewords are correlated and hence optimum decoding is very difficult. We can resort to suboptimum decoding. The simplest approach is for the MIMO detector and the turbo decoder to exchange extrinsic information within each single turbo codeword.
Such a decoding method is called iterative decoding within a single codeword (IDSC). We can improve the error performance by exchanging extrinsic information between adjacent turbo codewords. Such a decoding method is called iterative decoding between adjacent codewords (IDAC). Two types of IDAC with different decoding delays and complexities will be investigated.

The remainder of this paper is organized as follows. The turbo coded BLAST system in [5] and the proposed turbo coded BLAST system are described in Sections II and III, respectively. A turbo coded Alamouti STBC system in [29] and the proposed turbo coded Alamouti STBC system are discussed in Section IV. This paper concludes in Section V.

II. A TURBO CODED BLAST SYSTEM

In this section, we review a turbo coded BLAST system [5], for which the transmitter is implemented by serially concatenating an interleaver and a BLAST mapper to a binary turbo encoder.

A. Transmitter

The w-bit message ū is encoded by a rate-R binary turbo encoder to yield a turbo codeword of N code bits, where N = qK, K = mN_t, and 2^m is the constellation size for each transmit antenna. Since an interleaver is inserted between the turbo encoder and the BLAST mapper, the turbo codeword is interleaved before being divided into q groups. Let b̄ = [b̂_0, ···, b̂_{N_t−1}]^T = [b_0, ···, b_{K−1}]^T denote such a group, where each b̂_i is a binary m-tuple, b_k ∈ {0, 1}, and T denotes the transpose. The output of the BLAST mapper is represented by the symbol vector s̄ = [s_0, ···, s_{N_t−1}]^T, where s_i is a constellation point labelled by b̂_i and is transmitted through the (i + 1)-th transmit antenna. Let E_s be the average energy of s̄. Furthermore, we require that E[|s_i|^2] = E_s/N_t for i = 0, 1, ···, N_t − 1. In this paper, we consider the time required for the transmission of N code bits as one block unit. There are q MIMO transmissions (channel uses) within one block unit, and K code bits are transmitted in each MIMO transmission.

B. Channel Model

In this paper, we consider the Rayleigh fading channel. Elements of the N_r × N_t channel matrix H are independently and identically distributed zero-mean complex Gaussian random variables with independent real and imaginary parts, each having variance 0.5. We assume fast fading. Hence, the channel matrices H at different time instants are independent. In addition, it is assumed that H is unknown to the transmitter and is known perfectly to the receiver. Let n̄ be an N_r-tuple consisting of Gaussian entries with covariance matrix Q = E[n̄ n̄*] = N_0 I_{N_r}, where n̄* denotes the conjugate transpose of n̄ and I_{N_r} is an N_r × N_r identity matrix. Let r_j be the received signal of the (j + 1)-th receive antenna. We have r̄ = H s̄ + n̄, where r̄ = [r_0, ···, r_{N_r−1}]^T. The normalized signal-to-noise ratio (SNR) E_b/N_0 is defined as [8]

$$\left.\frac{E_b}{N_0}\right|_{\mathrm{dB}} = \left.\frac{E_s}{N_0}\right|_{\mathrm{dB}} + 10\log_{10}\frac{N_r}{R\,N_t\,m}.$$

C. MIMO Detector: BLAST Demapper

Let c̄_i be the (K − 1)-bit binary representation of i and b̄_{k−} = [b_0, ···, b_{k−1}, b_{k+1}, ···, b_{K−1}]^T. For each code bit b_k, k = 0, 1, ···, K − 1, let L_{M,a}(b_k) = ln[Pr(b_k = 1)/Pr(b_k = 0)] be the a priori LLR and

$$\bar{L}_{M,a}(\bar{b}_{k-}) = [L_{M,a}(b_0), \cdots, L_{M,a}(b_{k-1}), L_{M,a}(b_{k+1}), \cdots, L_{M,a}(b_{K-1})]^T. \qquad (1)$$

From the channel model, we have the conditional probability density function

$$\Pr(\bar{r} \mid H, \bar{s}) = \det(2\pi Q)^{-1/2}\, e^{-\frac{1}{2}(\bar{r}-H\bar{s})^{*} Q^{-1} (\bar{r}-H\bar{s})}.$$

For a BLAST demapper, the a posteriori LLR for each code bit b_k is given by

$$L_{M,p}(b_k \mid \bar{r}) = L_{M,a}(b_k) + \ln\frac{y(1)}{y(0)} \qquad (2)$$

where

$$y(l) = \sum_{i=0}^{2^{K-1}-1} \exp\!\left(-\frac{\left\|\bar{r} - H\cdot\mathrm{map}\big([(\bar{c}_i)_{0:k-1}\; l\; (\bar{c}_i)_{k:K-2}]\big)\right\|^2}{2N_0}\right) \exp\!\big[\bar{c}_i \cdot \bar{L}_{M,a}(\bar{b}_{k-})\big], \quad l = 0, 1,$$

map(·) denotes the BLAST mapping of the associated mN_t-tuple, and (c̄_i)_{a:b} denotes the (b − a + 1)-tuple containing the a-th to the b-th components of c̄_i.

[Fig. 1. EXIT curves of the MIMO detector (BLAST demapper at E_b/N_0 = 3.9 dB) and the turbo decoder (N_I = 20) for the proposed turbo coded BLAST system and the original turbo coded BLAST system in [5]. Horizontal axis: I_{M,a}((ū_ℓ, p̄_ℓ)) [I_{D,e}((ū_ℓ, p̄_ℓ))]; vertical axis: I_{M,e}((ū_ℓ, p̄_ℓ)) [I_{D,a}((ū_ℓ, p̄_ℓ))]. Curves: detector, (N_t, N_r) = (2, 1), proposed, with I_a(v̄_{2,ℓ+1}) = 0, 0.5, and 1; detector, original, with (N_t, N_r) = (2, 1), (1, 1), and (4, 1); decoder.]

D. Iterative Detection and Decoding

The receiver consists of a turbo decoder and a MIMO detector. The MIMO detector and the turbo decoder iteratively exchange extrinsic information according to the turbo principle. The soft-in/soft-out MIMO detector computes the a posteriori values L_{M,p} and then sends L_{M,e} = L_{M,p} − L_{M,a} to the turbo decoder for the code bits representing s̄, where L_{M,e} consists of channel information and extrinsic information. Then, L_{M,e} is deinterleaved and taken as the a priori value L_{D,a} of the decoder for further iterative decoding steps. Through soft-in/soft-out decoding with N_I iterations of the turbo decoder, we have L_{D,p} and L_{D,e} = L_{D,p} − L_{D,a}. After interleaving, L_{D,e} is fed back to the MIMO detector and used as L_{M,a}. With the updated L_{M,a}, the detector updates its LLR output, L_{M,e}.

E. EXIT Curves for BLAST Demapper and Turbo Decoder

The convergence performance of iterative detection and decoding can be analyzed by using EXIT charts based on mutual information [5]. Let I_a[c_k, L_a(c_k)] denote the mutual information between bit c_k and its a priori L-value (LLR) L_a(c_k). Similarly, I_e[c_k, L_e(c_k)] denotes the mutual information between bit c_k and its extrinsic L-value L_e(c_k).
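As a concrete illustration of the demapper output in (2) and of the mutual-information quantities just defined, the following is a minimal sketch and not the authors' implementation. It assumes Gray-labelled QPSK in which each bit sets the sign of one real dimension, E_s = 1, and a parameter `noise_var` standing for the total variance of each complex noise entry (how this maps onto N_0 in (2) is an assumption). The estimator in `mutual_info` is the common time-average form 1 − E[log2(1 + e^{−xL})], which presumes consistent LLRs. All function names are hypothetical.

```python
import itertools
import numpy as np

def qpsk(b0, b1):
    """Gray-labelled QPSK (assumed labelling): each bit flips one real dimension."""
    return ((1 - 2 * b0) + 1j * (1 - 2 * b1)) / np.sqrt(2)

def blast_map(bits, n_t):
    """Map K = 2*n_t bits to n_t QPSK symbols with E[|s_i|^2] = 1/n_t (E_s = 1)."""
    syms = [qpsk(bits[2 * i], bits[2 * i + 1]) for i in range(n_t)]
    return np.array(syms) / np.sqrt(n_t)

def demap_extrinsic(r, H, L_a, noise_var):
    """Extrinsic LLRs ln[y(1)/y(0)] of (2), by brute force over all 2^K labels."""
    K = len(L_a)
    n_t = K // 2
    log_y = np.full((K, 2), -np.inf)          # log y(0), log y(1) for every bit
    for bits in itertools.product((0, 1), repeat=K):
        b = np.array(bits)
        dist = np.linalg.norm(r - H @ blast_map(b, n_t)) ** 2
        metric = -dist / noise_var + b @ L_a  # likelihood term plus all priors
        for k in range(K):
            # Drop bit k's own prior so only the other K-1 priors remain.
            m = metric - b[k] * L_a[k]
            log_y[k, b[k]] = np.logaddexp(log_y[k, b[k]], m)
    return log_y[:, 1] - log_y[:, 0]

def mutual_info(bits, llr):
    """Time-average estimate of I[c; L] for LLRs defined as ln P(1)/P(0)."""
    x = 2 * np.asarray(bits) - 1              # bit 1 -> +1, bit 0 -> -1
    return 1.0 - np.mean(np.logaddexp(0.0, -x * np.asarray(llr))) / np.log(2)
```

Averaging `mutual_info` over many channel uses, with `L_a` drawn from the Gaussian a priori model, would give one point of a demapper EXIT curve; the brute-force loop is only practical for small K.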
In addition, we use the average mutual information I_a(c̄) and I_e(c̄) to denote Σ_{k=0}^{N−1} I_a[c_k, L_a(c_k)]/N and Σ_{k=0}^{N−1} I_e[c_k, L_e(c_k)]/N of bits in c̄ ≡ (c_0, c_1, ···, c_{N−1}), respectively. In the calculation of I_a(c̄) and I_e(c̄), N = 2097152. Let (ū_ℓ, p̄_ℓ) denote the ℓ-th turbo codeword with N bits, for which bits in (ū_ℓ, p̄_ℓ) are used as the labeling bits of the BLAST mapper, where ū_ℓ and p̄_ℓ represent the interleaved message bits and the interleaved parity bits, respectively. Let I_{M,a}((ū_ℓ, p̄_ℓ)) and I_{M,e}((ū_ℓ, p̄_ℓ)) denote the a priori and extrinsic mutual information of the demapper, respectively. In addition, we denote the demapper extrinsic transform characteristic or EXIT curve of the turbo coded BLAST system in [5] by T_M, i.e., I_{M,e} = T_M(I_{M,a}, E_b/N_0, N_t, N_r). We obtain T_M by Monte Carlo simulation based on the assumption that the a priori values L_{M,a} of the demapper are Gaussian distributed with variance σ²_{M,a} and mean value σ²_{M,a}/2 [5]. Fig. 1 shows the demapper EXIT curves T_M with Gray-mapped QPSK for the cases of (N_t, N_r) = (4, 1), (2, 1), and (1, 1), respectively. We see that these curves resemble straight lines and meet at I_{M,a} = 1. It can be shown that any 1 × N_r curve for Gray-mapped QPSK is a horizontal line [8][36]. For example, the 1 × 1 curve T_M(N_t = 1, N_r = 1) is a horizontal line with value E[J(√(8R (E_b/N_0) |h|²))], where h is a zero-mean, unit-variance, complex Gaussian random variable and the function J(·) is given by

$$J(\sigma) = 1 - \int_{-\infty}^{\infty} \frac{\exp\!\left[-(z-\sigma^2/2)^2 / 2\sigma^2\right]}{\sqrt{2\pi\sigma^2}}\, \log_2\!\left[1+\exp(-z)\right] dz. \qquad (3)$$

Note that J(√(8E_s/N_0)) and E[J(√(8(E_s/N_0)|h|²))] are the capacities of BPSK signals over the 1 × 1 additive white Gaussian noise (AWGN) channel and the Rayleigh fading channel, respectively [8]. Also included in Fig. 1 is the EXIT curve I_{D,e} = T_D(I_{D,a}) of a rate-1/2 turbo decoder, where I_{D,a}((ū_ℓ, p̄_ℓ)) and I_{D,e}((ū_ℓ, p̄_ℓ)) denote the a priori and extrinsic mutual information of the turbo decoder, respectively.
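The function J(·) in (3) has no closed form, so it is usually evaluated numerically. The sketch below is one hedged way to do that, under stated assumptions: a trapezoidal rule over the mean ± 10σ, and Gauss-Laguerre quadrature for the expectation over |h|², which is exponentially distributed with unit mean when h is a unit-variance complex Gaussian. The function names, grid size, and quadrature order are illustrative choices, not taken from the paper.

```python
import numpy as np

def J(sigma, n=4001):
    """Evaluate (3) numerically with a trapezoidal rule on mean +/- 10 sigma."""
    if sigma < 1e-9:
        return 0.0                       # no information at zero LLR variance
    mu = sigma ** 2 / 2.0
    z = np.linspace(mu - 10 * sigma, mu + 10 * sigma, n)
    pdf = np.exp(-(z - mu) ** 2 / (2 * sigma ** 2)) / np.sqrt(2 * np.pi * sigma ** 2)
    f = pdf * np.logaddexp(0.0, -z) / np.log(2)   # pdf * log2(1 + e^{-z})
    return 1.0 - np.sum((f[1:] + f[:-1]) * np.diff(z)) / 2.0

def flat_level_awgn(ebn0_db, rate):
    """J(sqrt(8 R Eb/N0)): the horizontal level without fading (|h|^2 = 1)."""
    snr = 10 ** (ebn0_db / 10)
    return J(np.sqrt(8 * rate * snr))

def flat_level_rayleigh(ebn0_db, rate, order=32):
    """E[J(sqrt(8 R Eb/N0 |h|^2))] with |h|^2 ~ Exp(1), via Gauss-Laguerre."""
    x, w = np.polynomial.laguerre.laggauss(order)  # nodes/weights for e^{-x}
    snr = 10 ** (ebn0_db / 10)
    return float(sum(wi * J(np.sqrt(8 * rate * snr * xi)) for xi, wi in zip(x, w)))
```

For instance, `flat_level_rayleigh(3.9, 0.5)` would estimate the level of the horizontal 1 × 1 curve at the E_b/N_0 used in Fig. 1, subject to the quadrature assumptions above.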
The generator matrix of the 4-state constituent codes used in the turbo decoder is (1, 5/7)_8. Note that T_D is not a function of E_b/N_0. In contrast, an increase in E_b/N_0 results in a vertical shift of the demapper EXIT curves toward higher output values. From the area property described in [35][36], the areas

$$A_M = \int_0^1 T_M(I_{M,a})\, dI_{M,a} \quad \text{and} \quad A_D = \int_0^1 T_D(I_{D,a})\, dI_{D,a}$$

provide good approximations of I(s̄, r̄)/K and (1 − R), respectively, where I(s̄, r̄) is the mutual information between the input symbol s̄ and the output symbol r̄ of the channel, and convergence of iterative detection and decoding is possible for A_M > (1 − A_D), i.e., R < I(s̄, r̄)/K. Since an area gap between the inner demapper and outer decoder EXIT curves directly relates to a rate loss, both curves should be matched to each other to minimize this gap [36]. From the curves labelled "Detector, (N_t, N_r) = (2, 1), original" and "Detector, (N_t, N_r) = (4, 1), original" in Fig. 1, we see that the EXIT curve of the BLAST demapper is not flat for the case of N_t > N_r. However, the EXIT curve of a turbo code is almost a horizontal line [31], and the code EXIT curve is therefore poorly matched to the demapper EXIT curve. Hence, the rate loss is large and the threshold is distant from the capacity for the case of N_t > N_r [5]. Taking (N_t, N_r) = (2, 1) as an example, the capacity of such a MIMO system is at E_b/N_0 = 3.25 dB, while the threshold of this turbo coded BLAST system is 4.4 dB.

III. A TURBO CODED BLAST SYSTEM FOR NEAR-CAPACITY PERFORMANCE

In the following, we propose to introduce a convolutional interleaver with block delay between the turbo encoder and the BLAST mapper to effectively flatten the demapper EXIT curves. The proposed BLAST system can achieve near-capacity performance for (N_t, N_r) = (2, 1) and (2, 2) with R = 1/2. For many other cases, our system can achieve better decoding thresholds than the original turbo coded BLAST system in [5].
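Returning briefly to the area argument of Section II.E, the condition A_M > 1 − A_D can be checked directly from sampled EXIT curves. The following is a small sketch under stated assumptions: the curves are given as sample points on [0, 1], the integrals are approximated by the trapezoidal rule, and the function names are hypothetical.

```python
import numpy as np

def exit_area(i_in, i_out):
    """Trapezoidal area under a sampled EXIT curve i_out(i_in) on [0, 1]."""
    i_in = np.asarray(i_in, dtype=float)
    i_out = np.asarray(i_out, dtype=float)
    return float(np.sum((i_out[1:] + i_out[:-1]) * np.diff(i_in)) / 2.0)

def area_condition(area_demapper, area_decoder):
    """Area condition A_M > 1 - A_D under which convergence is possible."""
    return area_demapper > 1.0 - area_decoder
```

The area test is only a necessary condition: a sloped demapper curve can satisfy it and still intersect a nearly flat decoder curve early, which is the mismatch the block delay is meant to remove.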
In Sections III.A to III.F, we use the case of (N_t, N_r) = (2, 1), R = 1/2, and Gray-mapped QPSK, i.e., m = 2, to illustrate the proposed system. Other antenna configurations such as N_t = 4 and N_r > 1 will be discussed in Section III.G.

[Fig. 2. Transmitter and receiver of the proposed turbo coded BLAST system. (N_t, N_r) = (2, 1). Blocks shown: binary source, turbo encoder, interleavers Π_1 and Π_2, delay D_B, BLAST mapper, channel, BLAST demapper with memory, turbo decoder, hard decision.]

A. System Description

Fig. 2 shows the schematic diagram of the transmitter and receiver of the proposed system with (N_t, N_r) = (2, 1), where D_B is a one-block-unit delay operator. The number of delay elements in bits within a one-block-unit delay is N/2, where N is the number of bits in each turbo codeword. Like the original system in [5], our system can transmit N code bits per block unit and two message bits (or four code bits) per channel use. With the block-unit delay operator, the input to the BLAST mapper at the ℓ-th block unit is the block (v̄_{1,ℓ}, v̄_{2,ℓ}) = (ū_{ℓ−1}, p̄_ℓ), where ū_{ℓ−1} represents the interleaved message bits of the (ℓ−1)-th turbo codeword and p̄_ℓ represents the interleaved parity bits of the ℓ-th turbo codeword. Bits in v̄_{j,ℓ} are used to label the QPSK signals transmitted through the j-th transmit antenna for j = 1, 2. The BLAST demapper tries to recover (ū_{ℓ−2}, p̄_{ℓ−1}), (ū_{ℓ−1}, p̄_ℓ), (ū_ℓ, p̄_{ℓ+1}), ···, respectively, while the turbo decoder tries to recover (ū_{ℓ−2}, p̄_{ℓ−2}), (ū_{ℓ−1}, p̄_{ℓ−1}), (ū_ℓ, p̄_ℓ), (ū_{ℓ+1}, p̄_{ℓ+1}), ···, respectively. For the turbo decoder, efficiently decoding (ū_ℓ, p̄_ℓ) requires the information of (ū_{ℓ−1}, p̄_ℓ) and (ū_ℓ, p̄_{ℓ+1}) passed from the BLAST demapper.
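The block-delay relation (v̄_{1,ℓ}, v̄_{2,ℓ}) = (ū_{ℓ−1}, p̄_ℓ) can be sketched as a simple generator. This is an illustration only, with assumptions that the paper does not specify: the block preceding the first codeword is taken to be all-zero and known, and a dummy all-zero parity block is appended to flush the last message block. The function names are hypothetical.

```python
def block_delay_stream(codewords, n_half):
    """Yield mapper input blocks (v1, v2) = (u_{l-1}, p_l) for successive l.

    codewords: iterable of (u_l, p_l) pairs, each a list of n_half bits,
               already interleaved.
    """
    u_prev = [0] * n_half            # assumption: known all-zero initial block
    for u, p in codewords:
        yield (u_prev, p)            # antenna 1 carries the delayed message bits
        u_prev = u
    yield (u_prev, [0] * n_half)     # assumption: dummy parity block as a flush

def mapper_labels(v1, v2):
    """Pair two bits per antenna per channel use (QPSK, m = 2)."""
    return [((v1[i], v1[i + 1]), (v2[i], v2[i + 1])) for i in range(0, len(v1), 2)]
```

Each yielded block spans one block unit, so the message bits of codeword ℓ share channel uses with the parity bits of codeword ℓ+1, which is what correlates adjacent codewords at the receiver.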
For the BLAST demapper, efficiently demapping (ū_{ℓ−1}, p̄_ℓ) requires the information of (ū_{ℓ−1}, p̄_{ℓ−1}) and (ū_ℓ, p̄_ℓ) passed from the turbo decoder.

With the application of the delay operation (or equivalently the convolutional interleaver) between the turbo encoder and the BLAST mapper, the transmitter outputs at all block units are correlated. Similar phenomena can be observed in MIMO systems using delay diversity [33][34] or delay diversity in conjunction with channel coding [44][45]. For these systems, a maximum likelihood sequence estimator (MLSE) or maximum likelihood decoding based on a Viterbi decoder is employed to obtain optimum performance. Such detection or decoding requires the observation of a long MIMO sequence. For the proposed turbo coded BLAST system, the decoding trellis for optimum decoding is not available, and the decoding delay would be too long even if the decoding trellis were available. In the following, we provide three suboptimum decoding methods which do not require the observation of a long sequence. The first is iterative decoding within a single codeword (IDSC); the second and the third are both forms of iterative decoding between adjacent codewords, denoted IDAC-I and IDAC-II, respectively. IDSC only requires the channel observation of (ū_{ℓ−1}, p̄_ℓ) and (ū_ℓ, p̄_{ℓ+1}) to decode (ū_ℓ, p̄_ℓ). IDAC-I requires the channel observation of (ū_{ℓ−1}, p̄_ℓ), (ū_ℓ, p̄_{ℓ+1}), and (ū_{ℓ+1}, p̄_{ℓ+2}) to decode (ū_ℓ, p̄_ℓ). The required channel observation for IDAC-II is similar to that of IDSC.

B. Iterative Decoding within a Single Codeword (IDSC)

We now present how to use IDSC to decode (ū_ℓ, p̄_ℓ), given that L_{D,e}(ū_{ℓ−1}) has been obtained.

Step 1: Through the BLAST demapper, we obtain the a posteriori LLR values L_{M,p}(p̄_ℓ) computed by (2) with L_{M,a}(ū_{ℓ−1}) (or equivalently L_{D,e}(ū_{ℓ−1})), which is the LLR obtained in the decoding of the previous turbo codeword. Note that L_{M,a}(ū_{ℓ−1}) provides one half of the a priori LLR values for the demapper.
The other half of the a priori LLR values, provided by L_{M,a}(p̄_ℓ), is zero.

Step 2: In a way similar to Step 1, we obtain the a posteriori LLR values L_{M,p}(ū_ℓ) computed by (2) with L_{M,a}(ū_ℓ) = L_{M,a}(p̄_{ℓ+1}) = 0̄, where 0̄ is the all-zero N/2-tuple.

Step 3: The turbo decoder uses L_{D,a}(ū_ℓ) (or equivalently L_{M,e}(ū_ℓ)) and L_{D,a}(p̄_ℓ) (or equivalently L_{M,e}(p̄_ℓ)) as input to yield the a posteriori LLR output values L_{D,p}(ū_ℓ) and L_{D,p}(p̄_ℓ) after N_I iterations within the turbo decoder.

Step 4: The turbo decoder computes the LLR values L_{D,e}(ū_ℓ) = L_{D,p}(ū_ℓ) − L_{D,a}(ū_ℓ) and L_{D,e}(p̄_ℓ) = L_{D,p}(p̄_ℓ) − L_{D,a}(p̄_ℓ), which are then interleaved to become the LLR values L_{M,a}(ū_ℓ) and L_{M,a}(p̄_ℓ) that are fed into the BLAST demapper.

Step 5: Through the BLAST demapper with the LLR values L_{M,a}(p̄_ℓ) and L_{M,a}(ū_ℓ) obtained in Step 4, L_{M,a}(ū_{ℓ−1}) (or equivalently L_{D,e}(ū_{ℓ−1})) obtained in decoding the previous turbo codeword, and L_{M,a}(p̄_{ℓ+1}) = 0̄, we update the a posteriori LLR values L_{M,p}(p̄_ℓ) and L_{M,p}(ū_ℓ) using (2).

Step 6: Compute L_{M,e}(ū_ℓ) = L_{M,p}(ū_ℓ) − L_{M,a}(ū_ℓ) and L_{M,e}(p̄_ℓ) = L_{M,p}(p̄_ℓ) − L_{M,a}(p̄_ℓ).

Step 7: If the number of iterations between the turbo decoder and the BLAST demapper is below the maximum limit N_O, then go to Step 3. Otherwise, we estimate ū_ℓ through a decision on L_{D,p}(ū_ℓ) and store the LLR values L_{D,e}(ū_ℓ), which will be used in the detection of the next block.

C. Demapper EXIT Curves

We now investigate the demapper EXIT curves for the proposed system with (N_t, N_r) = (2, 1). For our system, we must consider bits in v̄_{1,ℓ+1} = ū_ℓ and v̄_{2,ℓ} = p̄_ℓ for the calculation of the demapper EXIT curves. In addition, the calculation is based on I_a(v̄_{1,ℓ}) = I_a(ū_{ℓ−1}) = 1 and I_a(v̄_{2,ℓ+1}) = I_a(p̄_{ℓ+1}) = x, 0 ≤ x ≤ 1. For IDSC, I_a(v̄_{2,ℓ+1}) = 0. Here, we also examine the effect of I_a(v̄_{2,ℓ+1}) for various values.
In addition, we use I_{M,e}((ū_ℓ, p̄_ℓ)) to denote the average of I_{M,e}(ū_ℓ) and I_{M,e}(p̄_ℓ), since the statistics of I_{M,e}(ū_ℓ) and I_{M,e}(p̄_ℓ) are different. Denote the demapper EXIT function of the proposed system by T_M, i.e., I_{M,e} = T_M(I_{M,a}).

Suppose that I_a(v̄_{2,ℓ+1}) = 1. For p̄_ℓ, we can completely cancel the interference from ū_{ℓ−1} and obtain the LLR values L_{M,e}(p̄_ℓ) by using the assumption I_a(ū_{ℓ−1}) = 1, i.e., ū_{ℓ−1} is known. Under this condition, the channel encountered by p̄_ℓ can be regarded as a 1 × 1 channel. Hence, I_{M,e}(p̄_ℓ) = T_M(N_t = 1, N_r = 1) = E[J(√(8R (E_b/N_0) |h|²))]. By applying the same argument to ū_ℓ based on I_a(v̄_{2,ℓ+1}) = 1, we have I_{M,e}(ū_ℓ) = E[J(√(8R (E_b/N_0) |h|²))]. Taking the average of I_{M,e}(p̄_ℓ) and I_{M,e}(ū_ℓ), we have I_{M,e}((ū_ℓ, p̄_ℓ)) = E[J(√(8R (E_b/N_0) |h|²))], which is independent of I_{M,a}((ū_ℓ, p̄_ℓ)). Hence, the demapper EXIT curve T_M for I_a(v̄_{2,ℓ+1}) = 1 is horizontal. This phenomenon is very encouraging.

We need to examine the condition in which I_a(v̄_{2,ℓ+1}) is less than 1. However, we cannot obtain a closed-form representation of I_{M,e}(ū_ℓ), and hence of I_{M,e}((ū_ℓ, p̄_ℓ)), for I_a(v̄_{2,ℓ+1}) other than 1. Instead, we obtain I_{M,e}((ū_ℓ, p̄_ℓ)) through Monte Carlo simulation [5]. Fig. 1 shows T_M for I_a(v̄_{2,ℓ+1}) = 1, 0.5, and 0, respectively. We see that T_M is almost horizontal for every investigated value of I_a(v̄_{2,ℓ+1}). Hence, introducing delay elements between the turbo encoder and the BLAST mapper can effectively flatten the EXIT curves of the BLAST demapper. From Fig. 1, we also observe that a larger I_a(v̄_{2,ℓ+1}) results in a larger I_{M,e}((ū_ℓ, p̄_ℓ)). This result fits our intuition: more information about v̄_{2,ℓ+1}, which is part of the adjacent turbo codeword (ū_{ℓ+1}, p̄_{ℓ+1}), is more helpful to the decoding of the current turbo codeword (ū_ℓ, p̄_ℓ). In the following section, we provide two practical methods for obtaining information of v̄_{2,ℓ+1}.

D. Iterative Decoding between Adjacent Codewords (IDAC): IDAC-I and IDAC-II

We now propose two methods of obtaining information of v̄_{2,ℓ+1}. For the first method, information of v̄_{2,ℓ+1} is obtained by turbo decoding all the coded bits of the adjacent turbo codeword, i.e., (p̄_{ℓ+1}, ū_{ℓ+1}). For the second method, information of v̄_{2,ℓ+1} is obtained by convolutional decoding of some coded bits of (ū_{ℓ+1}, p̄_{ℓ+1}). The value of I_a(v̄_{2,ℓ+1}) obtained by using the first method is in general larger than that obtained by using the second method. For example, the values of I_a(v̄_{2,ℓ+1}) at E_b/N_0 = 3.9 dB obtained by using the first and the second methods are 0.132 and 0.098, respectively. On the other hand, we can obtain improved LLR values L_{M,e}(v̄_{2,ℓ+1}) (or equivalently L_{D,a}(v̄_{2,ℓ+1})) by cancelling the interference from v̄_{1,ℓ+1} if information of v̄_{1,ℓ+1} is available, i.e., I_a(v̄_{1,ℓ+1}) > 0. Information of v̄_{1,ℓ+1} is obtained by using IDSC to decode (ū_ℓ, p̄_ℓ). We can repeat such procedures to gradually obtain larger values of I_a(v̄_{1,ℓ+1}) and I_a(v̄_{2,ℓ+1}) and hence improve the error performance of (ū_ℓ, p̄_ℓ). Such a decoding method is called iterative decoding between adjacent codewords (IDAC). IDAC using the first (second) method to obtain the information of v̄_{2,ℓ+1} is called IDAC-I (IDAC-II). IDAC-I can provide better error performance than IDAC-II at the cost of longer decoding delay and higher complexity.

Given that L_{D,e}(ū_{ℓ−1}) has been obtained, IDAC-I is described as follows.

Step 1: With L_{D,e}(ū_ℓ) = 0̄, we use IDSC with N_O iterations to decode (ū_{ℓ+1}, p̄_{ℓ+1}) and obtain L_{D,e}(p̄_{ℓ+1}). Note that, in the demapper, the calculation of L_{M,p}(p̄_{ℓ+1}) is based on L_{M,a}(ū_ℓ) = L_{M,a}(p̄_{ℓ+1}) = 0̄.

Step 2: With L_{D,e}(ū_{ℓ−1}) and the updated L_{D,e}(p̄_{ℓ+1}), we use a modified IDSC with N_O iterations to decode (ū_ℓ, p̄_ℓ) and obtain L_{D,e}(ū_ℓ) and L_{D,e}(p̄_ℓ), where IDSC is modified such that at the beginning L_{M,a}(p̄_{ℓ+1}) (or equivalently L_{D,e}(p̄_{ℓ+1})) is not zero.
Step 3 With updated LD,e (ū ), we use IDSC with NO iterations to re-decode (ū+1 , p̄+1 ) and update LD,e (p̄+1 ). Step 4 Repeat Step 2 with updated LD,e (p̄+1 ) obtained in Step 3. Step 5 After repeating Steps 3 and 4 for NIDAC − 1 times, we can decode (ū , p̄ ) and obtain LD,e (ū ). In order that we can apply algorithm IDAC-II, the encoding must be somewhat modified. We replace the p̄ in Fig. 2 by the first N/2 code bits of the first rate-2/3 component code, RSC1, of the turbo code and replace the ū in Fig. 2 by the other N/2 code bits of the turbo code. In this way, v̄2, is a codeword of a rate-2/3 convolutional code and can be decoded by using the trellis of RSC1. Note that v̄2, contains only the first N/2 bits of RSC1. Based on that LD,e (v̄1, ) has been obtained, IDAC-II is described as follows. Step 1 With LM,a (v̄1,+1 ) = LM,a (v̄2,+1 ) = 0̄, we use the BLAST demapper to obtain the a posteriori LLR values, LM,p (v̄2,+1 ), according to (2). The MAP (maximum a posteriori) decoder of RSC1 uses LD,a (v̄2,+1 ) (or equivalently LM,e (v̄2,+1 )=LM,p (v̄2,+1 )-LM,a (v̄2,+1 )) as input to yield LD,p (v̄2,+1 ) and LD,e (v̄2,+1 ). The values in LD,e (v̄2,+1 ) (or equivalently LM,a (v̄2,+1 )) are fed into the BLAST demapper. After NO iterations between the MAP decoder of RSC1 and the BLAST demapper, we obtain the desired LD,e (v̄2,+1 ). Step 2 With LD,e (v̄1, ) and updated LD,e (v̄2,+1 ), we use a UENG et al.: TURBO CODED MULTIPLE-ANTENNA SYSTEMS FOR NEAR-CAPACITY PERFORMANCE 959 TABLE I T HRESHOLDS AND ACHIEVABLE BER FOR THE PROPOSED TURBO CODED BLAST SYSTEM AND THE ORIGINAL TURBO CODED BLAST SYSTEM IN [5]. N = 105 IS USED IN THE BER SIMULATION . C APACITIES FOR 2×1 AND 2×2 MIMO SYSTEMS ARE 3.25 D B AND 1.6 D B, RESPECTIVELY. 
modified IDSC with NO iterations to decode (v̄_{1,ℓ+1}, v̄_{2,ℓ}) and obtain L_{D,e}(v̄_{1,ℓ+1}) and L_{D,e}(v̄_{2,ℓ}), where IDSC is modified such that, at the beginning, L_{M,a}(v̄_{2,ℓ+1}) (or equivalently L_{D,e}(v̄_{2,ℓ+1})) is not zero.
Step 3 Repeat Step 1 with updated L_{D,e}(v̄_{1,ℓ+1}) (or equivalently L_{M,a}(v̄_{1,ℓ+1})).
Step 4 Repeat Step 2 with updated L_{D,e}(v̄_{2,ℓ+1}).
Step 5 After repeating Steps 3 and 4 NIDAC − 1 times, we can decode (v̄_{1,ℓ+1}, v̄_{2,ℓ}) and obtain L_{D,e}(v̄_{1,ℓ+1}) and L_{D,p}(ū_ℓ).

In summary, L_{M,a}(v̄_{2,ℓ+1}) is obtained by turbo decoding of (v̄_{1,ℓ+2}, v̄_{2,ℓ+1}) = (ū_{ℓ+1}, p̄_{ℓ+1}) in IDAC-I, while in IDAC-II, L_{M,a}(v̄_{2,ℓ+1}) is obtained by MAP decoding of the convolutional code v̄_{2,ℓ+1}. The advantage of IDAC-II over IDAC-I is that, in decoding (ū_ℓ, p̄_ℓ), there is no need to refer to the channel output block containing v̄_{1,ℓ+2}. Throughout this paper, we use NI = 20, NO = 3, and NIDAC = 3, unless otherwise specified.

E. Thresholds for Various Detection-Decoding Algorithms

Table I summarizes the thresholds of the proposed system using IDSC, IDAC-I or IDAC-II and the original system.

TABLE I

(Nt, Nr) = (2, 1)
                         Threshold (dB)   Eb/N0 (dB)   BER
Proposed system
  Restricted IDSC        3.90             4.21         3.13 × 10^-5
  IDSC                   3.90             4.12         8.37 × 10^-7
  Restricted IDAC-I      3.67             3.88         7.90 × 10^-7
  IDAC-I                 3.60             3.86         7.40 × 10^-7
  Restricted IDAC-II     3.73             4.05         1.24 × 10^-6
  IDAC-II                3.64             4.01         1.61 × 10^-6
Original system          4.4              5.00         1.20 × 10^-6

(Nt, Nr) = (2, 2)
                         Threshold (dB)   Eb/N0 (dB)   BER
Proposed system
  Restricted IDSC        1.99             2.3          6.97 × 10^-5
  IDSC                   1.99             2.28         3.43 × 10^-5
  Restricted IDAC-I      1.88             2.11         3.59 × 10^-6
  IDAC-I                 1.86             2.105        4.61 × 10^-6
  Restricted IDAC-II     1.98             2.26         1.53 × 10^-5
  IDAC-II                1.94             2.25         3.37 × 10^-5
Original system          2.1              2.42         5.23 × 10^-5

For IDSC, we consider information exchange between the demapper and the decoder based on the condition of I_a(v̄_{1,ℓ}) = 1 and I_a(v̄_{2,ℓ+1}) = 0. This case has been discussed in Section III.C. For IDAC-I, we need to consider information exchange between (ū_ℓ, p̄_ℓ) and (ū_{ℓ+1}, p̄_{ℓ+1}). In fact, we only consider the information exchange between bits in v̄_{1,ℓ+1} = ū_ℓ and v̄_{2,ℓ+1} = p̄_{ℓ+1}, since v̄_{1,ℓ+1} and v̄_{2,ℓ+1} are correlated by the BLAST mapper. The EXIT charts of (ū_ℓ, p̄_ℓ) using IDAC-I are calculated based on I_a(v̄_{1,ℓ}) = 1, I_a(v̄_{2,ℓ+2}) = 0, and N = 2097152. We run IDSC on (ū_ℓ, p̄_ℓ) using 20 iterations within the turbo decoder, i.e., NI = 20, to obtain I_{e1}(v̄_{1,ℓ+1}) for various I_{a1}(v̄_{2,ℓ+1}). Similarly, we run IDSC on (ū_{ℓ+1}, p̄_{ℓ+1}) with NI = 20 to obtain I_{e2}(v̄_{2,ℓ+1}) for various I_{a2}(v̄_{1,ℓ+1}). Fig. 3 shows the EXIT charts for our system using IDAC-I with 3 iterations between the turbo decoder and the demapper, i.e., NO = 3. Following a similar method, we can obtain the EXIT charts for our system using IDAC-II, which are similar to those of IDAC-I. From Table I, we see that our system achieves better thresholds than the original system. For example, the threshold of our system using IDAC-I is 3.60 dB while the threshold of the original system is 4.4 dB. Note that the capacity of the 2 × 1 MIMO system is at Eb/N0 = 3.25 dB.

Fig. 3. EXIT curves of (ū_ℓ, p̄_ℓ) and (ū_{ℓ+1}, p̄_{ℓ+1}) for the proposed turbo coded BLAST system using IDAC-I. Nt = 2, Nr = 1, Eb/N0 = 3.64 dB.

Since the EXIT curves of the MIMO detectors of our MIMO systems are close-to-horizontal, we also investigate a reduced-complexity version of IDSC for which the soft output of the turbo decoder is not used to update the LLR output of the MIMO demapper. This reduced-complexity version of IDSC is called restricted IDSC. In restricted IDSC, Steps 5 and 6 described in Section III.B are skipped and we use NO = 1 only. We can employ restricted IDSC in IDAC-I to obtain a reduced-complexity version of IDAC-I called restricted IDAC-I. Similarly, we can obtain restricted IDAC-II. Also included in Table I are the thresholds of our proposed system using restricted IDSC, restricted IDAC-I and restricted IDAC-II. We see that our system using these restricted versions can achieve satisfactory thresholds.

F. BER Results for Various Detection-Decoding Algorithms

BER results for the proposed system and the original system with long interleavers are shown in Fig. 4. Recall that throughout this paper we use NI = 20, NO = 3, and NIDAC = 3, unless otherwise specified. The achievable BER results are summarized in Table I. To match the thresholds derived by EXIT charts, we can replace the interleavers Π1 and Π2 of our system in Fig. 2 by a single interleaver Π which permutes all the turbo coded bits including ū_ℓ and p̄_ℓ. However, from BER results, we find that our system using Π1 and Π2 achieves a slightly better error performance than our system using Π. Hence, in the following, we use the transmitter shown in Fig. 2 for our system.

Fig. 4. BER of the proposed turbo coded BLAST system (N = 10^5) and the original turbo coded BLAST system in [5] (N = 3 × 10^5). Nt = 2, Nr = 1.

Our system using the restricted version suffers only slightly worse error performance than the version that is not restricted. In contrast, the error performance of the original system in [5] with demapper updating is significantly better than without demapper updating, since T_M(Nt = 2, Nr = 1) is not flat and L_{M,e}((ū_ℓ, p̄_ℓ)) can be improved by L_{M,a}((ū_ℓ, p̄_ℓ)), which is obtained by complex iterative demapping and decoding between the decoder and the demapper. Hence, in the following, for the proposed system, we consider only the restricted version in the detection and decoding. However, for the original system in [5], we still use the detection and decoding exactly the same as the one provided in Section II.
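As a quick numerical reading of Table I, the following sketch computes how far each (Nt, Nr) = (2, 1) threshold sits from the 2 × 1 capacity limit of 3.25 dB quoted in Section III.E. The numbers are copied from Table I; the function and variable names are illustrative assumptions and not part of the paper.

```python
# Thresholds (dB) for (Nt, Nr) = (2, 1), copied from Table I.
# The dictionary keys and helper name are illustrative, not from the paper.
CAPACITY_DB = 3.25  # capacity limit of the 2 x 1 MIMO system quoted in the text

THRESHOLDS_DB = {
    "restricted IDSC": 3.90,
    "IDSC": 3.90,
    "restricted IDAC-I": 3.67,
    "IDAC-I": 3.60,
    "restricted IDAC-II": 3.73,
    "IDAC-II": 3.64,
    "original": 4.4,
}


def gap_to_capacity(thresholds_db, capacity_db):
    """Return the distance (in dB) of each threshold from the capacity limit."""
    return {name: round(t - capacity_db, 2) for name, t in thresholds_db.items()}


gaps = gap_to_capacity(THRESHOLDS_DB, CAPACITY_DB)
for name, gap in sorted(gaps.items(), key=lambda item: item[1]):
    print(f"{name:20s} {gap:.2f} dB from capacity")
```

Under these numbers the IDAC-I threshold lies 0.35 dB from capacity, against 1.15 dB for the original system.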
BER results for the proposed system and the original system with short interleavers are shown in Fig. 5. The decoding delays of our system using IDSC, IDAC-I, and IDAC-II are 2N, 3N, and 2N code bits, respectively. From Fig. 4 and Fig. 5, we see that IDAC-II can provide better error performance than IDSC at the cost of higher complexity, while IDAC-I can provide better error performance than IDAC-II at the cost of longer decoding delay and higher complexity. For a fair comparison based on the same decoding delay, we may compare the proposed system using IDAC-I to the original system with triple interleaver size. We see that the proposed system is superior to the original system at low SNR, which results from the superior convergence behavior of our system as indicated in the EXIT charts. In particular, the superiority is very significant for the cases of long interleavers. With the triple interleaver size, the original system has slightly better error performance in the error-floor region than the proposed system.

Fig. 5. BER of the proposed turbo coded BLAST system using restricted IDSC, IDAC-I, and IDAC-II, and the original turbo coded BLAST system in [5]. Nt = 2, Nr = 1.

Compared to the LDPC codes and RA codes optimized for multiple-antenna systems given in [8] and [9], respectively, our turbo coded BLAST system can have similar thresholds. In addition, our system can achieve slightly better BER performance. Taking the case of (Nt, Nr) = (2, 1) as an example, we find that our system using N = 10^5 and restricted IDAC-I can achieve a BER of 7.90 × 10^-7 at Eb/N0 = 3.88 dB, while the LDPC and RA coded BLAST systems using N = 10^5 can achieve a BER of 10^-4 at Eb/N0 = 4.0 dB and 3.95 dB, respectively.

G. Extensions to Other MIMO Configurations

The thresholds and achievable BER for (Nt, Nr) = (2, 2) are also given in Table I.
As compared to the original system, the proposed system provides only slightly better thresholds. The reason can be easily explained by the associated EXIT curves, which are not given here. Although T_M(Nt = 2, Nr = 2) is not close-to-horizontal, the slope of T_M(Nt = 2, Nr = 2) is lower than that of T_M(Nt = 2, Nr = 1). Hence, the room for improvement in threshold by using delay elements for the case of (Nt, Nr) = (2, 2) is not as large as for the case of (Nt, Nr) = (2, 1). Compared to the IRA codes optimized for multiple-antenna systems given in [7], our system can achieve similar BER performance and threshold. From Table I, we find that our system using restricted IDAC-I and N = 10^5 can achieve a BER of 3.59 × 10^-6 at Eb/N0 = 2.11 dB, while the IRA coded MIMO in [7] with N = 2 × 10^5 can achieve a similar BER at Eb/N0 = 2.2 dB. In addition, the thresholds for our system using restricted IDAC-I and the IRA coded MIMO in [7] are 1.88 dB and 1.9 dB, respectively. Note that the capacity of the 2 × 2 MIMO system is at Eb/N0 = 1.6 dB.

The transmitter for our system with Nt = 4 is similar to the case of Nt = 2 except for some differences. Now, we have K = mNt = 8 and q = N/K = N/8. In addition, the number of delay elements in bits within one-block-unit delay equals N/4. Equivalently, there are q = N/8 MIMO transmissions within one block unit. The turbo coded bits are divided into four streams for the four transmit antennas. Bits in both the first stream and the second stream are delayed by N/4 code bits before being fed to the transmit antennas. The resultant system is called Type-I system. From the analysis of EXIT charts for (Nt, Nr) = (4, 1) with R = 1/2, we see that the threshold of Type-I system using restricted IDSC is at Eb/N0 = 10.4 dB while the threshold of the original system is at Eb/N0 = 11.7 dB. Note that the capacity of such a MIMO system is 6.65 dB. Although the threshold of Type-I system is better than that of the original system for the case of (Nt, Nr) = (4, 1), the threshold of Type-I system is still somewhat distant from the capacity. The reason can be easily explained by the associated EXIT curves, which are not given here. In the case of (Nt, Nr) = (4, 1), although the detector EXIT curve of Type-I system has been flattened as compared to that of the conventional system, it is not as close to horizontal as in the case of (Nt, Nr) = (2, 1). Hence, the EXIT curve of the detector with (Nt, Nr) = (4, 1) cannot match the close-to-horizontal EXIT curve of the turbo decoder well.

We can further flatten the detector EXIT curve with (Nt, Nr) = (4, 1) by introducing additional delay elements between the turbo encoder and the BLAST mapper as follows. The turbo coded bits associated with the j-th transmit antenna are delayed by (4 − j)N/4 bits before being fed to the j-th transmit antenna for j = 1, 2, 3, 4. The resultant system is called Type-II system. The resultant detector EXIT curves under the conditions of I_a(v̄_{2,ℓ+3}) = I_a(v̄_{3,ℓ+2}) = I_a(v̄_{4,ℓ+1}) = x, I_a(v̄_{3,ℓ+3}) = I_a(v̄_{4,ℓ+3}) = I_a(v̄_{4,ℓ+2}) = 0, and I_a(v̄_{1,ℓ}) = I_a(v̄_{2,ℓ}) = I_a(v̄_{3,ℓ}) = I_a(v̄_{1,ℓ+1}) = I_a(v̄_{2,ℓ+1}) = I_a(v̄_{1,ℓ+2}) = 1 are all close-to-horizontal for the investigated cases of x = 0, 0.5, and 1, respectively, and hence can match the decoder EXIT curve well. The threshold of Type-II system using (Nt, Nr) = (4, 1) and restricted IDSC, i.e., x = 0, is 7.4 dB, which is closer to the capacity than that of Type-I system. The BER results shown in Fig. 6 verify the prediction obtained by the analysis of EXIT charts.

Fig. 6. BER of the proposed turbo coded BLAST systems using restricted IDSC (N = 10^5) and the original turbo coded BLAST system in [5] (N = 3 × 10^5). Nt = 4, Nr = 1.

IV. TURBO CODED ALAMOUTI STBC SYSTEMS

Since the Alamouti STBC has the advantage of low detector complexity, Alamouti STBC in conjunction with channel coding is of practical significance as well. Now we investigate turbo coded multiple-antenna systems using the Alamouti STBC [10] as the inner code. Therefore, we have Nt = 2. We will increase the constellation size and the rate of the turbo code to compensate for the rate loss due to the inner Alamouti STBC.

A. A turbo coded Alamouti STBC system

Let x_i be a complex number representing a constellation point. The outputs of the Alamouti STBC encoder at antennas 1 and 2 for the input (x_0, x_1) are (x_0, −x_1^*) and (x_1, x_0^*), respectively. The channel model is the same as that described in Section II.B except that the fading coefficients are constant over two consecutive MIMO transmissions and change independently every two MIMO transmissions. For Nr = 1, the received signals at time instants 0 and 1, denoted by r_0 and r_1 respectively, are [r_0, r_1] = [h_0, h_1]G + [n_0, n_1], where h_0 and h_1 are the channel coefficients from transmit antennas 1 and 2 to the receive antenna, respectively,

$$
G = \begin{bmatrix} x_0 & -x_1^* \\ x_1 & x_0^* \end{bmatrix},
$$

and n_0 and n_1 are the AWGN at time instants 0 and 1, respectively. By utilizing the orthogonality of G, it is easy to verify that the equivalent channel model is given as

$$
\begin{aligned}
z_0 &= h_0^* r_0 + h_1 r_1^* = (|h_0|^2 + |h_1|^2)\,x_0 + h_0^* n_0 + h_1 n_1^* \\
z_1 &= h_1^* r_0 - h_0 r_1^* = (|h_0|^2 + |h_1|^2)\,x_1 - h_0 n_1^* + h_1^* n_0 .
\end{aligned}
\qquad (4)
$$

According to (4), the symbols transmitted from the two antennas can be separated at the receiver, and hence the Alamouti STBC plays the key role of transforming a 2 × 1 channel into a 1 × 1 channel.

The transmitter and receiver of the turbo coded Alamouti STBC system in [29] are shown in Fig. 7, where a rate-2/3 turbo code, an 8PSK signal mapper, and the Alamouti STBC are employed. Through such a system, we can transmit 2 bits per MIMO transmission. The rate-2/3 turbo code is obtained by uniformly puncturing the parity bits of a rate-1/3 turbo code for which the generator matrix of the constituent codes is (1, 5/7)_8. At the transmitter, we first de-multiplex the ℓ-th interleaved turbo codeword (ū_ℓ, p̄_ℓ) of size N = 3w/2 bits into three sub-blocks v̄_{1,ℓ}, v̄_{2,ℓ}, and v̄_{3,ℓ} of equal size w/2 bits before they are fed into the 8PSK signal mapper. At the ℓ-th block unit, bits in v̄_{1,ℓ}, v̄_{2,ℓ}, and v̄_{3,ℓ} are used as the first-level, second-level, and third-level labeling bits of the 8PSK signals, respectively.

Fig. 7. Transmitter and receiver of a turbo coded Alamouti STBC system in [29]. [Block diagram. Blocks shown: binary source, turbo encoder, interleaver Π, demultiplexer, 8PSK signal mapper, STBC encoder, channel, STBC decoder, 8PSK signal demapper, multiplexer, demultiplexer, deinterleaver Π^-1, interleaver Π, turbo decoder, decoded output.]

Fig. 8. EXIT curves of MIMO detector (Eb/N0 = 5.0 dB) and turbo decoder for the proposed turbo coded Alamouti STBC system and an original turbo coded Alamouti STBC system in [29]. (Nt, Nr) = (2, 1). [Plot. Horizontal axis: I_{M,a}(u_ℓ, p_ℓ) [I_{D,e}(u_ℓ, p_ℓ)]. Vertical axis: I_{M,e}(u_ℓ, p_ℓ) [I_{D,a}(u_ℓ, p_ℓ)]. Curves: detector, proposed, mixed labeling, x = 0, 0.5, and 1; detector, original, Gray labeling; detector, original, mixed labeling; decoder.]

We can apply the turbo principle to this system by iteratively performing decoding and detection between the turbo decoder and the MIMO detector, which consists of the Alamouti STBC decoder and a MAP signal demapper. Due to the orthogonality of G, the output of the STBC decoder is not updated after sending z_0 and z_1 to the signal demapper.
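The combining step in (4) can be checked numerically with a short sketch. The code below is illustrative only: the function names are assumptions introduced here, the channel and symbol values are arbitrary, and noise defaults to zero so that the scaling by |h_0|^2 + |h_1|^2 is visible directly.

```python
def alamouti_encode(x0, x1):
    """Return G with rows = antennas 1 and 2, columns = time instants 0 and 1."""
    return [[x0, -x1.conjugate()],
            [x1, x0.conjugate()]]


def receive(h0, h1, G, n0=0j, n1=0j):
    """[r0, r1] = [h0, h1] G + [n0, n1] for a single receive antenna."""
    r0 = h0 * G[0][0] + h1 * G[1][0] + n0
    r1 = h0 * G[0][1] + h1 * G[1][1] + n1
    return r0, r1


def combine(h0, h1, r0, r1):
    """Linear combining of (4): returns (z0, z1)."""
    z0 = h0.conjugate() * r0 + h1 * r1.conjugate()
    z1 = h1.conjugate() * r0 - h0 * r1.conjugate()
    return z0, z1


# Arbitrary example values (assumptions, not taken from the paper).
h0, h1 = 0.3 - 1.2j, -0.7 + 0.4j
x0, x1 = 1 + 0j, 1j
r0, r1 = receive(h0, h1, alamouti_encode(x0, x1))
z0, z1 = combine(h0, h1, r0, r1)
gain = abs(h0) ** 2 + abs(h1) ** 2
print(z0 / gain, z1 / gain)  # approximately x0 and x1 in the noiseless case
```

In the noiseless case each output is the transmitted symbol scaled by the same real gain, which is the 1 × 1 equivalent channel described above.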
In other words, there is no loop between the STBC decoder and the turbo decoder. Fig. 8 shows the detector EXIT curves of the STBC system in [29] which is also illustrated in Fig. 7. Both the Gray labeling and mixed labeling in [16] are investigated. With mixed labeling, the eight sequential labels for 8PSK signal points are {000, 100, 010, 110, 011, 111, 001, 101}. Also included in Fig. 8 is the decoder EXIT curve of the rate-2/3 turbo code. We see that the detector EXIT curve with Gray labeling is almost flat and can match the close-to-horizontal EXIT curve of the turbo decoder. However, the detector EXIT curve with mixed labeling is not flat and can not match well with the decoder EXIT curve. Hence, this STBC system with Gray labeling can achieve a lower threshold as compared to the same STBC system using mixed labeling. On the other hand, we see that using mixed labeling can achieve a lower error floor as compared to using Gray labeling from the BER results shown in Fig. 9, where NO = 3 and NO = 5 are respectively used in the cases of Gray labeling and mixed labeling for near-convergence performance. The reasoning is similar to the argument for comparing the BICM using Gray labelling and the BICM using the mixed labelling over the Rayleigh fading channel [40] [41] [17]. In the following, we propose a turbo coded Alamouti STBC system with mixed labelling which can achieve not only a lower threshold but also a lower error floor as compared to the original turbo coded Alamouti STBC system with Gray labelling. B. Proposed turbo coded Alamouti STBC system The mixed labelling for 8PSK has the characteristic that it can be partitioned into two Gray-mapped QPSK constellations indexed by the first labelling bit. In other words, the four signal points of the 8PSK constellation with the same first labelling bit can be regarded as a Gray-mapped QPSK constellation. 
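The partition property of the mixed labelling can be verified directly from the label sequence {000, 100, 010, 110, 011, 111, 001, 101} given above. The sketch below is a small check, not part of the paper: the helper names are assumptions, and "Gray" is taken to mean that neighbouring points on the circle differ in exactly one bit.

```python
# Mixed labelling for 8PSK, in the sequential order of the signal points.
MIXED = ["000", "100", "010", "110", "011", "111", "001", "101"]


def hamming(a, b):
    """Number of positions in which two equal-length bit strings differ."""
    return sum(x != y for x, y in zip(a, b))


def is_gray_ring(labels):
    """True if every pair of circularly adjacent labels differs in one bit."""
    n = len(labels)
    return all(hamming(labels[i], labels[(i + 1) % n]) == 1 for i in range(n))


def qpsk_subsets(labels):
    """Group points by first labelling bit; keep the remaining two bits."""
    return {first: [lab[1:] for lab in labels if lab[0] == first]
            for first in "01"}


subsets = qpsk_subsets(MIXED)
print(is_gray_ring(MIXED))         # the full 8PSK labelling is not Gray
print(is_gray_ring(subsets["0"]))  # QPSK subset with first bit 0
print(is_gray_ring(subsets["1"]))  # QPSK subset with first bit 1
```

Each subset consists of every other 8PSK point, so it forms a QPSK constellation, and its two remaining bits follow the order 00, 10, 11, 01 around the circle.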
Based on this characteristic, the implementation of the proposed system is the same as the original system in Fig. 7 using mixed labelling except that we delay v̄_{1,ℓ} by w/2 bits before it is fed into the 8PSK signal mapper. At the ℓ-th block unit, bits in v̄_{1,ℓ−1}, v̄_{2,ℓ}, and v̄_{3,ℓ} are used as the first-level, second-level, and third-level labelling bits of the 8PSK signals, respectively.

Fig. 9. BER of the proposed turbo coded Alamouti STBC system using restricted IDSC and IDAC-I (N = 10^5) and an original turbo coded Alamouti STBC system in [29] (N = 3 × 10^5). Nt = 2, Nr = 1.

For our system, the design of the interleaver and demultiplexer may somewhat affect the error performance. In this paper, we consider the design such that all the bits in v̄_{1,ℓ} and v̄_{2,ℓ} are message bits and all the bits in v̄_{3,ℓ} are parity bits of the turbo code. With the introduction of the delay elements, the minimum symbol-wise Hamming distance of our system is likely to be greater than that of the original system using mixed labelling. Note that the symbol-wise Hamming distance and squared product distance determine the diversity order and coding gain, respectively, for fixed values of Nt and Nr [41]. The symbol-wise Hamming distance plays a more important role than the squared product distance in the performance in the error-floor region. Hence, our system may achieve a lower error floor than the original system using either mixed or Gray labelling.

Using an approach similar to that for the proposed BLAST system, we can obtain the detector EXIT curves of our turbo coded Alamouti STBC system based on the conditions of I_a(v̄_{1,ℓ−1}) = 1 and I_a(v̄_{2,ℓ+1}) = I_a(v̄_{3,ℓ+1}) = x, 0 ≤ x ≤ 1. The resultant detector EXIT curves are also shown in Fig. 8. We find that the detector EXIT curves are close to horizontal and hence can match the decoder EXIT curve well.
In addition, we find that the close-to-horizontal detector EXIT curve for the proposed STBC system lies above the close-to-horizontal detector EXIT curve for the original STBC system in Fig. 7 using Gray labelling, even for the worst case of I_a(v̄_{2,ℓ+1}) = I_a(v̄_{3,ℓ+1}) = 0. This implies that there is room for performance improvement, which can be obtained by improving the decoding of our STBC system with the information passed from the adjacent turbo codeword. This kind of performance improvement cannot be obtained by simply extending the code length of the original STBC system. Hence, the proposed STBC system may achieve a better threshold than the original STBC system if the iterative decoding for the proposed system is properly designed.

Like the proposed BLAST system, the proposed turbo coded Alamouti STBC system can be decoded with IDSC, IDAC-I, or IDAC-II, owing to the introduction of the delay elements. For simplicity of presentation, only the results of using restricted IDSC and restricted IDAC-I are presented here. Based on the assumption of I_a(v̄_{1,ℓ−1}) = 1, we can obtain the associated thresholds of our proposed STBC system in a way similar to that for the proposed BLAST system. The thresholds of our STBC system using restricted IDSC and restricted IDAC-I are 4.59 dB and 4.45 dB, respectively, which are lower than the thresholds of the original STBC system in Fig. 7 (4.76 dB and 4.80 dB for Gray labeling and mixed labeling, respectively). We see that using restricted IDAC-I, our STBC system can achieve BER = 1.67 × 10^-8 at Eb/N0 of 4.66 dB, which is lower than the thresholds of the original STBC system using Gray labeling and mixed labeling. This verifies the claim "This implies ... extending the code length of the original STBC system." made in the preceding paragraph. From the BER results shown in Fig. 9, we see that the advantage of introducing block delay is obvious.
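The block delay used in both proposed systems can be pictured as a per-stream shift measured in block units. The sketch below is a simplified illustration under stated assumptions: the function name `delayed_schedule`, the contiguous split of each codeword into equal sub-blocks, and the use of `None` for slots that have not yet been filled are all hypothetical, and the interleavers and signal mappers of the paper are omitted. With delays (1, 0, 0) it mirrors the STBC arrangement in which block unit ℓ carries sub-block 1 of codeword ℓ − 1 together with sub-blocks 2 and 3 of codeword ℓ; with delays (3, 2, 1, 0) it mirrors the Type-II BLAST delays of (4 − j)N/4 bits for stream j.

```python
def delayed_schedule(codewords, delays):
    """Arrange sub-blocks of successive codewords into block units.

    codewords: list of equal-length bit lists, one per turbo codeword.
    delays:    delay of each stream in block units (one entry per stream).
    Returns a list of block units; unit t holds one sub-block per stream,
    or None where no codeword has reached that stream yet (or any more).
    """
    n_streams = len(delays)
    n = len(codewords[0])
    if n % n_streams != 0:
        raise ValueError("codeword length must be divisible by stream count")
    q = n // n_streams  # bits per stream within one block unit
    n_units = len(codewords) + max(delays)
    units = [[None] * n_streams for _ in range(n_units)]
    for c, cw in enumerate(codewords):
        for j, d in enumerate(delays):
            # Sub-block j of codeword c is sent d block units later.
            units[c + d][j] = cw[j * q:(j + 1) * q]
    return units


# Toy example: codeword c is filled with the value c so its origin is visible.
stbc_units = delayed_schedule([[c] * 6 for c in range(3)], delays=(1, 0, 0))
type2_units = delayed_schedule([[c] * 8 for c in range(4)], delays=(3, 2, 1, 0))
print(stbc_units[1])   # sub-block 1 from codeword 0, sub-blocks 2-3 from codeword 1
print(type2_units[3])  # one sub-block from each of codewords 0, 1, 2, 3
```

Each block unit therefore mixes sub-blocks from adjacent codewords, which is what lets the decoder of one codeword supply a priori information to the detector for its neighbours.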
V. CONCLUSIONS

We introduce a convolutional interleaver with block delay into the turbo coded BLAST system and the turbo coded Alamouti STBC system. The EXIT analysis shows that the block delay flattens the detector EXIT curves, which can then match the EXIT curve of the turbo decoder well. We devise several decoding algorithms that offer different trade-offs between complexity and error performance. Using turbo coded BLAST with a proper delay design, we can obtain thresholds very close to the capacity for Nt = 2, Nr = 1 and for Nt = 2, Nr = 2. For Nt = 4, Nr = 1, we see that turbo coded BLAST with various delay designs lowers the thresholds by various degrees as compared to not using the delay design. Using turbo coded Alamouti STBC with a proper delay design, we can obtain thresholds and error floors better than those of the original turbo coded Alamouti STBC without the delay design for Nt = 2, Nr = 1.

ACKNOWLEDGMENT

The authors are very grateful to the reviewers, whose valuable comments and suggestions significantly enhanced the quality of this paper.

REFERENCES

[1] G. J. Foschini, “Layered space-time architecture for wireless communications in a fading environment when using multi-element antennas,” Bell Labs Tech. Journal, pp. 41-59, Autumn 1996.
[2] M. Sellathurai and S. Haykin, “TURBO-BLAST for high speed wireless communications,” in Proc. IEEE Wireless Comm. and Network Conf., 2000, WCNC 2000, Sept. 2000, Chicago.
[3] A. van Zelst, R. van Nee, and G.A. Awater, “Turbo-BLAST and its performance,” Proc. Vehicular Tech. Conf., vol. 2, May 2001.
[4] A. Stefanov and T. M. Duman, “Turbo coded modulation for systems with transmit and receive antenna diversity over block fading channels: system models, decoding approaches, and practical considerations,” IEEE J. Select. Areas Commun., vol. 19, pp. 958-968, May 2001.
[5] S. ten Brink and B. M. Hochwald, “Detection thresholds of iterative MIMO processing,” in Proc. IEEE Int. Symp. Inf.
Theory, Lausanne, Switzerland, June 2002, p. 22.
[6] B. M. Hochwald and S. ten Brink, “Achieving near-capacity on a multiple-antenna channel,” IEEE Trans. Commun., vol. 51, no. 3, pp. 389–399, March 2003.
[7] G. Yue and X. Wang, “Optimization of irregular repeat-accumulate codes for MIMO systems with iterative receivers,” IEEE Trans. Wireless Commun., vol. 4, no. 6, pp. 2843–2855, Nov. 2005.
[8] S. ten Brink, G. Kramer, and A. Ashikhmin, “Design of low-density parity-check codes for modulation and detection,” IEEE Trans. Commun., vol. 52, no. 4, pp. 670–678, April 2004.
[9] S. ten Brink and G. Kramer, “Design of repeat-accumulate codes for iterative detection and decoding,” IEEE Trans. Signal Processing, vol. 51, no. 11, pp. 2764–2772, Nov. 2003.
[10] S. M. Alamouti, “A simple transmitter diversity scheme for wireless communications,” IEEE J. Select. Areas Commun., vol. 16, pp. 1451-1458, Oct. 1998.
[11] V. Tarokh, H. Jafarkhani, and A. R. Calderbank, “Space-time block codes from orthogonal designs,” IEEE Trans. Inform. Theory, vol. 49, pp. 1456-1467, July 1999.
[12] V. Tarokh, N. Seshadri, and A. R. Calderbank, “Space-time codes for high data rate wireless communication: performance criterion and code construction,” IEEE Trans. Inform. Theory, vol. 44, pp. 744-765, March 1998.
[13] G. Ungerboeck, “Channel coding with multilevel/phase signals,” IEEE Trans. Inform. Theory, vol. 28, pp. 55–66, Jan. 1982.
[14] E. Zehavi, “8-PSK trellis codes for a Rayleigh channel,” IEEE Trans. Inform. Theory, vol. 40, pp. 873-884, May 1992.
[15] G. Caire, G. Taricco, and E. Biglieri, “Bit-interleaved coded modulation,” IEEE Trans. Inform. Theory, vol. 44, pp. 927–946, May 1998.
[16] X. Li and A. Ritcey, “Bit-interleaved coded modulation with iterative decoding,” IEEE Commun. Lett., vol. 1, no. 6, pp. 77–79, Nov. 1997.
[17] X. Li and A. Ritcey, “Turbo trellis coded modulation with bit interleaving and iterative decoding,” IEEE J. Select. Areas Commun., vol. 17, no. 4, pp.
715–724, April 1999.
[18] Y. L. Ueng, C. J. Yeh, and M. C. Lin, “On trellis codes with delay processor and signal mapper,” IEEE Trans. Commun., vol. 50, pp. 1906-1917, Dec. 2002.
[19] S. Le Goff, A. Glavieux, and C. Berrou, “Turbo-codes and high spectral efficient modulation,” in Proc. ICC’94, pp. 1064-1070, 1994.
[20] P. Robertson and T. Wörz, “Bandwidth-efficient turbo trellis coded modulation using punctured component codes,” IEEE J. Select. Areas Commun., vol. 16, pp. 206–218, Feb. 1998.
[21] S. Benedetto, D. Divsalar, G. Montorsi, and F. Pollara, “Parallel concatenated trellis coded modulation,” in IEEE Conf. on Commun., pp. 974–978, 1996.
[22] C. Fragouli and R. D. Wesel, “Turbo-encoder design for symbol-interleaved parallel concatenated trellis coded modulation,” IEEE Trans. Commun., vol. 49, no. 3, pp. 425–435, March 2001.
[23] Li Ping, B. Bai, and X. Wang, “Low-complexity concatenated two-state TCM schemes with near-capacity performance,” IEEE Trans. Inform. Theory, vol. 49, no. 12, pp. 3225–3234, Dec. 2003.
[24] D. Sridhara and T. E. Fuja, “LDPC codes over rings for PSK modulation,” IEEE Trans. Inform. Theory, vol. 51, no. 9, pp. 3209–3220, Sept. 2005.
[25] S.M. Alamouti, V. Tarokh, and P. Poon, “Trellis coded modulation and transmit diversity: design criteria and performance evaluation,” in Proc. IEEE ICUPC’98, pp. 703-707, Oct. 1998.
[26] Y. Gong and K. B. Letaief, “Concatenated space-time block coding with trellis coded modulation in fading channels,” IEEE Trans. Wireless Commun., vol. 1, no. 4, pp. 580-590, Dec. 2002.
[27] Z. Hong and B. Hughes, “Bit-interleaved space-time coded modulation with iterative decoding,” IEEE Trans. Wireless Commun., vol. 3, no. 6, pp. 1912-1917, Nov. 2004.
[28] Y. L. Ueng, Y. L. Wu, and R. Y. Wei, “Concatenated space-time block coding with trellis coded modulation using a delay processor,” IEEE Trans. Wireless Commun., vol. 6, no. 12, pp. 4452-4463, Dec. 2007.
[29] G.
Bauch, “Concatenation of space-time block codes and “turbo”-TCM,” in Proc. IEEE ICC’99, pp. 1202-1206, 1-10 June, 1999.
[30] T.H. Liew, J. Pliquett, B.L. Yeap, L-L. Yang, and L. Hanzo, “Comparative study of space time block codes and various concatenated turbo coding schemes,” in Proc. PIMRC, vol. 1, pp. 741-745, 2000.
[31] J. Hou, P. H. Siegel, and L. B. Milstein, “Design of multi-input multi-output systems based on low-density parity-check codes,” IEEE Trans. Commun., vol. 53, no. 4, pp. 601–611, April 2005.
[32] S. ten Brink, “Convergence behavior of iterative decoded parallel concatenated codes,” IEEE Trans. Commun., vol. 49, no. 10, pp. 1727-1737, Oct. 2001.
[33] A. Wittneben, “Base station modulation diversity for digital SIMULCAST,” in Proc. Vehicular Technology Conf., vol. 1, pp. 848-853, May 1991.
[34] J. H. Winters, “The diversity gain of transmit diversity in wireless systems with Rayleigh fading,” IEEE Trans. Veh. Technol., vol. 47, pp. 119-123, Feb. 1998.
[35] A. Ashikhmin, G. Kramer, and S. ten Brink, “Extrinsic information transfer functions: model and erasure channel properties,” IEEE Trans. Inform. Theory, vol. 50, no. 11, pp. 2657–2673, Nov. 2004.
[36] S. ten Brink, “Space-time turbo coding,” a Chapter in Space-Time Wireless Systems, H. Bölcskei, D. Gesbert, C. B. Papadias, and A.-J. van der Veen, pp. 322–341, Cambridge Univ. Press, Nov. 2006.
[37] J. L. Ramsey, “Realization of optimum interleavers,” IEEE Trans. Inform. Theory, vol. IT-16, pp. 338-345, 1970.
[38] T.J. Richardson, M.A. Shokrollahi, and R.L. Urbanke, “Design of capacity-approaching irregular low-density parity-check codes,” IEEE Trans. Inform. Theory, vol. 47, no. 2, pp. 619–637, Feb. 2001.
[39] H. Jin, A. Khandekar, and R. J. McEliece, “Irregular repeat-accumulate codes,” in Proc. Int. Symp. Turbo Codes and Related Topics, Brest, France, Sept. 2000, pp. 1–8.
[40] D. Divsalar and M.K.
Simon, “The design of trellis coded MPSK for fading channels: performance criteria,” IEEE Trans. Commun., vol. 36, pp. 1004-1012, Sept. 1988. [41] X.N. Zeng and A. Ghrayeb, “Performance bounds for combined channel coding and space-time block coding with receive antenna selection,” IEEE Trans. Veh. Tech., vol. 55, no. 4, pp. 1441–1446, 2006. [42] J. Suh and M.M.K. Howlader, ”Design schemes of space-time block codes concatenated with turbo codes,” in Proc. IEEE 55th Vehicular Tech. Conf., pp. 1030-1034, vol. 2, Spring. 2002. [43] A. Sezgin, D. Wubben, R. Böhnke and V. Kühn, ”On EXIT-charts for space-time block codes,” in Proc. IEEE Inter. Symp. on Inf. Theory 2003, Yokohama, Japan, June 29-July 4, 2003. [44] T.A. Narayanan and B.S. Rajan, ”A general construction of space-time trellis codes for PSK signal sets,” in Proc. IEEE Global Telecommun. Conf., San Francisco, USA, pp. 1978-1983, vol. 4, Dec. 2003. [45] M. Tao and R. S. Cheng, ”Diagonal block space-time code design for diversity and coding advantage over flat fading channels,” in IEEE Trans. Signal Processing, vol. 52, no. 4, pp.1012-1020, April 2004. Yeong-Luh Ueng received the B.S. degree in electrical engineering from National Taiwan University, Taipei, Taiwan, R.O.C., in 1997. At the same university, he received the M.S. and Ph.D. degrees in communication engineering from the Graduate Institute of Communication Engineering in 1999 and 2001, respectively. From 2001 to 2005, he was with a private company in Taiwan and focused on the design and development of various wireless chips including RF transceiver chips and Bluetooth and PHS baseband chips. Since December 2005, he has joined the faculty of National Tsing-Hua University, Hsinchu, Taiwan, where he is currently an Assistant Professor in the Department of Electrical Engineering and the Institute of Communications Engineering. His research interests include coding theory, wireless communications, and communication IC. 
He was elected an honorary member of the Phi Tau Phi Scholastic Honor Society. He is a member of IEEE. Chia-Jung Yeh received the B.S. degree in electrical engineering from National Taiwan University, Taipei, Taiwan, R.O.C., in 1990. At the same university, he received the M.S. and Ph.D. degrees in communication engineering from the Graduate Institute of Communication Engineering in 2000 and 2009, respectively. He is the chief of the Information Management Office in National Education Radio, Taipei, Taiwan, Republic of China. His research interests include coding and communications theory. Mao-Chao Lin was born in Taipei, Taiwan, Republic of China, on December 24, 1954. He received the Bachelor and Master degree, both in electrical engineering, from National Taiwan University in 1977 and 1979, respectively. He also received the Ph.D. degree in electrical engineering from University of Hawaii in 1986. From 1979 to 1982, he was an assistant scientist of Chung-Shan Institute of Science and Technology at Lung-Tan, Taiwan. He is currently a Professor at the Department of Electrical Engineering, National Taiwan University. His research interests are in the area of coding theory and its applications. Chung-Li Wang received his B.S. in the Department of Electrical Engineering from National Tsing Hua University, Hsinchu, Taiwan, in 2004, and his M.S. in the Graduate Institute of Communication Engineering from National Taiwan University, Taipei, Taiwan in 2006. From 2006 to 2008, he was a Ph.D. student in the Department of Electrical Engineering, University of Hawaii, Manoa, Honolulu, and is now in the Department of Electrical and Computer Engineering, University of California, Davis, U.S.A. His research includes modern coding theory and wireless communications.