Ahmad Zuri Sha'ameri 2007 Configurable Adaptive Viterbi Decoder

Proceedings of the 2007 IEEE International Conference on Telecommunications and
Malaysia International Conference on Communications, 14-17 May 2007, Penang, Malaysia
Configurable Adaptive Viterbi Decoder for GPRS, EDGE and Wimax
Mohamed Farid Noor Batcha
Mimos Berhad
Malaysia
Ahmad Zuri Sha’ameri
Digital Signal Processing Lab
University Technology Malaysia
Malaysia
1-4244-1094-0/07/$25.00 ©2007 IEEE.
Abstract – Error correction codes are widely used in wireless communication systems to reduce data corruption. The most widely used decoding algorithm is the Viterbi algorithm, which is applied with different parameters to meet the requirements of different standards. This paper analyses the different Viterbi decoders and implements a reconfigurable adaptive Viterbi decoder for GPRS, EDGE and Wimax technologies. The high-performance, generic, soft-input hard-output Viterbi decoder is prototyped on an FPGA.
1.0 INTRODUCTION
Channel coding is required in digital communications over noisy channels to improve bit-error-rate performance and throughput. Under multipath fading conditions, the coding scheme must be strong enough to cater for random as well as bursty errors.
Various coding schemes are used in the wireless packet data networks of GPRS, EDGE and Wimax to maximize channel capacity. GPRS uses a constraint length 5, rate 1/2 Viterbi decoder. EDGE uses a constraint length 7, rate 1/3 Viterbi decoder with both tail-biting and zero-tail decoding [1][2]. The tail-biting Viterbi decoder is used on the header portion of the EDGE radio block, while the zero-tail Viterbi decoder is used on the data portion. Wimax (IEEE 802.16e) currently specifies a constraint length 7, rate 1/2 convolutional code, with tail biting as mandatory and zero tail as optional [3].
The complexity of the Viterbi decoder grows exponentially with the constraint length. For EDGE systems, it is expensive in terms of area to have two separate engines for the GPRS and EDGE Viterbi decoders, since EDGE still requires backward compatibility with GPRS. Several generic Viterbi decoders have been proposed. However, they require the configuration to be changed to generate a decoder of the required constraint length and cannot adapt to changes on the fly [4].
In wireless communication, one of the main challenges is to maximize channel capacity. Capacity is bounded by regulations that limit signal power and bandwidth. Within these limits, channel coding allows signals to be transmitted at lower power while still being received error free. Bandwidth is wasted if a system uses a rate 1/3 encoder on a good channel when a rate 1/2 code would be sufficient. This paper evaluates a joint architecture of constraint length 5 and 7 Viterbi decoders, each with rates 1/2 and 1/3. Their performance is monitored under several channel models to show the advantages of each decoder under different channel conditions. An efficient implementation of the joint Viterbi architecture was then prototyped on an FPGA and verified at system level.
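The bandwidth argument above can be quantified roughly. The packet size used here is an arbitrary example, not a figure from this paper. A zero-tail convolutional code of rate $1/n$ and constraint length $L$ turns a $B$-bit packet into $n(B + L - 1)$ coded bits. For $B = 100$ and $L = 7$:

$$
\begin{aligned}
N_{1/2} &= 2\,(B + L - 1) = 2\,(100 + 7 - 1) = 212 \\
N_{1/3} &= 3\,(B + L - 1) = 3\,(100 + 7 - 1) = 318 \\
N_{1/3} / N_{1/2} &= 318 / 212 = 1.5
\end{aligned}
$$

The rate 1/3 code therefore occupies 1.5 times the channel bits of the rate 1/2 code. That extra capacity is spent without benefit when the channel is good enough for rate 1/2.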
2.0 CHANNEL MODEL
Three main categories of channel model are
characterized:
i) AWGN only
ii) Fast Fading
iii) Slow Fading
The block diagram for the channel model is
shown in Figure 1.
The channel comprises two main components: AWGN, $G(t)$, and fading, $f(t)$. The transmitted symbols $r(t)$ are multiplied by the fading samples $f(t)$ and added to the Gaussian noise $G(t)$ to form the received symbols $R(t)$, as described in (1).

$$R(t) = f(t)\,r(t) + G(t) \tag{1}$$

The fading model assumed is flat fading with a Rayleigh distribution. A non-frequency-selective model is chosen because the work is strictly on channel coding, and in practice an equalizer is included as part of the overall digital communication system. Fast fading is assumed such that the coherence time is shorter than a symbol period, so the fading is independent between symbols. For slow fading, deep fades are assumed to last roughly twenty symbols, which avoids the need for enhanced interleaving methods. Simulations under slow fading use the standard block interleaver. The system is also assumed to be perfectly synchronized.
Authorized licensed use limited to: UNIVERSITY TEKNOLOGI MALAYSIA. Downloaded on January 5, 2009 at 20:42 from IEEE Xplore. Restrictions apply.
3.0 CHANNEL CODING METHODS
The channel coding methods discussed are the convolutional code and its decoding algorithm, the Viterbi decoder. The block code used for burst error correction is the Fire code.
i) Convolutional Encoding
ii) Viterbi Decoder
3.1 Convolutional Encoder
A convolutional encoder introduces memory into the information bit stream. In convolutional codes, each block of $k$ input bits is mapped onto a block of $n$ coded bits. This gives the code rate $R$ of a convolutional code as

$$R = \frac{k}{n} \tag{2}$$

The $n$ output bits are determined not only by the present $k$ information bits but also by previous bits. These are held in a memory structure (a shift register) and combined according to specified generator polynomials.
Convolutional codes are differentiated by their constraint length $L$ and code rate $R$. The constraint length determines the complexity of the code, and this complexity is realized in the decoder structure. Constraint lengths exceeding $L = 9$ are too complex and are not realized using the Viterbi decoder. The constraint length sets the number of states in the decoder:

$$\text{number of states} = 2^{L-1} \tag{3}$$

For example, $L = 5$ gives 16 states and $L = 7$ gives 64 states.
The code rate given by equation (2) can be changed by puncturing. Selected bits are not transmitted, and on the receiver side the punctured locations are filled with an unbiased value [5].
3.2 Viterbi Decoder
The Viterbi decoder is the common method used to decode convolutional codes. It uses the maximum likelihood concept to estimate the most likely transmitted sequence:

$$P\left(Z \mid U^{(m')}\right) = \max_{\text{all } U^{(m)}} P\left(Z \mid U^{(m)}\right) \tag{4}$$

Here $Z$ is the received sequence and $U^{(m)}$ is one of the possible transmitted sequences. The decoder chooses the sequence $U^{(m')}$ with the maximum likelihood, i.e. the path closest to the received sequence. The algorithm builds a trellis of the most probable paths. After some depth, the paths are traced back to obtain the most likely transmitted sequence.
The Viterbi decoder can accept soft bits or hard bits. Soft decision gives the decoder more than two decision levels. Hard decision provides only two levels, {0,1}, and performs around 2 dB worse than soft decision. The main computational blocks of the Viterbi decoder are the Branch Metric computation and the Add Compare Select operation. The branch metric calculation is based on either soft bits or hard bits.
The traceback depth depends mainly on the memory management of the algorithm. The longer the traceback depth, the larger the trellis grows and the larger the memory requirement. If the traceback depth is made too short, the performance of the code is affected drastically. A traceback depth of about $5L$ (five times the constraint length) is generally sufficient for unpunctured codes, as described in [5].
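The encoder, puncturing and channel model described in Sections 2.0 and 3.1 can be sketched in Python as a generator of 4-bit soft-decision test vectors. This is a minimal sketch resting on several assumptions that are not taken from the paper:

- The generator polynomials are supplied by the caller, because the paper does not list them.
- The newest input bit occupies the most significant register bit.
- BPSK maps bit 0 to +1 and bit 1 to -1.
- The Rayleigh amplitude is normalized so that $E[f^2] = 1$.
- Slow fading holds one gain for `block=20` symbols, matching the fade length assumed in Section 2.0.
- The block interleaver is omitted.

All function names are illustrative.

```python
import math
import random


def parity(v):
    """Modulo-2 sum of the bits of v (one encoder output bit)."""
    return bin(v).count("1") & 1


def conv_encode(bits, k, gens):
    """Zero-tail convolutional encoder with constraint length k and rate 1/len(gens)."""
    state = 0                                  # the k-1 previous input bits
    out = []
    for b in list(bits) + [0] * (k - 1):       # k-1 tail zeros return the encoder to state 0
        reg = (b << (k - 1)) | state           # assumed: newest bit in the MSB position
        out.extend(parity(reg & g) for g in gens)
        state = reg >> 1                       # shift the register by one bit
    return out


def puncture(coded, pattern):
    """Drop the coded bits at positions where the repeating pattern holds 0."""
    return [c for i, c in enumerate(coded) if pattern[i % len(pattern)]]


def depuncture(soft, pattern, n_coded, erasure=7.5):
    """Refill punctured positions with 7.5, equidistant from the soft values 0 and 15."""
    kept = iter(soft)
    return [next(kept) if pattern[i % len(pattern)] else erasure for i in range(n_coded)]


def fading_gains(n, mode, block=20, rng=random):
    """Flat Rayleigh amplitudes with E[f^2] = 1; 'awgn' keeps f = 1 throughout."""
    if mode not in ("awgn", "fast", "slow"):
        raise ValueError("mode must be 'awgn', 'fast' or 'slow'")
    if mode == "awgn":
        return [1.0] * n
    span = 1 if mode == "fast" else block      # slow fading: one gain held for `block` symbols
    gains = []
    while len(gains) < n:
        f = math.hypot(rng.gauss(0, math.sqrt(0.5)), rng.gauss(0, math.sqrt(0.5)))
        gains.extend([f] * span)
    return gains[:n]


def channel(coded, ebn0_db, rate, mode, rng=random):
    """Eq. (1), R = f*r + G, for BPSK r = 1 - 2*bit, then 4-bit soft quantization."""
    sigma = math.sqrt(1 / (2 * rate * 10 ** (ebn0_db / 10)))
    soft = []
    for bit, f in zip(coded, fading_gains(len(coded), mode, rng=rng)):
        received = f * (1 - 2 * bit) + rng.gauss(0, sigma)
        q = round((1 - received) / 2 * 15)     # 0 = confident '0', 15 = confident '1'
        soft.append(min(15, max(0, q)))
    return soft
```

The quantizer uses the received value directly. A deeply faded symbol therefore lands near the middle of the 0 to 15 range and contributes little to either branch metric. The erasure value 7.5 in `depuncture` plays the same role for punctured bits.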
4.0 IMPLEMENTATION OF JOINT VITERBI ARCHITECTURE
The Viterbi decoder consists of five main blocks, as depicted in Figure 2: the branch metric computation (BMC) unit, the ACS unit, the survivor memory, the traceback unit and the LIFO.
Figure 2: Viterbi decoder blocks
The BMC unit generates the branch metrics used by the ACS (Add Compare Select) unit to select the most probable path. The BMC accepts either rate 1/2 or rate 1/3 encoded data, represented by fixed 4-bit soft values. Using 4 soft bits gives performance close to the ideal case. The branch metric is computed using the Manhattan distance in order to reduce design complexity. Let $X_0$ and $X_1$ be the received 4-bit symbols for code rate 1/2, with 0 representing a confident '0' and 15 a confident '1'. The branch metrics are then computed as:

$$
\begin{aligned}
BM_{00} &= X_0 + X_1 \\
BM_{01} &= X_0 + (15 - X_1) \\
BM_{10} &= (15 - X_0) + X_1 \\
BM_{11} &= (15 - X_0) + (15 - X_1)
\end{aligned}
\tag{5}
$$

For code rate 1/3, the eight branch metrics are formed in the same way using a third symbol $X_2$.
The branch metric pipeline was implemented using a straightforward mechanism that builds all possible branch labels from 0 to 7, as shown in Table 1.
Time 1   Time 2   Time 3
0        00       000
0        00       001
0        01       010
0        01       011
1        10       100
1        10       101
1        11       110
1        11       111
Table 1: Branch Metric Pipelining
At time instance 2, the branch metric for code rate 1/2 is ready to be passed to the ACS unit. At time instance 3, the branch metric for code rate 1/3 is ready. The hardware structure of the branch metric unit is shown in Figure 3.
Figure 3: Branch Metric Unit
The ACS unit is the most critical part of the design, because it determines the throughput, latency, area and power efficiency of the decoder. The ACS unit requires $2^{K-1}$ ACS nodes, where $K$ is the constraint length of the code. The basic structure of an ACS unit is depicted in Figure 4.
Figure 4: An ACS unit
For a constraint length 7 Viterbi decoder, the ACS unit therefore consists of $2^6 = 64$ ACS nodes. Several papers have suggested optimized implementations of the ACS unit that use RAM modules to handle the feedback of the updated path metrics. Such implementations introduce delay in the decoding process and may affect system timing.
Because of the feedback of the path metrics, the register size grows as the trellis is built over time during decoding. To avoid overflow, various normalization schemes have been suggested [10]. In this paper, the application requires only small packet sizes, so normalization was not required provided the path metric was 14 bits wide. Normalization was compared with simply increasing the path metric width. Because of the small packet length, increasing the bit width was more beneficial, as normalization requires additional comparator logic. If the packet size were to increase, normalization would at some point become more area efficient.
The surviving bit, which is the output of the ACS unit, is collected in the survivor memory for all $N$ states, where $N = 2^{K-1}$. Full traceback length was implemented to achieve high performance, and the maximum memory size required was 64 × 612 bits (39,168 bits). The design assumes that this memory would be shared with other baseband processing modules such as the equalizer. On that assumption, the Viterbi decoder would not contribute to overall memory growth in the design.
The traceback unit starts once the memory module has finished its write operations and is ready to be read. It is implemented as a finite state machine (FSM). The traceback unit traces the most probable path and outputs the decoded data in reverse order. The final unit, the LIFO, flips the data back into the correct order.
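The BMC, ACS, survivor memory, traceback and LIFO stages described above can be modeled in software. This Python sketch is a behavioral model, not the FPGA design, and it makes these assumptions:

- Only the zero-tail case is handled: traceback always starts from state 0, and tail-biting is not covered.
- The survivor memory stores one decision bit per state per trellis step.
- Path metrics are unbounded Python integers rather than 14-bit registers.
- Generator polynomials are supplied by the caller.

The helper names are hypothetical. The small encoder is repeated only so that the block is self-contained.

```python
def parity(v):
    """Modulo-2 sum of the bits of v."""
    return bin(v).count("1") & 1


def encode(bits, k, gens):
    """Zero-tail encoder; the newest bit sits in the MSB of the k-bit register."""
    state, out = 0, []
    for b in list(bits) + [0] * (k - 1):
        reg = (b << (k - 1)) | state
        out.extend(parity(reg & g) for g in gens)
        state = reg >> 1
    return out


def branch_metric(x, reg, gens):
    """BMC: Manhattan distance between 4-bit soft symbols and a branch label, as in Eq. (5)."""
    return sum(xi if parity(reg & g) == 0 else 15 - xi for g, xi in zip(gens, x))


def viterbi_decode(soft, k, gens, n_info):
    """Soft-input, hard-output Viterbi decoding with full-length traceback."""
    rate = len(gens)
    n_states = 1 << (k - 1)                     # 2^(K-1) ACS nodes
    mask = n_states - 1
    pm = [0] + [float("inf")] * (n_states - 1)  # zero tail: the encoder starts in state 0
    survivor_memory = []                        # one decision bit per state per step
    for t in range(0, len(soft), rate):
        x = soft[t:t + rate]
        new_pm, decisions = [], []
        for ns in range(n_states):              # one ACS node per next state
            b = ns >> (k - 2)                   # input bit that leads into state ns
            cands = []
            for d in (0, 1):                    # d = oldest bit of the predecessor state
                s = ((ns << 1) & mask) | d
                cands.append(pm[s] + branch_metric(x, (b << (k - 1)) | s, gens))  # add
            d = 0 if cands[0] <= cands[1] else 1                                     # compare
            new_pm.append(cands[d])                                                  # select
            decisions.append(d)
        pm = new_pm
        survivor_memory.append(decisions)
    # Traceback: start from state 0 and emit decoded bits in reverse order.
    state, reversed_bits = 0, []
    for decisions in reversed(survivor_memory):
        reversed_bits.append(state >> (k - 2))
        state = ((state << 1) & mask) | decisions[state]
    # LIFO: flip back into transmission order and drop the k-1 tail bits.
    return reversed_bits[::-1][:n_info]
```

A usage example:

1. Encode 20 bits with `encode(info, 7, (0o171, 0o133))`.
2. Map each coded bit to 0 or 15.
3. Call `viterbi_decode(soft, 7, (0o171, 0o133), 20)`; it returns the original bits.

The octal generators here are only an illustrative choice. Each iteration over `ns` corresponds to one of the $2^{K-1}$ ACS nodes that the hardware evaluates in parallel.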
Finally, switching between the different Viterbi decoder configurations is controlled by a 2-bit control input, Config_K_r. The control input is configured as shown in Table 2.
Figure 5: Fast fading simulation results
Config_K_r[1:0]   Constraint Length   Rate
00                5                   1/2
01                5                   1/3
10                7                   1/2
11                7                   1/3
Table 2: Configuration of Viterbi Decoder
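The mapping in Table 2 treats bit 1 of Config_K_r as the constraint-length select and bit 0 as the rate select. The Python sketch below decodes that input. It also adds a hypothetical rate-switching rule of the kind suggested in the conclusions: rate 1/2 on a good channel, rate 1/3 when the channel degrades. The SNR thresholds and the hysteresis gap between them are made-up illustrative values, not results from this paper, and all names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ViterbiConfig:
    constraint_length: int   # K = 5 or 7
    rate_denominator: int    # 2 for rate 1/2, 3 for rate 1/3


def decode_config_k_r(config_k_r):
    """Map the 2-bit Config_K_r input to (K, rate) following Table 2."""
    if not 0 <= config_k_r <= 0b11:
        raise ValueError("Config_K_r is a 2-bit value")
    k = 7 if config_k_r & 0b10 else 5        # bit 1 selects K = 5 or K = 7
    n = 3 if config_k_r & 0b01 else 2        # bit 0 selects rate 1/2 or 1/3
    return ViterbiConfig(k, n)


def encode_config_k_r(cfg):
    """Inverse mapping from a ViterbiConfig back to the Config_K_r value."""
    if cfg.constraint_length not in (5, 7) or cfg.rate_denominator not in (2, 3):
        raise ValueError("unsupported configuration")
    return (0b10 if cfg.constraint_length == 7 else 0) | (0b01 if cfg.rate_denominator == 3 else 0)


def adapt_rate(cfg, snr_db, degrade_below_db=6.0, recover_above_db=9.0):
    """Hypothetical rate switch with hysteresis; the thresholds are illustrative only."""
    if cfg.rate_denominator == 2 and snr_db < degrade_below_db:
        return ViterbiConfig(cfg.constraint_length, 3)   # channel degraded: use rate 1/3
    if cfg.rate_denominator == 3 and snr_db > recover_above_db:
        return ViterbiConfig(cfg.constraint_length, 2)   # channel recovered: back to rate 1/2
    return cfg
```

Keeping two separate thresholds prevents the configuration from toggling on every small SNR fluctuation near a single threshold. The constraint length is left unchanged because it is fixed by the standard in use (GPRS or EDGE).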
5.0 RESULTS
The performance of the reconfigurable Viterbi decoder was first evaluated in Matlab under the three channel conditions discussed above. The simulated performance of the configurable constraint lengths and rates is shown in Figures 4, 5 and 6 for the respective channel conditions. Under the slow fading channel, the constraint length 7, rate 1/3 decoder gives a gain of about 3 dB over the constraint length 5, rate 1/2 decoder. With Gaussian noise alone the gain is around 1 dB, and in the fast fading model it is around 2 dB.
Figure 6: Slow fading simulation results
The hardware verification was set up by first implementing the reconfigurable Viterbi decoder on an Altera APEXII 20K200E FPGA development board. Two boards were used, one acting as the encoder and the other as the decoder. Two PCs running HyperTerminal were connected to them using the UART protocol, as depicted in Figure 5.
Figure 5: Hardware setup
Figure 4: AWGN simulation results
Due to the slow communication speed of the
UART, buffers were implemented to allow the
handshake between the different clock domains.
The UART was set to run at 115200 bps, while the FPGA board ran on a 33 MHz clock.
The overall gain from implementing the shared hardware architecture of the different Viterbi decoders is given in terms of Logic Elements (LE) in Table 3.
Viterbi architecture      Logic Elements
K = 5, rate 1/2           1106
K = 7, rate 1/2           3962
K = 7, rate 1/3           4187
Reconfigurable Viterbi    4676
Table 3: Gain in terms of LE of shared architecture
When compared with separate K = 7, rate 1/3 and K = 5, rate 1/2 decoders, the gain in LE of the reconfigurable Viterbi decoder is around 7%. The reconfigurable architecture was further synthesized with Synopsys Design Analyzer and achieved a clock speed of 150 MHz, owing to the parallel structure of the ACS units.
6.0 CONCLUSIONS
Many proposed Viterbi decoder architectures trade speed against area [11], but for upcoming high data rate technologies such as Wimax, speed is the more critical issue. With an achievable speed of 150 MHz, the reconfigurable Viterbi decoder is able to satisfy the requirements of Wimax. For the GPRS and EDGE technologies, the reconfigurable Viterbi decoder gives an area advantage of 7% over implementing two independent Viterbi cores.
Systems that require channel coding adaptability to improve throughput would benefit from the reconfigurable Viterbi decoder. Such a system would transmit at rate 1/2 when the channel is good. When the channel degrades, it would switch to a different configuration mode and transmit at rate 1/3.
7.0 REFERENCES
[1] GSM 05.03: "Channel coding", Version 8.9.0, Release 1999.
[2] GSM 03.64: "Overall description of the GPRS radio interface; Stage 2", Version 8.12.0, Release 1999.
[3] IEEE Std 802.16e-2005: "Part 16: Air Interface for Fixed and Mobile Broadband Wireless Access Systems".
[4] Abdulfattah Mohammad Obeid, A. Garcia Ortiz, "Prototyping of a High Performance Generic Viterbi Decoder", IEEE, 2002.
[5] Bernard Sklar, Digital Communications: Fundamentals and Applications, Prentice Hall, 2002.
[6] Young Min Kim, William C. Lindsey, "Adaptive Coded-Modulation in Multipath Fading Channels", IEEE, 1999.
[7] David M. Mandelbaum, "On Forward Error Correction with Adaptive Decoding", IEEE Transactions on Information Theory, March 1975.
[8] Yiquan Zhu, Mohammed Benaissa, "Reconfigurable Viterbi Decoding Using a New ACS Pipelining Technique", IEEE, 2003.
[9] S. Swaminathan, "An FPGA-based Adaptive Viterbi Decoder", Master's thesis, Department of Electrical and Computer Engineering, University of Massachusetts, Amherst, 2001.
[10] C.B. Shung, P.H. Siegel, G. Ungerboeck and H.K. Thapar, "VLSI architectures for metric normalization in the Viterbi algorithm", IEEE International Conference on Communications, vol. 4, pp. 1723-1728, 1990.