ASIC Implementation of LDPC Decoder Accelerating Message

ASIC Implementation of LDPC Decoder
Accelerating Message-Passing Schedule
Motivation:
Requirement is performance gain
(bit error performance and data rate)
with a small hardware overhead.
1
2
1.E+00
3
4
5
Concurrent Schedule
This Work
1.E-01
1.E-02
1.E-03
1.E-04
1.E-05
SNR [dB]
16
Average # of Iterations
Message-Passing Algorithm:
is an iterative algorithm for decoding LDPC
codes, is composed of row operation and
column operation.
Inherent parallelism of the algorithm makes it
suitable for hardware design.
Bit Error Rate
Low-Density Parity-Check Code:
is an error correcting code which achieves
information rates very close to the Shanon limit.
12
Concurrent Schedule
This Work
8
4
This Work: Accelerating decoding convergence 0
1
enables the decoder to improve Bit Error Performance
and Decoding Throughput within a limited delay.
2
3
SNR [dB]
4
5
Waseda University
Accelerating Message-Passing Schedule
message alpha
Parity check matrix for hardware sharing
Row Operation
message beta
Column Operation
Row Operation
Step1
Step2
Column Operation
Step3
 The decoding performance depends on the scheduling
of the row and column operations.
Concurrent Schedule
time
row 1
row 2
row 3
row 4
col. 1
col. 2
col. 3
col. 4
This Work : Accelerating Schedule
row 1
row 2
col.1-1 col.1-2 col.1-3
row 3
col.2-1 col.2-2 col.2-3
row 4
col.3-1 col.3-2 col.3-3
 The updated message by the row operations are fed into
the column operation immediately after the row operations.
 The number of column operations increases three times
compared to the concurrent schedule.
message alpha updated by row operation
message beta updated by column operation.
Waseda University
LDPC Decoder Chip Implementation
Summary of the LDPC Decoder Chip
....................... ....................... .......................
....
....
Row Operation Modules
Column Operation Modules
....
CFU1
....
CFU1
8
BFU1
CFU1
BFU1
....
8×3=24 CFUs
BFU1
....
BFU1
8
....
Controller
BFU1
....
8×6=48 BFUs
Parity Check
Module
Input
Module
....
....
BFU1
....
Output
Module
....
input data
SRAM for output data (6 banks)
SRAM for initial messages (6 banks)
SRAM for message beta (18 banks)
output data
Code length
3,072 [bits]
Code rate
0.5, (3,6)-regular
Design process
0.18um, 6Metal,CMOS
Chip size
5.0mm* 5.0mm
Gate count
96,945 (Decoder Core)
# of CFU
3 * 8 = 24
# of BFU
6 * 8 = 48
Total SRAM Area
7,113,932 [μm2]
PLL Area
266,136 [μm2]
Chip Density
49%
Clock Frequency
120 [MHz] (Max)
........... ........... ........... ........... ........... ...........
SRAM for
message β
SRAM for message alpha (18 banks)
Experimental Results.
(TSMC 0.18μm CMOS, SNR=4.5, Iteration limit=10, @120[MHz])
Total Area
[μm2]
Power
[mW]
Average #of
Iterations
Throughput
[Mbps]
Power/Thr.
[mW/Mbps]
Bit Error
Rate
Concurrent
Schedule
10,562,680
327.4
8.536
48
6.8
0.000141
Proposed
Schedule
11,238,066
528.9
3.391
122
4.3
0.000000
SRAM for
message β
LDPC Decoder Core
SRAM for
message α
SRAM for
message α
Waseda University