Design and Implementation of Parallel LTE Turbo Decoder
|School|Shanghai Jiaotong University|
|Course|Circuits and Systems|
|Keywords|Turbo decoder; LTE; parallel decoding; QPP interleaver|
With the development of wireless technology, customer demand for high-speed, reliable wireless data transmission, especially in mobile communication, keeps growing. The 3rd Generation Partnership Project (3GPP) recently proposed the Long Term Evolution (LTE) standard, which uses turbo codes for channel coding and requires a downlink data rate above 100 Mbps. Meeting this requirement calls for a parallel turbo decoder. However, parallel decoding can degrade bit error rate (BER) performance, especially for small blocks.

In this paper, the parallel turbo decoder is optimized at both the algorithm and the architecture level. At the algorithm level, an optimized parallel decoding algorithm is presented to improve error-correction performance for small LTE blocks. For a block length of 40, where parallel decoding causes the largest BER loss, that loss is reduced by 0.19 dB at no additional hardware cost.

At the hardware level, a VLSI architecture is designed and implemented that supports all 188 block sizes of the LTE protocol. The decoder uses 8 soft-input soft-output (SISO) units with radix-4 recursion, and solutions to the problems introduced by parallel decoding are provided. To reduce the complexity of interleaving, a novel configurable quadratic permutation polynomial (QPP) multistage network is proposed. The proposed $2^n$-input network can be configured to support decoders with $2^i$-fold parallelism ($0 \le i \le n$) without additional hardware. A low-cost address generator architecture is also proposed to generate the addresses and control signals. In addition, the proposed network can be generalized to support arbitrary contention-free interleavers by cascading an additional, specially designed network. Because some block sizes are not divisible by 16 and therefore cannot use radix-4 recursion, a bi-mode add-compare-select (ACS) unit that implements both radix-2 and radix-4 recursion is presented.
Compared with a conventional ACS unit, the bi-mode unit adds only the delay of one 2-input multiplexer to the critical path. Moreover, the memory architecture and address mapping are optimized to avoid memory access contention. The implemented turbo decoder runs at 300 MHz and achieves a throughput of 280 Mbps with an area of 4.2 mm² in 130 nm technology. The design has the best area efficiency among the compared published works, and its BER performance is also better than theirs.
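As background for the QPP network and address generator described above, the following sketch shows the standard QPP permutation $\pi(i) = (f_1 i + f_2 i^2) \bmod K$, one common multiplier-free way to generate its addresses recursively, and a check of the contention-free property that lets $M$ SISOs fetch interleaved data from $M$ memory banks in the same cycle. It is a generic illustration under stated assumptions, not the network or address generator proposed in this paper: the function names are hypothetical, the bank/offset mapping (bank $\lfloor \pi/W \rfloor$ for window length $W = K/M$) is an assumed, commonly used scheme, and the coefficients $K = 40$, $f_1 = 3$, $f_2 = 10$ come from the QPP parameter table in 3GPP TS 36.212.

```python
from typing import List


def qpp_direct(i: int, k: int, f1: int, f2: int) -> int:
    """QPP interleaver address pi(i) = (f1*i + f2*i^2) mod K."""
    return (f1 * i + f2 * i * i) % k


def qpp_recursive(k: int, f1: int, f2: int) -> List[int]:
    """All K addresses using only additions and mod-K wraps (no multipliers).

    pi(i+1) = pi(i) + g(i) and g(i+1) = g(i) + 2*f2 (mod K),
    starting from pi(0) = 0 and g(0) = f1 + f2.
    """
    pi, g, step = 0, (f1 + f2) % k, (2 * f2) % k
    out = []
    for _ in range(k):
        out.append(pi)
        pi = (pi + g) % k
        g = (g + step) % k
    return out


def is_contention_free(k: int, f1: int, f2: int, m: int) -> bool:
    """Check that m SISOs never address the same memory bank in one cycle.

    Assumed schedule: SISO t handles window t (length W = K/m) and at
    cycle j reads interleaved address pi(t*W + j), stored in bank pi // W.
    """
    if k % m:
        return False  # windows must all have the same length
    w = k // m
    for j in range(w):
        banks = {qpp_direct(t * w + j, k, f1, f2) // w for t in range(m)}
        if len(banks) < m:
            return False  # two SISOs hit the same bank in cycle j
    return True


if __name__ == "__main__":
    K, F1, F2 = 40, 3, 10  # smallest LTE block size and its QPP coefficients
    perm = qpp_recursive(K, F1, F2)
    assert perm == [qpp_direct(i, K, F1, F2) for i in range(K)]
    assert sorted(perm) == list(range(K))  # a valid permutation
    for m in (1, 2, 4, 8):  # 2^i-parallel configurations
        print(m, is_contention_free(K, F1, F2, m))
```

For $K = 40$ the check passes for $M = 1, 2, 4$ and $8$, which is the property a configurable $2^i$-parallel network relies on. The recursive form is attractive in hardware because each new address costs two modular additions.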
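The radix-4 restriction mentioned above can be made concrete by enumerating the LTE block sizes. The sketch below lists the 188 values of K (in bits) defined in 3GPP TS 36.212 and flags those that are not multiples of 16. The rule in `radix4_ok` is an assumed interpretation, not taken from the paper: with 8 SISOs each window holds K/8 trellis stages, and a radix-4 step consumes two stages, so the window length must be even.

```python
from typing import List


def lte_block_sizes() -> List[int]:
    """Turbo block sizes K from the 3GPP TS 36.212 QPP parameter table."""
    sizes = list(range(40, 513, 8))        # 40 .. 512 in steps of 8
    sizes += list(range(528, 1025, 16))    # 528 .. 1024 in steps of 16
    sizes += list(range(1056, 2049, 32))   # 1056 .. 2048 in steps of 32
    sizes += list(range(2112, 6145, 64))   # 2112 .. 6144 in steps of 64
    return sizes


def radix4_ok(k: int, n_siso: int = 8) -> bool:
    """Assumed rule: each of n_siso equal windows (K / n_siso stages) must
    split into whole radix-4 steps of 2 stages, i.e. K % (2 * n_siso) == 0."""
    return k % (2 * n_siso) == 0


if __name__ == "__main__":
    sizes = lte_block_sizes()
    radix2_only = [k for k in sizes if not radix4_ok(k)]
    print(len(sizes), "block sizes,", len(radix2_only), "need radix-2")
    print(radix2_only[:3], "...", radix2_only[-1])
```

Under this assumption, 30 of the 188 sizes (40, 56, 72, ..., 504) cannot be processed purely in radix-4, which is why a unit that can also fall back to radix-2 is needed.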
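To illustrate what a bi-mode ACS unit computes, the following behavioural sketch runs max-log-MAP forward (alpha) recursion on the 8-state LTE constituent trellis (feedback $g_0 = 1 + D^2 + D^3$, parity $g_1 = 1 + D + D^3$). A radix-2 step keeps the best of 2 incoming branches per state; a radix-4 step combines two stages of branch metrics and keeps the best of 4 two-stage paths. The `radix4` flag stands in for the multiplexer select. All names are hypothetical, metrics are plain integers, and metric normalization is omitted; this is not the circuit proposed in the paper.

```python
import random
from typing import Dict, List, Optional, Tuple

NEG = -10**9  # large negative metric standing in for "impossible state"
Metrics = Dict[Tuple[int, int], int]


def lte_trellis() -> List[Tuple[int, int, int, int]]:
    """Branches (prev, next, u, p) of the 8-state LTE RSC constituent code.

    State bits (s1, s2, s3) hold the last three feedback values.
    """
    branches = []
    for s in range(8):
        s1, s2, s3 = (s >> 2) & 1, (s >> 1) & 1, s & 1
        for u in (0, 1):
            a = u ^ s2 ^ s3  # feedback bit entering the register
            p = a ^ s1 ^ s3  # parity output
            nxt = (a << 2) | (s1 << 1) | s2
            branches.append((s, nxt, u, p))
    return branches


TRELLIS = lte_trellis()


def branch_metrics(lu: int, lp: int) -> Metrics:
    """Integer branch metric per (prev, next), bits mapped 0 -> +1, 1 -> -1."""
    return {(s, n): (1 - 2 * u) * lu + (1 - 2 * p) * lp
            for s, n, u, p in TRELLIS}


def acs_radix2(alpha: List[int], g: Metrics) -> List[int]:
    """One stage: each state keeps the best of its 2 incoming branches."""
    cands: Dict[int, List[int]] = {n: [] for n in range(8)}
    for (s, n), m in g.items():
        cands[n].append(alpha[s] + m)        # add
    return [max(c) for c in cands.values()]  # compare-select


def acs_radix4(alpha: List[int], g1: Metrics, g2: Metrics) -> List[int]:
    """Two stages at once: each state keeps the best of 4 two-stage paths."""
    cands: Dict[int, List[int]] = {n: [] for n in range(8)}
    for (s, mid), x in g1.items():
        for (mid2, n), y in g2.items():
            if mid2 == mid:  # path s -> mid -> n, combined metric x + y
                cands[n].append(alpha[s] + (x + y))
    return [max(c) for c in cands.values()]


def acs_bimode(alpha: List[int], g1: Metrics, g2: Optional[Metrics],
               radix4: bool) -> List[int]:
    """Hypothetical bi-mode unit; `radix4` acts like the 2-input mux select."""
    if radix4:
        if g2 is None:
            raise ValueError("radix-4 mode needs two stages of metrics")
        return acs_radix4(alpha, g1, g2)
    return acs_radix2(alpha, g1)


if __name__ == "__main__":
    rng = random.Random(1)
    alpha = [0] + [NEG] * 7  # encoder starts in state 0
    for _ in range(10):      # 10 radix-4 steps = 20 trellis stages
        g1 = branch_metrics(rng.randint(-8, 8), rng.randint(-8, 8))
        g2 = branch_metrics(rng.randint(-8, 8), rng.randint(-8, 8))
        two_r2 = acs_bimode(acs_bimode(alpha, g1, None, False), g2, None, False)
        one_r4 = acs_bimode(alpha, g1, g2, True)
        assert two_r2 == one_r4
        alpha = one_r4
    print("radix-4 ACS matches two radix-2 ACS stages")
```

Because max-plus arithmetic is exact on integers, one radix-4 step gives the same state metrics as two radix-2 steps, so a single unit can switch modes for block sizes that are not multiples of 16 without changing the decoding result.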