High-Throughput Message-Passing Decoder Design for Improved Low Error Rate Performance

March 30, 2009
10:00am-11:00am
Interschool Lab, Room 750 CEPSR
Speaker: Zhengya Zhang, University of California, Berkeley

Abstract

In the past several decades, tremendous progress has been made in both communication theory and the practical implementation of communication systems. However, practice often lags the most recent developments in theory, possibly for two reasons: the cost of implementation is high, and practical implementation incurs a non-negligible loss compared to the theoretical bounds. The two objectives, what is theoretically possible and what is achievable in implementation, can be better aligned, so that theory becomes more relevant and practice becomes more powerful and efficient.

In this talk, a novel emulation-simulation framework will be presented for studying the low error rate performance of capacity-approaching low-density parity-check (LDPC) codes decoded using a message-passing algorithm. High-throughput hardware emulation uncovers the combinatorial error structures that underpin the error floors. The captured errors are analyzed in functionally equivalent software simulation to illuminate the effects of word length, quantization, and algorithm design, thereby extending the theoretical discovery to practical use. The emulation-simulation framework further allows the algorithm and implementation to be iteratively refined to improve the error-floor performance of message-passing decoders. An adaptive quantization scheme is first introduced to reduce the degradation of soft decoding. Then a reweighted message-passing algorithm is proposed to eliminate local minima caused by the remaining dominant errors. This improved algorithm is realized in a simple post-processor that compensates the message-passing decoding algorithm to achieve near maximum-likelihood decoding performance. Results are demonstrated by a 47.7 Gb/s LDPC decoder operating without an error floor down to a bit error rate of 10^-14. The 5.35 mm^2, 65 nm CMOS chip realizes a grouped parallel architecture that optimizes area and power efficiency by aggressively scaling down the interconnection overhead. The iterative emulation-simulation framework and systematic architectural exploration can be extended to other complex systems, thereby enabling the joint optimization of algorithm, architecture, and implementation.
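
For readers unfamiliar with message-passing decoding, the following is a minimal sketch of a generic min-sum decoder with an optional fixed-point quantizer on the messages. It is not the speaker's decoder, adaptive quantization scheme, or reweighted algorithm. The toy parity-check matrix H (a (7,4) Hamming code rather than an LDPC code), the quantizer step of 0.5, and the names quantize and min_sum_decode are all assumptions made for illustration.

```python
# Toy parity-check matrix (a (7,4) Hamming code), chosen for illustration only.
H = [
    [1, 1, 0, 1, 1, 0, 0],
    [1, 0, 1, 1, 0, 1, 0],
    [0, 1, 1, 1, 0, 0, 1],
]


def quantize(x, bits=None, step=0.5):
    """Uniform saturating quantizer; bits=None leaves the value unquantized."""
    if bits is None:
        return x
    max_level = 2 ** (bits - 1) - 1
    level = max(-max_level, min(max_level, round(x / step)))
    return level * step


def syndrome_ok(H, hard):
    """True when every parity check is satisfied by the hard decisions."""
    return all(sum(h * b for h, b in zip(row, hard)) % 2 == 0 for row in H)


def min_sum_decode(llr, H, max_iters=20, bits=None):
    """Flooding min-sum decoding; a positive LLR means bit 0.

    Returns (hard_decisions, converged).
    """
    m, n = len(H), len(H[0])
    edges = [[j for j in range(n) if H[i][j]] for i in range(m)]
    llr = [quantize(v, bits) for v in llr]
    # Check-to-variable messages, one per edge, initialised to zero.
    c2v = [{j: 0.0 for j in edges[i]} for i in range(m)]
    hard = [int(v < 0) for v in llr]
    for _ in range(max_iters):
        # Posterior for each bit: channel LLR plus all incoming check messages.
        total = [
            llr[j] + sum(c2v[i][j] for i in range(m) if j in c2v[i])
            for j in range(n)
        ]
        hard = [int(t < 0) for t in total]
        if syndrome_ok(H, hard):
            return hard, True
        for i in range(m):
            # Variable-to-check: posterior minus this check's own message.
            v2c = {j: quantize(total[j] - c2v[i][j], bits) for j in edges[i]}
            for j in edges[i]:
                others = [v2c[k] for k in edges[i] if k != j]
                sign = -1.0 if sum(o < 0 for o in others) % 2 else 1.0
                # Min-sum rule: product of signs, minimum of magnitudes.
                c2v[i][j] = quantize(sign * min(abs(o) for o in others), bits)
    # No valid codeword was found within max_iters iterations.
    return hard, False
```

Calling min_sum_decode(llr, H, bits=4) runs the same decoder with 4-bit saturating messages, the kind of word-length choice whose effect on the error floor the abstract describes studying. A toy code like this one only illustrates the mechanics and says nothing about error-floor behavior.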

