
IEEE COMMUNICATIONS LETTERS, VOL. 17, NO. 3, MARCH 2013

Speech Quality Improvement Based on List Viterbi and Joint Source-Channel Decoding in UMTS
Yuejun Wei, Kedi Wu, Bin Xia, and Yuhang Yang

Abstract: In this letter, we propose a simple and efficient method, called Dual-CRC, to improve speech quality based on the list Viterbi algorithm (LVA) in the Universal Mobile Telecommunications System (UMTS). We also employ the well-known joint source-channel decoding (JSCD) scheme together with LVA decoding to further improve the speech quality. Simulation results show that the two proposed decoders, the Dual-CRC LVA and the combination of the LVA and the JSCD, can significantly improve the speech quality for UMTS, especially in low signal-to-noise ratio and low mean opinion score regions.

Index Terms: AMR speech, list Viterbi decoding, joint source-channel decoding, mean opinion score.
Fig. 1. Transmission process for AMR speech in UMTS uplink. (Diagram labels: UE, air interface, Node B, Iub, RNC, Iu, CN; AMR speech codec with Class A, B and C bits; CRC attachment and CC encoding for Class A, CC encoding for Class B and C; VA decoding for each class with a CRC check for Class A; inner-loop power control with measured SINR; outer-loop power control with CRC result and target SINR.)

I. INTRODUCTION

The adaptive multi-rate (AMR) audio codec is a patented speech coding scheme adopted by the 3rd Generation Partnership Project (3GPP) as the standard codec for the Universal Mobile Telecommunications System (UMTS) [1]. As the most popular 3G network, UMTS has attracted a growing number of subscribers. Although UMTS employs power control, speech quality becomes difficult to maintain as the number of users grows, because multiuser interference increases, especially when the user equipment (UE) operates in poor radio conditions or is exposed to intensive interference. Simulation results in [2] show that the mean opinion score (MOS) of AMR speech decreases quickly when C/I is lower than 3 dB.

Many techniques have been developed to improve the speech quality in wireless networks. A well-known technique is joint source-channel decoding (JSCD) [3], [4], which utilizes the redundancy of the source information to improve the transmission quality. Another way to improve the speech quality is to employ a channel-controlled source code, which adapts the code rate to the radio condition or the load of the network [5]. For example, a UE normally uses the 12.2 kbps speech codec. In poor radio conditions, it switches to a lower bit rate such as 7.4 kbps or 4.75 kbps. By reducing the source bit rate, the UE can avoid call drops at the price of lower speech quality.

In UMTS, convolutional channel codes are employed for AMR speech transmission. They are usually decoded using the Viterbi algorithm (VA). An enhanced Viterbi decoding algorithm called the list Viterbi algorithm (LVA) was developed in [6]; with the assistance of an error detection code such as a cyclic redundancy check (CRC), it outperforms the Viterbi algorithm. However, the LVA decoder cannot be employed directly in UMTS, for reasons explained in Section III.

In this letter, we propose a simple and applicable scheme that uses a list Viterbi decoder to improve the speech quality significantly with little modification of the current system architecture. We further improve the AMR speech quality by utilizing the JSCD and the LVA jointly. These schemes are more robust to radio channel conditions in terms of speech quality, especially in low MOS regions.

Manuscript received October 18, 2012. The associate editor coordinating the review of this letter and approving it for publication was M. Flanagan.
Y. Wei, B. Xia, and Y. Yang are with the Department of Electronic Engineering, Shanghai Jiao Tong University, Shanghai, China (e-mail: {yjwei, bxia, yhyang}@sjtu.edu.cn).
K. Wu is with the Department of Wireless Research, Huawei Technologies, Shanghai, China (e-mail: wukedi@huawei.com).
Digital Object Identifier 10.1109/LCOMM.2013.012213.122322

II. BACKGROUND

A. The Transmission Process of AMR Speech in UMTS

An AMR speech frame includes three classes of bit streams: Class A, B and C. The bits of Class A have a more significant impact on speech quality than those of Class B and C. In UMTS, a 12-bit CRC is attached to the bits of Class A, while those of Class B and C have no CRC attachment, in order not to introduce further overhead [7]. The bits in each class are then encoded with a convolutional code, as shown in Fig. 1. After Viterbi decoding of the convolutional codes, the decoded bits of Class A are checked using the CRC to verify whether they have been decoded correctly. The CRC result is sent to the AMR speech codec to indicate whether the speech frame should be used in the source decoding process. If the CRC check fails, the AMR speech codec usually discards the corresponding speech frame in order to avoid introducing noise. In the UMTS radio network controller (RNC), the CRC result is also used for outer-loop power control, which regulates the block error rate (BLER).
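The CRC handling described above can be sketched as follows. This is a minimal illustration rather than the 3GPP reference procedure: the generator polynomial is assumed to be the 12-bit one, D^12 + D^11 + D^3 + D^2 + D + 1, and should be checked against [13]; the parity-bit ordering used on the air interface is ignored; and the function names (`crc_remainder`, `attach_crc`, `crc_passes`, `bad_frame_indicator`) are hypothetical.

```python
from typing import List

# Assumed generator g(D) = D^12 + D^11 + D^3 + D^2 + D + 1,
# written as coefficients from D^12 down to D^0.
CRC12_POLY: List[int] = [1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
CRC_LEN = len(CRC12_POLY) - 1  # 12 parity bits


def crc_remainder(bits: List[int]) -> List[int]:
    """Remainder of bits(D) * D^12 divided by g(D) over GF(2)."""
    reg = list(bits) + [0] * CRC_LEN
    for i in range(len(bits)):
        if reg[i]:
            # Subtract (XOR) the generator aligned at position i.
            for j, coeff in enumerate(CRC12_POLY):
                reg[i + j] ^= coeff
    return reg[-CRC_LEN:]


def attach_crc(class_a_bits: List[int]) -> List[int]:
    """Transmitter side: append the 12 parity bits to the Class A bits."""
    return list(class_a_bits) + crc_remainder(class_a_bits)


def crc_passes(decoded_block: List[int]) -> bool:
    """Receiver side: recompute the parity bits and compare."""
    data, parity = decoded_block[:-CRC_LEN], decoded_block[-CRC_LEN:]
    return crc_remainder(data) == parity


def bad_frame_indicator(decoded_block: List[int]) -> bool:
    """True when the frame should be discarded by the speech codec."""
    return not crc_passes(decoded_block)
```

A frame that passes through unchanged clears the indicator, while any single flipped bit sets it, which is the behaviour the outer-loop power control and the speech codec rely on. The test below uses 81 Class A bits, as in the AMR 12.2 kbps mode; that length is an assumption not stated in the text above.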

1089-7798/13$31.00 © 2013 IEEE


B. Joint Source-Channel Decoding

Each AMR speech frame can be divided into four subframes, each of which contains different parameters used to synthesize the speech signal. Of these, the parameters of Class A, including line spectrum frequencies (LSF), adaptive codebook index in odd (AIo) and even (AIe) subframes, adaptive codebook gain (AG), and fixed codebook gain (FG), are considerably correlated between adjacent frames or subframes. The correlation of the parameters between adjacent frames can be modeled using a first-order Markov model with certain transition probabilities. To exploit this residual redundancy, a straightforward way is to use the transition probabilities at either the channel or the source decoder as source side information (SSI), as proposed in [8]–[11]. The algorithm that uses the SSI only at the channel decoder is called source-controlled channel decoding (SCCD), and the one that uses the SSI only at the source decoder is called soft bit source decoding (SBSD). The iterative joint source-channel decoding (ISCD) algorithm uses the SSI at both the channel and source decoders, and passes extrinsic information between them. The ISCD usually achieves better performance in terms of bit error rate (BER) than the non-iterative JSCD [3], [10], [11]. However, the extrinsic information transfer (EXIT) chart shows that little additional gain can be achieved after two iterations [12].

C. List Viterbi Decoding

Based on the maximum likelihood (ML) criterion, the Viterbi decoder always chooses the best path from all paths in the decoding trellis as the decoding result. The LVA [6] is an enhanced Viterbi algorithm assisted by an error detection code such as a CRC. Taking the parallel list Viterbi algorithm (PLVA) as an example, the decoder simultaneously produces the N globally best candidates after a trellis-based search, then checks these candidates separately using the CRC and chooses the candidate that passes the check as the decoded result.
If no candidate passes the CRC check, it takes the best candidate as output, which matches the output of the Viterbi algorithm. Because the LVA decoder makes its decision by considering multiple paths in the trellis, it usually outperforms the Viterbi decoder, which only takes the best candidate as output. In our simulations, the PLVA4 (4 list candidates) algorithm has about 0.3 to 0.8 dB gain over the Viterbi algorithm for the convolutional codes in UMTS; the actual gain depends on the channel condition and the size of the coded blocks.

III. PROPOSED SCHEME

We first describe the problem that arises if the LVA decoder is applied directly to the AMR speech frame in UMTS, and then propose our solutions. From the AMR speech frame structure and the principle of LVA decoding, the LVA decoder can be applied to Class A, whereas Class B and C cannot be decoded with the LVA since they have no CRC attachment. If the Viterbi decoder is replaced with the LVA decoder, the BLER of Class A decreases. However, when the actual BLER of Class A is lower than the target BLER, the power controller keeps reducing the transmit power of the AMR speech signal until the BLER of Class A converges to the target BLER, as shown in Fig. 2. Under power control, the BLER of Class A therefore ends up the same as with Viterbi decoding, but the BLERs of Class B and C become higher, which results in worse AMR speech quality. Since power control is essential for a CDMA system, it is not feasible to simply replace Viterbi decoding with LVA decoding in UMTS.

Fig. 2. Impact of LVA decoding on the AMR speech. (Diagram labels: BLER of Class A, B and C relative to the target BLER, with LVA decoding and with power control.)

A straightforward way to solve this problem is to adjust the rate matching (RM) factor [13] to allocate fewer resources to Class A and more resources to Class B and C. With this approach, the BLER of Class A remains unchanged while the BLERs of Class B and C become lower. However, this approach is not robust, because the coding gain of the LVA is not stable under different channel conditions. The RM factor should be adjusted based on a precise prediction of the coding gain; otherwise, the adjustment will affect the transmit power and the speech quality. For example, if the actual coding gain of the LVA is lower than the predicted gain, more resources are allocated to Class B and C than are necessary. This leads to higher transmit power than is needed, and consequently to higher interference to other users, which decreases the cell capacity. On the other hand, if the actual coding gain is higher than predicted, fewer resources are allocated to Class B and C than are necessary. This leads to lower transmit power than is required, and thus impairs the speech quality.

A. Dual-CRC Scheme

To solve the aforementioned problem, we propose a simple and practical scheme called Dual-CRC, as shown in Fig. 3. In this scheme, the LVA decoder in the base station (NodeB) produces the CRC results of both Viterbi and LVA decoding, which we call the VA CRC and the LVA CRC, respectively. Because the Viterbi decoding process is a part of the LVA decoding process, it is easy for the LVA decoder to produce the VA CRC, which is simply the CRC result of the first candidate. It follows that if the VA CRC result is correct, the LVA CRC result must also be correct, but not vice versa. In Fig. 3, (a) is the existing scheme with Viterbi decoding for Class A, and (b) is the proposed Dual-CRC scheme with LVA decoding for Class A. In the proposed scheme, NodeB transmits both the VA CRC and the LVA CRC results to the RNC through the signaling interface between them. The RNC performs outer-loop power control with the VA CRC as usual, and passes the LVA CRC result, instead of the VA CRC result, to the AMR speech codec as the bad frame indicator (BFI).
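The Dual-CRC decision can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the list decoder itself is not modeled, `candidates` is a hypothetical best-first list of decoded Class A blocks whose first entry is the ordinary Viterbi path, `crc_ok` is any CRC predicate supplied by the caller, and the names `dual_crc_decode` and `rnc_signals` are invented for the example.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Bits = List[int]


def dual_crc_decode(
    candidates: Sequence[Bits],
    crc_ok: Callable[[Bits], bool],
) -> Tuple[Bits, bool, bool]:
    """Return (decoded_bits, va_crc_ok, lva_crc_ok).

    candidates must be sorted best-first, so candidates[0] is the
    path that plain Viterbi decoding would output.
    """
    if not candidates:
        raise ValueError("at least one candidate is required")

    # VA CRC: the CRC result of the first (best) candidate only.
    va_crc_ok = crc_ok(candidates[0])

    # LVA CRC: passes if any candidate in the list passes.
    for cand in candidates:
        if crc_ok(cand):
            return list(cand), va_crc_ok, True

    # No candidate passes: fall back to the Viterbi output.
    return list(candidates[0]), va_crc_ok, False


def rnc_signals(va_crc_ok: bool, lva_crc_ok: bool) -> Dict[str, bool]:
    """Route the two CRC results as described in the text above."""
    return {
        # Outer-loop power control keeps seeing the Viterbi block errors.
        "olpc_block_error": not va_crc_ok,
        # The speech codec is told to discard only if the LVA also failed.
        "bfi": not lva_crc_ok,
    }
```

Because the first candidate is also scanned in the loop, a passing VA CRC always implies a passing LVA CRC, matching the one-way implication stated above. The test uses a toy even-parity check in place of the 12-bit CRC purely to keep the example short.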


Fig. 3. Dual-CRC scheme using the LVA decoder. (a) The existing scheme with Viterbi decoding for Class A; (b) the proposed Dual-CRC scheme with LVA decoding for Class A.

The LVA decoding result of Class A and the Viterbi decoding results of Class B and C are sent to the AMR speech codec for source decoding. With the Dual-CRC scheme, neither the power control process nor the target BLER needs to be modified, and no other system parameter, such as the RM factor in the AMR speech frame, is affected. Meanwhile, as Class A is the most important class in the AMR speech frame, the scheme can significantly improve the speech quality while minimizing the impact on the whole system.

B. Jointly Utilizing the LVA and the JSCD Scheme

The JSCD utilizes the SSI to recover the correlated parameters in the AMR speech frames, and the LVA enhances the channel coding performance. Since the JSCD and the LVA operate on different principles, we propose a scheme that further enhances the speech quality by using them jointly. This scheme is shown in Fig. 4.

Fig. 4. The scheme jointly utilizing the LVA and the JSCD.

After LVA decoding, we utilize the JSCD to recover the parameters of the AMR speech frame. Since the JSCD needs the soft value of each of the AMR speech bits, the LVA decoder is required to generate the log-likelihood ratio (LLR) for each decoded bit. However, it is usually not practical to generate the LLRs of every candidate in the list decoding, due to hardware limitations. In our scheme, we only generate the LLRs by soft-output decoding when the LVA decoding is incorrect. If the LVA decoding result is incorrect, it is identical to the result of Viterbi decoding, and the LLRs can be easily generated by a soft-in-soft-out decoder, such as the BCJR or the max-log-MAP decoder. If the LVA decoding result is correct, the LLR for each decoded bit can be set according to the decoded bit, e.g., bit 0 is mapped to a positive LLR and bit 1 to a negative LLR. The detailed joint LVA and JSCD algorithm is as follows:

Algorithm 1 The joint LVA and JSCD algorithm
1: Perform LVA decoding for Class A and Viterbi decoding for Class B and C. Send out the CRC results of both Viterbi and LVA decoding for the Class A bits.
2: Generate the LLRs of the LVA decoded bits. If the LVA CRC is correct, set the LLRs in the manner described above, then go to Step 5. Otherwise, go to Step 3.
3: Perform JSCD decoding once, and output the modified decoded bits, the LLRs and the extrinsic information to be fed back to the channel decoder. Increase the iteration number by one.
4: If the iteration number is equal to the maximum allowed, perform parameter-level error concealment (PEC) as proposed in [14], and send the BFI to indicate a frame which cannot be recovered by PEC. The AMR speech codec does not need to perform frame-level error concealment for the modified frames. If the iteration number is smaller than the maximum allowed, go to Step 1.
5: Set the iteration number to zero, and send the decoding results to the AMR speech codec.

The issue of misdetection of the CRC should be taken into consideration, since an undetected bad frame of Class A may result in noise. With a 12-bit CRC, the undetected frame error rate (UER) of Viterbi decoding with a block error rate of $P_{\mathrm{bler}}$ is approximately $P_{\mathrm{bler}}/2^{12}$, and the UER of LVA decoding is less than $N$ times that of Viterbi decoding, where $N$ is the number of list candidates of the LVA decoder. When performing the ISCD with one iteration based on the LVA, the UER is approximately $2N P_{\mathrm{bler}}/2^{12}$. Therefore, with the target BLER of 1% for AMR speech, the UER is approximately $2 \cdot 4 \cdot 0.01/4096 = 1/51200$ when jointly performing the PLVA4 and the ISCD with one iteration. This means that with a 20 ms speech frame, a noisy frame would occur approximately once in every 17 minutes (51200 frames of 20 ms last 1024 s), which is acceptable in practice.

IV. SIMULATION RESULTS

The simulation is performed based on a full UMTS uplink chain, with source coding performed by the AMR 12.2k speech coder. The Viterbi algorithm and the PLVA4 are utilized for convolutional decoding, and the ISCD, operating with either 0 or 1 iterations, is applied as the JSCD. If a speech frame fails the CRC, the AMR speech codec will discard this frame in the decoding process. A MOS estimator [15] is applied to evaluate the speech quality. The simulation results are shown in Figs. 5–8, for both additive white Gaussian noise (AWGN) and typical urban (TU) fading channels. The PLVA4 has about 0.3 to 0.5 dB performance gain in terms of BLER over Viterbi decoding. With the proposed Dual-CRC scheme, more than 0.3 MOS gain can be obtained with the PLVA4 over Viterbi decoding in most signal-to-noise ratio (SNR) regions. From the simulation results, we can see that the ISCD has little impact on the BLER, while it improves the


Fig. 5. BLER of VA, VA+JSCD, PLVA and PLVA+JSCD in AWGN channel. (Axes: BLER of Class A bits versus Eb/N0 in dB.)

Fig. 6. MOS of VA, VA+JSCD, PLVA and PLVA+JSCD in AWGN channel. (Axes: mean opinion score (MOS) versus Eb/N0 in dB.)

Fig. 7. BLER of VA, VA+JSCD, PLVA and PLVA+JSCD in TU channel. (Axes: BLER of Class A bits versus Eb/N0 in dB.)

Fig. 8. MOS of VA, VA+JSCD, PLVA and PLVA+JSCD in TU channel. (Axes: mean opinion score (MOS) versus Eb/N0 in dB.)

MOS significantly. VA+ISCD (NumIteration=1) has 0.2 to 0.3 MOS gain over Viterbi decoding in most SNR regions, but the gain is slightly smaller than that of the PLVA4. PLVA4+ISCD (NumIteration=0) and PLVA4+ISCD (NumIteration=1) have about 0.05 to 0.1 and 0.1 to 0.2 MOS gain over the PLVA4, respectively.

V. CONCLUSION

Based on the characteristics of source and channel coding of AMR speech in UMTS, we propose a novel and practical decoding scheme that improves the speech quality with little modification of the current system architecture. The speech quality can be significantly improved by using the LVA decoder alone, or through joint use of the JSCD and LVA decoders.

REFERENCES
[1] 3GPP TS 26.090 v10.0.0, "Adaptive multi-rate (AMR) speech codec, transcoding functions," Apr. 2011.
[2] H. Holma, J. Melero, J. Vainio, T. Halonen, and J. Makine, "Performance of adaptive multirate (AMR) voice in GSM and WCDMA," in Proc. 2003 VTC Spring, pp. 2177–2181.
[3] A. D. Subramaniam, W. R. Gardner, and B. D. Rao, "Iterative joint source-channel decoding of speech spectrum parameters over an additive white Gaussian noise channel," IEEE Trans. Audio, Speech, and Language Processing, vol. 14, no. 1, pp. 152–162, Jan. 2006.
[4] T. Fingscheidt, T. Hindelang, R. V. Cox, and N. Seshadri, "Joint source-channel (de-)coding for mobile communications," IEEE Trans. Commun., vol. 50, no. 2, pp. 200–212, Feb. 2002.
[5] T. Lundberg, P. de Bruin, S. Bruhn, S. Hakansson, and S. Craig, "Adaptive thresholds for AMR codec mode selection," in Proc. 2005 VTC Spring, pp. 2177–2181.
[6] N. Seshadri and C. E. Sundberg, "List Viterbi decoding algorithms with applications," IEEE Trans. Commun., vol. 42, no. 2/3/4, pp. 104–120, Feb./Mar./Apr. 1994.
[7] 3GPP TS 34.108 v11.1.0, "Reference radio bearer configurations used in radio bearer interoperability testing," Mar. 2012.
[8] J. Hagenauer, "Source-controlled channel decoding," IEEE Trans. Commun., vol. 43, no. 9, pp. 2449–2457, Sept. 1995.
[9] T. Fingscheidt and P. Vary, "Softbit speech decoding: a new approach to error concealment," IEEE Trans. Speech Audio Process., vol. 9, no. 3, pp. 240–251, Mar. 2001.
[10] N. Görtz, "On the iterative approximation of optimal joint source-channel decoding," IEEE J. Sel. Areas Commun., vol. 19, pp. 1662–1670, Sept. 2001.
[11] R. Perkert, M. Kaindl, and T. Hindelang, "Iterative source and channel decoding for GSM," in Proc. 2001 IEEE Int. Conf. Acoustics, Speech, and Signal Processing, pp. 2649–2652.
[12] M. Adrat, U. V. Agris, and P. Vary, "Convergence behavior of iterative source-channel decoding," in Proc. 2003 IEEE Int. Conf. Acoustics, Speech, and Signal Processing, pp. 269–272.
[13] 3GPP TS 25.212 v10.0.0, "Channel coding and multiplexing," Oct. 2010.
[14] T. Breddermann, S. Iwelski, and P. Vary, "Bad parameter indication for error concealment in wireless multimedia communication," in Proc. 2010 VTC Fall, pp. 1–5.
[15] ITU-T, P.862.1, "Mapping function for transforming P.862 raw result scores to MOS-LQO," Nov. 2003.
