
AN ADAPTIVE MULTI-RATE SPEECH CODER FOR DIGITAL CELLULAR TELEPHONY

Erdal Paksoy, Juan Carlos De Martin†, Alan McCree, Christian G. Gerlach, Anand Anandakumar, Wai-Ming Lai and Vishu Viswanathan

DSP Solutions R&D Center, Texas Instruments, Dallas, TX

† Juan Carlos De Martin is currently with CENS-CNR at the Polytechnic of Turin, Italy. E-mail: demartin@polito.it.

ABSTRACT

We have developed an adaptive multi-rate (AMR) speech coder designed to operate under the GSM digital cellular full rate (22.8 kb/s) and half rate (11.4 kb/s) channels and to maintain high quality in the presence of highly varying background noise and channel conditions. Within each total rate, several codec modes with different source/channel bit rate allocations are used. The speech coders in each codec mode are based on the CELP algorithm operating at rates ranging from 11.85 kb/s down to 5.15 kb/s, where the lowest rate coder is a source controlled multi-modal speech coder. The decoders monitor channel quality at both ends of the wireless link using the soft values for the received bits and assist the base station in selecting the codec mode that is appropriate for a given channel condition. The coder was submitted to the GSM AMR standardization competition and met the qualification requirements in an independent formal MOS test.

1. INTRODUCTION

In digital cellular communication systems, one of the major challenges is that of designing a coder that is able to provide high quality speech throughout a wide variety of channel conditions. Ideally, a good solution must provide the highest possible quality in the clean channel conditions while maintaining good quality in very heavily disturbed channels. Traditionally, digital cellular applications use a single coding mode where a fixed source/channel bit allocation provides a compromise solution between clean and degraded channel performance. Clearly, a solution which is well suited for clean channels would use most of the available bits for source coding with only minimal error protection, while a solution designed for poor channels would use a lower rate speech coder protected with a large amount of forward error correction (FEC).

One way to obtain good performance across a wide range of conditions is to allow the network to monitor the state of the communication channel and direct the coders to adjust the allocation of bits between source and channel coding accordingly. This can be implemented via an adaptation algorithm whereby the network selects one of a number of available speech coders, called codec modes, each with a predetermined source/channel bit allocation. This concept is called adaptive multi-rate (AMR) coding and is a form of network-controlled multimodal coding of speech [1].

The AMR concept is the centerpiece of ETSI’s GSM AMR standardization activity, which aims to define a new European cellular communication system designed to support an AMR mechanism in both the half rate and full rate channels. This paper describes an AMR coder we have developed and submitted to the qualification phase of the GSM AMR competition. The source coder is based on Code-Excited Linear Prediction (CELP), and the channel coder is based on punctured convolutional codes. The coder also includes a novel method for monitoring channel conditions and communicating the channel measurements and codec mode commands between the base station and the mobile station.

2. SYSTEM OVERVIEW

The coder is designed to operate in both the GSM full-rate channel mode at a total bit-rate of 22.8 kb/s and the GSM half-rate channel mode at 11.4 kb/s. In each channel mode, the coder supports two codec modes. The available bits are allocated differently among source coding, channel coding and signaling in each codec mode. Table 1 illustrates the allocation of bits in the four codec modes.

Codec Mode          Source (kb/s)   Channel + signaling (kb/s)   Total (kb/s)
Half Rate Mode 0         7.45                  3.95                  11.4
Half Rate Mode 1         5.15                  6.25                  11.4
Full Rate Mode 0        11.85                 10.95                  22.8
Full Rate Mode 1         7.45                 15.35                  22.8

Table 1: Rate allocation for codec modes

3. SPEECH CODING

Each codec mode uses a CELP coder based on the one we developed for the GSM enhanced full rate standardization activity in 1995 [2], and many features are the same across codec modes. The frame size for all source coders is 20 ms with lookahead of 5 ms for LPC analysis. The LPC parameters are coded once per frame in the Line Spectral Frequency (LSF) domain using a 4-stage, 26-bit multi-stage vector quantizer (MSVQ) which is searched using an M-Best search algorithm. A perceptual weighting function is used to reflect the importance of the Bark scale for LSF quantization [3]. The remaining parameters are updated once per subframe. All source coders have four 5 ms subframes, with the exception of Half Rate Mode 1 where there are two subframes of 10 ms each. In all codec modes, the pitch lag is coded using a delta-search, adaptive codebook search algorithm where the first pitch lag in each frame is coded using 8 bits and the remaining lags are coded differentially with respect to the previous lag with 5 bits each. The fixed excitation is obtained from a sparse ternary codebook searched using an M-best algorithm, and the pulse locations and signs are encoded and transmitted. The fixed and adaptive excitation gains are jointly vector quantized with a 7-bit codebook, where the fixed excitation gain component is coded differentially with respect to a predicted gain estimated from previous gain values.

Full Rate Mode 1 and Half Rate Mode 0 use an identical CELP coder at a bit rate of 7.45 kb/s. Full Rate Mode 0 is a similar coder, with a higher rate used for fixed excitation coding resulting in a rate of 11.85 kb/s.

Half Rate Mode 1 operates at a bit rate of 5.15 kb/s. It uses a source-controlled multimodal CELP coder where each input speech frame is classified into one of two source coding modes based on a voiced/unvoiced decision. The voiced mode is coded in the same way as in the other codec modes. In the unvoiced mode, no adaptive codebook is used since unvoiced signals do not contain a periodic component. The fixed excitation is encoded with a stochastic codebook, using gain-matched analysis-by-synthesis [4]. The fixed excitation gain is coded using the same codebook as the one used in the voiced mode. The mode information is not transmitted explicitly, but is signaled using a reserved value of the pitch lag of the first subframe in each frame. Since unvoiced frames in Half Rate Mode 1 require fewer source coding bits than voiced frames, the excess bits are reserved for future use. Table 2 illustrates the source bit allocation in all four codec modes.

Parameter           Half Rate 0   Half Rate 1   Half Rate 1   Full Rate 0   Full Rate 1
                                   (voiced)     (unvoiced)
LPC                     26            26            26            26            26
Pitch Lags              23            13             8            23            23
Fixed Excitation        72            50            24           160            72
Gains                   28            14            14            28            28
Total bits/frame       149           103            72           237           149
Rate (bits/s)         7450          5150          3600         11850          7450

Table 2: Bit Allocations

4. CHANNEL CODING AND ERROR CONCEALMENT

The channel coding for each codec mode uses rate-compatible punctured convolutional codes, as well as a CRC protecting the most important bits in each frame. In each codec mode, the source bits are divided into two or three classes, numbered 0, 1, and 2 in order of decreasing perceptual importance. Bits in class 0 include the first two stages of the LSF MSVQ, the most significant bits of the pitch lags and the codebook gains, as well as the in-band signaling of the codec mode command and channel measurement described in Section 5. The class 0 bits are encoded with the highest bit rate (punctured) convolutional code and are also protected by the 7-bit CRC, which acts as a parity check. When the CRC signals a bad frame, all of the previous frame’s parameters are repeated and muted, with the exception of the fixed excitation indices which are still decoded from the bit stream. The class 1 and 2 bits are coded with (punctured) convolutional codes with lower bit rates.

5. SIGNALING AND LINK ADAPTATION

An overview of the AMR coding system, including both mobile station and base station, is shown in Figure 1. In general, adaptation depends on the current state of the communication channel. Since channel estimation is done at the decoder, the receiver needs to signal to the encoder through the reverse link some information needed for mode selection. The rate control mechanism varies depending on the direction of transmission, due to a constraint that the codec mode control mechanism must be located in the base station.

[Figure 1. Overview of AMR Coding Scheme. Block diagram of the MOBILE and BASE stations: speech/channel encoding and decoding on the UP-LINK CHANNEL and DOWN-LINK CHANNEL (speech in, encoded bits, soft bits, speech out), up-link and down-link channel analysis, up-link and down-link mode selection, and the signaled down-link channel measurement, up-link codec mode and down-link codec mode.]

5.1. Channel Analysis and Mode Selection

The adaptation algorithm is based on the channel measurement which is an estimate of the carrier to interference ratio (C/I). This estimate is based on the soft-values for the received bits as provided by the demodulator/equalizer. These values are good indicators of the reliability of the bits. We have found that a moving average of the absolute values of the soft bits is a good estimator of the current C/I of the channel. Codec mode decisions are made by comparing this moving average value to a predetermined threshold, and by using additional hysteresis rules designed to ensure smoother codec mode transitions. Because of their different characteristics, the full rate and half rate channels require the various parameters of the adaptation mechanism to be tuned separately.

5.2. In-Band Signaling

The signaling of all information needed for codec mode adaptation is done in-band, using some of the bits normally available for source and channel coding. Adaptation requires the transmission of two different kinds of information: a codec mode command sent from base station to mobile via the downlink channel, and channel measurement information sent from mobile to base station via the uplink channel.

For uplink transmission, the base station monitors the channel condition and decides which mode the mobile station should use. The base station communicates this information in the form of a codec mode command, transmitted in the downlink. Upon reception, the mobile station encoder switches to the indicated mode.

The objective of our codec mode transmission scheme is to send the mode information accurately and frequently enough to make the adaptation mechanism work effectively, but using as few bits as possible to minimize overhead. We have chosen to send the codec mode command by means of a variable-length code, using one information bit per frame. This variable length code is shown in Table 3. This table applies to full rate and half rate modes separately. Since this signaling bit is important for reliable operation, it is included in the class 0 bits of the channel coding. Notice that in addition to the two AMR modes, the codec mode command can also signal switching to any number of extended modes, which include the existing GSM standards, as well as future options such as wideband coding.

Codec Mode          Code Word
AMR 0               0
AMR 1               10
AMR Wideband 0      110
AMR Wideband 1      1110
GSM FR              11110
GSM EFR             111110
GSM HR              1111110

Table 3: Variable Length Codec Mode Command

For downlink transmission, based on the received bits and possibly other information that may be available, the mobile station computes a downlink channel measurement which is representative of the state of the channel. The mobile station cannot autonomously decide which mode to use. Hence, this measurement is quantized and transmitted back on the uplink to the base station. This is done in-band using one-bit delta modulation. The base station then decides which codec mode it will use for the downlink transmission of the next frame.

5.3. Mode Information

One problem in designing an AMR coder is that the channel decoder must know which mode has been used to encode a given frame before it can successfully decode it. We have chosen to solve this problem by explicit transmission of the codec mode index as a header to the channel bitstream for each frame. There are only two codec modes in each channel mode, so only one bit is required to transmit this index. Since this bit is not protected by channel coding, a 3-bit repetition code is used to provide robustness to bit errors.

To handle the extended modes shown in Table 3, a “codec mode beacon” is also sent with each frame, both up- and down-link. This beacon uses a variable length code to signal the mode used to code the current frame, including extended modes. The beacon is also sent using one channel bit per frame, and the variable-length code is the same as the one used to code the codec mode command. Since this bit goes into the channel unprotected, the decoder must wait for multiple frames of new beacon mode information before switching to a different extended codec mode.

6. LISTENING TEST RESULTS

The coder was extensively tested in accordance with the GSM AMR qualification test plan by an independent laboratory. Both full rate and half rate coders were tested in four experiments. All tests were done using the Mean Opinion Score (MOS) rating scale, except for the background noise tests which were scored on the degradation MOS (DMOS) rating scale. In these experiments, a distinction was made between static and dynamic error conditions. In static tests, for each condition in the test, the C/I ratio of the channel was held constant. Here, each codec mode of the AMR candidate was tested and the score for the AMR coder in a given condition was taken to be the score of the best codec mode in that condition. In dynamic tests, realistic yet challenging communication scenarios were simulated, resulting in error conditions where the C/I ratio varies drastically during a 50-second time interval.

A subset of the static test results are summarized in Tables 4 and 5. All the scores in the tables are for flat, clean speech, except for the tandem condition where the input material was IRS filtered. In the full rate channel, it can be seen that the AMR candidate is essentially equivalent to the GSM Enhanced Full Rate (EFR) in the clean channel and in tandem, but that it easily outperforms it in degraded channels, thanks to the large amount of channel protection available in Full Rate Mode 1. In fact, the AMR candidate is equivalent to the 16 kb/s ITU G.728 standard for both C/I=10 dB and C/I=7 dB.

Condition       AMR FR    EFR     G.728
No Errors        4.21     4.42    4.06
C/I = 10 dB      4.19     3.79     -
C/I = 7 dB       3.94     3.35     -
C/I = 4 dB       3.48     1.81     -
Tandem           4.00     3.98     -

Table 4: Clean Speech in Full Rate Channel

In the half rate channel, our AMR coder provides high quality for clean speech as demonstrated by the fact that it is statistically equivalent to G.728 for a single encoding and to G.729 in tandem. In all three error conditions listed below, our AMR coder still provides adequate performance as it is at least statistically equivalent to the GSM Full Rate coder.

Condition       AMR HR    GSM FR    G.728    G.729
No Errors        4.08       -       4.27      -
C/I = 10 dB      3.60      3.52      -        -
C/I = 7 dB       2.90      3.10      -        -
C/I = 4 dB       2.13      1.75      -        -
Tandem           3.38      3.35      -       3.56

Table 5: Clean Speech in Half Rate Channel

Static tests were also performed for two different types of acoustic background noise, namely street and car noise, with flat source material. These tests showed similar performance improvement for the AMR candidate as compared to the non-adaptive reference coders.

The dynamic test results are tabulated below for the full rate and half rate channels. Each row corresponds to one of five simulated channel scenarios. It can be seen that in the Full Rate channel the AMR candidate significantly outperforms EFR for the same channel, sometimes by as much as 1.4 on the MOS scale. This clearly shows the advantage of dynamic adaptation in changing channel conditions. The half rate AMR candidate, on the other hand, is essentially equivalent to GSM FR at half the bit rate.

Condition       AMR FR    EFR     ∆MOS
Dynamic EP 1     4.29     3.67    +0.62
Dynamic EP 2     4.21     3.73    +0.48
Dynamic EP 3     3.86     3.01    +0.85
Dynamic EP 4     4.25     3.61    +0.64
Dynamic EP 5     4.16     2.75    +1.41

Table 6: Dynamic Conditions in Full Rate Channel

Condition       AMR HR    GSM FR    ∆MOS
Dynamic EP 1     3.63      3.56     +0.07
Dynamic EP 2     3.55      3.59     -0.04
Dynamic EP 3     2.92      2.68     +0.24
Dynamic EP 4     3.41      3.56     -0.15
Dynamic EP 5     2.94      2.78     +0.16

Table 7: Dynamic Conditions in Half Rate Channel

7. CONCLUSIONS

We have developed a complete AMR solution for both full-rate and half-rate GSM channels. Extensive formal testing has shown that this coder is clearly superior to non-adaptive reference coders for realistic channel conditions and meets the GSM AMR qualification requirements.

8. REFERENCES

[1] A. Gersho and E. Paksoy, “Variable Rate Speech Coding”, in Proceedings of the Seventh European Signal Processing Conference, 1994, Edinburgh.
[2] W. LeBlanc, C. Liu, V. Viswanathan, “An Enhanced Full Rate Speech Coder for Digital Cellular Applications”, IEEE International Conference on Acoustics, Speech and Signal Processing, Volume 1, pp. 569-572, 1996, Atlanta.
[3] A. McCree and J. C. De Martin, “A 1.7 kb/s MELP Coder with Improved Analysis and Quantization”, IEEE International Conference on Acoustics, Speech and Signal Processing, Volume 2, pp. 593-596, 1998, Seattle.
[4] E. Paksoy, A. McCree and V. Viswanathan, “A Variable-Rate Multimodal Speech Coder With Gain-Matched Analysis-by-Synthesis”, IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 751-754, 1997, Munich.
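The source/channel trade-off expressed by Table 1 can be made concrete with a few lines of arithmetic. The Python sketch below copies the four rate pairs from Table 1 and computes, for each codec mode, the gross rate and the fraction of it spent on source coding. The dictionary and function names are illustrative only and are not part of the coder.

```python
# Rates in kb/s, copied from Table 1: (source, channel + signaling).
CODEC_MODES = {
    "Half Rate Mode 0": (7.45, 3.95),
    "Half Rate Mode 1": (5.15, 6.25),
    "Full Rate Mode 0": (11.85, 10.95),
    "Full Rate Mode 1": (7.45, 15.35),
}


def total_rate(mode):
    """Gross channel rate of a codec mode in kb/s."""
    source, protection = CODEC_MODES[mode]
    return source + protection


def source_share(mode):
    """Fraction of the gross rate that is spent on source coding."""
    source, _ = CODEC_MODES[mode]
    return source / total_rate(mode)


if __name__ == "__main__":
    for mode in CODEC_MODES:
        print(f"{mode}: {total_rate(mode):.2f} kb/s total, "
              f"{source_share(mode):.0%} source coding")
```

Within each channel, Mode 1 gives up source bits in exchange for more protection, which is the behaviour the introduction describes for poor channels.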
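The mode decision of Section 5.1 (a moving average of soft-bit magnitudes compared to a threshold, with hysteresis) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the submitted algorithm: the class name, the per-frame averaging, the window length, the threshold, the symmetric hysteresis margin and the initial mode are all hypothetical, and the paper does not give the actual values or rules. The sketch also assumes that Mode 0, which carries the higher source rate in Table 1, is the mode chosen for good channels.

```python
from collections import deque


class ModeSelector:
    """Hypothetical two-mode selector driven by soft-bit magnitudes."""

    def __init__(self, window, threshold, hysteresis):
        self.history = deque(maxlen=window)  # per-frame mean |soft bit|
        self.threshold = threshold           # assumed decision threshold
        self.hysteresis = hysteresis         # assumed symmetric margin
        self.mode = 1                        # assumption: start in the robust mode

    def update(self, soft_bits):
        """Feed one frame of soft bits and return the selected codec mode."""
        self.history.append(sum(abs(b) for b in soft_bits) / len(soft_bits))
        quality = sum(self.history) / len(self.history)  # moving average
        if self.mode == 1 and quality > self.threshold + self.hysteresis:
            self.mode = 0   # channel looks clean: spend more bits on speech
        elif self.mode == 0 and quality < self.threshold - self.hysteresis:
            self.mode = 1   # channel looks poor: spend more bits on protection
        return self.mode
```

With the margin in place, a quality estimate that hovers around the threshold does not toggle the mode on every frame, which is the purpose the text gives for the hysteresis rules.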
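The entries of Table 2 follow from the frame structure given in Section 3: a 20 ms frame, an 8-bit first pitch lag followed by 5-bit differential lags, and one 7-bit gain codebook index per subframe. The Python sketch below reproduces that arithmetic for the modes that use the adaptive codebook. The function names are illustrative, and the unvoiced mode of Half Rate Mode 1 is left out because it uses no adaptive codebook.

```python
FRAME_MS = 20        # frame length for all source coders
FIRST_LAG_BITS = 8   # first pitch lag in each frame
DELTA_LAG_BITS = 5   # each remaining lag, coded differentially
GAIN_VQ_BITS = 7     # joint gain codebook index, one per subframe
LPC_BITS = 26        # 4-stage LSF MSVQ, once per frame


def pitch_lag_bits(subframes):
    """Pitch lag bits per frame for a given number of subframes."""
    return FIRST_LAG_BITS + DELTA_LAG_BITS * (subframes - 1)


def gain_bits(subframes):
    """Gain bits per frame for a given number of subframes."""
    return GAIN_VQ_BITS * subframes


def frame_bits(subframes, fixed_excitation_bits):
    """Total source bits per frame for a mode with an adaptive codebook."""
    return (LPC_BITS + pitch_lag_bits(subframes)
            + fixed_excitation_bits + gain_bits(subframes))


def bit_rate(bits_per_frame):
    """Source bit rate in bits/s for 20 ms frames."""
    return bits_per_frame * 1000 // FRAME_MS
```

For example, `frame_bits(4, 72)` gives the per-frame total of the 7.45 kb/s coder, and `bit_rate` converts any per-frame total in Table 2 to its rate.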
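The 3-bit repetition code that protects the codec mode index in Section 5.3 can be illustrated with the short Python sketch below. The paper states only that a 3-bit repetition code is used; decoding by hard-decision majority vote is an assumption made here for illustration, and the function names are hypothetical.

```python
def encode_mode_index(mode_bit):
    """Repeat the one-bit codec mode index three times."""
    return [mode_bit] * 3


def decode_mode_index(received_bits):
    """Recover the mode index by majority vote over the three received bits."""
    return 1 if sum(received_bits) >= 2 else 0
```

Under this assumption a single bit error in the three-bit header still yields the correct mode index, while two or more errors do not.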
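The code words of Table 3 share a simple pattern: the k-th entry is k ones followed by a terminating zero. The Python sketch below encodes and decodes commands on that basis. It is an illustration of the table only; the function names are hypothetical, and how the actual coder frames, resynchronizes or validates the bit stream is not described in the paper.

```python
# Modes in Table 3 order; entry k is sent as k ones followed by a zero.
MODES = ["AMR 0", "AMR 1", "AMR Wideband 0", "AMR Wideband 1",
         "GSM FR", "GSM EFR", "GSM HR"]


def encode_command(mode):
    """Return the Table 3 code word for a mode name, as a string of bits."""
    return "1" * MODES.index(mode) + "0"


def decode_commands(bits):
    """Decode a stream of signaling bits (one per frame) into mode names."""
    commands, ones = [], 0
    for bit in bits:
        if bit == "1":
            ones += 1
            if ones >= len(MODES):
                raise ValueError("more leading ones than any Table 3 code word")
        else:
            commands.append(MODES[ones])  # a zero terminates the code word
            ones = 0
    return commands
```

Since one signaling bit is sent per frame, a code word of n bits takes n frames to deliver in this reading, so the two AMR modes, which have the shortest code words, are also the quickest to signal.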
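The one-bit delta modulation used to return the downlink channel measurement (Section 5.2) can be sketched generically in Python: each transmitted bit tells the receiver to move its running estimate up or down by one step. The fixed step size, the starting value and the function names below are assumptions for illustration; the paper does not specify them.

```python
def delta_encode(measurements, step, start=0.0):
    """Mobile side: one bit per frame, 1 = step up, 0 = step down."""
    bits, estimate = [], start
    for value in measurements:
        bit = 1 if value >= estimate else 0
        estimate += step if bit else -step  # mirror the receiver's tracking
        bits.append(bit)
    return bits


def delta_decode(bits, step, start=0.0):
    """Base station side: rebuild the tracked measurement from the bits."""
    estimate, track = start, []
    for bit in bits:
        estimate += step if bit else -step
        track.append(estimate)
    return track
```

Because both ends apply the same update, the base station's reconstruction matches the encoder's internal estimate as long as the signaling bits arrive without error.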
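The ∆MOS columns of Tables 6 and 7 are the AMR score minus the reference score for each dynamic error pattern. The Python sketch below recomputes them from the table entries and also averages them over the five patterns. The average is a derived convenience figure rather than a result reported in the paper, and the variable and function names are illustrative.

```python
# (AMR score, reference score) per dynamic error pattern, from Tables 6 and 7.
FULL_RATE = [(4.29, 3.67), (4.21, 3.73), (3.86, 3.01), (4.25, 3.61), (4.16, 2.75)]
HALF_RATE = [(3.63, 3.56), (3.55, 3.59), (2.92, 2.68), (3.41, 3.56), (2.94, 2.78)]


def delta_mos(scores):
    """Per-condition difference, AMR minus reference, rounded to 2 decimals."""
    return [round(amr - ref, 2) for amr, ref in scores]


def mean_delta_mos(scores):
    """Average difference over all conditions (derived, not from the paper)."""
    return sum(amr - ref for amr, ref in scores) / len(scores)
```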