
AMR Call Quality Measurement Based on ITU-T P.862.1 PESQ-LQO


Ming-Ju Ho and Ayman Mostafa
Cingular Wireless
5565 Glenridge Connector
Atlanta, GA, 30342

Abstract- Adaptive Multi-Rate (AMR) codecs are widely used in most GSM and W-CDMA networks. This document describes AMR call quality measurement results: the PESQ-LQO (ITU-T P.862.1) based Mean Opinion Score (MOS) versus several key RF parameters in a controlled lab test environment. Frequency hopping and Rayleigh fading were included. The codecs evaluated were Full Rate 12.2, 7.4, 5.9, 4.75 and Half Rate 7.4, 5.9, 4.75. The RF parameters studied were C/I, RXQUAL and FER. MOS scores were collected via a tool based on the ITU-T P.862.1 PESQ-LQO algorithm. The Radio Link Timer (RLT) was set to 64 and the Transmit Power Control (TPC) and Discontinuous Transmission (DTX) mechanisms were turned off for all fixed codec cases.

I. INTRODUCTION

Speech quality is a complex psycho-acoustic phenomenon within the process of human perception. It is generally expressed as a Mean Opinion Score (MOS) based on the subjective listening tests identified in ITU-T Recommendation P.800 [1]: the average of many individual opinions obtained from a number of listeners, based on their experiences and expectations regarding voice communication. It is one of the most important elements of voice network measurements. ITU-T P.862.1 (PESQ-LQO) [2] provides a uniform 3rd order mapping function from the raw P.862 PESQ (Perceptual Evaluation of Speech Quality) algorithm score to the Listening Quality Objective (LQO) Mean Opinion Score.

Rapidly developing network elements such as low-bit-rate speech codecs, compression circuits, voice activity detectors, comfort noise generators, adaptive level control, speech enhancers (e.g. echo cancellation, noise reduction) and other network circuits all influence voice quality. Due to the effects of one or more of these network elements, a speech sample may suffer from a variety of degradations including clipping (time, amplitude), delay/latency/jitter (fixed, variable delay), frequency shift, distortion, noise interference and channel errors. Voice quality measurement is more challenging for systems deploying Adaptive Multi-Rate (AMR) speech codecs, including Full Rate (FR) and Half Rate (HR). The AMR codec concept is popular because it can be tailored to the specific needs of network operators. The AMR radio resource algorithm adjusts the codec dynamically to extend coverage by operating at a lower Carrier-to-Interference ratio (C/I) with a robust FR codec, and to increase capacity by operating in HR when C/I is higher. Due to varying RF channel conditions, it is very difficult to collect sufficient speech measurement samples under low C/I conditions from field measurements. A controlled lab test environment is necessary to evaluate the relationships between MOS based on the ITU-T P.862.1 PESQ-LQO algorithm and various RF parameters.

The focus is to analyze the correlations between MOS, which is close to customer-perceived quality, and several key performance parameters: C/I, RXQUAL and FER. The paper is organized as follows: Section II discusses the lab test setup and test scenarios. Section III presents the test results of MOS vs. RXQUAL. Test results of MOS vs. FER are discussed in Section IV. Section V addresses the relationship between C/I and MOS. Finally, concluding remarks are drawn in Section VI.

II. TEST SETUP

Figure 1 shows the actual lab test setup. Two GSM base stations were used to support this test: one was configured as the desired base station equipped with two radios (one for BCCH and one for TCH); the other was used to simulate interference by transmitting BCCH on all hopping frequencies. The frequency hopping included 5 frequencies spanning a 2 MHz bandwidth.

- Desired base station:
  • BCCH: CH #670 (1961.8 MHz)
  • TCH: 5 hoppers (sequential hopping): CH #672, #674, #676, #678 and #680
  • All BCCH TRX TCH time slots locked
  • All but one hopping TRX locked
  • In the unlocked hopping TRX, all but one time slot locked
  • MA list including all 5 frequencies used by the 5 interfering BCCHs
- Interfering base station:
  • BCCH 1: CH #672
  • BCCH 2: CH #674
  • BCCH 3: CH #676
  • BCCH 4: CH #678
  • BCCH 5: CH #680

The MA list includes all 5 frequencies used by the 5 interfering BCCHs. The RLT was set to 64 to maintain the call connection as long as possible. Transmit Power Control (TPC) and Discontinuous Transmission (DTX) functions were turned off for all fixed codec cases. Only downlink MOS measurement is performed here due to the complexity to

U.S. Government work not protected by U.S. Copyright


generate uplink interference. A 16-bit wave (WAV) file was transmitted from the LCG (Local Call Generator) to the test handset. MOS scoring was done per the P.862.1 PESQ-LQO algorithm. The TU3 propagation model was used in this test with the following settings configured in the RF channel emulator:
- Path 1, delay = 0 µs, loss = 3 dB
- Path 2, delay = 0.2 µs, loss = 0 dB
- Path 3, delay = 0.6 µs, loss = 2 dB
- Path 4, delay = 1.6 µs, loss = 6 dB
- Path 5, delay = 2.4 µs, loss = 8 dB
- Path 6, delay = 5 µs, loss = 10 dB
- Fading: Rayleigh, classic6
- No log-normal shadowing

This work focuses on the following popular codecs:
- Full Rate 12.2
- Full Rate 7.4
- Full Rate 5.9
- Full Rate 4.75
- Half Rate 7.4
- Half Rate 5.9
- Half Rate 4.75

Two spectrum analyzers were used to monitor the C and I signals from directional coupler ports. C/I was adjusted by changing the output attenuation of the RF channel emulator. In each test scenario, the average C/I was adjusted from 20 dB down to 1 dB or until the call dropped continuously. The adjustment followed this sequence: 20-17-14-12-10-9-8-7-6-5-4-3-2-1. A MOS scoring sample was available every 6 to 8 seconds; RXQUAL, FER and C/I outputs were available every half second (per SACCH message). In each test case more than 1000 MOS samples were collected over about three hours of test time.

Figure 1 Lab test setup.
[Block diagram. Labeled components: Local Call Generator (LCG), PSTN, MSC/BSC, GSM BTS with BCCH (f1) and TCH (f2-f7 hopping), GSM BTS with BCCH (f2) to BCCH (f6), 30 dB attenuators, duplexers, 2-way and 8-way splitter/combiners, circulators, TAS 4500 channel emulator, LNA, spectrum analyzers, shielded box with test handset, PESQ-LQO laptop.]

Table 1 Lab test equipment and components.

Item                        Model/Version                 Quantity
30 dB attenuator            BIRD 75-A-MFN-30              7
Duplexer                    K&L WSD00032                  2
Circulator                  Ferrocom 20A100-41            2
Terminator                  Weinschel M1418               2
RF channel emulator         Spirent TAS 4500FLEX          1
LNA                         Mini-Circuits ZHL-1724HLN     1
2-way splitter/combiner     Mini-Circuits ZAPD-2          1
8-way splitter/combiner     Mini-Circuits                 1
Shielded box                Rohde & Schwarz CMU-Z11       1
Spectrum analyzer           Agilent 8594E                 2
Handset                     Nokia 6230                    1
SoundBlaster card           Creative                      1
XCAL-W 2.40.00              S/W: 2.40.00 (1013)           1
LCG (Land Call Generator)   LCG-H10, S/W: 2.35.009        1
GSM BTS                     Nokia UltraSite               2

III. MOS vs. RXQUAL

There is no direct relationship between the MOS and Received Signal Quality (RXQUAL):
(1) RXQUAL is a coarse Bit Error Rate (BER) indication over a 480 ms SACCH multiframe.
(2) The BER is evaluated before channel decoding.
However, RXQUAL is still a very popular quality parameter among network operators because it is widely available and its accuracy is consistent across different OEM vendors, even though the standard does not clearly specify how BER is estimated. Because MOS samples (every 6 to 8 seconds) and RXQUAL samples (every half second) are reported independently and are not synchronized, there can be 12 to 16 RXQUAL samples for each MOS scoring sample. In Figure 3, each circle data point represents a MOS sample and its associated RXQUAL average. The RXQUAL average was computed on a linear scale where, for example, one RXQUAL_0 sample and one RXQUAL_1 sample yield an average RXQUAL of 0.5. All data samples are then sorted into eight RXQUAL bins based on the following rules:
- RXQUAL 0: avg. RXQUAL < 0.5
- RXQUAL 1: 0.5 <= avg. RXQUAL < 1.5
- RXQUAL 2: 1.5 <= avg. RXQUAL < 2.5
- RXQUAL 3: 2.5 <= avg. RXQUAL < 3.5
- RXQUAL 4: 3.5 <= avg. RXQUAL < 4.5
- RXQUAL 5: 4.5 <= avg. RXQUAL < 5.5
- RXQUAL 6: 5.5 <= avg. RXQUAL < 6.5
- RXQUAL 7: avg. RXQUAL >= 6.5
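The RXQUAL averaging and binning rule from Section III can be sketched as follows. This is a minimal illustration, not the measurement tool's implementation: the function names and the (MOS, RXQUAL list) record layout are assumptions made for the example.

```python
import math
from collections import defaultdict


def rxqual_bin(rxqual_samples):
    """Return (linear average, bin index 0..7) for one MOS sample's RXQUAL reports."""
    if not rxqual_samples:
        raise ValueError("need at least one RXQUAL sample")
    avg = sum(rxqual_samples) / len(rxqual_samples)
    # Bin k covers k - 0.5 <= avg < k + 0.5; bins 0 and 7 are open-ended.
    return avg, min(7, max(0, math.floor(avg + 0.5)))


def mean_mos_per_bin(records):
    """records: iterable of (mos, rxqual_samples) pairs (hypothetical layout).

    Returns {bin index: mean MOS}, the quantity plotted per codec in Figure 4.
    """
    grouped = defaultdict(list)
    for mos, rxqual_samples in records:
        _, bin_index = rxqual_bin(rxqual_samples)
        grouped[bin_index].append(mos)
    return {b: sum(v) / len(v) for b, v in sorted(grouped.items())}
```

With one RXQUAL_0 and one RXQUAL_1 report the average is 0.5, which the rule places in bin 1 because the lower bound of each bin is inclusive.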


Figure 3 An example of binning data samples into RXQUAL bins.
[Scatter plot for FR 7.4: MOS (1 to 4.5) vs. RXQUAL (0 to 7).]

Figure 5 Standard deviation of MOS vs. RXQUAL.
[Plot: standard deviation of MOS (0.0 to 0.9) vs. RXQUAL (0 to 7) for FR 4.75, FR 5.9, FR 7.4, FR 12.2, HR 4.75, HR 5.9 and HR 7.4.]

Figures 4 and 5 plot the average and standard deviation of the MOS for each RXQUAL bin for the different codecs. Figure 4 shows a difference of about 0.5 on the MOS scale between the full rate and the half rate codecs at RXQUAL_7. The average MOS is less than 3.0 when the RXQUAL is 4 or higher for all three half rate codecs tested here. The standard deviation values shown in Figure 5 indicate the spread of the data around the average (mean) values shown in Figure 4. The points at RXQUAL_0 should be ignored because there are not sufficient data points (the objective of the test is to focus on the region where quality degrades, so much less time was spent on the low RXQUAL categories). For the full rate codecs except 12.2, the standard deviation does not increase until the RXQUAL value reaches 5 or higher. For the less robust half rate codecs, the standard deviation increases significantly at RXQUAL_4 and then decreases as RXQUAL reaches 6 or 7 because the MOS score is already very poor.

Figure 4 RXQUAL vs. average MOS.
[Plot titled "avg MOS vs. RXQUAL (TU3, 5 hoppers)": Avg. MOS (PESQ P862.1), 1 to 4.5, vs. RXQUAL, 0 to 7, for FR 12.2, FR 7.4, FR 5.9, FR 4.75, HR 7.4, HR 5.9 and HR 4.75.]

IV. MOS vs. FER

Frame erasure rate (FER) is measured after CRC and is more closely related to speech quality. When a speech frame erasure happens, the result can be anything from brief audible pops and clicks to completely unintelligible intervals of noise or even a dropped call. However, downlink FER information is usually not available unless Enhanced Measurement Report (EMR) capable handsets are deployed. Unlike RXQUAL, which the standard [3] already defines in 8 categories (RXQUAL_0 to RXQUAL_7), FER has no bin definition or requirement. We can derive trend lines from the data points directly. Figure 6 shows the data points of an FR 4.75 codec test case. Each data point represents one MOS score and the average of all FER samples associated with it. Figure 6 also illustrates the methodology for evaluating the relationship between the MOS and FER.
• A single trend line is very difficult to fit to the data points when the range is very large, so multi-curve fitting is used.
• Trend line #1 fits data points whose FER is less than or equal to 10% and whose MOS is greater than or equal to 2.0.
• Trend line #2 fits data points whose FER is greater than 10% and whose MOS is less than or equal to 2.4.
• Data points that do not fall into either of the above two regions are considered "irregularities" and are discarded.
Tables 2 and 3 list the coefficients of determination for different types of trend lines. The coefficient of determination r^2 is a measure of the correlation between the dependent and independent variables in a regression analysis. It gives the proportion of the variance (fluctuation) of one variable that is predictable from the other variable, 0 <= r^2 <= 1. For example, if r = 0.922, then r^2 = 0.850, which means that 85% of the total variation in y can be explained by the linear relationship between x and y (as described by the regression equation). The other 15% of the total variation in y remains unexplained.
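The region rules and the coefficient of determination described in this section can be sketched as below. This is an illustrative sketch only: the function names are hypothetical, and a simple straight-line least-squares fit stands in for the polynomial and power trend lines used in the paper, which would come from a curve-fitting tool.

```python
def classify(fer_percent, mos):
    """Assign a (FER, MOS) data point to trend line 1, trend line 2, or discard."""
    if fer_percent <= 10.0 and mos >= 2.0:
        return "trend1"
    if fer_percent > 10.0 and mos <= 2.4:
        return "trend2"
    return "discard"


def linear_fit(xs, ys):
    """Least-squares straight line y = slope * x + intercept (stand-in model)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r_squared(ys, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(ys) / len(ys)
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    if ss_tot == 0:
        raise ValueError("r^2 is undefined when all y values are equal")
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, predicted))
    return 1.0 - ss_res / ss_tot
```

Only points classified as "trend1" or "trend2" would be passed to the corresponding fit; points classified as "discard" are left out, following the bullet list above.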
Generally, a higher-order polynomial provides a better fit. However, the equation is more complicated and sometimes the improvement is very limited. From Table 2, we believe polynomial trend lines of order 3 provide sufficient correlation for data points whose FER is ≤ 10% and MOS is ≥ 2.0. From Table 3, we choose power trend lines for data points whose FER is ≥ 10% and MOS is ≤ 2.4. The equations of the trend lines are listed in Table 4. The relations between MOS and FER based on the trend lines are plotted in Figures 7 and 8.
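As a rough illustration, the Table 4 trend-line equations can be evaluated as a piecewise function of FER. The coefficients below are copied from Table 4; the dictionary layout and the function name are assumptions made for this sketch.

```python
# Per codec: cubic coefficients (a3, a2, a1, a0) for FER <= 10 %,
# then power-law (scale, exponent) for FER > 10 %. Values from Table 4.
TREND_LINES = {
    "FR 12.2": ((-0.0009, 0.0265, -0.3474, 4.2007), (5.3431, -0.3812)),
    "FR 7.4":  ((-0.0013, 0.0291, -0.3301, 3.9861), (4.7186, -0.3464)),
    "HR 7.4":  ((-0.0006, 0.0180, -0.2904, 3.9909), (5.1599, -0.3692)),
    "FR 5.9":  ((-0.0015, 0.0364, -0.3556, 3.7615), (4.4535, -0.3406)),
    "HR 5.9":  ((-0.0011, 0.0293, -0.3198, 3.7103), (5.0286, -0.3716)),
    "FR 4.75": ((-0.0009, 0.0262, -0.2885, 3.4971), (3.1844, -0.2225)),
    "HR 4.75": ((-0.0004, 0.0186, -0.2663, 3.4638), (4.6561, -0.3504)),
}


def mos_from_fer(codec, fer_percent):
    """Estimate MOS from FER (%) using the fitted trend line for the codec."""
    if fer_percent < 0:
        raise ValueError("FER must be non-negative")
    (a3, a2, a1, a0), (scale, exponent) = TREND_LINES[codec]
    if fer_percent <= 10.0:
        x = fer_percent
        return a3 * x**3 + a2 * x**2 + a1 * x + a0
    return scale * fer_percent**exponent


if __name__ == "__main__":
    for fer in (0, 2, 5, 10, 20, 40):
        print(fer, round(mos_from_fer("FR 12.2", fer), 2))
```

The two pieces were fitted to separate data regions, so they are not constrained to meet at FER = 10%; a small step at that boundary is expected.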

Figure 6 MOS vs. FER for FR 4.75 codec.
[Scatter plot: MOS (1.2 to 3.8) vs. FER (0 to 30%), with Trendline 1, Trendline 2 and two discarded areas marked.]

Figure 7 MOS vs. FER for FER ≤ 10% based on the polynomial trend lines with order of 3.
[Plot: MOS (2.2 to 4.2) vs. FER (0 to 10%) for FR 12.2, FR 7.4, HR 7.4, FR 5.9, HR 5.9, FR 4.75 and HR 4.75.]
Table 2 Coefficients of determination for different types of trend lines fitted to the data points whose FER is ≤ 10% and MOS is ≥ 2.0.

Codec     poly-2  poly-3  poly-4  poly-5  poly-6  exp   linear
FR 12.2   0.77    0.78    0.78    0.78    0.78    0.73  0.74
FR 7.4    0.80    0.80    0.81    0.81    0.81    0.78  0.78
HR 7.4    0.87    0.87    0.87    0.87    0.87    0.85  0.85
FR 5.9    0.78    0.79    0.79    0.79    0.79    0.73  0.74
HR 5.9    0.85    0.86    0.86    0.86    0.86    0.83  0.82
FR 4.75   0.75    0.75    0.75    0.75    0.75    0.71  0.71
HR 4.75   0.81    0.81    0.81    0.81    0.81    0.76  0.76

Table 3 Coefficients of determination for different types of trend lines fitted to the data points whose FER is ≥ 10% and MOS is ≤ 2.4.

Codec     poly-2  poly-3  poly-4  poly-5  poly-6  exp   linear  power
FR 12.2   0.77    0.78    0.78    0.78    0.79    0.75  0.69    0.80
FR 7.4    0.52    0.52    0.52    0.52    0.52    0.54  0.51    0.54
HR 7.4    0.69    0.70    0.70    0.70    0.70    0.65  0.61    0.71
FR 5.9    0.44    0.44    0.45    0.45    0.45    0.44  0.42    0.46
HR 5.9    0.75    0.76    0.76    0.76    0.76    0.72  0.70    0.76
FR 4.75   0.15    0.16    0.16    0.16    0.16    0.16  0.15    0.15
HR 4.75   0.62    0.62    0.63    0.63    0.63    0.63  0.57    0.67

Table 4 Equations of the trend lines (x: FER (%), y: MOS).

Codec     x ≤ 10                                              x > 10
FR 12.2   y = -0.0009x^3 + 0.0265x^2 - 0.3474x + 4.2007       y = 5.3431x^(-0.3812)
FR 7.4    y = -0.0013x^3 + 0.0291x^2 - 0.3301x + 3.9861       y = 4.7186x^(-0.3464)
HR 7.4    y = -0.0006x^3 + 0.018x^2 - 0.2904x + 3.9909        y = 5.1599x^(-0.3692)
FR 5.9    y = -0.0015x^3 + 0.0364x^2 - 0.3556x + 3.7615       y = 4.4535x^(-0.3406)
HR 5.9    y = -0.0011x^3 + 0.0293x^2 - 0.3198x + 3.7103       y = 5.0286x^(-0.3716)
FR 4.75   y = -0.0009x^3 + 0.0262x^2 - 0.2885x + 3.4971       y = 3.1844x^(-0.2225)
HR 4.75   y = -0.0004x^3 + 0.0186x^2 - 0.2663x + 3.4638       y = 4.6561x^(-0.3504)

Figure 8 MOS vs. FER for FER > 10% based on the power trend lines.
[Plot: MOS (1.1 to 2.5) vs. FER (10 to 50%) for FR 12.2, FR 7.4, HR 7.4, FR 5.9, HR 5.9, FR 4.75 and HR 4.75.]

V. MOS vs. C/I

The relationship between MOS and C/I is very useful for selecting appropriate Link Adaptation (LA) thresholds; link adaptation is implemented in most live networks to provide better performance and capacity. In Reference [4], C/I vs. MOS was measured, but under clean speech and error conditions. The clean speech performance requirements were set for the best codec mode in each error condition. In Figure 8 (MOS vs. average C/I), all C/I measurement samples were sorted into 0.1 scale MOS bins and polynomial trend lines with order of 4 were plotted. The following LA thresholds were derived for the following codec set implementation: {FR 12.2, FR 7.4, FR 5.9, FR 4.75; HR 7.4, HR 5.9, HR 4.75}.
• Full Rate Threshold 1 = 9 dB
• Full Rate Threshold 2 = 12 dB
• Full Rate Threshold 3 = 15 dB
• Half Rate Threshold 1 = 15 dB
• Half Rate Threshold 2 = 16 dB
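One way to read these thresholds is as C/I switching points between adjacent codec modes, which can be sketched as below. The mapping is an assumption for illustration: the paper lists the threshold values but does not spell out which mode applies on each side of a threshold, and real link adaptation also applies hysteresis, which this sketch omits. The half rate thresholds are taken as 15 dB and 16 dB.

```python
from bisect import bisect_right

# Modes ordered from most robust (lowest bit rate) to least robust.
FR_MODES = ["FR 4.75", "FR 5.9", "FR 7.4", "FR 12.2"]
FR_THRESHOLDS_DB = [9.0, 12.0, 15.0]   # Full Rate thresholds 1..3
HR_MODES = ["HR 4.75", "HR 5.9", "HR 7.4"]
HR_THRESHOLDS_DB = [15.0, 16.0]        # Half Rate thresholds (15 dB and 16 dB)


def select_mode(ci_db, modes, thresholds_db):
    """Pick a codec mode for a given average C/I (dB).

    Assumed rule: below threshold 1 use the lowest-rate mode; at or above
    threshold k use the next higher-rate mode. No hysteresis is modelled.
    """
    if len(modes) != len(thresholds_db) + 1:
        raise ValueError("need exactly one more mode than thresholds")
    return modes[bisect_right(thresholds_db, ci_db)]


if __name__ == "__main__":
    for ci in (5, 9, 13, 20):
        print(ci, select_mode(ci, FR_MODES, FR_THRESHOLDS_DB))
```

Keeping the thresholds in a sorted list makes the lookup a single binary search and keeps the rule easy to change if a different codec set is deployed.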
Figure 8 MOS vs. average C/I (trend lines: polynomial order of 4).
[Plot titled "avg CIR vs. MOS": MOS (PESQ P862.1), 1 to 4.5, vs. C/I, 5 to 20 dB, for FR 12.2, FR 7.4, FR 5.9, FR 4.75, HR 7.4, HR 5.9 and HR 4.75, with thresholds FR Th1, FR Th2, FR Th3, HR Th1 and HR Th2 marked.]

VI. CONCLUSIONS

The relationships between ITU-T P.862.1-based MOS for AMR call quality and several important RF parameters (C/I, FER and RXQUAL) were evaluated via controlled lab measurements. Frequency hopping and Rayleigh fading conditions were included. The information can be used to build a quality metric for evaluating GSM AMR network performance.
(1) The average MOS falls below 3.0 at different RXQUAL levels for different codecs:
    - RXQUAL of 4 or higher for half rate codecs 7.4, 5.9, 4.75
    - RXQUAL of 5 or higher for full rate 12.2
    - RXQUAL of 6 or higher for full rate codecs 7.4, 5.9, 4.75
(2) Two trend lines were derived to represent the relationship between MOS and FER in two different FER regions. For the more interesting region, where FER is less than or equal to 10% and MOS is greater than or equal to 2.0, the coefficient of determination is around 0.8 (0.75 to 0.87 for the order-3 polynomial fits in Table 2).
(3) Based on the relationship between MOS and C/I, appropriate link adaptation thresholds can be selected.

ACKNOWLEDGMENT

The authors would like to thank Vincent Cordaro, Kaushik Gohel and Dipesh Shah of Cingular Wireless and Emilio Diarte of Nokia for their comments and continued support throughout this endeavor.

REFERENCES

[1] ITU-T Recommendation P.800: Methods and procedures for conducting subjective evaluation of transmission quality.
[2] ITU-T Recommendation P.862.1 (11/2003): Mapping function for transforming P.862 raw result scores to MOS-LQO.
[3] ETS 300 578 (GSM 05.08), "Radio subsystem link control," ETSI recommendation, 1997.
[4] 3GPP TR 26.975 V6.0.0 (2004-12): Performance characterization of the Adaptive Multi-Rate (AMR) speech codec (Release 6).
