The Performance of the ITU-T P.862.1 Standard (PESQ-LQO) on AMR Live Networks
2005-11-07
Ascom 2009. All rights reserved. TEMS is a trademark of Ascom. All other trademarks are the property of their respective holders. No part of this document may be reproduced in any form without the written permission of the copyright holder. The contents of this document are subject to revision without notice due to continued progress in methodology, design and manufacturing. Ascom shall have no liability for any error or damage of any kind resulting from the use of this document.
Rev A
Ascom 2009
Open
2005-11-07 The Performance of the ITU-T P.862.1 Standard (PESQ-LQO) on AMR Live Networks Technical Paper
Contents
Abstract
1. Myth and reality about the AMR codecs expected speech quality
2. Evaluation test of the ITU-T speech quality standard in AMR live network conditions
2.1. Test design
2.2. Evaluation procedure
2.3. Results of the PESQ algorithm performance on AMR live networks
2.4. Comments on the performance and the accuracy of the PESQ algorithm as implemented in TEMS Automatic
2.5. Comments on the AMR PESQ tuned solution
3. Conclusions
Abstract
A comprehensive test has been performed in order to evaluate the PESQ performance on AMR live networks. The test was needed because the speech quality standard had not previously been tested and validated on AMR live networks.
1. Myth and reality about the AMR codecs expected speech quality
The operability of 3G networks is complex. It has been demonstrated that some estimated performance levels described in 3GPP documents are met only with great difficulty, or at lower values than expected. The behavior of the AMR codec within a live UMTS network might be one of these cases. Designed to ensure high capacity, both the AMR-FR and, more importantly, the AMR-HR codecs are expected to deliver higher speech quality than the EFR codec, especially at low C/I values (below 6-8 dB). The 3GPP document [1] provides some informative speech quality values for both the AMR-FR and the AMR-HR codecs. These values were obtained from a single listening test on different simulated RF scenarios. The results presented in the 3GPP document are shown in Charts 1 and 2.
Vertical axis: MOS (1.0 to 5.0). Horizontal axis: codec mode (EFR, 12.2, 10.2, 7.95, 7.4, 6.7, 5.9, 5.15, 4.75). One curve per condition: No Errors and C/I = 16, 13, 10, 7, 4 and 1 dB.
Chart 1. Family of curves for Experiment 1a (Clean speech in Full Rate) (3GPP TR 26.975)
Vertical axis: MOS (1.0 to 5.0). Horizontal axis: codec mode (EFR, 7.95, 7.4, 6.7, 5.9, 5.15, 4.75, FR, HR). One curve per condition: No Errors and C/I = 19, 16, 13, 10, 7 and 4 dB.
Chart 2. Family of curves for Experiment 1b (Clean Speech in Half Rate) (3GPP TR 26.975)

It should be noted that the 3GPP document states that the subjective test is valid only for the database used and that another database could exhibit different results. This statement becomes especially important if the databases represent live network conditions instead of simulated conditions.

Important Note: MOS values are provided in these figures for information only. Mean Opinion Scores can only be representative of the test conditions in which they were recorded (speech material, speech processing, listening conditions, language, and cultural background of the listening subjects, etc.). Listening tests performed with other conditions than those used in the AMR Characterization phase of testing could lead to a different set of MOS results. On the other hand, the relative performance of different codecs under test conditions is considered more reliable and less impacted by cultural differences between listening subjects. Finally, it should be noted that a difference of 0.2 MOS between two test results was usually found not statistically significant (3GPP TR 26.975).

The ITU-T standard P.862/P.862.1 (PESQ-LQO) has been developed, tested, validated, and calibrated for all types of applications (wireless, VoIP, fixed networks) using different speech codecs.
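For orientation, P.862.1 is a mapping applied on top of the raw P.862 score. The following is a minimal sketch of that mapping, assuming the logistic function and coefficients published in ITU-T P.862.1; the function name is illustrative, and the coefficients should be checked against the Recommendation before use.

```python
import math


def pesq_raw_to_lqo(raw_score: float) -> float:
    """Map a raw P.862 PESQ score to MOS-LQO (P.862.1 logistic mapping).

    Coefficients are assumed to be those published in ITU-T P.862.1.
    """
    return 0.999 + (4.999 - 0.999) / (1.0 + math.exp(-1.4945 * raw_score + 4.6607))


# Raw PESQ scores span roughly -0.5 to 4.5.
for raw in (-0.5, 1.0, 2.5, 3.5, 4.5):
    print(f"raw PESQ {raw:4.1f} -> MOS-LQO {pesq_raw_to_lqo(raw):.2f}")
```

With these coefficients the raw range of -0.5 to 4.5 maps to roughly 1.02 to 4.55 on the MOS-LQO scale, which is why PESQ-LQO never reports a full 5.0.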
Since the standards P.862 and P.862.1 were approved by the ITU-T (in February 2001 and September 2003, respectively), various 3G networks (UMTS) using the AMR codec have been deployed and are currently running in different markets. PESQ-LQO has been thoroughly analyzed under simulated AMR conditions, such as all codec bit rates with different error patterns generated by a wide range of C/I values. However, none of the databases used for the standards' development, training, testing, validation, or calibration contained live AMR network conditions.

Our tests, by contrast, have been performed on different live UMTS networks in various markets. With PESQ-LQO as the speech quality evaluation metric, the results showed slightly lower speech quality than the results presented in the 3GPP document [1]. This could have been caused either by the fact that the 3GPP document covers only simulated conditions, or by the fact that the PESQ algorithm exhibits limitations when the degraded conditions are characteristic of AMR live networks.

Since the tests on UMTS network speech quality have been performed with an objective speech quality metric, characterized by a defined accuracy ([2], [3]), which had not been evaluated on AMR live networks before, it was decided that a custom subjective test must be performed in parallel in order to evaluate the PESQ algorithm's accuracy on these networks.
2. Evaluation test of the ITU-T speech quality standard in AMR live network conditions
Test conditions

Besides the clean AMR coding conditions and the MNRU conditions, uplink and downlink field recordings of the source speech running through AMR networks have been used. To capture the complete behavior of the ITU-T speech quality standard under AMR live network conditions, speech samples have been collected for AMR HR and AMR FR in both the 850 MHz and 1900 MHz bands. The Cingular/AT&T networks in and around Reston, VA have been used for the tests. Care has been taken to cover the whole speech quality range: data collection routes ran through areas ranging from very good to very poor RF coverage. Within each network and link, 4 speech samples per talker have been collected for each 0.25 MOS bin. Thus, approximately 80 speech samples have been collected for each network.

Subjective data

The subjective test has been performed by Dynastat Lab, an ITU-T certified lab with more than 20 years of experience in subjective speech quality evaluation. The lab performed an ACR test with four MOS panels of eight voters each, so thirty-two voters scored each speech sample tested. This resulted in an average standard deviation per individual MOS score of about 0.7 MOS. Details regarding the test procedure are presented in the Dynastat report [4].
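As a rough illustration of what this panel size implies, assume that the 0.7 MOS figure is the standard deviation $\sigma$ of the individual votes and that the $n = 32$ votes per sample are independent (both are assumptions, not statements from the Dynastat report). The half-width of the 95% confidence interval of a per-sample MOS would then be approximately

$$ CI_{95} \approx 1.96 \cdot \frac{\sigma}{\sqrt{n}} = 1.96 \cdot \frac{0.7}{\sqrt{32}} \approx 0.24\ \text{MOS} $$

Under these assumptions the subjective scores themselves carry an uncertainty of the same order as the 0.2 MOS difference that 3GPP TR 26.975 describes as usually not statistically significant.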
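The Pearson correlation coefficient r (1) gives a basic measure for goodness of fit between objective and subjective scores, describing the extent to which data points are scattered with respect to the linear mapping y = ax + b.

$$ r = \frac{\sum_{i=1}^{N} (MOS_i - \overline{MOS})(SQM_i - \overline{SQM})}{\sqrt{\sum_{i=1}^{N} (MOS_i - \overline{MOS})^2 \, \sum_{i=1}^{N} (SQM_i - \overline{SQM})^2}} \qquad (1) $$

Overbars denote mean values over the N samples.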
$MOS_i$ denotes the subjective score, and $SQM_i$ denotes the objective score for sample i. The residual error (absolute error) (2) is calculated by applying the correlation line y = x to a measurement data set:

$$ E_r = |MOS_i - SQM_i| \qquad (2) $$
The residual error distribution expresses the percentage of cases in which the algorithm exhibits errors within a certain MOS bin. The MOS bins are uniformly distributed on the 1 to 5 subjective scale and are 0.25 MOS wide. An accurate algorithm should exhibit errors below 0.75 MOS in at least 90% of the cases.

The prediction error is given by (3); it is the average standard error of the objective estimator of the subjective opinion:

$$ E_P = \sqrt{\frac{\sum_{i=1}^{N} (MOS_i - SQM_i)^2}{N - 1}} \qquad (3) $$
N denotes the number of samples considered in the analysis. $MOS_i$ and $SQM_i$ represent the subjective and objective scores, respectively.

Accuracy of the PESQ implementation in TEMS Automatic

In order to also verify the PESQ algorithm's implementation in TEMS Automatic, the PC version has been run on the same speech database and the obtained scores have been compared to the TEMS Automatic measurements.

PESQ tuning on AMR live networks

A tuning of the raw PESQ scores on the AMR live data has been performed, and the improvements of the AMR tuned scores have been analyzed.
Table 1

| Database | Metric | pesqTA, pesqLQO (PC version), pesqLQO_AMR |
|---|---|---|
| AMR HR&FR (850 MHz & 1900 MHz) live network and AMR clean coding conditions | Correlation | >0.85 |
| | Prediction error | <0.45 |
| ITU-T expected performance for wireless live networks* | Correlation | 0.85 (lower limit of the 95% confidence interval) |
| | Prediction error | 0.45 (upper limit of the 95% confidence interval) |

* Values based on the performance of the P.862.1 on wireless live networks (see [2]).
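The two metrics reported in Table 1 can be sketched as follows. This is a minimal illustration, assuming that (1) is the standard Pearson coefficient and that (3) is the square root of the summed squared differences divided by N - 1. The function names and the sample scores are hypothetical and are not data from the test.

```python
import math
from typing import Sequence


def pearson_correlation(mos: Sequence[float], sqm: Sequence[float]) -> float:
    """Pearson correlation between subjective (MOS) and objective (SQM) scores."""
    n = len(mos)
    if n != len(sqm) or n < 2:
        raise ValueError("need two equally long sequences with at least 2 samples")
    mean_mos = sum(mos) / n
    mean_sqm = sum(sqm) / n
    cov = sum((m - mean_mos) * (s - mean_sqm) for m, s in zip(mos, sqm))
    var_mos = sum((m - mean_mos) ** 2 for m in mos)
    var_sqm = sum((s - mean_sqm) ** 2 for s in sqm)
    return cov / math.sqrt(var_mos * var_sqm)


def prediction_error(mos: Sequence[float], sqm: Sequence[float]) -> float:
    """Root of the summed squared differences divided by N - 1."""
    n = len(mos)
    if n != len(sqm) or n < 2:
        raise ValueError("need two equally long sequences with at least 2 samples")
    return math.sqrt(sum((m - s) ** 2 for m, s in zip(mos, sqm)) / (n - 1))


# Hypothetical scores, for illustration only.
mos = [1.8, 2.4, 3.1, 3.6, 4.2]
sqm = [2.0, 2.3, 3.4, 3.5, 4.0]

r = pearson_correlation(mos, sqm)
e_p = prediction_error(mos, sqm)
print(f"correlation = {r:.3f}, prediction error = {e_p:.3f} MOS")
# Compare against the ITU-T limits quoted in Table 1.
print("within ITU-T limits:", r > 0.85 and e_p < 0.45)
```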
Table 2

| Database | Metric | <0.25 | <0.5 | <0.75 | <1 | <1.25 | <1.5 | <1.75 | <2 |
|---|---|---|---|---|---|---|---|---|---|
| AMR HR&FR (850 MHz & 1900 MHz) live network and AMR clean coding conditions | pesqTA; pesqLQO (PC version); pesqamr CDF(%) | >40 | >80 | >95 | >98 | 100 | 100 | 100 | 100 |
| ITU-T expected performance for wireless live networks* | CDF(%)* | 40.44 | 70.48 | 90.33 | 97.71 | 99.3 | 99.7 | 99.91 | 100 |

Column headings <0.25 to <2 are the MOS bins of the residual error.

* Values based on the performance of the P.862.1 on wireless live networks (see [2]).
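The cumulative distribution reported in Table 2 can be sketched as follows: for each bin edge, count the share of samples whose absolute residual error (2) lies below that edge. The function name and the sample scores are hypothetical; only the bin edges and the 90% criterion at 0.75 MOS come from the text.

```python
from typing import List, Sequence

# Bin edges 0.25, 0.5, ..., 2.0 MOS, matching the columns of Table 2.
BIN_EDGES = [0.25 * k for k in range(1, 9)]


def residual_error_cdf(mos: Sequence[float], sqm: Sequence[float],
                       edges: Sequence[float] = BIN_EDGES) -> List[float]:
    """Percentage of samples whose absolute residual error is below each edge."""
    if len(mos) != len(sqm) or not mos:
        raise ValueError("need two equally long, non-empty sequences")
    errors = [abs(m - s) for m, s in zip(mos, sqm)]
    n = len(errors)
    return [100.0 * sum(e < edge for e in errors) / n for edge in edges]


# Hypothetical scores, for illustration only.
mos = [1.8, 2.4, 3.1, 3.6, 4.2]
sqm = [2.0, 2.3, 3.4, 3.5, 4.0]

cdf = residual_error_cdf(mos, sqm)
for edge, pct in zip(BIN_EDGES, cdf):
    print(f"<{edge:.2f} MOS: {pct:.1f}%")

# Accuracy criterion from the text: errors below 0.75 MOS in at least 90% of cases.
print("meets 90% criterion:", cdf[BIN_EDGES.index(0.75)] >= 90.0)
```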
In order to evaluate the algorithm's tendency within AMR live networks, the over- and under-prediction error percentages have been determined based on the absolute residual error values. The results are presented in Table 3.
Table 3

| Prediction percentage | pesqTA | pesqLQO | pesqamr |
|---|---|---|---|
| Under prediction | 50.54% | 51.08% | 51.34% |
| Over prediction | 48.66% | 47.58% | 48.12% |
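The percentages in Table 3 can be sketched as a simple sign count. The sketch below assumes that under-prediction means the objective score is lower than the subjective score, and over-prediction the opposite; the paper does not spell out this convention. The function name and sample scores are hypothetical.

```python
from typing import Sequence, Tuple


def prediction_tendency(mos: Sequence[float],
                        sqm: Sequence[float]) -> Tuple[float, float]:
    """Return (under-prediction %, over-prediction %).

    Assumed convention: under-prediction is SQM < MOS, over-prediction is
    SQM > MOS. Exact matches are counted in neither group.
    """
    if len(mos) != len(sqm) or not mos:
        raise ValueError("need two equally long, non-empty sequences")
    n = len(mos)
    under = sum(s < m for m, s in zip(mos, sqm))
    over = sum(s > m for m, s in zip(mos, sqm))
    return 100.0 * under / n, 100.0 * over / n


# Hypothetical scores, for illustration only.
mos = [1.8, 2.4, 3.1, 3.6, 4.2]
sqm = [2.0, 2.3, 3.4, 3.5, 4.0]

under_pct, over_pct = prediction_tendency(mos, sqm)
print(f"under-prediction: {under_pct:.2f}%, over-prediction: {over_pct:.2f}%")
```

Because exact matches fall into neither group under this convention, the two percentages need not add up to 100%, which is consistent with the totals in Table 3.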
2.4. Comments on the performance and the accuracy of the PESQ algorithm as implemented in TEMS Automatic
The PESQ implementation in TEMS Automatic (pesqTA) shows a prediction error on AMR live networks within the ITU-T expected 95% confidence limits. The residual error distribution exhibits higher CDF(%) values than the ITU-T expected distribution, indicating that the algorithm's performance on AMR live networks stays close to its performance on other wireless live networks tested within the ITU-T.

The analysis showed the same performance results for the PC version, which means that the TEMS Automatic implementation works properly, in accordance with the ITU-T implementation requirements.

The analysis of the algorithm's tendency, that is, of the over- and under-prediction percentages (Table 3), showed that the algorithm is well balanced. Therefore, it is expected that, on average, PESQ as implemented in TEMS Automatic will not report speech quality values that are either too pessimistic or too optimistic.
3. Conclusions
A comprehensive test has been performed in order to evaluate the PESQ performance on AMR live networks. The evaluation showed that PESQ in TEMS Automatic performs within the ITU-T's expected 95% confidence interval limits. In addition, it has been verified that PESQ in TEMS Automatic matches the PESQ PC version and therefore meets the ITU-T implementation requirements.

The PESQ algorithm exhibited well-balanced behavior, with nearly equal over- and under-prediction percentages across the entire AMR live network database.

A PESQ AMR tuned version has been created and tested. The results showed that the improvements are not significant. Any re-tuning of the standard is therefore not meaningful, especially since the test showed that the performance values lie within the ITU-T expected limits.
References
[1] 3GPP TR 26.975, Technical Specification Group Services and System Aspects; Performance characterization of the Adaptive Multi-Rate (AMR) speech codec, Release 1999.
[2] ITU-T SG12, P.862.1, February 2003.
[3] ITU-T SG12, P.862.2, October 2005.
[4] Dynastat Lab, MOS Test Results for Ericsson, August 2005.
[5] I. Cotanis, ITU-T SG12 white contribution, January 2003.
[6] Cingular, MOS Lab Test-RF Planning and Standard, report presented to TEMS, April 2005.