

Quality bounds for packetized voice transport

Article  in  Alcatel Telecommunications Review · January 2000



Alcatel Telecommunications Review - 1st Quarter 2000

D. De Vleeschauwer, J. Janssen, G. H. Petit, F. Poppe

Quality bounds for packetized voice transport

The transport of high quality packetized voice requires bounds on the mouth-to-ear delay and distortion to be respected.

Introduction

For traditional (wire-bound) Switched Telephone Network (STN) calls, which do not suffer from distortion, the key factor that determines the quality is the mouth-to-ear delay, defined as the delay incurred from the moment the talker utters the words until the instant the listener hears them. ITU-T Recommendations G.114 [1] and G.131 [2] report the mouth-to-ear delays that can be tolerated for undistorted voice. The bounds on these delays depend on the level of echo disturbing the voice call.

Voice calls can also tolerate some distortion, that is, the voice signal heard by the listener does not need to be an exact copy of the voice signal produced by the talker. In the case of packetized voice calls, distortion may be introduced by the codec that compresses the voice signal or by the loss of voice packets. Controlling both the mouth-to-ear delay and distortion is the key to offering high quality packetized voice calls.

E-model

The E-model [3,4,5,6] predicts the subjective quality of a telephone call based on its characterizing transmission parameters. It combines the impairments caused by these transmission parameters into a rating R, which can be used to predict subjective user reactions, such as the Mean Opinion Score (MOS) or the percentage of users finding the quality Good or Better (GoB). The R scale was defined so that impairments are approximately additive in the R range of interest. The rating R is given by:

$$R = R_0 - I_s - I_d - I_e + A \qquad (1)$$

The first term R0 groups the effects of noise, such as background noise and circuit noise. The second term Is includes impairments that occur simultaneously with the voice signal, such as those caused by quantization, by too loud a connection and by too loud a side tone. The third term Id encompasses delayed impairments, including impairments caused by talker and listener echo or by a loss of interactivity. The fourth term Ie covers impairments caused by the use of special equipment; for example, each low bit rate codec has an associated impairment value. This impairment term can also be used to take into account the influence of packet loss. The fifth term A is the expectation factor, which expresses the decrease in the rating R that a user is willing to tolerate because of the “access advantage” that certain systems have over traditional wire-bound telephony. As an example, the expectation factor A for mobile telephony (e.g. GSM) is 10.

ITU-T draft Recommendation G.109 [7] states that a rating R in the ranges [90,100], [80,90], [70,80], [60,70] and [50,60] corresponds to best, high, medium, low and poor quality, respectively. A rating below 50 indicates unacceptable quality. Throughout this article, the classes are color coded according to Table 1.

Table 1 – Quality classes

R-value range | 100-90 | 90-80 | 80-70 | 70-60 | 60-0
Speech transmission quality category | best | high | medium | low | (very) poor
PSTN quality: R ≥ 70 (best, high and medium)

As far as quality is concerned, a packetized voice call introduces more delay and distortion than a traditional STN call.

First, the delay for packetized voice calls, where the most important contributions are encoding, packetization, propagation, queuing, service, dejittering and decoding delay, is larger than for a traditional circuit-switched voice call, where the mouth-to-ear delay is mainly made up of the

propagation delay and switching delay. Most low bit rate codecs are frame-based, that is, they encode a voice interval of a certain duration, referred to as the voice frame, in a single encoding operation. Some codecs even need to collect the voice signal of an interval (referred to as the look-ahead) after the voice frame that is being encoded. The lengths of these intervals are given in Table 2 for standard codecs. Since a packet must transport at least one voice frame, the lower bound on the packetization delay is set by the voice frame length. Similarly, as the encoder has to wait until the look-ahead has been collected, the lower bound on the encoding delay is determined by the look-ahead length. Hence, the mouth-to-ear delay is lower bounded by the algorithmic delay of a codec [8], which is the sum of the voice frame and the look-ahead length. For G.729(A), for instance, the 10 ms voice frame plus the 5 ms look-ahead gives an algorithmic delay of 15 ms. The algorithmic delays of various standard codecs are given in Table 2.

Table 2 – Major parameters of standard codecs

Origin | Standard | Codec type | Bit rate (kbit/s) | Voice frame (ms) | Look-ahead (ms) | Algorithmic delay (ms) | Ie | Intrinsic quality
ITU-T | G.711 | PCM | 64 | 0.125 | 0 | 0.125 | 0 | 94.3
ITU-T | G.726, G.727 | ADPCM | 16 | 0.125 | 0 | 0.125 | 50 | 44.3
ITU-T | G.726, G.727 | ADPCM | 24 | 0.125 | 0 | 0.125 | 25 | 69.3
ITU-T | G.726, G.727 | ADPCM | 32 | 0.125 | 0 | 0.125 | 7 | 87.3
ITU-T | G.726, G.727 | ADPCM | 40 | 0.125 | 0 | 0.125 | 2 | 92.3
ITU-T | G.728 | LD-CELP | 12.8 | 0.625 | 0 | 0.625 | 20 | 74.3
ITU-T | G.728 | LD-CELP | 16 | 0.625 | 0 | 0.625 | 7 | 87.3
ITU-T | G.729(A) | CS-ACELP | 8 | 10 | 5 | 15 | 10 | 84.3
ITU-T | G.723.1 | ACELP | 5.3 | 30 | 7.5 | 37.5 | 19 | 75.3
ITU-T | G.723.1 | MP-MLQ | 6.3 | 30 | 7.5 | 37.5 | 15 | 79.3
ETSI | GSM-FR | RPE-LTP | 13 | 20 | 0 | 20 | 20 | 74.3
ETSI | GSM-HR | VSELP | 5.6 | 20 | 0 | 20 | 23 | 71.3
ETSI | GSM-EFR | ACELP | 12.2 | 20 | 0 | 20 | 5 | 89.3

Second, in contrast to circuit-switched voice calls, as a result of voice compression and packet loss during transport or in the dejittering buffer, the distortion of packetized voice calls is not negligible.

Alcatel has studied the impact of the one-way mouth-to-ear delay (via Id) and the distortion (via Ie) on the quality of a packetized voice call. Other factors, like background noise and a connection that is too loud, also impair the quality (via R0 and Is) of a packetized voice call, but as these factors are not fundamentally different from those in a traditional STN call they were not considered. Furthermore, as the objective was to make a fair comparison between the quality of packetized voice calls and traditional wire-bound STN calls, the expectation factor A was set to zero.

From Equation 1 it follows that two calls with the same rating R can give a totally different subjective impression. One call might produce crystal-clear, undistorted speech (e.g. Ie = 0) but suffer from a relatively large delay (e.g. Id = 10). Another call might slightly distort the speech (e.g. Ie = 10), while its delay is not noticeable (e.g. Id = 0). However, the E-model predicts that a judging panel will award the same MOS to both calls and the same percentage of users will find both calls GoB, albeit for different reasons.

Consider a packetized voice call between two parties, referred to as party 1 and party 2 (see Figure 1). Using the E-model, Alcatel calculated how party 1 will judge the call, that is, what rating R will be assigned to it. The influence of delay was studied first, followed by the influence of distortion.

Figure 1 – Talker and listener echo
(Diagram: the talker-echo and listener-echo paths between party 1 and party 2 via a reference point, with echo losses EL1 and EL2 and loudness ratings SLR and RLR. EL: Echo Loss; SLR: Send Loudness Rating; RLR: Receive Loudness Rating.)

Influence of Mouth-to-Ear Delay

If the voice signal party 1 hears is delayed, the rating R decreases by an amount equal to the impairment Id associated with the mouth-to-ear delay. This impairment is the sum of three contributing impairments: talker echo, listener echo and loss of interactivity.

First, talker echo disturbs party 1, who hears an attenuated and delayed echo of his or her own voice. This echo is caused by a reflection close to party 2. The level of this echo is strongly influenced by the echo loss EL2 close to party 2 (measured with respect to a certain reference point) [5].

Second, listener echo also disturbs party 1, who hears the original signal from party 2 followed by an attenuated echo of the signal. This echo is determined by a reflection close to party 1 with attenuation EL1, followed by a reflection close to party 2 with attenuation EL2.

Echo may occur in the hybrid if the packetized voice call is terminated over a local STN or in the callers’ terminal equipment. For STN calls from traditional handsets, where echo is mainly caused by the


4-to-2-wire hybrids, a typical value for the echo loss is 21 dB [5]. The same value is valid for packetized voice calls terminated over a local STN to traditional handsets. Echo loss is likely to be lower for other kinds of terminal, such as personal computers and handsfree phones. The echo losses EL1 and EL2 can be increased by using an echo controller, which should be deployed as close to the source of echo as possible, that is, in the gateways between the PSTN and the packet network, or in the terminals. A simple echo controller can increase the echo loss by 30 dB. Perfect echo control, in which the echo losses EL1 and EL2 increase to infinity, can be achieved at moderate computational cost.

The third delay-related factor that may disturb party 1 is the loss of interactivity. If the mouth-to-ear delay is too large, an interactive conversation becomes impossible.

Alcatel has used the E-model, which takes all these impairments into account, to calculate the rating R given by party 1 in the case of undistorted voice (see Figure 2). In the case of packetized voice calls, undistorted calls are calls transported without packet loss in the G.711 format. Figure 2 shows the influence of the mouth-to-ear delay on the rating R for different values of echo loss when the echo losses at both end points are equal (EL1 = EL2). The impairment associated with delay is strongly influenced by this echo loss value.

Figure 2 – The rating R as a function of the mouth-to-ear delay for undistorted voice and for various echo loss values
(Plot: rating R, 0 to 100, versus mouth-to-ear delay, 0 to 400 ms; one curve for each echo loss EL = 11, 21, 31, 41 and 51 dB and for EL = infinity.)

Observe that the rating R is a non-increasing function of the mouth-to-ear delay. The intrinsic quality of a voice call is defined as the rating R associated with a zero mouth-to-ear delay. The intrinsic quality of a packetized voice call transported without packet loss in the G.711 format corresponds to R = 94.3. Figure 2 shows that if echo is perfectly controlled (EL1 = EL2 = ∞), this voice call retains its intrinsic quality up to a mouth-to-ear delay of 150 ms.

ITU-T Recommendations G.114 [1] and G.131 [2] specify the following tolerable mouth-to-ear delays for traditional PSTN calls:

• Under normal circumstances (i.e. if the echo loss is at least 21 dB), echo control is needed if the mouth-to-ear delay is larger than 25 ms.
• When the echo is adequately controlled:
  - a mouth-to-ear delay of up to 150 ms is acceptable for most user applications;
  - a mouth-to-ear delay between 150 ms and 400 ms is acceptable, provided that one is aware of the impact of delay on the quality of the user applications; and
  - a mouth-to-ear delay above 400 ms is unacceptable.

It can be seen from Figure 2 that for an echo loss of 21 dB, the rating R drops below 70 at a mouth-to-ear delay of 25 ms. For calls with perfect echo control, the rating R drops below 70 at a mouth-to-ear delay of 400 ms. Hence, ITU-T Recommendations G.114 and G.131 ensure that traditional PSTN calls have a rating R of at least 70. Also, the interactivity bound of 150 ms can be observed in Figure 2 for infinite echo loss.

Influence of Distortion

If the voice signal party 1 hears is distorted, the rating R decreases by an amount equal to the distortion impairment Ie. This impairment has two sources: encoding of the voice signal from party 2 and packet loss during the transport of voice packets from party 2 to party 1.

Table 2 summarizes the distortion impairment and intrinsic quality (using the color code of Table 1) associated with each standard codec [9]. The distortion impairment Ie associated with a codec increases as the packet loss ratio increases. Figure 3, based on [9], shows this effect for four codecs, assuming that voice packets are lost at random. This figure deals only with one specific packetization interval per codec (10 ms for G.711, 20 ms for G.729 and GSM-EFR, 30 ms for G.723.1). Results are not yet known for other packetization intervals.

The sensitivity to packet loss depends on the Packet Loss Concealment (PLC) technique used by the codec. In contrast to the G.711 codec, most low bit rate codecs (e.g. G.729, G.723.1 and GSM-EFR) have a built-in PLC scheme. However, a PLC scheme can be implemented on top of the G.711 codec. For the codecs that use PLC, the impairment increases by about four units on the


R scale per percent packet loss (for low loss values). If no PLC scheme is implemented on top of the G.711 codec, the distortion impairment increases by 25 units on the R scale for each percent packet loss (for low loss values).

Figure 3 – Distortion impairment as a function of the packet loss
(Plot: distortion impairment Ie, 0 to 60, versus packet loss ratio, 0 to 16%, for G.729(A)+VAD, G.723.1 at 6.3 kbit/s+VAD, GSM-EFR, G.711 with PLC and G.711 without PLC. VAD: Voice Activity Detection; PLC: Packet Loss Concealment; EFR: Enhanced Full Rate.)

The voice signal does not need to be transported in the same format end-to-end. Somewhere along the route, voice might be transcoded from one codec format into another. Since all (considered) standard codecs need an 8 kHz stream of uniformly quantized voice samples at the input, the code words of the first codec need to be decoded before the signal can be encoded into another codec format. Consequently, the impairment terms associated with the two codecs should be added to obtain the overall distortion impairment Ie, because, in the E-model, impairments are additive on the R scale.

The intrinsic quality associated with all combinations of two codecs can be found in Table 3 (again using the color code of Table 1). The diagonal entries in this table correspond to tandeming two codecs of the same type. The table shows that transcoding can be very harmful to the quality of a call: for example, tandeming G.729 with G.723.1 at 6.3 kbit/s yields an intrinsic quality of only 69.3. In practice, the order in which the codecs are tandemed has a small influence, which cannot be seen in (the symmetric) Table 3 because, as impairments are considered to be additive in the E-model, asymmetries cannot occur.

Quality Bounds

If the mouth-to-ear delay, echo loss and distortion impairment are known, the quality of a packetized voice call (i.e. its rating R) can be derived from Figure 2, as follows. First, identify the curve on Figure 2 that corresponds to the given echo loss. Then, using this curve, read the rating R corresponding to the given mouth-to-ear delay. Finally, subtract the distortion impairment Ie from this rating R.

As stated before, if there is no echo control, the echo loss is likely to be 21 dB or smaller for packetized voice transport. For this value of the echo loss, the rating R drops rapidly as the mouth-to-ear delay increases. Hence, if there is no echo control, there is only a very small delay budget for which traditional PSTN quality (R ≥ 70) can be guaranteed. As mentioned previously, the lower bound for the mouth-to-ear delay for packetized voice is the algorithmic delay. Since, in the case of low bit rate codecs, this algorithmic delay is larger than the delay budget corresponding to 21 dB, calls transported using these codec formats require echo control [8].

It is assumed here that perfect echo control is achieved, in which case the intrinsic quality of the call is attained if the mouth-to-ear delay is kept below 150 ms. This intrinsic quality is solely determined by the distortion impairment Ie, which in turn is determined by the codec(s) used and the overall packet loss experienced.

Since the intrinsic quality of an undistorted call is 94.3 and the bound for traditional quality is 70, there is an impairment budget of 24.3, part of which is consumed by the codec (see Table 2). With G.729(A) (Ie = 10), for example, a margin of 14.3 remains. Once the codec has been chosen, the remainder of the margin can be consumed either by allowing the mouth-to-ear delay to exceed 150 ms or by allowing some packet loss. Tables 4 and 5 give the codec-dependent bounds on the packet loss and mouth-to-ear delay, respectively, assuming only one of these phenomena is allowed to occur. Note that packet loss could be traded off against mouth-to-ear delay (e.g. by varying the dejittering delay), as long as the impairment budget is not exceeded.

Conclusions

The E-model has been used to study the quality of packetized voice calls. With regard to quality, more delay and distortion are introduced for packetized voice calls than for traditional STN calls.

Since the tolerable mouth-to-ear delay budget without echo control is smaller than the minimal packetization delay if voice


is transported in a low bit rate codec format, calls transported in this format need to be echo controlled. If the echo is perfectly controlled, the quality remains equal to the intrinsic quality up to a mouth-to-ear delay of 150 ms. The intrinsic quality depends on the amount of distortion that is introduced.

The intrinsic quality associated with some low bit rate codecs is lower than the traditional STN quality. Therefore these codecs should be avoided. For the same reason, transcoding should be avoided at all cost. The margin between the intrinsic quality of a codec and the bound for traditional quality can be consumed either by allowing a mouth-to-ear delay above 150 ms or by allowing some packet loss. The mouth-to-ear delay and packet loss bounds for the most common codecs are reported in Tables 4 and 5. These bounds should be respected by any packetized voice call (phone-to-phone, PC-to-PC, mobile-phone-to-mobile-phone, phone-to-PC, etc.) if traditional quality is to be maintained.

References

1 “One-Way Transmission Time”, ITU-T Recommendation G.114, February 1996.
2 “Control of Talker Echo”, ITU-T Recommendation G.131, August 1996.
3 N.O. Johannesson: “The ETSI Computation Model: A Tool for Transmission Planning of Telephone Networks”, IEEE Communications Magazine, pp 70–79, January 1997.
4 P. Meschkat: “TPE: Transmission Planning (End-to-End) using the E-model (Supporting ETSI Guide 201 050)”, Windows Software Tool, Alcatel Telecom, December 1997.
5 “Speech Processing, Transmission and Quality Aspects (STQ); Overall Transmission Plan Aspects for Telephony in a Private Network”, ETSI Guide 201 050 (Draft), November 1998.
6 “The E-model, a Computational Model for Use in Transmission Planning”, ITU-T Recommendation G.107, December 1998.
7 “Definition of Categories of Speech Transmission Quality”, ITU-T Recommendation G.109, September 1998.
8 D. De Vleeschauwer, J. Janssen, G.H. Petit: “Delay Bounds for Low Bit Rate Voice Transport over IP Networks”, Proceedings of the SPIE Conference on Performance and Control of Network Systems III, volume 3841, pp 40–48, Boston (MA), 20-21 September 1999.
9 “Provisional Planning Values for the Equipment Impairment Factor Ie”, Appendix to ITU-T Recommendation G.113 (Draft), September 1999.

Table 3 – Transcoding matrix (intrinsic quality of two tandemed codecs; bit rates in kbit/s; columns in the same order as rows)

Codec | G.711 (64) | G.726 (40) | G.726 (32) | G.726 (24) | G.726 (16) | G.728 (16) | GSM-FR (13) | G.728 (12.8) | GSM-EFR (12.2) | G.729 (8) | G.723.1 (6.3) | GSM-HR (5.6) | G.723.1 (5.3)
G.711 (64) | 94.3 | 92.3 | 87.3 | 69.3 | 44.3 | 87.3 | 74.3 | 74.3 | 89.3 | 84.3 | 79.3 | 71.3 | 75.3
G.726 (40) | 92.3 | 90.3 | 85.3 | 67.3 | 42.3 | 85.3 | 72.3 | 72.3 | 87.3 | 82.3 | 75.3 | 67.3 | 71.3
G.726 (32) | 87.3 | 85.3 | 80.3 | 62.3 | 37.3 | 80.3 | 67.3 | 67.3 | 82.3 | 77.3 | 72.3 | 64.3 | 68.3
G.726 (24) | 69.3 | 67.3 | 62.3 | 44.3 | 19.3 | 62.3 | 49.3 | 49.3 | 64.3 | 59.3 | 54.3 | 46.3 | 50.3
G.726 (16) | 44.3 | 42.3 | 37.3 | 19.3 | 0 | 37.3 | 24.3 | 24.3 | 39.3 | 34.3 | 29.3 | 21.3 | 25.3
G.728 (16) | 87.3 | 85.3 | 80.3 | 62.3 | 37.3 | 80.3 | 67.3 | 67.3 | 82.3 | 77.3 | 72.3 | 64.3 | 68.3
GSM-FR (13) | 74.3 | 72.3 | 67.3 | 49.3 | 24.3 | 67.3 | 54.3 | 54.3 | 69.3 | 69.3 | 59.3 | 51.3 | 55.3
G.728 (12.8) | 74.3 | 72.3 | 67.3 | 49.3 | 24.3 | 67.3 | 54.3 | 54.3 | 69.3 | 64.3 | 59.3 | 51.3 | 55.3
GSM-EFR (12.2) | 89.3 | 87.3 | 82.3 | 64.3 | 39.3 | 82.3 | 69.3 | 69.3 | 84.3 | 79.3 | 74.3 | 66.3 | 70.3
G.729 (8) | 84.3 | 82.3 | 77.3 | 59.3 | 34.3 | 77.3 | 69.3 | 64.3 | 79.3 | 74.3 | 69.3 | 61.3 | 65.3
G.723.1 (6.3) | 79.3 | 75.3 | 72.3 | 54.3 | 29.3 | 72.3 | 59.3 | 59.3 | 74.3 | 69.3 | 64.3 | 56.3 | 60.3
GSM-HR (5.6) | 71.3 | 67.3 | 64.3 | 46.3 | 21.3 | 64.3 | 51.3 | 51.3 | 66.3 | 61.3 | 56.3 | 48.3 | 52.3
G.723.1 (5.3) | 75.3 | 71.3 | 68.3 | 50.3 | 25.3 | 68.3 | 55.3 | 55.3 | 70.3 | 65.3 | 60.3 | 52.3 | 56.3

Table 4 – Tolerable packet loss bounds for a mouth-to-ear delay below 150 ms

Origin | Standard | Codec bit rate (kbit/s) | Packet loss bound (%)
ITU-T | G.711 without PLC | 64 | 1
ITU-T | G.711 with PLC | 64 | 10
ITU-T | G.729(A)+VAD | 8 | 3.4
ITU-T | G.723.1 at 6.3 kbit/s+VAD | 6.3 | 2.1
ETSI | GSM-EFR | 12.2 | 2.7
PLC: Packet Loss Concealment; VAD: Voice Activity Detection

Table 5 – Tolerable mouth-to-ear (M2E) delay bounds when there is no packet loss

Origin | Standard | Codec bit rate (kbit/s) | M2E delay bound (ms)
ITU-T | G.711 | 64 | 400
ITU-T | G.726, G.727 | 16 | NA
ITU-T | G.726, G.727 | 24 | NA
ITU-T | G.726, G.727 | 32 | 324
ITU-T | G.726, G.727 | 40 | 379
ITU-T | G.728 | 12.8 | 212
ITU-T | G.728 | 16 | 324
ITU-T | G.729(A) | 8 | 296
ITU-T | G.723.1 | 5.3 | 221
ITU-T | G.723.1 | 6.3 | 253
ETSI | GSM-FR | 13 | 212
ETSI | GSM-HR | 5.6 | 180
ETSI | GSM-EFR | 12.2 | 345
NA: Traditional PSTN quality is Not Attainable
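The impairment arithmetic behind Equation 1, Table 1 and Tables 2 and 3 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the E-model itself: it lumps R0 - Is into the article's value of 94.3 for an undistorted call with zero delay, sets A = 0 as the article does, takes the codec impairments Ie from Table 2 and adds the Ie values of tandemed codecs. The names `rating`, `quality_class`, `intrinsic_quality`, `remaining_budget` and `CODEC_IE` (including its key labels) are hypothetical. Pure addition reproduces the diagonal and most off-diagonal entries of Table 3, although a few printed entries (for example GSM-FR with G.729) differ from the simple sum.

```python
# Sketch of the E-model arithmetic used in the article (hypothetical helper names).
# Assumptions: R0 - Is = 94.3 (undistorted call, zero delay) and A = 0;
# Ie values come from Table 2; tandemed codecs add their Ie values.

BASE_RATING = 94.3   # intrinsic quality of undistorted G.711 (Ie = 0)
PSTN_BOUND = 70.0    # traditional PSTN quality requires R >= 70

# Codec impairment factors Ie from Table 2 (keys are illustrative labels).
CODEC_IE = {
    "G.711": 0,
    "G.726-40": 2, "G.726-32": 7, "G.726-24": 25, "G.726-16": 50,
    "G.728-16": 7, "G.728-12.8": 20,
    "G.729": 10,
    "G.723.1-6.3": 15, "G.723.1-5.3": 19,
    "GSM-FR": 20, "GSM-HR": 23, "GSM-EFR": 5,
}


def rating(r0_minus_is: float, i_d: float, i_e: float, a: float = 0.0) -> float:
    """Equation 1, R = R0 - Is - Id - Ie + A, with R0 - Is passed as one term."""
    return r0_minus_is - i_d - i_e + a


def quality_class(r: float) -> str:
    """Map a rating R to the categories of Table 1."""
    if r >= 90:
        return "best"
    if r >= 80:
        return "high"
    if r >= 70:
        return "medium"
    if r >= 60:
        return "low"
    return "(very) poor"


def intrinsic_quality(*codecs: str) -> float:
    """Zero-delay rating of a chain of codecs, assuming additive Ie.

    Clipped at 0, as the printed G.726 (16)/G.726 (16) entry of Table 3 is."""
    total_ie = sum(CODEC_IE[c] for c in codecs)
    return max(0.0, rating(BASE_RATING, 0.0, total_ie))


def remaining_budget(*codecs: str) -> float:
    """Margin left for delay or packet loss before R drops below 70."""
    return intrinsic_quality(*codecs) - PSTN_BOUND


if __name__ == "__main__":
    for chain in [("G.711",), ("G.729",), ("G.729", "G.723.1-6.3"), ("G.726-24",)]:
        r = intrinsic_quality(*chain)
        print(" + ".join(chain), round(r, 1), quality_class(r),
              "margin", round(remaining_budget(*chain), 1))
```

Keeping R0 - Is as one constant mirrors the article, which does not vary noise or loudness. Note that `rating(94.3, 10, 0)` and `rating(94.3, 0, 10)` are equal, which is exactly the article's point that a delayed but clean call and a prompt but distorted call can receive the same R.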

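The mouth-to-ear delay bounds of Table 5 can be approximated under the same assumption the article makes, namely perfect echo control, so that talker and listener echo vanish and the only delay impairment left is the loss of interactivity. The sketch below assumes that this term is the Idd formula of ITU-T Recommendation G.107 [6]; the article cites G.107 but does not reproduce the formula, so its use here is an assumption. For a mouth-to-ear delay T above 100 ms, Idd = 25{(1 + X^6)^(1/6) - 3(1 + (X/3)^6)^(1/6) + 2} with X = log2(T/100 ms), and Idd = 0 otherwise. A bisection then finds the largest T for which 94.3 - Ie - Idd(T) stays at or above 70. The function names are hypothetical. With the Ie values of Table 2 the results fall within a few milliseconds of Table 5 (for example, about 296 ms for G.729(A) and about 180 ms for GSM-HR), and no bound is returned for G.726 at 24 kbit/s, whose intrinsic quality is already below 70.

```python
import math
from typing import Optional

BASE_RATING = 94.3  # intrinsic quality of undistorted G.711, from the article
PSTN_BOUND = 70.0   # traditional PSTN quality requires R >= 70


def idd(delay_ms: float) -> float:
    """Loss-of-interactivity impairment, assumed to follow ITU-T G.107.

    Zero up to 100 ms; above that it grows with X = log2(delay / 100 ms)."""
    if delay_ms <= 100.0:
        return 0.0
    x = math.log2(delay_ms / 100.0)
    return 25.0 * ((1 + x**6) ** (1 / 6) - 3 * (1 + (x / 3) ** 6) ** (1 / 6) + 2)


def rating_perfect_echo_control(delay_ms: float, ie: float) -> float:
    """Rating R when echo is perfectly controlled: only Idd and Ie remain."""
    return BASE_RATING - idd(delay_ms) - ie


def delay_bound_ms(ie: float, hi: float = 1000.0, tol: float = 0.01) -> Optional[float]:
    """Largest mouth-to-ear delay (ms) keeping R >= 70, or None if unattainable.

    Bisection is valid because Idd never decreases as the delay grows."""
    if rating_perfect_echo_control(0.0, ie) < PSTN_BOUND:
        return None  # corresponds to "NA" in Table 5
    lo = 0.0  # invariant: rating(lo) >= 70 and rating(hi) < 70
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if rating_perfect_echo_control(mid, ie) >= PSTN_BOUND:
            lo = mid
        else:
            hi = mid
    return lo


if __name__ == "__main__":
    # Ie values from Table 2; compare the printed bounds with Table 5.
    for name, ie in [("G.711", 0), ("G.729(A)", 10), ("G.723.1 6.3", 15),
                     ("GSM-HR", 23), ("G.726 24", 25)]:
        bound = delay_bound_ms(ie)
        print(name, "NA" if bound is None else round(bound))
```

Because Idd is nearly zero at 150 ms, this sketch also reflects the article's observation that a perfectly echo-controlled call keeps its intrinsic quality up to about 150 ms. It covers only the no-packet-loss case of Table 5; trading delay against packet loss would require the Ie-versus-loss curves of Figure 3.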

Danny De Vleeschauwer is a research engineer participating in the Traffic and Routing Technology project within the Network Architecture department of the Alcatel Corporate Research Center in Antwerp, Belgium.

Jan Janssen is a research engineer participating in the Traffic and Routing Technology project within the Network Architecture department of the Alcatel Corporate Research Center in Antwerp, Belgium.

Guido H. Petit is Manager of the Traffic and Routing Technology project within the Network Architecture department of the Alcatel Corporate Research Center in Antwerp, Belgium.

Fabrice Poppe is a research engineer participating in the Traffic and Routing Technology project within the Network Architecture department of the Alcatel Corporate Research Center in Antwerp, Belgium.
