October 2005
Master's Thesis
Voice Quality Measurement over Bluetooth
Abstract
TEMS Automatic is an autonomous system that measures the quality of a
mobile network from a subscriber's perspective. One of the measurement
probes in the TEMS Automatic is the Mobile Test Unit. Within the Mobile Test
Unit audio quality measurements are performed with a digital signal
processor, which compares two sentences. The audio quality algorithm
calculates a score that describes how the subscribers experience the quality
of the mobile network.
This Master's Thesis has investigated how the audio quality algorithm in the
Mobile Test Unit is affected by the Synchronous Connection-Oriented link in
the Bluetooth standard version 1.1.
To manage this task, a prototype was built with a Bluetooth link as the
transmission medium in the Mobile Test Unit. Tests have been performed and
the results have shown that the present audio quality algorithm in the Mobile
Test Unit is severely affected when the Bluetooth link is used.
Keywords: Bluetooth, CVSD, MTU, PESQ
Table of Contents
1 INTRODUCTION
1.1 THESIS OUTLINE
1.2
1.3 BACKGROUND
1.4 METHOD
2 TEMS AUTOMATIC
2.1 DATABASE
2.2 OPERATOR CONSOLE
2.3 COMSERVER
2.4 MOBILE TEST UNIT
2.5 CALL GENERATOR
2.6 TEMS LOGFILE HANDLER
2.7 TEMS PRESENTATION AND REPORT
3 BLUETOOTH
3.1 GENERAL DESCRIPTION
3.2
3.3
5 IMPLEMENTATION
5.1
5.2
7 ACKNOWLEDGEMENT
8 TERMINOLOGY
9 REFERENCES
List of figures
FIGURE 1: SCHEMATIC PICTURE OVER A SYSTEM LAYOUT FOR TEMS AUTOMATIC.
FIGURE 2: THE LOWEST DEFINED LAYERS IN THE BLUETOOTH PROTOCOL STACK.
FIGURE 3: MASTER AND SLAVE FREQUENCY HOPPING.
FIGURE 4: DIFFERENT PICONET COMPOSITION IN A AND B; A SCATTERNET EXAMPLE IN C.
FIGURE 5: STANDARD BLUETOOTH PACKET ENTITIES.
FIGURE 6: VIEW OF THE CVSD ENCODER WITH SYLLABIC COMPANDING [22].
FIGURE 7: VIEW OF THE CVSD DECODER WITH SYLLABIC COMPANDING [22].
FIGURE 8: VIEW OF THE ACCUMULATOR ACTION [22].
FIGURE 9: 16-BIT PCM SAMPLING WITH 8 KHZ SAMPLING FREQUENCY.
FIGURE 10: FREE2MOVE F2M03AC2 MODULE.
FIGURE 11: LONG AND SHORT SYNCHRONIZATION PULSES, 8- AND 16-BIT WORDS.
FIGURE 12: VIEW OF THE PESQ ALGORITHM.
FIGURE 13: PSYCHOACOUSTIC PART OF THE PESQ ALGORITHM.
FIGURE 14: COGNITIVE DOMAIN OF THE PESQ ALGORITHM.
FIGURE 15: THE PC RESETS AND RESTARTS THE F2M03AC2.
FIGURE 16: PACKET FORMAT, COMMAND LENGTH AND COMMAND PARAMETERS AREA.
FIGURE 17: A COMPARISON BETWEEN AN ORIGINAL SENTENCE AND A SENTENCE SENT THROUGH THE MOBILE NETWORK.
FIGURE 18: SIMPLIFIED AUDIO PATH IN THE SE V800.
FIGURE 19: EXCHANGING THE SE V800 WITH THE F2M03 MODULE TO CREATE A NEW CONNECTION WITH THE DSP.
FIGURE 20: THE RELATIONSHIP BETWEEN THE CG AND THE MTU IN AN AUDIO QUALITY MEASUREMENT.
FIGURE 21: INTERACTION BETWEEN THE PROTOTYPE MTU AND THE CG.
FIGURE 22: PESQ-TESTERS SYMMETRIC AND ASYMMETRIC VALUES.
FIGURE 23: SYMMETRIC AND ASYMMETRIC VALUES AFTER FILTER AND NOISE PROCESSING.
FIGURE 24: ASYMMETRIC AND SYMMETRIC VALUES WITH A USB BLUETOOTH TRANSMISSION.
FIGURE 25: SYMMETRIC AND ASYMMETRIC VALUES AFTER FILTER AND NOISE PROCESSING.
FIGURE 26: TRANSMITTED CORRUPTED WAVEFORM.
FIGURE 27: CVSD QUANTIZATION NOISE.
FIGURE 28: RELATIONSHIP BETWEEN THE AMPLITUDE AND MOS-LQO SCORE.
List of Tables
TABLE 1: VOICE CODING PLAN SUPPORTED ON THE AIR INTERFACE.
TABLE 2: CVSD PARAMETER VALUES [22].
TABLE 3: RELATIONSHIP, MOS VALUES AND INTENTION OF THE VALUES.
Introduction
Many telecommunications companies around the world today are working in a
saturated market, with decreasing revenues. Therefore, operators are trying
to reduce costs, maximize usage of resources and maintain high quality in
order to attract new customers while keeping existing customers satisfied.
The key to a satisfied customer is Quality of Service (QoS). In fact, good QoS
can make the difference between a satisfied subscriber and a former
subscriber. Because of that it is important for the operator to maintain control
of the network quality. This is best achieved by continuously gathering live
information about the network's QoS, by observing the benefits of
optimization efforts on a regular basis, and by measuring the quality as it is
perceived by the customers.
To fulfill these requirements, operators collect information from the fixed-side
of the network. While this data is necessary, it is generally used only for
statistical analyses, and does not contain detailed information about the QoS
nor the end user experience.
Another valuable source of information is the company's own Customer Care
Department (CCD). In the CCD, data regarding the problems that subscribers
experience in the network is collected. However, very few subscribers call
the CCD when problems occur, even though they may still consider switching to
another operator. Therefore, the number of dissatisfied subscribers recorded
by the CCD may be an unreliable source of information regarding the operator's
network.
An additional way for the operators to acquire information about the network
quality is by performing a manual drive test. Drive tests provide essential
information about the network, but are time consuming and costly. Therefore,
they are limited by the time and resources available. Despite the necessity of
records from the highest network load, which is mainly during rush hour and
weekends, drive tests are seldom performed at these times due to the
expenses.
It is here that autonomous test systems exceed the limitations of any other
methods. The autonomous system provides the operator with realistic and
reliable measurement data 24 hours a day using minimal human resources.
The system uses a series of measurement probes emplaced in strategic
places, providing end-to-end voice and data measurements.
Within each measurement probe in an autonomous system, audio
data is transported between different processes. One way to do this is by
using the synchronous connection-oriented (SCO) link specified in the
Bluetooth standard as the transmission medium. The measurement probes
commonly perform more or less advanced audio processing. Hence different
audio quality algorithms have been developed to mimic the human audio
perceptual capability, to get as realistic audio quality scores of the
measurements as possible.
1.1
Thesis outline
Because internal confidential documents were used during the work on
this Master's Thesis, some of the chapters in this paper focus on a
discussion of problems closely related to the actual problems that
arose during the work on this project.
The second chapter is an overview of TEMS Automatic (TA), containing a
short presentation of the processes in the system: what they do and the
interfaces between them. Chapter three is a brief overview of the lowest layers
in the Bluetooth stack. It also provides a deeper insight into how Pulse
Code Modulation (PCM) is managed before the air interface and how the audio
channel is presented, and finally describes the Free2move module
F2M03AC2. Chapter four reviews the different parts of the
Perceptual Evaluation of Speech Quality (PESQ) algorithm, and how speech
quality tests including Mean Opinion Score (MOS) are performed. Further, it
provides details on how it correlates to MOS-Listening Quality Objective
(MOS-LQO). The fifth chapter describes how the prototype is built and contains
some information about the program that was produced to control the
F2M03AC2 module. It also explains how an original Audio Quality
Measurement (AQM) is performed in the TA system and how the AQM is
performed with the prototype. The final part of chapter five describes a test
carried out with two USB Bluetooth devices.
1.2
1.3
Background
Most cellular phones today can transfer different types of
information to and from the phone via a Bluetooth interface. This
interface has two different types of transmission links [17, 18, 14]. One is a
reliable link, where acknowledgements and retransmission schemes are used
to ensure the reliability of the data transmission. The other is an unreliable
circuit-switched link, which allocates slots in a periodic manner for the data
transmission. The data transmitted is never acknowledged and never
retransmitted. These two links have different purposes; the reliable one is
used for file transfers while the unreliable link is intended to handle streaming
data.
As mentioned above, in the Bluetooth standard v1.1 [17, 18, 20] streaming
data is never retransmitted, because delayed data would disturb the parties
participating in the connection. In addition, excessive changes in the audio
picture cause degradation of the audio quality, i.e. they corrupt the audio
information arriving at the receiver. Hence, audio quality algorithms
have been developed to measure the degradation of the audio between the
sender and the receiver.
1.4
Method
The work started with the production of a time plan containing all the possible
tasks and their respective time requirements. It also included an approximate
plan for when the usage of different equipment would be needed.
A deeper study of papers delving into the PESQ [12] algorithm was made.
Most of the information describing PESQ was found in the ITU-T P.862
Perceptual Evaluation of Speech Quality as well as the document AQM in
TEMS Automatic PESQ. The TEMS Automatic UMTS MTU700
Configuration Guide and the TEMS Automatic MTU700 Installation Guide
were read to obtain an understanding of how the MTU is configured for AQM.
The SE V800 PCM interface was also studied.
A prototype was built. The wired PCM solution between the phone card and
the DSP card was replaced by a new link. This link was equipped with a
Bluetooth device, which was used as the transmission medium instead of the
wired link.
A graphical user interface for the Free2move module was implemented. The
Free2move evaluation kit F2M03AC2 [6, 9, 24] (F2M03) was used as the
Bluetooth module in the prototype. The DSP software was then replaced
due to incompatibility with the new interface from the Bluetooth device.
MTU test measurements were performed with the prototype hardware. The
audio recordings were analysed with tools such as Microsoft Excel and
PESQ-tester. These programs were able to provide statistical information
about the audio quality and a sufficient overview of the sets of speech
samples.
Most of the project documentation was written in the last stage of the
project, due to the many unknowns that required answers during the early
stages.
TEMS Automatic
TEMS Automatic (TA) is an autonomous automated system that gives an
overview of networks from the subscribers' perspective. It also provides reports
and tools for troubleshooting and analysis. Because the TA system is a
completely automatic system, it can execute measurements 24 hours a day, 7
days a week. This gives the operator the possibility to access measurements
during rush hour and holidays when staff resources are limited.
The TA system is divided into several separate parts that interact with each
other via specified interfaces. Each interface in the system is kept as simple
as possible to make the system easy to understand, to maintain and to
troubleshoot. Figure 1 below shows the interfaces between the processes in
the system.
2.1
Database
The database is the central hub in the TA system. Its interface is used by
many different applications and since most of the system components are not
real-time applications, the database is used as a communication node. This
provides an option to build the system in a modular way, and an opportunity
to run the system even though not all components have been completed.
The design of the database is simple; this is to enable easy access from
different types of applications such as web-servers, report generators and
third party products.
2.2
Operator Console
The Operator Console (OC) process is responsible for administering the
information handled in the system and gives the user an entry
point for interaction. This entry point can be used to change settings or to
send new information, Work Orders (WO), to the MTUs. The OC only
interacts with the database via an external interface.
2.3
ComServer
This part of the system serves as an advanced gateway for interaction
between the database and the MTU. All information exchanged between the
MTU and the ComServer is transferred via an ordinary File Transfer Protocol
(FTP) server.
When the MTU sends information files to the ComServer, the ComServer
translates all incoming information to a format that fits the database. It then
updates the database with the new information, from where the
information can be read and manipulated via the OC. Some collected files will
be TEMS log files, which are to be placed in the directory for the TEMS log
file reader for further processing. The ComServer is also responsible for
unpacking any compressed files.
When the OC updates the database with information that belongs to the MTU,
the ComServer will create new files including this content and send them to
the MTU.
2.4
2.5
Call Generator
In the TA system the Call Generator (CG) is responsible for making calls to and
receiving calls from the MTU. The CG implements AQM including PESQ, and
saves fixed line events that are recognized by the tones signaled from the
Public Switched Telephone Network (PSTN). The measurement data is stored in
the database by the CG, and when merged with information that the MTU
stores in the log files, the user can pinpoint the geographical position of the
fixed side event.
2.6
2.7
Bluetooth
The first two sub-chapters under this heading describe the lower parts of the
Bluetooth stack and how Bluetooth entities can form different kinds of
deployments. Packet types and different physical channels are also
presented. The part of the chapter where audio is discussed contains a
detailed description of the Bluetooth audio air interface. This has been
included because it has a very strong impact on the result of the Master's
Thesis. The four layers above the radio layer and the baseband layer included
in figure 2 are also briefly described. At the end of the chapter, there is a
description of the Free2move handsfree module used in the prototype.
3.1
General description
Bluetooth is a short-range radio standard (10-100 meters), intended to replace
cable(s) between portable and fixed electronic devices. Each entity that will
communicate has to be equipped with a Bluetooth circuit, where the chip
works as a transceiver. To be able to communicate, all Bluetooth entities have
to work together. One entity will act as a master, which controls the other
entities that are participating in the communication. These controlled entities
are known as slaves. Bluetooth is developed to be a robust and reliable link,
which can operate in noisy environments at a low cost [17].
3.2
When two or more devices are using the same channel, they will form a
Piconet [20]. In the Piconet there is one master and up to 7 slave devices.
The hopping sequence is unique for each Piconet. Figure 4 shows different
types of piconets.
3.3.1
Physical Links
In the Bluetooth baseband, two types of links are defined [18, 20, 22]; the
SCO and the Asynchronous Connection-Less (ACL) link.
The SCO link is a point-to-point connection between a master and a specific
slave participating in the same Piconet. The link has reserved slots and can
therefore be considered a circuit-switched connection. SCO packets are
never retransmitted and are intended for speech transmissions. Every SCO
link has a transmission capacity of 64 kb/s.
The ACL link is a point-to-multipoint packet-switched connection between the
master and all active slaves participating in the Piconet. In slots not reserved
for an SCO link, a master can establish an ACL connection on a per-slot
basis. This applies to any slave, even a slave that is already engaged in an
SCO link. For all of the ACL links there is a packet retransmission possibility,
which assures data integrity.
3.3.2
The packet types in the Piconet are related to the physical links they are
being used in.
Four different packet types are defined for the SCO link: HV1, HV2, HV3 and
DV [17, 18, 20]. HV stands for High-quality Voice. The HV1 packet
carries 10 information bytes, which represent 1.25 ms of speech. The HV1
packet is encoded using 1/3 Forward Error Correction (FEC). The HV2 packet
carries 20 information bytes (2.5 ms of speech) and is encoded using 2/3 FEC,
while the HV3 packet carries 30 information bytes (3.75 ms of speech), which
are not protected by FEC. An
HV packet is never retransmitted. The DV packet is a combined voice and data
packet, which carries 10 bytes of voice information and up to 150 bits of data.
The voice field of the payload is not FEC protected, and like the HV packets, it
is never retransmitted. The data field is protected by a 16-bit CRC and
encoded with 2/3 FEC.
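The speech durations quoted for the HV packets follow directly from the 64 kb/s rate of the SCO link. The short Python sketch below reproduces that arithmetic. The names are chosen here for illustration only, and the FEC rates are written as fractions (1/3, 2/3, and 1 for an unprotected payload), with HV3 taken as unprotected. That all three coded payloads come out at 240 bits matches the fixed HV payload length in the Bluetooth baseband specification, which is a fact from the standard rather than from the text above.

```python
from fractions import Fraction

SCO_RATE_BITS_PER_S = 64_000  # capacity of one SCO link

# Information bytes and FEC rate per HV packet type (1 means no FEC).
HV_PACKETS = {
    "HV1": (10, Fraction(1, 3)),
    "HV2": (20, Fraction(2, 3)),
    "HV3": (30, Fraction(1, 1)),
}

def speech_ms(info_bytes):
    """Milliseconds of speech carried by a payload at the SCO bit rate."""
    return info_bytes * 8 * 1000 / SCO_RATE_BITS_PER_S

def coded_bits(info_bytes, fec_rate):
    """Payload size on the air after FEC encoding."""
    return int(info_bytes * 8 / fec_rate)

for name, (info_bytes, fec_rate) in HV_PACKETS.items():
    print(name, speech_ms(info_bytes), "ms,",
          coded_bits(info_bytes, fec_rate), "coded bits")
```

The stronger the FEC, the fewer information bytes fit into a packet, so an HV1 link has to send packets three times as often as an HV3 link to sustain the same speech stream.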
The ACL link has seven different packet types defined: DM1, DM3, DM5, DH1,
DH3, DH5, and AUX1. The Data-Medium (DM) packets only carry data
information. DM1, DM3 and DM5 cover one, three and five time slots
respectively. In the same order they contain up to 18, 123 and 226 information
bytes. All DM packets have a 16-bit Cyclic Redundancy Code (CRC), and 2/3
FEC. The Data-High (DH) rate packets are similar to the DM packets, except for
the lack of FEC encoding in the payload information. As a result, the DH1,
DH3 and DH5 packets can carry up to 28, 185 and 341 information bytes. In
the same manner as the DM packets, the DH packets cover one, three and
five time slots respectively. [18, 20]
The AUX1 packet looks like the DH1 packet but has no CRC code. The AUX1
packet can carry up to 30 information bytes, and it covers only one slot [18].
3.4
RFCOMM
The Radio Frequency Communication (RFCOMM) is a protocol that emulates
the RS-232 serial port. Due to its dependency on the underlying L2CAP layer,
which is used for multiplexing, the RFCOMM layer is forced to use the ACL
link for transmission. The higher layer in the Bluetooth protocol stack, e.g. the
Object Exchange Protocol, manages the transportation capacities of the
RFCOMM layer [17].
3.5
Bluetooth Audio
The Bluetooth air interface can use either a 64 kb/s log PCM format (A-law
or µ-law) or 64 kb/s Continuous Variable Slope Delta modulation
(CVSD) [17, 18, 20]. Which type of air interface is used depends on the
input data stream to the entity. Table 1 summarizes the voice codings supported
by the air interface.
Voice Codecs
Linear               CVSD
8-bit logarithmic    A-law
                     µ-law
Table 1: Voice coding plan supported on the air interface.
3.5.1
CVSD
CVSD modulation is a method used to convert a speech signal into a digital
format [22]. It takes advantage of the fact that voice signals do not change
abruptly [17]. The CVSD modulation scheme tries to follow the
waveform. The sgn(x) function returns 1 if x ≥ 0; if x < 0, the
function returns -1 instead. These numbers are represented by 0 and 1
respectively on the air interface. In other words, the CVSD algorithm uses a
prediction value; if the input value is larger than or equal to the prediction,
sgn returns 1 and a 0 is sent on the air interface, whereas a 1 is sent if the
input value is lower than the prediction value. This process is illustrated in
figure 7.
The step size control is applied to reduce slope overload effects [20]. The
step size is adjusted according to the average signal slope. The input to the
CVSD encoder in the Bluetooth standard has to be 64 ksamples/s linear PCM.
Figure 6 exhibits the CVSD encoder. Figure 7 shows the decoder and figure 8
describes the accumulator [20].
Notice the constants in figures 7 and 8. The CVSD encoder output is
b(k), the accumulator content is y(k) and the step size is d(k). Also pay
attention to the decay factors, where the decay factor for the step size
is denoted by β and the decay factor for the accumulator is denoted by h. These
decay factors are defined as shown in table 2. The syllabic companding
parameter is denoted by α; it monitors the slope by considering the four most
recent output bits.
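Let

$$\hat{x}(k) = h\,y(k) \tag{1}$$

The different steps in the CVSD algorithm are then updated according to the
following equations:

$$b(k) = \operatorname{sgn}\{x(k) - \hat{x}(k-1)\} \tag{2}$$

$$d(k) = \begin{cases} \min\{d(k-1) + d_{\min},\ d_{\max}\}, & \alpha = 1 \\ \max\{\beta\,d(k-1),\ d_{\min}\}, & \alpha = 0 \end{cases} \tag{3}$$

$$y(k) = \begin{cases} \min\{\hat{y}(k),\ y_{\max}\}, & \hat{y}(k) \ge 0 \\ \max\{\hat{y}(k),\ y_{\min}\}, & \hat{y}(k) < 0 \end{cases} \tag{4}$$

where

$$\hat{y}(k) = \hat{x}(k-1) + b(k)\,d(k)$$

and α = 1 when the four most recent output bits are equal, otherwise α = 0.
The minimum and maximum step sizes are denoted by dmin and dmax, and
ymin and ymax denote the accumulator's negative and positive saturation
values respectively [22].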
The bits are transmitted over the air in the same order as they are generated.
Table 2 below shows the values that the parameters must use. The
values are based on a 16-bit signed number output from the accumulator.
Parameter    Value
h            1 - 1/32
β            1 - 1/1024
dmin         10
dmax         1280
ymin         -2^15 or -2^15 + 1
ymax         2^15 - 1
Table 2: CVSD parameter values [22].
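The update equations and the parameter values in table 2 can be sketched in a few lines of Python. This is a minimal illustration and not the implementation used in the Bluetooth module: the names (CvsdState, cvsd_encode, cvsd_decode) are invented here, the step-size rule (grow by dmin when the four most recent output bits are equal, otherwise decay by β) is taken from the Bluetooth baseband specification, and it is assumed that the current output bit counts among those four bits and that the decoder outputs the accumulator content y(k).

```python
# Parameter values from table 2.
H = 1 - 1 / 32            # accumulator decay factor h
BETA = 1 - 1 / 1024       # step size decay factor
D_MIN, D_MAX = 10, 1280   # minimum and maximum step size
Y_MIN, Y_MAX = -2**15 + 1, 2**15 - 1   # accumulator saturation values
RUN = 4                   # number of recent output bits that are inspected

class CvsdState:
    """Step size and accumulator shared by the encoder and the decoder."""

    def __init__(self):
        self.x_hat = 0.0          # prediction x^(k-1)
        self.d = float(D_MIN)     # step size d(k-1)
        self.history = []         # most recent values of b(k)

    def update(self, b):
        """Apply one output value b in {+1, -1}; return y(k)."""
        self.history = (self.history + [b])[-RUN:]
        alpha = len(self.history) == RUN and len(set(self.history)) == 1
        if alpha:
            self.d = min(self.d + D_MIN, D_MAX)
        else:
            self.d = max(BETA * self.d, D_MIN)
        y_hat = self.x_hat + b * self.d
        y = min(y_hat, Y_MAX) if y_hat >= 0 else max(y_hat, Y_MIN)
        self.x_hat = H * y
        return y

def cvsd_encode(samples):
    """Encode linear PCM samples into air-interface bits."""
    state, bits = CvsdState(), []
    for x in samples:
        b = 1 if x - state.x_hat >= 0 else -1
        state.update(b)
        bits.append(0 if b == 1 else 1)   # +1 is sent as 0, -1 as 1
    return bits

def cvsd_decode(bits):
    """Rebuild the waveform by running the same state machine."""
    state = CvsdState()
    return [state.update(1 if bit == 0 else -1) for bit in bits]
```

Because the decoder repeats exactly the state updates of the encoder, a bit error on the air makes the two accumulators diverge until the decay factor h has pulled them together again.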
3.5.2
PCM
The first step when converting an analog speech signal to a digital one is to
filter out the high frequency components in the signal. This is possible
because most of the energy in spoken language lies between 200 and 2800
Hz [26]. A band-limiting filter is used to reduce aliasing. The second step is to
read (sample) the amplitude of the analog curve at a rate that results in
good audio quality. The sampling results in a Pulse Amplitude Modulation
(PAM) signal where each pulse corresponds to an amplitude in the analog curve.
The sampling frequency is determined by the Nyquist criterion [26, 5], which
says that the sampling frequency has to be at least twice as high as the
highest frequency in the original signal.
$$F_S \ge 2 \cdot BW$$
$F_S$ = Sampling frequency
$BW$ = Bandwidth of the analog voice signal
The next step in the process is the quantization step. Each input sample is
mapped into the quantization interval that is the closest match to the
amplitude height. If the quantization interval does not match the actual
amplitude of the input signal, an error is introduced into the PCM. This error is
called quantization noise. A way to reduce the error is to increase the
number of quantization intervals. The last step in the analog-to-digital (AD)
conversion is the coding phase, where each quantization value is expressed as a
binary code, each code consisting of 16 bits.
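The sampling and quantization steps can be illustrated with a small sketch. The function names are hypothetical, samples are assumed to be normalized to the range [-1, 1), and a 4000 Hz bandwidth is assumed as an example value. The sketch maps each sample to the nearest of 2^bits uniform levels, so the quantization error can never exceed half a quantization interval.

```python
import math

def min_sampling_rate(bandwidth_hz):
    """Nyquist criterion: sample at least twice the signal bandwidth."""
    return 2 * bandwidth_hz

def quantize(sample, bits):
    """Map a sample in [-1.0, 1.0) to the nearest uniform level.

    Returns the integer code and the amplitude that the code represents.
    """
    levels = 2 ** bits
    step = 2.0 / levels
    code = int(math.floor(sample / step + 0.5))
    code = max(-levels // 2, min(levels // 2 - 1, code))  # saturate
    return code, code * step

def max_error(samples, bits):
    """Largest quantization error over a set of samples."""
    return max(abs(quantize(s, bits)[1] - s) for s in samples)

# A 1 kHz tone sampled at 8 kHz (both values are example assumptions).
tone = [0.9 * math.sin(2 * math.pi * 1000 * n / 8000) for n in range(80)]
print(min_sampling_rate(4000))          # sampling rate for 4 kHz bandwidth
print(max_error(tone, 8), max_error(tone, 16))
```

Going from 8 to 16 bits increases the number of quantization intervals by a factor of 256, and the measured error shrinks accordingly.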
The µ-law and the A-law are two different types of compression schemes,
which compress the 16-bit linear PCM data down to eight-bit logarithmic data.
Since the air interface supports a 64 kb/s information stream, it is possible to
apply either A-law PCM or µ-law PCM compression. If the line interface
uses A-law and the air interface uses µ-law or vice versa, a conversion between
A-law and µ-law is made. Both A-law and µ-law follow the ITU-T
recommendation G.711 [22]. If the PCM is represented in 16 bits, i.e. linear
PCM, CVSD modulation is used instead of A-law PCM or µ-law PCM.
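To give a feeling for what logarithmic compression does, the sketch below implements the continuous µ-law companding curve with µ = 255. It is an illustration under stated assumptions: G.711 itself uses a piecewise-linear approximation of this curve with 8-bit codes, which is not reproduced here, and the function names are invented for the example. Inputs are assumed to be normalized to [-1, 1].

```python
import math

MU = 255.0  # companding constant of the µ-law curve

def mu_law_compress(x):
    """Compress a normalized sample x in [-1, 1] onto [-1, 1]."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)

def mu_law_expand(y):
    """Inverse of mu_law_compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)

for x in (0.01, 0.1, 1.0):
    print(x, round(mu_law_compress(x), 3))
```

A sample at 1 % of full scale is mapped to more than 20 % of the compressed range, which is why eight logarithmic bits can represent quiet speech far better than eight linear bits.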
3.6
3.6.1
PCM interface
To transfer PCM data in a wired manner, five wires make up the
PCM data bus, and if two entities are to exchange PCM data,
they have to be configured in a master-slave relationship. In other words, in a
wired solution, one entity has the master role in the connection and the other
entity is the slave. The master's role is to synchronize the slave to the
clock of the master, and to generate sync pulses.
As mentioned earlier, PCM is a standard method for digitizing human voice
patterns for transmission in digital channels. The F2M03AC2 has hardware
support for transmitting continuous PCM data. The data does not pass through
the HCI layer of the protocol stack; it is only managed in the radio and
baseband layers. The SCO links in the F2M03AC2 are dedicated to sending
and receiving streaming mono audio, voice data in particular, and the
module can handle up to three SCO connections at one time.
When the F2M03AC2 entity operates as a master in the PCM interface it can
generate an output clock at 128, 256 or 512 kHz (it only generates clock
pulses while an SCO link is established). When configured as a
slave it supports input clocks up to 2048 kHz.
The F2M03AC2 follows the Bluetooth standard v1.1 and supports four
different types of sampling formats: 13- and 16-bit linear PCM, and 8-bit A-law
or µ-law. The two latter are coded formats with 8000 samples per second. In
the first three of the four primary slots following the PCM sync, the module is
able to transmit or receive.
The module is equipped with a headset firmware that supports a 16-bit linear
PCM format, which forces the module to use CVSD modulation. The
different types of PCM formats that the module supports are determined by the
firmware, which is flashed into the module.
There are two types of synchronization pulses in the F2M03AC2 module:
long frame synchronization and short frame synchronization. In the long
frame synchronization the rising edge of the synchronization pulse indicates
the start of a data word. The long frame synchronization pulse is always 8 bits
long. In the short frame synchronization, however, the falling edge of the
synchronization pulse indicates the start of the PCM word. The short frame
synchronization pulse is always 1 bit long, and as mentioned, the device only
produces the synchronization pulses when the F2M03AC2 is configured as a
master [6].
It is also possible to configure which bit shall be sent first in the data word; the
most significant or the least significant [6]. Figure 11 gives an overview of the
different types of synchronization pulses and the different word lengths which
can be managed in the F2M03AC2 module. Before and after the PCM input
shown in the figure below, the data is undefined. This is because the
F2M03AC2 module is unable to control the actions performed on the other
side of the connection before and after a data word is sent [6].
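The configurable bit order and the relation between the clock rate and the word length can be made concrete with a short sketch. It is an illustration only: the function names are hypothetical, samples are assumed to be two's complement integers, and it is assumed that one synchronization pulse is issued per sample, i.e. 8000 times per second.

```python
def word_to_bits(sample, width=16, msb_first=True):
    """Serialize a signed sample into a list of bits for the PCM bus."""
    value = sample & ((1 << width) - 1)            # two's complement
    bits = [(value >> i) & 1 for i in range(width)]  # LSB first
    return bits[::-1] if msb_first else bits

def bits_to_word(bits, msb_first=True):
    """Rebuild the signed sample from a received list of bits."""
    ordered = bits if msb_first else bits[::-1]
    value = 0
    for bit in ordered:
        value = (value << 1) | bit
    if value >= 1 << (len(bits) - 1):              # negative number
        value -= 1 << len(bits)
    return value

def bit_slots_per_frame(clock_hz, frame_rate_hz=8000):
    """Clock periods available between two synchronization pulses."""
    return clock_hz // frame_rate_hz

for clock in (128_000, 256_000, 512_000):
    print(clock, bit_slots_per_frame(clock))
```

With these assumptions a 128 kHz clock leaves room for exactly one 16-bit word per frame, while the higher clock rates leave unused bit periods after the word.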
MOS    Intention
5      Excellent
4      Good
3      Fair
2      Poor
1      Bad
Table 3: Relationship, MOS values and intention of the values.
The intentions in the MOS table are only specified in terms of excellent, good,
fair, poor and bad. There are no reference audio samples for the listener to
relate them to. Therefore, each listener individually decides what constitutes a
fair speech sample. This loose definition makes MOS very sensitive to
the listening procedure used, as the listeners' prior experiences,
equipment quality et cetera can affect their interpretation of the MOS
values.
The output from the PESQ algorithm is a MOS-Listening Quality Objective
(MOS-LQO) [10] and not a MOS value. This means that the PESQ score is
transformed to a MOS-LQO value according to the ITU-T P.862.1 standard.
4.2
Psychoacoustic domain
As mentioned, the PESQ algorithm is divided into different parts. The most
important steps in the psychoacoustic region of the algorithm are described
below. They are also illustrated in figure 13.
Scale: When performing system tests, the gain of the system may vary
considerably. For instance, the gain depends on
whether the system uses an ISDN line or an analog two-wire interface
to perform the measurements. For this reason, the transmitted speech
and reference speech are both scaled to compensate for the
overall gain in the network.
Time align: Transmission delays may occur in a mobile network. They can
change the transmitted sentences either in a single speech reference or
between two speech references. The delays occur as a result of handovers or
VoIP. The transmitted speech sentence and the reference sentence are
time aligned so that all parts of the transmitted sentence
correspond to the reference and vice versa.
Mimic ear resolution: The speech signal is converted into the frequency
domain. Next, the Hertz scale is warped into the critical band domain by
attempting to mimic how the ear treats different frequencies. Thus higher
frequencies are given a lower resolution.
Remove filter influence: Filtering in the PSTN or mobile network may have a
negative effect on the PESQ score, because severe filtering is disturbing to the
listener. To decrease the filter influence, the transfer function is measured and
this information is used to equalize the reference.
Remove gain variations: Gain variations may occur because of the
Automatic Gain Control (AGC) units in the network. The effect of gain
variations is removed.
Mimic ear-brain loudness perception: In order to imitate how the human
ear transforms intensity into discerned loudness, the intensity of the spectrum
is warped.
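The first two steps above, scaling and time alignment, can be sketched in simplified form. This is not the procedure defined in ITU-T P.862, which aligns speech envelopes utterance by utterance. The sketch assumes plain RMS level matching and a single constant delay found by brute-force cross-correlation, and all function names are invented for the example.

```python
import math

def rms(signal):
    """Root mean square level of a signal."""
    return math.sqrt(sum(s * s for s in signal) / len(signal))

def scale_to_level(signal, target_rms):
    """Scale a signal so that its RMS level equals target_rms."""
    gain = target_rms / rms(signal)
    return [s * gain for s in signal]

def estimate_delay(reference, degraded, max_lag):
    """Lag in samples at which degraded best matches reference."""
    def score(lag):
        return sum(reference[i] * degraded[i + lag]
                   for i in range(len(reference))
                   if 0 <= i + lag < len(degraded))
    return max(range(-max_lag, max_lag + 1), key=score)

reference = [0, 0, 1, 2, 1, 0, 0, 0]
degraded = [0, 0, 0, 0, 0.5, 1, 0.5, 0]   # delayed two samples, half level
print(estimate_delay(reference, degraded, 3))
print(rms(scale_to_level(degraded, rms(reference))))
```

Once the level and the delay are compensated, the remaining differences between the two signals are the ones a listener would actually perceive as degradation.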
4.3
Cognitive domain
By combining the average split second disturbance with the average split
second asymmetrical disturbance for the entire speech reference, a
PESQ_MOS score can be calculated.
Transform to MOS-LQO: The PESQ score [10] is transformed into the
MOS-LQO score according to ITU-T P.862.1.
MOS-LQO: MOS-LQO is similar to the MOS scale. The MOS-LQO scale
goes from 1 to 4.5, where 1 is the worst value and 4.5 is the best.
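The transformation in ITU-T P.862.1 is a single logistic function, which the sketch below evaluates. The function name is chosen here for illustration, and the input range of -0.5 to 4.5 for the raw PESQ score is taken from the recommendation rather than from the text above.

```python
import math

def pesq_to_mos_lqo(pesq_score):
    """Map a raw PESQ score to MOS-LQO with the P.862.1 logistic curve."""
    return 0.999 + (4.999 - 0.999) / (
        1.0 + math.exp(-1.4945 * pesq_score + 4.6607))

for score in (-0.5, 1.0, 2.0, 3.0, 4.0, 4.5):
    print(score, round(pesq_to_mos_lqo(score), 2))
```

The curve is monotonic, so the ranking of two recordings is the same in PESQ score and in MOS-LQO; only the spacing between the values changes, most noticeably at the low end of the scale.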
Implementation
This section describes the major problems that occurred during
the work with this Master's Thesis. The main devices and hardware
used for the work on this Master's Thesis are the MTU 700, a Sony Ericsson
V800 (SE V800) equipped with Ericsson TEMS software, and a Free2move
Bluetooth module F2M03AC2. Two Broadcom USB Bluetooth dongles were
also used.
5.1
The packets in the protocol are divided into three parts. The first part is a
command value consisting of 1 byte of information. This provides the module
with commands that are waiting to be executed, but it can also contain
information from the module about the tasks that have already been
completed. The second part of the packet is a length indicator that tells how
many bytes the command parameters consist of. This part of the packet, like
the command value, is 1 byte long. The third part contains the command
parameters, which describe whether the commands were processed under
normal circumstances or whether there were any deviations. Figure 16
shows the structure of a packet in the protocol.
Figure 16: Packet format, command length and command parameters area.
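The three-part packet layout (one command byte, one length byte, then the parameters) is simple enough to sketch as a pair of helper functions. The function names and the command value 0x10 are hypothetical; the actual command set of the module is not reproduced here.

```python
def build_packet(command, params=b""):
    """Assemble [command][length][parameters] into one byte string."""
    if not 0 <= command <= 0xFF:
        raise ValueError("command must fit in one byte")
    if len(params) > 0xFF:
        raise ValueError("parameters must fit in a one-byte length field")
    return bytes([command, len(params)]) + params

def parse_packet(data):
    """Split a received packet into its command value and parameters."""
    if len(data) < 2:
        raise ValueError("packet shorter than its two-byte header")
    command, length = data[0], data[1]
    params = data[2:2 + length]
    if len(params) != length:
        raise ValueError("packet truncated")
    return command, params

packet = build_packet(0x10, b"\x01\x02")   # 0x10 is a made-up command
print(packet.hex(), parse_packet(packet))
```

Because the length field is a single byte, a packet can carry at most 255 parameter bytes, and a receiver reading from a serial stream knows after two bytes how much more data to wait for.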
5.2
5.2.1
Audio Path
The SE V800 supports several audio modes [2]. The mode used in this
Master's Thesis is a normal voice mode. It includes functions that
perform audio decoding, audio mixing and filtering. All of these
transformations are completed before the digital audio signal reaches the
Bluetooth circuit. Most of the audio processing units are turned off, in order to
keep the incoming audio picture as consistent as possible.
Some of the phone band filtering and voice coding cannot be completely
turned off due to aspects of the mobile network that are out of our control.
This will to some extent affect the audio coming from the mobile.
Figure 17 is a diagram that compares the frequency spectrum of a reference
sentence in the MTU and the same sentence after it was sent through a
mobile network. Figure 17 also displays the effects of the phone band
filters and voice codec. Figure 18 is a model of the path the
audio takes from the mobile network to the Bluetooth circuit in the SE V800.
[Figure 17 plot: level in dB (-30 to -90) against frequency (axis labelled "Hz * 4"); curves: Original Sentence, Sentence Sent]
Figure 17: A comparison between an original sentence and a sentence sent through
the mobile network.
5.2.2 Acoustic Parameters
In the file system of the SE V800, several files belong to the audio
configuration [1]. These are so-called acoustic parameter files. Most of these
files manage the routing of the audio path. Two of them were modified during
the work on this Masters Thesis. The first corresponds to the access type: it
determines how the audio is routed in the mobile, and also when and from
where the data can be reached in the mobile. The main task of this file is to
provide the Bluetooth entity in the SE V800 with a constant PCM-data stream.
The second parameter file that was altered manages how the SE V800
behaves when its clamshell is opened or closed. When the mobile is inside
the MTU, the clamshell has to be closed due to the lack of room. During the
work with the prototype, however, the clamshell had to stay open. This was
necessary in order to pair the phone with another Bluetooth entity
originating from the MTU.
5.3 Building prototype
During the design phase of the prototype, the circuit diagrams for the MTU
and the F2M03AC2 module were studied. In particular, two options for routing
the new PCM-data stream were of interest. The first option required
investigating a new way of routing the PCM-data into the DSP. Because it was
difficult to estimate the time needed to create the new route, and because of
the strict time limits for this Masters Thesis, this option was abandoned at
an early stage. The remaining option was to route the audio along the
existing path from the phone to the DSP. The existing wires connecting the
mobile phone to the MTU were removed at this stage, and the F2M03AC2
PCM-bus was connected directly to the DSP PCM-bus.
The 16-bit linear PCM-interface from the SE V800 was copied to the
F2M03AC2 module so that it would fit the existing DSP software. However,
the PCM-interface in the F2M03AC2 module was not copied in its entirety due
to compatibility problems, and therefore the differences between the SE V800
and the F2M03AC2 module in the PCM interface had to be managed by the
DSP software. The existing DSP software was replaced by a modified version
to suit the interface in the F2M03AC2 module.
Note that the SE V800 phone card was never removed from the MTU; only
the wired PCM-bus between the phone and the DSP was taken out. The phone
in the MTU had two tasks: it handled calls to and from the CG during all
tests, and it managed one end of the Bluetooth connection.
Figure 19 gives a schematic picture of the exchange from the SE V800
PCM-bus to the new F2M03AC2 PCM-bus.
Figure 19: Exchanging the SE V800 with the F2M03 module to create a new
connection with the DSP.
5.4
Every sentence in the TA system is 5.5 seconds long. The CG plays the first
sentence for 5.5 seconds while the MTU records the transmitted speech
sentence. In the following 5.5 seconds the roles are reversed, i.e. the MTU
plays a sentence and the CG records it. Hence, each play and record cycle
takes 11 seconds [3].
For every recording, three quality scores are calculated [3]. Two of these
are Frequent AQM values: one is calculated for the first half of the recorded
sentence and the other for the second half. As the sentence is 5.5 seconds
long, each half is 2.75 seconds long. The third value is the PESQ score,
which is calculated over the entire sentence.
On occasion, the measurements do not result in any scores, because the
recordings contain more than 25% silence [12]. The PESQ algorithm
cannot synchronize such sentences.
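The split into two halves and the silence limit can be illustrated with a small sketch. The 5.5 second sentence length and the 25% limit come from the text. The 8 kHz sample rate, the amplitude threshold used to classify a sample as silent and all function names are assumptions made for illustration. The real PESQ software detects silence in its own way, so this only shows the arithmetic.

```python
from typing import Sequence, Tuple

SENTENCE_SECONDS = 5.5        # from the text
MAX_SILENCE_FRACTION = 0.25   # from the text
SAMPLE_RATE_HZ = 8000         # assumption: narrow-band telephony rate
SILENCE_LEVEL = 100           # assumption: |sample| below this is "silent"

# Number of samples in one recorded sentence under the assumed rate.
SAMPLES_PER_SENTENCE = int(SENTENCE_SECONDS * SAMPLE_RATE_HZ)


def split_halves(samples: Sequence[int]) -> Tuple[Sequence[int], Sequence[int]]:
    """Return the first and second half of a recording (2.75 s each)."""
    mid = len(samples) // 2
    return samples[:mid], samples[mid:]


def silence_fraction(samples: Sequence[int], level: int = SILENCE_LEVEL) -> float:
    """Fraction of samples whose magnitude is below the silence level."""
    if not samples:
        return 1.0
    quiet = sum(1 for s in samples if abs(s) < level)
    return quiet / len(samples)


def can_be_scored(samples: Sequence[int]) -> bool:
    """True if the recording does not exceed the 25% silence limit."""
    return silence_fraction(samples) <= MAX_SILENCE_FRACTION
```

A recording that fails can_be_scored would, in this sketch, be skipped rather than passed on for scoring.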
The scores on the CG side are based on the uplink, while the MTU manages
the measurement scores from the downlink. All generated scores are saved
to log files. Figure 20 gives a schematic picture of the measurement
procedure between the CG and the MTU.
Figure 20: The relationship between the CG and the MTU in an audio quality
measurement.
5.5
One of the log files is the Trace log, which is sent to one of the serial ports in
the MTU during the AQM. The Trace log is transmitted to a PC and is
presented to the user via the MS-HyperTerminal program. This log contains
both the Frequent AQM and the PESQ scores for all downlink sentences used
during AQM. The names of the recorded sentences are also displayed.
The recorded sentences in the MTU are saved on a Compact Flash disc,
where each sentence receives an individual name. Thus, if any PESQ score
deviates from the average score, it will be possible to capture the recorded
sentence for further analysis.
5.5.1 PESQ-tester
PESQ-tester is a PC program that uses the same algorithm as the PESQ
software included in the MTU. The advantage of the PESQ-tester program is
that, in addition to the Frequent AQM score and the PESQ score, it also
produces two vectors. One is a symmetric vector, which shows how much of
the original sentence is lost during the transmission. The other is an
asymmetric vector containing the disturbance density signals, i.e. it
describes how much noise is added to the transmitted speech.
The sentences recorded during AQM with the MTU are collected and passed
as arguments to the PESQ-tester. The PESQ-tester program takes three
arguments on the command line. The first argument states how the
reference and transmitted speech sentences are sampled. The other two
arguments are the reference and the distorted (recorded) sentences
respectively.
The output vectors from a PESQ-tester run are often plotted in a chart,
which makes it easier to pinpoint the exact locations of the symmetric and
asymmetric parts in the transmitted sentence. Figure 22 shows a chart of the
symmetric and asymmetric values taken from a measurement using the
prototype.
5.6 Interference
The asymmetric values displayed in the graph in Figure 22 show the noise
added during the transmission from the CG to the DSP of the MTU. This
transmission is carried out via the mobile network and the Bluetooth prototype
link in the MTU. The noise appears as interference in the audio, and most of
the interference is rather easily filtered out using a digital filter [25] and
noise reduction tools.
Figure 23 shows the same sentence as in Figure 22, but after it has been
processed with a noise reduction function and a fast Fourier transform (FFT)
filter. However, some parts of the interference are impossible to filter out.
These parts consistently appear at the same position in the sentence and
have a negative effect on the PESQ score. Some of the interference may be
introduced via the unshielded wires running from the F2M03 module to the
DSP, and some audio packets may have been corrupted in the air
transmission between the two Bluetooth entities. The persistent interference,
which always occurs at the same positions in the sentences, indicates that
some type of audio transformation is affecting the PESQ score negatively.
Figure 23: Symmetric and asymmetric values after filter and noise processing.
5.7
Figure 25: Symmetric and asymmetric values after filter and noise processing.
5.8 Transmission studies
The parts of the transmitted sentences that resulted in large values on the
asymmetric vector were analysed in depth. The aim of this analysis was to
discover the elements responsible for transforming the audio stream in
Bluetooth, and thereby to learn why these values always appeared in the
same sample interval in every test.
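One way to find the positions that recur in every test is sketched below. It assumes that each test run yields an asymmetric vector with one value per frame and that a frame counts as disturbed when its value exceeds a chosen threshold. The threshold and the function names are hypothetical and are not part of the PESQ-tester program.

```python
from typing import List, Sequence, Set


def large_value_frames(vector: Sequence[float], threshold: float) -> Set[int]:
    """Indices of frames whose asymmetric value exceeds the threshold."""
    return {i for i, value in enumerate(vector) if value > threshold}


def recurring_frames(runs: Sequence[Sequence[float]], threshold: float) -> List[int]:
    """Frame indices that exceed the threshold in every test run."""
    if not runs:
        return []
    common = large_value_frames(runs[0], threshold)
    for vector in runs[1:]:
        # Keep only the frames that are also disturbed in this run.
        common &= large_value_frames(vector, threshold)
    return sorted(common)
```

Frames that survive the intersection are disturbed regardless of the individual call, which points towards a systematic cause rather than random radio errors.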
After a closer examination of the radio layer in the Bluetooth stack, the source
of the audio impairment was located. The Bluetooth entities in the prototype,
as well as the PCM-interface between them, all use a 16-bit linear PCM
format. This format is handled by the CVSD codec, which encodes from and
decodes to linear PCM [7], as described in chapter three.
The CVSD codec does, however, have certain known limitations. The CVSD
algorithm is built on the assumption that a voice signal does not change
abruptly. Therefore, if the slope of the input voice stream changes faster
than the CVSD algorithm can follow, the output of the CVSD codec will be
unreliable. An unsuccessful CVSD transformation may introduce quantization
noise into the Bluetooth link, which will affect the PESQ score negatively [7].
Figure 27 shows how the CVSD quantization noise may occur in the
transformation from linear PCM to the CVSD data stream.
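The slope limitation can be demonstrated with a simplified CVSD model. The sketch below is not the codec used in the F2M03AC2 module. It is a minimal delta modulator with an adaptive step size, and the constants (step limits, decay factors, run length) are illustrative assumptions that should be checked against the Bluetooth specification. The 64 kbit/s bit rate and the interpolation of the 8 kHz input are not modelled.

```python
from typing import List, Sequence

# Illustrative constants (assumptions, not taken from the thesis).
STEP_MIN = 10.0
STEP_MAX = 1280.0
STEP_DECAY = 1.0 - 1.0 / 1024.0   # slow decay of the step size
ACC_DECAY = 1.0 - 1.0 / 32.0      # leakage of the accumulator
RUN_LENGTH = 4                    # equal bits in a row that signal overload
PCM_MIN, PCM_MAX = -32768.0, 32767.0


class CvsdState:
    """State shared by the encoder and the decoder."""

    def __init__(self) -> None:
        self.accumulator = 0.0
        self.step = STEP_MIN
        self.recent_bits: List[int] = []

    def predicted(self) -> float:
        """Leaky prediction of the next sample."""
        return ACC_DECAY * self.accumulator

    def update(self, bit: int) -> float:
        """Adapt the step size, apply one bit and return the new estimate."""
        self.recent_bits = (self.recent_bits + [bit])[-RUN_LENGTH:]
        run = (len(self.recent_bits) == RUN_LENGTH
               and len(set(self.recent_bits)) == 1)
        if run:
            # The signal is moving faster than the estimate: grow the step.
            self.step = min(self.step + STEP_MIN, STEP_MAX)
        else:
            self.step = max(self.step * STEP_DECAY, STEP_MIN)
        value = self.predicted() + (self.step if bit else -self.step)
        self.accumulator = max(PCM_MIN, min(PCM_MAX, value))
        return self.accumulator


def encode(samples: Sequence[float]) -> List[int]:
    """One bit per sample: 1 if the sample is at or above the prediction."""
    state = CvsdState()
    bits = []
    for sample in samples:
        bit = 1 if sample >= state.predicted() else 0
        bits.append(bit)
        state.update(bit)
    return bits


def decode(bits: Sequence[int]) -> List[float]:
    """Rebuild the estimate by replaying the bits through the same state."""
    state = CvsdState()
    return [state.update(bit) for bit in bits]


def max_tracking_error(samples: Sequence[float]) -> float:
    """Largest absolute difference between the input and the decoded output."""
    decoded = decode(encode(samples))
    return max(abs(d - s) for d, s in zip(decoded, samples))
```

Comparing max_tracking_error for a gentle ramp with that for an abrupt jump shows the effect described above: the decoded signal follows the ramp closely, while after the jump the estimate needs many samples to catch up, and the difference appears as noise in the decoded audio.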
[Chart: MOS_LQO (2.8 to 3.9) against level in dB (-29 to -11)]
Conclusion
The Bluetooth audio link created using the prototype, as well as the Bluetooth
connection set up between the two PCs used for this Masters Thesis, both
degraded the audio recorded in the MTU to the extent that the audio could
not be used for AQM with PESQ.
The TA system, and especially the MTU, is designed to measure the audio
quality in the mobile network. If an audio link within the MTU is of such poor
quality that the measurement results become unsatisfactory, the MTU no
longer performs the task it is intended for. Instead of measuring the quality
of the mobile network, the MTU will provide a score reflecting the quality of
the mobile network combined with the Bluetooth link. This is not a task that
the MTU is intended for today.
6.1 Further work
This prototype only used the linear PCM interface of the Bluetooth SCO link,
which means that the data is transformed by the CVSD codec and is never
retransmitted. A task for future developers could be to study whether it is
possible to redirect the audio stream and send it over the ACL link [21], as
the ACL link does not transform the audio and is able to retransmit audio
packets. However, the ACL link will introduce delays within and between the
sentences, because packets that are lost or corrupted on arrival at the
receiver are retransmitted. The PESQ algorithm will hopefully be more
tolerant of such delays, which is not unlikely considering its tolerance of
handovers and VoIP.
Acknowledgement
First of all I would like to thank my external supervisor Ulf Marklund at
Ericsson TEMS for his guidance and support during the work on this project. I
would also like to direct my great appreciation to Per Johansson at Ericsson
TEMS, who has provided me with numerous valuable ideas and suggestions.
My internal supervisor, Jerry Eriksson, deserves a special mention for
answering my questions about thesis formalities. Lastly, I would also like to
thank all of the employees at Ericsson TEMS who have been more or less
involved in my project, for making this period a very enjoyable time.
Terminology
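ACL      Asynchronous Connection-Less
AQM      Audio Quality Measurement
CCD
CG       Call Generator
CVSD     Continuously Variable Slope Delta
DSP      Digital Signal Processor
HTU
ISM      Industrial, Scientific and Medical
L2CAP    Logical Link Control and Adaptation Protocol
LMP      Link Manager Protocol
LQO      Listening Quality Objective
MOS      Mean Opinion Score
MTU      Mobile Test Unit
PCM      Pulse Code Modulation
PESQ     Perceptual Evaluation of Speech Quality
PSQM     Perceptual Speech Quality Measure
PSTN     Public Switched Telephone Network
QoS      Quality of Service
RFCOMM   Radio Frequency Communication
SCO      Synchronous Connection-Oriented
TLH      TEMS Logfile Handler
VoIP     Voice over IP
VQM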
References
[1] Acoustic Parameters Ericsson Mobile Platform E100, G200, U100
Description (internal document)
[2] Audio path Ericsson Mobile Platform E100, G200, U100 Description
(internal document)
[3] Audio Quality Measurement in TEMS Automatic PESQ
[4] Audio Quality Measurement in TEMS Automatic PSQM White Paper
[5] Att förstå telekommunikation Ericsson Telecom, Telia AB
ISBN 91-44-37801-7, pp 70-76
[6] Class 2 Bluetooth™ Module F2M03AC2 Datasheet Rev:10 January 2005
[7] Continuously Variable Slope Delta Modulation: A tutorial. Web site 1 Sep
2005 http://www.cmlmicro.com
[8] Design Specification TEMS Automatic System (internal document)
[9] Host Hands-Free Message Interface Free2Move Rev:08 April 2005
[10] ITU-T P.862.1 Mapping function for transforming P.862 raw result scores
to MOS-LQO
[11] ITU-T P.800 Mean Opinion Score
[12] ITU-T P.862 Perceptual evaluation of speech quality (PESQ): An
objective method for end-to-end speech quality assessment of narrow-band
telephone networks and speech codecs
[13] ITU-T G.711 Pulse Code Modulation (PCM) of voice frequencies.
[14] Bluetooth: Carrying Voice over ACL Links Rohit Kapoor, Ling-Jyh Chen,
Yeng-Zhong Lee, Mario Gerla. 3803 H, Boelter Hall, University of California,
Los Angeles
[15] Modern telekommunikation Gunnar Karlsson ISBN 91-44-00118-5
[16] Bluetooth Revealed Brent A. Miller, Chatschik Bisdikian
ISBN 0-13-090294-2
[17] Bluetooth Demystified Nathan J. Muller ISBN 0-07-136323-8, pp 23, 59,
70-71, 101, 104-106
[18] Mobile Communications Second Edition Jochen Schiller
ISBN 0-321-12381-6, pp. 269-293
[19] Specification for mixed digital analog ASIC (internal document)
[20] Specification of the Bluetooth System Wireless connections made easy
Core Version 1.1 February 22, 2001