
Umeå University

Department of Computing Science


Olov Holmlund

October 2005

Master's Thesis
Voice Quality Measurement over Bluetooth


Abstract
TEMS Automatic is an autonomous system that measures the quality of a
mobile network from a subscriber perspective. One of the measurement
probes in TEMS Automatic is the Mobile Test Unit. Within the Mobile Test
Unit audio quality measurements are performed with a digital signal
processor, which compares two sentences. The audio quality algorithm
calculates a score that describes how the subscribers experience the quality
of the mobile network.
This Master's Thesis has investigated how the audio quality algorithm in the
Mobile Test Unit is affected by the Synchronous Connection-Oriented link in
the Bluetooth standard version 1.1.
To manage this task, a prototype was built with a Bluetooth link as the
transmission medium in the Mobile Test Unit. Tests have been performed and
the results have shown that the present audio quality algorithm in the Mobile
Test Unit is severely affected when the Bluetooth link is used.
Keywords: Bluetooth, CVSD, MTU, PESQ


Table of Contents
1 INTRODUCTION
1.1 THESIS OUTLINE
1.2 PURPOSE AND MOTIVATION
1.3 BACKGROUND
1.4 METHOD
2 TEMS AUTOMATIC
2.1 DATABASE
2.2 OPERATOR CONSOLE
2.3 COMSERVER
2.4 MOBILE TEST UNIT
2.5 CALL GENERATOR
2.6 TEMS LOGFILE HANDLER
2.7 TEMS PRESENTATION AND REPORT
3 BLUETOOTH
3.1 GENERAL DESCRIPTION
3.2 BLUETOOTH RADIO LAYER
3.3 BLUETOOTH BASEBAND LAYER
3.3.1 Physical Links
3.3.2 Packet format and types
3.4 ADDITIONAL LAYERS IN THE BLUETOOTH PROTOCOL STACK
3.5 BLUETOOTH AUDIO
3.5.1 CVSD
3.5.2 PCM
3.6 FREE2MOVE F2M03AC2 MODULE
3.6.1 PCM interface
4 PERCEPTUAL EVALUATION OF SPEECH QUALITY
4.1 MAIN OUTPUT FROM PESQ MOS-LQO
4.2 PSYCHOACOUSTIC DOMAIN
4.3 COGNITIVE DOMAIN
5 IMPLEMENTATION
5.1 DESIGN SOFTWARE REQUIRED FOR THE F2M03 MODULE
5.2 AUDIO IN A TEMS SONYERICSSON V800
5.2.1 Audio Path
5.2.2 Acoustic Parameters
5.3 BUILDING PROTOTYPE
5.4 AQM IN A MTU INCLUDING PESQ
5.5 AQM USING THE PROTOTYPE IN THE MTU
5.5.1 PESQ-tester
5.6 INTERFERENCE
5.7 USING USB BLUETOOTH DEVICES
5.8 TRANSMISSION STUDIES
6 CONCLUSION
6.1 FURTHER WORK
7 ACKNOWLEDGEMENT
8 TERMINOLOGY
9 REFERENCES


List of figures
FIGURE 1: SCHEMATIC PICTURE OVER A SYSTEM LAYOUT FOR TEMS AUTOMATIC.
FIGURE 2: THE LOWEST DEFINED LAYERS IN THE BLUETOOTH PROTOCOL STACK.
FIGURE 3: MASTER AND SLAVE FREQUENCY HOPPING.
FIGURE 4: DIFFERENT PICONET COMPOSITION IN A AND B; A SCATTERNET EXAMPLE IN C.
FIGURE 5: STANDARD BLUETOOTH PACKET ENTITIES.
FIGURE 6: VIEW OF THE CVSD ENCODER WITH SYLLABIC COMPANDING [22].
FIGURE 7: VIEW OF THE CVSD DECODER WITH SYLLABIC COMPANDING [22].
FIGURE 8: VIEW OF THE ACCUMULATOR ACTION [22].
FIGURE 9: 16-BIT PCM SAMPLING WITH 8 KHZ SAMPLING FREQUENCY.
FIGURE 10: FREE2MOVE F2M03AC2 MODULE.
FIGURE 11: LONG AND SHORT SYNCHRONIZATION-PULSES, 8- AND 16-BIT WORDS.
FIGURE 12: VIEW OF THE PESQ ALGORITHM.
FIGURE 13: PSYCHOACOUSTIC PART OF THE PESQ ALGORITHM.
FIGURE 14: COGNITIVE DOMAIN OF THE PESQ ALGORITHM.
FIGURE 15: THE PC RESETS AND RESTARTS THE F2M03AC2.
FIGURE 16: PACKET FORMAT, COMMAND LENGTH AND COMMAND PARAMETERS AREA.
FIGURE 17: A COMPARISON BETWEEN AN ORIGINAL SENTENCE AND A SENTENCE SENT THROUGH THE MOBILE NETWORK.
FIGURE 18: SIMPLIFIED AUDIO PATH IN THE SE V800.
FIGURE 19: EXCHANGING THE SE V800 WITH THE F2M03 MODULE TO CREATE A NEW CONNECTION WITH THE DSP.
FIGURE 20: THE RELATIONSHIP BETWEEN THE CG AND THE MTU IN AN AUDIO QUALITY MEASUREMENT.
FIGURE 21: INTERACTION BETWEEN THE PROTOTYPE MTU AND THE CG.
FIGURE 22: PESQ-TESTERS SYMMETRIC AND ASYMMETRIC VALUES.
FIGURE 23: SYMMETRIC AND ASYMMETRIC VALUES AFTER FILTER AND NOISE PROCESSING.
FIGURE 24: ASYMMETRIC AND SYMMETRIC VALUES WITH A USB BLUETOOTH TRANSMISSION.
FIGURE 25: SYMMETRIC AND ASYMMETRIC VALUES AFTER FILTER AND NOISE PROCESSING.
FIGURE 26: TRANSMITTED CORRUPTED WAVEFORM.
FIGURE 27: CVSD QUANTIZATION NOISE.
FIGURE 28: RELATIONSHIP BETWEEN THE AMPLITUDE AND MOS_LQO SCORE.
List of Tables
TABLE 1: VOICE CODING PLAN SUPPORTED ON THE AIR INTERFACE.
TABLE 2: CVSD PARAMETER VALUES [22].
TABLE 3: RELATIONSHIP, MOS VALUES AND INTENTION OF THE VALUES.

1 Introduction
Many telecommunications companies around the world today are working in a
saturated market, with decreasing revenues. Therefore, operators are trying
to reduce costs, maximize usage of resources and maintain high quality in
order to attract new customers while keeping existing customers satisfied.
The key to a satisfied customer is Quality of Service (QoS). In fact, good QoS
can make the difference between a satisfied subscriber and a former
subscriber. It is therefore important for the operator to maintain control
of the network quality. This is best achieved by continuously gathering live
information about the network's QoS, by observing the benefits of
optimization efforts on a regular basis, and by measuring the quality as it is
perceived by the customers.
To fulfill these requirements, operators collect information from the fixed-side
of the network. While this data is necessary, it is generally used only for
statistical analyses, and does not contain detailed information about the QoS
nor the end user experience.
Another valuable source of information is the company's own Customer Care
Department (CCD). In the CCD, data regarding the problems that subscribers
experience in the network is collected. However, very few subscribers will call
the CCD when problems occur, although they may still consider switching to
another operator. Therefore, the number of dissatisfied subscribers recorded
by the CCD may be an unreliable source of information regarding the operator's
network.
An additional way for the operators to acquire information about the network
quality is by performing a manual drive test. Drive tests provide essential
information about the network, but are time consuming and costly. Therefore,
they are limited by the time and resources available. Despite the necessity of
records from the highest network load, which is mainly during rush hour and
weekends, drive tests are seldom performed at these times due to the
expenses.
It is here that autonomous test systems overcome the limitations of the other
methods. The autonomous system provides the operator with realistic and
reliable measurement data 24 hours a day using minimal human resources.
The system uses a series of measurement probes emplaced in strategic
places, providing end-to-end voice and data measurements.
Within each measurement probe in an autonomous system, audio
data is transported between different processes. One way to do this is by
using the synchronous connection-oriented (SCO) link specified in the
Bluetooth standard as the transmission medium. The measurement probes
commonly perform more or less advanced audio processing. Hence, different
audio quality algorithms have been developed to mimic human audio
perception, in order to obtain audio quality scores that are as realistic as
possible.

1.1 Thesis outline
Because internal confidential documents were used during the work on
this Master's Thesis, some of the chapters in this paper focus on a
discussion of problems closely related to the actual problems that
arose during the work on this project.
The second chapter is an overview of TEMS Automatic (TA), containing a
short presentation of the processes in the system: what they do and the
interfaces between them. Chapter three is a brief overview of the lowest layers
in the Bluetooth stack. It also provides a deeper insight into how Pulse
Code Modulation (PCM) is handled before the air interface, how the audio
channel is presented, and finally a description of the Free2move module
F2M03AC2. Chapter four reviews the different parts of the
Perceptual Evaluation of Speech Quality (PESQ) algorithm, and how speech
quality tests including Mean Opinion Score (MOS) are performed. Further, it
provides details on how it correlates to MOS Listening Quality Objective
(MOS-LQO). The fifth chapter describes how the prototype was built and contains
some information about the program that was produced to control the
F2M03AC2 module. It also explains how an original Audio Quality
Measurement (AQM) is achieved in the TA system and how AQM is
performed with the prototype. The final part of chapter five describes a test
performed with two USB Bluetooth devices.

1.2 Purpose and motivation
Within a Mobile Test Unit (MTU), there is a mobile phone card and a Digital
Signal Processor (DSP) card. These two cards communicate via wires that
exchange Pulse Code Modulation (PCM) information, i.e. audio. The PCM is
collected from a PCM bus of the mobile phone [23].
The audio information can also be redirected to the Bluetooth circuit in a
mobile phone that is able to communicate with wireless devices like headsets.
This study shall investigate whether it is possible to use the SE V800
Bluetooth circuit and an external Bluetooth circuit as the transmission medium
for the PCM information from the mobile phone to the DSP. The
existing wired interfaces shall be replaced by a Bluetooth solution, a
Bluetooth prototype shall be constructed, and AQM values that deviate
between the two transfer media shall be investigated.

1.3 Background
Most cellular phones today can transfer different types of
information to and from the phone via a Bluetooth interface. This
interface has two different types of transmission links [17, 18, 14]. One is a
reliable link, where acknowledgements and retransmission schemes are used
to ensure the reliability of the data transmission. The other is an unreliable
circuit-switched link, which allocates slots periodically for the data
transmission. The data transmitted is never acknowledged and never
retransmitted. These two links have different purposes; the reliable one is
used for file transfers while the unreliable link is intended to handle streaming
data.

As mentioned above, in the Bluetooth standard v1.1 [17, 18, 20] streaming
data is never retransmitted, because delayed data would disturb the parties
participating in the connection. In addition, excessive changes in the audio
signal will degrade the audio quality, i.e. corrupt the audio
information that arrives at the receiver. Hence, audio quality algorithms
have been developed to measure the degradation of the audio between the
sender and the receiver.
1.4 Method
The work started with the production of a time plan containing all the possible
tasks and their respective time requirements. It also included an approximate
plan for when the usage of different equipment would be needed.
A deeper study of papers delving into the PESQ [12] algorithm was made.
Most of the information describing PESQ was found in the ITU-T P.862
Perceptual Evaluation of Speech Quality as well as the document AQM in
TEMS Automatic PESQ. The TEMS Automatic UMTS MTU700
Configuration Guide and the TEMS Automatic MTU700 Installation Guide
were read to obtain an understanding of how the MTU is configured for AQM.
The SE V800 PCM interface was also studied.
A prototype was built. The wired PCM solution between the phone card and
the DSP card was replaced by a new link. This link was equipped with a
Bluetooth device, which was used as the transmission media instead of the
wired link.
A graphical user interface for the Free2move module was implemented. The
Free2move evaluation kit F2M03AC2 [6, 9, 24] (F2M03) was used as the
Bluetooth module in the prototype. The DSP software was then replaced
due to incompatibility with the new interface from the Bluetooth device.
MTU test measurements were performed with the prototype hardware. The
audio recordings were analysed with tools such as Microsoft Excel and
PESQ-tester. These programs were able to provide statistical information
about the audio quality and a sufficient overview of the sets of speech
samples.
Most of the project documentation was written in the last stage of the
project, due to the many unknowns that required answers during the early
stages.

2 TEMS Automatic
TEMS Automatic (TA) is an autonomous system that gives an
overview of networks from the subscriber's perspective. It also provides reports
and tools for troubleshooting and analysis. Because the TA system is a
completely automatic system, it can execute measurements 24 hours a day, 7
days a week. This gives the operator the possibility to access measurements
during rush hour and holidays when staff resources are limited.
The TA system is divided into several separate parts that interact with each
other via specified interfaces. Each interface in the system is kept as simple
as possible to make the system easy to understand, to maintain and to
troubleshoot. Figure 1 below shows the interfaces between the processes in
the system.

Figure 1 Schematic picture over a system layout for TEMS Automatic.

2.1 Database
The database is the central hub in the TA system. Its interface is used by
many different applications and since most of the system components are not
real-time applications, the database is used as a communication node. This
provides an option to build the system in a modular way, and an opportunity
to run the system even though not all components have been completed.
The design of the database is simple; this is to enable easy access from
different types of applications such as web-servers, report generators and
third party products.

2.2 Operator Console
The Operator Console (OC) process is responsible for administering the
information handled in the system and gives the user an entry
point for interacting with it. This entry point can be used to change settings or to
send new information, Work Orders (WO), to the MTUs. The OC only
interacts with the database via an external interface.

2.3 ComServer
This part of the system serves as an advanced gateway for interaction
between the database and the MTU. All information exchanged between the
MTU and the ComServer is transferred via an ordinary File Transfer Protocol
(FTP) server.
When the MTU sends information files to the ComServer, the ComServer
translates all incoming information to a format that fits the database. It then
updates the database with the new information, where it can be read and
manipulated via the OC. Some collected files will
be TEMS log files, which are to be placed in the directory for the TEMS log
file reader for further processing. The ComServer is also responsible for
unpacking any compressed files.
When the OC updates the database with information that belongs to the MTU,
the ComServer will create new files including this content and send them to
the MTU.

2.4 Mobile Test Unit
The MTU is a measurement probe in the TA System. The MTU is often
installed in a vehicle such as a taxi, bus or delivery vehicle that moves around
a network area. The MTU can also be installed into a fixed position such as a
large arena. The MTU carries out measurements following a WO received
from the ComServer. These measurements include, among other things,
AQM with PESQ. After the WO has been completed within a fixed time frame,
the MTU will send the measurement data to the ComServer via the FTP
server.

2.5 Call Generator
In the TA system the Call Generator (CG) is responsible for making calls to and
receiving calls from the MTU. The CG implements AQM including PESQ, and
saves fixed line events that are recognized by the tones signaled from the
Public Switched Telephone Network (PSTN). The measurement data is stored in
the database by the CG, and when merged with information that the MTU
stores in the log files, the user can pinpoint the geographical position of the
fixed side event.

2.6 TEMS Logfile Handler
The task of converting the received air interface measurements is handled by
a tool known as the TEMS Logfile Handler (TLH). It is also responsible for
organizing the measurement data into the database. Voice quality
measurements (VQM) from the uplink are merged with those from the
downlink, which results in a geographic positioning of the VQM originating
from the uplink.

2.7 TEMS Presentation and Report
TEMS Presentation models the measurement data collected in the MTU on a
map, in spreadsheets or in line charts. TEMS Presentation can either display
measurement routes or statistics drawn from the raw data. These statistics
are used to configure and generate the statistical database. The purpose of
the statistical database is to be able to distinguish trends in the
measurements over a certain period of time or to identify problem areas by
analyzing vast amounts of data. [8]

3 Bluetooth
The first two sub-chapters under this heading describe the lower parts of the
Bluetooth stack and how Bluetooth entities can form different kinds of
deployments. Packet types and different physical channels are also
presented. The part of the chapter where audio is discussed contains a
detailed description of the Bluetooth audio air interface. This has been
included because it has a strong impact on the results of this Master's
Thesis. The four layers above the radio layer and the baseband layer included
in figure 2 are also briefly described. At the end of the chapter, there is a
description of the Free2move handsfree module used in the prototype.

3.1 General description
Bluetooth is a short-range radio standard (10-100 meters), intended to replace
the cables between portable and fixed electronic devices. Each entity that will
communicate has to be equipped with a Bluetooth circuit, where the chip
works as a transceiver. To be able to communicate, all Bluetooth entities have
to work together. One entity will act as a master, which controls the other
entities that are participating in the communication. These controlled entities
are known as slaves. Bluetooth is developed to be a robust and reliable link,
which can operate in noisy environments at a low cost [17].

3.2 Bluetooth Radio Layer
The Bluetooth radio layer is the lowest defined layer in the standard. Figure 2
shows the lowest layers in the Bluetooth stack. The transceiver operates in
the license-free 2.4 GHz Industrial, Scientific and Medical (ISM) band. In a
majority of countries, this frequency band reaches from 2400 to 2483.5 MHz.
The Bluetooth wireless technology uses a frequency-hopping/time division
duplex scheme. It can hop between a maximum of 79 different frequencies; a space
of 1 MHz separates each hopping frequency. The time between two hops is
called a slot, and the Bluetooth entity makes 1600 hops per second. That is, each
slot uses a different frequency, and the entity remains at a specific frequency
for 625 µs. Some countries, however, have limitations in their frequency range,
which forces the Bluetooth entity to adhere to a reduced hopping range [21].
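
The slot duration follows directly from the hop rate, which the short Python sketch below makes explicit. The hop rate, channel count and channel spacing are the values quoted above; the base frequency of 2402 MHz is an assumption taken from the Bluetooth specification rather than from this text, and the function names are purely illustrative.

```python
HOPS_PER_SECOND = 1600       # hop rate stated in the text
NUM_CHANNELS = 79            # maximum number of hop frequencies
CHANNEL_SPACING_MHZ = 1      # spacing between adjacent hop frequencies
BASE_FREQUENCY_MHZ = 2402    # assumption: first channel, per the Bluetooth specification


def slot_duration_us(hops_per_second: int = HOPS_PER_SECOND) -> float:
    """Return the duration of one slot in microseconds."""
    return 1_000_000 / hops_per_second


def channel_frequency_mhz(k: int) -> int:
    """Return the centre frequency in MHz of hop channel k (0 <= k < 79)."""
    if not 0 <= k < NUM_CHANNELS:
        raise ValueError("channel index out of range")
    return BASE_FREQUENCY_MHZ + k * CHANNEL_SPACING_MHZ


if __name__ == "__main__":
    print("slot duration (us):", slot_duration_us())
    print("lowest channel (MHz):", channel_frequency_mhz(0))
    print("highest channel (MHz):", channel_frequency_mhz(NUM_CHANNELS - 1))
```

Under these assumptions the 79 channels span 2402 to 2480 MHz, which fits inside the ISM band with guard bands at both edges.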


Figure 2: The lowest defined layers in the Bluetooth protocol stack.

The transceivers use Gaussian Frequency Shift Keying (GFSK) for modulation
and are available in three different power classes.

Power Class 1: Has a maximum transmission power of 100 mW and a
minimum of 1 mW. Power control is mandatory. A class 1 entity usually has a
range of up to 100 m without obstacles.
Power Class 2: Has a maximum transmission power of 2.5 mW and a minimum
of 0.25 mW. Power control is optional. A class 2 entity has a range of up to
10 m without obstacles.
Power Class 3: The sole requirement is that an entity has a maximum
transmission power below 1 mW. [18]
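
Radio specifications often state these limits in dBm rather than mW. The sketch below converts the power levels listed above using the standard definition of dBm (decibels relative to 1 mW); the dictionary layout and function name are illustrative only.

```python
import math


def mw_to_dbm(power_mw: float) -> float:
    """Convert a power in milliwatts to dBm (decibels relative to 1 mW)."""
    if power_mw <= 0:
        raise ValueError("power must be positive")
    return 10 * math.log10(power_mw)


# Power limits in mW as listed in the text for each power class.
POWER_LIMITS_MW = {
    "class 1 maximum": 100.0,
    "class 1 minimum": 1.0,
    "class 2 maximum": 2.5,
    "class 2 minimum": 0.25,
    "class 3 maximum": 1.0,
}

if __name__ == "__main__":
    for name, power in POWER_LIMITS_MW.items():
        print(f"{name}: {power} mW = {mw_to_dbm(power):.1f} dBm")
```

The class 1 maximum of 100 mW corresponds to 20 dBm, and 1 mW corresponds to 0 dBm.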
3.3 Bluetooth Baseband Layer
The Bluetooth baseband layer is fairly complex [18, 22]; it handles the
pseudo-random frequency hopping sequence, which limits interference, and
handles all access to the medium. (The hopping sequence is defined by the
master address while the master clock determines the phase in the hopping
sequence.)
The Bluetooth baseband layer also defines different physical links and a large
number of packet formats. Figure 3 shows the frequencies used by
packets occupying 1, 3 and 5 slots. Each slot has a duration of 625 µs.
The master sends data at a frequency fk and only starts transmitting in
even-numbered slots. The slave subsequently responds at the frequency fk+1,
hence the slaves only start transmitting in odd-numbered slots. This odd and
even schedule is a form of Time Division Duplex (TDD) and is used for
separating the transmission directions in Bluetooth.
The multi-slot packets defined in the Bluetooth standard, as described above,
cover three or five slots in order to achieve a higher data rate. When a master
or a slave sends a multi-slot packet, the transmitting entity remains at the
same frequency during the transmission, so no
frequency hopping is performed within the packet. During a
multi-slot packet transmission, the clock keeps running in every entity, and
the hopping sequence advances with it. Therefore, after the transmission has been
completed, the transmitting entity is still synchronized with the other entities
that are participating in the communication. Devices that share the same
pseudo-random frequency hopping sequence also share the same
channel.
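
The interaction between the slot clock and multi-slot packets can be illustrated with a small simulation. This is only a sketch: `hop_index` is a made-up placeholder and not the hop selection algorithm of the Bluetooth standard, and the function and variable names are hypothetical. The point is that a packet stays on the channel of its first slot, while the slot counter keeps advancing, so the next packet uses the channel of the slot in which it starts.

```python
NUM_CHANNELS = 79
ALLOWED_LENGTHS = (1, 3, 5)   # packet lengths in slots


def hop_index(slot: int) -> int:
    """Placeholder hop selection (NOT the real Bluetooth algorithm)."""
    return (31 * slot + 5) % NUM_CHANNELS


def schedule(packet_lengths):
    """Alternate master and slave packets.

    Returns a list of (sender, start_slot, length, channel) tuples.
    """
    plan = []
    slot = 0
    for i, length in enumerate(packet_lengths):
        if length not in ALLOWED_LENGTHS:
            raise ValueError(f"invalid packet length: {length}")
        sender = "master" if i % 2 == 0 else "slave"
        # The whole packet is sent on the channel selected for its first slot.
        plan.append((sender, slot, length, hop_index(slot)))
        # The clock keeps running during the packet, so the channels of the
        # covered slots are skipped.
        slot += length
    return plan


if __name__ == "__main__":
    for sender, start, length, channel in schedule([3, 1, 1, 5, 1]):
        print(f"{sender:6s} starts in slot {start:2d}, {length} slot(s), channel {channel}")
```

Because every packet length is odd, the master always starts in an even-numbered slot and the slave in an odd-numbered slot, regardless of the mix of packet lengths.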

Figure 3: Master and slave frequency hopping.


When two or more devices are using the same channel, they will form a
Piconet [20]. In the Piconet there is one master and up to 7 slave devices.
The hopping sequence is unique for each Piconet. Figure 4 shows different
types of piconets.

Figure 4: Different piconet composition in A and B; a scatternet example in C.

Multiple Piconets with overlapping coverage areas form a scatternet. Each
Piconet can only have one single master; however, slaves can participate in
different Piconets simultaneously on a time-division multiplex basis. In
addition, a master can participate as a slave in another Piconet. This is only
possible when the different Piconets participating in the scatternet are not
frequency synchronized.

3.3.1 Physical Links
In the Bluetooth baseband, two types of links are defined [18, 20, 22]: the
SCO link and the Asynchronous Connection-Less (ACL) link.
The SCO link is a point-to-point connection between a master and a specific
slave participating in the same Piconet. The link has reserved slots and can
therefore be considered a circuit-switched connection. SCO packets are
never retransmitted and are intended for speech transmissions. Every SCO
link has a transmission capacity of 64 kbit/s.
The ACL link is a point-to-multipoint packet-switched connection between the
master and all active slaves participating in the Piconet. In slots not reserved
for an SCO link, a master can establish an ACL connection on a per-slot
basis. This applies to any slave, even a slave that is already engaged in an
SCO link. For all of the ACL links there is a packet retransmission possibility,
which assures data integrity.

3.3.2 Packet format and types
The data packets presented in the Piconet channel consist of three different
fields: access code, header and payload. The access code and header have
a fixed size: 72 and 54 bits respectively. The payload can carry 0-2745
bits [18]. Figure 5 shows the different fields in the packets.

Figure 5: Standard Bluetooth packet entities.


The packet types in the Piconet are related to the physical links they are
used on.
Four different packet types are defined for the SCO link: HV1, HV2, HV3 and
DV [17, 18, 20]. HV stands for High-quality Voice. The HV1 packet
carries 10 information bytes, which represent 1.25 ms of speech. The HV1
packet is encoded using 1/3 Forward Error Correction (FEC). The HV2 packet
carries 20 information bytes (2.5 ms of speech) and is encoded using 2/3 FEC,
while the HV3 packet carries 30 information bytes (3.75 ms of speech) that
are not protected by FEC. An HV packet is never retransmitted. The DV packet
is a combined voice and data packet, which carries 10 bytes of voice
information and up to 150 bits of data. The voice field of the payload is not
FEC protected, and like the HV packets, it is never retransmitted. The data
field is protected by a 16-bit CRC and encoded with 2/3 FEC.
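
The relationship between the HV packet sizes, the FEC rates and the 64 kbit/s voice rate can be checked with a few lines of arithmetic. The sketch below uses only the figures quoted above (information bytes, FEC rate, 64 kbit/s and the 625 µs slot); the names are illustrative, and the interpretation of the last function as the spacing between consecutive packets in one direction is an inference from those figures.

```python
from fractions import Fraction

VOICE_RATE_BPS = 64_000   # SCO link capacity from the text
SLOT_US = 625             # slot duration from the text

# Packet name -> (information bytes, FEC code rate), as given in the text.
HV_PACKETS = {
    "HV1": (10, Fraction(1, 3)),
    "HV2": (20, Fraction(2, 3)),
    "HV3": (30, Fraction(1, 1)),   # no FEC
}


def speech_duration_us(info_bytes: int) -> Fraction:
    """Duration of speech carried by one packet, in microseconds."""
    return Fraction(info_bytes * 8 * 1_000_000, VOICE_RATE_BPS)


def coded_payload_bits(info_bytes: int, rate: Fraction) -> Fraction:
    """Payload size on the air after FEC encoding, in bits."""
    return Fraction(info_bytes * 8) / rate


def packet_interval_slots(info_bytes: int) -> Fraction:
    """Slots between consecutive packets needed to sustain the voice rate."""
    return speech_duration_us(info_bytes) / SLOT_US


if __name__ == "__main__":
    for name, (info_bytes, rate) in HV_PACKETS.items():
        print(name,
              "speech (us):", speech_duration_us(info_bytes),
              "coded bits:", coded_payload_bits(info_bytes, rate),
              "interval (slots):", packet_interval_slots(info_bytes))
```

All three HV types fill the same 240-bit payload; the stronger the FEC, the less speech each packet carries and the more often a packet has to be sent (every 2, 4 and 6 slots respectively).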
The ACL link has seven different packet types defined: DM1, DM3, DM5, DH1,
DH3, DH5 and AUX1. The Data-Medium rate (DM) packets only carry data
information. DM1, DM3 and DM5 cover one, three and five time slots
respectively. In the same order they contain up to 18, 123 and 226 information
bytes. All DM packets have a 16-bit Cyclic Redundancy Code (CRC) and 2/3
FEC. The Data-High rate (DH) packets are similar to the DM packets, except for
the lack of FEC encoding of the payload information. As a result, the DH1,
DH3 and DH5 packets can carry up to 28, 185 and 341 information bytes. In
the same manner as the DM packets, the DH packets cover one, three and
five time slots respectively [18, 20].
The AUX1 packet looks like the DH1 packet but has no CRC code. The AUX1
packet can carry up to 30 information bytes, and it covers only one slot [18].
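
The information byte counts above are consistent with the maximum payload of 2745 bits mentioned earlier. The following sketch computes the on-air payload size of each ACL packet type. It assumes that the 2/3 FEC is a block code that turns every 10 payload bits into 15 coded bits, with padding to a multiple of 10 bits; that detail comes from the Bluetooth specification and not from this text. The table layout and function name are illustrative.

```python
CRC_BITS = 16

# Packet name -> (max information bytes, has CRC, uses 2/3 FEC), from the text.
ACL_PACKETS = {
    "DM1": (18, True, True),
    "DM3": (123, True, True),
    "DM5": (226, True, True),
    "DH1": (28, True, False),
    "DH3": (185, True, False),
    "DH5": (341, True, False),
    "AUX1": (30, False, False),
}


def on_air_payload_bits(info_bytes: int, has_crc: bool, fec_2_3: bool) -> int:
    """Return the payload size on the air in bits."""
    bits = info_bytes * 8 + (CRC_BITS if has_crc else 0)
    if fec_2_3:
        # Assumption: (15,10) block code, input padded to a multiple of 10 bits.
        blocks = -(-bits // 10)   # ceiling division
        bits = blocks * 15
    return bits


if __name__ == "__main__":
    for name, params in ACL_PACKETS.items():
        print(f"{name}: {on_air_payload_bits(*params)} payload bits")
```

Under these assumptions the largest payload is that of the DM5 packet, 2745 bits, which matches the upper limit of the payload field.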
3.4 Additional Layers in the Bluetooth Protocol Stack
The Link Manager Protocol
The Link Manager Protocol (LMP) is responsible for establishing a link
between two Bluetooth devices and controlling the baseband packet sizes. It
is also responsible for the power control modes and the security issues in a
Bluetooth connection. LMP messages between two devices are filtered out
and interpreted; they have higher priority than a data packet and are never
delayed [17].
L2CAP
The tasks of the Logical Link Control and Adaptation Protocol (L2CAP) include
multiplexing for higher layer protocols, segmentation and reassembly of packets,
and QoS. It allows higher layers to transmit packets with a length of up to 64
kilobytes and it only supports ACL links [17].
Service Discovery Protocol
The Service Discovery Protocol (SDP) is a very important framework that
discovers information about other devices, such as which services are
available and which characteristics of the detected services can be queried.
After a device has detected the services in the surrounding area, the user can
choose to establish a connection with the device [17].


RFCOMM
Radio Frequency Communication (RFCOMM) is a protocol that emulates
the RS-232 serial port. Due to its dependency on the underlying L2CAP layer,
which is used for multiplexing, the RFCOMM layer is forced to use the ACL
link for transmission. Higher layers in the Bluetooth protocol stack, e.g. the
Object Exchange Protocol, use the transport capabilities of the
RFCOMM layer [17].
3.5 Bluetooth Audio
The Bluetooth air interface can use either a 64 kbit/s log PCM format (A-law
or µ-law) or 64 kbit/s Continuous Variable Slope Delta (CVSD) modulation
[17, 18, 20]. Which type of coding is used depends on the
input data stream to the entity. Table 1 summarizes the voice coding schemes
supported by the air interface.

Voice Codecs
Linear              CVSD
8-bit logarithmic   A-law
                    µ-law

Table 1: Voice coding plan supported on the air interface.

3.5.1 CVSD
CVSD modulation is a method used to convert a speech signal into a digital
format [22]. It takes advantage of the fact that voice signals do not change
abruptly [17], so the CVSD modulator tries to follow the
waveform. The sgn(x) function returns 1 if x ≥ 0 and -1 if x < 0.
These values are represented by 0 and 1 respectively
on the air interface. That is, the CVSD algorithm uses a prediction value; if the
input value is larger than or equal to the predictor, the encoder output is 1,
whereas it is -1 if the input value is lower than the
prediction value. This process is illustrated in figure 7.
The step size control is applied to reduce slope overload effects [20]. The
step size is adjusted according to the average signal slope. The input to the
CVSD encoder in the Bluetooth standard has to be 64 ksamples/s linear PCM.
Figure 6 shows the CVSD encoder, figure 7 shows the decoder and figure 8
describes the accumulator [20].

Figure 6: View of the CVSD encoder with syllabic companding [22].

Figure 7: View of the CVSD decoder with syllabic companding [22].

Figure 8: View of the accumulator action [22].

Notice the constants in the figures above. The CVSD encoder output is b(k),
the accumulator contents are y(k) and the step size is d(k). Also pay
attention to the decay factors: the step size decay factor is denoted by β
and the accumulator decay factor by h. These decay factors are defined as
shown in table 2. The syllabic companding parameter, which controls the step
size, is denoted by α; it monitors the slope by considering the four most
recent output bits.
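Let

\[ \hat{x}(k) = h\,y(k) \tag{1} \]

The different steps in the CVSD algorithm are then updated according to the
following equations:

\[ b(k) = \operatorname{sgn}\{x(k) - \hat{x}(k-1)\} \tag{2} \]

\[ \alpha = \begin{cases}
1, & \text{if four bits in the last four output bits are equal,} \\
0, & \text{otherwise}
\end{cases} \]

\[ d(k) = \begin{cases}
\min\{d(k-1) + d_{\min},\ d_{\max}\}, & \alpha = 1, \\
\max\{\beta\,d(k-1),\ d_{\min}\}, & \alpha = 0,
\end{cases} \tag{3} \]

\[ y(k) = \begin{cases}
\min\{\hat{y}(k),\ y_{\max}\}, & \hat{y}(k) \ge 0, \\
\max\{\hat{y}(k),\ y_{\min}\}, & \hat{y}(k) < 0,
\end{cases} \tag{4} \]

where:

\[ \hat{y}(k) = \hat{x}(k-1) + b(k)\,d(k). \]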
The minimum and maximum step sizes are denoted by dmin and dmax, and
ymin and ymax denote the accumulator's negative and positive saturation
values respectively [22].

The bits are transmitted over the air in the same order as they are
generated. Table 2 below shows the values that the parameters must use; the
values are based on a 16-bit signed number output from the accumulator.
Parameter   Value
h           1 - 1/32
β           1 - 1/1024
dmin        10
dmax        1280
ymin        -2^15 or -2^15 + 1
ymax        2^15 - 1

Table 2: CVSD parameter values [22].
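
As an illustration, equations (1) to (4) and the parameter values in table 2
can be put together in a short Python sketch. This is a minimal reading of
the update rules above, not the implementation used in the F2M03AC2 or in any
other Bluetooth device: the class and function names are hypothetical, the
initial state (accumulator at zero, step size at its minimum) is an
assumption, and the decoder simply returns the accumulator contents y(k) as
its reconstruction.

```python
H = 1 - 1 / 32           # accumulator decay factor h (table 2)
BETA = 1 - 1 / 1024      # step size decay factor (table 2)
D_MIN, D_MAX = 10, 1280  # minimum and maximum step size (table 2)
Y_MIN, Y_MAX = -2**15, 2**15 - 1  # accumulator saturation values (table 2)


class CvsdState:
    """State shared by encoder and decoder (hypothetical helper class)."""

    def __init__(self):
        self.x_hat = 0.0         # prediction x_hat(k-1); assumed to start at 0
        self.d = float(D_MIN)    # step size d(k-1); assumed to start at d_min
        self.history = []        # up to four most recent output bits

    def update(self, b):
        """Apply equations (3), (4) and (1) for an output b(k) in {+1, -1}."""
        self.history = (self.history + [b])[-4:]
        alpha = len(self.history) == 4 and len(set(self.history)) == 1
        if alpha:
            self.d = min(self.d + D_MIN, D_MAX)      # equation (3), alpha = 1
        else:
            self.d = max(BETA * self.d, D_MIN)       # equation (3), alpha = 0
        y_hat = self.x_hat + b * self.d
        y = min(y_hat, Y_MAX) if y_hat >= 0 else max(y_hat, Y_MIN)  # eq. (4)
        self.x_hat = H * y                           # equation (1)
        return y


def cvsd_encode(samples):
    """Return one output b(k) in {+1, -1} per input sample."""
    state = CvsdState()
    bits = []
    for x in samples:
        b = 1 if x - state.x_hat >= 0 else -1        # equation (2)
        state.update(b)
        bits.append(b)
    return bits


def cvsd_decode(bits):
    """Rebuild the accumulator contents y(k) from the received outputs."""
    state = CvsdState()
    return [state.update(b) for b in bits]
```

For a constant zero input the encoder output alternates between +1 and -1,
and a run of four equal outputs makes the step size grow by dmin per sample,
which is how the codec tries to keep up with a steep slope.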

3.5.2 PCM
The first step when converting an analog speech signal to a digital one is to
filter out the high frequency components in the signal. This is possible
because most of the energy in spoken language lies between 200 and 2800
Hz [26]. A band-limiting filter is used to reduce aliasing. The second step
is to sample the amplitude of the analog curve at a rate that results in
good audio quality. The sampling results in a Pulse Amplitude Modulation
(PAM) signal, where each pulse corresponds to an amplitude in the analog
curve. The sampling frequency is determined by the Nyquist criterion [26, 5],
which says that the sampling frequency has to be at least twice as high as
the highest frequency in the original signal.

\[ F_S \ge 2 \cdot BW \]
F_S = sampling frequency
BW = bandwidth of the analog voice signal

Figure 9: 16-bit PCM sampling with 8 kHz samplings frequency.

The next step in the process is the quantization step. Each input sample is
mapped into the quantization interval that is the closest match to the
amplitude height. If the quantization interval does not match the actual
amplitude of the input signal, an error is introduced into the PCM. This
error is called quantization noise. A way to reduce the error is to increase
the number of quantization intervals. The last step in the analog-to-digital
(AD) conversion is the coding phase, where each quantization value is
expressed as a binary code, each code consisting of 16 bits.
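
The effect of the number of quantization intervals on the quantization noise
can be illustrated with a small Python sketch. It quantizes a test tone with
a uniform quantizer and reports the signal-to-noise ratio. The test tone, its
amplitude and the helper names are assumptions chosen for illustration; they
do not come from the MTU.

```python
import math


def quantize(sample, bits):
    """Uniform quantizer for samples in [-1.0, 1.0] using 2**bits intervals."""
    levels = 2 ** (bits - 1)
    code = max(-levels, min(levels - 1, round(sample * levels)))
    return code / levels


def snr_db(bits, n=8000, freq=440.0, fs=8000.0):
    """Signal-to-quantization-noise ratio in dB for an assumed test tone."""
    signal = [0.9 * math.sin(2 * math.pi * freq * i / fs) for i in range(n)]
    noise = [s - quantize(s, bits) for s in signal]
    p_signal = sum(s * s for s in signal)
    p_noise = sum(e * e for e in noise)
    return 10 * math.log10(p_signal / p_noise)


if __name__ == "__main__":
    for bits in (4, 8, 16):
        print(bits, "bits:", round(snr_db(bits), 1), "dB")
```

Each additional bit doubles the number of intervals, so the reported ratio
rises steadily as the word length goes from 4 to 8 to 16 bits.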

The μ-law and the A-law are two different compression schemes, which
compress the 16-bit linear PCM data down to 8-bit logarithmic data. Since
the air interface supports a 64 kb/s information stream, it is possible to
apply either A-law PCM or μ-law PCM compression. If the line interface uses
A-law and the air interface uses μ-law or vice versa, a conversion between
A-law and μ-law is made. Both A-law and μ-law follow ITU-T recommendation
G.711 [22]. If the PCM is represented in 16 bits, i.e. linear PCM, CVSD
modulation is used instead of A-law PCM or μ-law PCM.
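
The idea of logarithmic compression can be sketched in Python with the
continuous μ-law curve. Note the assumptions: G.711 itself specifies a
segmented approximation of this curve with a particular bit layout, which is
not reproduced here, and the function names and the scaling to a signed 8-bit
range are hypothetical. The last line only checks the arithmetic behind the
64 kb/s stream: 8000 samples per second times 8 bits per sample.

```python
import math

MU = 255                 # mu-law parameter (assumed standard value)
SAMPLE_RATE_HZ = 8000    # samples per second
BITS_PER_SAMPLE = 8      # after logarithmic compression


def mu_law_compress(x):
    """Map x in [-1, 1] onto [-1, 1] along the continuous mu-law curve."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)


def mu_law_expand(y):
    """Inverse of mu_law_compress."""
    return math.copysign(((1 + MU) ** abs(y) - 1) / MU, y)


def encode_8bit(sample_16bit):
    """Compress a 16-bit linear sample to a signed code in -127..127."""
    return round(mu_law_compress(sample_16bit / 32768.0) * 127)


def decode_8bit(code):
    """Expand a signed code in -127..127 back to a 16-bit linear value."""
    return round(mu_law_expand(code / 127) * 32768)


bit_rate = SAMPLE_RATE_HZ * BITS_PER_SAMPLE   # 64000 bit/s
```

The round trip error is small in absolute terms for quiet samples and larger
for loud ones, which is the intended trade-off of a logarithmic scheme.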
3.6 Free2Move F2M03AC2 Module


The F2M03AC2 is a surface-mountable Bluetooth module of power class 2. It is
equipped with two different audio interfaces: an analog voice interface and
a PCM digital audio interface. The module can be loaded with a number of
different firmware versions [6]. The module used in this Master's Thesis is
equipped with a standalone headset firmware.

Figure 10: Free2Move F2M03AC2 module.

3.6.1 PCM interface
In a wired solution, the PCM data bus consists of five wires. If two
entities are to exchange PCM data with each other, they have to be
configured in a master-slave relationship: one entity takes the master role
in the connection and the other entity is the slave. The master's role is to
synchronize the slave to the clock of the master, and to generate sync
pulses.
As mentioned earlier, PCM is a standard method for digitizing human voice
for transmission over digital channels. The F2M03AC2 has hardware support
for transmitting continuous PCM data. The data does not pass through the HCI
layer of the protocol stack; it is handled only in the radio and baseband
layers. The SCO links in the F2M03AC2 are dedicated to sending and receiving
streaming mono audio, voice data in particular, and the module can handle up
to three SCO connections at a time.
When the F2M03AC2 entity operates as a master in the PCM interface it can
generate an output clock at 128, 256 or 512 kHz (it only generates clock
pulses while a SCO link is established). When configured as a slave it
supports input clocks up to 2048 kHz.
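
A rough way to read these clock rates is to relate them to the 8 kHz
sampling frequency shown in figure 9. The Python sketch below assumes one
bit per clock cycle and one frame sync per sample period; both are
assumptions about the bus rather than statements from the datasheet, and the
function names are hypothetical.

```python
FRAME_RATE_HZ = 8000   # assumed: one PCM frame per 8 kHz sample period


def bits_per_frame(clock_hz):
    """Number of clock cycles (bit positions) between two frame syncs."""
    return clock_hz // FRAME_RATE_HZ


def slots_per_frame(clock_hz, word_bits=16):
    """Number of whole PCM words of word_bits that fit into one frame."""
    return bits_per_frame(clock_hz) // word_bits


if __name__ == "__main__":
    for clock_hz in (128_000, 256_000, 512_000, 2_048_000):
        print(clock_hz, bits_per_frame(clock_hz), slots_per_frame(clock_hz))
```

Under these assumptions a 128 kHz clock leaves room for exactly one 16-bit
word per frame, while a 512 kHz clock leaves room for four.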

The F2M03AC2 follows the Bluetooth standard v1.1 and supports four
different sampling formats: 13- and 16-bit linear PCM, and 8-bit A-law
or μ-law. The two latter are coded formats with 8000 samples per second. The
module is able to transmit or receive in the first three of the four primary
slots following the PCM sync.
The module is equipped with a headset firmware that supports a 16-bit linear
PCM format, which forces the module to use CVSD coding. The PCM formats that
the module supports are determined by the firmware that is flashed into the
module.
There are two types of synchronization pulses in the F2M03AC2 module: long
frame synchronization and short frame synchronization. In long frame
synchronization, the rising edge of the synchronization pulse indicates the
start of a data word, and the pulse is always 8 bits long. In short frame
synchronization, however, the falling edge of the synchronization pulse
indicates the start of the PCM word, and the pulse is always 1 bit long. As
mentioned, the device only produces synchronization pulses when the
F2M03AC2 is configured as a master [6].
It is also possible to configure which bit shall be sent first in the data word; the
most significant or the least significant [6]. Figure 11 gives an overview of the
different types of synchronization pulses and the different word lengths which
can be managed in the F2M03AC2 module. Before and after the PCM input
shown in the figure below, the data is undefined. This is because the
F2M03AC2 module is unable to control the actions performed on the other
side of the connection before and after a data word is sent [6].

Figure 11: Long and short synchronization-pulses, 8- and 16-bit words.


4 Perceptual Evaluation of Speech Quality


The purpose of this section is to give an overview of the Perceptual
Evaluation of Speech Quality (PESQ) [12] algorithm, as well as how the audio
quality measurements are performed and what the goal of the algorithm is.
PESQ is an objective method for a subjective quality evaluation of mobile
networks. This means that PESQ is a mechanism to calculate how the
general public experiences the speech quality in the mobile network. PESQ
does this by mimicking the human speech perception. It evaluates the
distorted speech signal (the signal that is transmitted via the cellular network)
by comparing it to the original undistorted reference signal.

Figure 12: View of the PESQ algorithm.

PESQ is divided into two parts: a psychoacoustic domain and a cognitive
domain [3]. The psychoacoustic domain mimics how humans experience speech.
This procedure produces a PESQ score ranging from 1 to 4.5, where a value of
1 indicates a severely distorted speech signal and a value of 4.5 indicates
that the measured speech has no distortion [3, 10].
PESQ has several advantages compared to older audio quality algorithms.
Unlike Perceptual Speech Quality Measure (PSQM) [4], PESQ has a time
alignment procedure, which handles the delay variations caused by Voice over
Internet Protocol (VoIP) and handovers. PESQ is also superior to PSQM in
that it removes filtering effects. PESQ is however very sensitive to all
types of audio transformation and reacts noticeably to added noise.
4.1 Main output from PESQ: MOS-LQO


The Mean Opinion Score (MOS) [11] is frequently used for subjective
evaluation of speech encoders and decoders. In a MOS test, a listener
grades a speech sample, normally five to eight seconds long, by assigning it
to one of the categories in table 3:

MOS   Intention
5     Excellent
4     Good
3     Fair
2     Poor
1     Bad

Table 3: Relationship, MOS values and intention of the values.

The intentions in the MOS table are only specified in terms of excellent,
good, fair, poor and bad. There are no reference audio samples for the
listener to relate them to. Therefore, each listener individually decides
what constitutes a fair speech sample. This loose definition makes MOS very
sensitive to differences in listening procedure, as the listener's prior
experiences, equipment quality et cetera can affect the listener's
interpretation of the MOS values.
The output from the PESQ algorithm is a MOS-Listening Quality Objective
(MOS-LQO) [10] value and not a MOS value. This means that the raw PESQ score
is transformed into a MOS-LQO value according to the ITU-T P.862.1 standard.
4.2 Psychoacoustic domain
As mentioned, the PESQ algorithm is divided into different parts. The most
important steps in the psychoacoustic part of the algorithm are described
below. They are also illustrated in figure 13.

Figure 13: Psychoacoustic part of the PESQ algorithm.

Scale: When performing system tests, the gain of the system may vary
considerably. For instance, the gain depends on whether the measurements are
performed over an ISDN line or over an analog two-wire interface. For this
reason, the transmitted speech and the reference speech are both scaled to
compensate for the overall gain in the network.
Time align: Transmission delays may occur in a mobile network. The delay can
change either within a single speech sentence or between two speech
sentences. Such delay changes occur as a result of handovers or VoIP. The
transmitted speech sentence and the reference sentence are therefore time
aligned, so that every part of the transmitted sentence corresponds to the
matching part of the reference and vice versa.

Mimic ear resolution: The speech signal is converted into the frequency
domain. Next, the Hertz scale is warped into the critical band domain by
attempting to mimic how the ear treats different frequencies. Thus higher
frequencies are given a lower resolution.
Remove filter influence: Filtering in the PSTN or mobile network may have a
negative effect on the PESQ score, because severe filtering is disturbing to
the listener. To decrease the filter influence, the transfer function is
measured and this information is used to equalize the reference.
Remove gain variations: Gain variations may occur because of the
Automatic Gain Control (AGC) units in the network. The effect of gain
variations is removed.
Mimic ear-brain loudness perception: In order to imitate how the human
ear transforms intensity into perceived loudness, the intensity of the
spectrum is warped.
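
The idea behind the time align step above can be sketched in Python with a
plain cross-correlation that searches for the delay giving the best match
between the reference and the transmitted sentence. The time alignment in
PESQ itself is considerably more elaborate, since it has to cope with delay
that changes within a sentence; the single global lag estimated here and the
function names are simplifying assumptions.

```python
def estimate_delay(reference, degraded, max_lag):
    """Return the lag in samples that maximises the cross-correlation."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i, r in enumerate(reference):
            j = i + lag
            if 0 <= j < len(degraded):
                score += r * degraded[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def align(degraded, lag):
    """Shift the degraded signal by lag samples, padding with zeros."""
    if lag >= 0:
        return degraded[lag:] + [0.0] * lag
    return [0.0] * (-lag) + degraded[:len(degraded) + lag]
```

A positive lag means that the transmitted sentence arrives late, so align
drops its first samples to bring it back in step with the reference.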
4.3 Cognitive domain

Figure 14: Cognitive domain of the PESQ algorithm.

Perceptual subtraction: In order to obtain a disturbance density signal, the
loudness representations of the reference and transmitted signals are
subtracted from each other. The brain's perception of differences in
loudness is taken into account.
Identify bad intervals: An incorrect time alignment for a speech interval can
sometimes cause very bad disturbances, affecting the disturbance density
signal. By re-computing the time alignment and the rest of the PESQ
algorithm, it is possible to replace the bad interval with an improved
version if the re-computed interval has a lower disturbance.
Asymmetry processing: Noise can be added to the original speech by the
speech codec, resulting in clearly audible distortion. The asymmetry
processing calculates an asymmetric disturbance density signal, which
includes the added disturbances.
Aggregate disturbances for all of the speech: By adding the disturbance
signals and the asymmetric disturbance signals in the frequency plane,
signals representing the amount of speech distortion are obtained. These
signals are representative of the speech distortion during a very short time
interval. When these short periods of time are summed into blocks of 320 ms,
they constitute what is referred to as split second disturbances.

By combining the average split second disturbance with the average split
second asymmetrical disturbance for the entire speech reference, a
PESQ_MOS score can be calculated.
Transform to MOS-LQO: The PESQ-score [10] is transformed into the
MOS_LQO score according to ITU-T P.862.1.
MOS-LQO: MOS_LQO is similar to the MOS scale. The MOS-LQO scale
goes from 1 to 4.5, where 1 is the worst value and 4.5 is the best.
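
The mapping from the raw PESQ score to MOS-LQO defined in ITU-T P.862.1 [10]
is a logistic function. The Python sketch below evaluates it; the
coefficients are quoted from the recommendation and should be checked against
[10] before use, and the function name is hypothetical.

```python
import math


def pesq_to_mos_lqo(raw_pesq):
    """Map a raw PESQ score to MOS-LQO with the P.862.1 logistic function."""
    return 0.999 + 4.0 / (1.0 + math.exp(-1.4945 * raw_pesq + 4.6607))


if __name__ == "__main__":
    for raw in (1.0, 2.0, 3.0, 4.0, 4.5):
        print(raw, round(pesq_to_mos_lqo(raw), 2))
```

With these coefficients the mapping is monotonic and a raw score of 4.5 maps
to approximately 4.55, so the top of the two scales nearly coincides, while
scores in the middle of the range are shifted.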

5 Implementation
This section describes the main problems encountered during the work on this
Master's Thesis. The main devices and hardware used for the work are the MTU
700, a SonyEricsson V800 (SE V800) equipped with Ericsson TEMS software, and
a Free2Move Bluetooth module F2M03AC2. Two Broadcom USB Bluetooth dongles
were also used.

5.1 Design software required for the F2M03 module


The Bluetooth module F2M03AC2 was connected to an evaluation board [6]
designed to make the module easier to work with. Using request, response and
indication messages [9], the module could be controlled via a serial port.
Every request had to receive a response before a new request could be sent.
An indication could be sent from the F2M03AC2 module at any time; it
reported changes that had affected the module. Figure 15 shows what a
restart of the module could look like.

Figure 15: The PC resets and restarts the F2M03AC2.

A program was written to control the F2M03AC2 module from a PC. The module
protocol communicates with hexadecimal values that describe the status of
the module. This hexadecimal protocol communication is translated into text
and presented in a graphical user interface. The main task of the software
is to control the setup of the audio link in the prototype, and to give
feedback to the user about the status of the audio link. The software can
also force the module to change the audio packet type used on the audio
link. The default packet type is HV1.

The packets in the protocol are divided into three parts. The first part is
a command value consisting of 1 byte. It carries a command for the module to
execute, but it can also carry information from the module about tasks that
have been completed. The second part of the packet is a length indicator
that tells how many bytes the command parameters consist of. Like the
command value, it occupies 1 byte. The third part contains the command
parameters, which describe whether the commands have been processed under
normal circumstances or if there were any deviating occurrences. Figure 16
shows the structure of a packet in the protocol.

Figure 16: Packet format, command length and command parameters area.
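
The three-part packet layout described above can be sketched in Python as
follows. Only the layout (1 byte command, 1 byte length, then the
parameters) is taken from the description; the function names and the
command value 0x01 used in the example are hypothetical and do not
correspond to real commands in the Free2Move protocol [9].

```python
def build_packet(command, parameters=b""):
    """Return command byte + length byte + parameter bytes."""
    if not 0 <= command <= 0xFF:
        raise ValueError("command must fit in one byte")
    if len(parameters) > 0xFF:
        raise ValueError("parameters must fit the one-byte length field")
    return bytes([command, len(parameters)]) + bytes(parameters)


def parse_packet(data):
    """Split a received packet into (command, parameters)."""
    if len(data) < 2:
        raise ValueError("packet needs a command byte and a length byte")
    command, length = data[0], data[1]
    parameters = data[2:2 + length]
    if len(parameters) != length:
        raise ValueError("packet is shorter than its length field says")
    return command, parameters


if __name__ == "__main__":
    packet = build_packet(0x01, b"\x00\x10")   # hypothetical command value
    print(packet.hex(" "))                     # hexadecimal view, as on the wire
    print(parse_packet(packet))
```

Because the length field is a single byte, a packet built this way can carry
at most 255 parameter bytes.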

5.2 Audio in a TEMS SonyEricsson V800


All MTUs are equipped with a mobile phone, which is flashed with TEMS
phone software. There are numerous differences between the TEMS software
and the original software, and as a result the phone in the MTU can give
more information than a mobile phone sold on the end user market.

5.2.1 Audio Path
The SE V800 supports several audio modes [2]. The mode used in this
Master's Thesis is the normal voice mode. It includes functions that perform
audio decoding, audio mixing and filtering. All of these transformations are
completed before the digital audio signal reaches the Bluetooth circuit.
Most of the audio processing units are turned off, in order to keep the
incoming audio as unaltered as possible.
Some of the phone band filtering and voice coding cannot be completely
turned off, due to aspects of the mobile network that are out of our
control, and this will to some extent affect the audio coming from the
mobile. Figure 17 compares the frequency spectrum of a reference sentence in
the MTU with that of the same sentence after it was sent through a mobile
network. Figure 17 thereby displays the effects of the phone band filters
and voice codec. Figure 18, shown below, is a model of the path the audio
takes from the mobile network to the Bluetooth circuit in the SE V800.

[Figure 17 plot: vertical axis in dB (-90 to -30), horizontal axis labelled
"Hz * 4"; series "Original Sentence" and "Sentence Sent".]

Figure 17: A comparison between an original sentence and a sentence sent through
the mobile network.

Figure 18: Simplified Audio Path in the SE V800.

5.2.2 Acoustic Parameters
In the file system inside the SE V800, several files belong to the audio
configuration [1]. These are so-called acoustic parameter files. Most of
these files manage the routing of the audio path. Two of these files have
been modified during the work on this Master's Thesis. One corresponds to
the access type: this parameter determines how the audio is routed in the
mobile, and also when and from where the data can be accessed in the mobile.
The main task for this file is to provide the Bluetooth entity in the SE
V800 with a constant PCM data stream.
The second parameter file that has been altered manages how the SE V800
behaves when its clamshell is opened or closed. When the mobile is inside
the MTU, the clamshell has to be closed due to the lack of room. During the
work with the prototype, it was instead required that the clamshell stay
open. This was necessary in order to be able to pair the phone with another
Bluetooth entity originating from the MTU.

5.3 Building prototype
During the design phase of the prototype, circuit schematics for the MTU and
the F2M03AC2 module were studied. In particular, two different options for
routing the new PCM data stream were of interest. The first option required
an investigation of a new way of routing the PCM data into the DSP. Because
it was difficult to estimate the amount of time that would be required to
create the new route, and because of the strict time limitations for this
Master's Thesis, this option was abandoned at an early stage. The remaining
option was to route the audio using the existing path from the phone to the
DSP. The existing wires connecting the mobile phone to the MTU were removed
at this stage and the F2M03AC2 PCM-bus was directly connected to the DSP
PCM-bus.
The 16-bit linear PCM-interface from the SE V800 was copied to the
F2M03AC2 module so that it would fit the existing DSP software. However,
the PCM-interface in the F2M03AC2 module was not copied in its entirety due
to compatibility problems, and therefore the differences between the SE V800
and the F2M03AC2 module in the PCM interface had to be managed by the
DSP software. The existing DSP software was replaced by a modified version
to suit the interface in the F2M03AC2 module.
Notice that the SE V800 phone card was never removed from the MTU; only
the wired PCM-bus between the phone and the DSP was taken out. The phone in
the MTU had two tasks: it was used to handle calls to and from the CG
during all tests and it also managed one end of the Bluetooth connection.
Figure 19 gives a schematic picture of the exchange from the SE V800
PCM-bus to the new F2M03AC2 PCM-bus.

Figure 19: Exchanging the SE V800 with the F2M03 module to create a new
connection with the DSP.

5.4 AQM in a MTU including PESQ


The ordinary AQM procedure between the CG and an MTU is performed in a
master-slave manner. Both the master and the slave follow a half-duplex
timetable where the master (CG) and the slave (MTU) alternate between
playing and recording speech sentences. The master decides the playing and
recording schedule and the slave has to follow it [3].

Every sentence in the TA system is 5.5 seconds long. The CG plays the first
sentence for 5.5 seconds and the MTU records the transmitted speech
sentence. In the following 5.5 seconds, the roles are reversed, i.e. the MTU
plays a sentence and the CG records it. Hence, each play and record cycle
will take 11 seconds. [3]
For every recording, three quality scores are calculated [3]. Two of these
scores are Frequent AQM values: one score is calculated for the first half
of the recorded sentence, while the other is calculated for the second half.
As the sentence is 5.5 seconds long, each half is 2.75 seconds long. The
third value is the PESQ score, which is calculated based on the entire
sentence.
On occasion, the measurements do not result in any scores, because the
recordings contain more than 25% silence [12]. The PESQ algorithm
cannot synchronize these sentences.
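
The two rules above, splitting a recording into halves and rejecting
recordings with too much silence, can be sketched in Python. The 25% limit
and the 5.5 second sentence length come from the text; the sample rate, the
amplitude threshold used to decide that a sample is silent, and all names
are assumptions made for illustration, and the silence detection in the
actual PESQ software may work differently.

```python
SAMPLE_RATE_HZ = 8000        # assumed: 8 kHz PCM as in figure 9
SENTENCE_SECONDS = 5.5       # sentence length in the TA system
SILENCE_THRESHOLD = 100      # hypothetical amplitude limit for "silent"
MAX_SILENCE_FRACTION = 0.25  # recordings above this give no score


def split_halves(samples):
    """Return the first and second half used for the Frequent AQM values."""
    middle = len(samples) // 2
    return samples[:middle], samples[middle:]


def silence_fraction(samples, threshold=SILENCE_THRESHOLD):
    """Fraction of samples whose magnitude is below the threshold."""
    if not samples:
        return 1.0
    silent = sum(1 for s in samples if abs(s) < threshold)
    return silent / len(samples)


def can_be_scored(samples):
    """True if the recording does not contain more than 25% silence."""
    return silence_fraction(samples) <= MAX_SILENCE_FRACTION
```

At the assumed sample rate a full sentence is 44000 samples, so each half
holds 22000 samples, i.e. 2.75 seconds.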
The scores on the CG side are based on the uplink, while the MTU manages
the measurement scores from the downlink. All generated scores are saved
to log files. Figure 20 gives a schematic picture of the measurement
procedure between the CG and the MTU.

Figure 20: The relationship between the CG and the MTU in an audio quality
measurement.

5.5 AQM using the Prototype in the MTU


All test measurements involving the prototype have been performed in a
manner similar to the original measurement procedure described above under
the heading AQM in a MTU including PESQ. However, during measurements
involving the prototype, the CG has only played sentences and the MTU has
only recorded and performed AQM. That is, the main difference between the
original procedure and the one used for the prototype is that the prototype
has only made downlink measurements. This is simply because it is much
easier to acquire measurement data from the MTU than from the CG. Figure 21
shows how the sentences are played and recorded when making AQM with the
prototype.

Figure 21: Interaction between the prototype MTU and the CG.

One of the log files is the Trace log, which is sent to one of the serial ports in
the MTU during the AQM. The Trace log is transmitted to a PC and is
presented to the user via the MS-HyperTerminal program. This log contains
both the Frequent AQM and the PESQ score for all downlink sentences used
when making AQM. The names of the recorded sentences are also displayed.
The recorded sentences in the MTU are saved on a Compact Flash disc,
where each sentence receives an individual name. Thus, if any PESQ score
deviates from the average score, it will be possible to capture the recorded
sentence for further analysis.
5.5.1 PESQ-tester
PESQ-tester is a PC program that uses the same algorithm as the PESQ
software included in the MTU. The advantage of the PESQ-tester program is
that, on top of the Frequent AQM score and PESQ score, it is also able to
produce two vectors. One is a symmetric vector that shows how much of the
original sentence is lost during the transmission. The other is an
asymmetric vector containing the disturbance density signals, i.e. it
describes how much noise is added to the transmitted speech.
The sentences recorded during AQM with the MTU are collected and passed as
arguments to the PESQ-tester. The PESQ-tester program takes three arguments
on the command line. The first argument specifies how the reference and
transmitted speech sentences are sampled. The other two arguments are the
reference and distorted (recorded) sentences respectively.
The output vectors from the tests run in PESQ-tester are often plotted in a
chart, which makes it easier to pinpoint the exact locations of the
symmetric and asymmetric parts in the transmitted sentence. Figure 22 shows
a chart of the symmetric and asymmetric values taken from a measurement
using the prototype.

Figure 22: PESQ-tester's symmetric and asymmetric values.

5.6 Interference
The asymmetric values displayed in the graph in figure 22 show the noise
added during the transmission from the CG to the DSP of the MTU. This
transmission is carried out via the mobile network and the Bluetooth
prototype link in the MTU. The noise appears as interference in the audio,
and most of the interference is rather easily filtered out using a digital
filter [25] and noise reduction tools.
Figure 23 shows the same sentence as in figure 22, but after it has been
processed with a noise reduction function and a fast Fourier transform
filter. However, some parts of the interference are impossible to filter
out. These parts consistently appear at the same position in the sentence
and have a negative effect on the PESQ score. Some parts of the interference
may be introduced via the unshielded wires extending from the F2M03 module
to the DSP, whereas some audio packets may have been corrupted in the air
transmission between the two Bluetooth entities. The persistent
interference, which consistently occurs at certain positions in the
sentences, indicates that some type of audio transformation is affecting the
PESQ score in a negative way.

Figure 23: Symmetric and asymmetric values after filter and noise processing.

5.7 Using USB Bluetooth devices


To rule out interference introduced by the unshielded wires in the
prototype, two USB Bluetooth entities were purchased and installed on two
computers. A SCO connection was established between the entities, and
5.5-second sentences were sent in a manner similar to the exchange between
the CG and the MTU. However, the pauses between the transmissions were
longer than between the CG and the MTU.
The tests using the USB Bluetooth devices followed the same pattern as the
prototype. The parts that had repeatedly been corrupted in tests with the
prototype were also corrupted in the tests with the USB devices. These
results show that the Bluetooth transmission has a harmful influence on the
PESQ score. Figure 24 illustrates the symmetric and asymmetric PESQ-tester
values of a transmission between two USB Bluetooth devices. Figure 25
shows the symmetric and asymmetric values after filter and noise processing
have been performed on the transmitted sentence.

Figure 24: Asymmetric and symmetric values with a USB Bluetooth transmission.

Figure 25: Symmetric and asymmetric values after filter and noise processing.

The waveforms of the corrupted parts in the transmitted and recorded
sentences were analysed. The sample intervals surrounding 42 and 272
produced high values on the asymmetric vector in every test. Both of these
sample intervals show the same pattern of corruption. The waveform was flat
in the corrupted parts, which seems to result in a poor MOS_LQO output from
the PESQ algorithm. The human ear is unable to distinguish these flat
sections of the waveform, as they represent only a small fraction of the
samples that constitute one second. Figure 26 shows the aforementioned
flat parts in the corrupted sentence.

Figure 26: Transmitted corrupted waveform.
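
A simple way to locate flat sections of this kind in a recorded sentence is
to search for runs of samples that hardly change. The Python sketch below is
one possible approach and not the analysis tool used in this work; the
function name, the minimum run length and the tolerance are hypothetical
parameters, since no numeric definition of "flat" is given here.

```python
def find_flat_runs(samples, min_length, tolerance=0):
    """Return (start, length) for each run of nearly constant samples.

    A run continues while consecutive samples differ by at most tolerance;
    only runs of at least min_length samples are reported.
    """
    runs = []
    start = 0
    for i in range(1, len(samples) + 1):
        run_ends = (i == len(samples)
                    or abs(samples[i] - samples[i - 1]) > tolerance)
        if run_ends:
            length = i - start
            if length >= min_length:
                runs.append((start, length))
            start = i
    return runs


if __name__ == "__main__":
    waveform = [1, 5, 9, 9, 9, 9, 9, 3, 7]
    print(find_flat_runs(waveform, min_length=4))
```

The reported start positions can then be compared with the positions where
the asymmetric vector from PESQ-tester shows high values.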

5.8 Transmission studies
The parts of the transmitted sentences that resulted in large values on the
asymmetric vector were analysed in depth. The aim of this analysis was to
discover the elements responsible for transforming the audio stream in
Bluetooth. This information was needed in order to learn why high values on
the asymmetric vector appeared at the same sample intervals every time a
test was performed.
After a closer examination of the radio layer in the Bluetooth stack, the
source of the audio impairment was located. The Bluetooth entities in the
prototype, as well as the PCM interface between them, all use a 16-bit
linear PCM format. This format is handled by the CVSD codec, which performs
encoding and decoding of the linear PCM [7], as described in chapter three.
The CVSD codec does however have certain known limitations. The CVSD
algorithm is based on the assumption that a voice signal does not change
abruptly. Therefore, if the slope of the input voice stream changes faster
than the CVSD algorithm can follow, the output of the CVSD codec will be
unreliable. Such unsuccessful CVSD coding may introduce quantization noise
into the Bluetooth link, which will affect the PESQ score negatively [7].
Figure 27 shows how the CVSD quantization noise may occur in the
transformation from linear PCM to the CVSD data stream.

Figure 27: CVSD quantization noise.

Figure 28 shows how the output from the PESQ algorithm changes with the
average Root Mean Square (RMS) power of the sentences.

[Figure 28 plot: MOS_LQO (2.8 to 3.9) on the vertical axis against average
RMS power in dB (-29 to -11) on the horizontal axis.]

Figure 28: Relationship between the amplitude and MOS_LQO score.
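
The average RMS power of a sentence, as used on the horizontal axis of
figure 28, can be computed as in the Python sketch below. The choice of
16-bit full scale as the 0 dB reference is an assumption, since the
reference level behind the figure is not stated, and the function names are
hypothetical.

```python
import math

FULL_SCALE = 32768.0   # assumed 0 dB reference: 16-bit linear PCM full scale


def rms_power_db(samples):
    """Average RMS power of the samples in dB relative to full scale."""
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0:
        return float("-inf")
    return 10 * math.log10(mean_square / FULL_SCALE ** 2)


def scale_to_db(samples, target_db):
    """Scale the samples so that their average RMS power equals target_db."""
    gain = 10 ** ((target_db - rms_power_db(samples)) / 20)
    return [s * gain for s in samples]
```

Scaling one reference sentence to a series of target levels and scoring each
version is one way to produce a curve of the kind shown in the figure.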

6 Conclusion
The Bluetooth audio link created using the prototype, as well as the
Bluetooth connection set up between the two PCs used for this Master's
Thesis, both affected the recorded audio negatively, to the extent that the
audio could not be used for AQM with PESQ.
The TA system, and especially the MTU, is designed to measure the audio
quality in the mobile network. If an audio link within the MTU is of such
poor quality that the measurement results become unsatisfactory, the MTU
ends up performing a task other than the one it is intended for. Instead of
producing a measurement of the quality of the mobile network, the MTU will
provide a score reflecting the quality of the mobile network merged with the
Bluetooth link. This is not a task that the MTU is intended for today.

6.1 Further work
This prototype only used the linear PCM interface in the Bluetooth SCO link,
which means that the data is transformed by the CVSD codec and is never
retransmitted. A task for future developers could be to study whether it is
possible to redirect the audio stream and send it over the ACL link [21], as
the ACL link does not transform the audio and is able to retransmit audio
packets. However, the ACL link will introduce delays within and between the
sentences, since retransmissions occur when packets are lost or corrupted
upon arrival at the receiver. The PESQ algorithm will hopefully be more
tolerant of such delays, which is not unlikely considering its tolerance of
handovers and VoIP.

Acknowledgement
First of all I would like to thank my external supervisor Ulf Marklund at
Ericsson TEMS for his guidance and support during the work on this project. I
would also like to direct my great appreciation to Per Johansson at Ericsson
TEMS, who has provided me with numerous valuable ideas and suggestions.
My internal supervisor, Jerry Eriksson, deserves a special mention for
answering my questions about thesis formalities. Lastly, I would also like to
thank all of the employees at Ericsson TEMS who have been more or less
involved in my project, for making this period a very enjoyable time.


Terminology
ACL       Asynchronous Connection-Less
AQM       Audio Quality Measurement
CCD       Customer Care Department
CG        Call Generator, one of the processes in the TA system
CVSD      Continuous Variable Slope Delta
DSP       Digital Signal Processor
HTU       Handheld Test Unit
ISM       Industrial, Scientific and Medical
L2CAP     Logical Link Control and Adaptation Protocol
LMP       Link Manager Protocol
LQO       Listening Quality Objective
MOS       Mean Opinion Score
MTU       Mobile Test Unit
PCM       Pulse Code Modulation
PESQ      Perceptual Evaluation of Speech Quality
PSQM      Perceptual Speech Quality Measure
PSTN      Public Switched Telephone Network
QoS       Quality of Service
RFCOMM    Radio Frequency Communication
SCO       Synchronous Connection-Oriented
TLH       TEMS Logfile Handler
VoIP      Voice over Internet Protocol
VQM       Voice Quality Measurement


References
[1] Acoustic Parameters, Ericsson Mobile Platform E100, G200, U100, Description (internal document)
[2] Audio path, Ericsson Mobile Platform E100, G200, U100, Description (internal document)
[3] Audio Quality Measurement in TEMS Automatic, PESQ
[4] Audio Quality Measurement in TEMS Automatic, PSQM, White Paper
[5] Att förstå telekommunikation, Ericsson Telecom, Telia AB, ISBN 91-44-37801-7, pp. 70-76
[6] Class 2 Bluetooth(TM) Module F2M03AC2, Datasheet, Rev:10 January 2005
[7] Continuously Variable Slope Delta Modulation: A Tutorial, http://www.cmlmicro.com, 1 September 2005
[8] Design Specification, TEMS Automatic System (internal document)
[9] Host Hands-Free Message Interface, Free2Move, Rev:08 April 2005
[10] ITU-T P.862.1, Mapping function for transforming P.862 raw result scores to MOS-LQO
[11] ITU-T P.800, Mean Opinion Score
[12] ITU-T P.862, Perceptual evaluation of speech quality (PESQ): An objective method for end-to-end speech quality assessment of narrow-band telephone networks and speech codecs
[13] ITU-T G.711, Pulse Code Modulation (PCM) of voice frequencies
[14] Bluetooth: Carrying Voice over ACL Links, Rohit Kapoor, Ling-Jyh Chen, Yeng-Zhong Lee, Mario Gerla, 3803 H, Boelter Hall, University of California, Los Angeles
[15] Modern telekommunikation, Gunnar Karlsson, ISBN 91-44-00118-5
[16] Bluetooth Revealed, Brent A. Miller, Chatschik Bisdikian, ISBN 0-13-090294-2
[17] Bluetooth Demystified, Nathan J. Muller, ISBN 0-07-136323-8, pp. 23, 59, 70-71, 101, 104-106
[18] Mobile Communications, Second Edition, Jochen Schiller, ISBN 0-321-12381-6, pp. 269-293
[19] Specification for mixed digital analog ASIC (internal document)
[20] Specification of the Bluetooth System, Wireless connections made easy, Core, Version 1.1, February 22, 2001
[21] Specification of the Bluetooth System, Wireless connections made easy, Core, Version 1.1, February 22, 2001, Part A
[22] Specification of the Bluetooth System, Wireless connections made easy, Core, Version 1.1, February 22, 2001, Part B, pp. 45-77, 81, 85-91, 139-142
[23] TEMS Automatic UMTS MTU700 Configuration Guide
[24] User Manual, Free2Move Evaluation Board, Rev:24 February 2005
[25] Digital Filters, Lars Wanhammar and Håkan Johansson, Department of Electrical Engineering, Linköpings universitet, 2002, pp. 13-18
[26] Waveform Coding Techniques, http://www.cisco.com/warp/public/788/signalling/waveform_coding.html#topic1, 26 September 2005
