
Overlays for Live Internet

Multimedia Streaming Systems

––– PhD 1st Year Report –––

Nick Blundell
Supervisor: Dr. Laurent Mathy

Computing Department
Faculty of Applied Sciences
Lancaster University

4 November 2002
PhD 1st Year Report - Nick Blundell - 04/11/2002

Contents
Introduction 6
Report Structure 6

Background 7
Streaming Audio Systems 7
Audio Systems 7
Digital Audio 7
Working with the Soundcard 8
Real-time Scheduling Problems 9
Audio Compression 9
Audio Streaming 10
Network Delay & Jitter 11
Packet Loss 11
RTP/RTCP 12
Audio Device Clock Skew 12
Internet Radio 13
VoIP 13
User Acceptance of Delay 13
Audio Conferencing 13
VoIP Conference Models 13
Signalling 14
SIP 15
H.323 15
Summary 15

Streaming Video Systems 16


Video Systems 16
Digital Video 16
Video Capture 17
Webcams 18
Analog Video Capture Cards 18
Firewire 18
Video Streaming 19
Video Compression 20
Video Broadcasting 21
Video on Demand 21
Video Conferencing 22
Summary 22

Overlay Networks 23
Multicast 23
IP Multicast 24
IGMP (Internet Group Management Protocol) 24
Multicast Routing 24
DVMRP 25
PIM-DM 25
PIM-SM 26
MBone 27
Application Level Multicast 27
Problems with IP Multicast 27
Advantages of ALM 28
Current ALM Systems 29
Content Delivery Networks 32
Content Distribution and Management 33
Content Routing 33


Global Redirection 33
Local Redirection 34
Active Networks 34
Summary 34

Motivation 35
IP Multicast not Deployed and Inflexible 35

Statically Configured CDNs 35

Current ALM Technology 35

Proposed Research 36
Open Issues 36
How Can Overlays be Tailored to Applications? 36
How Can Overlays Exploit Heterogeneity? 36
How Can Overlays Adapt with Minimal Disruption? 37

Initial Ideas 37
Adaptation to Application Usage Patterns 37
Resilience through Controlled Loops 37

Conclusions 38

References 38


Table of Figures
Figure 1 - Analog Signal Overlaid with a Sampled Digital Signal 8
Figure 2 - Interfacing the Audio Device through a Typical Audio API 8
Figure 3 - User Acceptance of One-way Voice Delay 13
Figure 4 - Painting Scan Lines on an Analog Display 17
Figure 5 - Load Placed on a Unicast Video Server and Network for a Small Group 23
Figure 6 - Video Server Utilising Efficient Multicast to Deliver Data to Clients 24
Figure 7 – DVMRP SPTs for each Network 25
Figure 8 - Reverse Path Forwarding 26
Figure 9 - PIM-SM Shared-to-Source Tree Switchover 27
Figure 10 - Comparison of (a) Unicast, (b) IP Multicast and (c) ALM 27
Figure 11 - Operation of a General CDN (Content Delivery Network) 33


List of Tables
Table 1 - Widely used Audio Codecs 10
Table 2 - Audio Streaming Delay Components 11
Table 3 - Audio Packet Loss Concealment Techniques 12
Table 4 - Comparison of VoIP Conference Models 14
Table 5 - Widely used Video Codecs 20
Table 6 - IP Multicast Address Ranges 24


Introduction
The majority of home computers are now capable of handling rich multimedia content
through devices such as DVD and CD drives, which allow high quality video and audio
playback. Improvements in network transmission technology and router processing power are
enabling faster LANs (Local Area Networks) and WANs (Wide Area Networks), able to move
more bytes, faster. Bandwidth for domestic access to the Internet is increasing as broadband
connections allow home users to connect directly through digital networks provided by
telephone companies and ISPs (Internet Service Providers). However, despite all these
technological advances, efficient and truly scalable techniques for media broadcasting and
group communication have still not been deployed successfully in the Internet. IP Multicast
has been implemented for research purposes on an experimental sub-network within the
Internet, but a decade after its initial proposal wide-scale deployment has not been realised.
CDNs (Content Delivery Networks) relieve congestion for busy media-providing networks
and try to serve clients replicated data from a nearer source, but static configuration of such
networks makes optimisation for clients difficult, especially when a client's nearest CDN
server becomes overloaded.

This report aims to identify current Internet multimedia streaming technologies and to
motivate a PhD thesis researching the suitability of overlay networks for such systems,
especially where live, real-time multimedia transmission is involved, as in live video
streaming and video conferencing systems.

Recent research has shown promising results for the use of application level overlay networks
for group streaming media systems in response to shortcomings of network level group
communication offered by IP Multicast.

To provide a complete picture of the problems of group media streaming systems, this report
takes a detailed look at all aspects of streaming systems, ranging from the acquisition of video
and audio data, through session initiation and control, to network routing.

Report Structure
This report is divided into four main sections. Firstly, the background section describes
current technology and research on audio and video streaming systems, including media
capture, compression and typical applications such as conferencing and broadcast systems,
before describing overlay network technology, including multicast and CDN technology for
group communication, and finally active networks. The next section gives motivation for
research into overlay network support for multimedia streaming systems on the Internet.
Following the motivation, the next section outlines current open issues in the area of overlay
network research and proposes some initial ideas. Finally, the document is summarised in
the conclusion section.


Background
This section describes a range of technologies relevant to the proposed research into
Internet multimedia streaming systems. The first two subsections describe the specifics of
audio and video streaming systems, including multimedia capture from devices, compression,
network streaming and playback. The final subsection then examines overlay networks for
group communication, independently of the applications that use them, covering multicast,
CDN (Content Delivery Network) technologies and programmable networks.

Streaming Audio Systems


This section examines properties of streaming audio systems for the Internet. Such systems
include Internet radio broadcasting and Internet telephony. Firstly, features common to all
audio streaming systems are described, such as audio playback and capture through the
soundcard, digital audio and audio compression. Later subsections examine properties
specific to VoIP and Internet radio systems.

Audio Systems
This section gives an overview of how computers play back and capture audio signals
through soundcards, describing digital and analog signal conversion, digital audio
representation and the interaction between applications and typical audio APIs.

Digital Audio
Computers manipulate and store audio as a sequence of bytes converted from analog audio
signals by an analog-to-digital converter (ADC). To play back captured digital audio it must
first be converted back into an analog signal using a digital-to-analog converter (DAC) [1].

Most general purpose computers now come equipped with a soundcard, which allows analog
audio devices such as speakers and microphones to be connected to the computer and
performs analog-to-digital and digital-to-analog conversion, allowing the computer to capture
and play back analog sound.

An analog signal is converted into a digital representation by an ADC through sampling,
whereby the analog signal is quantised at regular time intervals. Quantisation refers to
approximating the analog signal's amplitude, at a particular moment in time, to a value
belonging to a finite range known as the digital audio signal's resolution [1]. For example,
the amplitude of a signal could be quantised to a number in the range 0-255, requiring a
resolution of a single byte of storage per sample. Figure 1 shows a sampled digital audio
signal superimposed on the original analog signal, with vertical grid lines representing
samples and horizontal lines quantisation values.


Figure 1 - Analog Signal Overlaid with a Sampled Digital Signal

The signal amplitude could be approximated more accurately if a word (two bytes) were
used, allowing the range of values 0-65535. Similarly, increasing the sample rate (the rate at
which samples are taken) allows the analog signal's frequency to be approximated more
accurately. Increasing the sample rate and resolution of an ADC increases the bandwidth of
the digital signal, as more bytes are needed, more frequently, to represent the analog signal.
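
The sampling and quantisation process can be sketched in a few lines; the 1 kHz test tone and the amplitude scaling convention below are illustrative choices, not taken from the report:

```python
import math

def quantise(signal, resolution_bits):
    """Map each analog amplitude in [-1.0, 1.0] to the nearest of
    2**resolution_bits discrete levels, as an ADC would."""
    levels = 2 ** resolution_bits
    quantised = []
    for amplitude in signal:
        # Scale [-1.0, 1.0] onto [0, levels - 1] and round to the nearest level.
        quantised.append(round((amplitude + 1.0) / 2.0 * (levels - 1)))
    return quantised

# Sample a 1 kHz sine tone at 8000 samples/s for one millisecond (8 samples).
sample_rate = 8000
tone_hz = 1000
samples = [math.sin(2 * math.pi * tone_hz * n / sample_rate) for n in range(8)]

eight_bit = quantise(samples, 8)   # values in 0..255, one byte per sample
```

Using 16 bits in place of 8 in the same sketch would quantise onto 65536 levels, doubling the storage per sample exactly as described above.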

Soundcard analog and digital converters usually offer a set of standard digital audio formats
suitable for different applications: for example, 8 bit, 8 kHz digital audio (1 byte resolution at
8000 samples a second) is suitable for representing voice, while 16 bit, 44 kHz digital audio
suits high quality music.

Once in digital form, DSP (Digital Signal Processing) techniques can be applied to the audio
signal allowing useful manipulation such as noise filtering, echo cancellation, silence
detection, error correction, mixing and compression (see section Audio Compression) [1].

Working with the Soundcard


Applications typically interact with a soundcard through an audio API (Application
Programming Interface), which offers a set of functions for configuring the device (e.g.
setting the digital audio format or playout volume) and for reading or writing digital audio to
and from the device (see Figure 2 below).

Figure 2 - Interfacing the Audio Device through a Typical Audio API

Audio is passed between the application and soundcard in blocks consisting of one or more
audio samples. To maintain unbroken playback of a continuous incoming audio stream, the
internal playback buffer of the soundcard must be regularly topped up with audio blocks by
the application. If the application is unable to deliver an audio block from the incoming
stream before the device playback buffer runs out, the listener will experience a gap in the
audio.

The choice of audio block size for exchanging audio between the application and soundcard
depends largely on the application, although the block cannot be larger than the internal
soundcard buffer, and processing of hardware events becomes inefficient if it is chosen too
small. A further restriction on audio block size, a result of non real-time operating systems,
is described in the following section.

Real-time Scheduling Problems

Current general purpose operating systems cannot guarantee that a process will be
scheduled slices of processor time at regular intervals. This enforces a minimum audio block
size, needed to avoid the audio device playback buffer running dry in the periods between
schedulings.

Many audio applications avoid this issue by giving the audio device a generous amount of
data (a large block size) to feed it while the processor is serving other tasks, but for real-time
applications such as audio conferencing this excess buffering adds unnecessary end-to-end
delay.

One technique, proposed at UCL, minimises audio block size whilst retaining uninterrupted
playback by monitoring how much audio was played out between schedulings of the process
and increasing or decreasing the block size accordingly to match the computer's load. The
technique uses a tight loop to poll a non-blocking audio API function which reads any newly
captured audio samples from the device, and then uses the size of the captured data to
determine how much data to write to the audio device. For example, if 10 ms has elapsed
since the process was last scheduled, the application will read 10 ms worth of audio from
the device capture buffer to reflect this [2].
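
The core rule of this technique can be sketched as follows; the device interface (`read_available`, `write`) and the `FakeDevice` used to drive the sketch are hypothetical stand-ins, not part of any real audio API:

```python
def adaptive_loop(device, iterations):
    """Poll a (hypothetical) non-blocking audio device: write out exactly as
    much audio as was captured since the process last ran, so playout tracks
    real elapsed time instead of a fixed, conservatively large block size.
    A real application would loop indefinitely rather than a fixed count."""
    for _ in range(iterations):
        captured = device.read_available()   # non-blocking read of new samples
        if captured:
            device.write(captured)           # top the playout buffer up by the same amount

class FakeDevice:
    """Stand-in device: pretends 10 ms of audio (80 samples at 8 kHz)
    accumulates between schedulings, and records what gets written."""
    def __init__(self):
        self.written = []
    def read_available(self):
        return [0] * 80
    def write(self, block):
        self.written.append(len(block))

device = FakeDevice()
adaptive_loop(device, iterations=3)
# device.written now holds three 80-sample blocks: playout matches capture.
```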

Audio Compression
Several techniques exist to compress digital audio signals, enabling more efficient storage
and transmission over networks. The network bandwidth required to stream an uncompressed
8 bit, 8 kHz audio signal is:

bandwidth required = sample rate × resolution
                   = 8000 samples/s × 8 bits/sample
                   = 64000 bps
                   = 64 kbps

The bandwidth consumed by the signal described above would not suit a user connected to
the Internet via a 56 kbps modem. For such low bandwidth users to stream audio, the audio
bit-rate must be reduced. This can be achieved by reducing the audio sampling rate or
resolution, if supported by the soundcard; however, the signal quality may then become
unsuitable, as 8 bit, 8 kHz audio is the minimum resolution and sample rate necessary to
accurately represent human speech. Successful audio compression techniques try to keep as
much of the signal's important information as possible while reducing the bit-rate.


PCM (Pulse Code Modulation), 64 Kbps: A single byte is used to represent the quantised
analog amplitude for each sample.

ADPCM (Adaptive Differential Pulse Code Modulation), 16-40 Kbps: Audio is encoded as
the difference between adjacent samples, exploiting the property that audio levels vary only
slightly from sample to sample. Further compression is achieved by adapting sample
quantisation to the signal's current average energy, allowing fewer bits per sample. ADPCM
coding produces good sound quality at a low processing cost.

GSM (Global System for Mobile), 13.2 Kbps: Specially designed for voice communication,
and widely used in mobile phones and for Internet telephony. GSM extends earlier
compression techniques by using prediction based on modelling the human vocal tract. It
offers good voice quality at a low bandwidth but requires a slightly higher processing
overhead than the simpler ADPCM.

CS-ACELP, 8 Kbps: Conjugate-Structure Algebraic Code Excited Linear Prediction,
offering low bit-rate, reasonable quality voice at the price of a very high processing cost.

CELP, 4.8 Kbps: Voice compression using Code Excited Linear Prediction.

LPC10, 2.4 Kbps: Voice compression using Linear Predictive Coding (LPC), offering just
about intelligible voice at a very low bit-rate.

Table 1 - Widely used Audio Codecs

Table 1 (above) lists some codecs (COder-DECoders) widely used in audio communication,
each suited to different applications, offering various reductions in bit-rate with characteristic
trade-offs in processing overhead, delay and audio quality [3]. For example, some of the
techniques are only suitable for voice signals while others handle general audio. The bit-rates
given are for compression of an 8 bit, 8 kHz sampled signal.

Audio Streaming
The previous section described properties of digital audio, its playback and capture, along
with commonly used compression techniques. This section follows up by introducing audio
transmission across networks such as the Internet. By networking two audio-enabled
machines it is possible to send audio from one computer to another for playback. This
continuous flow of data is known as streaming, or live streaming in the case of sending newly
captured data (i.e. from a microphone).

The quality of audio streaming is greatly affected by properties of the intervening network,
such as delay, jitter, packet loss rate and available bandwidth. These issues, along with loss
detection using RTP and loss concealment, are described in the following sections.


Network Delay & Jitter

The time taken for a sample of audio to be captured on one machine, transmitted across the
network and played back on another can be broken down into several general delay
components, illustrated in Table 2 (below) [4].

Packetisation (typically 20 ms): The time taken to accumulate enough audio samples to fill a
complete audio block.

Processing, encoding (typically 0.7 ms): CPU time taken to compress the audio block.

Network delay (0-75 ms): Dynamic delay introduced by Internet routing, caused by
store-and-forward processing and congestion.

Processing, decoding (typically 0.7 ms): CPU time taken to uncompress the encoded audio.

Playout buffering (typically 5 ms): Buffering required to cushion the effects of network
delay variance (jitter).

Table 2 - Audio Streaming Delay Components
Packetisation delay can be reduced by reducing the audio block size, but this reduction leads
to an increase in audio device event frequency and less efficient packet processing. The
minimum block size is therefore limited by CPU processing power along with OS scheduling
properties (see section Real-time Scheduling Problems).

Little can be done to reduce delay contributed by Internet routing on paths between a
particular source and destination. End-to-end delay consists largely of router processing
delay such as packet classification and queuing with a small proportion being made up of
physical link latency between each router hop along the path [5].

End-to-end delay can vary as a result of network congestion. When routers experience high
incoming data-rates, packet queue lengths increase, resulting in longer delays between packet
queuing and transmission [6]. These changes in delay are experienced as jitter by the
receiver, where the delay between packet arrivals varies from one packet to another. Extra
buffering must be in place at the receiver to account for network jitter, in order to avoid the
audio device playout buffer running dry through underestimating the minimum end-to-end
delay.
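
A receiver can estimate jitter with an exponentially smoothed average of delay variation, in the style of RTP's RTCP interarrival jitter estimator (a sketch; the transit times below are made-up figures):

```python
def update_jitter(jitter, transit_prev, transit_now):
    """Smoothed jitter estimate: each new observation moves the estimate
    1/16th of the way toward the latest absolute delay variation, as in
    the RTCP jitter calculation."""
    d = abs(transit_now - transit_prev)
    return jitter + (d - jitter) / 16.0

# Transit times (receive timestamp minus send timestamp) in ms for
# successive packets; the variation between them is the jitter.
transits = [50, 55, 48, 60, 52]

jitter = 0.0
for prev, now in zip(transits, transits[1:]):
    jitter = update_jitter(jitter, prev, now)
```

The smoothed estimate can then size the playout buffer: a cushion of a few multiples of the current jitter estimate keeps late packets playable without adding a fixed worst-case delay.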

Packet Loss

Not only does the Internet introduce delay and jitter, it can also discard packets completely,
resulting in packet loss. When a router becomes heavily congested and can no longer queue
incoming packets it has no option but to drop them, and the receiver experiences lost audio
packets. Packet loss usually occurs in bursts of one or more packets, where the probability
that the next packet is lost is raised if the previous one was lost [7].

Packet Erasure
Reliable protocols such as TCP use acknowledgment and retransmission to ensure data gets
from source to destination; however, the extra delay incurred by acknowledgments makes this
method unsuitable for multimedia streaming systems. Protocols such as RTP (see section
RTP/RTCP) exist to support real-time streaming applications, allowing detection of packet
loss and reordering through the use of packet timestamps and sequence numbers. If a packet
arrives later than it was scheduled to be played it is of no use to the audio streaming
application, and so it is erased, having the same effect on the application as network packet
loss.

Loss Concealment
Since retransmission of audio packets is not desirable for audio streaming systems, techniques
have been adopted to try and minimise the impact of packet loss and conceal it from the
listener whilst trying to maintain low delay. Effective concealment techniques usually have
the sender encapsulate extra redundant information within audio packets which can be used
by the receiver to reconstruct lost stream data. This extra redundancy increases the required
per stream bandwidth and introduces extra receiver buffering delay to allow for
reconstruction in the event of packet loss. Table 3 (below) summarises a selection of widely
used loss concealment techniques in audio streaming applications.

Silence/White Noise/Repetition Substitution [8]: The receiver substitutes white noise, silence
or a repeat of a correctly received audio packet into the audio stream on encountering packet
loss. Works well for small audio packets (<16 ms) and a low loss rate (up to 1%).

Low bit-rate encoded redundancy [8]: The sender embeds a low bit-rate encoding of the
previously sent audio packet in the current audio packet, allowing the receiver to reconstruct
missing audio from lost packets. Adequate intelligibility for larger audio packet sizes of
40-80 ms and high loss rates (up to 40%).

Multiple Descriptor Encoding and Path Diversity [9]: The sender establishes two
independent network paths to the receiver by relaying data through intermediate nodes. The
audio packet is encoded as two descriptions, each sent on a different path. If one description
is lost, redundancy in the other can be used to reconstruct the missing audio; the original
signal is fully reconstructed when both descriptions are received. Highly resilient to burst
losses, as these are unlikely to affect both paths simultaneously; however, establishing two
completely independent paths of similar delay characteristics is potentially problematic.

Adaptive Packetisation [10]: Audio packet size is adapted to the speaker's pitch, allowing
the receiver to more successfully reconstruct missing audio from successfully received
adjacent packets. Performs well up to high loss rates (30%).

Table 3 - Audio Packet Loss Concealment Techniques
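
The simplest of these, repetition substitution, can be sketched as follows (lost packets are modelled as None; the packet contents are made-up samples):

```python
def conceal_losses(packets):
    """Repetition substitution: wherever a packet was lost (None), play a
    repeat of the most recent correctly received packet; before any packet
    has arrived, substitute silence (zeros) instead."""
    block_size = next(len(p) for p in packets if p is not None)
    last_good = [0] * block_size          # silence until audio arrives
    stream = []
    for packet in packets:
        if packet is None:                # loss detected, e.g. via a sequence gap
            stream.append(last_good)
        else:
            stream.append(packet)
            last_good = packet
    return stream

received = [[1, 2], [3, 4], None, [7, 8]]   # third packet lost in transit
playout = conceal_losses(received)
# playout == [[1, 2], [3, 4], [3, 4], [7, 8]]
```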

RTP/RTCP

RTP (Real-time Transport Protocol) is a standard for real-time data transmission over the
Internet, offering minimal services required by real-time multimedia streaming applications
that are not present in UDP, without the delay overheads of TCP retransmission. RTP does
not offer a reliable service, but it does sequence-number and time-stamp packets, allowing
loss detection and packet playout scheduling. RTP also includes a control channel protocol,
RTCP (Real-time Control Protocol), for disseminating reports between streaming peers
describing experienced loss rates and participant information [11].
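
A minimal sketch of how RTP-style sequence numbers enable loss and reorder detection (simplified: a late packet that was already counted as lost is simply tallied as reordered here, whereas a real receiver would reconcile the two):

```python
def detect_gaps(sequence_numbers, bits=16):
    """Count apparent losses and reorderings from arriving sequence
    numbers. RTP sequence numbers wrap modulo 2**bits, so differences
    are compared in that modular space."""
    space = 2 ** bits
    lost = 0
    reordered = 0
    expected = sequence_numbers[0]
    for seq in sequence_numbers:
        diff = (seq - expected) % space
        if diff == 0:                     # exactly the packet we expected
            expected = (seq + 1) % space
        elif diff < space // 2:           # a forward jump: packets in between missed
            lost += diff
            expected = (seq + 1) % space
        else:                             # an earlier packet arriving late
            reordered += 1
    return lost, reordered

# Sequence numbers wrap at 65535; packet 1 never arrives, packet 2 arrives late.
lost, reordered = detect_gaps([65534, 65535, 0, 3, 2])
```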

Audio Device Clock Skew

Another potential problem with long-lived audio streams is audio device clock skew, where
small variations between the quartz crystal oscillators (clocks) on different soundcards cause
playout buffer overruns or underruns at the receiver, when the sender's soundcard samples
audio faster or slower, respectively, than the receiver's soundcard plays it out. Many audio
streaming implementations use large playout buffers to absorb clock skew, which was
observed to be as much as ±0.5% by researchers at UCL; however, skew detection and
compensation is necessary to help minimise delay [12].
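
Skew can be estimated by comparing how many samples each side's clock produced or consumed over the same interval; the sketch below (with made-up sample counts, and a deliberately crude drop-one-sample compensator in place of the smooth resampling a real implementation would use) illustrates the idea:

```python
def estimate_skew(sender_samples, receiver_samples):
    """Relative clock skew as the ratio of samples the sender produced to
    samples the receiver consumed over the same wall-clock interval. A value
    above 1.0 means the sender's clock runs fast relative to the receiver's,
    so the playout buffer will slowly fill (and eventually overrun)."""
    return sender_samples / receiver_samples

# Over some interval the sender generated 160800 samples while the receiver
# played out 160000: a +0.5% skew, the worst case observed at UCL.
skew = estimate_skew(160800, 160000)

def compensate(block, skew):
    """Crude compensation: drop one sample per block when the sender is
    fast; real systems resample or adjust the playout rate smoothly."""
    if skew > 1.0 and len(block) > 1:
        return block[:-1]
    return block
```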


Internet Radio
Many Internet radio websites exist, offering audio streaming of various content to users on the
slowest of Internet connections. The bandwidth and processing requirements of audio are far
less than that of video allowing Internet organisations to easily serve thousands of audio
streams from a single media server. BBC Radio 1 and other national radio channels are
broadcast both over the Internet and over the air [13]. A wide range of other radio broadcasts are
available over the Internet such as hobbyist police radio sites allowing people to listen to live
police communications for various departments [14].

VoIP
By establishing full-duplex (two-way simultaneous) audio streaming between networked
machines, users can hold a conversation over the Internet as they would using the PSTN
(Public Switched Telephone Network). This section describes the use of audio streaming for
voice communication over the Internet, more commonly known as VoIP (Voice over IP),
examining user requirements of VoIP systems, such as audio quality and delay, and
describing communication models covering point-to-point and group communication.

User Acceptance of Delay


Users expect VoIP systems to offer quality comparable to the PSTN, but excessive
application buffering, large audio block sizes and the best-effort nature of the Internet can
make end-to-end delay too high to be acceptable. High delays lead to users becoming
confused, interpreting the delay as pauses and getting out of sync with each other (i.e.
talking at the same time). Figure 3 (below) shows the well-studied results of user acceptance
of one-way delay¹ in a conversation, assuming that there is no echoing, which can greatly
degrade acceptance at even the smallest of delays (<10 ms) [3].

Figure 3 - User Acceptance of One-way Voice Delay

Audio Conferencing
VoIP, being based on packet-switched networks, lends itself more naturally to supporting
multiparty calls than the PSTN, which was originally designed only to construct circuits
between two telephones [15]. PSTN conferencing is achieved by participants dialling into a
telephone bridge using a pre-arranged conference telephone number. Once connected, the
bridge provides a caller with mixed audio from all of the other members [16]. Conference
floor control is achieved by members dialling special number sequences, such as mute and
invite, to control who can speak and to invite new members into the conference.

VoIP Conference Models

A simple approach to VoIP audio conferencing is for each participant to send duplicates of
captured audio packets to each of the other participants. This unicast approach is compared
with other, more sophisticated audio conferencing models in Table 4 (below).

¹ One-way delay for VoIP would consist of both audio processing delay and end-to-end
network delay.

Unicast: Each participant sends copies of their captured audio packets to each other
participant [17]. The bandwidth required by each participant² is pb(n) = 2(n-1)B.

Central Mixer: A participant sends captured audio packets to a central mixer, which returns
packets containing mixed audio of all the other participants, similar to how a PSTN bridge
works [17]. The bandwidth required by each participant² is pb(n) = 2B; the bandwidth
required by the mixer is mb(n) = 2nB.

End System Mixing: Initially the conference starts with two participants. A new participant
joins by attaching to one of the current participants, who then begins to forward a mix of the
audio they are receiving from peers to the new participant. Further participants join, forming
chains of nodes mixing audio for their peers [17]. The maximum bandwidth required by each
participant² for maximum degree D is pb(n) = 2DB.

IP Multicast: Participants rely on network layer multicast to distribute their captured audio
packets among multicast members [17]. The maximum bandwidth³ required by each
participant² is pb(n) = nB.

Table 4 - Comparison of VoIP Conference Models

The conferencing models described in the table above offer a mix of advantages and
disadvantages for group communication. For example, unicast is simple to implement,
requires no multicast support in the network and offers low delay, but scales badly for large
groups. Central mixing allows low bandwidth participants to communicate in large groups;
however, a dedicated server with high bandwidth access is required, which introduces a
central point of failure. End system mixing has no central point of failure and allows low
bandwidth participants to communicate in large groups; however, long chains of mixers
introduce high audio delay between participants at opposite ends of the chains. Finally, IP
Multicast has no central point of failure, allows low bandwidth participants to communicate
in large groups if silence suppression is used, and offers audio delay comparable to the
unicast approach, but requires all participants to have access to IP Multicast.
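
The formulas of Table 4 can be compared directly. For an illustrative conference of ten participants, each producing a 64 kbps audio stream, with an end-system-mixing degree of 3 (all figures chosen for the example, not taken from the report):

```python
B = 64   # per-stream audio bandwidth in kbps (e.g. PCM from Table 1)
n = 10   # number of participants
D = 3    # maximum node degree for end system mixing

# Bandwidth formulas from Table 4, assuming no silence suppression.
unicast       = 2 * (n - 1) * B   # send to and receive from every peer: 1152 kbps
central_peer  = 2 * B             # one stream up, one mixed stream down
central_mixer = 2 * n * B         # the mixer carries every stream both ways
end_system    = 2 * D * B         # bounded by the node's mixing degree
ip_multicast  = n * B             # worst case, everyone talking at once: 640 kbps
```

The numbers make the scaling argument above concrete: only unicast grows linearly per participant, while the central mixer concentrates that linear growth at a single server.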

² Assuming no silence suppression, where n is the number of participants and B is the per
audio stream bandwidth.
³ Note that for IP Multicast a sender can also receive their own packets, which would require
a bandwidth of pb(n) = (n+1)B.

Signalling
Signalling protocols have been designed to address the setup of multimedia data streams
between hosts in packet-switched networks such as the Internet. Signalling protocols offer
services for codec negotiation, location of users, and also more application-specific services
such as call forwarding and redirection for VoIP, akin to the call services offered by the
PSTN's Signalling System 7 (SS7) [18].

Two standards, SIP (Session Initiation Protocol) and H.323, have emerged as dominant for
multimedia stream signalling in the Internet [19]. Brief descriptions of SIP and H.323
follow.

SIP

SIP is a simple text-based protocol proposed by the IETF (Internet Engineering Task Force),
using a small set of messages similar in format to HTTP (Hyper-text Transfer Protocol)
request and response messages. SIP clients are termed UAs (User Agents) and can initiate
SIP sessions directly with each other; however, the proposed SIP architecture also includes
SIP servers, with which UAs register their current location (i.e. IP address and port) to be
mapped to a globally unique SIP-URL of the form sip:nick.blundell@comp.lancs.ac.uk. SIP
servers handle location of callees on behalf of UAs and can offer services such as call forking,
allowing a single call to ring several telephones at once until one is answered (i.e. ringing
someone's cell phone and office phone at the same time) [20].

The text-based protocol fields used in SIP messages are completely extensible [19,20]. For
example, SIP sessions are usually described using SDP (Session Description Protocol) within
SIP messages, but this can easily be replaced by another form of description, such as XML
(eXtensible Mark-up Language), by adding a new protocol field if more flexible descriptions
are required.
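
As an illustration of SIP's text-based, HTTP-like format, the sketch below assembles a minimal INVITE request; the addresses, Call-ID and branch value are made up, and several headers that a real RFC 3261 message requires (e.g. Max-Forwards, Contact) are omitted for brevity:

```python
def build_invite(caller, callee, call_id, branch):
    """Assemble a minimal, illustrative SIP INVITE request."""
    lines = [
        f"INVITE sip:{callee} SIP/2.0",
        f"Via: SIP/2.0/UDP client.example.com;branch={branch}",
        f"From: <sip:{caller}>;tag=76341",
        f"To: <sip:{callee}>",
        f"Call-ID: {call_id}",
        "CSeq: 1 INVITE",
        "Content-Length: 0",
    ]
    # SIP, like HTTP, terminates each line with CRLF and ends the
    # header block with a blank line.
    return "\r\n".join(lines) + "\r\n\r\n"

request = build_invite("alice@example.com", "bob@example.com",
                       "a84b4c76e66710@client.example.com",
                       "z9hG4bK776asdhds")
method, uri, version = request.splitlines()[0].split(" ")
```

A body, when present, would be carried after the blank line, typically an SDP session description announced via a Content-Type header.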

H.323

H.323 is an umbrella standard by the ITU-T (International Telecommunication Union),
consisting of a collection of well defined standards for multimedia communication over
packet-switched networks. H.323 offers a similar service to SIP but has existed a few years
longer and is designed primarily to be compatible with PSTN services. H.323 rigorously
defines standards for call signalling, audio and video codecs, codec negotiation and security
[21].

The H.323 architecture defines four main entities: terminals, gateways, gatekeepers and
MCUs (Multipoint Control Units). Terminals and gatekeepers are similar to SIP UAs and SIP
servers. Gateways provide connectivity and translation between H.323 networks and the
PSTN and MCUs allow multiparty conferencing calls [21].

H.323 is currently more widely accepted than SIP, largely due to its maturity. Fewer
implementation interoperability issues are also expected with H.323, due to its rigorous
definition of all aspects of communication, from signalling and security to the use of specific
codecs. However, H.323 is less flexible than SIP to extend to applications other than VoIP,
because of its binary-encoded protocol messages. Being more complex, H.323 is also more
heavyweight to implement than SIP, demanding more processing and therefore more costly
terminal devices [19].

Summary
This section has covered aspects of streaming audio systems ranging from audio playback
and capture to the effects of network delay, loss and jitter on audio streaming applications.
Audio compression and user acceptance of VoIP (Voice over IP) were described, along with
various models for audio conferencing and streaming session initiation protocols.


Streaming Video Systems


This section examines properties of streaming video systems for the Internet. Such systems
include video broadcasting and conferencing. Firstly, features common to all video streaming
systems are described, such as video capture through devices like webcams and video capture
cards, digital video and video compression, before later discussion of video broadcasting and
streaming systems.

Video Systems
This section gives an overview of how computers are able to capture video signals through
cameras and analog-to-digital capture cards, describing digital and analog signal conversion,
digital video representation and video compression.

Digital Video
Before describing digital video it is important to understand its predecessor, analog video,
used by millions of people worldwide to record and watch TV.

Analog video is produced by cameras which convert the intensity and colour of light observed
through a lens into an electromagnetic signal. The signal is broken down into frames of still
images and further into scan lines of colour and intensity representing single lines of the still
image.

There are three main standards for encoding analog video signals: PAL (Phase Alternation
Line), NTSC (National Television Systems Committee) and SECAM (SEquentiel Couleur Avec
Memoire). The standards differ in properties such as colour encoding technique, frame
rate (how many frames are captured/played per second) and the number of scan lines that make
up a frame. NTSC is an American standard which defines a frame rate of 30 fps (frames per
second)4 and 525 scan lines, of which 480 are visible and the rest are used for Teletext and
subtitles. The European standard PAL and the French standard SECAM5 both define a frame
rate of 25 fps with 625 scan lines, of which 575 are visible. Though they have similar
properties, PAL and SECAM are incompatible with each other. The choice of 25 fps and 30
fps frame rates is based on the power line frequencies used by the different continents,
respectively 50Hz and 60Hz [22].

Analog TVs, as found in the majority of people's homes worldwide, display broadcast and
recorded video signals using a CRT (Cathode Ray Tube), which fires a beam of electrons at a
phosphor-coated screen. When the electrons hit the screen at a certain point it glows for an
instant. The beam of electrons is swept across the screen from left to right, painting a
horizontal line and adjusting the intensity of the beam based on the colour and intensity encoded
in a particular scan line of the current analog frame signal. When the beam finishes painting
a scan line it resets back to the left of the screen, drops down a line and begins to paint the
next scan line. After painting a whole frame and reaching the bottom right-hand corner of the
screen, the electron gun resets back to the top left of the screen ready to paint the next encoded
frame (see Figure 4) [23,24].

4
Actually the frame rate of NTSC is 29.97 fps to allow for adequate separation of the audio and video
carrier signals.
5
SECAM was developed in France and is also used in Russia and a handful of Eastern European countries.


Figure 4 - Painting Scan Lines on an Analog Display


The human eye retains an image for a short period after it is seen, but this period decreases
as brightness increases, so a TV must refresh the image on the screen more frequently than 25
or 30 times a second to avoid the viewer experiencing flicker. An effective technique used in
TVs to overcome this is known as interlacing, whereby a single frame of video from the analog
signal is painted on the screen in two stages: first all the odd-numbered scan lines are
painted from top to bottom, and then all the even lines. This effectively doubles the
displayed frame rate, as two still images, each composed of half the scan lines, are painted
on the screen for each frame, one after the other, tricking the eye into believing that two
whole frames were displayed. This refresh rate is known as the field rate, which is double
the signal frame rate (i.e. 50Hz for PAL/SECAM and 60Hz for NTSC) [23].

Digital video is represented by a sequence of bitmap frames, which are two-dimensional
arrays of coloured pixels. Pixel colour is usually represented by the intensity of the three
primary colours red, green and blue, which can be combined in different proportions to produce
any colour. The quality of digital video is determined by its frame resolution, colour depth
and frame rate. The frame resolution is simply the size of the bitmap used for each frame and
limits the level of detail possible in the picture: a higher frame resolution contains more
pixels and therefore allows more image detail to be displayed on screen. The colour depth
specifies how many bits are used to represent pixel colour; the higher the colour depth, the
more colours are available, allowing the colour of images to be represented more accurately.
As with analog video signals, the frame rate specifies the frequency at which frames are
captured and played back, and a higher frame rate allows motion to be represented more
accurately [25].

Conversion of analog video into digital video is done by splitting the scan line part of an
analog frame into sections and then quantising (see section Digital Audio) the signal's colour
information to produce a value for the appropriate pixel in the current digital frame. To
convert an NTSC signal, with a typical TV aspect ratio of 4:3, the digital frames must have a
resolution of 640x480. If a 24-bit RGB (Red Green Blue) colour depth is used to represent
each pixel, a single frame requires 921.6 KB (640 x 480 x 3) of storage. A mere second
of NTSC digital video, consisting of 30 fps, requires 27.6 MB (30 x 921.6 KB) of
storage, and a 90 minute film just under 150 GB. It is possible to reduce the
storage requirements of digital video through compression (see section Video Compression).
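These storage figures can be reproduced with a few lines of arithmetic (an illustrative sketch using the resolution, colour depth and frame rate quoted above):

```python
def raw_video_bytes(width, height, bytes_per_pixel, fps, seconds):
    """Storage required for uncompressed digital video, in bytes."""
    frame_bytes = width * height * bytes_per_pixel  # one bitmap frame
    return frame_bytes * fps * seconds

# One frame of 640x480 video at 24-bit (3-byte) colour depth: 921,600 bytes
frame_kb = 640 * 480 * 3 / 1000                              # 921.6 KB
# One second of 30 fps NTSC-resolution video: ~27.6 MB
second_mb = raw_video_bytes(640, 480, 3, 30, 1) / 1e6
# A 90 minute film: ~149.3 GB, i.e. "just under 150 GB"
film_gb = raw_video_bytes(640, 480, 3, 30, 90 * 60) / 1e9
```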

Video Capture
Video can be captured to a computer through a variety of devices suited to different
applications and computer configurations ranging from the streaming of low resolution video
across the Internet to editing high quality digital video for TV broadcasting. Such devices
include webcams, analog capture cards and Firewire an interface used by DV (Digital Video)
camcorders which are discussed in more detail in the following subsections.


Webcams

Webcams are the cheapest, and usually the lowest visual quality, video capture devices. A
webcam consists of a CCD camera and usually uses simple M-JPEG (see section Video
Compression) hardware compression to reduce video storage requirements before the video is
passed over a USB (Universal Serial Bus) or serial cable to the computer.

A CCD camera converts light intensity observed through the camera lens straight into a
digital bitmap. A CCD (Charge Coupled Device) chip lies behind the camera lens and
consists of a two-dimensional array of tiny light-sensitive diodes which convert light hitting
the chip at a specific location into an electrical charge, which is then quantised by an ADC
(analog-to-digital converter) into the intensity of a specific pixel [26].

Cheap webcams often use a single CCD chip to capture light intensity for all RGB colours by
alternating RGB filters over different diodes (i.e. some diodes measure red light intensity,
some green and some blue), and then use a technique known as interpolation to estimate the
actual colour of a pixel based on the RGB values recorded at surrounding diodes. More
accurate colour detection can be achieved by using a separate CCD chip for each primary
colour, as found in more expensive cameras [26].
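The interpolation idea can be sketched as follows (a toy model, not a real camera pipeline: `sensor` is a hypothetical map from diode position to the single colour channel and value that diode recorded):

```python
def interpolate_channel(sensor, x, y, channel):
    """Estimate the given colour channel at (x, y) by averaging the readings
    of the four neighbouring diodes whose filter carries that channel."""
    neighbours = [(x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1)]
    readings = [sensor[p][1] for p in neighbours
                if p in sensor and sensor[p][0] == channel]
    return sum(readings) / len(readings) if readings else None

# A small patch of the mosaic: the centre diode recorded only green, so its
# red and blue components must be estimated from its neighbours.
sensor = {
    (0, 1): ('R', 100), (2, 1): ('R', 120),   # horizontal red neighbours
    (1, 0): ('B', 50),  (1, 2): ('B', 70),    # vertical blue neighbours
    (1, 1): ('G', 80),                        # the pixel being reconstructed
}
red_estimate = interpolate_channel(sensor, 1, 1, 'R')    # 110.0
blue_estimate = interpolate_channel(sensor, 1, 1, 'B')   # 60.0
```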

Leading webcam developers claim to be able to capture full screen NTSC (640x480) at 30 fps,
but on general purpose machines this resolution is usually only possible at around 15 fps or
less, though a full 30 fps is usually possible at quarter-screen resolution (320x240).

Analog Video Capture Cards

Analog capture cards allow devices which produce analog video, such as video recorders,
video cameras and broadcast TV aerials, to be attached to the computer and their output
converted in real-time into digital format. Depending on their price range, capture cards
usually allow more flexibility in the video format output to the computer than webcams and can
achieve better quality video when used with professional analog cameras or other sources.
Much of the strain can be taken off the computer by the more complex, higher compression rate
codecs available in hardware on most capture cards, making them suitable for a range of
applications from studio editing to high or low quality video conferencing.

Firewire

Firewire, standardised as IEEE 1394, is a high speed serial bus capable of transferring data
at rates of up to 400 Mbps over a 4.5 m cable. Firewire was developed alongside DV (Digital
Video) technology as a way to transfer high quality (mildly compressed) digital video between
devices. DV camcorders use CCDs, similar to webcams and digital cameras, to capture digital
video directly and store it on high capacity tapes. With recent DV camcorders supporting
Firewire, and with the availability of Firewire PC cards and enabled motherboards, it is
possible to transfer high quality digital video between a computer and DV devices, allowing
professional editing of video with no reduction in quality. Acquisition of DV through
Firewire is aimed more at video editing, but hardware compression can also be used to support
a number of other applications, such as high and low quality conferencing or broadcasting,
though at a high price.


Video Streaming
The process of video streaming between hosts on a network has many similarities to that of
audio streaming (see section Audio Streaming), with issues such as packetisation delay,
network delay and jitter, compression delay and network bandwidth restrictions. However,
there are subtle differences between video and audio streaming, which are described in this
section.

In general, video streaming requires higher network bandwidth than audio streaming, as more
information is required to represent visual images, made up of arrays of continuously varying
colour signals, as opposed to the single air pressure waveform representing audio. It is
inconceivable to stream uncompressed digital video over current computer networks due to the
massive bandwidth it requires, even for the lowest resolution images (see section Digital
Video), and so it is always compressed before transmission.

Compression of video frames usually results in variable bit-rate (VBR) streams, as opposed to
the constant bit-rate (CBR) streams typical of compressed audio (see section Audio
Compression). The compressed video bit-rate can vary dramatically based on factors such as
motion, changes in video scenes and video codec configuration, making it at times too high
for the network to carry and at other times too low to utilise available reserved
bandwidth [27].

It is possible for a sender to smooth out encoded VBR video before transmission, creating an
artificial, near-CBR stream, by limiting the transmitted bandwidth and buffering high data
bursts until extra bandwidth becomes available [27]. It may also be necessary to drop frames
from the video in order to match network bandwidth, which results in jerky motion if the
frame rate is dropped too low (< 24 fps). Many video compression techniques rely on
temporal compression, which means rendering of a received frame requires information from
earlier received frames [25,28]. In this case dropping frames will also harm video quality by
introducing artefacts (observable glitches) on the receiver's screen.
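The smoothing idea can be sketched as a simple leaky bucket: bursts enter a buffer, which is drained at a constant rate (a minimal sketch; the frame sizes and drain rate below are made-up illustrative values):

```python
def smooth_vbr(frame_sizes_kb, drain_kb_per_interval):
    """Buffer occupancy after each frame interval when a VBR stream is
    transmitted at a constant rate: bursts above the rate queue up and are
    sent later, when the stream dips back below the rate."""
    buffered, occupancy = 0.0, []
    for size in frame_sizes_kb:
        buffered += size                                       # burst arrives
        buffered = max(0.0, buffered - drain_kb_per_interval)  # constant drain
        occupancy.append(buffered)
    return occupancy

# A scene change produces a 30 KB burst; at a 15 KB/interval send rate the
# excess is buffered and smoothed out over the following intervals.
trace = smooth_vbr([10, 30, 10, 10], 15)   # [0.0, 15.0, 10.0, 5.0]
```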


Video Compression
Raw, uncompressed digital video is far too large for many applications to handle and so is
usually compressed immediately after capture for storage on a computer's hard disk, further
compressed if it is to be written to CD or DVD, and compressed further still if it is to be
transmitted across a network. Table 5 (below) lists commonly used codecs for video
compression, describing their application suitability and compression rates.

Uncompressed (65 Mbps): Each frame is represented by a bitmap of pixels. The bit-rate given
is for a digitised PAL TV signal at 25 fps [25].

DV (Digital Video) (< 25 Mbps): DV and its variants offer high quality digital video and are
used in digital camcorders. They lower the data rate via mild intraframe compression,
removing information that is not visible to the human eye and allowing easy editing on a
per-frame basis. Further compression is achieved by coding RGB colour separately from
luminance and allowing more bits for luminance than for colour without affecting the
perceived quality [25].

M-JPEG (Motion-JPEG) (< 20 Mbps): M-JPEG uses JPEG still image compression on each frame,
similar to but less sophisticated than the DV standard. M-JPEG is ideal for video editing and
low latency streaming, and can be cheaply implemented in hardware such as webcams and video
capture cards. Compression schemes that also make use of temporal compression can offer much
better quality than M-JPEG at lower bit-rates [25].

MPEG-1 (< 1.5 Mbps): MPEG-1 is the first video compression standard by MPEG (Motion Pictures
Expert Group), designed to fit about 70 minutes of VHS quality video (352x240) onto a single
CD. The standard defines three types of video frames: I (Intra), P (Predictive) and B
(Bi-directional). P and B frames are used for temporal compression, encoding only the
difference between frames, while I frames encode a complete video frame using only spatial
compression (such as JPEG) and act as synchronisation points for the decoder, allowing
playback from specific file locations or recovery from packet loss in streaming systems.
MPEG codecs are asymmetrical, in that decoding complexity is lower than that of encoding,
which is often left to hardware [25].

MPEG-2 (1.5 – 15 Mbps): MPEG-2 was designed to provide broadcast quality video (704x480) and
also offers support for multi-channel audio for surround sound systems. DVD systems use
MPEG-2 compression as it offers high quality video at relatively low bandwidth when compared
to spatial-only techniques such as M-JPEG [25,29].

MPEG-4 (64 Kbps – 300 Mbps): MPEG-4 is designed to support a broad range of applications and
bit-rates, from streaming web content to broadcast/DVD quality to professional studio use.
High compression is achieved through tracking of individual objects in a scene (e.g. people
moving across a fixed backdrop), which also allows a level of interactivity whereby the user
can click objects in the video to find out about them or perform some action. The popular
DivX codec is based on MPEG-4, allowing high quality films to be stored on standard, easily
replicated CDs [25,27].

H.261 (n*64 Kbps): H.261 is a standard by the ITU (International Telecommunications Union)
designed originally for ISDN conferencing, where ISDN bandwidth is allocated in blocks of
64 Kbps, and is optimised for low motion video. The encoding complexity of H.261 is far less
than that of the MPEG codecs [28]. H.261 supports CIF (Common Intermediate Format), a
standard resolution of 352x288, and QCIF (Quarter CIF) at 176x144.

H.263 (20 – 500 Kbps): H.263 extends H.261 to support a wider range of screen resolutions,
including CIF, QCIF, 4CIF (4*CIF), 16CIF (16*CIF) and SQCIF (Sub QCIF), and bit-rates not
limited to multiples of 64 Kbps. The H.263 codec uses a technique similar to MPEG I-P-B
frames but is nevertheless incompatible with MPEG.

Table 5 - Widely used Video Codecs
Some of the codecs listed in Table 5 (above) are suited to specific applications, such as DV
for high quality video editing and H.261/H.263 for video conferencing, whereas others, such
as MPEG-4, offer a more generalised service. When choosing a codec for a specific application
we need to take into account not only the compressed bit-rate but also the coding complexity,
as low-end devices such as PDAs may have to rely solely on software and battery power to
decode their video.


Video Broadcasting
In recent years we have seen wide-scale deployment of digital TV set-top-box receivers in
homes throughout western countries, with many TV broadcasters offering both digital and
analog transmissions of their programmes. As many as six digital channels can be fitted into
the bandwidth taken up by a single traditional analog TV signal, and with the benefit of
error correction techniques the received digital signal is presented to the viewer with
little or no degradation from when it left the broadcaster. These set-top-boxes currently
connect to specialised broadcasting networks through radio, cable, optical fibre and
satellite [25].

Video broadcasting is currently possible on the Internet through technologies such as IP
Multicast on the MBone (see section MBone) and third-party CDNs (see section Content
Delivery Networks), with organisations such as NASA frequently broadcasting space mission
footage on the MBone to academic networks throughout the world, and TV shows such as Big
Brother in the UK and USA broadcasting 24 hour live camera coverage over CDNs to millions of
Internet viewers, though the quality is far from that offered by dedicated broadcast
networks [30,31].

More recent than the appearance of digital TV in homes is the introduction of broadband
connections, allowing higher speed Internet access (up to 2 Mbps) and the possibility of
receiving, amongst other things, VHS quality video streams [32]. It is likely that, with the
increase in home bandwidth, TV broadcasting will eventually be done entirely over the
Internet, allowing anyone to become a TV broadcaster and creating the potential for an
unlimited number of channels [25]. However, issues with the scalability and deployment of
Internet group communication techniques must be tackled in order to enable distribution of
high bandwidth data to the millions of potential Internet TV subscribers.

Video on Demand
The next step up from Internet video broadcasting of fixed, scheduled programmes is Video on
Demand (VoD), whereby viewers can choose to watch programmes when it is convenient for them.
Some digital TV movie broadcasters use multiple channels to show popular films, starting
broadcasts of the same film about 15 minutes apart, giving viewers more choice of when to
watch it.

In Internet (or even digital and analog TV) broadcasting it is not scalable to have separate
video streams for each of the possible viewers wishing to watch popular programmes [33].
Techniques have been proposed for VoD systems which try to optimise delivery for total
server bandwidth usage, initial viewing delay and receiver storage requirements. These
proposals assume multicast delivery in the network and generally synchronise receivers to
share the same multicast streams.

As an example, a technique known as skyscraper VoD broadcasting breaks a video programme up
into segments of geometrically increasing length and broadcasts each segment on a separate
multicast channel [34]. The choice of segment sizing is a compromise: repeated broadcast of
many small segments requires higher server bandwidth, while broadcast of a few large segments
requires less server bandwidth but increases the viewer's wait-time to begin watching the
programme. The initial small segments used in skyscraper broadcasting allow a short wait-time
for viewers to begin watching from the start of the programme, and the subsequent larger
segments keep down the number of required multicast channels. The skyscraper technique
requires clients to download video from two multicast channels concurrently: one channel for
the segment currently being viewed and the other for pre-caching the next segment. This
technique can provide viewers with a mere 38 second (0.007 * 5400) maximum wait-time to begin
watching a 90 minute movie [35].
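The arithmetic behind these figures can be sketched as follows. Note the doubling ratio is an illustrative assumption; the real skyscraper scheme uses its own specific segment-length progression:

```python
def segment_lengths(duration_s, first_segment_s, ratio=2):
    """Geometrically increasing segment lengths covering a programme; the
    final segment simply absorbs whatever running time remains."""
    lengths, total, size = [], 0.0, float(first_segment_s)
    while total + size < duration_s:
        lengths.append(size)
        total += size
        size *= ratio
    lengths.append(duration_s - total)
    return lengths

# A 90 minute (5400 s) film whose first segment is 0.007 * 5400 = 37.8 s.
# The worst-case wait to start watching is one first-segment length (~38 s),
# since that segment repeats continuously on its own multicast channel.
segments = segment_lengths(5400, 0.007 * 5400)
max_wait = segments[0]                      # ~37.8 seconds
```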


Another technique proposed to aid video streaming and VoD systems is dynamic video caching.
The SOCCER (Self-Organising Cooperative Caching Architecture) architecture uses proxy helpers
(caches), strategically placed around the network, which self-organise into cache meshes,
exchanging information about their current cache state with each other [36]. Initially, when
all caches are empty and a client requests a video stream, the request is directed to the
origin server and the stream flows from the origin server through the client's nearest proxy
cache, where a few seconds are cached in a ring buffer, and then to the requesting client.
If another client requests the same video after a short period of time it may share the
cached stream from the earlier client's local proxy helper. If the second client's request
arrives later than the initial client's proxy helper ring buffer can cover, the early data in
the video stream must be patched from either the origin server or some other cache in the
network.

Video Conferencing
Video conferencing usually refers to video with audio conferencing, which can be set up and
controlled in much the same way as the audio-only conferencing described earlier. However,
video data cannot be mixed, or expressed in spurts to save bandwidth, as effectively as
audio, resulting in the use of less scalable conferencing models. For example, a video
conference using IP Multicast with n participants would require each member to have a
minimum bandwidth of n*V, where V is the bandwidth required by a single video stream, whereas
the minimum bandwidth required by an audio-only IP Multicast conference participant would be
t*A, where t is the number of concurrent talkers and A is the bandwidth required by a single
audio stream, allowing audio conferences to scale to large groups.
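The scaling difference can be made concrete with the n*V and t*A formulas above (the stream rates below are hypothetical illustrative values):

```python
def video_participant_bw(n_participants, stream_mbps):
    """Each participant sends one stream and receives the rest, so a member
    of an n-party video conference needs roughly n * V of bandwidth."""
    return n_participants * stream_mbps

def audio_participant_bw(n_talkers, stream_kbps):
    """Audio is sent only in talk spurts, so just the t concurrent talkers
    contribute: each participant needs roughly t * A."""
    return n_talkers * stream_kbps

# 20 participants at 0.5 Mbps of video each: 10 Mbps per participant; an
# audio-only conference with 2 concurrent talkers at 64 Kbps needs 128 Kbps.
video_bw_mbps = video_participant_bw(20, 0.5)
audio_bw_kbps = audio_participant_bw(2, 64)
```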

Though less effective than audio mixing at maintaining quality while reducing bandwidth,
video from several sources can be downscaled and panelled into a single video stream by a
video mixer. The result is an image composed of several other images, like a monitor
displaying several security camera pictures on the same screen. If the source streams have
already been compressed, the decompression, downscaling and recompression performed by the
mixer will substantially lower the mixed image quality.

The H.261 and H.263 codecs (see section Video Compression) are designed specifically with
video conferencing in mind, optimising image quality for low motion, ideal for displaying the
detail of a fairly stationary person's face. MPEG-4 is also well suited to low bandwidth
conferencing, with its ability to track image objects, though it is aimed at more general
video applications rather than just conferencing.

Summary
This section has presented a detailed view of streaming video systems, differentiating
between analog and digital video and describing various techniques for video capture and
compression. A distinction has been drawn between video and audio group communication in
broadcasting and conferencing systems.


Overlay Networks
This section examines the current trend towards application level networking, also known as
overlay networks, to solve problems of data distribution among hosts in the Internet.
Multicast, described in the following subsection, is one technique for efficient multiparty
communication on the Internet, enabling multimedia streaming applications for large groups
of users. A subsection is also given on CDNs (Content Delivery Networks), which have in the
past been used to relieve heavily loaded web servers by moving stored web content to the edge
of the Internet, nearer to the clients, but which are now also being used to distribute live
content. Finally, programmable networks are examined for their similarity to application
level networking and their usefulness for deploying efficient proxy-based ALM servers
throughout the Internet.

Multicast
Group communication refers to one-to-many and many-to-many data communication between
hosts on a network. This type of communication is typical of systems such as TV and radio
broadcasting where a relatively small number of sources transmit programmes to potentially
millions of subscribers. On a smaller scale, multicast communication is used by conferencing
and gaming systems, where data must be sent from any participant to all of the other
participants.

With the advent of commonplace multimedia enabled computers and the ability to stream
audio and video data between them much time and effort has been spent by researchers trying
to find efficient techniques to implement multicast.

A simple solution to Internet group communication uses unicast, whereby the source sends a
duplicate of each data packet to each receiver. Unicast scales badly for large groups or for
the high bandwidth data typical of audio/video streams, as the sender requires a network
bandwidth of n times the stream bandwidth to send to n receivers simultaneously.

Figure 5 (below) shows a video server unicasting 1.9 Mbps replicated video streams to 5 video
clients. If the video server has a 10 Mbps network connection then 5 clients (5 x 1.9 =
9.5 Mbps) is the maximum achievable using unicast. Inefficient network use can also be
observed in other areas of the network, where multiple identical data streams pass over the
same network paths.

Figure 5 - Load Placed on a Unicast Video Server and Network for a Small Group


The following subsections describe IP Multicast, the traditional network layer solution to
group communication and then the more recent ALM (Application Layer Multicast) approach
requiring no extra support from the network.

IP Multicast
IP Multicast currently offers the most efficient technique for one-to-many and many-to-many
communication in IP networks, moving data packet replication from the end-system (as with
unicast) into multicast routers, which distribute data using multicast routing protocols.

IP Multicast uses the concept of multicast groups, which clients can join to send and receive
data. Multicast groups are identified by a reserved subset of IP addresses, in the range
224.0.0.0 to 239.255.255.255, which is further subdivided to include reserved addresses and
locally scoped addresses (see Table 6 below) [37].

224.0.0.0 – 224.0.0.255 (Link Local Addresses): Reserved for local network protocols, such
as routing protocols and router discovery, allowing dynamic service location. Routers are
configured to block messages with these addresses from leaving the local network.

224.0.1.0 – 238.255.255.255 (Globally Scoped Addresses): Used for Internet-wide
multicasting.

239.0.0.0 – 239.255.255.255 (Limited Scope Addresses): Multicast in this address range is
limited to administratively scoped domains, allowing address reuse across different domains.

Table 6 - IP Multicast Address Ranges
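The ranges in Table 6 can be expressed as a small classification routine (a sketch using the Python standard library's `ipaddress` module):

```python
import ipaddress

def classify_multicast(addr):
    """Classify an IPv4 address against the multicast ranges of Table 6."""
    a = ipaddress.IPv4Address(addr)
    if not a.is_multicast:                     # outside 224.0.0.0/4 entirely
        return 'not multicast'
    if a <= ipaddress.IPv4Address('224.0.0.255'):
        return 'link local'
    if a >= ipaddress.IPv4Address('239.0.0.0'):
        return 'limited scope'
    return 'globally scoped'
```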

IGMP (Internet Group Management Protocol)

IGMP is used by hosts to communicate their interest in multicast groups to their local
multicast router(s). When a host wishes to join or leave a multicast group it sends a
REPORT(G) or LEAVE(G) message, respectively, to the local multicast router [38]. The
multicast router also sends QUERY() messages periodically to all local hosts in order to
verify their group membership, in case failed hosts have not issued leave messages.

Multicast Routing

When a multicast router learns of hosts on the local network interested in some multicast
group it sets out to establish the required routing state within the global multicast network to
start receiving and forwarding group data to the local hosts.

Figure 6 - Video Server Utilising Efficient Multicast to Deliver Data to Clients


Various protocols exist to route data from the senders to the receivers of multicast groups,
and generally the multicast routers arrange themselves into trees for efficiency (see
Figure 6 above). The choice of multicast routing protocol depends on whether groups have one
sender or many, the space available for routing state, and the size and number of multicast
groups. The following sections describe the most widely used multicast routing protocols.

DVMRP

DVMRP (Distance Vector Multicast Routing Protocol) routers use their own routing protocol
to discover other DVMRP routers nearby and together they construct SPTs (Shortest Path
Trees) for every possible sending network (see Figure 7 below).

Figure 7 – DVMRP SPTs for each Network


When a source S starts sending multicast data to group G, the data is flooded along the
pre-constructed SPT, identified as (S,G), and is forwarded to receiving hosts through their
local multicast router(s). If a multicast router has no hosts interested in group G it must
actively stop the flow of data by sending a prune message up the SPT (S,G) every few minutes.
DVMRP is a 'dense mode' protocol, in that it assumes all networks want to receive the data
until those that don't explicitly prune themselves off.

PIM-DM

Like DVMRP, PIM-DM (Protocol Independent Multicast – Dense Mode) is a dense mode
multicast routing protocol that assumes the majority of networks are interested in receiving
multicast data. However, unlike DVMRP, PIM-DM makes use of the underlying (unicast)
routing protocol tables and therefore doesn’t need to explicitly discover the network topology.


Figure 8 - Reverse Path Forwarding


When a source S starts sending multicast data to group G, the data is flooded on all router
interfaces away from the source along the RSP (Reverse Shortest Path). If data is received by
a multicast router on a non-shortest-path interface it is discarded, and a prune message is
sent back to the offending router telling it to stop sending data through that interface
(see Figure 8 above). The pruning creates a shortest path tree (S,G) from the source to all
receiving networks, which must then actively send prune messages if they do not want to
receive group data, as with DVMRP.
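The reverse-path check at the heart of this flood-and-prune behaviour can be sketched as follows (a minimal model; the routing table is a hypothetical source-to-interface map):

```python
def rpf_accept(source, arrival_iface, unicast_routes):
    """Reverse Path Forwarding check: accept a multicast packet only if it
    arrived on the interface this router would itself use to reach the
    packet's source; anything else is off the reverse shortest path and is
    discarded (and, in PIM-DM, triggers a prune towards the sender)."""
    return unicast_routes.get(source) == arrival_iface

routes = {'10.1.0.7': 'eth0'}        # shortest path back to source S is eth0
on_tree = rpf_accept('10.1.0.7', 'eth0', routes)       # True: forward
duplicate = rpf_accept('10.1.0.7', 'eth1', routes)     # False: discard, prune
```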

PIM-SM

PIM-SM (Protocol Independent Multicast – Sparse Mode) is protocol independent, as with
PIM-DM, in that it makes use of existing routing tables to route multicast data. However,
unlike DVMRP and PIM-DM, PIM-SM does not assume that every network is interested in receiving
data until it sends prune messages, but instead requires multicast routers on the network
edge to explicitly pull data towards their local hosts.

PIM-SM supports shared trees to accomplish multi-source transmission on a multicast group
whilst conserving routing state in the routers. A shared tree is identified by (*,G), and
routers map the group identifier to a multicast router named a rendezvous-point (RP). The RP
acts as the source of an SPT used to transmit data throughout the group. Hosts wishing to
send to the group tunnel their data to the RP, where it is flooded through the tree (*,G) to
all participants.

If the data-rate from a source being sent through the shared tree exceeds some threshold
(e.g. a high data-rate or long-lived transmission), the multicast routers can switch from the
shared tree (*,G) to the SPT (S,G) (see Figure 9 below). This switchover happens in the
following steps [39]:
• Source S tunnels data to the RP and through the shared tree (*,G).
• The data-rate at the RP exceeds some predefined RP threshold (i.e. the RP may
become overloaded if there are many high data-rate senders).
• The RP sends a Join(S,G) towards the source S and begins to receive data natively
(un-encapsulated) from the source.
• If leaf routers notice that the data-rate from source S has exceeded some receiver
threshold they, like the RP, send a Join(S,G) towards the source and switch from the
shared tree to receive on the more efficient SPT (S,G).
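The leaf router's side of this switchover can be sketched as a simple threshold rule (rates and thresholds are hypothetical, and real PIM-SM behaviour involves timers and protocol state not modelled here):

```python
class LeafRouter:
    """Switch a source from the shared tree (*,G) to a source tree (S,G)
    once its observed data-rate crosses the receiver threshold."""

    def __init__(self, threshold_kbps):
        self.threshold_kbps = threshold_kbps
        self.source_trees = set()      # sources already joined via (S,G)

    def on_rate_sample(self, source, rate_kbps):
        """Return the Join message to send towards the source, if any."""
        if rate_kbps > self.threshold_kbps and source not in self.source_trees:
            self.source_trees.add(source)
            return f'Join({source},G)'
        return None                    # stay on the shared tree

router = LeafRouter(threshold_kbps=100)
first = router.on_rate_sample('S1', 150)    # 'Join(S1,G)': switch to the SPT
repeat = router.on_rate_sample('S1', 150)   # None: already on the source tree
quiet = router.on_rate_sample('S2', 40)     # None: low-rate source stays shared
```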


Figure 9 - PIM-SM Shared-to-Source Tree Switchover

MBone

Administrative domains may implement IP Multicast within their local networks for efficient
group communication, but IP Multicast has not yet been deployed in the global Internet.
Currently, when IP Multicast enabled domains wish to participate in multicast groups with
other enabled domains, they connect through the MBone (Multicast backBONE), a research
network made up of tunnels between IP Multicast enabled domains, including universities and
commercial networks.

Application Level Multicast


Though IP Multicast has been shown to be efficient for group communication applications in
the MBone research network, over a decade after its proposal it has still not been deployed in
the global Internet, for reasons identified in the next section. In response to the delay in
deployment and the further complexities of IP Multicast, there has been renewed interest in
the use of unicast at the application layer to achieve one-to-many and many-to-many
communication.

Figure 10 - Comparison of (a) Unicast, (b) IP Multicast and (c) ALM

Application Level Multicast (ALM) aims to tackle scalability issues of original unicast
techniques by distributing data replication among the group members in an adaptable and
efficient manner. Figure 10 (above) illustrates the difference between simple unicast, IP
Multicast and ALM.

Though not as efficient as IP Multicast in terms of data duplication on links or in terms of
delay, the illustrated ALM technique reduces load on the server depicted in the unicast
scenario (Figure 10(a)) without requiring any help from the network infrastructure.

Problems with IP Multicast


The major problems facing IP Multicast deployment are routing protocol scalability and the
need for changes to router software and possibly hardware at the infrastructural level.

IP Multicast offers an unreliable service, which makes it inefficient to implement reliable
protocols at the application layer [40]. For example, a source would need either to retransmit
to all group members when only a few failed to receive data, or to attempt to unicast the lost
packets to those members individually. The following sections summarise various problems
with IP Multicast that are slowing its deployment.

Model Scalability
IP Multicast requires routers to hold state for active groups in order to forward multicast data
to group members, which places heavy storage requirements on routers and makes the
approach unscalable for large numbers of groups. This per-group state breaks the stateless
nature of IP routers that has contributed to the success and growth of the Internet, and will
inevitably slow down router forwarding through the extra processing of long multicast
routing tables [40].

Requires Large-Scale Infrastructural Changes


To deploy IP Multicast throughout the entire Internet would require modification and possibly
replacement of every router connecting the millions of hosts worldwide. This alone would
take many years to complete and it might not be in everybody’s interest to make such
changes.

Administrative Domain Acceptance


Many ISPs (Internet Service Providers) and network domain administrators are reluctant to
switch on global multicast services due to a lack of channel access control which could
potentially allow malicious users to carry out Denial of Service attacks on the network [41].

Routing protocols are also a problem for private networks. For example, dense mode routing
protocols as used in the MBone result in networks frequently being flooded with unwanted
data from active multicast groups. Also, sparse mode protocols require configuration of RPs
(Rendezvous Points) which must be shared between private networks but commercial
networks such as ISPs are usually unwilling to trust other ISPs to provide RPs for their clients
[38,42].

Application Requirements
The base service offered by IP Multicast is not suited to applications requiring a reliable
service for the delivery of data. Though reliability could be achieved through the use of more
complex protocols at the network layer it would be undesirable as this would increase router
cost requiring more processing power and storage.

IP Multicast assumes that all members of a group have equal, minimum capabilities which
may not be the case for groups of heterogeneous host devices ranging from powerful desktop
computers with high speed CPUs and network connections to handheld PDAs (Personal
Digital Assistants) connected on low bandwidth wireless networks.

Advantages of ALM

ALM avoids issues such as infrastructural changes and administrative domain reluctance that
are hampering the deployment of IP Multicast, and naturally lends itself to supporting value-
added services such as reliability and channel access control at the application level. The
following subsections describe the main benefits of ALM over IP Multicast.

Deployment


ALM techniques rely only on the best-effort unicast service provided by the Internet,
requiring no infrastructural changes to Internet routing hardware or software. ALM systems
typically fall into two categories: peer-to-peer (P2P) and proxy-based. P2P ALM applications
come packaged with their own ALM protocols, allowing the construction of overlays and
instant multicast communication with other peers, whilst proxy-based ALM systems consist
of incrementally deployed proxy servers placed around the Internet, to which local clients
connect to send and receive multicast data. The proxy servers construct overlay networks
just as clients do in P2P systems, with the overlay efficiency increasing as more proxy
servers are deployed [40].

Application Requirements

By creating multicast overlay networks at the application layer, services may be tailored to
the needs of a particular application. For example, an application may require a specific
multicast routing protocol, such as dense mode or sparse mode, or even a reliable multicast
protocol. Reliability can be simplified by using an existing application layer protocol such as
TCP.

Hosts participating in an ALM session can utilise their processing and storage capabilities to
provide additional services to the overlay such as transcoding or mixing data to support
heterogeneous hosts in the group [40].

Adaptation

Application level networks have the advantage of being able to adapt to network conditions
such as congestion or failed routers by testing the quality of connections to overlay peers and
simply dropping or adding new links as necessary. Such flexibility is not possible in Internet
routers which usually take longer to correct problems and may require static routing
configuration to add or remove links to other routers. In some cases it is possible to exploit
more efficient routes between Internet sites through application level tunnelling.

Current ALM Systems

This section briefly describes characteristics of currently proposed ALM architectures. The
ALM systems described in this section tend to support either large-scale single-source
multicast or small-scale any-source communication, and the overlay nodes typically organise
themselves into mesh or tree structures that map efficiently onto the underlying network
topology for some application-defined metric such as bandwidth, RTT (Round Trip Time) or
both. For each architecture, both the overlay construction and adaptation techniques are
described, along with any specific features.

Narada
Overlay Structure:
Narada supports any-source multicast for small-scale group applications. The Narada data
overlay network is created in a two-step process: firstly, nodes establish themselves into an
optimised, well-connected control mesh; then source trees are constructed from each
potential sender to every receiver using a subset of the existing mesh links, much like
DVMRP used by IP Multicast routers in the MBone (see section DVMRP) [40].

Adaptation:
The quality of resultant source trees is dependent on the mesh quality which is periodically
tested by nodes to see if adding or dropping mesh links will result in better connectivity to the
majority of mesh nodes. Nodes may probe any other node they have learnt of in the group to
see if a link to that node will increase the mesh quality.
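This utility-driven mesh refinement can be sketched as follows. The sketch is a deliberate
simplification: mesh links carry a latency cost, and a node scores a candidate link by the
total latency it would save to the other members. Narada's actual utility function weighs
per-member relative improvement differently, so the scoring here is an assumption for
illustration only.

```python
# Simplified sketch of Narada-style mesh refinement: a node considers
# adding a candidate link only if it improves its shortest-path latency
# to other members. Topology and costs are illustrative.
import heapq

def shortest_paths(links, src):
    """Dijkstra over an undirected, latency-weighted mesh."""
    dist = {src: 0}
    heap = [(0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue
        for v, w in links.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

def utility_of_link(links, me, peer, latency):
    """Total latency saved to all reachable members if (me, peer) is added."""
    before = shortest_paths(links, me)
    with_link = {u: list(vs) for u, vs in links.items()}
    with_link.setdefault(me, []).append((peer, latency))
    with_link.setdefault(peer, []).append((me, latency))
    after = shortest_paths(with_link, me)
    return sum(before[m] - after[m] for m in before)

# On a chain A-B-C, a direct A-C link of latency 1 saves 9 units to C,
# so A would consider adding it to the mesh.
mesh = {"A": [("B", 5)], "B": [("A", 5), ("C", 5)], "C": [("B", 5)]}
assert utility_of_link(mesh, "A", "C", 1) == 9
```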


Narada nodes learn of all group members by passing tables of other nodes they discover to
their neighbours, making it possible to detect and repair mesh partitions caused by node
failures. To detect failure, nodes also expect to periodically receive keep-alive messages
from their peers.

ALMI
Overlay Structure:
ALMI (Application Level Multicast Infrastructure), like Narada, is aimed at small any-source
multicast applications; however, unlike Narada's distributed overlay construction, ALMI
relies on a central session controller node to calculate a bi-directional MST (Minimum
Spanning Tree) data distribution overlay between the nodes registered with it. The session
controller can reside on one of the participating group nodes or on a well known external
node.

A new node joins by firstly contacting the session controller, which returns a list containing
a subset of the existing group nodes to become its neighbours. One of the returned nodes
becomes the parent of the new node and the others become its children [42].

Adaptation:
Initially the session controller knows nothing of the topological relationships between its
nodes, so early calculations of the MST are based on random decisions; however, the session
controller strategically instructs certain nodes to probe a selection of other registered nodes
and to report back with the application-defined metrics they discover. The session controller
gradually optimises the MST based on the link costs it learns of between nodes, and issues
nodes with neighbour update messages to configure them into the newly optimised MST.
When a node is given a new set of neighbours, in order to avoid loops it remembers a few
generations of past neighbour configurations and uses a tree incarnation number in the
protocol packets to route older packets, which originated on a previous tree generation,
through the appropriate neighbour configurations.
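The controller's MST computation over the link costs it has learnt can be sketched with
Prim's algorithm. The node names and costs below are illustrative; ALMI's controller also
handles partial knowledge and stale measurements, which this sketch omits.

```python
# Sketch of an ALMI-style session controller computing a minimum
# spanning tree over probed link costs, using Prim's algorithm.
import heapq

def minimum_spanning_tree(costs, root):
    """costs: {node: [(neighbour, cost), ...]} -> list of MST edges."""
    visited = {root}
    edges = []
    heap = [(c, root, v) for v, c in costs[root]]
    heapq.heapify(heap)
    while heap:
        c, u, v = heapq.heappop(heap)
        if v in visited:
            continue           # edge would close a cycle: skip it
        visited.add(v)
        edges.append((u, v, c))
        for w, cw in costs[v]:
            if w not in visited:
                heapq.heappush(heap, (cw, v, w))
    return edges

# Probed costs reported back to the controller by the nodes.
probed = {
    "A": [("B", 2), ("C", 5)],
    "B": [("A", 2), ("C", 1)],
    "C": [("A", 5), ("B", 1)],
}
tree = minimum_spanning_tree(probed, "A")
# The expensive A-C link (cost 5) is excluded from the overlay.
assert sorted((u, v) for u, v, _ in tree) == [("A", "B"), ("B", "C")]
```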

Overcast
Overlay Structure:
Overcast is aimed at single-source, high bandwidth multicast applications such as VoD
(Video on Demand) or TV broadcasting. Overcast is a proxy-based ALM system in which the
overlay network nodes are strategically located proxy servers. Clients connect to multicast
sessions by joining the closest such proxy server.

When a new proxy node is installed, it contacts a well known Overcast root proxy to discover
which virtual Overcast networks it should join. The new proxy node joins a source tree for
each of the content distribution networks it is assigned to, using a simple algorithm which
places it as far away from the network sources as possible without sacrificing bandwidth,
resulting in a long, narrow tree structure [43].

Adaptation:
An Overcast proxy node periodically checks whether it can move lower down the tree by
taking bandwidth measurements through its current siblings. If it can move further down the
tree without too much bandwidth degradation, it will do so. Likewise, it will re-check with
its grandparent to see whether it should move up the tree due to a reduction in perceived
bandwidth. When children detect a failed parent they simply rejoin their grandparent, or
another ancestor above the failed node. Overcast avoids loops in the tree by ensuring nodes
do not become parents of nodes believed to be their ancestors.
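The relocation test can be sketched as follows. The 10% bandwidth tolerance is an assumed
parameter for illustration, not Overcast's actual constant, and the function names are
hypothetical.

```python
# Sketch of an Overcast-style relocation check: a node moves below a
# sibling when doing so does not sacrifice too much measured bandwidth,
# pushing it as far from the root as possible.

def choose_parent(parent_bw, sibling_bws, tolerance=0.9):
    """Return the sibling to move under, or None to stay with the parent.

    parent_bw: bandwidth measured through the current parent (e.g. kbps)
    sibling_bws: {sibling: bandwidth measured through that sibling}
    tolerance: fraction of parent bandwidth the move must preserve."""
    best = None
    for sibling, bw in sibling_bws.items():
        if bw >= tolerance * parent_bw:          # not too much degradation
            if best is None or bw > sibling_bws[best]:
                best = sibling
    return best

# Moving under s1 keeps 95% of the parent's bandwidth, so the node
# moves down; s2 would cost too much, so it is never chosen.
assert choose_parent(1000, {"s1": 950, "s2": 500}) == "s1"
assert choose_parent(1000, {"s2": 500}) is None
```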
Each node periodically reports to its parent with a keep-alive message detailing overlay
changes, such as birth and death certificates of its children and their descendant nodes. This
information propagates up towards the root, which learns the status of all nodes in the tree,
allowing it to divert joining clients to the most suitable up-and-running Overcast proxy. To
increase robustness and to lighten the load on a single root proxy from joining clients, the
root DNS name resolves to a number of replicated root node IP addresses in round-robin
fashion.

Yoid
Overlay Structure:
Yoid (Your Own Internet Distribution) is a complex ALM system aimed at small to large-
scale groups with potentially thousands of nodes and supports a variety of applications
ranging from file transfer to real-time conferencing.

A new node joins a Yoid group through a rendezvous node associated with the group,
identified by a URL such as yoid://rendezvous.domain.com:3456/meeting1, where meeting1
is the name of the specific group to join at the rendezvous node. The rendezvous node has
only limited knowledge of the current group membership, such as the identity of the current
overlay root node and a handful of potential parent nodes to pass to new nodes, allowing
them to graft onto the shared tree. Once a node has selected a suitable parent from those
offered by the rendezvous, it joins the shared tree, which is then used to send and receive
data.

Nodes also graft onto a well-connected mesh by discovering and connecting to several
random nodes on the network, which become mesh neighbours. Nodes use the control mesh
to discover other nodes, such as standby parents, should the current parent fail or suffer a
degradation in quality. In extreme cases of tree reorganisation and disruption, data can also
be broadcast over the mesh [44].

Adaptation:
Yoid does not rely on active probing techniques to organise the tree efficiently but adapts to
latency and data-loss only when data is flowing. To optimise a node's location with respect
to sender latency, it opens tentative links to other potential parents, which forward one in N
data frames along both the tentative links and the shared tree; the node timestamps these
frames on arrival. If a tentative link offers a substantial reduction in latency, the node tries to
join the more optimal parent and drops its worst tree neighbour.

Yoid does not directly optimise the data tree for bandwidth but reorganises nodes when they
detect high loss rates. In the event of loss, nodes communicate their perceived loss-prints
(patterns of frame losses) to their tree neighbours. When a node compares the upstream and
downstream loss-prints it receives and finds that loss is experienced only downstream, it
deduces that it is likely the cause of the loss and reduces its fanout by dropping one or more
of its downstream connections.
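A minimal sketch of this loss-print deduction is given below, assuming (as an illustration
only) that a loss-print is represented as the set of lost frame sequence numbers observed over
some window; Yoid's actual representation and comparison rules are more involved.

```python
# Sketch of the Yoid-style loss-print comparison: if losses appear only
# downstream of a node, the node deduces that it is the likely
# bottleneck and reduces its fanout by dropping a child.

def likely_cause_of_loss(upstream_print, downstream_print):
    """True when loss appears downstream but not upstream of this node."""
    return bool(downstream_print) and not upstream_print

class Node:
    def __init__(self, children):
        self.children = list(children)

    def adapt(self, upstream_print, downstream_print):
        """Drop one downstream connection if we appear to be the cause."""
        if likely_cause_of_loss(upstream_print, downstream_print) and self.children:
            return self.children.pop()
        return None

n = Node(["c1", "c2", "c3"])
# No upstream loss, but downstream frames 101-102 were lost: drop a child.
assert n.adapt(upstream_print=set(), downstream_print={101, 102}) == "c3"
assert len(n.children) == 2
```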

Yoid detects loops that arise from simultaneous parent changes by having nodes propagate
their new root-paths to their children. If a child sees that it is already in the root-path then
there must be a loop, and it breaks the loop by joining another parent.
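The root-path check itself is simple to sketch; the list representation of a root-path below is
an illustrative assumption.

```python
# Sketch of Yoid's root-path loop check: a node that appears on its own
# parent's path to the root is its own ancestor, i.e. a loop exists.

def detect_loop(my_id, parents_root_path):
    """parents_root_path: the parent's path to the root, nearest first."""
    return my_id in parents_root_path

# C's parent reports the root-path [B, C, root]: C is on the path, so a
# loop has formed and C must rejoin elsewhere.
assert detect_loop("C", ["B", "C", "root"]) is True
assert detect_loop("C", ["B", "A", "root"]) is False
```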

Yoid can also make use of locally scoped IP Multicast for nodes belonging to the same
network domain, by arranging such nodes into clusters and having one member of each
cluster connect to the shared tree, relaying data between the tree and the other members of
the cluster using multicast.

Scattercast
Scattercast is a large-scale proxy-based ALM system like Overcast, but uses a mesh-first,
source-tree approach similar to Narada's. Like Yoid, Scattercast also makes use of localised
IP Multicast where available [45].


Scattercast proxy servers are logical entities that run inside strategically placed cluster units.
When a client joins a particular group, it joins the nearest appropriate cluster unit, which has
a proxy instance connected to the group distribution overlay.

Scattercast does not assume any particular transport protocol for data delivery but allows
configurable transport modules to be installed on the cluster units, providing support for
real-time, reliable, congestion-controlled and other transport protocols suitable for various
applications. Data from a source can consist of multiple channels for different media, such
as audio, video and text, which may use different transport protocols. For example, a text
channel may be routed reliably whereas audio and video channels can be routed unreliably.
The plug-in modules can also perform higher level functions such as transcoding images or
media streams, or filtering web content for low bandwidth PDA clients.

Scalable Adaptive Hierarchical Clustering


Overlay Structure:
When a node joins the overlay network, it takes a measurement of its distance to the well-
known overlay root node and then sends a join request and the results of its measurement to
the root node [46]. The root node categorises the joining node's distance from itself into a
zone (e.g. for a metric of round-trip time, 'zone 0' may be 0-10 ms, 'zone 1' 10-50 ms,
'zone 2' 50-100 ms, etc.). If the root has children that lie within the same zone (i.e. a similar
distance from the root) as the joining node, the root requests that the new node tries to join
each of these children, recursively, in the same way it tried to join the root, until the node
eventually settles, finding the nearest node to be its parent. However, if the root node has no
children that lie within the same zone as the new node, the root adopts the node as its own
child, completing the join.
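The recursive join can be sketched as follows. The zone boundaries, the dictionary
representation of tree nodes and the rtt() function are all illustrative assumptions; the
published scheme defines its own zone partitioning.

```python
# Sketch of a zone-based recursive join: the joining node measures its
# distance (here RTT) to the current candidate parent; the candidate
# maps that distance to a zone and, if it has children in that zone,
# redirects the join to them.

ZONES = [10, 50, 100, float("inf")]   # upper bounds (ms) for zones 0..3

def zone_of(rtt_ms):
    for z, bound in enumerate(ZONES):
        if rtt_ms <= bound:
            return z

def join(node, candidate, rtt):
    """Descend the tree until the nearest node adopts the joiner.

    rtt(a, b) is assumed to return the measured round-trip time between
    two named nodes; tree nodes are dicts with 'name', 'zone' (relative
    to their parent) and 'children'."""
    z = zone_of(rtt(node, candidate["name"]))
    same_zone = [c for c in candidate["children"] if c["zone"] == z]
    if not same_zone:
        candidate["children"].append({"name": node, "zone": z, "children": []})
        return candidate["name"]
    # Recurse into the same-zone child that is closest to the joiner.
    best = min(same_zone, key=lambda c: rtt(node, c["name"]))
    return join(node, best, rtt)

rtts = {("N", "root"): 8, ("N", "A"): 3}
root = {"name": "root", "zone": None,
        "children": [{"name": "A", "zone": 0, "children": []}]}
parent = join("N", root, lambda a, b: rtts[(a, b)])
assert parent == "A"   # N descends past the root to the nearer node A
```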

Content Delivery Networks


In the past, popular websites have relied on powerful server farms and high bandwidth
connectivity to the Internet to handle content requests from clients. Even with the advances
in Internet backbone bandwidth and the computational power of servers, this technique is not
scalable: server hardware costs grow exponentially with client capacity, and with the ever
increasing demand for high bandwidth content such as audio/video this is even more
apparent [47].

The result of such overloaded systems is that users quickly become bored waiting to
download web pages and decide to look elsewhere for the services/information they seek.

The aim of Content Delivery Networks (CDNs) is to alleviate the load on the origin servers
of popular websites, and to reduce the content access latency perceived by clients, by caching
and replicating website content around the Internet, where client requests can be intercepted
and satisfied closer to their location.

CDNs are usually hosted by third-party organisations who offer their services to popular
websites. Examples of such CDNs are Akamai, Inktomi, RealNetworks, Venation, Cisco,
Castify and Digital Island.

The following sections describe the main tasks of CDN organisations from distributing
website content throughout their CDNs to matching clients to the nearest POP (Point of
Presence) able to satisfy their queries.


Content Distribution and Management


CDN organisations install their own content servers and caches around the Internet, trying to
cover as much as possible of the network they wish to serve. ISPs (Internet Service
Providers) are ideal candidates for hosting CDN POPs, allowing a CDN to extend its reach
to the ISP's clients.

When an organisation chooses to use a CDN to handle website distribution, it specifies the
content it would like to make available to a large audience; the CDN organisation then
arranges for that content to be distributed among its POPs.

Stored content such as HTML documents, images and video files can be cached at the POP
either before or after client requests are made for certain objects, known respectively as pre-
caching and just-in-time caching.

Many CDNs now also support live streaming of audio and video, distributed throughout the
CDN by stream splitting: a live stream is fed from the origin server into the CDN, where it is
replicated to POPs with interested clients. Live streamed data in CDNs can take up to 20
seconds to reach a client and is said to be semi real-time [47].

Content Routing
Content routing refers to the redirection of clients to the most suitable source of content in
the CDN. Redirection is usually done transparently, so the client believes the content is
being served directly from the origin server, and the client's applications, such as browsers
or media players, require no modification. When a client contacts a CDN-hosted website it is
first globally redirected to the most suitable POP and then, if necessary, locally redirected to
a server in the POP's server farm (see Figure 11).

Figure 11 - Operation of a General CDN (Content Delivery Network)

Global Redirection

A simple global redirection technique uses the origin site's DNS server to resolve the
requested DNS name to the IP address of the most suitable POP for the client, rather than
that of the origin server. The suitability of a POP is usually determined by the client's
network address (location) and the current loads on the other POPs [48].
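A minimal sketch of such DNS-based redirection is given below. The POP addresses, the
prefix-matching proximity heuristic and the load cap are all illustrative assumptions;
production CDNs use far richer network maps and load metrics.

```python
# Sketch of DNS-based global redirection: the site's DNS server answers
# with the address of a POP chosen from the client's network address
# and the current POP loads.

POPS = [
    {"ip": "198.51.100.10", "prefix": "198.51.", "load": 0.4},
    {"ip": "203.0.113.20",  "prefix": "203.0.",  "load": 0.9},
    {"ip": "192.0.2.30",    "prefix": "192.0.",  "load": 0.2},
]

def resolve(client_ip, pops=POPS, max_load=0.8):
    """Prefer a nearby (shared-prefix) POP that is not overloaded;
    fall back to the least-loaded POP otherwise."""
    nearby = [p for p in pops
              if client_ip.startswith(p["prefix"]) and p["load"] < max_load]
    if nearby:
        return min(nearby, key=lambda p: p["load"])["ip"]
    return min(pops, key=lambda p: p["load"])["ip"]

assert resolve("198.51.100.77") == "198.51.100.10"   # nearby, lightly loaded
assert resolve("203.0.113.5") == "192.0.2.30"        # nearby POP overloaded
```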

Akamai uses another form of global redirection whereby the origin site serves modified web
pages, with links to rich content such as images, multimedia and large files altered to point
to POPs nearer the client [49].


Local Redirection

Server class machines can support up to several hundred simultaneous high quality video
streams, and dedicated media servers even more, at several thousand. For popular CDNs
used by popular websites, however, POPs are usually server farms composed of several
server machines, so after global redirection to the appropriate POP a client must be locally
redirected to the appropriate server within it [47].

One technique for local redirection uses intelligent layer 4-7 switches as the entry point to a
server farm. The layer 4-7 switch can redirect client requests to the most suitable server
based on information in OSI protocol layers 4-7 of the request, such as URLs or media type
[47].

Local redirection is also achieved through WCCP (Web Cache Communication Protocol),
implemented by Cisco routers. Local caches belonging to cache farms use WCCP to
announce the types of protocol they cache (e.g. HTTP or video) to Cisco routers, which then
forward packets destined for the corresponding protocol ports to the appropriate caches [47].

Active Networks
Active networks allow new routing and packet manipulation programs to be dynamically
loaded onto network nodes to customise the way in which they handle packets. New
programs are sent in active capsules, which are special packets sent along with normal data
packets on network links [50]. Depending on the level of node programmability, active
networks can allow customisation ranging from the dynamic installation of new network
protocols and services, such as multicast and QoS routing, through protocol translation at
gateways, to high level packet manipulation allowing web content transformation or media
transcoding.

Active networks have been implemented at the network layer as active routers, which
support easy updating of protocols and allow temporary services to be loaded to support
individual streaming sessions. LARA++ is one such active router architecture, developed by
researchers at Lancaster University [51].

Active networks have also been implemented at the application layer, allowing
programmability of overlay network nodes. Funnel-Web, by the University of Technology,
Sydney, is one such application layer active network; it allows Java programs to be
dynamically loaded onto remote overlay network nodes to perform services such as media
transcoding, caching web content and establishing TCP bridges between remote networks
[52].

Summary
This section has examined various overlay networks, from multicast to CDNs (Content
Delivery Networks) to active networks. The deployment problems of IP Multicast were
described, along with the potential of ALM (Application Level Multicast) as an alternative
for group communication on the Internet.


Motivation
Research into overlay network support for streaming multimedia systems, and ALM in
general, is motivated mainly by the problems with IP Multicast deployment and
acceptability, by the cost and limitations of CDNs, and by the promising results so far
achieved by overlay network research. These motivating factors are discussed in more detail
in the following sections.

IP Multicast not Deployed and Inflexible


IP Multicast offers efficient group communication, but technical difficulties have hampered
its widespread deployment, and interest is being diverted to alternatives such as CDNs and
ALM for data delivery to multiple hosts on the Internet. Though inherently less efficient,
ALM can offer services, such as reliable multicast, channel access control and adaptation to
varying network conditions, that are not supported by, or are too complex to implement in,
IP Multicast.

It is also argued that the very success of the Internet today is based on the simple nature of
IP and the unicast routing protocols it uses. Adding per-group state to routers is not scalable,
and since it is possible to implement multicast at the application layer, this feature should be
left out of the network layer, just as reliable transport is left to end systems by TCP [40].

Statically Configured CDNs


CDNs offer multicast-like communication but require static configuration to serve files or
streams from hosted websites. A website must register with a CDN organisation and then
specify the content it wishes to make available through the CDN, which may take hours or
days before becoming available throughout the CDN. CDNs cost money to deploy and
maintain, so the service can be very expensive for websites. Also, with the variety of
different CDNs available, each with differing network coverage and QoS, it is difficult for a
web organisation to decide which one offers the best service for distributing its data,
especially given the lack of inter-CDN communication.

CDNs can generally guarantee to significantly lighten the load on origin servers, but cannot
make the same guarantees to serve clients from the most efficient POP, due to the inaccuracy
of commonly used client location techniques based solely on IP addresses [53].

Current ALM Technology


There are many emerging ALM technologies which achieve efficiency comparable to IP
Multicast for communication in broadcasting and conferencing systems; however, the area is
still developing, and lessons need to be learnt by objectively comparing properties of each
technique, such as the time taken for an ALM overlay to converge into an efficient structure,
or its resilience to failure and oscillation.

The problems associated with single-source ALM are better understood than those of any-
source ALM. Any-source communication is usually accomplished either by nodes
forwarding data to peers or tunnelling it to the root of a shared tree, resulting respectively in
high worst-case latency or overloading of the root, or by the construction of SPTs for each
sender, which can result in the SPTs interfering with each other when certain members send
simultaneously. More research must be carried out in the area of any-source ALM, looking
at support for real-time conferencing applications such as audio and video conferencing.

Proposed Research
The proposed research aims to investigate the use of overlay networks in support of live
multimedia streaming systems (e.g. audio/video conferencing, live broadcasting). Overlay
networks need not simply emulate IP Multicast at the application layer but can utilise the
flexibility of end-systems to enable more intelligence in the overlay network. The following
subsection identifies open issues, or areas with the potential for further research, in
application level networking for live and real-time multimedia streaming systems. A
discussion of early ideas is given in the final subsection.

Open Issues
This section gives a discussion of open issues regarding research into overlay networks for
multimedia streaming systems.

How Can Overlays be Tailored to Applications?


An overlay network offering a general multicast service, such as Yoid, may not be the most
suitable for a given application. Most ALM techniques (see section Application Level
Multicast) are designed with specific types of application in mind: either large-scale single-
source or small-scale any-source communication, optimised for metrics such as latency,
bandwidth, or both. To gain full advantage of using overlay networks, an application may
implement its own proprietary overlay or, at another extreme, could use a configurable
overlay API to define the services it requires (e.g. choosing communication semantics such
as the multicast routing protocol, performance metrics, etc.).

Conferencing systems have stringent delay requirements in order to allow participants to
interact naturally, and typically require any-source multicast allowing any member to
contribute speech or video to the conference. Audio and video distribution on the overlay
could be approached separately: for example, audio can be transmitted in talkspurts and
mixed to conserve bandwidth without much hindrance to quality, whereas video is less
flexible and must be delivered continuously, with good quality video requiring relatively
high bandwidth. In low bandwidth scenarios it may be desirable to sacrifice video data for
audio, requiring a distinction between the two streams.

Broadcasting systems are less sensitive to delay than conferencing systems, allowing several
seconds of delay to go unnoticed compared to tens of milliseconds for conferencing. Overlay
networks for such systems are usually optimised for bandwidth. The overlay could be
intelligently arranged to exploit the acceptability of higher latency in such systems and have
capable nodes transcode or mix data for lower bandwidth nodes. Again, separating overlay
delivery for different types of media stream, such as audio and video, could allow movies to
be efficiently streamed with audio in different languages or with different subtitle text.

How Can Overlays Exploit Heterogeneity?


Hosts of a multicast group may have different capabilities, such as bandwidth, CPU power,
display capabilities and mobility. It is not sufficient simply to distribute minimal quality
data to accommodate the low-end participants; rather, the overlay should adapt to such
scenarios through transcoding and mixing where appropriate. Further work in overlay
networks is required to explore the exploitation of heterogeneity among groups of hosts in
order to intelligently organise overlays (e.g. dynamically establishing transcoding and
mixing in the overlay).

How Can Overlays Adapt with Minimal Disruption?


Unlike IP multicast routing, application level networks can adapt to network conditions
during a streaming session by adding and dropping links to peers. This property makes
overlays resilient to network congestion and node failures, but can also result in high data
loss and jitter during transitions. It is important not to make the transition decision too
sensitive, in order to avoid instability and oscillation through constant reconfiguration, while
maintaining efficient use of both the overlay and the underlying network. Further work is
required to measure the effects of such transitions and to explore techniques for reducing
disruption to streaming systems, especially in the case of real-time and live systems.

Initial Ideas
This section gives a few early ideas formed with regard to optimising overlay networks for
delay sensitive streaming systems such as group conferencing.

Adaptation to Application Usage Patterns


An overlay network could adapt not only to properties of the underlying network, such as
node bandwidth and latency, but also to application usage patterns. For example, consider an
audio conference of highly bandwidth-restricted nodes using end-system mixing (see
section VoIP Conference Models) to distribute audio between participants. If several mixers
lie between communicating participants, the cumulative mixing delay may be too high for them
to converse naturally. It may be possible to reduce the delay perceived by participants by
moving them closer, on the overlay, to the mixer serving the person with whom they are
currently talking. Listeners who are not talking are less sensitive to the conversation delay
and so can be moved further from the current conversation mixer. This movement could be made
gradual by swapping pairs of nodes between mixers and using delay adaptation to mask the
change in delay from the user.
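As a hedged illustration of such usage-pattern adaptation (the function, scoring, and mixer names below are hypothetical, not part of this report's design), a mixer-assignment heuristic might rank participants by recent talk activity and co-locate the most active speakers on the same mixer, so the live conversation crosses as few mixers as possible:

```python
# Illustrative heuristic: place the busiest talkers on the first mixer and
# push passive listeners onto subsequent mixers, reducing the cumulative
# mixer delay experienced by the participants who are actually conversing.

def assign_to_mixers(activity, mixers, slots_per_mixer):
    """activity: dict mapping participant -> recent talk-time score.
    mixers: ordered list of mixer identifiers.
    Returns a dict mapping each participant to a mixer; the most active
    participants fill the first mixer's slots, and so on."""
    ranked = sorted(activity, key=activity.get, reverse=True)
    return {p: mixers[i // slots_per_mixer] for i, p in enumerate(ranked)}
```

In practice the swap would be performed gradually and masked by playout-delay adaptation, as described above, rather than recomputed wholesale on every talk-activity change.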

Resilience through Controlled Loops


Loops are usually avoided in data overlay networks, but if introduced in a controlled manner
they could provide extra resilience against node failure. For example, a node wishing to send
data relies on one of its peers to forward that data to a certain portion of the network. In
a delay-sensitive system it is not desirable for a node to wait until it discovers that a
peer has failed before reconnecting to the network portion served by that peer. Instead,
redundant or MDC (Multiple Description Coded) data could be sent over a separate overlay
built from different peers. Using MDC over separate paths or overlays also has the potential
to lower latency with limited disturbance, by allowing alternative overlay links to be tested
at the cost of only temporarily reduced audio quality [9]. This could be considered an
extension of Yoid's ability to broadcast data on its control mesh, which happens only after a
failed peer has been detected [44].
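The MDC idea can be sketched as follows. Real MDC codecs are far more sophisticated; this toy example (all names illustrative, and assuming even-length frames) merely splits an audio frame into even- and odd-indexed sample subsequences, each sent over an overlay with disjoint peers, so that losing one description degrades rather than interrupts playback:

```python
# Toy Multiple Description Coding sketch: two complementary descriptions of
# a frame, either of which alone can be crudely concealed back to full length.

def make_descriptions(samples):
    """Split a frame into two descriptions: even-indexed and odd-indexed
    samples. Each description would be forwarded via a different peer set."""
    return samples[0::2], samples[1::2]

def reconstruct(desc_even, desc_odd):
    """Reassemble the frame from both descriptions, or conceal from one."""
    if desc_even and desc_odd:
        out = []
        for e, o in zip(desc_even, desc_odd):
            out.extend([e, o])          # interleave the two descriptions
        return out
    surviving = desc_even or desc_odd
    out = []
    for s in surviving:
        out.extend([s, s])              # crude concealment: repeat each sample
    return out
```

The point of the sketch is the failure behaviour: when one overlay path (and hence one description) is lost, the receiver still produces a full-length, lower-quality frame instead of silence.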

Conclusions
This report has surveyed Internet multimedia streaming technologies and the current issues
with such systems, in order to motivate a PhD thesis researching the suitability of overlay
networks for these systems, especially where live and real-time multimedia transmission is
involved, as in live video streaming and video conferencing.

Despite increases in network bandwidth and computer processing power, efficient and scalable
solutions for data delivery are still necessary, especially for group and broadcast
communication. As processing power increases, ever smaller devices will gain new
capabilities, but there will always be differences in capability between one specialised
device and another (e.g. a desktop computer and a network-enabled wristwatch), requiring
solutions that take heterogeneity into account (i.e. with regard to battery life, processing
power, network connectivity, etc.).

References
[1] S.W. Smith, "The Scientist and Engineer's Guide to Digital Signal Processing",
California Technical Publishing, 1997.
[2] I. Kouvelas and V. Hardman, "Overcoming Workstation Scheduling Problems in a
Real-Time Audio Tool", In Proceedings of Usenix (1996) p235-242.
[3] R.J.B. Reynolds and A.W. Rix, "Quality VoIP - An Engineering Challenge", BT
Technology Journal 19 (2001).
[4] D.J. Thorne, "VoIP - The Access Dimension", BT Technology Journal 20 (2001).
[5] N. Laoutaris and I. Stavrakakis, "Intrastream Synchronization for Continuous Media
Streams: A Survey of Playout Schedulers", IEEE Network Magazine 16 (2002).
[6] J.C. Bolot, "End-to-end Packet Delay and Loss Behavior in the Internet", SIGCOMM
(1993) p289-298.
[7] J.-C. Bolot, S. Fosse-Parisis and D. Towsley, "Adaptive FEC-Based Error Control for
Internet Telephony", INFOCOM (1999).
[8] V. Hardman, M.A. Sasse, M. Handley and A. Watson, "Reliable Audio for Use over
the Internet", In Proceedings of INET (Oahu, Hawaii) (1995).
[9] Y.J. Liang, E.G. Steinbach and B. Girod, "Real-time Voice Communication over the
Internet Using Packet Path Diversity", In Proceedings of ACM Multimedia (Ottawa,
Canada) (2001).
[10] H. Sanneck, "Concealment of Lost Speech Packets Using Adaptive Packetization", In
Proceedings of IEEE Multimedia Systems (Austin, TX) (1998).
[11] H. Schulzrinne, S. Casner, R. Frederick and V. Jacobson, "RTP: A Transport Protocol
for Real-time Applications", RFC 1889 (1996).
[12] O. Hodson, C. Perkins and V. Hardman, "Skew Detection and Compensation for
Internet Audio Applications", (2000).
[13] BBC Radio Website, "Live Internet Broadcasts", http://www.bbc.co.uk/radio/
[14] Oklahoma City Police Department Citizens Academy Website, "Live Police Radio",
http://www.ocpdcpa.com/PoliceRadio.htm
[15] A.S. Tanenbaum, "Computer Networks", Prentice Hall, 1996.
[16] "BT Telephone Conferencing Website", http://www.conferencing.bt.com/ (2002)
[17] J. Rosenberg and H. Schulzrinne, "Models for Multi Party Conferencing in SIP",
Internet Draft (2000).
[18] International Engineering Consortium Online Tutorials, "Signaling System 7 (SS7)",
http://www.iec.org/online/tutorials/
[19] I. Dalgic and H. Fang, "Comparison of H.323 and SIP for IP Telephony Signaling".

[20] D.R. Wisely, "SIP and Conversational Internet Applications", BT Technology Journal
19 (2001).
[21] P.J. Cordell, J.M.M. Potter and C.D. Wilmot, "H.323 - A Key to the Multimedia
Future", BT Technology Journal 19 (2001).
[22] PC Tech Guide Tutorial, "Digital Video Tutorial",
http://www.pctechguide.com/24digvid.htm
[23] K.J. Kuhn,"Conventional Analog Television - An Introduction",
http://www.ee.washington.edu/conselec/CE/kuhn/ntsc/95x4.htm (1995)
[24] M. Brain,"HowStuffWorks: How Television Works",
http://www.howstuffworks.com/tv.htm
[25] "PC Tech Guide: Digital Video Tutorial", http://www.pctechguide.com/24digvid.htm
(2002)
[26] K. Nice and G.J. Gurevich,"HowStuffWorks: How Digital Cameras Work",
http://www.howstuffworks.com/digital-camera.htm
[27] M. Liste, "A Practical Guide to Streaming Media", TERENA Networking Conference
Slides (Antalya, Turkey) (2001).
[28] J. Hunter, V. Witana and M. Antoniades,"A Review of Video Streaming over the
Internet", http://archive.dstc.edu.au/RDU/staff/jane-hunter/video-streaming.html
(1997)
[29] P.N. Tudor, "MPEG-2 Video Compression", Electronics and Communication
Engineering Journal 7 (1995).
[30] NASA Website, "NASA", http://www.nasa.gov/
[31] Channel 4's Big Brother Website, "Big Brother",
http://www.channel4.com/entertainment/tv/microsites/B/bigbrother/index.html
[32] "BT Broadband Website", http://www.bt.com/broadband/
[33] F. Fluckiger, "Understanding Networked Multimedia: Applications and Technology",
Prentice Hall, 1995.
[34] K. Hua and S. Sheu, "Skyscraper Broadcasting: A New Broadcasting Scheme for
Metropolitan Video-on-Demand Systems", SIGCOMM (Cannes, France) (1997) p89-
100.
[35] A. Mahanti, D.L. Eager, M.K. Vernon and D. Sundaram-Stukel, "Scalable On-
Demand Media Streaming with Packet Loss Recovery", SIGCOMM (2001).
[36] M. Hofmann, T.S.E. Ng, K. Guo, et al., "Caching Techniques for Streaming
Multimedia over the Internet", Technical Report BL011345-990409-04TM, Bell
Laboratories (2000).
[37] Cisco Online Documentation Website, "IP Multicast",
http://www.cisco.com/univercd/cc/td/doc/cisintwk/ito_doc/ipmulti.htm
[38] L. Mathy, "Introduction to Multicast Routing in the Internet", Presentation (2001).
[39] D. Estrin, D. Farinacci, A. Helmy, et al., "Protocol Independent Multicast-Sparse
Mode (PIM-SM): Protocol Specification", RFC 2117 (1997).
[40] Y.-h. Chu, S.G. Rao and H. Zhang, "A Case for End System Multicast", In
Proceedings of ACM Sigmetrics, Santa Clara (2000).
[41] I. Stoica, T.S.E. Ng and H. Zhang, "REUNITE: A Recursive Unicast Approach to
Multicast", INFOCOM 3 (1999).
[42] D. Pendarakis, S. Shi, D. Verma and M. Waldvogel, "ALMI: An Application Level
Multicast Infrastructure", In Proceedings of the 3rd USENIX Symposium on Internet
Technologies and Systems (USITS 2001) (2001).
[43] J. Jannotti, D.K. Gifford, K.L. Johnson, et al., "Overcast: Reliable Multicasting with
an Overlay Network", (2000).
[44] P. Francis, "Yoid: Extending the Internet Multicast Architecture", Technical Report
(2000).
[45] Y. Chawathe, "Scattercast: An Adaptable Broadcast Distribution Framework", to appear
in a special issue of the ACM Multimedia Systems Journal on multimedia distribution
(2002).

[46] L. Mathy, R. Canonico, S. Simpson and D. Hutchison, "Scalable Adaptive Hierarchical
Clustering", IEEE Communications Letters 6 (2002) p117-119.
[47] M. Liste,"White Paper: Content Delivery Networks (CDNs) – A Reference Guide",
http://www.ciscoworldmagazine.com/webpapers/2001/03_thrupoint.shtml (2001)
[48] Cisco Systems, "Cisco Internet CDN Software User Guide",
http://www.cisco.com/univercd/cc/td/doc/product/webscale/content/cdnsp/cdnsp21/ic
dn21ug/
[49] F5 Networks, "The Combination of Akamai’s Content Distribution Services and F5
Networks’ ITM Products Provide Speed and Reliability For Internet Sites",
http://www.f5.com/solutions/whitepapers/akamai.pdf
[50] A.T. Campbell, H.G.D. Meer, M.E. Kounavis, et al., "A Survey of Programmable
Networks", ACM SIGCOMM Computer Communication Review 29 (1999) p7-24.
[51] S. Schmid, J. Finney, A.C. Scott and W.D. Shepherd, "Component-based Active
Networks for Mobile Multimedia Systems", In Proceedings of the 10th International
Workshop on Network and Operating System Support for Digital Audio and Video
(NOSSDAV) (2000) p26-28.
[52] A. Ghosh, M. Fry and G. MacLarty, "An Infrastructure for Application Level Active
Networking", (2000).
[53] K.L. Johnson, J.F. Carr, M.S. Day and M.F. Kaashoek, "The Measured Performance
of Content Distribution Networks", In Proceedings of the 5th International Web
Caching and Content Delivery Workshop (2000).
