
FHTW Berlin, Germany

University of Applied Sciences for Engineering and Economy


International Media and Computing (Bachelor)

Bachelor Thesis
Mobile Games over
3G Video Calling
Analysis of Interactive Voice and Video Response
for Mobile Applications and Games
Author
Christoph Köpernick
Mühlenstr. 20A
14167 Berlin, Germany
+49 171 4527999
christoph@koepernick.de
Matr.-No.: s0514154

Start: 24 November 2008
Hand In: 27 February 2009

1st reviewer
Prof. Dr. Ing. Carsten Busch
Kopfbau, Raum: 109
Wilhelminenhofstraße 75A
12459 Berlin
+49 30 5019-2214
carsten.busch@fhtw-berlin.de
2nd reviewer
Prof. Thomas Bremer
Kopfbau, Raum: 109
Wilhelminenhofstraße 75A
12459 Berlin
+49 30 5019-2481
bremer@fhtw-berlin.de

Abstract
Interactive voice and video response (IVVR) is a
mobile technology enabling interactive services
based on 3G video telephony. IVVR applications can
take advantage of bidirectional real-time multimedia
streaming, enabling speech and camera-based
interaction. Games are a proven way to draw users
into new technologies. This thesis analyses IVVR's
underlying technologies and is dedicated to IVVR
applications' usability aspects and game concepts
suitable for 3G video call games. 3GBattle is a
prototype for a camera-based card battle game,
demonstrating IVVR's capabilities and ways to
adapt to its limitations.
Preface
When I came across interactive voice and video response (IVVR) during my internship in
Malaysia, I was looking for substantial literature about 3G-324M and IVVR. I discovered
that, for 2008 and so far in 2009, there are no books or studies that exclusively cover IVVR
in depth. I foresee that systems founded on the notion of IVVR will have good prospects,
including Mobile Rich Media applications that mix media streaming with interactivity,
creating an innovative mobile experience for consumers. In consideration of my upcoming
final thesis and the fact that IVVR and games excite me, I have decided to write about Mobile
Games over 3G Video Calling with the goal of studying IVVR in depth and thinking beyond
current IVVR services. The topic of my thesis includes many areas covered in my studies and
beyond; therefore, it is a good choice for interconnecting different specialities into a
multidisciplinary work.
Acknowledgements
I would like to thank FHTW for providing me with a high-quality education and especially
my professors Prof. Dr. Ing. Carsten Busch and Prof. Thomas Bremer for supporting my wish
to write about this special topic and for their advice. I am also grateful that my friends and
family always encourage me to work hard on my thesis and focus on my studies. Furthermore
I thank the professional editors from papercheck.com for proofreading my thesis.
Christoph Köpernick, February 2009

Contents
Introduction
Chapter 1 Basic Concepts
1 3G Mobile Phone Standards and Technology
1.1 Characteristics of Wireless Networks
2 3G-324M: The 3GPP Umbrella Protocol
2.1 3G-324M Multimedia Terminal
2.2 3G-324M System
3 Multimedia Codecs, Compression, and Streaming
3.1 Speech Codec
3.2 Video Codec
3.3 Interaction Delay
3.4 Side-Effects on the User Experience
4 Interactive Voice and Video Response
4.1 IVVR Applications
4.2 IVR Supplements
4.3 Information Portals
4.4 Video on Demand
4.5 Mobile TV
4.6 Video Sharing
4.7 P2P Video Avatar
4.8 3G-to-IP
4.9 3G-to-TV
4.10 Mobile Banking
4.11 Mobile Games
5 IVVR Game Santa Claus Sleigh Ride
6 Classification of Mobile Games over 3G Video Calling
6.1 Thin-Clients and Gaming Terminals
6.2 Mobile Game Streaming
6.3 Person-to-Application
6.4 Direction, Interaction and Conversation
6.5 All-IP Approach
6.6 Mobile Games over 3G Video Calling Defined

Chapter 2 Usability and Design Opportunities of IVVR
7 Considerations about Mobile Video Telephony
8 Usability Opportunities and Design Rules for IVVR
8.1 Simplicity
8.2 Sounds
8.3 Visual Design Rules
8.4 Resuming Sessions
8.5 Consistency and Multi-tap Text Entry
8.6 Camera-Based Information Entry and Interaction
Chapter 3 Design of Mobile Games
9 Technical Foundation for IVVR Games
10 Appropriate Game Concepts
10.1 Visual Novels
10.2 Mobile Gambling
10.3 IVVR Multiplayer Games
10.4 Parallel Reality Games
Chapter 4 Mobile Role-Playing Game: 3GBattle
11 Early Prototype
11.1 Setting
11.2 Game Concept
12 Preparation for a Working Prototype
12.1 Machine-Readable Playing Cards
12.2 Theme
13 Further Improvements
Conclusion
14 Further Studies



List of Figures
Figure 1. High-level architecture of UMTS network
Figure 2. 3G-324M system diagram
Figure 3. Visual distortion of an H.263 stream due to transmission errors (with error concealment)
Figure 4. Person-to-person 3G video telephony
Figure 5. Person-to-application video telephony (IVVR)
Figure 6. Example of IVVR supplement application for customer care with barcode recognition
Figure 7. IVVR game "Santa Claus" from CreaLog GmbH
Figure 8. IVVR application template with 16x16 raster
Figure 9. 12-digit numpad
Figure 10. High-level system architecture for the delivery of dynamic IVVR services
Figure 11. Screenshot from popular visual novel "Brass Restoration"
Figure 12. IVVR slot machine to win coupons
Figure 13. 3GBattle prototype configuration
Figure 14. Semacode tag representing number 1
Figure 15. Example character card 1
Figure 16. Example character card 2
Figure 17. Example battle card 1
Figure 18. Example battle card 2

List of Tables
Table 1 Evolution and Comparison of H.324, H.324M and 3G-324M
Table 2 Various UMTS Services from User Point of View
Table 3 UMTS Services from Network Point of View

Introduction
Games are a proven way to draw users into new applications and devices. Video games are
popular and widely adopted by all age groups and in all social environments. First, video
games were detached from arcade cabinets, becoming available on personal computers and
game consoles for home usage. Soon, gaming was possible on the go with handheld game
consoles and even mobile phones. In 2009, it is common to play casual games on one's
mobile phone using Java technology or BREW, or on other platforms such as Symbian,
iPhone, or Windows Mobile. Some games are available as Flash Lite applications or over
WAP, but the majority needs to be installed on a phone, and the devices have to meet certain
software and hardware requirements. Moreover, multiplayer games are also quite popular on
many platforms. Mobile connectivity makes mobile phones a perfect platform for online
and/or multiplayer games.
Current mobile phone capabilities offer numerous ways for service providers, mobile network
operators, and content providers to create profitable mobile services. These services include
(1) WAP Push-driven, premium-rated short message services; (2) mobile instant messaging;
(3) mobile dating; (4) video and game downloads; (5) TV voting; (6) colour ring-back tones;
(7) web services extended to mobile users over mobile IP data services; or (8) premium-rate
telephone services such as customer care, tech support, or adult chat lines. Most of these
services, such as SMS or premium telephony services, use the circuit-switched
characteristics of mobile networks; others are based on the packet-switched mobile data
services of GPRS, UMTS, or HSDPA. A number of these services generate direct revenue for
both the service provider and the mobile network operator; from others, only the mobile network
operator benefits, as chargeable voice or data traffic is generated on the mobile network.
However, all these services lack a user-friendly combination of real-time interaction and
ease of use coupled with an instant multimedia experience. Moreover, most do not support
features such as content protection or push communication, and they do not use an ultra-thin
client approach. This is where the new video call capabilities of contemporary 3G handsets
come into play to create the new mobile technology interactive voice and video response
(IVVR). Utilizing the full potential of IVVR can enable service providers to create the
ultimate thin client service that is easy to use and features bidirectional multimedia
communication in real time, with full content control in a user-friendly and easy-to-grasp
manner.
Exploiting the capabilities of IVVR based on 3G video telephony for mobile gaming opens
up many opportunities but also many challenges. Mobile games can be delivered without
prior installation, are operating-system independent, and can be played without additional
software such as J2ME. Games stream instantly to the phone and do not require any
additional local storage or processing power. Mobile games using IVVR technology also
enable developers to create games where levels, avatars, and objects can update when
desired, based on information sources on the web or from other gamers. This opens up a
variety of services featuring multiplayer and social networking capabilities. During an IVVR
session, the caller's camera and microphone signals are automatically sent to an
application server, paving the way for motion- or gesture-based interaction, voice
commands, and game concepts that ask gamers to take pictures of special symbols to control
the game.
Although 3G phones with video call capabilities have been available since 2001 and are
pervasive nowadays, they are still unused for mobile games over 3G video calling. Network
operators are slowly realizing that 30-plus years of evidence prove that people just do not like
the idea of seeing whom they are calling, and this preference will not change dramatically.
Therefore, they are looking for the next killer application that offers a unique user
experience to co-exist with the mobile web for 3G and classic communication services such
as voice calling.
As of spring 2009, there are no known successful games over 3G video calling. In this thesis,
I will analyse the advantages and opportunities of exploiting the 3G video call feature for
mobile games, present design guidelines, and evaluate which game concepts are
appropriate for IVVR games.
In detail, I will cover the following aspects:
The first chapter introduces the basic concepts behind 3G video calling including the relevant
UMTS architecture for circuit-switched video calling, multimedia streaming and codecs, the
3G-324M umbrella protocol, and research about IVVR and 3G video call applications in
detail.
Chapter 2 focuses on user interaction by speech commands and camera-based interaction,
covering usability aspects for IVVR applications.

Chapter 3 explains the current state of my findings for a system architecture to provide IVVR
games and discusses which game concepts are suitable for mobile games over 3G Video
Calling.
Chapter 4 features the game design and conceptual prototyping for my IVVR game 3GBattle
that uses a camera-based interaction approach.

Chapter 1 Basic Concepts
Before designing, developing and analysing IVVR applications, and especially Mobile
Games over 3G Video Calling, it is essential to understand numerous basic concepts on
which those applications rely. These basic concepts influence the creation of IVVR
applications, help exploit all 3G video call features, and cope with the downsides of the
bearer technologies for achieving an undiluted user experience.
From the technological perspective of 3G video calling in general, the essential technologies
involved are (1) the circuit-switched characteristics of the UMTS mobile network system for
video telephony; and (2) the 3G-324M umbrella protocol used for conversational multimedia
services. The 3G-324M standard recommends that the media codecs AMR and H.263(+) be
used for audio and video streaming. To assess the quality of IVVR services, it is essential to
understand the characteristics of these codecs in the mobile environment.
Aside from the details of IVVR technology and provision of IVVR application examples, the
idea of Mobile Games over 3G Video Calling is classified and clearly defined.


1 3G Mobile Phone Standards and Technology
Third-generation (3G) systems were designed with the notion of enabling a single global
standard to fulfil the needs of anywhere and anytime communication (Etoh). Compared to 2G
systems, 3G systems focus more on multimedia communication such as video conferencing
and multimedia streaming. ITU defined IMT-2000 as a global standard for 3G wireless
communications and, within this framework, 3GPP developed UMTS as one of today's 3G
systems. W-CDMA is the main 3G air interface for UMTS (Holma and Toskala)
implementing various person-to-person, circuit-switched services such as video telephony.
The high-level UMTS network architecture from 3GPP-R5, as described in documents from
its Technical Specification Group, is shown in the figure below (Etoh 22).

BSS Base Station System
CS Circuit-Switched
HSS Home Subscriber Servers
IMS IP Multimedia Subsystem
MS Mobile Station
NMS Network Management Subsystem
PS Packet-Switched
RNS Radio Network Subsystem

Figure 1. High-level architecture of UMTS network
As shown in figure 1, the UMTS core network primarily consists of a circuit-switched (CS)
and a packet-switched (PS) domain. Typically, the PS domain is used for end-to-end packet
data applications, such as mobile Internet browsing and e-mail. On the other hand, the CS
domain is intended for real-time and conversational services, such as voice and video
conferencing. Circuit-switched connections are most efficient for constant, continuous data
streaming by definition (Etoh). In addition to the CS and PS domains, 3GPP-R5 also specifies
the IP Multimedia Subsystem (IMS).
Using the PS domain, IMS is projected to provide IP multimedia services that also satisfy
real-time requirements, including those that were previously possible only in the CS domain.

In this thesis, I will discuss Mobile Games over 3G Video Calling based on 3G video
telephony in the CS domain (see figure 1, highlighted in dark green).
1.1 Characteristics of Wireless Networks
Wireless networks are inherently error prone. Bitrates in wireless systems tend to fluctuate
more than in wired networks. In wired networks, phenomena such as fading,
shadowing, or reflection are non-existent so that, for the most part, a constant and
much higher bandwidth is available during transmission. Influences on signal propagation
cause the constantly changing bandwidths in wireless systems. Generally, the receiving power
depends on the distance between sender and receiver. The receiving power p is inversely
proportional to the square of the distance between sender and receiver:

$$p \propto \frac{1}{d^2}$$

where d is the distance between sender and receiver (Schiller).
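
As a rough illustration of the inverse-square relation above, the following sketch compares the received power at different distances. It assumes idealised free-space propagation without fading or obstacles; the function names and the reference distance are illustrative choices and are not taken from Schiller.

```python
import math

def relative_power(d, d_ref=1.0):
    """Received power at distance d relative to the power at d_ref,
    assuming ideal free-space propagation (p proportional to 1/d^2)."""
    if d <= 0 or d_ref <= 0:
        raise ValueError("distances must be positive")
    return (d_ref / d) ** 2

def loss_in_db(d, d_ref=1.0):
    """Additional path loss in decibels when moving from d_ref to d."""
    return -10 * math.log10(relative_power(d, d_ref))

if __name__ == "__main__":
    # Distances are in arbitrary units relative to d_ref.
    for d in (1, 2, 4, 10):
        print(f"d = {d:>2}: power x{relative_power(d):.4f}, "
              f"extra loss {loss_in_db(d):.1f} dB")
```

Under this assumption, doubling the distance quarters the received power (about 6 dB of additional loss). Real channels often lose more than this idealised model suggests.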
Receiving power is influenced further by frequency dependent fading, shadowing, reflection
at large obstacles, refraction depending on the density of the medium, scattering at small
objects, and diffraction at edges.
The effect of multipath propagation can cause jitter when the radio signal reaches the
receiver by two or more paths at different times (Schiller). Moreover, the mobility of the user
adds another set of problems that results in fading of received power over time; the channel
characteristics change over time and location. This exacerbates the effect of multipath
propagation because the signal paths change more as the user changes his or her
location. Changes in the distance between sender and receiver cause different delay variations
of different signal parts.
The phenomenon of cell-breathing is a special problem in CDM systems. In CDM systems,
all terminals use the same frequency spectrum. Therefore, the more information
terminals are sending and receiving in a cell, the more noise is produced. For terminals
far from the base station, the noise level can increase to the point that reception is
impossible; ergo, the cell shrinks.
The UMTS network (W-CDMA) counters, but does not eliminate, these effects by implementing
error detection, error correction, and error concealment measures. For example, in W-CDMA,
cell-breathing is effectively prevented by implementing wideband power-based load
estimation to keep the cell coverage within the planned limits (Holma and Toskala).
Nonetheless, these phenomena can still affect the audiovisual quality of 3G video calls through
high delays, bit errors, or varying bitrates. This aspect is covered in more detail in Section
3: Multimedia Codecs, Compression, and Streaming.
2 3G-324M: The 3GPP Umbrella Protocol
The 3G-324M standard implemented in contemporary 3G camera phones enables 3G users to
establish bidirectional multimedia calls in the sense of a person-to-person, circuit-switched
service for the purpose of video telephony or video conferencing.
The 3G-324M umbrella protocol is based on H.324, and its first draft was specified by 3GPP
in 1999. The current Release 7 of the 3GPP technical specification TS 26.110
introduces the set of specifications that apply to 3G-324M multimedia terminals. In TS
26.110, most of these specifications are referred to as multimedia codecs for circuit-switched
3GPP networks. In the sense of TS 26.110, the term codec refers not only to codecs used for
the encoding and decoding of media streams, but also to mechanisms for
multiplexing/de-multiplexing and call control (3GPP). More specifically, the codecs used for
media streams are AMR and H.263, while H.223 and H.245 are the codecs for
multiplexing/de-multiplexing and call control, respectively.
In addition to these codecs, 3G-324M also defines codecs for error detection and correction
since 3GPP networks are inherently error prone (see 1.1 Characteristics of Wireless
Networks).
Table 1 Evolution and Comparison of H.324, H.324M and 3G-324M

                       H.324                  H.324M (a)              3G-324M
Focus                  POTS                   Mobile Networks         3G Wireless Networks
Standardisation Began  1990                   1995                    December 1999
Standardisation Body   ITU-T                  ITU-T                   3GPP
Audio Codecs           G.723.1, AMR           G.723.1 Annex C (b)     G.723.1, AMR
Video Codecs           H.263, MPEG-4 Part 2   H.263 Appendix II (c)   H.263+ (d), MPEG-4, H.264

Notes:
a. H.324 Annex C refers to H.324M.
b. With bitrate scalable error correction and unequal error correction.
c. With error tracking improvements described in Annex K, N and R of H.263 version 2 from 1998.
d. H.263 version 2 from 1998.

H.324 was originally developed by ITU-T for low bitrate multimedia communications with
voice, video, and data transmission in the PSTN over analogue (circuit-switched) phone lines
and was later extended to other GSTN networks like ISDN.
H.324 terminals provide real-time video, audio, or data, or any combination, between two
multimedia telephone terminals over a GSTN voice band network connection.
Communication may be either 1-way or 2-way. Multipoint communication using a separate
MCU among more than two H.324 terminals is possible (ITU-T). Over the years, several
extensions have been added. One of them is H.324M, which adapts the H.324 series to mobile
networks to make the system more robust against transmission errors. H.324M was intended
to enable efficient communication over error-prone wireless networks.
One of the general principles set for the development of H.324M and 3G-324M
recommendations was that they should be based upon H.324 as much as possible; this would
simplify further development of existing systems and ease the introduction of new features in
standards derived from H.324 (Table 1 gives key facts about the evolution to 3G-324M).
Technical specification 26.111 contains the modifications in 3G-324M that were made to
H.324 in order to address error-prone environments.
2.1 3G-324M Multimedia Terminal
In the scope of 3G-324M, a terminal that implements the 3G-324M umbrella protocol and all
its features is called a 3G-324M multimedia terminal. Terminals can be 3G handsets with a
W-CDMA air interface and a built-in camera. More generally, any equipment that complies
with the requirements of Technical Specification TS 26.110 is a 3G-324M multimedia
terminal. In this sense, for example, a Linux-based machine connected with an E1 line to the
PSTN can also be a 3G-324M multimedia terminal, as long as it supports all the protocol's
requirements. More specifically, such IVVR (application) servers are the foundation for
providing and delivering IVVR services.
2.2 3G-324M System
Figure 2 shows the 3G-324M system followed by a description of its components relevant for
IVVR.


Figure 2. 3G-324M system diagram
Note: e. (3GPP 7)
H.324 Annex H (Optional Multilink) defines the operation of H.324M [1] over as many as 8
independent physical connections, aggregated together to provide a higher total bitrate
(ITU-T Study Group No. 16). A single physical connection is defined by one 64 kbit/s
circuit-switched connection that is compatible with N-ISDN (Etoh). In ISDN terms, such a
connection is also known as an S0 interface. Although the total bitrate for a 3G-324M
connection can be a multiple of 64 kbit/s, all mobile network operators [2] and handsets [3] the
author has tested support only a single N-ISDN compatible channel. A sole 64 kbit/s
connection is used for all logical media channels of a session, meaning for both transmission
and reception of multimedia streams and control data. Although bandwidth allocation is
dynamic, based on demand and wireless network characteristics, a bitrate of roughly 30
kbit/s is available for each party to transmit or receive the media streams, respectively. H.223
is used to multiplex the logical media channels used for speech, video, and data
communication into one bitstream (Etoh).
During call set-up, the 3G-324M multimedia terminal capabilities are exchanged using H.245
messages, the master/slave relationship for the session is determined, the logical channels for
audio and video transmission are opened, and the multiplexing agreement is made (Jabri).

[1] All characteristics of H.324M also apply to 3G-324M due to the general principles described earlier.
[2] Mobile networks tested: Germany: Vodafone, T-Mobile, E-Plus, o2. Malaysia: Maxis, Celcom.
[3] For example, the Nokia N96 specification states, "CS max speed 64kbps" (Nokia).

The exchange of terminal capabilities has the same motivation as the use of SDP in
SIP. Terminals can have different capabilities, especially concerning supported multimedia
codecs, and need to agree upon codecs that both terminals support. The device capabilities
have qualitative effects on QoS, as evolved multimedia codecs like MPEG-4 Part 2 or AAC
are more efficient and robust than legacy codecs such as H.263 or AMR audio. Furthermore,
H.245 messages are used to transmit DTMF signals during a 3G-324M session, with the
caller using the handset's numpad to type in numbers or characters. DTMF signals
transmitted through H.245 messages are the foundation for simple interaction with IVVR
applications. NSRP is an optional retransmission protocol, and CCSRL provides a mechanism
for segmenting H.245 messages into fragments to improve performance in conditions where
the likelihood of errors is high (Myers).
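
The capability exchange described above can be pictured as each terminal announcing the codecs it supports and both sides settling on a common one. The sketch below is a deliberately simplified model and does not reproduce the H.245 message format; the function name, the preference ordering, and the capability lists are hypothetical.

```python
def pick_common_codec(local_prefs, remote_caps):
    """Return the first codec in local_prefs (most preferred first)
    that the remote terminal also supports, or None if there is none."""
    remote = set(remote_caps)
    for codec in local_prefs:
        if codec in remote:
            return codec
    return None

# Hypothetical capability sets for an IVVR server and a handset.
server_video = ["MPEG-4 Part 2", "H.263"]   # server prefers MPEG-4 Part 2
handset_video = ["H.263"]                   # handset only offers the baseline
server_audio = ["AMR"]
handset_audio = ["AMR", "G.723.1"]

video = pick_common_codec(server_video, handset_video)
audio = pick_common_codec(server_audio, handset_audio)
print(f"video: {video}, audio: {audio}")
```

In this toy case the session falls back to H.263 for video because the handset does not announce MPEG-4 Part 2, which mirrors why the mandatory codecs matter for interoperability.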
3GPP Technical Specification 26.111 requires AMR to be used as the speech codec. For
maintaining audio and video synchronisation (that is, lip-synch), the Optional Receive Path
Delay can compensate for the video delay. H.263 baseline is required as the video codec to
compress the media stream. The use of MPEG-4 Part 2 is recommended as it provides higher
error robustness and improved coding efficiency compared to H.263.
TS 26.111 also specifies data protocols like T.140 that could be used for real-time text
conversations. Text conversations may be opened simultaneously with voice and video
applications, or as text-only sessions [4] (3GPP). However, ITU-T is not aware of a deployed
terminal having T.140 implemented (ITU-T). Unfortunately, even in 2009 there is no known
handset that implements T.140.
3 Multimedia Codecs, Compression, and Streaming
3G video telephony generally operates over a single 64 kbit/s connection whose bandwidth
both parties need to share. Effectively, the application is left with 60 kbit/s or less for both
media types, since H.245 call control messages reduce the gross bandwidth. In 3G-324M
systems, the bandwidth is allocated dynamically; as a rule of thumb, however, each party has
50% of the bandwidth available for sending audio and video signals. In a typical unidirectional
scenario, 12.2 kbit/s is allocated for the speech codec, and a bitrate of 43-48 kbit/s is allowed
for the video data (Sang-Bong, Tae-Jung and Jae-Won).
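
The bitrate figures above can be checked with simple arithmetic. The following sketch assumes that control overhead is simply subtracted from the 64 kbit/s bearer and that speech runs at the nominal AMR rate; the constant and function names are illustrative only.

```python
CHANNEL_KBITS = 64.0   # single circuit-switched bearer
AMR_KBITS = 12.2       # nominal AMR speech bitrate

def video_budget(usable_kbits, speech_kbits=AMR_KBITS):
    """Bitrate left for video once speech is accounted for (kbit/s)."""
    if usable_kbits > CHANNEL_KBITS:
        raise ValueError("usable bitrate cannot exceed the bearer bitrate")
    return max(usable_kbits - speech_kbits, 0.0)

if __name__ == "__main__":
    # 60 kbit/s is the upper bound for usable bandwidth quoted in the text.
    for usable in (60.0, 58.0, 56.0):
        print(f"usable {usable:.1f} kbit/s -> video {video_budget(usable):.1f} kbit/s")
```

With 60 kbit/s usable, about 47.8 kbit/s remains for video, which lies within the 43-48 kbit/s range cited above for the unidirectional case.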

[4] Also known as textual chat.

By employing rate control methods in the media encoders, the network can dynamically
change these bitrates depending on network conditions and application demand. When two
parties communicate simultaneously, the bitrates for the speech and video codec can be
reduced in the encoders of both parties, keeping the overall bitrate below 64 kbit/s. For
instance, when just one party shows speech activity, the speech bitrate for the other party can
be reduced to a minimum where only comfort noise is generated on the receiver side (Holma
and Toskala); AMR can perform these bitrate changes every 20ms. For video, the encoder
can reduce the average bitrate by either reducing the frame rate or simply dropping frames
during transmission. To increase the overall frame rate on the receiver side, the decoder can
employ H.263 temporal scalability.
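The bandwidth figures quoted in this section can be checked with a little arithmetic. The following Python sketch is only an illustration: the function name is hypothetical, and the 4 kbit/s overhead is an assumption chosen so that the 60 kbit/s mentioned above remain for the media streams.

```python
GROSS_KBITS = 64.0     # circuit-switched bearer rate named in the text
OVERHEAD_KBITS = 4.0   # assumption: leaves the ~60 kbit/s quoted in the text
AMR_KBITS = 12.2       # nominal AMR-NB speech bitrate


def video_budget(fps, speech_kbits=AMR_KBITS):
    """Return (video bitrate in kbit/s, average bits available per video frame)."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    video_kbits = GROSS_KBITS - OVERHEAD_KBITS - speech_kbits
    bits_per_frame = video_kbits * 1000 / fps
    return video_kbits, bits_per_frame


for fps in (10, 15):
    kbits, per_frame = video_budget(fps)
    print(f"{fps} fps: {kbits:.1f} kbit/s video, about {per_frame:.0f} bits per frame")
```

Under these assumptions the video stream gets 47.8 kbit/s, which lies inside the 43-48 kbit/s range cited above, and at 10 frames per second the encoder has fewer than 5,000 bits per frame on average.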
In 3G video telephony, the audio and video signals are bidirectionally streamed over
dedicated circuit-switched W-CDMA paths. Streaming means that media is continuously received
or sent and played back on a terminal. Non-conversational one-way audio or video streaming
requires a transport delay variation of below 2 s (3GPP). In contrast, two-way video telephony
introduces even higher real-time requirements with an end-to-end, one-way delay of below
150-400 ms⁵ (3GPP) to maintain a smooth conversation. The overall one-way delay in W-CDMA
networks⁶ is already approximately 100 ms, and it should be noted that in addition to the
transmission time, media generation time is required when delivering IVVR services. Due to
these tight delay requirements, there is no time for retransmission when transmission errors
are detected. Retransmission would reduce bit errors and consequently improve video quality,
but it would also add undesired delays when resending PDUs. Therefore, to avoid
retransmission, H.223 and the media codecs work hand in hand to detect errors, accomplish
resynchronisation, and perform error concealment.
3.1 Speech Codec
For audio coding, the AMR narrowband (NB) speech codec is used; it operates at a nominal
bitrate of 12.2 kbit/s⁷. The actual bitrate depends on network conditions and speech activity,
and it can switch every 20 ms, leading to a different average bitrate. AMR-NB was developed
to handle narrowband speech; that is, digitized speech with a sampling frequency of 8 kHz
(Etoh).
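To make the relation between bitrate and the 20 ms switching interval concrete, the following sketch computes how many bits each AMR-NB mode spends per speech frame. It is a minimal illustration; the constant and function names are hypothetical, and only the bitrates, the 20 ms interval and the 8 kHz sampling frequency come from the text.

```python
FRAME_MS = 20            # AMR can change its mode every 20 ms
SAMPLE_RATE_HZ = 8000    # narrowband sampling frequency

# AMR-NB modes in bit/s, as listed in the text and its footnote
AMR_NB_MODES = [4750, 5150, 5900, 6700, 7400, 7950, 10200, 12200]


def bits_per_frame(bitrate_bps):
    """Number of coded bits that represent one 20 ms speech frame."""
    return bitrate_bps * FRAME_MS // 1000


# Number of input samples that each coded frame has to represent
samples_per_frame = SAMPLE_RATE_HZ * FRAME_MS // 1000

for mode in AMR_NB_MODES:
    print(f"{mode / 1000:5.2f} kbit/s -> {bits_per_frame(mode)} bits per frame")
print(f"{samples_per_frame} samples per frame")
```

At the nominal 12.2 kbit/s mode, 160 speech samples are thus represented by 244 bits.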

⁵ Where <150 ms is preferred and the lip-synch delay should be <100 ms.
⁶ From the user equipment to the PLMN border.
⁷ Other operation modes are possible and include bitrates of 4.75, 5.15, 5.90, 6.70, 7.40, 7.95, and 10.2 kbit/s.

3.2 Video Codec
In video coding, a sequence of frames is compressed and converted into a bitstream that is
typically many times smaller than the data representing the frames. In H.263+ and other
typical video codecs, a frame may be coded in one of several modes: as an I-frame
(intra-coded frame), a P-frame (predicted frame), or a B-frame (bidirectionally predicted
frame). It should be highlighted that B-frames are unsuitable for conversational applications
because they create additional buffering delays (Etoh): the use of B-frames results in a
reordering of frames, making the display order and coding order different.
The 3G-324M standard recommends a video resolution of 176 by 144 pixels (QCIF) with a frame
rate of between 10 and 15 frames per second, encoded using H.263(+). H.263(+) operates in the
YCbCr colour space and uses 4:2:0 chroma subsampling (YUV420), which reduces the amount of
video data. For coding, a frame is usually split into macroblocks. A macroblock consists of
one block of 16 by 16 luma samples (Y) and two blocks of 8 by 8 blue-difference and
red-difference chroma samples (Cb and Cr).
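The frame geometry described above determines how much raw data the encoder has to compress. The sketch below works this out for QCIF; it is an illustration only, the function names are hypothetical, and 8 bits per sample is an assumption that the text does not state explicitly.

```python
WIDTH, HEIGHT = 176, 144   # QCIF resolution
MACROBLOCK = 16            # a macroblock covers 16 by 16 luma samples


def raw_frame_bytes(width, height):
    """Bytes per uncompressed 4:2:0 frame, assuming 8 bits per sample."""
    luma = width * height
    # Cb and Cr are each subsampled by two horizontally and vertically
    chroma = 2 * (width // 2) * (height // 2)
    return luma + chroma


def macroblocks_per_frame(width, height):
    """Number of 16 by 16 macroblocks needed to cover the frame."""
    return (width // MACROBLOCK) * (height // MACROBLOCK)


def raw_bitrate_kbits(width, height, fps):
    """Uncompressed video bitrate in kbit/s."""
    return raw_frame_bytes(width, height) * 8 * fps / 1000


print(raw_frame_bytes(WIDTH, HEIGHT), "bytes per raw frame")
print(macroblocks_per_frame(WIDTH, HEIGHT), "macroblocks per frame")
print(raw_bitrate_kbits(WIDTH, HEIGHT, 15), "kbit/s uncompressed at 15 fps")
```

A raw QCIF frame occupies 38,016 bytes in 99 macroblocks. At 15 frames per second this amounts to roughly 4.5 Mbit/s, so fitting the stream into a budget of about 48 kbit/s requires a compression ratio in the order of 95:1.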
The H.263+ decoder can attempt to conceal errors, but first the errors need to be detected.
In Subsection 2.2, we saw that the encoded media streams are multiplexed into an H.223
bitstream. This H.223 bitstream consists of a series of media packets called Adaptation Layer
Protocol Data Units (AL-PDUs). Using H.223 AL2⁸, the AL-PDUs can contain a control field
and a CRC checksum (NMS Communications) in addition to the payload. With the help of the
CRC, H.223 is capable of detecting errors in or the loss of AL-PDUs and then reporting these
to the H.263+ decoder.
However, errors that are not detected in H.223 are passed to the H.263+ decoder without
indication; the decoder can still detect errors by watching out for syntactic or semantic
violations of the bitstream (Hansen). When an error is detected, either by the transport
decoder (H.223) or by the video decoder (H.263+), the video decoder can decide to drop the
complete packet or try resynchronisation. In the H.263+ media stream, a resynchronisation
marker is inserted at the beginning of each GOB⁹ to help the decoder identify valid video
information after an error has occurred (Kwon and Driessen).
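The CRC-based error detection described above can be sketched in a few lines. This is not the H.223 implementation: the exact AL-PDU field layout and CRC polynomial are defined in ITU-T H.223 and are not reproduced here. The sketch assumes, purely for illustration, a packet made of a payload followed by one CRC byte, and a generic 8-bit CRC with the polynomial x^8 + x^2 + x + 1 (0x07); all function names are hypothetical.

```python
def crc8(data: bytes, poly: int = 0x07) -> int:
    """Bitwise CRC-8 over data (assumed polynomial 0x07, initial value 0)."""
    crc = 0
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 0x80:
                crc = ((crc << 1) ^ poly) & 0xFF
            else:
                crc = (crc << 1) & 0xFF
    return crc


def build_pdu(payload: bytes) -> bytes:
    """Sender side: append the checksum to the payload (assumed layout)."""
    return payload + bytes([crc8(payload)])


def check_pdu(pdu: bytes) -> bool:
    """Receiver side: True if the checksum matches the received payload."""
    if len(pdu) < 2:
        return False
    payload, received_crc = pdu[:-1], pdu[-1]
    return crc8(payload) == received_crc


pdu = build_pdu(b"video slice")
corrupted = bytes([pdu[0] ^ 0x01]) + pdu[1:]   # flip one bit in transit
print(check_pdu(pdu), check_pdu(corrupted))
```

A receiver built this way would hand the payload to the video decoder together with the result of the check, so that the decoder knows when to start concealment instead of decoding corrupted data.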


⁸ H.223 Adaptation Layer 2.
⁹ In H.263, it was possible to place a sync marker only at the beginning of each picture.


Source: Warren Miller movie trailer encoded to QCIF H.263 with 64 kbit/s using Real Mobile Producer and streamed in the PS
domain over RTSP.
Figure 3. Visual distortion of an H.263 stream due to transmission errors (with error concealment).
When an error is detected, error concealment rather than retransmission is performed. Error
concealment aims to hide visual artifacts and residual errors: spatial interpolation and
temporal motion estimation are performed to hide the visual distortion. The effects of
transmission errors and error concealment measures can be seen in Figure 3.
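As a rough illustration of temporal concealment, the following sketch replaces damaged blocks with the co-located blocks of the previous frame. This is the simplest possible variant, equivalent to assuming a motion vector of zero; the concealment with motion estimation mentioned above is more elaborate. Frames are modelled as plain lists of luma rows, and the function name and the (column, row) block addressing are hypothetical.

```python
def conceal_temporal(current, previous, lost_blocks, block=16):
    """Return a copy of `current` in which every lost block is replaced
    by the co-located block of `previous` (zero-motion concealment)."""
    repaired = [row[:] for row in current]
    for bx, by in lost_blocks:
        y_end = min((by + 1) * block, len(repaired))
        for y in range(by * block, y_end):
            x_end = min((bx + 1) * block, len(repaired[y]))
            for x in range(bx * block, x_end):
                repaired[y][x] = previous[y][x]
    return repaired


# Tiny 32 by 32 example: a uniform grey frame whose top-right block was lost
previous = [[100] * 32 for _ in range(32)]
current = [row[:] for row in previous]
for y in range(16):
    for x in range(16, 32):
        current[y][x] = 0          # damaged area

repaired = conceal_temporal(current, previous, [(1, 0)])
print(repaired == previous)
```

For static or slowly changing content, such as the slides of a menu, copying from the previous frame hides the error almost completely; for fast motion the copied block no longer matches its surroundings and the distortion remains visible.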
3.3 Interaction Delay
To assess the interactivity of IVVR applications later in this thesis and in further studies, it
should be explained how different delays in an IVVR system add together and create the
overall interaction delay d:
$d = d_t + d_n + d_c + d_g + d_{nt} + d_{tt}$

where $d_t$ is the transmission delay of the 3G network from the mobile station (MS) to the
PLMN border. The delay caused by transition from one network to another is represented
by $d_n$; this is mainly the transition from a circuit-switched network (3G CS or PSTN) to a
packet-switched network (IP). This transition is obvious in the following scenario:
The dialled number is routed over the PSTN to a PRI in a datacentre and answered by a 3G-
324M Gateway directly connected by a digital telephony card to the PRI line. In this case, it
is assumed that the path between the MS to the answering gateway is completely circuit-
switched. When the gateway routes the video call over an IP-based LAN to an IVVR
Application Server there is a transition from the CS to the PS domain. This transition causes
delays because the continuous bitstream needs to be wrapped in packets to transmit the data
over a LAN.
On the other hand, $d_n$ can also describe delays caused by network transitions between the
PLMN border and an E1 line. Nowadays, to reduce costs, parts of the PSTN are already
replaced by NGN that use the PS domain to interconnect circuit-switched networks for voice
and video communication (VocalTec).
$d_c$ describes the delay caused on an application server to control the flow of program
execution depending on user input.
$d_g$ is the delay caused by dynamic generation of an audio and video stream to be transmitted
and displayed on the MS.
Similar to $d_n$, the variable $d_{nt}$ describes delays caused when sending media from an IVVR
Application Server to the MS. The transition from the PS domain (LAN) to the CS domain
(E1 line) particularly causes delays due to packet delay variation. Packet delay variation,
also known as delay jitter, occurs when there is a packet jam on the transition from the PS to
the CS domain. Packets need to be unwrapped and put into a continuous bitstream. When the
available bandwidth in the CS domain is smaller than the bandwidth in the PS domain
(bottleneck), packets jam and a jitter buffer is used to ensure that data is continuously played
out to the CS domain (Wikipedia contributors).
$d_{tt}$ is similar to $d_t$ in that it describes the delay for receiving the bitstream on the MS caused
during transmission between the PLMN border and the MS over the W-CDMA air interface.
The author's experiments, using a configuration similar to the one described in Section 9:
Technical Foundation for IVVR Games and evaluating IVVR services from the companies listed in
Appendix B, showed that a typical value for the overall interaction delay d is between 0.8 and
1 second. Although this could be further optimized to get near the theoretical limit of
200 ms, this is the reference value for further studies, especially for selecting and
developing appropriate game concepts as described in Section 10: Appropriate Game Concepts.
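The way the six delay components add up can be illustrated with a small delay budget. Only the uplink transmission delay of roughly 100 ms is taken from the text; all other numbers, the component names and the function names are hypothetical values chosen so that the total falls into the measured range of 0.8 to 1 second. They are not measurements.

```python
# Delay components in milliseconds, in the order described above.
# Only the first value is based on the text; the rest are assumptions.
DELAY_BUDGET_MS = {
    "uplink_transmission": 100,          # MS to PLMN border (approx., from the text)
    "uplink_network_transition": 50,     # CS to PS, e.g. gateway to LAN (assumed)
    "application_control": 100,          # program flow on the server (assumed)
    "media_generation": 450,             # rendering and encoding (assumed)
    "downlink_network_transition": 100,  # PS to CS incl. jitter buffer (assumed)
    "downlink_transmission": 100,        # PLMN border to MS (assumed)
}


def interaction_delay(components_ms):
    """Overall interaction delay as the sum of all components."""
    return sum(components_ms.values())


def largest_contributor(components_ms):
    """Name of the component that contributes most to the delay."""
    return max(components_ms, key=components_ms.get)


total = interaction_delay(DELAY_BUDGET_MS)
print(f"overall interaction delay: {total} ms")
print(f"largest contributor: {largest_contributor(DELAY_BUDGET_MS)}")
```

A budget of this kind shows where optimisation pays off: the two air-interface terms are fixed by the network, so any reduction has to come from the server-side and transition components.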
3.4 Side-Effects on the User Experience
Media compression, error concealment measures (see above), and the characteristics of
wireless networks (see Subsection 1.1) have side effects on the quality of 3G video telephony
and IVVR applications.
3G-324M requires only the use of speech codecs. In contrast to audio codecs, speech codecs
are designed for speech transmission within a narrow frequency range, making them
inappropriate for transmission of music or a range of artificial sounds. This fact needs to be
considered when designing IVVR applications, especially games, as most games utilize
music and sound effects to create an immersive atmosphere.
H.263 and MPEG-4 Part 2 baseline were designed for images of natural scenes with
predominantly low-frequency components, meaning that the colour values of spatially and
temporally adjacent pixels vary smoothly except in regions with sharp edges. In addition,
human eyes can tolerate more distortion of high-frequency components than of low-frequency
components (Kwon and Driessen). Following the explanation of Kwon and Driessen, video codecs
used for 3G-324M video telephony are well suited to natural scenes and talking-head
scenarios. Depending on the type of IVVR application, these characteristics can work
against a good user experience.
Typical desktop or web applications have a monochromatic user-interface with boxes,
buttons, and fonts that are clearly readable. Based on user interaction, the user interface can
change its appearance frequently, perhaps only for some parts of the user interface or perhaps
the whole screen. It is obvious that codecs used for 3G-324M video telephony are unsuitable
for this kind of video transmission. Compressing such user interfaces with H.263 creates
blurred fonts and tattered buttons and lines, leading to a user interface too distorted for a
good user experience. In addition, the comparatively high round-trip delays can make
interaction tedious for interfaces that require a high rate of user interaction and screen changes.
Depending on the type of game, the compression characteristics of video codecs used in
3G-324M can be advantageous. Contemporary 3D games such as first-person shooters or
simulation games try to model the game environment as realistically as possible, creating
natural-looking scenes and making them appropriate for compression using video codecs
defined for 3G-324M.
However, the delay requirements that are essential for a good gameplay experience are more
problematic for mobile games. 3GPP defines a delay variation of below 75 ms for real-time
games¹⁰ and considers first-person shooters the most demanding ones with respect to delay
requirements (3GPP). Other types of games, such as turn-based strategy games or visual
novels, may tolerate a higher end-to-end delay and may require lower data rates.

¹⁰ In my opinion, 75 ms is dedicated to network delays in multiplayer scenarios; interaction delay should
probably be a lot lower.

4 Interactive Voice and Video Response
Interactive voice and video response (IVVR) is a mobile technology that enables interactive
services based on a 3G video call (Ugunduzi Ltd.). IVVR uses the 3G video call
technology to deliver applications and services to 3G users.

Figure 4. Person-to-person 3G video telephony

Figure 5. Person-to-application video telephony (IVVR)
In contrast to person-to-person video telephony (Figure 4), an application server answers when a
3G user places a video call. The application server generates and transmits an audio and
video stream that is shown on the handset of the caller (Figure 5).
Instead of a real person or scene, the application server can transmit any kind of audio and
video, whether a pre-recorded talking head, a movie trailer, a live feed from TV, a video
showing traffic information, or even the order process for electronic shopping. As 3G video
calls are generally bidirectional, the user is automatically sending the signals of the
handset's built-in camera and microphone to the other party. This paves the way for
interactive applications based on gesture and speech recognition. Another, simpler way to
realise interaction is by processing DTMF signals sent with the 3G-324M session when the
caller uses the handset's numpad to type.
The term IVVR is derived from interactive voice response (IVR). IVR is an interactive
content-to-person service and the predecessor technology of IVVR. IVR allows callers to
retrieve information, make bookings, and get connected with a contact based on the caller's
selection. IVR applications are written mostly in VoiceXML to describe a simple series of
voice menus where the caller chooses from a given selection of options or makes spoken
requests when the IVR system supports natural language speech recognition. Based on the
caller's selection and the purpose of the IVR application, the system plays pre-recorded
audio clips, dynamically generates speech, or connects the caller with the person in charge.
Although IVR systems are flexible, used extensively, and accessible through most mobile and
landline phones, they show some limitations. The IVR system can respond only in the auditive
dimension; complex menus need to be broken down to limited choices. When these choices are
nested, it can be tedious for callers to dig through a complex voice menu or remember the
series of choices. Voice-only communication also excludes people with speech impairments
from using IVR.
Simply said, IVVR is based on video telephony and adds a visual dimension to IVR, enabling
service providers to create new services that take advantage of media streaming capabilities.
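The nested menus described above can be modelled as a small tree that is traversed with the DTMF digits the caller presses. The sketch below is a minimal illustration in Python rather than VoiceXML; the menu contents, the dictionary layout and the function name are hypothetical. In an IVVR service, each prompt would be rendered as a slide in the video stream in addition to being spoken.

```python
# Hypothetical menu tree: every node has a prompt and further options
MENU = {
    "prompt": "Main menu: 1 Weather, 2 Traffic",
    "options": {
        "1": {
            "prompt": "Weather: 1 Today, 2 Tomorrow",
            "options": {
                "1": {"prompt": "Playing today's forecast", "options": {}},
                "2": {"prompt": "Playing tomorrow's forecast", "options": {}},
            },
        },
        "2": {"prompt": "Playing traffic information", "options": {}},
    },
}


def navigate(menu, digits):
    """Follow a sequence of DTMF digits and return the resulting prompt.

    An unknown digit keeps the caller in the current menu, so the same
    prompt is repeated.
    """
    node = menu
    for digit in digits:
        node = node["options"].get(digit, node)
    return node["prompt"]


print(navigate(MENU, ""))     # the caller has not pressed anything yet
print(navigate(MENU, "12"))   # Weather, then Tomorrow
```

With a voice-only menu, the caller has to listen to each prompt in full before choosing; showing the same node as a slide lets the caller read all options at once.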
4.1 IVVR Applications
IVVR applications use the IVVR mobile technology to create services based on 3G video
calling. I have identified various types of IVVR applications, explained in the following
subsections.
4.2 IVR Supplements
IVR Supplements are dialog-based IVVR applications founded on the notion of typical IVR
applications for call dispatching or information services. They add the visual dimension to
IVR applications by showing slides as a graphical representation of an IVR voice menu on
the phone screen. This increases accessibility for people with speech impairments, makes it
possible to interact with IVR where listening might not be desired (such as in business
meetings or during a lecture), and accelerates perception of options and information.
Humans receive 80% of information by seeing, but only 15% by hearing and 5% by feeling
(Dahm). Additionally, seeing is a non-linear process by which people can perceive
information simultaneously when skimming a text, for instance. In contrast, when the focus is
on the transmission of facts without emotions, hearing is a linear process where the listener
must wait until the speaker or recording has finished. The same information might be faster to
transmit using written text or visualisations. Because humans can perceive information
visually faster than auditively, IVVR takes advantage of this fact by using the handset's
display to present information.
Moreover, an IVVR application can use the video transmission capabilities as seen in the
following example (Figure 6). Customers looking for repair service can let the system detect
their product by taking a snapshot of the product's barcode.


Figure 6. Example of IVVR supplement application for customer care with barcode recognition
4.3 Information Portals
Information Portals serve the purpose of making information such as a weather forecast
accessible through a 3G video call. Of course, IVVR information portals compete with
mobile web browsing. A huge advantage of 3G video calling is the easy access to
information: Users do not need a mobile web browser or a data plan for accessing this
information. IVVR is based on the 3G video call, which is a W-CDMA circuit-switched service;
therefore, access is charged the same way as voice calling, on a per-minute basis¹¹. Moreover,
IVVR applications are optimized for display on mobile devices, and there is no need for device
porting, as all 3G-324M compatible multimedia terminals follow the same standards, for
example, for video resolution and proportions.
4.4 Video on Demand
Video on demand (VoD) is a pull-based type of communication where people dial a number
and access videos on demand over a 3G video call. Video telephony has a built-in content
protection mechanism: No data is stored on the terminal, and in closed systems such as cell
phones, there is no possibility of recording of copyrighted material.
4.5 Mobile TV
Mobile TV is the use of 3G video telephony to receive TV signals. TV on-the-go is promising
but suffers from high costs or the need for new mobile phones with DVB-H or DMB

¹¹ Although prices may differ compared to voice calling, toll-free numbers and premium numbers are possible.

interfaces. Today, 3G video telephony can make live TV available on phones as well as
affordable with the right call plan¹².
4.6 Video Sharing
Video Sharing makes use of the built-in microphone and camera by recording video over an
ongoing video call on the server side, making it available to a video community. Combined
with a menu similar to an IVR menu, community members can select the videos they want to
watch. Should a larger number of videos be accessible (that is, more than 8), navigation
with numbers (using numbers 0-9 from the handset's numpad) is tedious. Text entry is more
flexible and can be achieved by typing text messages with multi-tap text entry.
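Multi-tap text entry can be decoded on the server from the sequence of key presses it receives. The following sketch shows the idea under simplifying assumptions: the presses arrive as a string in which a space stands for the pause between two letters, the usual letter layout of a phone keypad is used, and key 0 produces a space. The function name and the input format are hypothetical.

```python
# Common letter layout of a 12-key phone keypad; "0" is assumed to be a space
KEYPAD = {
    "2": "abc", "3": "def", "4": "ghi", "5": "jkl",
    "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz",
    "0": " ",
}


def decode_multitap(presses):
    """Decode multi-tap input such as "44 33 555 555 666" into text.

    Each space-separated group consists of repeated presses of one key;
    the number of presses selects the letter on that key and wraps around.
    """
    text = []
    for group in presses.split():
        key = group[0]
        if key not in KEYPAD or any(ch != key for ch in group):
            raise ValueError(f"invalid key group: {group!r}")
        letters = KEYPAD[key]
        text.append(letters[(len(group) - 1) % len(letters)])
    return "".join(text)


print(decode_multitap("44 33 555 555 666"))
```

In a real service the pauses would have to be derived from the timing of the received DTMF signals, which the interaction delay discussed in 3.3 makes less precise than on a local keypad.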
4.7 P2P Video Avatar
P2P Video Avatar is an idea to counter the resistance in user acceptance of classic
person-to-person video telephony (for further explanation, see 7 Considerations about Mobile
Video Telephony). The idea is to apply a dynamic overlay before the calling party's video is
displayed on the handset. A mask or avatar can be superimposed on the caller's video. It
should be mentioned that a P2P Avatar is not a real IVVR application but an enhancement of
mobile person-to-person video telephony.
4.8 3G-to-IP
3G-to-IP bridges a video call between the wireless, circuit-switched domain and a
(predominantly) wired, packet-switched domain. Connecting 3G and IP users is a
challenge. 3G-324M was designed for circuit-switched networks; fortunately, the media
codecs are generally media independent. In the PS domain, H.323 is used for video
telephony. Recent VoIP activities have led to the use of SIP for call set-up and RTP for media
transport. These protocols are widespread, and their use is similar to that of H.245 and
H.223 in the 3G-324M domain. A typical 3G-to-IP gateway would extract media streams from the
H.223 bitstream and encapsulate them into an RTP stream after performing call set-up employing
SIP. Like P2P Avatars, 3G-to-IP is not a genuine IVVR application but a gateway service.
4.9 3G-to-TV
3G-to-TV is a technology that enables innovative TV formats such as live newscasts or
participatory TV. The audio and video signal sent by a 3G participant is transmitted to a
broadcasting studio to be shown on TV (Mirial s.u.r.l.).

¹² Vodafone Germany offers unlimited voice and video calls as part of their SuperFlat plans.

4.10 Mobile Banking
Mobile Banking is well suited to be performed using 3G video telephony. Security
requirements for mobile banking are exceptionally high and need to be considered along with
the desire for global mobility. 3G video telephony is a streaming service; no data is stored on
the network or the terminals. When accessing an account, for example, the account statement
is not received as ASCII characters but as an encoded video stream, making eavesdropping
even harder. In contrast to HTTP services where no transport layer security is mandatory,
communication on UMTS networks is secure from within, thanks to encryption between the
MS and BTS using procedures stored in the USIM (Schiller). The security of mobile banking
services can also benefit from the fact that the handset's audio and video signals are available
for identification and authentication purposes. By combining PIN authentication with
identification based on biometrics (such as the account holder's voice and facial features), a
banking service would be more secure than typical online banking with PIN-only
authentication. This could be done without the need for additional hardware such as
smartcard readers.
4.11 Mobile Games
Mobile Games using IVVR technology are games that are instantly streamed to the player's
phone and played using various interaction possibilities such as keystrokes, speech
commands, or the use of the handset's built-in camera for object, symbol, or gesture
recognition. This enables any 3G camera phone to be used as a game console, without the
need to download or install additional software. IVVR is well suited for games that do not
require fast interaction and for casual games played on a per-session basis. Game developers
do not need to worry about device porting or system requirements concerning graphics cards,
for instance, as 3G-324M is a well-defined and widely deployed standard in 3G networks and
devices. Moreover, billing is intuitive for gamers as it works on a per-minute basis with
premium numbers, standard phone numbers, and even toll-free numbers. However, despite all
these advantages, there are some drawbacks to using today's circuit-switched 3G video
telephony for mobile games. Game designers need to cope with these limitations, circumvent
them, or, better, create games that exploit the possibilities of 3G video calling and cope
with its disadvantages at the same time. Section 10 features game concepts suitable for
IVVR.

5 IVVR Game Santa Claus Sleigh Ride
In 2007, CreaLog GmbH, located in Munich, Germany, developed the example IVVR
game Santa Claus Sleigh Ride. The aim of the game is to steer the reindeer sleigh to the
North Pole by using speech commands or keystrokes. The game is constructed from a series
of pre-recorded video clips showing Santa Claus steering either to the right or the left,
played back depending on the caller's commands. Although this game cannot be considered a
real-time interactive game and is technically close to an IVR supplement, it perfectly
illustrates how to cope with the various limitations of 3G video calling.


Figure 7 shows a number of screenshots from the game, which make clear that the game is
adapted to the characteristics of IVVR technology. It copes with the high delays by simply
reducing the interaction rate to a minimum. After a 3G video call is placed, video sequences
start automatically, explaining how to play the game and how to steer the reindeer sleigh to
the North Pole. The only caller interaction required is to say "left" or "right", or to press
the number keys 4 or 6, as seen in the second screenshot. During my testing sessions, the
speech recognition misinterpreted my voice commands approximately one-third of the time. The
response delay after a keystroke was about 1 second before the next video sequence was played
back (for an in-depth explanation of the interaction delay in IVVR applications, see 3.3
Interaction Delay).
Figure 7. IVVR game Santa Claus from CreaLog GmbH
In Subsection 3.4, I have explained that media codecs used in 3G-324M are targeted at
speech communication and talking-head scenarios. CreaLog's game handles this limitation by
having no sound effects but rather a voiceover technique where somebody playing Santa
gives hilarious comments about the player's decisions. Furthermore, the game uses a 3D
animated environment consisting of colourful gradients with few sharp edges
that, unsurprisingly, compresses well using video codecs aimed at the compression of
natural scenes, creating a decent visual quality for this mobile game.
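A game built from pre-recorded clips, as described in this section, essentially maps each caller command to the next clip to play. The sketch below illustrates that idea only; it is not CreaLog's implementation, and the clip file names, the command table and the function names are all hypothetical.

```python
# Speech results and DTMF digits are mapped to the same two directions
COMMANDS = {"4": "left", "left": "left", "6": "right", "right": "right"}

# Hypothetical names of the pre-recorded clips
CLIPS = {"left": "steer_left.3gp", "right": "steer_right.3gp"}
INTRO_CLIP = "intro.3gp"
HELP_CLIP = "instructions.3gp"


def next_clip(user_input):
    """Select the clip to play for one recognised command or keystroke."""
    direction = COMMANDS.get(user_input.strip().lower())
    if direction is None:
        return HELP_CLIP     # unknown input: explain the controls again
    return CLIPS[direction]


def play_session(inputs):
    """Return the list of clips played during one call."""
    return [INTRO_CLIP] + [next_clip(item) for item in inputs]


print(play_session(["left", "6", "jump"]))
```

Because every reaction is a complete clip, the game needs only one decision per clip, which keeps the interaction rate low enough to hide a response delay of about one second.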
6 Classification of Mobile Games over 3G Video Calling
This section introduces a number of concepts and approaches that help to clearly define
Mobile Games over 3G Video Calling. Knowing these concepts helps stakeholders in the
game development process to describe various types of games based on 3G video telephony
by using the same vocabulary.
2G networks were originally designed for efficient delivery of voice services. Not until the
spread of 3G networks was the foundation for circuit-switched and packet-switched
multimedia and data services set up. Moreover, the UMTS network was designed for flexible
delivery of any type of service, making a great deal of various services possible. Video
telephony is such a service; however, thanks to the future-proof approach of the
standardisation bodies ITU-T, 3GPP and UMTS Forum, the technical basis of video
telephony in UMTS networks is not limited only to conversational person-to-person services.
This basic idea enables developers and service providers to create new services, such as
IVVR, based on an existing and well-deployed foundation. In the following, I reference
service and application definitions from standardisation bodies and other
literature to help classify and define IVVR, especially Mobile Games over 3G Video Calling.
6.1 Thin-Clients and Gaming Terminals
The architectural foundation for IVVR services is a simple client-server system. The
handset, also known as the cell phone, smartphone, PDA, mobile station (MS), 3G-324M
multimedia terminal, handheld game console, or gaming terminal, is the client. The server
system in this client-server architecture is, more specifically, the combination of 3G-324M
Gateway/PBX and IVVR application server. The 3G-324M multimedia
terminal is only a device that sends and receives media streams. It runs simple presentation
software that plays back incoming media, and it transmits sound, video, and keystrokes to the
server. A 3G-324M multimedia terminal does not perform any application processing,
persistent application data storage, or even graphics rendering, making it an extremely
thin client (Sommerville). When using the 3G-324M multimedia terminal for gaming purposes, it
should be called a gaming terminal to be easily understood and identified by users.
6.2 Mobile Game Streaming
The UMTS Forum has identified the following services (Hansen) (UMTS Forum):
Table 2 Various UMTS Services from User Point of View
Information and M-Commerce | Education | Entertainment | Business and Financial | Communication | Telematics and Special
Web-Browsing | Virtual School | Audio video on demand | Mobile banking | Video telephony | Telemedicine
Interactive Shopping | Online Library | Gaming on demand | Online billing | Video conferencing | Security monitoring
 | Remote Training | Live Streaming and Interactive TV | Mobile payment | Interactive Voice Response | Office extension

Gaming on demand (GoD) is similar to audio and video on demand where a user pays only
for a game when he or she likes to play, without buying or downloading the full game.
According to Table 2, Mobile Games over 3G Video Calling is a combination of GoD and
video telephony. In contrast to progressive downloading of media or games in on-demand
cases, the use of video telephony as a streaming service makes it possible to stream a game to
the gaming terminal¹³ while playing it, excluding any wait for delivery. I introduce the new
term mobile game streaming for this: the game is streamed instantly to the user's handset
over a standard 3G video call, without additional software or UMTS network components. The
idea of game streaming is already present and in use in the
wired and stationary world. Tenomichi offers StreamMyGame, a service for broadband
Internet users. Members of StreamMyGame can access and play their games remotely via
broadband or share their games with other members. To provide this service, special software
interconnects the computers hosting games (server) and computers used to play games
(client). Similar to mobile game streaming using the 3G-324M protocol, the game graphics and
sound are streamed to the client, and input from the client's peripherals is sent to the server
(Tenomichi). Another existing approach, application streaming, is currently defined
differently and is more similar to application virtualisation, as it focuses on streaming the
application logic and data to a client for stepwise execution rather than streaming the
application's visual output to the client. However, StreamMyGame technology can also be
used to stream applications to clients, following the notion of game streaming.

¹³ In this case, a 3G-324M multimedia terminal or the user's mobile phone.
6.3 Person-to-Application
The service classification in Table 2 and my concept of mobile game streaming were created
from a user point of view, as they highlight how users can understand this new service based on
the ideas of services they already know. However, the process-centred classification of
possible UMTS services from Holma et al. (11-30) is created from the network point of view
and emphasises the involved types of communication partners. The table below is based on
the approach from Holma et al.:
Table 3 UMTS Services from Network Point of View
Service | Technology | Connection | Parties | Examples
Person-to-Person Circuit-Switched Services | CS | Peer-to-peer (or with intermediate server) | Two persons or groups | Voice calling; Video calling; Video conferencing
Person-to-Person Packet-Switched Services¹⁴ | PS | Peer-to-peer (or with intermediate server) | Two persons or groups | Multimedia Messages (MMS); Real-time video sharing; Push-to-talk over Cellular (PoC); Voice over IP (VoIP); Multiplayer games
Content-to-Person Services | CS / PS | Client-server | Content server and receiving user | Web browsing; Video on demand; Live streaming; Content download; Multimedia Broadcast Multicast Service (MBMS)
Business Connectivity | PS | Multimodal | Laptop to Internet or Intranet | Web browsing; E-mail; Secure access to corporate Intranet

¹⁴ CS services can later be provided through PS services, which opens up more service possibilities due to
higher bandwidth.

In reference to Table 3, we can derive the following types of video calls, seen from the
network point of view:
Peer-to-peer (P2P) video calling is classic video calling where one person uses a multimedia
terminal to communicate with another person who is also using a multimedia terminal. The
P2P approach is independent of the type of network. Should a video call be set up
between parties in different networks (wired/wireless or CS/PS domain) or using different
communication protocols (3G-324M/H.324 or proprietary protocols such as Skype), there is
the need for a gateway interconnecting these networks, translating between the different
protocols, and sometimes transcoding the media streams to different formats when both parties
cannot negotiate a common set of media codecs (cross-media adaptation) (Basso and Kalva).
Peer-to-peer multi-point video calling, better known as video conferencing, is where
multiple participants are connected for a multi-point voice and video conversation. To
interconnect the various parties, a Multi-point Control Unit (MCU) is used, and the feeds
from the participants are displayed side by side on the handset at the same time. To achieve
this, either separate logical channels are opened for each participant or the video multiplex
mode in H.263 can be used. The multiplex mode can display up to four different video
sub-bitstreams, sent within the same video channel (ITU-T).
Furthermore, media adaptation and transcoding issues need to be considered when
interconnecting parties with different terminals and connections. For example, if one party is
using a low bitrate communication terminal (3G-324M handset) and all other parties are
using broadband Internet connections with high-resolution web cameras, the video streams
from the broadband users need to be transcoded into a lower bitrate version so they can be
transmitted to the handset user. This is done by employing H.263 spatial scalability for
adapting the media to varying display and bandwidth requirements or constraints.
Person-to-application video calling is based on the notion of content-to-person services,
introduced in Table 3, but focuses on video calling as a way to access applications in the sense
of on-demand or application service providing (ASP) mixed with the thin-client approach.
Person-to-application is a main concept behind the author's understanding of IVVR
technology as described in Section 4.
Peer-to-peer over Application (P2PoA) video calling is a mixture of peer-to-peer, P2P multi-
point, and person-to-application video calling. Its notion is to enable conversational services
that connect people with each other in a more dynamic way than classic peer-to-peer video
calling does. Examples are video dating, online conferences, ad-hoc groups, or group
coordination similar to Push to Talk over Cellular (PoC). Using P2PoA, callers use an IVVR
application to find and select a person or a group with whom they would like to
communicate, and then initiate a conversation without placing a new call but rather letting a
special MCU or gateway server interconnect them.
6.4 Direction, Interaction and Conversation
Atul and Tsuhan take the dimensions of direction, interaction and conversation into account
when assessing mobile services (51). Based on their approach, these dimensions are
discussed with the focus on IVVR applications and mobile games.
The direction of communication of an IVVR application or game can be either unidirectional
or bidirectional. Even though in a 3G-324M session both parties can automatically send and
receive their media streams, it does not indicate that the other party will consider or process
the incoming media stream. The unidirectional case can have two variations: On one hand,
the mobile terminal can play-back incoming media streams; examples are security
monitoring, live TV, or traffic surveillance. On the other hand, the mobile terminal is sending
a media stream to another mobile user, an application server, or gateway to other networks;
examples are sending of videos or photos to media-sharing communities or 3G to TV live
linking if no satellite connection or professional equipment is on site. However, the
dimension of direction is hardly useful to describe mobile games, as it depends on the point
of view what direction a game takes. Consider a single-player game over a 3G video call. The
gaming server generates the game and streams it to the player's handset, making it a
predominantly unidirectional communication. However, the player controls the game by
sending DTMF tones, making the communication, in a sense, bidirectional.
The idea of interaction has generally two extremes: An application can be interactive or non-
interactive. By definition, video games are always interactive; however, the rate of interaction
heavily depends on the type of game. Therefore, we need a graduated definition of interaction.
There are slow-paced games with a minimum rate of interaction, and there are games that
require a high amount of fast interaction. Games that require a high rate of interaction are
generally real-time strategy games, shooters, or real-time sports simulations. An example of
a game requiring a minimum amount of interaction was covered in Section 5.

In this context, the term conversational always refers to communication among humans; a
service enabling a human to communicate with a weak or even strong AI is not considered
conversational. We have already learned that the focus
of 3G-324M is to provide conversational services. In contrast to video telephony and video
conferencing, which are typically conversational, IVVR applications and games are not
always conversational.
Most IVVR applications listed in Section 4, such as IVR Supplements, Mobile TV or
Mobile Banking, are non-conversational, as the caller does not hold a conversation with
another person. Typical conversational IVVR applications include P2P Video Avatar and
certain types of mobile games. It is not part of this thesis to discuss whether playing a
multiplayer game is already some type of communication among the players; it is presumed
that a multiplayer game is not automatically conversational just because the players could use
game objects as a medium of communication.
A mobile game can have a conversational character when players use text, voice, or even
video chatting. Using such a communication channel can be helpful when teammates need to
coordinate their activities in first-person shooters or when a group of players needs to develop a
strategy for defeating a challenge in a dungeon of an MMORPG.
6.5 All-IP Approach
In reference to the UMTS service classification from Etoh, IVVR is a service of the circuit
teleservices group, as it operates in the CS domain and consists of simple video calls. This
classification will become obsolete once the IMS has erased the distinction between
PS and CS; video telephony, like any other UMTS application and service, will be based on
the IMS, generally using an all-IP approach. Issues such as the need for cross-media
adaption, transcoding or gateways interconnecting different types of networks (introduced in
the subsections above) will become less important when all devices use the same IP bearer
technology.
6.6 Mobile Games over 3G Video Calling Defined
According to the definitions, concepts and approaches above, Mobile Games over 3G Video
Calling can be defined as follows:


Mobile Games over 3G Video Calling is an interactive person-to-application IVVR service
and describes video games played on a 3G handset by establishing a simple 3G video call. By
accessing and playing the game over a 3G video call, the handset turns into a thin gaming
terminal. The game itself is processed, and its sound and graphics are generated on a remote
gaming server. These mobile games can be controlled with the terminal's built-in camera,
microphone, or keypad while the game graphics and sounds are streamed to the gaming
terminal. The concept of Mobile Games over 3G Video Calling exploits 3G-324M
technology for circuit-switched conversational multimedia services (also known as video
telephony or video conferencing) of today's 3G infrastructure to create new types of services
with existing technology. Network operators and standardisation bodies are working on the
4G technology IP Multimedia Subsystem (IMS) that will replace, among others, the circuit-
switched video telephony service with an all-IP version. This evolution will counter
limitations and drawbacks of the current bearer technology, opening up more possibilities for
streamed games.

Chapter 2 Usability and Design Opportunities of IVVR
Usability studies for IVVR face challenges in two major stages in the foreseeable evolution
of IVVR to Mobile Rich Media. In the first stage, designers, information architects, and
usability engineers need to cope with the limitations of the current bearer technology 3G-
324M and its employed media codecs.
In Chapter 1, we saw that W-CDMA networks are inherently capricious. They provide an
inconstant and unreliable bandwidth, face round-trip delays of around 200 ms, and suffer from
bit errors and mobile noise. The effects of transmission errors were impressively shown in
Figure 3. Codecs used for 3G-324M communication take countermeasures to lessen the
effects of wireless characteristics such as high delays, poor video quality with limited frame
rates, and quality problems when fast screen changes occur. However, there can still be
negative impacts on the user experience. By applying good practices, the impact of those
effects can be lessened, and designers can create easy-to-use interfaces and applications. 3G-
324M video telephony restrictions not only need to be considered when making decisions on
the visual design but also on the logical design of IVVR applications. On the other hand, 3G video
calling offers new possibilities to create valuable services by exploiting its flexibility,
simplicity, and support for alternative interaction concepts.
Even when H.264/MPEG-4 AVC becomes available as a video codec for 3G-324M-based
video telephony, it will not boost the visual experience, as H.264, like H.263, was designed for
natural scenes [15]. Only in the second stage can IVVR applications benefit from new bearer
technologies, codecs and multimedia frameworks. The second stage will take off when the
IMS is deployed and new media codecs from the MPEG-4 suite of standards are
available and in use. MPEG-4 Part 20 in particular, also known as MPEG-4 Lightweight
Application Scene Representation (LASeR), is aimed at the delivery of rich media
applications to handset users (LASeR Interest Group). In this upcoming stage, usability
engineers will have a flexible framework for the creation and delivery of IVVR applications.
Most probably, the acronym IVVR will have faded away by that time, replaced by Mobile
Rich Media to give a catchier term to users, managers, and marketeers.
This chapter focuses on today's usability challenges of IVVR that might also be useful for
Mobile Rich Media applications in NGN.

[15] Moreover, baseline H.264 has no improvements over H.263.

7 Considerations about Mobile Video Telephony
Even with a great deal of marketing, early attempts to convert users to the video telephony
technology flopped (Jones and Marsden). In contrast, desktop video conferencing is
incredibly popular for private person-to-person conversations and widely used for video
conferencing in business environments such as telepresence for computer-supported
cooperative work (CSCW) (Kleinen).
In desktop video conferencing scenarios, typically a stationary computer is used. Camera and
microphone are fixed and usually maintain the same distance from the person participating
during the conversation. Moreover, lighting conditions are generally better than on the go,
as a desktop is easier to illuminate correctly than a scene in the mobile environment. When
performing mobile video telephony, lighting conditions change over time when the caller
moves or the environment changes; moreover, the camera is usually not fixed. During mobile
video telephony, the caller is likely to hold the handset in front of his face by extending his
arm, making the video shaky. In combination with the meagre bandwidth and low-resolution
video, this can considerably degrade the video quality shown on the callee's side. These
considerations about the video quality problems in the mobile environment also play a major
role in IVVR applications that take advantage of the instant video streaming capabilities that
3G-324M video telephony offers. Bad video quality negatively influences camera-based
games, gesture recognition, or P2P services that intentionally change the video for dynamic
video overlays such as for the P2P Avatar, because motion analysis algorithms perform better
with a sharp and clear video signal.
In desktop video conferencing, the video conferencing application is normally bundled with
instant messaging software that includes text chat capabilities. Users can
prearrange a video conference using textual chat. In contrast, the current evolution of video
telephony in UMTS networks based on the circuit-switched 3G-324M service does not
seamlessly combine video conferencing with other communication channels. The notion of
video telephony in the mobile environment is nearer to standard voice calling than in the
stationary world. Therefore, it is more likely that somebody will place a video call without
prior announcement. This leads to privacy and inconvenience concerns. The callee might not
want to be seen during a conversation for a variety of reasons: "A video call turns you ugly"
(Harlow) because the built-in cameras are usually not placed just above the user's line of
sight but in the suboptimal position below the nose. Further, the video quality is meagre, and
lighting conditions are poor. People might feel that exposing their face over a video call
invades their privacy and, most times, callees do not want callers to see how they look.
Furthermore, the use of video telephony can depend on social factors. Societies in South East
Asian countries (for example, Malaysia) are considered non-confrontational. This can be
seen when people make decisions on which channel they use for communication. The
author's experiences in South East Asia revealed that most people prefer non-confrontational
communication such as SMS, instant messaging or e-mail, even in the business environment
or with good friends. Voice calling is avoided as much as possible for a first or unexpected
contact. It is obvious that P2P video calling is considered even more intrusive and is therefore
unlikely to succeed in these societies.
According to informal research by Sachendra Yadav (Yadav), opinion leaders and
technology experts feel that video calling does not add much to a conversation compared to
voice calling. In comparison to desktop video conferencing, which is mostly free nowadays,
the cost-benefit analysis leads to resistance to using mobile video telephony.
For many reasons, 3G video telephony as a person-to-person conversational service is not as
successful as projected. The existing technical foundation for video calling can be used to
deliver IVVR services. A wide range of IVVR applications is imaginable, and some service
providers and network operators already deploy them. Furthermore, special IVVR
applications such as P2P Video Avatar can even compensate for the drawbacks of classic P2P
video telephony, making P2P-like video telephony successful after all.
8 Usability Opportunities and Design Rules for IVVR
Most challenges in mobile user-interface design and usability engineering for mobile
applications and services originate from the platform characteristics of mobile phones, of
course. Mobile Interaction Designers such as Matt Jones and Gary Marsden, and Handheld
Usability evangelists like Scott Weiss have already invented and applied ways to get around
mobile platform limitations (a small screen, lack of a full-blown keyboard, limited
processing power, and restricted storage and memory capabilities) and have summarized
them in their books. Most of their findings can also enhance the usability of IVVR
applications.
8.1 Simplicity
The major advantage of IVVR is its simplicity and easy-to-grasp nature. Consumers can
instantly access IVVR services with any standard 3G camera phone by simply placing a
video call to a special phone number. Users can create a list of IVVR applications for fast
access in their phone's address book, similar to application homes for J2ME or Symbian
applications on their phone. Furthermore, following the notion of direct dialling, consumers
can use extension numbers to call through to a screen of an IVVR application right away.
Providing shortcuts within the application is always a good idea to enable frequent users to
quickly access the functions they want. Like most IVR systems, IVR Supplements in IVVR
should enable users to quickly type in numbers to go straight to a submenu without the need
to wait for each screen in between to appear.
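The type-ahead behaviour can be illustrated with a minimal Python sketch. It is not part of the
thesis prototype; the menu tree, the screen names, and the function resolve_shortcut are
hypothetical and only show how a string of typed digits could be resolved to a submenu in one step.

```python
# Hypothetical menu tree for an IVR Supplement: each digit selects a child screen.
MENU = {
    "name": "main",
    "children": {
        "1": {"name": "account", "children": {
            "1": {"name": "balance", "children": {}},
            "2": {"name": "transfers", "children": {}},
        }},
        "2": {"name": "support", "children": {}},
    },
}


def resolve_shortcut(menu, digits):
    """Follow a typed-ahead digit string through the menu tree.

    Returns the name of the deepest screen reached and the digits that could
    not be consumed, so the caller never waits for intermediate screens.
    """
    node = menu
    for position, digit in enumerate(digits):
        child = node["children"].get(digit)
        if child is None:
            # Unknown option: stop here and report the unused digits.
            return node["name"], digits[position:]
        node = child
    return node["name"], ""
```

With this tree, typing "12" leads straight to the "transfers" screen; only that final screen would
have to be rendered and streamed to the handset.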
8.2 Sounds
Media streams generated for IVVR differ from the media used in person-to-person
communication. This not only applies to video but to sounds, too. Comfort noise is an
artificial background noise that fills the silence in a transmission. In person-to-person
communication, comfort noise is generally added on the receiving end so that the listener can
tell that the transmission is still connected during silent periods. For IVVR applications,
comfort noise is not recommended and should be disabled on the server-side. Moreover,
audio codecs defined for 3G-324M are targeted at speech coding; this limits the use of
artificial sounds (e.g., music) for IVVR applications. Sound designers are advised to create
sounds that fit the narrowband spectrum of audio sampled at 8 kHz.
8.3 Visual Design Rules
Applications cannot enlarge a small screen visually, but they can implement techniques that
virtually increase the size of the display. One way is providing horizontal or vertical scrolling
of the user interface to make new information visible while hiding other content. Another
idea is a Peephole display that shows a different portion of a bigger picture when the phone is
moved to the left, right, up or down (Jones and Marsden). Unfortunately, neither approach
works well with IVVR. There are no positional sensors usable with 3G-324M, and scrolling
requires fast screen updates with the ability to hold a key as long as the user wants to scroll.
High delays in the current 3G-324M deployment and the inability to transmit the information that
a key is being held down prevent the implementation of such features. However, applications
can have multiple layers, such as a deck of cards that can be shown or hidden depending on
the user's selection. Furthermore, designs can take advantage of the media-streaming
capabilities and multimodal information channels by providing some information using
speech output, some with pictures or text, and others by using video sequences.

Mobile users demand visually attractive user interfaces that are clearly readable and intuitive
to use. Application flow design is beyond the scope of this thesis, and every application and
game will have its own characteristics to model and challenges to overcome. Nevertheless,
some basic guidelines for slide-based IVVR applications can be given.
Slide-based applications such as IVR Supplements shown in Figure 6 are best visually
designed using pixel-based image editors such as Adobe Photoshop. The video codecs used
in 3G-324M work in the YUV420 colour space, and the target image size is 176 x 144 pixels
(QCIF). With basic understanding of chroma-subsampling and how spatial and temporal
compression in video codecs works, designers can create slides that will compress well while
maintaining sharpness where essential. A precondition is to align the slide layout to a raster
of 16x16 pixels with one subdivision (8x8) as seen in the figure below:

Figure 8. IVVR application template with 16x16 raster
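The raster rule can be made concrete with a short Python sketch. The 16x16 raster with its 8x8
subdivision matches the macroblock and DCT block sizes of H.263, so elements aligned to it do
not straddle block borders. The helper names snap_to_raster and raster_cells are hypothetical
and not taken from the thesis.

```python
QCIF_WIDTH, QCIF_HEIGHT = 176, 144  # target image size named in the text
RASTER = 16                          # raster cell edge in pixels


def snap_to_raster(x, y, width, height, cell=RASTER):
    """Expand a rectangle outwards so all four edges lie on the raster."""
    left = (x // cell) * cell
    top = (y // cell) * cell
    right = -(-(x + width) // cell) * cell    # ceiling division
    bottom = -(-(y + height) // cell) * cell
    # Clamp to the QCIF frame.
    right = min(right, QCIF_WIDTH)
    bottom = min(bottom, QCIF_HEIGHT)
    return left, top, right - left, bottom - top


def raster_cells(cell=RASTER):
    """Number of raster columns and rows in one QCIF slide."""
    return QCIF_WIDTH // cell, QCIF_HEIGHT // cell
```

A QCIF slide holds 11 x 9 raster cells, so a layout has 99 cells to distribute among headings,
options, and images. Passing cell=8 snaps to the finer subdivision.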
To ensure best readability despite video compression, designers should use sans-serif fonts.
Moreover, the font colour and the background colour should have a high difference in
luminance. The human eye distinguishes differences in luminance more easily than differences
in colour; this fact is used by video compressors and is the foundation of chroma subsampling.
For instance, a white font on a light yellow background is already hardly readable without
compression. After compression, however, with colour differences sampled at only half the
resolution in each dimension when using YUV420, the font will not be distinguishable from the
background. The author's experiments showed that Microsoft's Calibri font in particular
produces clean type even after compression. Calibri's subtly rounded stems and corners suit
H.263 DCT-based compressors, which create smooth edges. Note that the minimum font size is
18px when lossy compressed in order to be readable for mobile users. We can only hope that
next-generation IVVR applications can use T.140 or similar ways to transmit ASCII-text
directly, making readability considerations obsolete.
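The luminance rule can be checked numerically before a slide is encoded. The sketch below is an
illustration only: it uses the ITU-R BT.601 luma weights, and the RGB value chosen for "light
yellow" as well as the function names are assumptions, not values from the author's experiments.

```python
def luma(rgb):
    """Approximate luminance (Y) of an 8-bit RGB colour, ITU-R BT.601 weights."""
    r, g, b = rgb
    return 0.299 * r + 0.587 * g + 0.114 * b


def luma_difference(foreground, background):
    """Absolute luminance difference on a 0-255 scale."""
    return abs(luma(foreground) - luma(background))


WHITE = (255, 255, 255)
LIGHT_YELLOW = (255, 255, 224)  # assumed RGB value for "light yellow"
BLACK = (0, 0, 0)
```

Under these assumptions, white on light yellow differs by only about 3.5 luminance steps out of
255, because the two colours differ in the blue channel alone, which contributes least to
luminance. White on black uses the full range.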
8.4 Resuming Sessions
Video call set-up times are generally between 1 and 5 seconds [16], independently of the IVVR
application one is going to use, which is sometimes faster than the initialisation process of
complex J2ME applications. This makes quick on-the-go lookup or entry of information
pleasant. However, what happens when the caller needs to interrupt a gaming session or the
call is interrupted because of missing network coverage or an exhausted battery? Games
should enable users to start and stop with breaks in between, since the time they have to
spend may be brief (Weiss). Mobile games are used especially to pass time for just a couple
of minutes or even seconds when waiting for the bus, riding the subway, or to relieve
boredom during TV commercials. Therefore, all mobile applications need to provide ways to
interrupt a session and quickly resume from the last state as the user desires. This requirement
also applies to IVVR applications. As application and user data of IVVR programs can be
completely stored on the server-side, there are no limitations to auto-saving program states or
recording the user's actions. With the caller's unique phone number as an identifier, it is easy to
develop resumable applications.
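A minimal Python sketch of such a resumable session follows. It assumes that the gateway
passes the caller's number to the application; the class SessionStore and its methods are
hypothetical, and a real service would persist the states in a database rather than in memory.

```python
import json


class SessionStore:
    """In-memory store of game states keyed by the caller's phone number."""

    def __init__(self):
        self._states = {}

    def save(self, caller_number, state):
        # Serialise so the stored snapshot is independent of later mutations.
        self._states[caller_number] = json.dumps(state)

    def resume(self, caller_number, default_state):
        """Return the last saved state, or a fresh default for new callers."""
        saved = self._states.get(caller_number)
        if saved is None:
            return dict(default_state)
        return json.loads(saved)
```

Calling save after every accepted interaction would let a dropped call resume at the last
completed move when the same number calls again.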
8.5 Consistency and Multi-tap Text Entry
To create applications that are internally consistent, application designers should use not only
the same terms for the same things but should also think about consistent interaction concepts
across a suite of applications. For example, most IVVR applications will be controlled by
keystrokes as this is the simplest to implement and the most understandable and exact method
for consumers. Picture a standard 12-digit numpad, as seen in Figure 9:

Figure 9. 12-digit numpad

[16] 1-second call set-up when using MONA specified in H.324 Annex K.

A good practice is to reserve certain keys for standard functions such as back, menu/main,
and confirm/OK. As people in Western societies generally read from left to right, it seems
appropriate to use the star key (*) for the back function and the hash key (#) to confirm an
action or as an OK button. The digit zero (0) can be used to return to the main page of an
application or to show a menu. Therefore, the application designer is left only with digits 1 to
9 for application-specific interaction such as option selection, game controls, or for input of
information. As screen size is limited and options need to be presented in a fairly large font to
be readable, 9 digits suffice for option selection anyway.
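The reserved-key convention can be sketched in a few lines of Python. The class Navigator, its
method press, and the screen names are hypothetical; the sketch only shows how star, hash, and
zero could be handled uniformly across a suite of applications while digits 1 to 9 stay
application-specific.

```python
class Navigator:
    """Tracks the current screen using the reserved-key convention."""

    def __init__(self, main_screen="main"):
        self.main_screen = main_screen
        self.history = [main_screen]

    @property
    def current(self):
        return self.history[-1]

    def press(self, key, options=None):
        """Handle one DTMF key; `options` maps digits 1-9 to screen names."""
        options = options or {}
        if key == "*":                      # back
            if len(self.history) > 1:
                self.history.pop()
        elif key == "0":                    # return to the main page
            self.history = [self.main_screen]
        elif key == "#":                    # confirm: handled by the screen itself
            return "confirm"
        elif key in options:                # application-specific digits 1-9
            self.history.append(options[key])
        return self.current
```

Keeping the reserved keys in one shared component means that every screen of every application
reacts to them in the same way.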
In the case of controlling a game, IVVR games can use the keys 4 and 6 for steering left
and right, 2 for accelerating, and 8 for braking or backing up, as most mobile games
already do. When it comes to information input, 9 digits is a fairly limited
number of keys to input text. The figure above shows not only a 12-digit numpad but the
so-called fastap keypad with eight numeral keys having three or four associated alphabetic
characters (Jones and Marsden).
By using multi-tap text entry or T9, users can type in text that is then transmitted by DTMF
signals and processed by an application component. Appendix A shows the source code of
my implementation of a multi-tap text entry component written in ActionScript. The component
receives DTMF tones from an intermediate PBX over a socket server to determine which
characters the user wants to input. The script also features three modes: The caller can input
either numbers, text using multi-tap, or select options from the screen.
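The principle of multi-tap decoding can be sketched in Python as follows. This is not the
ActionScript component from Appendix A; the function decode_multitap, the event format of
(timestamp, key) pairs, and the 1.5-second commit timeout are assumptions made for illustration.

```python
KEY_LETTERS = {
    "2": "abc", "3": "def", "4": "ghi", "5": "jkl",
    "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz",
}
COMMIT_TIMEOUT = 1.5  # seconds; assumed pause after which a letter is committed


def decode_multitap(events, timeout=COMMIT_TIMEOUT):
    """Decode (timestamp, key) DTMF events into text.

    Repeated presses of the same key within `timeout` cycle through its
    letters; a different key or a longer pause commits the current letter.
    """
    text = []
    last_key, last_time, presses = None, None, 0
    for timestamp, key in events:
        if key not in KEY_LETTERS:
            continue  # ignore keys without letters in this sketch
        same_letter = (key == last_key and timestamp - last_time <= timeout)
        if same_letter:
            presses += 1
        else:
            if last_key is not None:
                letters = KEY_LETTERS[last_key]
                text.append(letters[(presses - 1) % len(letters)])
            last_key, presses = key, 1
        last_time = timestamp
    if last_key is not None:
        letters = KEY_LETTERS[last_key]
        text.append(letters[(presses - 1) % len(letters)])
    return "".join(text)
```

Because the decoder depends on the timing between key presses, timestamps should be taken as
close to the gateway as possible; network jitter between gateway and application could otherwise
merge or split letters.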
A challenge when typing in texts using the multi-tap text entry method is the lag between
keystrokes and visual feedback on the screen, which is due to the well-known delays in
mobile networks. Using this method to input a large amount of information will make users
feel they are not in control of the system and lead inevitably to a sluggish, clunky user
experience. Unfortunately, there is no way around this when using keystrokes for text input.
8.6 Camera-Based Information Entry and Interaction
As an alternative to multi-tap text entry, application developers could implement speech
recognition mechanisms to convert spoken words to text. However, these mechanisms are
error-prone; correcting recognition errors with speech commands is tedious and can lead to
new errors.

Another way to feed IVVR applications with information is to use the phone's built-in
camera to transmit video, processing it on the server-side to extract information. Section 4
already covered the conceivable applications that use video input, mostly for person-to-
person communication or recording of video clips for presentation in VoD portals. A more
advanced usage of the video transmission capabilities in IVVR services is camera-controlled
applications and games enabling new types of handheld game interactions. Optical flow
techniques can be used for position and orientation tracking (Bucolo, Billinghurst and
Sickinger). When the handset user moves the phone, the camera captures a scene from a
different perspective. The basic idea of optical flow techniques is to analyse the video feed
for changes and then calculate the direction and speed of the moving phone. Moreover,
captured video can be used for real-time mixed-reality applications by placing virtual
objects in the real scene. Games like Mosquito Hunt apply this interaction model where the
movement of the phone is used to position a crosshair in a mixed-reality environment to
shoot mosquitoes.
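The idea of deriving phone movement from changes between video frames can be illustrated with
a deliberately simple Python sketch. It is not the technique of the cited authors: instead of a
true optical flow algorithm, it uses an exhaustive search for the single global shift that best
aligns two small greyscale frames. The function name estimate_shift and the frame format are
assumptions.

```python
def estimate_shift(previous, current, max_shift=2):
    """Estimate the global (dx, dy) displacement of image content.

    Both frames are equally sized lists of rows of grey values. Every
    candidate shift is scored by the mean absolute difference over the
    overlapping area; the best-scoring shift is returned.
    """
    height, width = len(previous), len(previous[0])
    best_shift, best_score = (0, 0), float("inf")
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            total, count = 0, 0
            for y in range(max(0, -dy), min(height, height - dy)):
                for x in range(max(0, -dx), min(width, width - dx)):
                    # Content at (x, y) is assumed to reappear at (x + dx, y + dy).
                    total += abs(previous[y][x] - current[y + dy][x + dx])
                    count += 1
            score = total / count
            if score < best_score:
                best_shift, best_score = (dx, dy), score
    return best_shift
```

The returned shift describes how the scene moved in the image; the phone itself moved in the
opposite direction. Dividing the shift by the time between the two frames gives a rough speed.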
Another way of using the camera video for interaction is motion detection in front of a fixed
scene. A user can use gestures to control an application, or a player can use an object or
simply his hands to control a game character. The mentioned techniques can be used
appropriately only when the user gets real-time feedback about the movement on the screen
to adapt his next motions accordingly. The current version of IVVR is not capable of achieving
this, as there is always a lag of several hundred milliseconds between input and feedback.
Instead of using camera-interaction techniques requiring real-time feedback, application
designers can use simpler ways of camera-based interaction that rely only on the recognition
of a single piece of information where the capture and feedback phases are temporally
separated.
Optical machine-readable representations of data such as barcodes or data matrices are
inexpensive to produce, and computers can recognise them quite easily even if the video
quality is low, lighting conditions are imperfect, or the symbol is captured from different
angles [17]. Examples of data matrices, or so-called tags, are the QR-Code or Semacode. The
latter is used, among other things, to visually encode URLs that mobile phone users can
capture to quickly access a website. The use of Semacode for the camera-controlled
game 3GBattle is shown in Chapter 4.

[17] This is performed with the help of code markers and block matching algorithms (Tran and Huang).

In general, application designers are advised to design applications that do not rely on the
input of large amounts of information or else provide easy ways to do so. This can be
achieved by using voice commands or camera-based input techniques. However, when
applications are developed without comprehensive speech or image recognition technologies,
developers should focus on the presentation of information and provision of entertainment
that require minimal user input.

Chapter 3 Design of Mobile Games
Entertainment and gaming are ideal applications for mobile devices. They are nearly always
in hand, and games provide an easily accessible entertainment mechanism when users are
bored (Weiss). To create entertaining mobile games as IVVR services, game designers, game
developers, and solution providers have to overcome several obstacles. IVVR service
characteristics hinder the realisation of game concepts that require fast interaction, sharp user
interfaces, and sophisticated sound effects and music. Moreover, hosting and programming
IVVR games is not as well developed as it is for games played using other technologies. In
addition to this thesis, the author is working on ways to find appropriate
solutions for the latter problem. Therefore, the author briefly describes the current state of the
findings, and then focuses on appropriate game concepts for IVVR that counter its limitations
and take advantage of its interaction opportunities.

9 Technical Foundation for IVVR Games
Commercially available IVVR appliances [18] still focus on dialog-based services and use
VoiceXML to create IVVR applications that are generally IVR Supplements, without
sophisticated features like real-time video generation or camera-based interaction. This
hinders the creation of advanced IVVR applications as mentioned in Section 4. Such
applications are generally founded on pre-recorded or pre-generated video sequences that are
played back based on speech commands or keystrokes. To create a more flexible solution that
is capable of delivering games and interactive services, sounds and graphics need to be
generated in real time. The current research and testing results recommend the configuration
featured in Figure 10.
[Figure 10 diagram, labels only. Components: Caller, 3G Handset, BTS, MSC, 3G-324M Gateway, SIP Registrar (PBX), Game/Application Server, Billing Server. Link labels: U_m air interface; A interface (PCM-30); O interface (SS7); 3G-324M with H.223 bit streams, H.245 call control and DTMF; SIP for call control and RTP for media transport. Media: audio AMR or AAC; video H.263 or MPEG-4 Part 2 Simple Profile; A/V/DTMF input; A/V reception.]

Figure 10. High-level system architecture for the delivery of dynamic IVVR services
Adobe Flex has evolved into a suite of technologies appropriate for the creation of rich media
applications and games. For the solution presented in this research, Flash is employed as the
Game Engine as it is very flexible and can be extended to create 3D games [19]. The Flash
application is not run on a handset; instead, the audio and video it creates are transmitted over a
3G-324M session to a handset, and the application is controlled by the handset's camera and
microphone signals and keystrokes. To achieve this, the Flash application runs on a
Windows-based Game Server and the media streams are transmitted to the handset using a
3G-324M Gateway. The gateway consists of a digital telephony card, an Asterisk installation,
and the implementation of the 3G-324M protocol stack [20]. Adobe's runtime engine for Flash
applications is called Adobe Flash Player. Although it is available for many operating
systems, its design approach is to present the application on the desktop of the machine that is
used to execute it. To present the output from the Flash application on a 3G-324M
Multimedia Terminal, the Game Server needs to provide Flash's output as RTP media
streams for the 3G-324M Gateway in order to transmit them over a circuit-switched link to a
mobile phone.

[18] For example, DTG 3000 from Dilithium is a combination of MCU, 3G-324M Gateway, and transcoder.
[19] By using the open source real-time 3D Engine Papervision3D.
Medialooks Flash Source Filter [21] is a DirectShow filter capable of instantiating the Flash
runtime engine, executing an SWF application, and providing its output to other DirectShow
filters for further processing in a filter graph. Microsoft DirectShow does not include AMR,
H.263, or RTP encoder filters originally; therefore, this research uses VLC Player [22] with its
FFmpeg library to compress Flash's output with the AMR speech codec and the H.263 video codec.
Moreover, the live555 library included in VLC Player is used as an RTP encoder. However, VLC
Player and the employed libraries are not compatible with DirectShow filter graphs. To
connect the Flash Source Filter with VLC, a special DirectShow filter from Sensoray, acting as
a bridge between DirectShow and VLC, is used.
To enable interaction with the Flash application based on DTMF signals from the 3G-324M
multimedia terminal, the author has written a simple XML socket server that transmits DTMF
signals relayed by the 3G-324M Gateway to a special ActionScript in Flash applications (see
Appendix A for example ActionScript).
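How such a server might frame a DTMF event for a Flash client can be sketched in Python. This
is not the author's socket server: the function frame_dtmf_event and the element and attribute
names are hypothetical. The only protocol fact relied on is that Flash's XMLSocket expects each
message to end with a zero byte.

```python
from xml.sax.saxutils import quoteattr


def frame_dtmf_event(digit, caller):
    """Build one null-terminated XML message for a Flash XMLSocket client.

    The element and attribute names are made up for this sketch.
    """
    if len(digit) != 1 or digit not in "0123456789*#":
        raise ValueError("not a key of a 12-key pad: %r" % (digit,))
    xml = "<dtmf digit=%s caller=%s/>" % (quoteattr(digit), quoteattr(caller))
    # XMLSocket treats a zero byte as the end of a message.
    return xml.encode("utf-8") + b"\x00"
```

The returned bytes could be written to every connected client socket as soon as the gateway
reports a key press.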
In order to feed the IVVR Flash applications with the camera and microphone signals from a
caller, a special DirectShow filter needs to be developed that receives these signals via RTP
streams from the 3G-324M Gateway, emulating virtual webcam and microphone devices on
the Game Server. As Flash is capable of processing and playing back input from webcams
and microphones, motion detection [23] algorithms or speech recognition techniques can be
implemented for Flash-based IVVR games.

[20] The stack was implemented by Sergio García Murillo and can be found at http://sip.fontventa.com/.
[21] A trial version of this filter can be found at http://www.medialooks.com/products/directshow_filters/flash_source.html
[22] VLC Player can be found at http://www.videolan.org/vlc/
10 Appropriate Game Concepts
Mobile games played over 3G video calls should cope with IVVR's limitations and ideally
take advantage of its unique capabilities in order to be entertaining. To be playable, however,
an IVVR game must tolerate at least the overall interaction delay of IVVR services as
calculated and stated in Subsection 3.3. A game that complies with this requirement has to be
a slow-paced game that requires only a small number of interactions.
More specifically, to deal with the overall interaction delay, the maximum rate of interaction
should be approximately one interaction per second. Slow-paced single-player games are as
suitable for IVVR as multiplayer games that are either turn-based or even asynchronous, with
gamers performing actions that do not have tight temporal restrictions (Koivisto and
Wenninger).
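As an illustration of the one-interaction-per-second guideline, the following sketch shows how a game server might ignore inputs that arrive faster than the recommended rate. The class name `InteractionThrottle` and the policy of dropping early inputs are assumptions made for this example, not part of the proposed architecture.

```python
class InteractionThrottle:
    """Accept at most one interaction per `min_interval` seconds."""

    def __init__(self, min_interval: float = 1.0):
        self.min_interval = min_interval
        self._last_accepted = None  # timestamp of the last accepted input

    def accept(self, timestamp: float) -> bool:
        """Return True if the input at `timestamp` should be processed."""
        if (self._last_accepted is not None
                and timestamp - self._last_accepted < self.min_interval):
            return False  # too soon after the previous interaction
        self._last_accepted = timestamp
        return True
```

Passing timestamps explicitly keeps the sketch independent of the system clock; a server would typically supply the arrival time of each DTMF signal.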
10.1 Visual Novels
Typical games that have a minimum of gameplay are visual novels and certain types of
mobile gambling. Visual novels are a subgenre of adventure games, featuring mostly static
graphics and written text. Moreover, most visual novels have multiple storylines with
different endings depending on the player's choices at decision points (Wikipedia
contributors). Figure 11 shows a screenshot from the popular visual novel Brass
Restoration.

²³ A popular example of Flash's motion detection ability can be seen in the game PlaydoJam
(http://www.playdojam.com/).


Figure 11. Screenshot from popular visual novel Brass Restoration.
To tailor the idea of visual novels to the IVVR world, static graphics and written text should
be substituted or complemented to take advantage of IVVR's capabilities and circumvent its
limitations. Written text should be avoided and substituted with voiceovers from narrators or
synthesised speech. Designers can also use sound effects and music as long as they can be
encoded properly with a narrowband speech codec. Static images can be replaced or
complemented with animated video sequences or short real-life video clips from actors. For
IVVR-based visual novels, it would be advantageous to use IVVR's real-time video and
audio transmission capabilities to create a more immersive experience.
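The branching structure of such a visual novel can be sketched as a small scene graph in which each scene streams one video clip and each DTMF key leads to a follow-up scene. The scene names, clip file names, and key assignments below are purely hypothetical.

```python
# Each scene names a (hypothetical) video clip and maps DTMF keys to scenes.
STORY = {
    "start": {"clip": "intro.3gp", "choices": {"1": "forest", "2": "castle"}},
    "forest": {"clip": "forest.3gp", "choices": {"1": "ending_a"}},
    "castle": {"clip": "castle.3gp", "choices": {"1": "ending_a", "2": "ending_b"}},
    "ending_a": {"clip": "ending_a.3gp", "choices": {}},
    "ending_b": {"clip": "ending_b.3gp", "choices": {}},
}


def next_scene(current: str, key: str) -> str:
    """Return the scene reached by pressing `key`; stay put on invalid keys."""
    return STORY[current]["choices"].get(key, current)


def play(keys, start: str = "start"):
    """Follow a sequence of key presses and list the clips to stream."""
    scene = start
    clips = [STORY[scene]["clip"]]
    for key in keys:
        following = next_scene(scene, key)
        if following != scene:
            scene = following
            clips.append(STORY[scene]["clip"])
    return clips
```

Because only a few key presses separate the clips, a structure like this stays well within the low interaction rate that IVVR can support.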
In combination with speech or melody recognition techniques, a visual novel could require
the player to hum a melody to influence the storyline or to solve a quest. Another way to
create a more involving environment in visual novels is to use a player's voice or portrait to
adapt certain parts of the game. Besides placing a gamer-generated picture or video sequence
within the game, a gamer could virtually communicate with or give orders to game
characters. Using speech commands for controlling game characters, synthesised voice to
receive information, or speech for person-to-person communication is more practical than
using text since a mobile phone is not ideal for typing or reading texts. This is especially the
case when text cannot be displayed sharply because it is subject to lossy compression within a
video stream. Moreover, when a player is on the move and needs his or her eyes for viewing
the real-life environment, voice is a safer option.
However, using voice chat in role-playing games could break the immersion (Koivisto and
Wenninger). Ideally, using voice and video that are adapted to the game environment could
increase the sense of immersion for a player. For example, changing a voice to make it lower
or placing a mask in front of one's face could create a more realistic game experience.
10.2 Mobile Gambling
Another genre suitable to be played over 3G video calls is Mobile Gambling, also known as
Remote Gambling. Simulating a slot machine or poker game is fairly simple and the results
can be highly addictive.²⁴ Such games can often be played on a brief per-session basis. The user
interaction required for a slot machine simulation can be extremely low when the player only
has to spin the reels (see Figure 12). Poker is also well suited because it does not reward fast
reactions; concentration, retentiveness, and tactics decide the game instead.

Figure 12. IVVR slot machine to win coupons
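How little logic a slot machine needs can be seen in the following sketch. The symbol set, the number of reels, and the rule that three equal symbols win a coupon are assumptions chosen for illustration; a real service would need a randomness source and payout rules that satisfy the applicable regulations.

```python
import random

# Hypothetical reel symbols for the illustration.
SYMBOLS = ("cherry", "lemon", "bell", "seven")


def spin(rng: random.Random, reels: int = 3):
    """Draw one symbol per reel using the supplied random generator."""
    return tuple(rng.choice(SYMBOLS) for _ in range(reels))


def wins_coupon(result) -> bool:
    """A spin wins when every reel shows the same symbol."""
    return len(set(result)) == 1
```

The only input the player has to provide is a single key press that triggers `spin`, which matches the extremely low interaction requirement described above.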
10.3 IVVR Multiplayer Games
The aforementioned games need to implement mechanisms that allow players to suspend
gaming sessions and resume play when desired. Especially in multiplayer scenarios, game
developers need to discover ways to provide players with the freedom to interrupt a game
without displeasing other players. Further, game concepts should cope with the interaction
delays of IVVR technology. In single-player games, the game should automatically pause and
save the current game state when a user hangs up the video call, resuming the last game state
when the user chooses to play again. Implementing a similar functionality in multiplayer
games is more challenging. As real-time shooters require rapid interaction, they are
inappropriate for the W-CDMA network. One way to cope with the high 3G network latency
is turn-based multiplayer games in which fast reactions to other players' decisions are not
required. Moreover, some actions in a game can be performed asynchronously, such as
character development or adding new items for sale in one's in-game shop.
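The suspend-and-resume behaviour for single-player games can be sketched as follows. The sketch assumes that the game server can identify a player by the calling number and that the game state fits into a JSON-serialisable dictionary; the class `SessionStore` and the default state are hypothetical.

```python
import json


class SessionStore:
    """Keep one saved game state per caller, keyed by phone number."""

    def __init__(self):
        self._saved = {}

    def on_hang_up(self, caller: str, state: dict) -> None:
        # Serialise so that later changes to `state` cannot alter the snapshot.
        self._saved[caller] = json.dumps(state)

    def on_call(self, caller: str) -> dict:
        """Return the saved state, or a fresh one for an unknown caller."""
        raw = self._saved.get(caller)
        if raw is None:
            return {"round": 1, "score": 0}
        return json.loads(raw)
```

An in-memory dictionary is used only to keep the example short; a deployed service would persist the snapshots so that they survive a server restart.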

²⁴ Federal laws and regulations in the country where this service would be offered need to be strictly observed.

Event notifications combine well with asynchronous gameplay since they allow the game to
contact the player when a certain kind of change in the game state has occurred (Koivisto and
Wenninger) or when other players are ready to play. In the sense of push-communication,
gamers could be alerted by receiving a video call. When these alerts are received on a regular
basis and only from a limited number of friends, they can increase pervasiveness without
annoying the user. In such a case, users can decide whether they would like to accept the call
when they are ready to play. Combining the concepts of asynchronous gameplay, event
notifications, and a turn-based strategy helps developers to create entertaining multiplayer
games suitable for IVVR.
10.4 Parallel Reality Games
Real-time video transmission found in IVVR is a unique feature not as readily available in
other mobile technologies. As discussed in Subsection 8.6, IVVR games can also be
controlled by using the handset's camera. In parallel reality games, the game takes place in
the virtual world and the real world. The basic idea is to motivate gamers to take actions in
real life because events in the real world affect the virtual world and vice versa (Koivisto and
Wenninger). A prominent example of this is location-based games, which are unfortunately
not as feasible with today's IVVR technology as with other mobile technologies due to the
lack of GPS information available for IVVR applications. However, using a handset's camera
to take pictures of buildings, objects, or symbols allows game designers to create parallel
reality games using IVVR technology. Such a game could require users to take pictures of
corporate symbols, distinctive buildings, or Semacode tags that are placed in cities or on
campuses, for example, to prove that they visited those places.

Chapter 4 Mobile Role-Playing Game: 3GBattle
3GBattle is a turn-based card battle game played with a 3G camera phone over a 3G video
call. Players need a set of physical game cards that can be bought from a shop or printed out
at home. In order to perform actions in the game, the player needs to select an appropriate
game card and hold it in front of the phone's built-in rear camera, making 3GBattle a
camera-controlled mobile game. This IVVR game is a multiplayer game designed for player versus
player battles. Generally, games using physical cards need to be played in the same location,
where players sit together around a table. To play 3GBattle, players need a 3G camera phone
that is connected to an IVVR game server. This makes it possible, even attractive, to play
the card-based battle game in remote locations.
Rapid prototyping is performed using readily available material such as 2 decks of playing cards,
paper, and two 3G camera phones. This first prototyping phase focuses on gameplay and
on assessing the feasibility of playing 3GBattle over 3G video calls. The early prototype
of 3GBattle is not intended to be a full-blown game, but a foundation for creating more
sophisticated card battle games based on IVVR technology. Subsequent prototyping stages
would include the development of machine-readable playing cards and a working prototype
using the configuration recommended in Section 9: Technical Foundation for IVVR Games.
The author's motivation for creating 3GBattle was to develop a game that exploits IVVR's
capabilities for camera-based games. Moreover, 3GBattle is inspired by fantasy role-playing
games like Dungeons & Dragons, the notion of controlling a game with camera-based user
interfaces (Tran and Huang), and the PlayStation 3 game The Eye of Judgment, which uses
the PlayStation Eye camera peripheral for capturing physical game cards that trigger battles
on a virtually augmented playing grid.

11 Early Prototype
In the first stage, the game is simulated using playing cards and equipment that is readily
available. For this simulation, two 3G camera phones that are connected to a W-CDMA
network are used. In addition, 2 decks of French style playing cards, 2 players, and 1 game
master are required.
11.1 Setting
From each deck, only the ace and the cards 2-10 of two suits are needed: one suit for the
character cards and one for the battle cards. As seen in Figure 13, the 2
players are sitting back to back so that they cannot see each other's cards, simulating the
situation in which 2 players are in remote locations. The game master supervises the game
and performs the same tasks the IVVR application would in a working prototype.

Figure 13. 3GBattle prototype configuration
11.2 Game Concept
The game concept is fairly easy to understand: There are 10 character cards (ace and 2-10 of
one suit) and 10 battle cards (ace and 2-10 of another suit) available for each player, totalling
40 cards. The
numbers on the playing cards represent their power for battles during the game, and aces have
a nominal value of 1. The game master selects 6 different cards of each kind for the players,
making a hand of 6 character and 6 battle cards for each. In each of 6 rounds, players lay their
combination of character and battle cards in two phases. Players lay cards by holding them
in front of their handset's (rear) camera. Due to the lack of an MCU, only 2 handsets with a
P2P connection are available. Therefore, the 2 players pass the first handset back and forth,
and the game master monitors the game with the second handset. To determine who wins
the current round, the combination of character and battle cards from the first player
is compared with the selection from the second player. The player with the higher
combination of cards wins. Laying the cards is performed in two phases: first, the
character cards are laid and then presented to both players simultaneously. In the next phase,
each player selects a battle card and lays it. The game master monitors the selection of
cards on the display of his or her 3G handset and can calculate who has won the current
round based on the combination of character and battle cards each player has laid.
Afterwards, the game master notes the point difference on a scoreboard. The scoreboard is
also used to keep track of assigned and laid cards to prevent cheating. The game is won by the
player who has the higher total score after the last round.
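The game master's bookkeeping can be summarised in a short sketch. It assumes that the "combination" of a character card and a battle card is the sum of their values and that the round winner is credited with the point difference; neither detail is fixed by the rules above, and all function names are hypothetical.

```python
def card_value(rank: str) -> int:
    """Aces count as 1; number cards count at face value (2-10)."""
    return 1 if rank == "A" else int(rank)


def round_result(p1, p2) -> int:
    """Each argument is a (character, battle) pair of ranks.

    Returns the point difference from player 1's point of view.
    """
    total1 = card_value(p1[0]) + card_value(p1[1])
    total2 = card_value(p2[0]) + card_value(p2[1])
    return total1 - total2


def play_game(rounds):
    """Sum the point differences per winner and reject reused cards."""
    scores = [0, 0]
    used = [set(), set()]  # cards already laid by each player
    for p1, p2 in rounds:
        for player, (character, battle) in enumerate((p1, p2)):
            for card in (("character", character), ("battle", battle)):
                if card in used[player]:
                    raise ValueError(f"player {player + 1} reused {card}")
                used[player].add(card)
        diff = round_result(p1, p2)
        if diff > 0:
            scores[0] += diff
        elif diff < 0:
            scores[1] -= diff
    return tuple(scores)
```

In a working prototype, these checks would be performed by the IVVR application instead of the game master.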
Informal research and experiments revealed that 3GBattle is feasible, as the game rules are
very easy to understand, and the game concept is ideal for short gaming sessions of 3 to 5
minutes. However, attendees mentioned that the playing cards are too dry and that the only
fun of the early prototype is in quickly beating the opponent. Nevertheless, a certain degree of
tactics is required to win a game. The beginning of the game is determined by luck, as cards
for players' hands are assigned randomly. When character cards are laid, players need to
choose a battle card without knowing what battle card the other player will lay. This battle
card should be high enough to beat the opponent. Furthermore, by memorising cards the
opponent has already laid, the player can try to determine his or her opponent's hand to
prudently choose character and battle cards to win the game.
Although this prototyping stage of 3GBattle seems very simple for an entertaining card-based
game and still lacks game elements that create an immersive atmosphere, it shows that
card-based games using 3G video calls are feasible. Video resolution and delays were adequate to
capture and interpret the gaming cards. In order to make machine-based interpretation of
playing cards feasible, too, cards should show an overarching and distinguishable pattern.

12 Preparation for a Working Prototype
In order to prepare the creation of an IVVR game from the early prototype of 3GBattle, the
playing cards have to be machine-readable for camera-based interaction and a theme has to
be found that makes 3GBattle more enjoyable.
12.1 Machine-Readable Playing Cards
A widespread way to make information on physical objects machine-readable was mentioned
in subsection 8.6. For this prototype, Semacode tags are used. These tags represent numbers
from 1 to 10, as seen on the French style playing cards of the early prototype. A different tag
(see Figure 14) needs to be placed on each playing card.

Figure 14. Semacode tag representing number 1.
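Once a tag has been decoded from the video stream, the game only needs to map its payload to a playing card. The following sketch assumes a hypothetical payload format in which a letter distinguishes character cards ("C") from battle cards ("B") and is followed by the card number from 1 to 10; the tag decoding itself is not shown, and the thesis only states that the tags represent the numbers.

```python
# Hypothetical payload prefixes for the two kinds of cards.
KINDS = {"C": "character", "B": "battle"}


def parse_card_payload(payload: str):
    """Turn a decoded tag payload such as 'C7' into ('character', 7)."""
    kind_code, number = payload[:1], payload[1:]
    if kind_code not in KINDS or not number.isdigit():
        raise ValueError(f"unrecognised card payload: {payload!r}")
    value = int(number)
    if not 1 <= value <= 10:
        raise ValueError(f"card value out of range: {value}")
    return KINDS[kind_code], value
```

Rejecting unknown payloads explicitly lets the game ask the player to hold the card in front of the camera again instead of silently accepting a misread tag.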
12.2 Theme
To make the game enjoyable, the author has created a number of example playing cards
partly based on the popular TV series South Park. The theme is not meant to be violent or
offensive, but to serve as a parody and an example of thrilling gameplay. Characters were
designed using an online character generator²⁵ and illustrations for battle cards were designed
with Photoshop:

²⁵ http://www.sp-studio.de/


Figure 15. Example character card 1. Figure 16. Example character card 2.


Figure 17. Example battle card 1. Figure 18. Example battle card 2.
Instead of simply calculating who has won a round, each battle in 3GBattle: South Park
should be visualised with a short animation. This animation should show the characters
fighting using attacks according to the laid battle cards, followed by a cheer of triumph for
the character that has won the round. The battle sequences, combined with the illustrated
playing cards and perhaps a short introductory story around the game, should create a
pleasant atmosphere for the players.

13 Further Improvements
To enhance gameplay and make 3GBattle more immersive, players should be able to develop
their own characters using a character generator. For refinancing or profit reasons, an
in-game shop could be offered where players could buy new accessories or clothes for their
characters. These character enhancements should only be ornamental, especially concerning
the battle sequences, and not meant to improve the characters' strength. To enhance the players'
enjoyment of interaction and challenges with other individuals, the game could open a
bidirectional voice channel for in-game conversation and provide a high score table.
Furthermore, to make 3GBattle playable on the go, the playing cards should be riveted together
like a fan so that a player can easily select a card with one hand while holding the handset in
the other.

Conclusion
Today's IVVR technology is a mixed blessing, but its core concepts foreshadow future
thin-client services that combine real-time multimedia streaming and sophisticated interaction
concepts for a unique user experience. On the one hand, with IVVR applications and games
based on 3G video telephony, developers can overcome interoperability issues and design
games that are not subject to memory or processing power limitations; games that have
multiplayer functionality, as well as full layout and application flow control. Service
providers do not need to bother with content protection, can implement time-based billing
mechanisms, and can provide seamless connectivity with Web sources or IP users, all without
disrupting a call session. Consumers can benefit from highly accessible services, security by
nature, the availability of 3G handsets, and large 3G coverage.
IVVR makes mixed reality games possible, and can boost social communication by providing
live video feeds to friends and viewers on mobile phones or the Web. On the other hand,
IVVR based on the current evolution of mobile video telephony (3G-324M) suffers from a
number of drawbacks that hinder the popularity of its services. There are various technical
limitations, mostly due to the wireless characteristics in 3G networks, such as limited
bandwidth and round-trip delays that are unacceptable for fast-paced games and applications
with a high rate of interaction. Proponents of IVVR need to bear in mind that the underlying
technology, codecs, and quality of service criteria were developed for person-to-person
communication, not for the delivery of interactive applications that have different capability
and quality of service requirements.
Today's IVVR technology based on circuit-switched 3G video telephony is just the
preliminary stage of a new era of Mobile Rich Media and Mobile Game Streaming that uses
upcoming technologies such as the 3GPP IP Multimedia Subsystem with MPEG-4 LASeR. In
the future, bandwidth will increase, round-trip delays will decrease, and data protocols for
text and lossless graphic transmission will become available. Moreover, accessing Mobile
Rich Media will not be bound to per-minute billing, but will be charged as any other data
service, making it affordable for consumers.
The IVVR application examples, considerations about appropriate game concepts, and
interaction opportunities presented in this paper are even more feasible with next-generation
Mobile Rich Media and IP technologies such as SIP and RTP. The concept of a Flash-based
IVVR system architecture founded on IP technology is therefore independent from 3G-324M.

The high penetration of VVoIP clients on stationary computers can already open a market for
streamed games and applications. VVoIP is generally used with broadband Internet
connections and high-quality media codecs, and data protocols are available to make it an
appropriate platform to create interactive streaming services.
14 Further Studies
As discussed in this thesis, development of IVVR mobile technology requires extensive effort
to find appropriate game concepts and circumvent limitations. Therefore, future studies
should focus on Mobile Rich Media and Mobile Game Streaming based on IMS and MPEG-4
LASeR. The use of bidirectional media streaming for gaming is a relatively unknown field of
study. Hence, games that take advantage of this concept are worth additional research, and
studies should be conducted to determine how they increase immersion. Furthermore, the
author's recommendation for an IVVR system able to deliver interactive real-time games
should be completed in order to run a prototype of 3GBattle.

Appendices
Appendix A: Source Code of Multi-tap Text Entry with Actionscript
/** Stop at the interactive home page **/
this.stop();

/** Socket Connection **/
var ipAddress:String = '192.168.1.104';
var port = 8099;
var connected = false;

//Creating new socket and connection
keySocket = new XMLSocket();
keySocket.connect(ipAddress, port);

/**
 * Socket onConnect handler
 */
keySocket.onConnect = function(success) {
    if (success) {
        connected = true;
        _root.digitInput.text = ':)';
        showNavigation();
    } else {
        connected = false;
        _root.digitInput.text = ':(';
    }
};

/**
 * Socket onClose handler
 */
keySocket.onClose = function() {
    connected = false;
    _root.digitInput.text = ':|';
};

/**
 * Socket onData handler: receives the raw message
 * sent by the XML socket server
 */
keySocket.onData = function(socketMessage) {
    //Symbol is number 0-9, * or #
    var symbol:String = socketMessage;
    processKeyStroke(symbol);
};


/* Key Stroke Processing */

//This indicates if the text field should be cleared first
var firstSymbolEntered = false;

//Input modes
var INPUT_MODE_NUMBER = 1;
var INPUT_MODE_TEXT = 2;
var INPUT_MODE_NAVIGATE = 3;

var inputMode = INPUT_MODE_NAVIGATE;
this.moMultiBar.input_mode.text = "NAV";


/**
 * Processing the key strokes and dispatching.
 * E.g. changing input mode, calling multi-tap etc.
 */
function processKeyStroke(key) {

    switch(key) {

        case '#':
            changeInputMode();
            resetMultiTap();
            break;

        case '0':
            switch(inputMode) {
                case INPUT_MODE_NUMBER:
                    typeSymbol(key);
                    break;

                case INPUT_MODE_TEXT:
                    resetMultiTap();
                    submitText();
                    break;

                case INPUT_MODE_NAVIGATE:
                    resetMultiTap();
                    doAction(key);
                    break;
            }
            break;

        case '*':
            switch(inputMode) {
                case INPUT_MODE_NUMBER:
                    resetMultiTap();
                    backspace();
                    break;

                case INPUT_MODE_TEXT:
                    resetMultiTap();
                    backspace();
                    break;

                case INPUT_MODE_NAVIGATE:
                    resetMultiTap();
                    goBack();
                    break;
            }
            break;

        default:
            switch(inputMode) {
                case INPUT_MODE_NUMBER:
                    typeSymbol(key);
                    break;

                case INPUT_MODE_TEXT:
                    multiTap(key);
                    break;

                case INPUT_MODE_NAVIGATE:
                    doAction(key);
                    break;
            }
            break;

    }

}

//Cycle through the input modes: navigate -> number -> text -> navigate
function changeInputMode() {
    switch(inputMode) {
        case INPUT_MODE_NUMBER:
            inputMode = INPUT_MODE_TEXT;
            this.moMultiBar.input_mode.text = "abc";
            showHelp();
            firstSymbolEntered = false;
            break;

        case INPUT_MODE_TEXT:
            inputMode = INPUT_MODE_NAVIGATE;
            this.moMultiBar.input_mode.text = "NAV";
            showNavigation();
            break;

        case INPUT_MODE_NAVIGATE:
            inputMode = INPUT_MODE_NUMBER;
            this.moMultiBar.input_mode.text = "123";
            showHelp();
            firstSymbolEntered = false;
            break;

    }

}

//Remove the last character from the text field
function backspace() {
    if(_root.digitInput.text.length > 0) {
        var subStrEnd = _root.digitInput.text.length - 1;
        _root.digitInput.text = _root.digitInput.text.substr(0, subStrEnd);
    }
}

//Append a symbol; the first symbol replaces the help or navigation text
function typeSymbol(symbol) {
    if(!firstSymbolEntered) {
        firstSymbolEntered = true;
        _root.digitInput.text = symbol;
    } else {
        _root.digitInput.text += symbol;
    }
}


var SWITCH_DELAY = 1000;
var lastKeyPressTime = 0;
var lastKey = null;
var keyPressedTimes = 0;
var keyPosition = -1;
var currentChar = null;
//Symbol mapping of keys
var keys:Array = new Array(
    new Array(" ", ".", "!", "1"),      //1
    new Array("a", "b", "c", "2"),      //2
    new Array("d", "e", "f", "3"),      //3
    new Array("g", "h", "i", "4"),      //4
    new Array("j", "k", "l", "5"),      //5
    new Array("m", "n", "o", "6"),      //6
    new Array("p", "q", "r", "s", "7"), //7
    new Array("t", "u", "v", "8"),      //8
    new Array("w", "x", "y", "z", "9")  //9
);

//Letters cycle through the options of the pressed key
function multiTap(key) {

    //A different key starts a new character
    if(lastKey != key) {
        resetMultiTap();
    }

    //A pause longer than SWITCH_DELAY also starts a new character
    var tmpDate:Date = new Date();
    var now = tmpDate.getTime();
    if(lastKeyPressTime > 0 && lastKeyPressTime + SWITCH_DELAY < now) {
        resetMultiTap();
    }

    lastKeyPressTime = now;
    keyPosition = nextPosition(key, keyPosition);
    currentChar = keys[key-1][keyPosition];

    //Repeated taps replace the last character; the first tap appends one
    if(keyPressedTimes >= 1) {
        changeCharacter(currentChar);
    } else {
        typeSymbol(currentChar);
    }

    keyPressedTimes++;
    lastKey = key;

}

/**
 * Reset multi-tap status variables
 */
function resetMultiTap() {

    lastKeyPressTime = 0;
    lastKey = null;
    keyPressedTimes = 0;
    keyPosition = -1;
    currentChar = null;

}

/**
 * Returns the next index in the symbol list of a key,
 * wrapping around to 0 after the last symbol.
 */
function nextPosition(key, position) {

    if(position < keys[key-1].length-1) {
        position++;
    } else {
        position = 0;
    }

    return position;
}

//Replace the last typed character with the next option of the same key
function changeCharacter(character) {
    if(firstSymbolEntered) {
        backspace();
        _root.digitInput.text += character;
    } else {
        firstSymbolEntered = true;
        _root.digitInput.text = character;
    }
}

function showHelp() {
    _root.digitInput.text = "Info: Use multi-tapping to insert text!";
}


/** Navigation **/

function showNavigation() {
    _root.digitInput.text = "1: Clip 1\n2: Clip 2";
}


//Jump to the frame assigned to the pressed key
function doAction(key) {

    switch(key) {

        case "1":
            this.gotoAndStop(5);
            break;

        case "2":
            this.gotoAndStop(10);
            break;

        case "0":
            this.gotoAndStop(1);
            break;

        default:
            _root.digitInput.text = key + " performed!";
            var song_sound:Sound = new Sound();
            song_sound.attachSound("logon_sound");
            song_sound.start();
            break;
    }

}


Appendix B: Providers of IVVR and Related Services
Celudan Technologies, USA/Spain (http://www.celudan.com/)
CosmoCom, Inc.; USA (http://www.cosmocom.com/)
CreaLog Software Entwicklung und Beratung GmbH, Germany (http://www.crealog.com/)
Dialogic, Worldwide (http://www.dialogic.com/)
Dilithium, USA (http://www.dilithiumnetworks.com/)
Exit Games GmbH, Germany (http://www.exitgames.com/)
I6NET Solutions and Technologies, SL, Spain (http://www.i6net.com/)
Legion Interactive, Australia (http://www.legioninteractive.com.au/)
Mobile Communications Media Sdn. Bhd., Malaysia (http://www.mocome.net/)
Ugunduzi Ltd., Israel (http://www.ugunduzi.com/)
WHATEVER MOBILE GmbH, Germany (http://www.whatevermobile.com/)


Bibliography
3GPP. TS 22.105 Services and service capabilities. December 2008. 22 January 2009
<http://www.3gpp.org/ftp/Specs/html-info/22105.htm>.
---. TS 26.110 3G-324M General description. Vers. Release 7. June 2007. 20 January 2009
<http://www.3gpp.org/ftp/Specs/html-info/26110.htm>.
---. TS 26.111 Modifications to H.324. Vers. Release 7. June 2008. 20 January 2009
<http://www.3gpp.org/ftp/Specs/html-info/26111.htm>.
Anderson, Dean; Lamberson, Jim; Sensoray. Open Source VLC to Directshow Bridge. 2008.
12 January 2009 <http://www.sensoray.com/support/videoLan.htm>.
Puri, Atul and Tsuhan Chen. Multimedia Systems, Standards, and Networks. CRC Press,
2000.
Barth, Peter; Steffen, Thomas; FH Wiesbaden. Usability Lecture WS 04/05. Lecture.
Wiesbaden, 2004.
Basso, Andrea and Hari Kalva. Beyond 3G video mobile conversational services: An
overview of 3G-324M based messaging and streaming. IEEE ISMSE'04 (2004).
Bucolo, Sam, Mark Billinghurst and David Sickinger. User Experiences with Mobile Phone
Camera Game Interfaces. Christchurch, New Zealand: University of Canterbury, 2005.
CreaLog GmbH. CreaLog präsentiert erstes interaktives Videotelefon-Gewinnspiel
Deutschlands - Anrufer lenken den Rentier-Schlitten per Sprache. 17 December 2007. 9
February 2009 <http://www.crealog.com/de/news/archiv07.htm>.
---. IVVR Mobile Game. Weihnachtsmann. 3G Video Call +49 (89) 381 55 555: CreaLog
GmbH, 2007.
Dahm, Markus. Grundlagen der Mensch-Computer-Interaktion. München: Pearson Studium,
2006.
Etoh, Minoru. Next Generation Mobile Systems 3G and Beyond. John Wiley & Sons, Ltd,
2005.
Furht, Borko and Mohammad Ilyas. Wireless Internet Handbook: Technologies, Standards,
and Applications (Internet and Communications). Auerbach Publications, 2003.

Hansen, Frode Ørbek. Real Time Video Transmission in UMTS. Postgraduate Thesis in
Information and Communication Technology. Norway: Agder University College, May
2001.
Harlow, Jo. Nokia S60 Summit. Barcelona, May 2008.
Holma, Harri and Antti Toskala. WCDMA for UMTS. Vol. Third Edition. John Wiley &
Sons, Ltd, 2004.
ITU-T. H.263 Video coding for low bit rate communication. Recommendation. 2005.
---. ITU-T Recommendation H.324: Terminal for low bit-rate multimedia. 2005.
---. LS Reply on H.324 Text Conversation. 2007.
ITU-T Study Group No. 16. Corrigendum to ITU-T Recommendation H.324. 2002.
Jabri, Marwan. Mobile Videotelefonie. telekom praxis (2005): 34-36.
Jones, Matt and Gary Marsden. Mobile Interaction Design. Chichester: John Wiley & Sons,
Ltd, 2006.
Kleinen, Barbara. Lecture FHTW Berlin. Computer Supported Cooperative Work. 2007.
Koivisto, Elina M.I. and Christian Wenninger. Enhancing Player Experience in
MMORPGs with Mobile Features. 2005.
Kwon, David and Peter Driessen. Error Concealment Techniques for H.263 Video
Transmission. IEEE (1999): 276-279.
LASeR Interest Group. Overview. 03 March 2006. 17 February 2009 <http://www.mpeg-
laser.org/html/overview_contextO.htm>.
Mirial s.u.r.l. 3G-to-TV Video Calling Solution. 2009. 28 January 2009
<http://www.mirial.com/pdf/Whitepaper/3G-to-TV_Video_Calls.pdf>.
Myers, David J. Mobile Video Telephony. McGraw-Hill Professional, 2004.
NMS Communications. 3G-324M Video Technology Overview. 2008. 28 January 2009
<http://www.nmscommunications.com/DevPlatforms/OpenAccess/Technologies/3G324Man
dIPVideo/TechnologyOverview.htm>.

Nokia. N96 Specifications. 2009. 30 January 2009 <http://www.nokia.co.uk/A4835651>.
Pias, Claus. Computer Spiel Welten. Dissertation. Professur Geschichte und Theorie
künstlicher Welten. Weimar, 2004.
RADVISION Ltd. 3G Powered 3G-324M Protocol. 2002.
Röber, Niklas and Maic Masuch. Playing Audio-Only Games, A compendium of interacting
with virtual, auditory worlds. Proceedings of DiGRA 2005 Conference. Magdeburg,
Germany: DiGRA, 2005.
Lee, Sang-Bong, et al. Error Concealment for 3G-324M Mobile Videophones Over a
WCDMA networks. unknown. IEEE. 6 February 2009 <IEEE Xplore, Technische
Universitaet Berlin>.
Schiller, Jochen H. Mobile Communications. Vol. Second Edition. Pearson Education
Limited, 2003.
Shii. Visual Novel Terminology. 09 February 2009. 22 February 2009
<http://www.shii.org/translate/>.
Sommerville, Ian. Software Engineering. Vol. Eighth Edition. Pearson Education Limited,
2007.
Sony Computer Entertainment America Inc. THE EYE OF JUDGMENT. 2008. 12 February
2009
<http://www.us.playstation.com/PS3/Games/THE_EYE_OF_JUDGMENT/Description>.
Tenomichi. About StreamMyGame. 2009. 26 January 2009
<http://www.streammygame.com/smg/modules.php?name=About>.
Tran, Khoa Nguyen and Zhiyong Huang. Design and Implementation of a Built-in Camera
based User Interface for Mobile Games. ACM Report. Perth: GRAPHITE, 2007.
Turner, Brough. Video over Mobile IP Operators Shoot Themselves in the Foot. February
2008. 11 February 2009 <http://www.tmcnet.com/voip/0208/next-wave-redux-video-over-
mobile-ip-operators-shoot-themselves-in-the-foot.htm>.
Turner, Ian. Trends in Linguistic Technology. CallCenter International Issue 1 2009: 30-34.
Ugunduzi Ltd. Ugunduzi - IVVR Services Summary. 2008. 19 January 2009
<http://www.ugunduzi.com/IVVR_Services.html>.
UMTS Forum. UMTS Forum Report No. 11. 2000.
VocalTec. Deutsche Telekom ICSS and VocalTec announce solution for international VoIP
interconnection. 1 July 2008. 10 February 2009
<http://ghs-internet.telekom.de/dtag/cms/content/ICSS/en/374426;jsessionid=8B4D6A1DE09A119828B0A3B89CD60663>.
Voip-Info contributors. Asterisk H324M. 24 November 2008. 21 January 2009
<http://www.voip-info.org/wiki/page_history.php?page_id=2104&preview=19>.
Weiss, Scott. Handheld Usability. New York: John Wiley & Sons, Ltd, 2002.
Wikipedia contributors. Jitter. 26 January 2009. 10 February 2009
<http://en.wikipedia.org/w/index.php?title=Jitter&oldid=266478536#Anti-jitter_circuits>.
---. Visual novel. 08 February 2009. 22 February 2009
<http://en.wikipedia.org/w/index.php?title=Visual_novel&oldid=269225498>.
Yadav, Sachendra. Why haven't Video Calls (Mobile Video Telephony) taken off? 11 June
2008. 30 January 2009
<http://sachendra.wordpress.com/2008/06/11/why-havent-video-calls-mobile-video-telephony-taken-off/>.
You, Yilun, et al. Deploying and Evaluating a Mixed Reality Mobile Treasure Hunt:
Snap2Play. MobileHCI (2008): 335-338.



Acronyms
2G Second generation mobile networks, services and technologies
3G Third generation mobile networks, services and technologies
3GPP 3rd Generation Partnership Project
AAC Advanced Audio Coding
AL-PDU Adaptation Layer Protocol Data Unit
AMR Adaptive Multi-Rate
BREW Binary Runtime Environment for Wireless
BRI Basic Rate Interface
BSS Base Station System
BTS Base Transceiver Station
CCSRL Control Channel Segmentation and Reassembly Layer
CRC Cyclic Redundancy Check
CS Circuit-Switched
DTMF Dual-Tone Multi-Frequency
GOB Group of Blocks
GoD Gaming on Demand
GPRS General Packet Radio Service
GPS Global Positioning System
GSTN General Switched Telephone Network
HSDPA High-Speed Downlink Packet Access
HSS Home Subscriber Server
IMS IP Multimedia Subsystem
ISDN Integrated Services Digital Network
ITU International Telecommunication Union
ITU-T Telecommunication Standardization Sector of ITU
IVR Interactive Voice Response
IVVR Interactive Voice and Video Response
J2ME Java 2 Micro Edition
LAN Local Area Network
LAPM Link Access Procedure for Modems
LASeR Lightweight Application Scene Representation
MCU Multipoint Control Unit
MS Mobile Station
MSC Mobile Switching Center
NB Narrowband
NGN Next Generation Network
N-ISDN Narrowband Integrated Services Digital Network
NMS Network Management Subsystem
NSRP Numbered Simple Retransmission Protocol
P2P Peer-to-peer
PBX Private Branch Exchange
PCM Pulse-Code Modulation
PDU Protocol Data Unit


PLMN Public Land Mobile Network
PRI Primary Rate Interface
PS Packet-Switched
PSTN Public Switched Telephone Network
QCIF Quarter Common Intermediate Format
QoS Quality-of-service
RNS Radio Network Subsystem
RTP Real-Time Transport Protocol
SDP Session Description Protocol
SIM Subscriber Identity Module
SIP Session Initiation Protocol
SMS Short Message Service
SRP Simple Retransmission Protocol
SS7 Signaling System #7
SWF Small Web Format for Flash Applications
UMTS Universal Mobile Telecommunications System
USIM UMTS SIM
VoIP Voice over IP
VVoIP Voice and Video over IP
WAP Wireless Application Protocol
WB Wideband
W-CDMA Wideband Code Division Multiple Access
XML Extensible Markup Language




Declaration of Independent Work

With this, I declare that I have written this
paper on my own, distinguished citations, and
used no other than the named sources and aids.


____________________________________ _______________
Signature Date