Abdulaleem Zaid Mohammed MFSKSM2010

PROTOTYPE DEVELOPMENT OF VOIP STEGANOGRAPHY
ABDULALEEM ZAID MOHAMMED AL-OTHMANI
A project report submitted in partial fulfillment of the requirements for the award of the degree of Master of Computer Science (Information Security)
Faculty of Computer Science and Information Systems Universiti Teknologi Malaysia
NOVEMBER 2009
Dedication
Dedicated to my beloved parents, my darling supportive wife, my gorgeous daughter, my dearest siblings and to all who were beside me
ACKNOWLEDGEMENT
First and foremost praise and gratitude be to ALLAH, almighty, without whose gracious help it would have been impossible to accomplish this work.
I was extraordinarily fortunate in having Prof. Dr. Azizah Bt. Abd Manaf as my supervisor in UTM. I would like to express my gratitude and appreciation to her, who has supported me throughout my project with her patience and knowledge whilst allowing me the room to work in my own way. I attribute the level of my Masters degree to her encouragement and effort and without her this project, too, would not have been completed or written. One simply could not wish for a better or friendlier supervisor.
Many thanks to each lecturer in UTM-CASE. They guided me towards my goals, gave me all the support I needed and were always kind. My entire study in this honorable institute was an everyday opportunity to acquire fine knowledge.
Furthermore, my special thanks go to my friend and colleague Saeed AlQahtani for his indispensable advice, valuable assistance and sincere companionship.
Finally and most importantly, words fail me to express my appreciation to my wife, whose dedication, love and persistent confidence in me have taken the load off my shoulders. I owe her for unselfishly letting her intelligence, passions, and ambitions collide with mine. I am deeply and forever indebted to my parents for their unfailing support, love and prayers throughout my entire life. I am also very grateful to all my brothers and sisters.
ABSTRACT
Information security has evolved significantly over the last decade, and even more quickly over the last few years. Most organizations recognize that one of their most important assets is their data. Steganography is an effective means of hiding secret data, thereby protecting the data from unauthorized or unwanted viewing. Indeed, along with encryption, steganography is one of the fundamental ways by which data can be kept confidential. Hiding the secret information in seemingly innocent multimedia files or media avoids the need to secure the communication channel when sending secret messages. The biggest challenge in steganography is how to increase the amount of information embedded in the host channel without affecting the properties of that channel, while keeping the secret transmission invisible to unauthorized parties. To face this challenge, many methods, media, techniques and algorithms are being introduced. One of the new and promising communication media that can be used as a host for steganography is Voice over Internet Protocol (VoIP). VoIP is a form of communication that allows people to make phone calls over an Internet connection instead of typical analogue telephone lines. VoIP characteristics such as real-time transmission, bi-directional nature and the vast amount of data carried make it a very appropriate medium in which to hide secret data. A few recent studies have been conducted to elucidate the steganography techniques and theories that can be applied to VoIP. This project examines the available steganographic techniques that can be used to create covert channels in VoIP streams. Based on that, the proposed prototype is configured to apply some of these techniques in a lab environment. Finally, some improvements and techniques are suggested in this project report to enhance VoIP steganography.
ABSTRAK
Keselamatan maklumat telah berevolusi pesat selama sedekad lalu, lebihlebih lagi beberapa tahun kebelakangan ini. Kebanyakan organisasi sedia maklum bahawa salah satu aset terpenting mereka ialah data. Steganography ialah salah satu asset cara berkesan untuk menyembunyikan data sulit dan melindunginya dari dilihat oleh pihak yang tidak berkenaan. Bersama encryption, steganography adalah salah satu langkah asas agar data kekal sulit. Dengan menyembunyikan data sulit di dalam fail multimedia biasa atau medium yang normal, saluran komunikasi tidak perlu dikawal ketat apabila menghantar mesej rahsia. Cabaran terbesar untuk steganography ialah bagaimana untuk menambah saiz informasi yang boleh dimasukkan ke dalam host channel dan pada masa yang sama mengekalkan ciriciri saluran tersebut supaya ia tidak dapat dilihat oleh pihak yang tidak berkenaan. Pelbagai cara, media, teknik dan algoritma sedang diperkenalkan untuk menghadapi cabaran ini. Salah satu daripada medium komunikasi yang baru dan berpotensi untuk digunakan sebagai perumah untuk steganography ialah Voice over Internet Protocol. VoIP ialah satu bentuk komunikasi online di mana pengguna boleh membuat panggilan telefon menggunakan Internet. Ciri-ciri VoIP iaitu real-time transmission, bi-directional nature dan jumlah data yang banyak membuatkan ia medium yang sesuai untuk menyembunyikan data sulit. Beberapa kajian terbaru telah dijalankan untuk menerangkan teori dan teknik steganography yang boleh digunakan untuk VoIP. Projek ini ialah berkenaan teknik steganography sedia ada yang boleh digunakan untuk mencipta saluran rahsia untuk VoIP. Berdasarkan teknik ini, satu prototaip cadangan telah dibina untuk mempraktikkan beberapa teknik tersebut di dalam lab environment. Seterusnya, beberapa cadangan
pengubahsuaian dan teknik diperkenalkan dalam laporan projek ini untuk memperbaiki VoIP steganography.
TABLE OF CONTENTS
DECLARATION
DEDICATION
ACKNOWLEDGEMENT
ABSTRACT
ABSTRAK
TABLE OF CONTENTS
LIST OF TABLES
LIST OF FIGURES
1 INTRODUCTION
1.1 Overview
1.2 Background of the Problem
1.3 Problem Statement
1.4 Project Aim
1.5 Project Objectives
1.6 Project Scope
1.7 Summary
2 LITERATURE REVIEW
2.1 Introduction
2.2 Voice over Internet Protocol
2.2.1 Definition
2.2.2 Historical Background of VoIP
2.2.3 Background Concepts
2.2.4 VoIP Benefits and Challenges
2.2.5 VoIP Architecture
2.2.6 VoIP Solution Components
2.2.7 VoIP Protocols and Communication Flow
2.2.8 Establishing VoIP Connections with H.323
2.2.9 Establishing VoIP Connections with SIP
2.3 Steganography
2.3.1 Definition
2.3.2 Uses of Steganography
2.3.3 History of Steganography
2.3.4 Digital Steganography
2.3.5 Steganography Capacity and Robustness
2.3.6 Audio Steganography
2.3.7 Digital Audio Signal
2.3.8 Methods of Audio Steganography
2.3.9 Real-Time Steganography
2.4 VoIP Steganography
2.4.1 Previous Researches
2.5 Summary
3 RESEARCH METHODOLOGY
3.1 Introduction
3.2 Research Approach
3.3 Research Phases and Procedure
3.3.1 Study and Analyze
3.3.2 Prototype Design and Implementation
3.3.3 Suggestions for Capacity Enhancement
3.4 Summary
4 PROTOTYPE DESIGN AND IMPLEMENTATION
4.1 Introduction
4.2 VoIP Call
4.2.1 Initialize VoIP Call
4.2.2 VoIP Conversation
4.2.3 Ending VoIP Call
4.3 Embedding and Extracting
4.3.1 Embedding Process
4.3.2 Extracting Process
4.4 Voice Waveforms Drawing
4.5 Presenting Results
4.6 Summary
5 RESULTS AND DISCUSSION
5.1 Introduction
5.2 Current Relevant Steganography Techniques
5.3 Testing Approaches and Methods
5.3.1 Program Performance
5.3.2 Results Listing and Analyzing
5.4 Limitations of LSB Steganography Technique
5.5 Suggested Steganography Techniques
5.6 Summary
6 CONCLUSION AND FUTURE WORK
6.1 Introduction
6.2 Project Summary and Conclusion
6.3 Meeting Research Objectives
6.4 Project Contribution
6.5 Future Work
6.6 Summary
REFERENCES
LIST OF TABLES
2-1 ITU Encoding Standards
2-2 Common VoIP Audio Codecs
5-1 Summary Comparison between Steganography Techniques
5-2 Initial Comparison between LSB Techniques
5-3 Result of Hiding 100 bytes of Secret Data
5-4 Result of Hiding 1KB of Secret Data
5-5 Result of Hiding 4KB of Secret Data
5-6 Result of Hiding 8KB of Secret Data
5-7 Suggested VoIP Techniques
LIST OF FIGURES
2.1 Packetization of Voice Traffic
2.2 Network Stack
2.3 VoIP Stack and Protocols
2.4 An RTP Data Transfer Packet
2.5 RSVP Merge
2.6 H.323 Call Setup Process
2.7 SIP Proxy Operation
2.8 SIP Redirector Server
2.9 Conflicting Requirements for Data Hiding
2.10 Audio Signal Coding
2.11 Hidden Communication Scenarios for VoIP
3.1 Research Phases
3.2 Prototype Main Tasks
4.1 Prototype Design Phases
4.2 Prototype General Architecture
4.3 Receiver may accept or reject INVITE message
4.4 Messages of VoIP Call
4.5 Flowchart of "Send" Threat
4.6 Flowchart of "Receive" Threat
4.7 Flowchart of Embedding Process (1 LSB)
4.8 Replacement of 1 LSB in Embedding Function
4.9 Flowchart of Extracting Process (1 LSB)
4.10 Screenshot of Waveform Drawing of VoIP Prototype
4.11 Screenshot of Prototype GUI Showing Call Results
5.1 Screenshot of Prototype Call Waveforms
5.2 Time Used Measure
5.3 Used VoIP Segments Measure
5.4 Time Used Measure
CHAPTER 1
INTRODUCTION
1.1 Overview
Steganography is the process of hiding secret data inside other, normally transmitted data. In other words, as defined in [40], steganography means hiding a secret message within an ordinary message during transmission and extracting it at the destination. Ideally, anyone scanning this information will be unable to tell whether it contains covert data or not.
Steganography applications conceal information in other, seemingly innocent media. Steganographic results may masquerade as other file or data types, be concealed within various media, or even be hidden in network traffic or disk space. We are limited only by our imagination in the many ways information and data can be exploited to conceal additional information [49].
Steganographic techniques for hiding messages have been around for as long as cryptography and have evolved with technology. One of the major advantages of steganography over simply scrambling messages with cryptographic techniques is that potential eavesdroppers do not know what to listen to or whether there is any hidden data at all [23]. A covert channel is one of the most popular steganographic techniques that can be applied in networks. The covert channel offers an opportunity to manipulate certain properties of the communications medium in an unanticipated, unconventional, or unforeseen way, in order to transmit information through the medium without detection by anyone other than the entities operating the covert channel [40].
Voice over Internet Protocol (VoIP), or IP telephony, is defined by [1] as "a general term for a family of transmission technologies for delivery of voice communications over IP networks such as the Internet or other packet-switched networks".
At present, VoIP is one of the most popular services on the Internet. Since its introduction, it has gradually but completely changed the telecommunication market. As it is used ever more freely worldwide, the traffic volume it generates keeps increasing. Because of this popularity, it is becoming a natural target for steganography, which is why VoIP is appropriate for enabling hidden communication throughout IP networks [40]. This new steganographic medium is likely to become one of the most popular for hiding data on the Internet because of the relatively large amount of data that can be hidden, the real-time nature of VoIP calls and the bi-directional capability that such transmission provides.
1.2 Background of the Problem
No real-world steganographic method is perfect: no matter what the method is, the hidden information can be potentially revealed. Generally, the more hidden information is inserted into the normally transmitted data, the greater the chance it will be detected [43].
Because the number of steganographic methods and mechanisms applicable to VoIP streams is large, steganography in VoIP should be considered an excellent, secure and efficient way to hide data over IP networks. Moreover, because there is so far no single method for detecting hidden data within VoIP streams, VoIP steganography should be considered safe and invulnerable for a few more years to come.
Like conventional steganography, VoIP steganography techniques can be dangerous to organizations because of the risk of confidential information leakage [28]. This is also why some researchers have suggested that VoIP steganography should be treated as a threat to public security. It is thus important to understand the essential nature of the various steganographic methods and, in effect, be able to construct effective steganalysis solutions.
Nevertheless, for legitimate purposes, it is very useful to identify better steganographic techniques and approaches, or to improve the existing ones, to make hiding data within VoIP streams more secure, efficient and practical.
1.3 Problem Statement
Mazurczyk and Szczypiorski [44] stated that securing Internet telephony is a complex process. It involves not only the ability to hold a secure conversation between two communicating parties, but also the security of the signaling messages that make the call possible at all. It may also involve the security of the steganographic or watermarking channels that may be used within VoIP streams.
Providing methods and techniques to hide data in a very popular medium like VoIP is a challenging goal. There are clear limits on the amount of secret data that can be buried in a VoIP stream without affecting the overall voice stream during a call. These limits on the covert data rate make it harder to introduce secure and practical techniques that provide a higher rate of embedded data. In addition, [16] stated that if the covert data rate is too high, it may cause voice quality deterioration and an increased risk of detection.
The problem that this project sets out to solve is twofold. First, how can efficient VoIP steganography techniques or improvements be provided or suggested to increase the rate of secret data embedded within a VoIP stream without affecting the overall conversation during VoIP calls? Second, how could a VoIP steganography system that applies one or more of these enhanced techniques be implemented, keeping in mind that, to the best of my knowledge, no such system is known so far?
1.4 Project Aim
While the overall idea of using steganography to implement covert communications is not new, burying hidden messages in Internet phone calls represents the latest evolution of steganography [23].
The main aim of this project is to implement a simple prototype that applies some steganography techniques to VoIP streams, in order to investigate how to increase the overall rate of covert data without affecting the voice quality and with a lower risk of detection.
These investigations will build on what has been found by other researchers in this field, even though it is still a very new area of research with only a small number of published studies.
1.5 Project Objectives
The objectives of this project are:
1. To study VoIP steganography, analyze its current techniques and discuss their performance in terms of capacity.
2. To develop a simple lab-based system that performs VoIP steganography using one or more LSB techniques to illustrate the VoIP steganography process.
3. To suggest effective ways to improve the capacity of covert data in VoIP steganography.
1.6 Project Scope
VoIP steganography covers a wide range of information hiding techniques, including popular techniques based on IP, TCP and other protocols. The main idea is to use free, redundant or unused fields of these protocols [43].
There are many techniques that could be used to hide communications in various layers of VoIP traffic. One of them is by taking advantage of unused or rarely used data fields.
In this research, I will study the available covert channels that may be used for hidden communication in SIP, the signaling protocol used for the VoIP service. Moreover, new steganographic methods that were introduced very recently will be studied in this project. For each of these methods, this research will estimate the potential bandwidth in order to evaluate how much information may be transferred in a typical IP telephony call. I will also study the possibility of using the payload of the voice packets in the real-time VoIP steganography process.
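As a rough illustration of the kind of bandwidth estimate described above, the following Python sketch computes an upper bound on the covert data that fits into a one-way voice stream when a fixed number of least significant bits is replaced in every sample. The default sampling rate of 8000 samples per second (the rate of 64 Kbps PCM) and the function name are assumptions made for illustration; they are not taken from the prototype.

```python
def covert_capacity_bytes(call_seconds, sample_rate_hz=8000, lsb_per_sample=1):
    """Upper bound on hidden bytes carried by a one-way voice stream.

    Assumes every sample of the stream is available as cover data and
    that lsb_per_sample bits of each sample are replaced.
    """
    total_bits = call_seconds * sample_rate_hz * lsb_per_sample
    return total_bits // 8


if __name__ == "__main__":
    for seconds in (10, 60):
        print(seconds, "s call:", covert_capacity_bytes(seconds), "bytes")
```

Real capacity will be lower than this bound once packet loss, silence suppression and any framing of the secret data are taken into account.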
As mentioned before, the program developed in this project will be designed and implemented to work in a lab-based environment. This means that the VoIP call will be limited to two hosts using a point-to-point connection with static IP addresses. This differs from real-life VoIP applications, which use client-server calls. As a result, the system will run on a LAN, which provides an environment with almost zero noise. This will greatly help to achieve better VoIP steganography performance and correctness. Due to the complexity of VoIP steganography, only specific basic types of visual and statistical measures will be considered.
The most important success factors of this project are keeping the overall changes to the VoIP stream so small that they can hardly be noticed while embedding a large amount of covert data, and providing maximum extraction of the hidden information.
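The trade-off between small changes and full extraction can be sketched with a minimal 1-LSB example in Python. This is an illustrative sketch only, not the prototype's code: it assumes the voice samples are available as raw 8-bit values, that both sides already agree on the secret length, and the function names are hypothetical.

```python
def embed_lsb(samples: bytes, secret: bytes) -> bytes:
    """Replace the least significant bit of each sample with one secret bit."""
    # Unpack the secret into bits, most significant bit first.
    bits = [(byte >> shift) & 1 for byte in secret for shift in range(7, -1, -1)]
    if len(bits) > len(samples):
        raise ValueError("cover audio too short for the secret message")
    stego = bytearray(samples)
    for i, bit in enumerate(bits):
        stego[i] = (stego[i] & 0xFE) | bit  # clear the LSB, then set it
    return bytes(stego)


def extract_lsb(stego: bytes, secret_len: int) -> bytes:
    """Rebuild secret_len bytes from the least significant bits of the samples."""
    out = bytearray()
    for byte_index in range(secret_len):
        value = 0
        for bit_index in range(8):
            value = (value << 1) | (stego[byte_index * 8 + bit_index] & 1)
        out.append(value)
    return bytes(out)


if __name__ == "__main__":
    cover = bytes(range(200, 256)) * 2  # 112 made-up sample values
    stego = embed_lsb(cover, b"hi")
    print(extract_lsb(stego, 2))
    print(max(abs(a - b) for a, b in zip(cover, stego)))
```

Each sample value changes by at most one step, which is what keeps the modification hard to notice, while the receiver recovers every embedded bit as long as no samples are lost.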
1.7 Summary
This chapter introduced the project together with its aim and objectives. The background of the problem was explained as the reason for choosing this topic. The scope of the project was identified and the problem statement was presented.
CHAPTER 2
LITERATURE REVIEW
2.1 Introduction
In this chapter, the literature and relevant previous studies are reviewed to introduce VoIP steganography and its relevant architectures, protocols, security features and concerns.
This chapter is structured as follows. The first part introduces and describes some basic concepts, terminology and architecture of Voice over IP (VoIP). The second part defines and introduces the concept of steganography and provides a brief history of this science. The third part discusses audio steganography, especially real-time steganography with RTP. This part ends with a literature survey that focuses on VoIP steganography and the areas open to enhancement and improvement.
2.2 Voice over Internet Protocol
2.2.1 Definition
According to [1], Voice over Internet Protocol (VoIP) is a general term for a family of transmission technologies for delivery of voice communications over IP networks such as the Internet or other packet-switched networks.
As explained in Section 1.1, VoIP is a technology that allows telephone calls to be made over computer networks like the Internet. VoIP converts analog voice signals into digital data packets and supports real-time, two-way transmission of conversations using Internet Protocol (IP).
In [3], it is stated that through IP technology, voice communications are digitized and then segmented into standard digital data payloads (i.e., batches or collections of data) that are, in turn, encapsulated within IP packets so they can be transmitted via the IP transport network. This process allows voice and other information to coexist in a single IP data network so it can be transmitted using shared equipment and communications lines.
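The segmentation step described above can be sketched in a few lines of Python. This is only an illustration: the frame size of 160 bytes (20 ms of 64 Kbps PCM) and the function name are assumptions, and the encapsulation into real IP packets is left to the network stack.

```python
def packetize(pcm: bytes, frame_bytes: int = 160):
    """Split a digitized voice stream into numbered fixed-size payloads.

    Returns a list of (sequence_number, payload) tuples; the last payload
    may be shorter if the stream length is not a multiple of frame_bytes.
    """
    return [
        (seq, pcm[start:start + frame_bytes])
        for seq, start in enumerate(range(0, len(pcm), frame_bytes))
    ]


if __name__ == "__main__":
    one_second_of_silence = bytes(8000)  # 8000 samples of 8 bits each
    frames = packetize(one_second_of_silence)
    print(len(frames), "payloads of", len(frames[0][1]), "bytes")
```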
VoIP systems usually interface with the traditional public switched telephone network (PSTN) to allow for transparent phone communications worldwide. Although VoIP has existed for several years, it has only recently begun to take off as a viable alternative to traditional voice systems and PSTN networks.
2.2.2 Historical Background of VoIP
According to [6] and [7], the first VoIP products appeared in 1995, when a small telecom company called Vocaltec released its first Internet phone software. This software was designed for a home PC and used attachments such as headsets, microphones, sound cards and speakers, at a time when the Internet was still a long way from what it is today. The software was called "Internet Phone" and used the H.323 protocol instead of the SIP protocol that is more prevalent today. Although the software was very well accepted even in those early days, it suffered from the major drawback of the non-availability of broadband, and the resulting voice quality was poor compared to a normal telephone call.
At this point, things really began to move. As it was shown that there was a market for VoIP technology, a lot of people entered the game and started offering service - real time communication via the same "wire" that feeds your Internet connection. All of these things will drive more and more people to look into VoIP and how it can help better their lives.
In 1996, the ITU Telecommunication Standardization Sector (ITU-T) began the standardization of VoIP, initially with the H.323 standard. In the same year, US telecommunication companies asked the US Congress to ban Internet phone technology. As mentioned in [8], by 1998 VoIP traffic had grown to represent approximately 1% of all voice traffic in the US. However, the advancement of the technology was a big leap forward.
In 1999, the Session Initiation Protocol (SIP) specification, RFC 2543, was released and, subsequently, the first open-source SIP PBX (Asterisk) was created by Mark Spencer of Digium. Networking manufacturers such as Lucent and Cisco soon came out with software that could route and switch VoIP traffic and, as a result, by the year 2000 VoIP traffic accounted for more than 3% of all voice traffic.
All the major issues concerning VoIP traffic were addressed in 2005, thus assuring users of excellent voice quality and unbroken phone calls. The revenue from the sale of VoIP equipment alone in that year was around $3 billion, and it was expected to reach around $8.5 billion by the close of 2008. In 2009, Skype released a Wi-Fi-only application for the iPhone. Nowadays, major voice quality issues have long since been addressed, and VoIP traffic can be prioritized over data traffic to ensure reliable, clear-sounding, unbroken telephone calls. Finally, VoIP is not a perfect technology or one that will solve all the world's problems, but it can help in many different and positive ways.
2.2.3 Background Concepts
To introduce VoIP theory and its fundamentals, it is important to start by explaining some basic concepts and technologies behind VoIP. Two fundamental technologies are necessary for the existence of VoIP. The first, and most widely used, is the telephone (for which the PSTN will be highlighted). The second, and the fastest-growing, is the Internet (for which the Internet Protocol will be explained).
2.2.3.1 Public Switched Telephone Network
The Public Switched Telephone Network (PSTN) is defined in [48] as the network of the world's public circuit-switched telephone networks, in much the same way that the Internet is the network of the world's public IP-based packet-switched networks. Originally a network of fixed-line analog telephone systems, the PSTN is now almost entirely digital, and now includes mobile as well as fixed telephones. The PSTN is largely governed by technical standards created by the ITU-T, and uses E.163/E.164 addresses (telephone numbers) for addressing.
The PSTN was the earliest example of traffic engineering to deliver Quality of Service (QoS) guarantees. A.K. Erlang (1878-1929) is credited with establishing the mathematical foundations of the methods required to determine the amount and configuration of equipment and the number of personnel required to deliver a specific level of service. In the 1970s the telecommunications industry assumed that digital services would follow much the same pattern as voice services, and conceived a vision of end-to-end circuit-switched services, known as the Broadband Integrated Services Digital Network (B-ISDN). The B-ISDN vision has been overtaken by the disruptive technology of the Internet. Only the oldest parts of the telephone network still use analog technology for anything other than the last-mile loop to the end user, and in recent years digital services have been increasingly rolled out to end users using services such as DSL, ISDN, FTTX and cable modem systems.
Many observers believe that the long-term future of the PSTN is to be just one application of the Internet; however, the Internet has some way to go before this transition can be made. The QoS guarantee is one aspect of Voice over IP (VoIP) technology that needs to be improved.
There are a number of large private telephone networks which are not linked to the PSTN, usually for military purposes. There are also private networks run by large companies which are linked to the PSTN only through limited gateways, like a large private branch exchange (PBX).
2.2.3.2 Internet Protocol
According to [3], the Internet Protocol (IP) is the method or protocol by which data is sent from one computer to another on the Internet. Each computer (known as a host) on the Internet has at least one IP address that uniquely identifies it from all other computers on the Internet. When you send or receive data (for example, an e-mail note or a Web page), the message gets divided into little chunks called packets. Each of these packets contains both the sender's Internet address and the receiver's address. Any packet is sent first to a gateway computer that understands a small part of the Internet. The gateway computer reads the destination address and forwards the packet to an adjacent gateway that in turn reads the destination address and so forth across the Internet until one gateway recognizes the packet as belonging to a computer within its immediate neighborhood or domain. That gateway then forwards the packet directly to the computer whose address is specified.
Because a message is divided into a number of packets, each packet can, if necessary, be sent by a different route across the Internet. Packets can arrive in a different order than the order they were sent in. The Internet Protocol just delivers them. It's up to another protocol, the Transmission Control Protocol (TCP) to put them back in the right order.
IP is a connectionless protocol, which means that there is no continuing connection between the end points that are communicating. Each packet that travels through the Internet is treated as an independent unit of data without any relation to any other unit of data. (The packets are put back in the right order by TCP, the connection-oriented protocol that keeps track of the packet sequence in a message.) In the Open Systems Interconnection (OSI) communication model, IP is in layer 3, the Network layer.
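The division of a message into independently delivered packets, and their reordering at the receiver, can be illustrated with a short Python sketch. It is a deliberate simplification: real TCP numbers bytes rather than packets and also handles loss and retransmission, and the function names here are hypothetical.

```python
import random


def split_into_packets(message: bytes, size: int):
    """Divide a message into (sequence_number, chunk) packets."""
    return [
        (seq, message[start:start + size])
        for seq, start in enumerate(range(0, len(message), size))
    ]


def reassemble(packets):
    """Restore the original message by sorting packets on their sequence number."""
    ordered = sorted(packets, key=lambda packet: packet[0])
    return b"".join(chunk for _, chunk in ordered)


if __name__ == "__main__":
    packets = split_into_packets(b"packets may take different routes", 5)
    random.shuffle(packets)  # simulate out-of-order arrival
    print(reassemble(packets))
```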
2.2.3.3 Basic PSTN and VoIP Network Functions
Before diving into the details of VoIP networking components and technologies, it is important to understand the basic network functions that make voice services possible. Voice over IP 101 [2] describes how PSTN and VoIP networks use:
• Database services to locate endpoints and translate between the addressing schemes used in two (usually heterogeneous) networks
• Signaling to coordinate the actions of the various networking components needed to complete a call between two endpoints
• Call connect and disconnect (bearer control) mechanisms to transport audio content
• Coder-decoder (CODEC) operations to convert analog waveforms to digital information for transport
2.2.3.3.1 Database Services
PSTN and VoIP networks use database services to locate the endpoints for a given call and to translate between the addressing schemes used by two (usually heterogeneous) networks. These database services typically include:
• A call control database that contains the address mappings and translations for the call's endpoints (the end users' phones)
• A report function that generates transaction reports to support billing
• Additional logic that can provide network security, for example, the ability to prevent a specific endpoint from making international calls
Unlike the PSTN, which identifies endpoints by their phone number, VoIP networks identify endpoints by their IP address and port number. Some VoIP networks use a Domain Name System (DNS) to abstract addresses. PSTN and VoIP networks couple these database services with call state control and signaling to coordinate the activities of their network elements.
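A toy version of such a database service is sketched below in Python. Everything in it is hypothetical and chosen only for illustration: the phone numbers, the documentation-range IP addresses, the use of port 5060 and the simple prefix test for international calls.

```python
# Hypothetical call control database: phone number -> (IP address, port).
CALL_CONTROL_DB = {
    "+60355550101": ("192.0.2.10", 5060),
    "+60355550102": ("192.0.2.20", 5060),
}

# Hypothetical security policy: endpoints barred from international calls.
NO_INTERNATIONAL = {"+60355550102"}


def resolve_endpoint(caller: str, callee: str, home_prefix: str = "+60"):
    """Translate a dialed number into the IP address and port of the callee."""
    if caller in NO_INTERNATIONAL and not callee.startswith(home_prefix):
        raise PermissionError("international calls are barred for this endpoint")
    try:
        return CALL_CONTROL_DB[callee]
    except KeyError:
        raise LookupError(f"unknown endpoint: {callee}") from None


if __name__ == "__main__":
    print(resolve_endpoint("+60355550101", "+60355550102"))
```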
2.2.3.3.2 Signaling
Signaling enables individual network devices to communicate with one another. Both PSTN and VoIP networks rely on signaling to activate and coordinate the various components needed to complete a call.
In a PSTN network, phones communicate with a time-division multiplexed (TDM) Class 5 switch or a traditional digital private branch exchange (PBX) for call connection and call routing purposes.
In a VoIP network, the VoIP components communicate with one another by exchanging IP datagram messages. The format of these messages may be dictated by any of several standard protocols. The most commonly used signaling protocols are H.323 and the Session Initiation Protocol (SIP), which will be described later in this research.
2.2.3.3.3 Call Connection and Audio Transport Mechanisms
To complete a call, two endpoints must be able to open and sustain a communication session. The public or private switches in the PSTN complete calls by connecting logical Digital Signal 0 (DS-0) channels through the network. Each DS-0 is a 64 Kbps bi-directional channel that the PSTN dedicates exclusively to the communication session for the duration of the call. The PSTN uses Pulse Code Modulation (PCM) to represent analog audio frequencies, enabling the network to transmit the audio payload through these DS-0 channels as a digitally encoded pulse amplitude value.
Like the PSTN, VoIP networks use PCM to encode the audio payload. However, instead of transmitting the audio payload directly over a dedicated DS-0 channel, VoIP networks transport the audio payload using shared network resources. To complete a connection, VoIP networks place a set of one or more PCM samples, known as frames, into an IP datagram. The VoIP solution formats the datagrams according to the Real-Time Transport Protocol (RTP), and then forwards them over a routed or packet-forwarded IP network. Because the IP network does not allocate resources specifically for these RTP packets, ensuring high-quality VoIP communications can pose a significant challenge to service providers and enterprises.
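To make the RTP formatting step more concrete, the following Python sketch prefixes a frame of PCM samples with a minimal 12-byte RTP fixed header. It is a simplified illustration rather than a full RTP implementation: there are no CSRC entries, header extensions or padding, and the function name and the example values are assumptions. Payload type 0 denotes G.711 mu-law (PCMU) audio.

```python
import struct


def build_rtp_packet(payload: bytes, seq: int, timestamp: int, ssrc: int,
                     payload_type: int = 0) -> bytes:
    """Prefix a voice frame with a minimal 12-byte RTP fixed header."""
    first_byte = 2 << 6                   # version 2; padding, extension, CSRC count all zero
    second_byte = payload_type & 0x7F     # marker bit clear, 7-bit payload type
    header = struct.pack(
        "!BBHII",                         # network byte order: 1 + 1 + 2 + 4 + 4 bytes
        first_byte,
        second_byte,
        seq & 0xFFFF,                     # 16-bit sequence number
        timestamp & 0xFFFFFFFF,           # 32-bit media timestamp
        ssrc & 0xFFFFFFFF,                # 32-bit synchronization source identifier
    )
    return header + payload


if __name__ == "__main__":
    frame = bytes(160)                    # one made-up frame of 160 PCM samples
    packet = build_rtp_packet(frame, seq=1, timestamp=160, ssrc=0x1234)
    print(len(packet), "bytes, first header byte", hex(packet[0]))
```

In a running call the resulting bytes would be handed to a UDP socket; the sequence number and timestamp are what allow the receiver to detect loss and play the frames out in order.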
2.2.3.3.4 CODEC Operations
The fourth basic network function is the process the network uses to convert analog waveforms to digital information. Both PSTN and VoIP solutions use CODECs or voice coder-decoders (VOCODERs) to do this. The process that achieves this conversion is complex and well beyond the scope of this research. For the purposes of this discussion, however, it is sufficient to say that there are many ways to transform an analog voice signal, all of which are governed by industry standards and most of which are based on PCM.
Each encoding scheme has its own history and merit. Each has its own bandwidth needs based on its compression capabilities. Table 2-1 lists some of the more important encoding standards covered by the International Telecommunication Union. Notice the tradeoff that the standards make between encoding efficiency, reduced bandwidth consumption, and increased conversion delay.
Table 2-1: ITU Encoding Standards [2]

ITU Standard   Description       Bandwidth (Kbps)   Conversion Delay (ms)
G.711          PCM               64                 < 1.00
G.721          ADPCM             32, 16, 24, 40     < 1.00
G.728          LD-CELP           16                 ~ 2.50
G.729          CS-ACELP          8                  ~ 15.00
G.723          Multi-rate CELP   6.3, 5.3           ~ 30.00
2.2.3.4 Circuit-Switched Networks vs. Data Networks
According to [4], the topology and behavior of circuit-switched networks and data networks are significantly different. Voice traffic carried over a system originally designed for data creates technical challenges that must be addressed to ensure a high quality of service (QoS). QoS refers to the ability of a network to satisfy voice traffic and service requirements. Circuit-switched networks accept higher costs in exchange for a high QoS, whereas data networks sacrifice QoS for bandwidth efficiency.
Circuit-switched networks have a high QoS because they dedicate bandwidth resources to each individual call for the duration of that call. However, dedicating permanent resources to an individual call creates significant cost disadvantages for circuit-switched networks. Additionally, because most voice conversation consists of pauses where no voice transmission takes place, circuit-switched networks cannot utilize available bandwidth efficiently, resulting in a cost disadvantage when compared to data networks.
Unlike circuit-switched networks, data networks do not dedicate permanent resources to an individual call between two locations. Instead, data is prepared for network transport by attaching classification information to the packet header, which is attached to the payload. The data network uses the path in the network that optimizes the utilization of available bandwidth. Figure 2-1 below illustrates how voice data can be prepared for traveling over a data network.
Figure 2-1: Packetization of Voice Traffic. [4]
2.2.4 VoIP Benefits and Challenges
2.2.4.1 Advantages of VoIP
According to [6], the main advantages of VoIP over traditional circuit-switched telephone networks can be summarized as follows:
- The most important advantage of VoIP is its low cost compared to the PSTN. VoIP providers charge a monthly fee for calls within a given area and a nominal charge for calls outside this geographical area, while calls from computer to computer are generally free. Interest in VoIP has grown in part because the technology can help both service providers and enterprises to reduce costs by using a single IP network for both data and voice applications. This same advantage leads others, as in [2], to expect that VoIP will be used more and more in the future. Routing phone calls over existing data networks avoids the need for separate voice and data networks. Putting the voice and data traffic on one set of wires instead of two also makes good commercial sense when compared to the cost of developing and supporting two separate infrastructures.
- Integration with other services available over the Internet, including video conversation, message or data file exchange in parallel with the conversation, audio conferencing, call forwarding, automatic redial, 3-way calling, speed dialing, caller ID, managing address books, and passing information about whether others (e.g., friends or colleagues) are available to interested parties. Such features, for which traditional telecommunication companies normally charge extra, are available for free from open source VoIP implementations.
- Portability and location independence. Only an Internet connection is needed to get a connection to a VoIP provider. For instance, call center agents using VoIP phones can work from anywhere with a sufficiently fast and stable Internet connection.
- Effective use of bandwidth, which makes it easy to transmit more than one telephone call over the same broadband connection. This can make VoIP a simple way to add an extra telephone line to a home or office.
- Secure calls using standardized protocols (such as the Secure Real-time Transport Protocol). Most of the prerequisites that make a secure phone connection difficult over traditional phone lines, like digitizing and digital transmission, are already in place with VoIP. It is only necessary to encrypt and authenticate the existing data stream.
2.2.4.2 Disadvantages and Challenges of VoIP
In [6] a list of several VoIP drawbacks, disadvantages and challenges is introduced. The most important ones are:
2.2.4.2.1 Security
As a computer-based technology, Voice over Internet Protocol (VoIP) telephone systems are as susceptible to attacks as PCs. This means that hackers who know about these vulnerabilities can launch denial-of-service attacks, harvest customer data, record conversations and break into voice mailboxes. As voice and data both need to be protected, added security is required. There are many ways intruders could attack the VoIP system. Many consumer VoIP solutions do not yet support encryption, although a secure phone is much easier to implement with VoIP than with traditional phone lines. As a result it is relatively easy to eavesdrop on VoIP calls and even change their content. Another challenge is routing VoIP traffic through firewalls and network address translators.
There are several open source solutions that facilitate sniffing of VoIP conversations. A small amount of security is afforded due to patented audio codecs that are not easily available for open source applications; however such security through obscurity has not proven effective in the long run in other fields. Some vendors also use compression to make eavesdropping more difficult. However, real security requires encryption and cryptographic authentication, which are not widely available at a consumer level.
2.2.4.2.2 Quality of Service
Quality is an important issue in case of VoIP. Because the underlying IP network is inherently unreliable, in contrast to the circuit-switched public telephone network, and does not inherently provide a mechanism to ensure that data packets are delivered in sequential order, or provide Quality of Service (QoS) guarantees, VoIP implementations face problems mitigating latency and jitter.
Some broadband connections may have less than desirable quality. Where IP packets are lost or delayed at any point in the network between VoIP users, there will be a momentary dropout of voice. This is more noticeable in highly congested networks and/or where there are long distances and/or internetworking between end points.
A number of protocols have been defined to support the reporting of QoS/QoE for VoIP calls. Packet prioritization, Queuing, and Traffic Shapers are suggested methods for making communication more reliable and ensuring that temporary failures have less impact on communication quality.
2.2.4.2.3 Emergency Calls
The nature of IP makes it difficult to locate network users geographically. Emergency calls, therefore, cannot easily be routed to a nearby call center, and are impossible on some VoIP systems. Sometimes, VoIP systems may route emergency calls to a non-emergency phone line at the intended department. Moreover, in the event that the caller is unable to give an address, emergency services may be unable to locate them in any other way.
A fixed line phone has a direct relationship between a telephone number and a physical location. A telephone number represents one pair of wires that links a location to the telco's exchange. Once a line is connected, the telco stores the home address that relates to the wires, and this relationship will rarely change. If an emergency call comes from that number, then the physical location is known.
2.2.4.2.4 Susceptibility to Power Interruption
Conventional phones and telephones for traditional residential analog service are usually connected directly to telephone company phone lines which provide direct current to power most basic analog handsets independently of locally available power. In the event of a power failure, these phones are kept functioning by back-up generators or batteries located at the telephone exchange. However, household VoIP hardware uses broadband modems and other equipment powered by household electricity, which may be subject to outages in the absence of an uninterruptible power supply or generator. Even with local power still available, the broadband carrier itself may experience outages as well.
The susceptibility of phone service to power failures is a common problem even with traditional analog service in areas where many customers purchase modern handset units that operate wirelessly to a base station, or that have other modern phone features, such as built-in voicemail or phone book features.
Using an uninterruptible power supply or generator can help in overcoming this problem. However, most broadband networks are less than 10 years old, and even the best are still subject to intermittent outages. Furthermore, consumer network technologies such as cable and DSL often are not subject to the same restoration service levels as the PSTN.
2.2.5 VoIP Architecture
2.2.5.1 Multimedia over Packet Oriented Networks
In the early days of networks, according to [11], most data was textual. Today, with the rise of multimedia applications and the growth of the World Wide Web, networks have to transport multimedia information as well. This raises several non-trivial challenges.
Firstly, multimedia applications often need a greater bandwidth than textual data. Let us consider a movie that is replayed at 25 frames per second and with a resolution of 320x240 pixels in true color. Such a movie would need a bandwidth of approximately 5.7 MByte/s. This is unimaginable on a standard network many users are sharing.
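The bandwidth figure above can be checked with a short calculation. The sketch below assumes that "true color" means 24 bits (3 bytes) per pixel and that the video is uncompressed; the function name is illustrative.

```python
# Raw (uncompressed) video data rate, assuming 3 bytes per true-color pixel.

def raw_video_rate(width: int, height: int, fps: int,
                   bytes_per_pixel: int = 3) -> int:
    """Return the raw data rate in bytes per second."""
    return width * height * bytes_per_pixel * fps


if __name__ == "__main__":
    rate = raw_video_rate(320, 240, 25)
    print(rate, "bytes/s =", rate / 1_000_000, "million bytes/s")
```

Under these assumptions the rate is 5,760,000 bytes per second, which is consistent with the approximate figure quoted in the text.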
Second, multimedia applications such as audio and video require real-time traffic, because their data has to be played back continuously at the same rate at which it was sampled. If some data does not arrive in time, the playback process will stop and human ears and eyes will pick up the artifact. In a traditional network, however, some packets may be delivered with a longer delay for several reasons, primarily congestion. In this case real-time data may become obsolete and has to be dropped. Because the data has to be delivered in time, reliability is of little use: if a packet is lost or transmitted incorrectly, there is not enough time for a retransmission.
Third, a multimedia data stream over packet-oriented networks is often bursty. The receiver's buffer must neither underflow nor overflow. Underflow starves the application, and overflow results in packet loss and poor quality.
There are other requirements on a network, such as delivering multimedia data to more than one receiver, for instance in video conferencing with more than two users. This is known as multicasting: the packets are copied by the network and not by the application.
2.2.5.2 Multimedia over the Internet
The core Internet protocols like TCP/IP and UDP/IP do not provide any real-time functionality; therefore they are not able to fulfill the requirements needed for transmitting multimedia data. The real-time service that was developed by the Integrated Services working group in the IETF enables IP networks to provide quality of service to multimedia applications. Resource Reservation Protocol (RSVP), together with Real-time Transport Protocol (RTP), Real-Time Control Protocol (RTCP), and Real-Time Streaming Protocol (RTSP), which will be explained in detail later, provides a working foundation for real-time services. Integrated Services allows applications to configure and manage a single infrastructure for multimedia applications and traditional applications. It is a comprehensive approach to provide applications with the type of service they need and in the quality they choose.
Figure 2-2: Network Stack [11]
As Figure 2-2 above shows, the Integrated Services protocols all use the User Datagram Protocol (UDP) as the transport protocol or rely directly on the Internet Protocol (IP).
Although the figure also shows that the solution is the RTP protocol layered on the UDP protocol, a few points about using the existing and well-known protocols TCP and UDP need to be considered.
2.2.5.3 Multimedia over TCP
TCP is a connection oriented protocol that is reliable including flow control and supports byte-stream in full duplex mode.
Multimedia applications often use multicast methods to serve several clients at the same time. The connection-oriented nature of TCP is therefore not very helpful, because it allows only a point-to-point connection at the application level between a server and one client. Sending data to more than one client by multicast is not supported by the TCP protocol.
At first thought, a reliable connection seems a good thing for multimedia. Reliable means that no single packet will be lost. In algorithmic terms, the client has to acknowledge every received packet and also has to wait for any lost packets to be retransmitted. This high quality of service costs wait cycles and introduces delays, and delays are exactly what we do not want in real-time data transmissions. Moreover, it often does not matter if a packet is lost. For example, in a video transmission the lost packet is simply dropped and the previous packet is displayed a second time. For the human eye this little trick does not noticeably affect the quality. So reliable transmission at the transport layer is not exactly what is needed for multimedia transmissions.
The ability of flow control is a good thing. For sound or video data it is highly important to get the frames in the right order. IP does not guarantee a sequence in delivering the packets from the network, so a higher level protocol has to take care of that. TCP does it.
Byte-stream in full duplex mode sounds like exactly what is needed for real-time applications. On closer inspection, however, a video broadcast to multiple clients does not need full duplex mode.
To summarize, TCP is not a good basis for transporting multimedia real-time data over a network. Multimedia demands a constant bit rate even if a packet is occasionally lost, and that is exactly what TCP cannot guarantee.
2.2.5.4 Multimedia over UDP
To explain it in one sentence: UDP is a connectionless, message-oriented protocol that offers best-effort delivery with no flow control.
UDP is not reliable, and that is one of its key advantages for real-time transmissions: it is a best-effort delivery protocol, which is exactly what is needed for multimedia data. The fact that UDP is connectionless does not matter, because connection orientation is not very important in multimedia. UDP also supports multicast methods, which is exactly what is needed for broadcasting a video to more than one client. Multicast simply replicates the data on a network segment to a group of clients. Imagine this situation without a multicast method: it would be necessary to set up sessions with each client and to replicate the data at the application level, generating multiple streams on a single network segment. But UDP has drawbacks too. It does not support flow control, and that is actually very important for real-time applications. So UDP is a good base that goes in the right direction, but it lacks a couple of key functions. That is where RTP comes along.
2.2.6 VoIP Solution Components
According to [1], VoIP networks take a different approach to fulfilling the four primary networking functions mentioned earlier, while the major components of a VoIP network ultimately deliver very similar functionality to that of a PSTN. Consequently, VoIP networks can perform all of the same tasks that the PSTN does.
A typical VoIP network has five major components:
- VoIP phones, consoles, PC applications, and other devices from which end users initiate and receive VoIP calls
- The call processing server/PBX, which manages all VoIP control connections
- One or more media/PSTN-to-VoIP gateways, which convert voice content for transport over the IP network
- The IP network, which transports the audio payload
- One or more Session Border Controllers (SBCs), which control real-time, session-based traffic at the signaling (call control) and transport layers as it crosses network borders and network domains
With respect to [6], VoIP uses IP as its basic transport method. It utilizes both UDP and TCP over IP and implements a packet-switched network. No permanent connection is established between the source and destination. Multiple users can take part in a conversation. When a user dials a number using the IP telephone, the telephone adapter will digitize the analog signal, which is then compressed and inserted into data packets. These packets are carried in UDP datagrams.
This signal then travels through the Ethernet to the VoIP server, which uses an IP PBX. The IP PBX has a directory of all phone users and their corresponding IP addresses. If the destination is within the same network, it adds the correct destination IP address. If not, it routes the call to the gateway, which converts the signal to IP packets before sending it to the router, which in turn routes the packets to the destination network through PSTN lines.
2.2.7 VoIP Protocols and Communication Flow
VoIP as a real-time service enables voice conversations through IP networks. According to [16], it is possible to offer IP telephony due to four main groups of protocols:
- Signalling protocols, which allow connections between the calling parties to be created, modified, and terminated. Currently the most popular are SIP and H.323.
- Transport protocols. The most important is RTP, which provides end-to-end network transport functions suitable for applications transmitting real-time audio. RTP is usually used in conjunction with UDP (or rarely TCP) for transport of the digital voice stream.
- Speech codecs, e.g. G.711, G.729, G.723.1, which compress/decompress digitized human voice and prepare it for transmission in IP networks.
- Other supplementary protocols, such as RTCP, SDP, or RSVP, that complete VoIP functionality. For the purposes of this project the role of the RTCP protocol will be explained. RTCP is a control protocol for RTP and it is designed to monitor the Quality of Service parameters and to convey information about the participants in an ongoing session.
Figure 2-3: VoIP Stack and Protocols [43]
Generally, an IP telephony connection consists of two phases: a signalling phase and a conversation phase. In both phases certain types of traffic are exchanged between the calling parties. Later in this research, a scenario with SIP as the signalling protocol and RTP (with RTCP as the control protocol) for audio stream transport will be presented. This means that during the signalling phase of the call certain SIP messages are exchanged between SIP endpoints (called SIP User Agents). SIP messages usually traverse SIP network servers, proxies or redirect servers, that allow end-users to locate and reach each other. After this phase, the conversation phase begins, where audio (RTP) streams flow bidirectionally between a caller and a callee.
2.2.7.1 Real Time Transport Protocol
The Real-time Transport Protocol (RTP) is formally specified in [30]. There, it is defined as a protocol which provides end-to-end delivery services for data with real-time characteristics, such as interactive audio and video. So this protocol can also be used for VoIP applications. VoIP data packets live in RTP packets which are inside UDP-IP packets. As discussed earlier, VoIP doesn't use TCP because it is too heavy for real time applications, so instead a UDP datagram is used.
Roberto A. [15] added that UDP has no control over the order in which packets arrive at the destination or how long it takes them to get there. Both of these are very important to overall voice quality and conversation quality. RTP solves the problem by enabling the receiver to put the packets back into the correct order and not wait too long for packets that have either lost their way or are taking too long to arrive. This relies on the fact that not every single voice packet is needed; what is needed is a continuous, ordered flow of most of them.
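The reordering behaviour described above can be sketched as a small receive buffer keyed on the 16-bit RTP sequence number. This is a simplified illustration and not the mechanism of any particular VoIP client: the class name and the `max_pending` threshold (how many later packets may queue up before a missing one is given up) are assumptions made for the example.

```python
# Simplified reordering buffer keyed on the 16-bit RTP sequence number.
# ReorderBuffer and max_pending are hypothetical names for this sketch.

SEQ_MOD = 1 << 16  # RTP sequence numbers wrap around at 65536


class ReorderBuffer:
    def __init__(self, first_seq: int, max_pending: int = 3):
        self.next_seq = first_seq       # sequence number expected next
        self.max_pending = max_pending  # tolerance before skipping a gap
        self.pending = {}               # seq -> payload, not yet released

    def push(self, seq: int, payload: bytes) -> list:
        """Store one packet; return payloads now ready, in playback order.

        A skipped (lost or too late) packet is reported as None.
        """
        # Ignore packets whose slot was already released or skipped.
        if (seq - self.next_seq) % SEQ_MOD >= SEQ_MOD // 2:
            return []
        self.pending[seq] = payload
        ready = []
        while self.pending:
            if self.next_seq in self.pending:
                ready.append(self.pending.pop(self.next_seq))
            elif len(self.pending) > self.max_pending:
                ready.append(None)  # stop waiting for the missing packet
            else:
                break               # keep waiting a little longer
            self.next_seq = (self.next_seq + 1) % SEQ_MOD
        return ready
```

For example, with `max_pending=2`, packets arriving in the order 1, 3, 2 are released as 1, then 2 and 3 together, while a packet that never arrives is skipped once three later packets are waiting behind it.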
2.2.7.1.1 The RTP Data Transfer Packet
The format of an RTP data transfer packet is illustrated in Figure 2-4. There are four parts to the packet:
1. The mandatory RTP header
2. An optional header extension
3. An optional payload header (depending on the payload format used)
4. The payload data itself
Figure 2-4: An RTP Data Transfer Packet [30]
The entire RTP packet is contained within a lower-layer payload, typically UDP/IP. The mandatory RTP data packet header is typically 12 octets in length, although it may contain a contributing source list, which can expand the length by 4 to 60 additional octets. The fields in the mandatory header are the payload type, sequence number, time-stamp, and synchronization source identifier. In addition, there is a count of contributing sources, a marker for interesting events, support for padding and a header extension, and a version number.
The RTP header provides the timing information necessary to synchronize and display audio and video data and to determine whether packets have been lost or arrive out of order. In addition, the header specifies the payload type, thus allowing multiple data and compression types. This is a key advantage over most proprietary solutions, which specify a particular type of compression and thus limit users' choice of compression schemes.
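The header fields listed above can be made concrete with a short parser. The sketch below follows the fixed header layout of the RTP specification (12 octets, followed by up to 15 contributing source identifiers of 4 octets each); the type and function names are illustrative, and the optional header extension is not decoded.

```python
import struct
from typing import NamedTuple, Tuple


class RtpHeader(NamedTuple):
    version: int
    padding: bool
    extension: bool
    marker: bool
    payload_type: int
    sequence_number: int
    timestamp: int
    ssrc: int
    csrc: Tuple[int, ...]


def parse_rtp_header(packet: bytes) -> Tuple[RtpHeader, int]:
    """Parse the mandatory RTP header.

    Returns the header and the offset at which the header extension
    (if the extension bit is set) or the payload begins.
    """
    if len(packet) < 12:
        raise ValueError("packet shorter than the 12-octet RTP header")
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", packet[:12])
    csrc_count = b0 & 0x0F            # number of contributing sources
    end = 12 + 4 * csrc_count
    if len(packet) < end:
        raise ValueError("packet truncated inside the CSRC list")
    csrc = struct.unpack("!%dI" % csrc_count, packet[12:end])
    header = RtpHeader(
        version=b0 >> 6,
        padding=bool(b0 & 0x20),
        extension=bool(b0 & 0x10),
        marker=bool(b1 & 0x80),
        payload_type=b1 & 0x7F,
        sequence_number=seq,
        timestamp=ts,
        ssrc=ssrc,
        csrc=csrc,
    )
    return header, end


if __name__ == "__main__":
    # A version-2 packet, payload type 0, followed by 160 payload octets.
    sample = struct.pack("!BBHII", 0x80, 0, 1, 160, 0x1234) + b"\x00" * 160
    print(parse_rtp_header(sample))
```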
2.2.7.2 RTCP
The RTP protocol is accompanied by a control protocol, RTCP. Each participant of an RTP session periodically sends RTCP packets to all other participants in the session. According to [30], RTCP has four functions:
- The primary function is to provide feedback on the quality of data distribution. Such information can be used by the application to perform flow and congestion control functions. The information can also be used for diagnostic purposes.
- RTCP distributes an identifier which can be used to group different streams (audio and video, for example) together. Such a mechanism is necessary since RTP itself does not provide this information.
- By periodically sending RTCP packets, each session can observe the number of participants. The RTP data cannot be used for this since it is possible that somebody does not send any data, but does receive data from other participants. For example, this is the case in an on-line lecture.
- An optional function is the distribution of information about a participant. This information could be used in a user interface, for example.
According to [10], there are several types of RTCP packets which are used to supply this functionality. Sender reports (SR) are used by active senders to distribute transmission and reception statistics. If a participant is not an active sender, it still distributes reception statistics by sending receiver reports (RR). Information which describes a participant is transmitted in the form of source description (SDES) items. There is also a packet type to allow application specific data (APP). Finally, when a participant is about to leave the session, it sends a goodbye (BYE) packet.
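As an illustration of these packet types, the sketch below walks through a compound RTCP datagram and names each packet it contains. It relies on the common RTCP header layout (version and count in the first octet, packet type in the second, and a 16-bit length expressed in 32-bit words minus one) and on the packet type codes 200 to 204 assigned in the RTP specification; the function name is illustrative.

```python
import struct

# Packet type codes from the RTP specification.
RTCP_TYPES = {200: "SR", 201: "RR", 202: "SDES", 203: "BYE", 204: "APP"}


def rtcp_packet_types(compound: bytes) -> list:
    """Return the names of the RTCP packets found in one datagram."""
    names = []
    offset = 0
    while offset + 4 <= len(compound):
        first, ptype, length_words = struct.unpack_from("!BBH", compound, offset)
        if first >> 6 != 2:
            raise ValueError("unexpected RTCP version")
        names.append(RTCP_TYPES.get(ptype, "unknown(%d)" % ptype))
        # The length field counts 32-bit words minus one, header included.
        offset += 4 * (length_words + 1)
    return names
```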
The transmission statistics which an active sender distributes include both the number of bytes sent and the number of packets sent. They also include two timestamps: a Network Time Protocol (NTP) timestamp, which gives the time when this report was created, and an RTP timestamp, which describes the same time, but in the same units and with the same random offset as the timestamps in the RTP packets.
This is particularly useful when several RTP packet streams have to be associated with each other. For example, if both video and audio signals are distributed, on playback there has to be synchronization between these two media, called inter-media synchronization. Since their RTP timestamps have no relation whatsoever, there has to be some other way to do this. By giving the relation between each timestamp format and the NTP time, the receiving application can do the necessary calculations to synchronize the streams.
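The calculation behind inter-media synchronization can be sketched as follows. Each sender report supplies one pair of (NTP time, RTP timestamp) for a stream; given the stream's clock rate, any later RTP timestamp can be mapped onto the common NTP timeline. The function name is illustrative, NTP time is simplified to seconds as a floating-point number, and the clock rates in the example (8000 Hz audio, 90000 Hz video) are assumed values.

```python
def rtp_to_wallclock(rtp_ts: int, sr_rtp_ts: int,
                     sr_ntp_seconds: float, clock_rate: int) -> float:
    """Map an RTP timestamp to wall-clock time using one sender report."""
    # Signed 32-bit difference, so that timestamp wrap-around is handled.
    diff = (rtp_ts - sr_rtp_ts) & 0xFFFFFFFF
    if diff >= 0x80000000:
        diff -= 0x100000000
    return sr_ntp_seconds + diff / clock_rate


if __name__ == "__main__":
    # Hypothetical values: both media units map to the same instant.
    audio_time = rtp_to_wallclock(168_000, 160_000, 1000.0, 8_000)
    video_time = rtp_to_wallclock(4_545_000, 4_500_000, 1000.5, 90_000)
    print(audio_time, video_time)
```

Media units whose computed wall-clock times are equal were sampled at the same moment and should be played out together, even though their RTP timestamps are unrelated.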
A participant in an RTP session distributes reception statistics about each sender in the session. For a specific sender, a reception report includes the following information:
- The fraction of lost packets since the last report. An increase of this value can be used as an indication of congestion.
- The total number of lost packets since the start of the session.
- The amount of inter-arrival jitter, measured in timestamp units. When the jitter increases, this is also a possible indication of congestion.
- Information that can be used by the sender to measure the round-trip propagation time to this receiver. The round-trip propagation time is the time it would take a packet to travel to this receiver and back.
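Two of these statistics can be illustrated with the estimators given in the RTP specification: the jitter is a running average of the change in packet transit time, smoothed with a gain of 1/16, and the fraction lost is reported as an 8-bit fixed-point number. The sketch below is a simplified rendering with illustrative function names; arrival times are assumed to be already expressed in RTP timestamp units.

```python
def update_jitter(jitter: float, prev_transit, arrival: int, rtp_ts: int):
    """Return (new_jitter, transit) after receiving one packet.

    arrival and rtp_ts are both in RTP timestamp units; prev_transit is
    None for the first packet of a stream.
    """
    transit = arrival - rtp_ts
    if prev_transit is None:
        return jitter, transit
    d = abs(transit - prev_transit)
    # Running average with gain 1/16.
    return jitter + (d - jitter) / 16.0, transit


def fraction_lost(expected_interval: int, received_interval: int) -> int:
    """Fraction of packets lost since the last report, scaled to 0..255."""
    lost = expected_interval - received_interval
    if expected_interval <= 0 or lost <= 0:
        return 0
    return (lost << 8) // expected_interval
```

For instance, if 100 packets were expected since the last report and 75 arrived, the reported fraction is 64, that is 64/256 or one quarter.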
The source description items give general information about a participant, like name and e-mail. But it also includes a so-called canonical name (CNAME). This is a string which identifies the sender of the RTP packets. Unlike the SSRC identifier, this one stays constant for a given participant, independent of the current session and it is normally unique for each participant. Thanks to this identifier it is possible to group different streams coming from the same source.
Since these packets are sent periodically by each participant to all destinations, care must be taken not to use too much of the available bandwidth for RTCP packets. The RTCP packet interval is calculated from the number of participants and the amount of bandwidth which RTCP packets may occupy. To prevent all participants from sending their RTCP packets at the same time, this value is multiplied by a random number.
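A simplified version of this interval calculation is sketched below. It is not the complete algorithm of the RTP specification, which also distinguishes senders from receivers and applies a further compensation factor. The default values used here (5% of the session bandwidth for RTCP, a 5 second minimum interval, and a random factor between 0.5 and 1.5) are the ones recommended by the specification; the function and parameter names are illustrative.

```python
import random


def rtcp_interval(participants: int, avg_rtcp_octets: float,
                  session_bps: float, rtcp_fraction: float = 0.05,
                  min_interval: float = 5.0, rng=random) -> float:
    """Return the time in seconds to wait before the next RTCP packet."""
    rtcp_bps = session_bps * rtcp_fraction
    # Time needed for every participant to send one average-sized packet
    # within the RTCP bandwidth share.
    deterministic = max(min_interval,
                        participants * avg_rtcp_octets * 8 / rtcp_bps)
    # Randomize so that participants do not all report at the same time.
    return deterministic * rng.uniform(0.5, 1.5)
```

With 1000 participants, 100-octet RTCP packets and a 64 Kbps session, the deterministic interval is 250 seconds, so each participant waits between 125 and 375 seconds; with only two participants the 5 second minimum applies instead.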
2.2.7.3 H323 Signaling Protocol
The H323 protocol is used, for example, by Microsoft NetMeeting to make VoIP calls. Roberto A. [15] in 2003 describes the components of the H323 Signaling Protocol as follows:
1. Terminals: clients that initialize the VoIP connection. Although terminals could talk to each other without any other element, some additional elements might be needed for scalability.
2. Gatekeepers, which essentially provide:
   a. an address translation service, to use names instead of IP addresses
   b. admission control, to allow or deny some hosts or some users
   c. bandwidth management
3. Gateways: points of reference for conversion between TCP/IP and the PSTN.
4. Multipoint Control Units (MCUs), which provide conferencing.
5. Proxy servers, which are also used.
H323 allows not only VoIP but also video and data communications. Concerning VoIP, H323 can carry the audio codecs G.711, G.722, G.723, G.728 and G.729, while for video it supports H261 and H263.
2.2.7.4 Resource Reservation Protocol
There are also other protocols used in VoIP, like Resource ReSerVation Protocol (RSVP), that can manage Quality of Service (QoS). RSVP is a signaling protocol that requests a certain amount of bandwidth and latency in every network hop that supports it.
Bsiger. C. [11] added that it is the network control protocol that allows a data receiver to request a special end-to-end quality of service for its data flows. Real-time applications use RSVP to reserve necessary resources at routers along the transmission paths so that the requested bandwidth can be available when the transmission actually takes place. RSVP is a main component of the future Integrated Services Internet which can provide both best-effort and real-time service.
RSVP is used to set up reservations for network resources. When an application in a host (the data stream receiver) requests a specific quality of service (QoS) for its data stream, it uses RSVP to deliver its request to routers along the data stream paths. RSVP is responsible for the negotiation of connection parameters with these routers. If the reservation is set up, RSVP is also responsible for maintaining router and host states to provide the requested service.
The reservation requests are initiated by the receivers. They do not need to travel all the way to the source. Instead, a request travels upstream until it meets another reservation request for the same source stream, and then merges with that reservation. The figure below shows how the reservation requests merge as they progress up the multicast tree.
Figure 2-5: RSVP Merge [11]
This reservation merging leads to the primary advantage of RSVP, scalability: a large number of users can be added to a multicast group without increasing the data traffic significantly. RSVP can therefore scale to large multicast groups, and the average protocol overhead decreases as the number of participants increases.
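The effect of merging can be illustrated with a toy model of a multicast tree. The sketch below does not implement RSVP messages; it only counts how many hops each receiver's request travels upstream before it reaches the source or meets a node that already holds a reservation. The tree, the node names and the function name are hypothetical.

```python
def reserve(parent: dict, receivers: list, source: str) -> dict:
    """Return the number of upstream hops travelled by each request.

    parent maps every node to its upstream neighbour in the tree.
    """
    reserved = set()  # nodes that already hold a reservation
    hops_per_receiver = {}
    for receiver in receivers:
        node, hops = receiver, 0
        # Travel upstream until the source or an existing reservation.
        while node != source and node not in reserved:
            reserved.add(node)
            node = parent[node]
            hops += 1
        hops_per_receiver[receiver] = hops
    return hops_per_receiver


if __name__ == "__main__":
    # Source S feeds router R1; R1 feeds router R2 and receiver C;
    # R2 feeds receivers A and B.
    tree = {"A": "R2", "B": "R2", "R2": "R1", "C": "R1", "R1": "S"}
    print(reserve(tree, ["A", "B", "C"], "S"))
```

In this example only the first request travels the full three hops to the source; the requests from B and C merge after a single hop, which is why the average overhead per receiver falls as the group grows.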
The reservation process does not actually transmit the data and provide the requested quality of service. But through reservation, RSVP guarantees the network resources are available when the transmission actually takes place.
2.2.7.5 Real-Time Streaming Protocol
RTSP, the Real-Time Streaming Protocol, is a client-server multimedia presentation protocol to enable controlled delivery of streamed multimedia data over IP network. It provides "VCR-style" remote control functionality for audio and video streams, like pause, fast forward, reverse, and absolute positioning. Sources of data include both live data feeds and stored clips.
RTSP is an application-level protocol designed to work with lower-level protocols like RTP, RSVP to provide a complete streaming service over Internet. It provides means for choosing delivery channels (such as UDP, multicast UDP and TCP), and delivery mechanisms based upon RTP. It works for large audience multicast as well as single-viewer unicast.
RTSP aims to provide the same services on streamed audio and video just as HTTP does for text and graphics. It is designed intentionally to have similar syntax and operations so that most extension mechanisms to HTTP can be added to RTSP.
In RTSP, each presentation and media stream is identified by an RTSP URL. The overall presentation and the properties of the media are defined in a presentation description file, which may include the encoding, language, RTSP URLs, destination address, port, and other parameters. The presentation description file can be obtained by the client using HTTP, email or other means.
But RTSP differs from HTTP in several aspects. First, while HTTP is a stateless protocol, an RTSP server has to maintain "session states" in order to correlate RTSP requests with a stream. Second, HTTP is basically an asymmetric protocol where the client issues requests and the server responds, but in RTSP both the media server and the client can issue requests. For example the server can issue a request to set playing back parameters of a stream.
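Because RTSP borrows its syntax from HTTP, a request can be assembled as plain text. The sketch below builds a minimal RTSP/1.0 request with a CSeq header and, once a session exists, a Session header that lets the server correlate the request with its stored session state. The function name, the URL and the session identifier are hypothetical, and real requests usually carry further headers.

```python
def build_rtsp_request(method: str, url: str, cseq: int,
                       session: str = None) -> str:
    """Return a minimal RTSP/1.0 request as text."""
    lines = ["%s %s RTSP/1.0" % (method, url), "CSeq: %d" % cseq]
    if session is not None:
        # The session identifier ties the request to server-side state.
        lines.append("Session: %s" % session)
    # Header lines end with CRLF; an empty line ends the header section.
    return "\r\n".join(lines) + "\r\n\r\n"


if __name__ == "__main__":
    print(build_rtsp_request("PAUSE", "rtsp://example.com/clip", 4, "12345678"))
```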
2.2.7.6 Session Initiation Protocol
According to [12] and [13], the Session Initiation Protocol (SIP) is a signaling protocol, widely used for setting up and tearing down multimedia communication sessions such as voice and video calls over the Internet.
SIP is a text-based protocol with syntax similar to that of HTTP. There are two different types of SIP messages: requests and responses. The first line of a request has a method, defining the nature of the request, and a Request-URI, indicating where the request should be sent. The first line of a response has a response code.
According to [12], RFC 3261 defines the following methods for SIP requests:
REGISTER: Used by a UA to notify its current IP address and the URLs for which it would like to receive calls. INVITE: Used to establish a media session between user agents. ACK: Confirms reliable message exchanges. CANCEL: Terminates a pending request. BYE: Terminates a session between two users in a conference. OPTIONS: Requests information about the capabilities of a caller, without setting up a call.
According to [12] also, the SIP response types defined in RFC 3261 fall in one the following categories.
Provisional (1xx): Request received and being processed.
Success (2xx): The action was successfully received, understood, and accepted.
Redirection (3xx): Further action needs to be taken (typically by sender) to complete the request.
Client Error (4xx): The request contains bad syntax or cannot be fulfilled at the server.
Server Error (5xx): The server failed to fulfill an apparently valid request.
Global Failure (6xx): The request cannot be fulfilled at any server.
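The request methods and response classes listed above can be illustrated with a minimal Python sketch. The helper names parse_request_line and classify_status are hypothetical and chosen only for this illustration; the sketch assumes a well-formed first line of the form "METHOD Request-URI SIP-Version" and does not attempt to parse headers or bodies.

```python
# Illustrative sketch only: helper names are hypothetical, not part of any SIP library.
SIP_METHODS = {"REGISTER", "INVITE", "ACK", "CANCEL", "BYE", "OPTIONS"}

RESPONSE_CLASSES = {
    1: "Provisional",
    2: "Success",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
    6: "Global Failure",
}


def parse_request_line(line):
    """Split the first line of a SIP request into method, Request-URI and version."""
    method, request_uri, version = line.strip().split(" ")
    if method not in SIP_METHODS:
        raise ValueError(f"not one of the six methods listed above: {method}")
    return method, request_uri, version


def classify_status(code):
    """Map a three-digit SIP response code to its response class."""
    if not 100 <= code <= 699:
        raise ValueError(f"status code out of range: {code}")
    return RESPONSE_CLASSES[code // 100]


if __name__ == "__main__":
    print(parse_request_line("INVITE sip:bob@example.com SIP/2.0"))
    print(classify_status(180), classify_status(200), classify_status(486))
```

The class of a response depends only on the first digit of its code, which is why integer division by 100 is sufficient here.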
2.2.8
Establishing VoIP Connections with H.323
The ITU's choice for establishing VoIP connections, H.323, is a packet-based multimedia communication system. The H.323 specifications define various signaling functions, as well as media formats related to packetized audio and video services.
Generally speaking, the H.323 standards were the first to classify and solve multimedia delivery issues over LAN technologies. As IP networking and the Internet became prevalent, many Internet RFC standard protocols and technologies were developed based on the H.323 concepts.
As Figure 2-6 illustrates, H.323 networks contain three primary solution components:
Call processing servers, which store and apply information on network topologies and endpoints, for routing calls to VoIP gateways and end user devices.
Media gateways, which serve both as the H.323 termination endpoint and interface with non-H.323 networks, such as the PSTN.
Gatekeepers, which function as a central unit for call admission control, bandwidth management and call signaling.
Figure 2-6: H.323 Call Setup Process [14]
Although gatekeepers are not required elements in H.323, they can increase the network's overall scalability by separating call control and management functions from the gateways.
2.2.9
Establishing VoIP Connections with SIP
Many VoIP networks use the IETF's signaling protocol, SIP, to handle the setup and teardown of multimedia sessions between endpoints. This lightweight, text-based signaling protocol is transported over either Transmission Control Protocol (TCP) or UDP. SIP uses invitations to carry Session Description Protocol (SDP) messages, which perform the capability exchange and set up the call control channel. These invitations allow participants to agree on a set of compatible media types.
The powerful SIP client-server application supports user mobility with two operating modes: proxy and redirect. In proxy mode (shown in Figure 2-7 below), SIP clients send requests to the proxy server. The proxy server either handles the requests or forwards them to other SIP servers. Proxy servers can insulate and hide SIP users by proxying the signaling messages. To the other users on the VoIP network, the signaling invitations look as if they are coming from the proxy SIP server.
Figure 2-7: SIP Proxy Operation [14]
When the SIP server is operating in redirect mode (shown in Figure 2-8 below), the SIP client sends its signaling request to a SIP server, which then looks up the destination address. The SIP server returns the destination address to the originator of the call, who uses it to signal the destination SIP client.
Figure 2-8: SIP Redirector Server [14]
The ability to proxy and redirect requests to the end user's current location is critical to supporting a highly mobile voice user base. SIP enables users to inform the SIP server of their current location (IP address or URL) by sending a registration message to a registrar. As a result, although early VoIP deployments were based on H.323, SIP has become the protocol of choice.
In fact, SIP (originally defined in RFC 2543 and since superseded by RFC 3261) is the basis for the IP Multimedia Subsystem (IMS) multimedia data and control protocol framework that the IETF is developing in conjunction with the Third-Generation Partnership Project (3GPP). IMS uses SIP and other standard interfaces between applications, network layers, and back-office systems to create a flexible framework that can deliver any kind of traffic (voice, data, video, or multimedia) over any wireless or wireline access network.
2.3
Steganography
Over the past few years, steganography has been the source of a lot of discussion, particularly as it was suspected that terrorists connected with the September 11 attacks might have used it for covert communications. While no such connection has been proven, the concern points out the effectiveness of steganography as a means of obscuring data. Indeed, along with encryption, steganography is one of the fundamental ways by which data can be kept confidential. This section will offer a brief introductory discussion of steganography: what it is, how it can be used, and the true implications it can have on information security.
2.3.1
Definition
According to [21], steganography is the art of hiding information in ways that prevent the detection of hidden messages. The word "steganography" is of Greek origin and means "concealed writing". In other words, steganography has evolved into the practice of hiding a message within a larger one in such a way that others cannot discern the presence or contents of the hidden message. In present-day terms, steganography has evolved into a digital strategy of hiding a file in some form of multimedia, such as an image, an audio file (like a .wav or mp3) or even a video file.
As one of the effective solutions for Internet-based secure communications, [18] stated that steganography has attracted increasing interest. It protects secret messages by embedding them into digital media and making them inconspicuous and invisible to eavesdroppers. In contrast to traditional cryptography, whose purpose is to hide the content of secret messages being exchanged between the two communicating parties, the purpose of steganography is to hide not only the content but also its very existence. Therefore, it can offer better security in many ways.
2.3.2
Uses of Steganography
Like many security tools, Pahati, OJ [19] stated that steganography can be used for a variety of reasons, some good, some not so good. Legitimate purposes can include things like watermarking images for reasons such as copyright protection. Digital watermarks are similar to steganography in that they are overlaid in files, which appear to be part of the original file and are thus not easily detectable by the average person. Steganography can also be used as a way to make a substitute for a one-way hash value (which is taking a variable length input and creating a static length output string to verify that no changes have been made to the original variable length input). Further, steganography can be used to tag notes to online images. Finally, steganography can be used to maintain the confidentiality of valuable information, to protect the data from possible sabotage, theft, or unauthorized viewing.
Unfortunately, steganography can also be used for illegitimate reasons. For instance, if someone was trying to steal data, they could conceal it in another file or files and send it out in an innocent looking email or file transfer. Furthermore, a person with a hobby of saving pornography, or worse, to their hard drive, may choose to hide the evidence through the use of steganography. And, as was pointed out in the concern for terroristic purposes, it can be used as a means of covert communication. Of course, this can be both a legitimate and an illegitimate application.
2.3.3
History of Steganography
According to Fortini [20], the first description of the use of steganography dates back to the Greeks. Herodotus tells how a message was passed to the Greeks about Xerxes' hostile intentions underneath the wax of a writing tablet, and describes a technique of dotting successive letters in a cover text with a secret ink, due to Aeneas the Tactician. Pirate legends tell of the practice of tattooing secret information, such as a map, on the head of someone, so that the hair would conceal it. Kahn tells of a trick used in China of embedding a code ideogram at a prearranged position in a dispatch; a similar idea led to the grille system used in medieval Europe, where a wooden template would be placed over a seemingly innocuous text, highlighting an embedded secret message.
During WWII the grille method or some variants were used by spies. In the same period, the Germans developed microdot technology, which prints a clear, good quality photograph shrinking it to the size of a dot.
There are rumors that during the 1980s Margaret Thatcher, then Prime Minister of the UK, became so irritated about press leaks of cabinet documents that she had the word processors programmed to encode the identity of the writer in the word spacing, thus being able to trace the disloyal ministers.
During the "Cold War" period, the US and the USSR wanted to hide their sensors in the enemy's facilities. These devices had to send data to their nations without being spotted.
2.3.4
Digital Steganography
According to [21], [22], modern steganography entered the world in 1985 with the advent of the personal computer applied to classical steganography problems. Development following that was slow, but has since taken off, going by the number of 'stego' programs available: hundreds of digital steganography applications have been identified by the Steganography Analysis and Research Center. Digital steganography techniques include:
Concealing messages within the lowest bits of noisy images or sound files.
Concealing data within encrypted data. The data to be concealed is first encrypted before being used to overwrite part of a much larger block of encrypted data.
Mimic functions convert one file to have the statistical profile of another. This can thwart statistical methods that help brute-force attacks identify the right solution in a ciphertext-only attack.
Concealed messages in tampered executable files, exploiting redundancy in the i386 instruction set.
Pictures embedded in video material (optionally played at slower or faster speed).
Injecting imperceptible delays to packets sent over the network from the keyboard. Delays in key presses in some applications (telnet or remote desktop software) can mean a delay in packets, and the delays in the packets can be used to encode data.
Content-Aware Steganography hides information in the semantics a human user assigns to a datagram. These systems offer security against a non-human adversary/warden.
Digitally embedding a message into a cover-medium usually involves three basic steps. First, the redundant bits of the target cover-medium must be identified. Second, it must be decided which of the identified redundant bits are to be utilized. Finally, the bits selected for use must be modified to store the message data. In many cases, a cover-medium's redundant bits are likely to be the least-significant bit or bits of each of the encoded data's word values.
2.3.5
Steganography Capacity and Robustness
Robustness is defined by Salomon [46], as a measure of the ability of the algorithm to retain the data embedded in the cover even after the cover has been subjected to various modifications as a result of lossy compression and decompression or of certain types of processing such as conversion to analog and back to digital. Robustness is especially important when the hidden data consists of copyright or ownership information (watermark). In steganography, there is a tradeoff between embedding capacity and robustness. The more robust an algorithm is, the less data it can embed in a given cover.
Signal-to-Noise Ratio (SNR) serves as a measure of invisibility (or its opposite, detectability). In general, high SNR is desirable in communications systems, but a low SNR is ideal for steganography. This is because in steganography, the cover is the noise, while the embedded data is the signal. As a result, low SNR corresponds to low perceptibility.
Figure 2-9: Conflicting Requirements for Data Hiding [46]
The major requirements of steganography conflict, so any specific algorithm can satisfy only one or two of them. In particular, embedding capacity, robustness, and undetectability are mutually conflicting and cannot all be achieved by the same algorithm. Figure 2-9 above is a graphical description of the relationships between these three requirements. It shows that naive steganographic methods can achieve large embedding capacity, but at the expense of robustness and undetectability. Advanced algorithms can achieve a high degree of undetectability but offer small embedding capacity and insufficient robustness. Methods for embedding a watermark are normally designed to be robust but result in small embedding capacity and questionable undetectability.
2.3.6
Audio steganography
I)ruid [5] explained that media formats in general, and audio formats specifically, tend to be very inaccurate data formats simply because they do not need to be accurate; the human ear is not very adept at differentiating sounds. As an example, an orchestra performance which is recorded with two separate recording devices will produce vastly different recordings when viewed digitally, but will generally sound the same when played back if they were recorded in a similar manner. Due to this inherent inaccuracy, changes to an audio bit-stream can be made so slightly that when played back the human ear won't be able to distinguish the difference between the cover-medium audio and the stego-medium audio.
With many audio formats, the least-significant bit of each audio sample can be used as the medium's redundant bit for the embedding of message data. For example, when one message byte is embedded into 8 bytes of cover-data with a 1-byte sample size, on average only half of the cover bytes actually change value; however, the least-significant bits of the resulting stego-data contain the entire message byte. It also follows that, with this embedding method and a cover-medium with these word size properties, the cover-medium must be at least eight times the size of the message in order to embed the entire message.
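The byte-level behaviour described above can be reproduced with a short Python sketch. The function names embed_byte and extract_byte, the bit ordering (most significant bit first) and the sample cover values are assumptions made for illustration only.

```python
# Illustrative sketch: names, bit order and cover values are assumptions.
def embed_byte(cover, message_byte):
    """Write the 8 bits of message_byte (MSB first) into the LSBs of 8 cover bytes."""
    if len(cover) != 8:
        raise ValueError("exactly 8 cover bytes are needed for one message byte")
    stego = bytearray()
    for i, c in enumerate(cover):
        bit = (message_byte >> (7 - i)) & 1
        stego.append((c & 0xFE) | bit)  # clear the LSB, then set it to the message bit
    return bytes(stego)


def extract_byte(stego):
    """Rebuild one message byte from the LSBs of the first 8 stego bytes."""
    value = 0
    for s in stego[:8]:
        value = (value << 1) | (s & 1)
    return value


cover = bytes([0x80, 0x7F, 0x81, 0x7E, 0x82, 0x7D, 0x83, 0x7C])
stego = embed_byte(cover, ord("A"))
changed = sum(1 for c, s in zip(cover, stego) if c != s)

if __name__ == "__main__":
    print(stego.hex(), chr(extract_byte(stego)), changed)
```

With these particular cover values, four of the eight bytes change, each by exactly one, which matches the "half on average" observation; other cover values will give other counts.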
Covert communication by embedding a message or data file in a cover medium has been increasingly gaining importance in the all-encompassing field of information technology. Audio steganography is concerned with embedding information in an innocuous cover speech in a secure and robust manner. Communication and transmission security and robustness are essential for transmitting vital information to intended sources while denying access to unauthorized persons. By hiding the information using a cover or host audio as a wrapper, the existence of the information is concealed during transmission. This is critical in applications such as battlefield communications and bank transactions, for example.
As mentioned above, [26] also stated that steganography, in general, relies on the imperfection of the human auditory and visual systems. Audio steganography takes advantage of the psychoacoustical masking phenomenon of the human auditory system.
Psychoacoustical, or auditory, masking renders a weak tone imperceptible in the presence of a strong tone in its temporal or spectral neighborhood. This property arises because of the low differential range of the human auditory system (HAS), even though its dynamic range covers 80 dB below ambient level [24, 25]. Frequency masking occurs when the human ear cannot perceive frequencies at a lower power level if these frequencies are present in the vicinity of tone- or noise-like frequencies at a higher level. Additionally, a weak pure tone is masked by wide-band noise if the tone occurs within a critical band. This property of inaudibility of weaker sounds is used in different ways for embedding information. Embedding of data by inserting inaudible tones in a cover audio signal has been presented recently [27, 28].
The secret message is embedded by slightly altering the binary sequence of a sound file. Existing audio steganography software can embed messages in WAV, AU, and even MP3 sound files.
Embedding secret messages in digital sound is usually a more difficult process than embedding messages in other media, such as digital images. In order to conceal secret messages successfully, a variety of methods for embedding information in digital audio have been introduced. These methods range from rather simple algorithms that insert information in the form of signal noise to more powerful methods that exploit sophisticated signal processing techniques to hide information. The next section discusses these methods in greater detail.
2.3.7
Digital Audio Signal
[29] explained that digital audio differs from traditional analog sound in that it is a discrete rather than continuous signal. A discrete signal is created by sampling a continuous analog signal at a specified rate. For example, the standard sampling rate for CD digital audio is 44.1 kHz. The following, Figure 2-10, illustrates a continuous analog sound wave being sampled to produce digital audio. Note the sinusoidal nature of a sound wave.
Figure 2-10: Audio Signal Coding [29]
The discrete nature of a digital signal is emphasized in the diagram. However, standard sampling rates are usually set at a level where the resultant digital signal is visually indistinguishable from the original analog signal (later illustrations in this report assume a standard sampling rate was used).
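Sampling can be illustrated with a minimal Python sketch that evaluates a sine wave at regular instants and quantizes each value to a signed integer. The function name sample_sine, the 441 Hz test tone and the 16-bit word size are assumptions chosen for illustration (the tone is chosen so that one period spans exactly 100 samples at 44.1 kHz).

```python
import math


# Illustrative sketch: the function name and the test tone are assumptions.
def sample_sine(freq_hz, sample_rate_hz, n_samples, bits=16):
    """Sample a full-scale sine wave and quantize each sample to a signed integer."""
    peak = 2 ** (bits - 1) - 1  # 32767 for 16-bit audio
    return [
        round(peak * math.sin(2 * math.pi * freq_hz * n / sample_rate_hz))
        for n in range(n_samples)
    ]


# One period of a 441 Hz tone at the CD sampling rate of 44.1 kHz.
samples = sample_sine(441.0, 44100, 100)

if __name__ == "__main__":
    print(samples[:5], max(samples), min(samples))
```

Each list element is one discrete sample; the continuous wave between two sampling instants is not stored.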
Digital audio is stored on a computer as a sequence of 0's and 1's. With the right tools, it is possible to change the individual bits that make up a digital audio file. Such precise control allows changes to be made to the binary sequence that are not discernible to the human ear.
2.3.8
Methods of Audio Steganography
This section presents some common methods used in audio steganography. Many software implementations of these methods are available on the Web. Some of the latter methods require prior knowledge of signal processing techniques, Fourier analysis, and other areas of higher mathematics. Figures and pseudo-code are used in place of exact mathematical formulas in an attempt to make the theory more accessible to readers possessing just a basic knowledge of steganography.
2.3.8.1
LSB Coding
Least significant bit (LSB) coding is the simplest way to embed information in a digital audio file. By substituting the least significant bit of each sampling point with a binary message, LSB coding allows for a large amount of data to be encoded. The following diagram illustrates how the message 'HEY' is encoded in a 16-bit CD quality sample using the LSB method:
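As a minimal sketch of LSB coding on 16-bit samples, the following Python code embeds the message 'HEY' into a list of signed integer samples and recovers it. The function names lsb_embed and lsb_extract, the bit ordering (most significant bit of each character first) and the cover values are assumptions made for illustration.

```python
# Illustrative sketch: names, bit order and cover samples are assumptions.
def lsb_embed(samples, message):
    """Replace the LSB of successive samples with the bits of message (bytes)."""
    bits = [(byte >> (7 - i)) & 1 for byte in message for i in range(8)]
    if len(bits) > len(samples):
        raise ValueError("cover is too short for the message")
    stego = list(samples)
    for i, bit in enumerate(bits):
        stego[i] = (stego[i] & ~1) | bit  # also valid for negative samples
    return stego


def lsb_extract(stego, n_bytes):
    """Read n_bytes back from the LSBs of the first 8 * n_bytes samples."""
    out = bytearray()
    for b in range(n_bytes):
        value = 0
        for s in stego[8 * b: 8 * b + 8]:
            value = (value << 1) | (s & 1)
        out.append(value)
    return bytes(out)


cover = [1000 * k - 12000 for k in range(24)]  # 24 signed 16-bit sample values
stego = lsb_embed(cover, b"HEY")

if __name__ == "__main__":
    print(lsb_extract(stego, 3))
```

Three characters need 24 samples, and no sample moves by more than one quantization step out of 65,536, which is why the change is not expected to be audible.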
2.3.8.2
Parity Coding
Instead of breaking a signal down into individual samples, the parity coding method breaks a signal down into separate regions of samples and encodes each bit from the secret message in a sample region's parity bit. If the parity bit of a selected region does not match the secret bit to be encoded, the process flips the LSB of one of the samples in the region. Thus, the sender has more of a choice in encoding the secret bit, and the signal can be changed in a more unobtrusive fashion.
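A minimal Python sketch of parity coding is given below. It assumes that the parity bit of a region is the parity of the LSBs of its samples and that regions have a fixed size; both are assumptions, as are the function names. For simplicity the sketch always flips the first sample of a region, whereas a real encoder could pick the sample where the change is least noticeable.

```python
# Illustrative sketch: region parity definition and names are assumptions.
def region_parity(region):
    """Parity (0 or 1) of the least significant bits of all samples in a region."""
    return sum(s & 1 for s in region) % 2


def parity_embed(samples, bits, region_size):
    """Encode one secret bit per region by adjusting the region's parity."""
    if len(bits) * region_size > len(samples):
        raise ValueError("cover is too short for the message")
    stego = list(samples)
    for i, bit in enumerate(bits):
        start = i * region_size
        if region_parity(stego[start:start + region_size]) != bit:
            stego[start] ^= 1  # flipping any single LSB in the region fixes the parity
    return stego


def parity_extract(stego, n_bits, region_size):
    """Recover the secret bits as the parity of each region."""
    return [
        region_parity(stego[i * region_size:(i + 1) * region_size])
        for i in range(n_bits)
    ]


cover = list(range(10, 22))  # 12 samples, i.e. three regions of four samples
secret = [1, 0, 1]
stego = parity_embed(cover, secret, region_size=4)

if __name__ == "__main__":
    print(stego, parity_extract(stego, 3, 4))
```

At most one sample per region is modified, and a region whose parity already matches the secret bit is left untouched.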
2.3.8.3
Phase Coding
Phase coding addresses the disadvantages of the noise-inducing methods of audio steganography. Phase coding relies on the fact that the phase components of sound are not as perceptible to the human ear as noise is. Rather than introducing perturbations, the technique encodes the message bits as phase shifts in the phase spectrum of a digital signal, achieving an inaudible encoding in terms of signal-to-perceived-noise ratio.
2.3.8.4
Spread Spectrum
In the context of audio steganography, the basic spread spectrum (SS) method attempts to spread secret information across the audio signal's frequency spectrum as much as possible. This is analogous to a system using an implementation of the LSB coding that randomly spreads the message bits over the entire sound file. However, unlike LSB coding, the SS method spreads the secret message over the sound file's frequency spectrum, using a code that is independent of the actual signal. As a result, the final signal occupies a bandwidth in excess of what is actually required for transmission.
Two versions of SS can be used in audio steganography: the direct-sequence and frequency-hopping schemes. In direct-sequence SS, the secret message is spread out by a constant called the chip rate and then modulated with a pseudorandom signal. It is then interleaved with the cover-signal. In frequency-hopping SS, the audio file's frequency spectrum is altered so that it hops rapidly between frequencies.
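The direct-sequence idea can be sketched in Python as follows. This is a simplified illustration under several assumptions: the spread signal is added to the cover samples rather than interleaved with them, the pseudorandom sequence is derived from a shared seed that stands in for a key, and the embedding strength is deliberately exaggerated so that recovery is guaranteed on a tiny example. All function and parameter names are hypothetical.

```python
import random


# Illustrative sketch: additive embedding, seed-as-key and all names are assumptions.
def pn_sequence(length, seed):
    """Pseudorandom sequence of +1/-1 chips derived from a shared seed."""
    rng = random.Random(seed)
    return [rng.choice((-1, 1)) for _ in range(length)]


def dsss_embed(cover, bits, chip_rate, seed, strength):
    """Spread each bit over chip_rate samples and add it to the cover."""
    if len(bits) * chip_rate > len(cover):
        raise ValueError("cover is too short for the message")
    pn = pn_sequence(len(bits) * chip_rate, seed)
    stego = list(cover)
    for i, bit in enumerate(bits):
        symbol = 1 if bit else -1
        for k in range(i * chip_rate, (i + 1) * chip_rate):
            stego[k] += strength * symbol * pn[k]
    return stego


def dsss_extract(stego, n_bits, chip_rate, seed):
    """Correlate each block of samples with the same chips to recover the bits."""
    pn = pn_sequence(n_bits * chip_rate, seed)
    bits = []
    for i in range(n_bits):
        block = range(i * chip_rate, (i + 1) * chip_rate)
        correlation = sum(stego[k] * pn[k] for k in block)
        bits.append(1 if correlation > 0 else 0)
    return bits


cover = [((n * 7) % 5) - 2 for n in range(64)]  # toy cover with values in [-2, 2]
secret = [1, 0, 1, 1]
stego = dsss_embed(cover, secret, chip_rate=16, seed=2009, strength=3)

if __name__ == "__main__":
    print(dsss_extract(stego, 4, 16, 2009))
```

In a realistic setting the strength would be far below the cover amplitude, and recovery would rely on a long chip sequence averaging out the cover's contribution to the correlation; this sketch does not model that statistical behaviour.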
2.3.8.5
Echo Hiding
In echo hiding, information is embedded in a sound file by introducing an echo into the discrete signal. Like the spread spectrum method, it too provides advantages in that it allows for a high data transmission rate and provides superior robustness when compared to the noise inducing methods.
To hide the data successfully, three parameters of the echo are varied: amplitude, decay rate, and offset (delay time) from the original signal. All three parameters are set below the human hearing threshold so the echo is not easily resolved. In addition, offset is varied to represent the binary message to be encoded. One offset value represents a binary one, and a second offset value represents a binary zero.
Agaian et al [31] presented two algorithms for digital audio steganography with embedding in the frequency domain and the integer transform domain (QSAS and ITSAS). Experimental results for both methods indicate that the changes in the embedded audio section are inaudible. The QSAS algorithm has lower embedding capacity but has much better SNR values. The ITSAS algorithm is preferred as it is reversible, simple, and efficient with acceptable SNR values. [31] also introduced a capacity measure that can be used to select an audio clip that introduces the least distortion after the embedding process. Gopalan and Wenndt [32] presented a method of embedding covert data in a cover audio signal by insertion of low power tones.
It is believed [29] that the flexibility of audio steganography is what makes it so potentially powerful. The five methods discussed provide users with a large amount of choice and make the technology more accessible to everyone. A party that wishes to communicate can rank the importance of factors such as data transmission rate, bandwidth, robustness, and noise audibility and then select the method that best fits their specifications. For example, two individuals who just want to send the occasional secret message back and forth might use the LSB coding method that is easily implemented. On the other hand, a large corporation wishing to protect its intellectual property from "digital pirates" may consider a more sophisticated method such as phase coding, SS, or echo hiding.
Cvejic and Seppänen [45] presented a high bit rate LSB audio watermarking method that reduces embedding distortion of the host audio. Using their proposed two-step algorithm, watermark bits are embedded into higher LSB layers, resulting in increased robustness against noise addition. In addition, listening tests showed that perceptual quality of watermarked audio is higher in the case of the proposed method than in the standard LSB method.
Another aspect of audio steganography that makes it so attractive is its ability to combine with existing cryptography technologies. Users no longer have to rely on one method alone. Not only can information be encrypted, it can be hidden altogether!
2.3.9
Real-Time steganography
This section defines real-time use of steganography as the utilization of steganographic techniques to embed message data within an active, or real-time, media stream.
I)ruid [5] mentioned that nearly all uses of steganography targeting audio cover-medium in general, or VoIP cover-medium specifically, that were evaluated prior to performing this research were found to operate on a target cover-medium as a storage channel and provided separate "hide" and "retrieve" modes. In addition, most cover-media that were targeted by such implementations were of a static nature, such as WAV or MP3 files, or were unidirectional, such as streaming stego-audio to a recipient.
2.3.9.1
Context Terminology
The disciplines of steganography and data networking share some common terminology which has different meanings relative to each discipline. This study discusses research that lies within the area of both disciplines, and as such will use terms that may be confusing when taken out of context. The following terms are defined here and used consistently throughout to prevent confusion when interpreting the content of this report [5].
1. Packet: Used in the data networking sense; a data packet which is routed through a network, such as an IP/UDP/RTP packet.
2. Message: Used in the steganography sense; data to be hidden or retrieved.
2.3.9.2
RTP Payload Redundant Bits
RTP packet payloads are essentially encoded multimedia data and may contain any type of multimedia data. However, this research effort focused entirely on audio, specifically audio encoded with the G.711 Codec [34]. Any number of audio Codecs can be used to encode the RTP payload; the identifier of the Codec in use is included in the RTP packet's header as the payload type (PT) field.
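The position of the PT field can be made concrete with a minimal Python sketch that unpacks the 12-byte fixed RTP header. The function name parse_rtp_header and the dictionary keys are hypothetical; the sketch assumes a packet without a header extension, and it names only the two static payload types that correspond to G.711.

```python
import struct

# Static payload types for G.711; other values are left unnamed in this sketch.
G711_PAYLOAD_TYPES = {0: "PCMU (G.711 mu-law)", 8: "PCMA (G.711 A-law)"}


# Illustrative sketch: function name and dictionary keys are assumptions.
def parse_rtp_header(packet):
    """Unpack the 12-byte fixed RTP header (header extensions are ignored)."""
    if len(packet) < 12:
        raise ValueError("packet is shorter than the fixed RTP header")
    b0, b1, sequence, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    csrc_count = b0 & 0x0F
    return {
        "version": b0 >> 6,
        "padding": (b0 >> 5) & 1,
        "extension": (b0 >> 4) & 1,
        "csrc_count": csrc_count,
        "marker": b1 >> 7,
        "payload_type": b1 & 0x7F,  # the PT field: low 7 bits of the second byte
        "sequence": sequence,
        "timestamp": timestamp,
        "ssrc": ssrc,
        "payload_offset": 12 + 4 * csrc_count,
    }


# A synthetic packet: version 2, PT 0, followed by 160 bytes of silence-like payload.
packet = struct.pack("!BBHII", 0x80, 0x00, 1, 160, 0x12345678) + bytes([0xFF] * 160)
header = parse_rtp_header(packet)
payload = packet[header["payload_offset"]:]

if __name__ == "__main__":
    print(header["payload_type"], G711_PAYLOAD_TYPES.get(header["payload_type"]), len(payload))
```

Only the payload bytes after the header are candidates for embedding; the header fields themselves are left untouched in the approach described in this section.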
The frequency, locations, and number of redundant bits found within the RTP packet's encoded payload are determined by the Codec that is used to encode the audio transmitted by an individual packet. The Codec focused on during this research, G.711, uses a 1-byte sample encoding and is generally resilient to modifications to the least significant bit (LSB) [5] of each sample. Codecs with larger samples may provide for one or more bits per sample to be modified without any discernible audible change in the encoded audio, which is defined as the audio's audible integrity.
2.3.9.3
Audio Word Size
The data value word size, or sample size in audio terminology, used by various audio encoding formats is one factor in determining the amount of available space within the cover-medium for embedding a message. Generally only the least significant bit of each word value can be expected to be modifiable without any perceptible impact to audible integrity. Thus, only half the amount of available space in an audio cover-medium encoded in a format with a 16-bit word size will be available in comparison with a cover-medium with an 8-bit word size.
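The effect of word size on capacity can be checked with a small Python calculation. The function name lsb_capacity_bytes and its parameters are hypothetical, and the sketch assumes that exactly one bit per word is modifiable and that the two cover-media being compared have the same size in bytes.

```python
# Illustrative sketch: the function name and parameters are assumptions.
def lsb_capacity_bytes(cover_size_bytes, word_size_bits, bits_per_word=1):
    """Message capacity in bytes when bits_per_word bits of every word are usable."""
    words = cover_size_bytes * 8 // word_size_bits
    return words * bits_per_word // 8


capacity_8bit = lsb_capacity_bytes(16000, word_size_bits=8)
capacity_16bit = lsb_capacity_bytes(16000, word_size_bits=16)

if __name__ == "__main__":
    print(capacity_8bit, capacity_16bit)
```

For a 16,000-byte cover this gives 2,000 bytes with 8-bit words and 1,000 bytes with 16-bit words, because the larger word size halves the number of words and therefore the number of usable least significant bits.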
2.3.9.4
Common VoIP Audio Codecs
For reference, some common VoIP audio Codecs and their encoding and sample properties [35] are listed in the table below.
Table 2-2: Common VoIP Audio Codecs [5]

Codec        Standard by   Bit Rate (kb/s)   Sample Rate (kHz)   Frame Size (ms)
G.711        ITU-T         64                8                   Sampling
G.721        ITU-T         32                8                   Sampling
G.722        ITU-T         64                16                  Sampling
G.722.1      ITU-T         24/32             16                  20
G.723        ITU-T         24/40             8                   Sampling
G.723.1      ITU-T         5.6/6.3           8                   30
G.726        ITU-T         16/24/32/40       8                   Sampling
G.727        ITU-T         variable          -                   Sampling
G.728        ITU-T         16                8                   2.5
G.729        ITU-T         8                 8                   10
GSM 06.10    ETSI          13                8                   22.5
LPC10        U.S. Gov      2.4               8                   22.5
Speex (NB)   -             2.15 - 24.6       8, 16, 32           30
Speex (WB)   -             4 - 44.2          8, 16, 32           34
iLBC         -             13.3              8                   30
DoD CELP     U.S. DoD      4.8               -                   30
EVRC         3GPP2         9.6/4.8/1.2       8                   20
DVI          IMA           32                Variable            Sampling
L16          -             128               Variable            Sampling
2.3.9.4.1
G.711
The G.711 audio Codec is a fairly straight-forward sample-based encoding. It encodes audio as a linear grouping of 8-bit audio samples arranged in the order in which they were sampled.
2.3.9.5
Throughput
Utilizing the LSB of every sample in a G.711-encoded RTP payload, which is commonly 160 bytes in size, a total of 160 bits (20 bytes) of message data can be embedded per packet. Given an average of 50 packets per second in each direction, this results in approximately 1,000 bytes per second of full-duplex throughput of message data within the established covert channel.
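The throughput figure follows from simple arithmetic, sketched below in Python. The function name covert_throughput and its default values are taken from the numbers quoted above (160-byte payload, 50 packets per second, one usable bit per 1-byte G.711 sample); protocol overhead of the covert channel itself is not modelled.

```python
# Illustrative sketch: the function name is an assumption; defaults mirror the text.
def covert_throughput(payload_bytes=160, packets_per_second=50, bits_per_sample=1):
    """Return (message bytes per packet, message bytes per second, per direction)."""
    bits_per_packet = payload_bytes * bits_per_sample  # G.711: one sample per byte
    bytes_per_packet = bits_per_packet // 8
    return bytes_per_packet, bytes_per_packet * packets_per_second


per_packet, per_second = covert_throughput()

if __name__ == "__main__":
    print(per_packet, per_second)
```

With the default values this yields 20 bytes per packet and 1,000 bytes per second in each direction, i.e. roughly 8 kbit/s of raw covert capacity per direction.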
Recently, techniques with steganography at the core have been employed successfully in the covert exchange of information, copyright protection, etc. However, most of the previous studies on steganography were carried out on storage cover media; by contrast, the area of steganography in real-time systems is largely unexplored. Due to their instantaneity, real-time systems can potentially offer better security for hiding secret messages. Therefore, in this study the focus will be on one of the typical real-time communication systems, Voice over IP (VoIP), as a possible carrier to apply steganography to enhance security for secret messages while maintaining good performance for VoIP real-time services.
2.4
VoIP Steganography
Steganography and steganalysis in VoIP applications are important research topics, as speech data is an appropriate cover to hide messages or comprehensive documents [33]. In this section, VoIP steganography and its related literature and studies are introduced.
Differing from applying steganography on storage cover media, steganography on VoIP must often delicately balance between providing adequate security and maintaining low latency for real-time services [17].
The main motivations for our VoIP-based steganography study are twofold. First, the ongoing conversation of VoIP can offer an ideal camouflage for secret messages, because the voice data is naturally assumed to be the only data carried in a given VoIP channel. Second, a typically short VoIP connection does not give eavesdroppers a sufficient amount of time to detect possible abnormalities due to hidden messages.
2.4.1
Previous Research
Much research has been done in the field of steganography utilizing an audio cover-medium. Techniques such as using audio to convey messages in both the human audible and inaudible spectrum as well as various methods for the digital embedding of information into the audio data itself have all been explored; so much in fact that many methods are now considered standard. Many of the most recent implementations cannot be considered to advance the state of research in the area as they generally only implement the standard methods.
It is important to note that the significant majority of previous research in the sub-discipline of audio steganography, however, has focused on static, unchanging audio data files. Many tools are just such implementations, employing standard embedding methods with WAV, MP3, and VOC audio file cover-mediums. Very few practical implementations have been developed that utilize audio steganography with a cover-medium that is in a state of flux or within streaming or real-time media sessions.
A few previous research efforts have been made to employ steganography with various VoIP technologies. A comprehensive analysis of such efforts, identified prior to embarking upon the research presented in this study, has previously been provided [39]. In summary, most identified research efforts were utilizing steganographic techniques but not achieving the primary goal of steganography, or were otherwise employing steganographic techniques to accomplish an overt goal.
From the literature [36], [37], [38], some researchers have noticed the advantages of and carried out useful studies on steganography over VoIP. Wang et al. [36] proposed a scheme for transmitting secret speech based on information hiding in VoIP systems. Their hiding process consists of two steps: compressing the secret speech and then filling its binary bits directly into the LSBs of cover speech coded with G.711. Dittmann et al. [38] presented a more general scheme of steganography over VoIP, which can be used to transmit arbitrary secret messages. However, both of these schemes simply replace the LSBs of the cover speech with the binary bits of the secret messages, which makes them vulnerable to detection by the steganalysis algorithm subsequently proposed by Dittmann et al. [38]. That algorithm is based on the fact that the distribution of the LSBs in the stego-speech is not uniform, and it can detect directly embedded messages with a success rate of approximately 98.60%. Therefore, Kratzer et al. [37] suggested that messages be encrypted prior to embedding to improve security. Motivated by this view, they later proposed a scheme that applies cryptographic algorithms (e.g., Twofish, Tiger) to the embedded messages [37]. However, the encryption operation must be carried out offline before the embedding operation, because the adopted algorithms are often time-consuming and incur delays that may in turn degrade the speech quality drastically. Therefore, this method is not well suited to the real-time exchange of secret messages. In fact, for real-time covert communication it is necessary to strike an acceptable balance between providing adequate security and maintaining low latency for real-time services. In addition, the authors of [37] assumed that the two communicating parties share knowledge of the key used, but did not reveal how the key is distributed, which is actually a crucial component of covert communication systems.
Figure 2-11: Hidden Communication Scenarios for VoIP [43]
Szczypiorski and Mazurczyk [40] provided a comprehensive evaluation of available steganographic techniques for SIP/SDP that can be used to create covert channels during the signaling phase of a call. All of the solutions they describe are based on network steganography, as they utilize free or unused fields in these protocols. In [40] they showed that the total amount of information that may be transferred using the proposed solutions is more than 2000 bits in one direction for each VoIP call.
Szczypiorski et al. [16], [43] provided new insights by presenting two new techniques. The first is a network steganography solution that exploits free or unused protocol fields; this approach is known for the IP, UDP and TCP protocols but had never been applied to RTP and RTCP, which are characteristic of VoIP. The second method, called LACK (Lost Audio Packets Steganography), provides a hybrid storage-timing covert channel by utilizing delayed audio packets. A third technique they describe is HICCUPS (Hidden Communication System for Corrupted Networks), a generic steganographic framework for wireless LANs that can be used in voice over wireless LAN (VoWLAN) environments. The results obtained showed that during a typical VoIP call it is possible to covertly send more than 1.3 Mbits of data in one direction.
In [42], a new, lightweight security and control protocol for the Voice over Internet Protocol (VoIP) service is presented. It is an alternative to the IETF's (Internet Engineering Task Force) RTCP (RTP Control Protocol) for real-time application traffic. It uses two information hiding techniques: steganography, to create a covert channel in which the header (control bits) is passed, and digital watermarking, to transmit the actual data (parameter values) in the voice stream. The most important advantages of this solution are that it consumes no additional bandwidth and that it provides security and parameters for monitoring QoS and network status in a single protocol.
Kratzer et al. [37] summarized the design principles of the general approach and introduced extended experimental test results for a VoIP framework that includes a steganographic channel. They showed that with this framework it is largely secure to transmit hidden messages during a VoIP session, and they demonstrated results with respect to perceptibility for music and speech data.
2.5 Summary
In conclusion, steganography is a fascinating and effective method of hiding data that has been used throughout history. Audio steganography in particular addresses key issues brought about by the need for a secure communication scheme that can maintain the secrecy of the transmitted information, even when it passes through insecure channels. Although real-time steganography, particularly over VoIP, is relatively complicated, previous research has made it easier to find the most suitable, practical and efficient techniques for implementing a VoIP steganography prototype with maximum capacity.
Because few studies discuss how current and newly released VoIP steganography techniques can be improved with respect to capacity, a broad study should be made to find the best real-time steganography model for VoIP, one that provides good security and data capacity without sacrificing real-time performance. This goal could be achieved by employing well-designed approaches that provide a reasonable tradeoff between the information hiding requirement (good security and sufficient capacity) and the low latency requirement of VoIP.
CHAPTER 3
RESEARCH METHODOLOGY
3.1 Introduction
Research can be defined as a human activity based on intellectual investigation and aimed at discovering, interpreting and revising human knowledge on different aspects of the world. The objective of a research methodology is to provide a standard method and guidelines to ensure that the project is completed on time and is conducted in a disciplined, well-managed and consistent manner that promotes the delivery of quality products and results.
This research consists of two major tasks. The first task is to study and analyze related VoIP steganography mechanisms and compare them scientifically with respect to performance, covert data capacity and feasibility. The second task is to introduce a simple prototype that builds on the first task and focuses on enhancing the capacity of covert data in VoIP steganography.
This research methodology chapter provides an understanding of how this project will be conducted and organized in order to obtain information that could be helpful for studying the capacity enhancements in VoIP steganography and developing a simple lab-based prototype. It presents the research approach, research phases and procedure, data used in the analysis, and prototype development procedure.
3.2 Research Approach
Different approaches can be taken, such as a deductive or inductive approach and a quantitative or qualitative approach. Deductive research starts with existing theories and concepts and formulates hypotheses that are subsequently tested; its vantage point is received theory. Inductive research starts with real-world data, from which categories, concepts, patterns, models and, eventually, theories emerge. After the initial stages, all types of research become an iteration between the deductive and the inductive. This is sometimes referred to as abductive research.
The qualitative and quantitative methods refer to the way one chooses to treat and analyze the selected data. Selectivity and distance to the object of research characterize a quantitative approach, whereas a qualitative approach is characterized by nearness to the object of research. Both approaches have their strengths and weaknesses and neither one of the approaches can be held better than the other one. The best research method to use for a study depends on that study's research purposes and the accompanying research questions [41].
In a quantitative approach, results are based on numbers and statistics, which are shown in figures. In a qualitative approach, the focus lies on describing an event with the use of words. Which approach to choose depends on the problem definition together with the kind of information that is needed. The two approaches are used for their suitability and may also be used together.
This research begins by studying existing theories, mechanisms and techniques relating to the research problem area, which will later be compared with reality. This research is therefore mostly deductive. Recalling that the purpose of this study is to introduce possible improvements to VoIP steganography that enhance its covert data capacity, the quantitative approach is believed to be the more suitable one for this research. The quantitative approach is characterized by studying a few variables and equations related to the research subject.
3.3 Research Phases and Procedure
This project is carried out according to the process chart illustrated in Figure 3-1. The research procedure is divided into three phases: first, studying and analyzing existing VoIP steganography techniques; second, prototype design, development and testing; and finally, suggesting further capacity enhancements. Each phase has its own activities, which are explained in the following sections and chapters.
Figure 3-1: Research Phases
3.3.1 Study and Analyze
In chapter 2, Literature Review, the data, research, papers and studies related to VoIP steganography are reviewed and analyzed. The following sections describe this phase in detail.
3.3.1.1 Data Collection
Yin (2003) [47] stated that the six most commonly used sources for data collection are documentation, archival records, interviews, direct observation, participant observation and physical artifacts. In this study, the only source of evidence considered valuable is documentation.
The relevant data collected are mainly of a qualitative nature, due to the chosen unit of analysis. Documents, which include papers, research reports and books, are important in the data collection phase because of their overall value. The required information extracted from these sources, together with the results of previous research, will be organized and then compared.
For the purpose of this research project, many books, articles, journals and white papers related to VoIP steganography in general, and to payload enhancement in particular, have been collected, studied and analyzed comprehensively to determine a possible solution to the project problem stated in Chapter 1.
3.3.1.2 Analysis
In this phase, previous research is reviewed to obtain a comprehensive view of the mechanisms of VoIP steganography and how they can be used and improved to enhance the payload capacity of covert data. This analysis identifies the advantages and disadvantages of each mechanism.
Many VoIP steganography mechanisms are studied and analyzed in this phase. Although there are few studies in this field, other attempts to analyze such mechanisms are taken into consideration. This helps to shorten the time needed to analyze VoIP steganography mechanisms and also provides different points of view, which will enrich the final outcome of this project. The analysis also covers the protocols, algorithms and environments closely related to steganography of the VoIP stream and the enhancement of its capacity.
While the literature review chapter showed the lack of research focused on this field, it also shows that there is a good chance of improving current VoIP mechanisms and techniques to obtain a better capacity for covert data.
3.3.2 Prototype Design and Implementation
In this section, we briefly discuss the prototype design and implementation phase, which constitutes the major practical part of the project. The next chapter describes this phase in detail.
3.3.2.1 Requirement Specification
In this step, every requirement needed to analyze current mechanisms and to implement the suggested prototype tool is discussed. Defining requirements to establish specifications is the first step in the development of the proposed prototype. However, in many situations, not enough care is taken in establishing correct requirements up front. This causes problems when ambiguities in the requirements surface later in the life cycle, and more time and money are spent on resolving these uncertainties.
There is a distinct difference between requirements and specifications. A requirement is a condition needed by a user to solve a problem or achieve an objective. A specification is a document that specifies, in a complete, precise, verifiable manner, the requirements, design, behavior, or other characteristics of a system, and often, the procedures for determining whether these provisions have been satisfied.
It is necessary that requirements are documented in a systematic way to ensure their accuracy and completeness, although this is not always an easy task. The difficulty of establishing good requirements often makes it more of an art than a science. It arises from the fact that establishing requirements is a tough abstraction problem, and the implementation often gets mixed with the requirements. For the purpose of this research project, the following requirements will be needed:
i. Two desktop/laptop computers.
ii. Windows operating system, XP SP 2 Professional Edition or above.
iii. LAN network with two free IP addresses and ports.
iv. High capabilities programming environment like Microsoft Visual Studio.Net, namely C#.
v. Microsoft DirectX 9.0 or later.
vi. Microsoft Office 2003 package or above to write the report.
3.3.2.2 Prototype Development
A prototype typically simulates only a few aspects of the features of the eventual program, and may be completely different from the eventual implementation.
The purpose of a prototype is to allow users of the software to evaluate developers' proposals for the design of the eventual product by actually trying them out, rather than having to interpret and evaluate the design based on descriptions. Prototyping can also be used by end users to describe and prove requirements that developers have not considered. As illustrated in Figure 3-2, this prototype development is proposed ultimately to fulfill the following tasks:
Figure 3-2: Prototype Main Tasks
3.3.2.2.1 Achieve Steganography
As stated in Section 2.3, the primary goal of steganography is to hide the fact that communication is taking place. Therefore, the primary goal of this reference implementation is to give a third-party observer of the VoIP audio stream no indication that anything other than the overt communication between the two VoIP call endpoints is taking place.
3.3.2.2.2 Full-Duplex Communications Channel
The prototype implementation intends to achieve a full-duplex covert communication channel between the two VoIP endpoints, mirroring the utility of the VoIP call itself. This will be accomplished through the use of the RTP or UDP streams that comprise the session. By utilizing both streams within the session, either application will be able to send and receive data simultaneously.
3.3.2.2.3 Compensate for Unreliable Transport
The prototype development intends to compensate for the unreliable transport inherent to RTP and UDP. This may be accomplished by providing data sequencing, tracking, synchronization or resending mechanisms.
3.3.2.2.4 Multi-type Data Transfer
This reference implementation could be used to provide simultaneous transfer of multiple types of data, such as text chat, file transfer, and remote shell access. This will be accomplished by providing type indication and formatting for each type of supported data being transferred. For the purpose of this project, only text chat will be considered.
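The sequencing mentioned in Section 3.3.2.2.3 and the type indication mentioned here could both be carried in a small header placed in front of each covert payload. The report does not specify a frame format, so the following is only a minimal sketch under assumed choices: the `CovertFrame` class, the `CovertDataType` values and the 3-byte layout (one type byte followed by a 16-bit big-endian sequence number) are all hypothetical.

```csharp
using System;

// Hypothetical type codes; the report only commits to text chat.
public enum CovertDataType : byte { TextChat = 1, FileTransfer = 2, RemoteShell = 3 }

public static class CovertFrame
{
    // Assumed layout: [type: 1 byte][sequence: 2 bytes, big-endian][payload: n bytes]
    public const int HeaderSize = 3;

    public static byte[] Pack(CovertDataType type, ushort sequence, byte[] payload)
    {
        byte[] frame = new byte[HeaderSize + payload.Length];
        frame[0] = (byte)type;
        frame[1] = (byte)(sequence >> 8);     // high byte of the sequence number
        frame[2] = (byte)(sequence & 0xFF);   // low byte of the sequence number
        Buffer.BlockCopy(payload, 0, frame, HeaderSize, payload.Length);
        return frame;
    }

    // Returns false when the frame is too short to contain a header.
    public static bool TryUnpack(byte[] frame, out CovertDataType type,
                                 out ushort sequence, out byte[] payload)
    {
        type = 0;
        sequence = 0;
        payload = null;
        if (frame == null || frame.Length < HeaderSize) return false;

        type = (CovertDataType)frame[0];
        sequence = (ushort)((frame[1] << 8) | frame[2]);
        payload = new byte[frame.Length - HeaderSize];
        Buffer.BlockCopy(frame, HeaderSize, payload, 0, payload.Length);
        return true;
    }
}
```

With such a header, a receiver could detect lost or reordered frames by comparing consecutive sequence numbers and route each payload according to its type; how a gap is then handled (for example by requesting a resend) is left open here, as it is in the text.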
3.3.2.2.5 Maximum Payload with Minimum Detection
The ultimate goal of this prototype development is to enhance the capacity of covert data without affecting the invisibility or the security of VoIP steganography.
3.3.2.3 Testing the Results
Although this project will be built in an ideal environment, based on a dedicated LAN that provides almost zero noise, several statistical tests will be performed on each technique used in this project. The results of these tests will help us to understand which technique is better in terms of capacity, invisibility and performance. These tests include the following statistics:
i. Signal-to-Noise Ratio (SNR)
ii. Average Deviation
iii. Secret Data to Voice Data Ratio
iv. Receiving Time
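The first three statistics in this list can be computed directly from the voice data before and after embedding. The report does not give the formulas it uses, so the sketch below relies on common definitions that are assumptions here: SNR as ten times the base-10 logarithm of signal energy over noise energy, and average deviation as the mean absolute difference between corresponding samples. The class name `StegoMetrics` is hypothetical, and the sample bytes are treated as plain unsigned amplitudes.

```csharp
using System;

public static class StegoMetrics
{
    // SNR in dB between the original samples and the samples after embedding.
    // Returns positive infinity when the two arrays are identical (no noise).
    public static double SnrDb(byte[] original, byte[] stego)
    {
        if (original.Length != stego.Length) throw new ArgumentException("Length mismatch");
        double signal = 0, noise = 0;
        for (int i = 0; i < original.Length; i++)
        {
            double s = original[i];
            double d = s - stego[i];
            signal += s * s;
            noise += d * d;
        }
        return noise == 0 ? double.PositiveInfinity : 10.0 * Math.Log10(signal / noise);
    }

    // Mean absolute difference between corresponding samples (assumed definition).
    public static double AverageDeviation(byte[] original, byte[] stego)
    {
        if (original.Length != stego.Length) throw new ArgumentException("Length mismatch");
        if (original.Length == 0) return 0;
        double sum = 0;
        for (int i = 0; i < original.Length; i++)
            sum += Math.Abs(original[i] - stego[i]);
        return sum / original.Length;
    }

    // Ratio of embedded secret bits to the bits of voice data that carry them.
    public static double SecretToVoiceRatio(int secretBits, int voiceBytes)
    {
        return (double)secretBits / (voiceBytes * 8.0);
    }
}
```

Receiving time is not covered by this sketch because it depends on timestamps taken by the running application rather than on the sample data.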
3.3.3 Suggestions for Capacity Enhancement
By studying and analyzing the results of the prototype development and testing phase, together with the comprehensive literature review, it becomes possible to suggest ways of enhancing the capacity of VoIP steganography. These suggestions include improving present techniques, modifying their algorithms or introducing new techniques. This phase is discussed in detail in the following chapters.
3.4 Summary
In this chapter, the methodology used in implementing this project was discussed. The workflow explained above will be followed in a systematic and consistent way. Each phase of the project procedure plays an important role in accomplishing this project. Although almost all phases depend on the outcomes of previous phases, some phases may overlap with others and some may be performed simultaneously.
CHAPTER 4
PROTOTYPE DESIGN AND IMPLEMENTATION
4.1 Introduction
In this chapter, we discuss in detail the design and implementation of the VoIP steganography prototype. The prototype is designed and implemented in several consecutive phases. This chapter is organized around these phases, which are: VoIP call, data embedding and extraction, waveform drawing, and presentation of statistical results.
Figure 4-1: Prototype Design Phases
The VoIP steganography prototype is designed and implemented according to the above figure. The general architecture of the prototype is shown in Figure 4-2 below. All covert data transmitted during a VoIP call is processed and embedded according to this architecture.
Sender: Capture voice from Mic -> Voice Sampling -> Embedding (with Secret Data) -> UDP Packeting -> IP Network Channel
Receiver: IP Network Channel -> UDP Receiving -> Extracting (yields Secret Data) -> Voice Generating -> Send voice to speakers
Figure 4-2: Prototype General Architecture
4.2 VoIP Call
To build the VoIP steganography prototype, we first have to implement the VoIP call. There are many ways to implement a VoIP call; the best known are client-server based and point-to-point based. As mentioned earlier in the scope of this research, this prototype is built on a point-to-point environment. The VoIP call in this prototype consists of three steps: initializing the call, performing the conversation, and ending the call. Each of these steps has its own messages that are transferred between sender and receiver. The following sections explain these steps in detail.
4.2.1 Initialize VoIP Call
The initialization of a VoIP call starts when one endpoint (the sender) sends an INVITE message to the other endpoint (the receiver) using the receiver's IP address. The receiver gets this INVITE message and can either accept or reject the invitation, as shown in Figure 4-3. If the receiver does not accept the invitation, a BUSY message is sent back to the sender, who ends the call session accordingly.
Figure 4-3: Receiver may accept or reject INVITE message
If the receiver accepts the invitation, an OK message is sent back to the sender. Once the OK message is received, the initialization step is done and both parties start the next step (conversation). The overall VoIP call process is shown in Figure 4-4.
Sender -> Receiver: INVITE
Receiver -> Sender: OK
Sender <-> Receiver: Conversation
...
BYE (sent by the party ending the call)
Figure 4-4: Messages of VoIP Call
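The message exchange described above (INVITE answered by OK or BUSY, then conversation, then BYE) can be viewed as a small state machine at each endpoint. The following is a minimal sketch of that view only; the `CallState` and `CallMessage` enums and the `CallStateMachine.Next` method are hypothetical names introduced for illustration and are not taken from the prototype's source.

```csharp
using System;

public enum CallState { Idle, Inviting, InCall }
public enum CallMessage { Invite, Ok, Busy, Bye }

public static class CallStateMachine
{
    // Computes the next state of an endpoint after it receives 'message'.
    // 'acceptInvite' models the user's choice when an INVITE arrives.
    // 'reply' is the message to send back, or null when no reply is needed.
    public static CallState Next(CallState state, CallMessage message,
                                 bool acceptInvite, out CallMessage? reply)
    {
        reply = null;
        switch (message)
        {
            case CallMessage.Invite:
                if (state == CallState.Idle && acceptInvite)
                {
                    reply = CallMessage.Ok;      // accept: conversation starts
                    return CallState.InCall;
                }
                reply = CallMessage.Busy;        // rejected, or already in a call
                return state;
            case CallMessage.Ok:
                // The inviting side enters the conversation when OK arrives.
                return state == CallState.Inviting ? CallState.InCall : state;
            case CallMessage.Busy:
                // The inviting side ends the session when BUSY arrives.
                return state == CallState.Inviting ? CallState.Idle : state;
            case CallMessage.Bye:
                return CallState.Idle;           // either party may end the call
            default:
                return state;
        }
    }
}
```

In this model the sender moves to `Inviting` when it sends INVITE, and both sides return to `Idle` after a BYE.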
The following code of the function Initialize defines the voice call initialization parameters such as buffers, voice format and bits per sample. It also prepares every component needed before a voice call is made between sender and receiver.
private void Initialize()
{
    try
    {
        device = new Device();
        device.SetCooperativeLevel(this, CooperativeLevel.Normal);

        CaptureDevicesCollection captureDeviceCollection = new CaptureDevicesCollection();
        DeviceInformation deviceInfo = captureDeviceCollection[0];
        capture = new Capture(deviceInfo.DriverGuid);

        short channels = 1;            //Mono.
        short bitsPerSample = 16;      //16Bit
        int samplesPerSecond = 22050;  //22KHz

        //Setting up the wave format to be captured.
        waveFormat = new WaveFormat();
        waveFormat.Channels = channels;
        waveFormat.FormatTag = WaveFormatTag.Pcm;
        waveFormat.SamplesPerSecond = samplesPerSecond;
        waveFormat.BitsPerSample = bitsPerSample;
        waveFormat.BlockAlign = (short)(channels * (bitsPerSample / (short)8));
        waveFormat.AverageBytesPerSecond = waveFormat.BlockAlign * samplesPerSecond;

        captureBufferDescription = new CaptureBufferDescription();
        captureBufferDescription.BufferBytes = waveFormat.AverageBytesPerSecond / 5; //approx 200 ms of PCM data.
        captureBufferDescription.Format = waveFormat;

        playbackBufferDescription = new BufferDescription();
        playbackBufferDescription.BufferBytes = waveFormat.AverageBytesPerSecond / 5;
        playbackBufferDescription.Format = waveFormat;
        playbackBuffer = new SecondaryBuffer(playbackBufferDescription, device);

        bufferSize = captureBufferDescription.BufferBytes;
        bIsCallActive = false;
        nUdpClientFlag = 0;

        //Using UDP sockets
        clientSocket = new Socket(AddressFamily.InterNetwork, SocketType.Dgram, ProtocolType.Udp);
        EndPoint ourEP = new IPEndPoint(IPAddress.Any, 1450);

        //Async listen to port 1450 for coming messages (Invite, Bye, etc)
        clientSocket.Bind(ourEP);

        //Receiving data from any IP.
        EndPoint remoteEP = (EndPoint)(new IPEndPoint(IPAddress.Any, 0));
        byteData = new byte[1024];

        //Receiving data asynchronously.
        clientSocket.BeginReceiveFrom(byteData, 0, byteData.Length, SocketFlags.None, ref remoteEP, new AsyncCallback(OnReceive), null);
    }
    catch (Exception ex)
    {
        MessageBox.Show(ex.Message, "Initialize()", MessageBoxButtons.OK, MessageBoxIcon.Error);
    }
}
After the voice call parameters and components are initialized, a voice call is established by sending a simple message from the sender to the receiver asking for call acceptance. This is accomplished by executing the following command.
SendMessage(Command.Invite, otherPartyEP);
Here the otherPartyEP parameter is the endpoint (IP address and port number) of the receiver.
The receiver gets an invitation from the sender, which can be accepted or rejected. The voice call is not established unless the receiver accepts the invitation. If it does, the voice call initialization step is completed and the VoIP conversation session starts.
4.2.2 VoIP Conversation
This is the actual VoIP call, in which voice and embedded data are transferred. After an OK message has been sent back to the sender, both parties start to send and receive voice signals simultaneously. In this prototype, this is done using multithreading. Two dedicated threads (Send and Receive) are built to accomplish simultaneous bi-directional conversation. The following sections explain these threads in detail.
4.2.2.1 "Send" Thread
The "Send" thread will be used by both end points to send voice signals to the other end point. Figure 4-5 shows the tasks of the "Send" thread which are:
i. Capture voice signals from the microphone and sample them into PCM codes. This task is done with the help of Microsoft DirectSound.
ii. Compress the 16-bit PCM voice samples into 8 bits using the G.711 audio codec algorithm. This step is done manually without using the codec itself.
iii. If secret data is to be hidden inside the voice samples, the "Send" thread does this with the help of a separate embedding function.
iv. Pack the voice data into UDP packets and send them to the receiver.
Start -> Capture voice -> Sample voice into 16-bit PCM -> Compress voice data into 8 bits -> Embedding? (Yes: Embed secret data into voice data) -> UDP packeting and sending
Figure 4-5: Flowchart of "Send" Thread
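The compression step of the "Send" thread, and the matching decompression step of the "Receive" thread, follow G.711. The report does not state which G.711 companding law the prototype implements, nor does it show the manual implementation, so the following is only a minimal sketch that assumes the mu-law variant with the commonly used bias of 0x84 and clip level of 32635. The class name `G711MuLaw` is hypothetical.

```csharp
using System;

public static class G711MuLaw
{
    const int Bias = 0x84;    // 132, added before the segment search
    const int Clip = 32635;   // largest magnitude that can be encoded

    // Compresses one 16-bit linear PCM sample into one 8-bit mu-law code.
    public static byte Encode(short pcm)
    {
        int sign = (pcm >> 8) & 0x80;           // 0x80 for negative samples
        int magnitude = pcm;
        if (sign != 0) magnitude = -magnitude;
        if (magnitude > Clip) magnitude = Clip;
        magnitude += Bias;

        // Find the segment (exponent): position of the highest set bit.
        int exponent = 7;
        for (int mask = 0x4000; (magnitude & mask) == 0 && exponent > 0; mask >>= 1)
            exponent--;

        int mantissa = (magnitude >> (exponent + 3)) & 0x0F;
        // mu-law codes are transmitted with all bits inverted.
        return unchecked((byte)~(sign | (exponent << 4) | mantissa));
    }

    // Expands one 8-bit mu-law code back into a 16-bit linear PCM sample.
    public static short Decode(byte code)
    {
        int inverted = ~code & 0xFF;
        int sign = inverted & 0x80;
        int exponent = (inverted >> 4) & 0x07;
        int mantissa = inverted & 0x0F;
        int magnitude = (((mantissa << 3) + Bias) << exponent) - Bias;
        return (short)(sign != 0 ? -magnitude : magnitude);
    }
}
```

Because the companding is lossy, `Decode(Encode(x))` returns a value close to, but generally not equal to, `x`. This quantization happens before embedding, which is why the secret bits are written into the 8-bit codes rather than into the 16-bit PCM samples.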
4.2.2.2 "Receive" Thread
Just like the "Send" thread, the "Receive" thread is used by both end points, in this case to receive voice signals from the other end point. The operations of the "Receive" thread are the reverse of those of the "Send" thread and run in the opposite order. Figure 4-6 shows the tasks of the "Receive" thread, which are:
i. Receive UDP packets from the sender and extract the voice data.
ii. If there is hidden data embedded inside the voice data, the "Receive" thread extracts it with the help of a separate extracting function.
iii. Decompress the received 8-bit G.711 data into 16-bit PCM voice samples using the G.711 audio codec algorithm. This step is also done manually without using the codec itself.
iv. Regenerate audio signals from the voice data and send them to the speakers. This task is done with the help of Microsoft DirectSound. Meanwhile, the extracted hidden data (if any) is displayed on the GUI of the receiver.
Start -> Extract voice data from UDP packets -> Extracting? (Yes: Extract secret data from voice data) -> Decompress voice data to 16-bit PCM -> Regenerate voice signals from PCM -> Send voice to speakers
Figure 4-6: Flowchart of "Receive" Thread
4.2.3 Ending VoIP Call
When one of the communicating parties presses the "End Call" button in the GUI, a BYE message is sent to the other party, which means that the VoIP call should be terminated. When either party receives or sends a BYE message, it calls the "Uninitialize" function, which terminates the VoIP call session and disposes of all used variables.
4.3 Embedding and Extracting
For the purpose of this project, the techniques used to embed data into the VoIP stream are variants of LSB steganography. Although this method is the simplest way to hide data in any medium and fails against many attacks and steganalysis techniques, it is an efficient and practical technique for such a simple prototype. In this prototype, 1 LSB, 2 LSBs, 3 LSBs and 4 LSBs are used separately to hide data and are examined in order to study the tradeoff between the capacity of covert data on the one hand and the invisibility and quality of the VoIP call on the other. Because the algorithms used to embed and extract data using 2, 3 and 4 LSBs are almost the same as for 1 LSB, the following sections explain only the 1 LSB embedding and extracting process used in this prototype.
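The 2, 3 and 4 LSB variants differ from the 1 LSB case mainly in the width of the bit mask applied to each voice byte. The sketch below illustrates that single-byte operation; it is an assumption about how such variants are commonly written, not the prototype's code, and the `LsbBits` class and its method names are hypothetical.

```csharp
using System;

public static class LsbBits
{
    // Replaces the n least significant bits of 'cover' with the low n bits of 'secret'.
    public static byte Replace(byte cover, int secret, int n)
    {
        if (n < 1 || n > 4) throw new ArgumentOutOfRangeException("n");
        int mask = (1 << n) - 1;                 // n = 3 gives binary 111
        return (byte)((cover & ~mask) | (secret & mask));
    }

    // Reads back the n least significant bits of a stego byte.
    public static int Read(byte stego, int n)
    {
        if (n < 1 || n > 4) throw new ArgumentOutOfRangeException("n");
        return stego & ((1 << n) - 1);
    }

    // Largest change in the byte value that n-LSB replacement can cause.
    public static int MaxChange(int n)
    {
        return (1 << n) - 1;
    }
}
```

The tradeoff studied in this chapter is visible in `MaxChange`: each extra LSB adds one more secret bit per byte, but roughly doubles the worst-case change to the voice byte (1, 3, 7 and 15 for 1 to 4 LSBs).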
4.3.1 Embedding Process
This function is responsible for hiding data in the voice signal data. It embeds data using 1 LSB, 2 LSBs, 3 LSBs or 4 LSBs, depending on the user's choice in the GUI. The 1 LSB part of this function works according to the following algorithm:
Step 1: Get secret data and convert it to an array of bits A1.
Step 2: Get a portion of voice data and save it into an array of bytes A2.
Step 3: Insert starting bytes (0,255,0,255,0) into voice data array.
Step 4: Insert 1 bit of secret data array into LSB of voice data array byte.
Step 5: Increment indexes of A1 and A2.
Step 6: Repeat Steps 4 and 5 until end of A1.
Step 7: Insert ending bytes (255,0,255,0,255) into voice data array.
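The seven steps above can be sketched in C# as follows. This is an illustrative reading of the algorithm rather than the prototype's actual function: the class name `LsbEmbedder` is hypothetical, the markers are assumed to overwrite voice bytes in place (the text says "insert", which could also mean lengthening the array), and the secret bits are assumed to be taken most significant bit first.

```csharp
using System;

public static class LsbEmbedder
{
    static readonly byte[] StartMarker = { 0, 255, 0, 255, 0 };
    static readonly byte[] EndMarker = { 255, 0, 255, 0, 255 };

    // Returns a copy of 'voice' (A2) carrying 'secret' in the LSBs of the
    // bytes between the start and end markers.
    public static byte[] Embed(byte[] voice, byte[] secret)
    {
        int bitCount = secret.Length * 8;                        // Step 1: size of A1
        int needed = StartMarker.Length + bitCount + EndMarker.Length;
        if (voice.Length < needed)
            throw new ArgumentException("Voice buffer too small for the secret data");

        byte[] a2 = (byte[])voice.Clone();                       // Step 2
        int pos = 0;

        Array.Copy(StartMarker, 0, a2, pos, StartMarker.Length); // Step 3
        pos += StartMarker.Length;

        for (int i = 0; i < bitCount; i++, pos++)                // Steps 4 to 6
        {
            int bit = (secret[i / 8] >> (7 - i % 8)) & 1;        // MSB first (assumed)
            a2[pos] = (byte)((a2[pos] & 0xFE) | bit);            // replace the LSB only
        }

        Array.Copy(EndMarker, 0, a2, pos, EndMarker.Length);     // Step 7
        return a2;
    }
}
```

Under these assumptions, one buffer of `N` voice bytes can carry at most `(N - 10) / 8` secret bytes, since ten bytes are taken by the two markers.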
Figure 4-7 explains how this replacement of 1 LSB is being done.
Diagram labels: Voice Data Array (A2), LSB, Secret Data Array (A1)
Figure 4-7: Replacement of 1 LSB in Embedding Function
Figure 4-8 below shows the flowchart of the 1 LSB part of embedding function.
Start -> Convert secret data into array of bits (A1) -> Get voice data array (A2) -> Insert Start bytes into (A2) -> End of (A1)? (No: Embed 1 bit of (A1) into LSB of one byte of (A2) -> Increment indexes of (A1) and (A2) -> check again; Yes: Insert End bytes into (A2) -> Return (A2), End)
Figure 4-8: Flowchart of Embedding Process (1 LSB)
4.3.2 Extracting Process
This function is responsible for extracting secret data from the received voice signal data. It extracts secret data from 1 LSB, 2 LSBs, 3 LSBs or 4 LSBs, depending on the user's choice in the GUI. The 1 LSB part of this function works according to the following algorithm:
Step 1: Get the received voice signal and save it into an array of bytes A2.
Step 2: Search the received voice data for the starting bytes (0,255,0,255,0).
Step 3: Extract the LSB of one byte of voice data array and save it into an array of bits A1.
Step 4: Increment indexes of A1 and A2.
Step 5: Repeat Steps 3 and 4 until finding the ending bytes (255,0,255,0,255).
Step 6: Recombine array A1 into array of characters.
Step 7: Display the result (the array of characters) on the GUI.
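Steps 1 to 6 of the extracting algorithm can be sketched as follows. As with embedding, this is an assumed reading and not the prototype's own function: the class name `LsbExtractor` is hypothetical, bits are assumed to be packed most significant bit first, the end marker is only tested at byte boundaries of the secret data, and ASCII is assumed for the text conversion.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class LsbExtractor
{
    static readonly byte[] StartMarker = { 0, 255, 0, 255, 0 };
    static readonly byte[] EndMarker = { 255, 0, 255, 0, 255 };

    static bool MatchesAt(byte[] data, int index, byte[] marker)
    {
        if (index + marker.Length > data.Length) return false;
        for (int i = 0; i < marker.Length; i++)
            if (data[index + i] != marker[i]) return false;
        return true;
    }

    // Returns the hidden bytes, or null when no complete start/end pair is found.
    public static byte[] Extract(byte[] voice)
    {
        int start = -1;                                   // Step 2: find start marker
        for (int i = 0; i + StartMarker.Length <= voice.Length; i++)
        {
            if (MatchesAt(voice, i, StartMarker)) { start = i + StartMarker.Length; break; }
        }
        if (start < 0) return null;

        List<byte> result = new List<byte>();
        int current = 0, bits = 0;
        for (int pos = start; pos < voice.Length; pos++)  // Steps 3 to 5
        {
            if (bits == 0 && MatchesAt(voice, pos, EndMarker))
                return result.ToArray();
            current = (current << 1) | (voice[pos] & 1);  // collect one LSB
            if (++bits == 8) { result.Add((byte)current); current = 0; bits = 0; }
        }
        return null;                                      // end marker never found
    }

    // Step 6: recombine the extracted bytes into characters (ASCII assumed).
    public static string ExtractText(byte[] voice)
    {
        byte[] data = Extract(voice);
        return data == null ? null : Encoding.ASCII.GetString(data);
    }
}
```

Step 7, displaying the text on the GUI, is omitted because it depends on the prototype's form controls.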
Figure 4-9 below shows the flowchart of the 1 LSB part of the extracting function.
Figure 4-9: Flowchart of Extracting Process (1 LSB)
4.4 Voice Waveforms Drawing
This section explains the methods used in the prototype to draw the waveforms of the input and output audio at the microphone and speakers respectively. Although these drawings are not an essential part of the VoIP steganography prototype, they are very useful for graphically illustrating the overall process of the VoIP call and steganography. Two of these drawings show the input and output signals dynamically. Another drawing shows the sent audio signal before and after the embedding process. This last drawing makes it easy to see the difference between the original signal and the modified signal (after embedding), which gives a good idea of the limits of payload capacity that should not be exceeded in order to preserve voice quality. Figure 4-10 shows samples of these drawings, which are produced during the embedding stage of a VoIP call.
Figure 4-10: Screenshot of Waveform Drawing of VoIP Prototype
The bottom black part of these waveforms is drawn dynamically during VoIP call and the above gray part is drawn once only during embedding phase of VoIP steganography process.
The following is the code of the function responsible for drawing the audio waveform. More than one version of this function is used in this prototype, with some modifications.
public void RenderTimeDomainOut(byte[] VoiceData)
{
    // Set up for drawing
    Bitmap Mycanvas3 = new Bitmap(pictureBox3.Width, pictureBox3.Height);
    Graphics offScreenDC = Graphics.FromImage(Mycanvas3);
    Pen pen = new System.Drawing.Pen(Color.WhiteSmoke);

    // Determine channel boundaries
    int width1 = Mycanvas3.Width;
    int height1 = Mycanvas3.Height;
    int Top1 = 0;
    int Right1 = width1;
    int Bottom1 = height1;

    // Draw audio channel
    double yCenter1 = (Bottom1 - Top1) / 2;
    double yScale1 = 0.5 * (Bottom1 - Top1) / 255;
    int xPrev1 = 0, yPrev1 = 0;
    int xAxis = 0, VDcnt = 0;

    while (VDcnt < VoiceData.Length && xAxis < Right1)
    {
        int yAxis;
        int sign = (VoiceData[VDcnt] & 0x80) >> 7;

        // If the sign bit is set, invert the code, keep its 7-bit magnitude
        // and plot it as a negative value; otherwise plot the 7-bit magnitude.
        if (sign != 0)
        {
            yAxis = -((VoiceData[VDcnt] ^ 0xFF) & 0x7F);
        }
        else
        {
            yAxis = VoiceData[VDcnt] & 0x7F;
        }
        yAxis = (int)(yCenter1 + (yAxis * yScale1));

        if (xAxis == 0)
        {
            // First point: nothing to connect to yet
            xPrev1 = 0;
            yPrev1 = yAxis;
        }
        else
        {
            pen.Color = Color.LimeGreen;
            offScreenDC.DrawLine(pen, xPrev1, yPrev1, xAxis, yAxis);
            xPrev1 = xAxis;
            yPrev1 = yAxis;
        }
        VDcnt += 1;
        xAxis += 1;
    }

    // Clean up
    pictureBox3.Image = Mycanvas3;
    pen.Dispose();
    offScreenDC.Dispose();
}
4.5 Presenting Results
The results of the VoIP call, as well as the embedded and extracted secret data, are presented in the GUI of the prototype as text. Other results, such as the input and output waveforms, are presented on the GUI as drawings. Statistical results of the VoIP steganography process are displayed on the prototype's main form as well. Some intermediate data, such as the voice data array and the array of bits of the secret data, can also be presented.
Figure 4-11: Screenshot of Prototype GUI Showing Call Results
4.6 Summary
In this chapter, we discussed the design and implementation of the VoIP steganography prototype. We started with designing and implementing the VoIP call, which is the basis of our communication medium. After that we moved to the embedding and extracting methods used in this prototype. As mentioned, only a small number of simple techniques were used in this prototype, mainly the LSB technique. Then we explained the waveform drawing functions, which are used to draw the input and output waveforms before and after embedding. Lastly, we explained what the results of this prototype are and how they are presented in the graphical user interface.
CHAPTER 5
RESULTS AND DISCUSSION
5.1 Introduction
After the VoIP steganography prototype was designed and implemented, its overall performance and detailed outcomes were tested and recorded. This chapter discusses and analyzes the final results and findings derived from the literature related to this project as well as from the outcomes of the implemented prototype.
5.2 Current Relevant Steganography Techniques
Based on the literature review performed earlier, it is clear that there are many audio and network steganography algorithms that could be used in a VoIP stream. Although many of these algorithms have not yet been tested in practice with a real VoIP application, many others have been applied and tested in experimental environments.
Starting with the famous LSB steganography technique, which has been used since the beginning of digital steganography and with almost every multimedia file, it is obvious that this technique provides a high capacity and low complexity. When using single LSB, the total ratio of secret data to the voice data will be 1/8 which
84 considered as a high rate. It is also possible to use the first and second LSBs which will double the capacity of the covert signal. By increasing the number of LSBs used to hide secret data, the capacity will be increased significantly while the voice distortion and probability of detection will be increased. Some modifications and techniques are suggested to minimize the possibility of secret message detection such as encryption, hashing and randomization.
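The capacity and distortion trade-off described above can be illustrated with a minimal sketch of k-LSB embedding over 8-bit voice samples. This is not the prototype's own code: the function names `embed_lsb` and `extract_lsb`, the zero padding of the last bit group, and the assumption that the receiver already knows the secret length are all illustrative choices.

```python
def embed_lsb(voice: bytes, secret: bytes, k: int = 1) -> bytes:
    """Hide `secret` in the k least significant bits of successive voice bytes."""
    if not 1 <= k <= 8:
        raise ValueError("k must be between 1 and 8")
    # Turn the secret into a bit string, most significant bit first.
    bits = "".join(f"{byte:08b}" for byte in secret)
    # Assumption: pad with zeros so the bit string splits into whole k-bit groups.
    bits += "0" * (-len(bits) % k)
    groups = [int(bits[i:i + k], 2) for i in range(0, len(bits), k)]
    if len(groups) > len(voice):
        raise ValueError("voice data too short for this secret")
    mask = (1 << k) - 1
    out = bytearray(voice)
    for i, group in enumerate(groups):
        # Clear the k low bits of the voice byte, then write the secret bits.
        out[i] = (out[i] & ~mask & 0xFF) | group
    return bytes(out)


def extract_lsb(stego: bytes, n_secret_bytes: int, k: int = 1) -> bytes:
    """Recover `n_secret_bytes` bytes hidden by embed_lsb with the same k."""
    mask = (1 << k) - 1
    n_groups = -(-n_secret_bytes * 8 // k)  # ceiling division
    bits = "".join(f"{byte & mask:0{k}b}" for byte in stego[:n_groups])
    bits = bits[:n_secret_bytes * 8]  # drop any padding bits
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
```

With k = 1 each voice byte changes by at most 1, while with k = 4 it can change by up to 15, which is why a larger k trades lower fidelity for higher capacity.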
There are also many audio steganography techniques that are applicable to VoIP covering time domain, frequency domain and integer transform domain. The time domain algorithms provide less complexity and higher capacity than others, while other domains focus on the robustness and invisibility.
Because VoIP is simply audio signal transmission over IP networking, it is also possible to apply network steganography techniques during a VoIP call session. Redundant, unused and reserved bits of protocol packets can be used to hide some data. There are also newer algorithms, such as LACK and HICCUPS, which were introduced to facilitate network steganography in VoIP applications.
Many of the current audio and network steganography algorithms are theoretically applicable to VoIP streams. In practice, only some of them are feasible, bearing in mind the available equipment and technology. Some techniques are also too complex to be applied to a VoIP stream, which needs real-time embedding and extraction of secret data.
As a summary, Table 5-1 below lists all studied steganography techniques that might be used in VoIP steganography system. This table also provides a comparison among these techniques.
Table 5-1: Summary Comparison among VoIP Steganography Techniques

No | Method | Type | Description | Capacity | Notes
1 | LSB | Audio | Compress data then hide it | 1/8 of audio data | 20 bytes for 160 bytes RTP payload; 1000 bytes/sec
2 | LSB | Audio | Encrypt and hide | 1/8 of audio data | 20 bytes for 160 bytes RTP payload; 1000 bytes/sec
3 | Higher LSB layers | Audio | Data bits are embedded into higher LSB layers | N.A. | Increase robustness against noise
4 | QSAS | Audio | Frequency domain | N.A. | Lower capacity but much better SNR values
5 | ITSAS | Audio | Integer transform domain | N.A. |
6 | Phase coding | Audio | Hiding data by shifting phase according to the secret data | N.A. |
7 | Spread Spectrum (SS) | Audio | Spread secret information across the audio signal's frequency spectrum as much as possible | N.A. |
8 | Echo Hiding | Audio | Information is embedded in a sound file by introducing an echo into the discrete signal | N.A. |
9 | Free or unused bits | Network | IP, UDP, TCP, RTP and RTCP protocols | 2000 bits for each VoIP call |
10 | LACK | Network (timing) | Utilizing delayed audio packets | N.A. |
11 | HICCUPS | Network | Used in voice over wireless LAN (VoWLAN) | N.A. |
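The capacity figures quoted for single-LSB embedding (20 bytes per 160-byte RTP payload and 1000 bytes/sec) can be checked with a short calculation. The sketch below assumes 8-bit samples and 50 packets per second, which is the packet rate implied by dividing 1000 bytes/sec by 20 bytes per packet; the function name `lsb_capacity` is hypothetical.

```python
def lsb_capacity(payload_bytes: int = 160, k: int = 1,
                 packets_per_second: int = 50) -> tuple:
    """Return (secret bytes per packet, secret bytes per second) for k-LSB hiding.

    Assumptions: every payload byte is one 8-bit voice sample, and the
    stream carries `packets_per_second` packets of `payload_bytes` each.
    """
    secret_bits_per_packet = payload_bytes * k        # k hidden bits per sample
    secret_bytes_per_packet = secret_bits_per_packet // 8
    return secret_bytes_per_packet, secret_bytes_per_packet * packets_per_second


per_packet, per_second = lsb_capacity()
print(per_packet, per_second)  # 20 1000
```

Under the same assumptions, using two LSBs doubles both figures to 40 bytes per packet and 2000 bytes/sec.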
5.3 Testing Approaches and Methods
For the purpose of testing, each LSB steganography algorithm is tested many times, and each test uses secret data of a different length. The waveforms and statistical results of each test are recorded and analyzed. In a further test, secret messages of different sizes are sent using each of the LSB algorithms in this prototype, and the results for each algorithm are then analyzed and compared. Generally, two testing approaches are used with this prototype: the program performance is tested first, and then the testing experiments are carried out.
5.3.1 Program Performance
This approach tests the prototype's overall behavior and functionality to ensure that the program is executable and performs its tasks as desired. It includes checking the GUI, commands, functions and messages to make sure that every component works and does the required tasks. The prototype user interface is shown in Figure 4-11 above. Detailed use-case procedures are carried out to investigate the performance of the prototype in all cases. Another way to verify the prototype is to monitor its response to incorrect input; these misuse cases are used to validate its error handling.
After secret data is embedded into the VoIP call stream, the waveforms of the signal before and after embedding are displayed simultaneously in the same graph in different colors. This helps to visualize the difference between the two signals and to judge the amount of change made to the original VoIP call signal. Furthermore, the statistical values of each VoIP steganography process are shown in the main form just after the embedding process completes.
Once the above performance tests have passed, the behavior of the prototype is considered normal and trusted: its functionality meets the expectations and it is resilient to unexpected events and inputs. The prototype program proved to be executable and to perform its tasks as expected.
5.3.2 Results Listing and Analyzing
This testing approach is the most important one because it checks whether the prototype meets the predefined objectives. It is done by performing many experiments and then testing and analyzing each result against the prototype objectives. Statistical results are also collected in this test.
The measures used are visual and statistical. Statistical measures include the time interval needed to receive, extract and display the secret data, the number of bits transmitted, the number of VoIP call samples or bytes used to embed the secret data, the signal-to-noise ratio (SNR), and the average deviation. These statistics are calculated for every VoIP steganography technique used in this prototype, and the results are then compared with one another.
Before describing the experiments and comparing results, it is worth looking at the measures used to evaluate these LSB VoIP steganography techniques:
i. Time Used: this measure is the total time that the receiver needs to extract the complete secret data from the received VoIP call stream signal. The time interval starts when the receiver host accepts the starting bytes and ends when the receiver accepts the ending bytes. This measure does not include the time needed to transfer the VoIP signal from the sender to the receiver. It also excludes the time spent by the receiver before it finds the starting bytes.
ii. Used Voice Segments: the total number of VoIP stream packets needed to hide the desired secret data. This measure depends entirely on the size of the secret message and on the size of the streaming and networking packet.
iii. Average deviation: the average ratio of the secret data portion embedded into one byte of VoIP call data to the value of that byte. This measure illustrates the total rate of noise introduced to the original VoIP call data by the secret data embedding process. It depends on the technique, the voice signal strength, and the type and value of the secret data. The following equation is used to evaluate average deviation (AvDev):
\[ AvDev = \frac{1}{n} \sum_{i=1}^{n} \frac{S_i}{V_i} \]
Where:
AvDev = average deviation.
n = number of VoIP call data bytes used to hide the secret data.
S_i = value of the secret data hidden in the VoIP call data byte i.
V_i = value of the VoIP call data byte i used to hide the secret data value S_i.
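The average deviation measure can be sketched in a few lines, following its definition as the mean ratio of each hidden value S_i to its carrier byte value V_i. How the prototype treats a carrier byte whose value is zero is not stated in this report; skipping such bytes, as done here, is an assumption, and the function name `average_deviation` is hypothetical.

```python
def average_deviation(secret_values, voice_values):
    """Mean of S_i / V_i over the voice bytes that carry secret data.

    secret_values[i] is the value hidden in voice byte i (S_i) and
    voice_values[i] is the value of that voice byte (V_i).
    Assumption: bytes with V_i == 0 are skipped to avoid division by zero.
    """
    if len(secret_values) != len(voice_values):
        raise ValueError("both sequences must have the same length")
    ratios = [s / v for s, v in zip(secret_values, voice_values) if v != 0]
    if not ratios:
        return 0.0
    return sum(ratios) / len(ratios)


# Two carrier bytes: a hidden bit 1 in a byte of value 100 and a hidden bit 0
# in a byte of value 50 give (1/100 + 0/50) / 2.
print(average_deviation([1, 0], [100, 50]))
```

Because V_i appears in the denominator, the same hidden value produces a larger deviation in a quiet signal than in a loud one, which matches the statement that the measure depends on voice signal strength.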
5.3.2.1 Experiment to Hide Sample Information (<50 bytes)
This experiment is done by sending a small text message (<50 bytes) using each LSB technique and then recording the statistical results and waveform outputs. The results of this experiment were as follows:
Figure 5-1: Screenshot of Prototype Call Waveforms
In Figure 5-1 above, the green arrows point to the starting signals of each LSB technique waveform, while the dark red arrows point to the ending signals. The secret data is embedded between these two arrows, and the receiver checks and extracts secret data only from the signals between them. The red waveforms are the original voice signal before embedding, while the blue waveforms are the modified voice signals after embedding the secret data. From the above graphs, it is obvious that the time needed to send a secret message using 4 LSBs is one fourth of the time needed to send the same secret message using only 1 LSB.
It can also be noticed that the difference between the red and blue waveforms in the 4 LSBs graph is greater than the difference between these waveforms in the 1 LSB graph. This means that the 1 LSB technique produces less distortion of the original voice stream than the 4 LSBs technique. The initial findings of this experiment are summarized in the following table.
Table 5-2: Initial comparison among LSB techniques
Measure | 1 LSB | 2 LSBs | 3 LSBs | 4 LSBs
Capacity | lowest | lower | higher | highest
Secret Data / Voice Data | 1/8 | 1/4 | 3/8 | 1/2
Time interval | longest | longer | shorter | shortest
Average deviation | 0.78% | 1.56% | 3.13% | 6.25%
5.3.2.2 Experiment to Hide Sample Information (100 bytes)
This experiment is done by sending a sample 100-byte text message using each LSB technique and then recording the statistical results of each technique. The secret data sent by the sender is completely and correctly received by the receiver with each of the four targeted techniques. The results of this experiment and the statistical records are summarized in Table 5-3 below.
Table 5-3: Result of hiding 100 bytes of secret data
Measure | 1 LSB | 2 LSBs | 3 LSBs | 4 LSBs
Secret Data Bits | 800 | 800 | 801 | 800
Secret Data Bytes | 100 | 100 | 100 | 100
Time Used | 4 | 0 | 0 | 0
Used Voice Segments | 1 | 1 | 1 | 1
Average Deviation | 0.007109 | 0.0142429 | 0.039231 | 0.075223
Figure 5-1 above shows that using the 4 LSBs technique to hide secret data in a VoIP stream offers more capacity but with a higher distortion rate. Because the secret message used in this experiment is small, only one VoIP signal segment is required to hide this data with every LSB technique. The 801 value in the first row under the 3 LSBs column shows that the total number of secret data bits is not exact. This is because not every multiple of 8 is divisible by 3: 800 bits do not split evenly into 3-bit groups. The total number of secret data bytes, however, is correct. The differences, pros and cons of each technique are clearly noticeable.
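The 801-bit figure can be reproduced with a small calculation. The sketch below assumes that the reported bit count is the number of carrier bytes multiplied by the number of LSBs used per byte, so the last 3-bit group is counted in full even when it is only partly filled. This is one plausible reading of the reported numbers rather than a confirmed description of the prototype, and the function name `carrier_bytes_and_bits` is hypothetical.

```python
def carrier_bytes_and_bits(secret_bytes: int, k: int) -> tuple:
    """Return (voice bytes needed, bits occupied) when hiding k bits per byte."""
    secret_bits = secret_bytes * 8
    carriers = -(-secret_bits // k)   # ceiling division: whole carrier bytes
    return carriers, carriers * k     # occupied bits include any padding


for k in (1, 2, 3, 4):
    print(k, carrier_bytes_and_bits(100, k))
# 1 (800, 800)
# 2 (400, 800)
# 3 (267, 801)
# 4 (200, 800)
```

The same calculation gives 8193, 32769 and 65538 bits for 1KB, 4KB and 8KB of secret data with 3 LSBs, which agrees with the 3 LSBs values reported in the later experiments.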
5.3.2.3 Experiment to Hide 1KB Sample Information
Using 1KB of secret data, the four LSB techniques are benchmarked again. The secret data sent by the sender is totally and correctly received by the receiver using any of the four targeted techniques. The results of this experiment and statistical records are summarized in Table 5-4 below.
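Table 5-4: Result of hiding 1KB of secret data
Measure | 1 LSB | 2 LSBs | 3 LSBs | 4 LSBs
Secret Data Bits | 8192 | 8192 | 8193 | 8192
Secret Data Bytes | 1024 | 1024 | 1024 | 1024
Time Used | 237 | 108 | 51 | 0
Used Voice Segments | 4 | 2 | 2 | 1
Average Deviation | 0.007447 | 0.019293 | 0.056067 | 0.068524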
5.3.2.4 Experiment to Hide Large (4KB and 8KB) Sample Information
The process is repeated using larger secret data (4KB and 8KB). The differences between the results of each LSB technique are now clearer than before. The results of this experiment and the statistical records are summarized in Table 5-5 and Table 5-6 below.
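Table 5-5: Result of hiding 4KB of secret data
Measure | 1 LSB | 2 LSBs | 3 LSBs | 4 LSBs
Secret Data Bits | 32768 | 32768 | 32769 | 32768
Secret Data Bytes | 4096 | 4096 | 4096 | 4096
Time Used | 1378 | 700 | 382 | 223
Used Voice Segments | 15 | 8 | 5 | 4
Average Deviation | 0.006822 | 0.013514 | 0.03364 | 0.082939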
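Table 5-6: Result of hiding 8KB of secret data
Measure | 1 LSB | 2 LSBs | 3 LSBs | 4 LSBs
Secret Data Bits | 65536 | 65536 | 65538 | 65536
Secret Data Bytes | 8192 | 8192 | 8192 | 8192
Time Used | 2908 | 1368 | 837 | 710
Used Voice Segments | 30 | 15 | 10 | 8
Average Deviation | 0.004482 | 0.01468 | 0.066338 | 0.06109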
The graphs that describe all of the above experiments are shown below:
Figure 5-2: Time Used Measure
Figure 5-3: Used VoIP Segments Measure
Figure 5-4: Average Deviation Measure
From the above graphs and tables, it is clear that each of these techniques has its pros and cons in terms of the time required to send and receive the secret data, the difference between the voice signals before and after the embedding process, and the total number of voice segments required to send the desired secret message.
5.4 Limitations of LSB Steganography Technique
The advantages of the LSB steganographic technique are that it is simple to understand, easy to implement, and results in a voice stream that contains hidden data yet appears to be of high aural fidelity. Many steganographic applications are built on this technique because of the high capacity of hidden data that it offers. However, it can be shown that under certain conditions LSB embedding is not secure at all. The fatal drawback of this technique is that it is very vulnerable to many attacks. For example, the LSB technique does not tolerate changes to the VoIP stream, such as audio being replaced, reordered or distorted with heavy noise, because these changes can destroy the hidden message.
The LSB technique also has other limitations. One of them is the need for prior agreement on which LSB technique will be used before the actual transmission of secret data. Another limitation is due to the size of the medium used to hide the data. For the LSB steganography technique to be useful, the message should be hidden without any major changes to the VoIP stream it is embedded in. This leaves limited room to embed a message without noticeably changing the original voice stream. In our case, the 1 LSB technique is the best in terms of invisibility, while the 2 LSBs, 3 LSBs and 4 LSBs techniques have, in that order, an increasingly higher chance of being noticed. LSB techniques that use more than 4 LSBs will be very vulnerable to aural detection.
5.5 Suggested Steganography Techniques
The balance and tradeoff between capacity, robustness, invisibility and complexity is the reason behind the research carried out from time to time. Any new VoIP steganography technique or algorithm is introduced to enhance one or more of these critical characteristics of a steganography system.
As a part of my contribution in this research, I introduce two techniques, which to the best of my knowledge have not been used before, for use in future VoIP steganography applications. The first one is the Multiplexing technique, in which the secret data is multiplexed with the VoIP audio signal. For every N bytes of VoIP audio, we insert one additional byte of secret data into the stream. If we choose N = 3 bytes, we add one byte of secret data after every 3 bytes of voice data, so that one of every four bytes in the resulting stream is secret data. This gives a secret data capacity of 1/4, which is considered a very high rate in steganography. Although this technique decreases the invisibility of the secret message and also increases the rate of distortion in the overall VoIP audio stream, it gives a very high capacity compared to other techniques.
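The Multiplexing technique can be sketched as a simple byte interleaver. This is only an illustration of the idea described above: the function names `multiplex` and `demultiplex` are hypothetical, and the sketch assumes that the receiver knows the secret length in advance, for example through markers or a prior agreement.

```python
def multiplex(voice: bytes, secret: bytes, n: int = 3) -> bytes:
    """Insert one secret byte after every n voice bytes."""
    if n < 1:
        raise ValueError("n must be at least 1")
    if len(secret) > len(voice) // n:
        raise ValueError("not enough voice data for this secret")
    out = bytearray()
    s = 0
    for i in range(0, len(voice), n):
        group = voice[i:i + n]
        out += group
        # Only a full group of n voice bytes is followed by a secret byte.
        if len(group) == n and s < len(secret):
            out.append(secret[s])
            s += 1
    return bytes(out)


def demultiplex(stream: bytes, n_secret: int, n: int = 3):
    """Split a multiplexed stream back into (voice, secret)."""
    voice, secret = bytearray(), bytearray()
    i = 0
    while i < len(stream):
        voice += stream[i:i + n]
        i += n
        if len(secret) < n_secret and i < len(stream):
            secret.append(stream[i])
            i += 1
    return bytes(voice), bytes(secret)
```

With n = 3, twelve voice bytes and four secret bytes produce a 16-byte stream, so the secret data occupies 4/16 = 1/4 of what is transmitted.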
The second suggested technique is the Mixed technique, in which both audio and network steganography are used. Any current audio technique (such as LSB) is used to hide secret data in the VoIP call stream. This provides good capacity and low complexity but low robustness. The robustness could be enhanced by hashing the secret data embedded into each packet and saving the hash value in the unused or redundant bits of the network protocols in use. This technique does not affect the capacity of the audio steganography algorithm used. On the other hand, it increases the robustness by providing hashing, numbering and error detection. Table 5-7 below summarizes these two suggested techniques.
Table 5-7: Suggested VoIP Techniques
Method | Type | Description | Capacity | Notes
Multiplexing | Audio | Multiplexing secret data with the audio data | 1/4 of audio data | Increases capacity
Mixed | Mixed (Audio and Network) | Hide data in LSB of audio while using network unused bits for hashing | 1/8 of audio data | Increases robustness
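The Mixed technique can be sketched as single-LSB audio embedding combined with a short hash carried outside the audio payload. Everything named here is illustrative: the `Packet` class models the unused protocol bits as a plain integer field called `spare`, the 16-bit width of that field is an assumption, and truncated SHA-256 is simply one convenient choice of hash.

```python
import hashlib
from dataclasses import dataclass

SPARE_BITS = 16  # assumption: unused header bits available per packet


@dataclass
class Packet:
    spare: int      # models the unused/redundant protocol bits
    payload: bytes  # voice bytes carrying the secret in their LSBs


def short_hash(data: bytes, bits: int = SPARE_BITS) -> int:
    """Truncate SHA-256 to the number of spare bits available."""
    digest = hashlib.sha256(data).digest()
    return int.from_bytes(digest, "big") >> (256 - bits)


def embed_1lsb(voice: bytes, chunk: bytes) -> bytes:
    bits = [(byte >> (7 - j)) & 1 for byte in chunk for j in range(8)]
    if len(bits) > len(voice):
        raise ValueError("payload too short for this chunk")
    out = bytearray(voice)
    for i, bit in enumerate(bits):
        out[i] = (out[i] & 0xFE) | bit
    return bytes(out)


def extract_1lsb(payload: bytes, n_bytes: int) -> bytes:
    bits = [b & 1 for b in payload[:n_bytes * 8]]
    return bytes(
        sum(bit << (7 - j) for j, bit in enumerate(bits[i:i + 8]))
        for i in range(0, len(bits), 8)
    )


def make_packet(voice: bytes, chunk: bytes) -> Packet:
    return Packet(spare=short_hash(chunk), payload=embed_1lsb(voice, chunk))


def read_packet(packet: Packet, n_bytes: int):
    """Return the extracted chunk and whether its hash matches the spare bits."""
    chunk = extract_1lsb(packet.payload, n_bytes)
    return chunk, short_hash(chunk) == packet.spare
```

The audio capacity stays at 1/8 because the hash travels in the header field rather than in the voice bytes, and a mismatch at the receiver signals that the chunk was damaged in transit.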
5.6 Summary
In this chapter, the prototype results and research findings were discussed briefly. The chapter started by providing the literature review analysis of present VoIP, audio and network steganography techniques. The chapter also suggested two VoIP steganography techniques to help improve embedded data capacity and robustness. The final results and findings generated from the prototype program were then discussed in detail.
CHAPTER 6
CONCLUSION AND FUTURE WORK
6.1 Introduction
At the end of this project, it is clear that the VoIP stream is a very interesting and promising medium in which to hide secret data, especially because of its real-time nature and the high rate of data that can be hidden. As shown in previous chapters, this lab-based prototype extracted 100% of the hidden data, a result that may differ in a real-life application. Implementing a real-life VoIP steganography system is a harder task, bearing in mind the complexity of the network environment, multiple noise sources, time delay and many other factors.
The main idea behind choosing this topic and building such a prototype is to study the available improvements in the capacity of the secret data embedded within a VoIP call stream. The limited amount of research in this area adds difficulty, but it also motivated me to choose this project in order to contribute to this newly studied field.
A number of studies have suggested and developed new techniques in VoIP steganography from time to time, although most of these theoretical techniques are either practically unfeasible or very hard to implement in real life. This chapter presents the conclusion and summary of the project, together with some advice and suggestions for future work and research.
6.2 Project Summary and Conclusion
The LSB technique of VoIP steganography was tested and evaluated throughout the phases of this research, which showed that the rate and speed of transferring secret data during a VoIP call are very high. VoIP is considered a very effective and practical medium to hide data and transfer it secretly between two parties. Although this research was carried out for academic and educational purposes, VoIP steganography is well suited to misuse by malicious parties in many ways, including spying, hacking, malware transfer and others. The studied theoretical VoIP steganography techniques could be utilized to hide or transfer illegal data or files.
6.3 Meeting Research Objectives
The ultimate goal of this project, or of any project in general, is to meet the predefined project objectives. The project objectives were clearly declared in Section 1.5. This section explains how the results of this research show that the objectives are met.
Throughout chapter 2, and from the testing and results explained in chapter 5, the first objective, to study VoIP steganography, analyze its current techniques and discuss their performance in terms of capacity, was met.
The detailed illustration in chapters 3 and 4 of the implemented prototype, the results discussed in chapter 5 and the live demonstration of the system show that the second objective, to implement a simple lab-based system that performs VoIP steganography using one or more techniques to illustrate the VoIP steganography process, was met too.
The third objective was met by explaining the suggested improvements of VoIP steganography, which were discussed in chapter 5. The improvement of payload capacity shown in the results of the implemented prototype also shows that this objective is fully met.
6.4 Project Contribution
The main contribution of this project is the development of a VoIP steganography prototype that is built entirely from the ground up to support steganography and to push the capacity of embedding information in VoIP steganography by using several LSB techniques. A few steganography prototypes have been built for academic research, but most of them, if not all, used ready-made VoIP libraries that handle everything needed to establish and run a VoIP call. These previous prototypes focus only on how to add steganographic features and techniques to publicly released VoIP libraries or code. One of the most used VoIP libraries is Jori's Voice over IP library (JVOIPLIB), which was used in [33] and [37]. My implemented prototype shows that it is possible to embed a large amount of secret data in a VoIP call stream with a low rate of detection and distortion, at high speed, and almost without affecting overall VoIP call quality.
Based on the final results and discussion of the implemented prototype and on the literature review analysis, I introduced two new techniques that help to improve the capacity as well as the robustness of VoIP steganography.
6.5 Future Work
Implementing the VoIP prototype and reviewing the relevant literature opened new horizons and brought many ideas for improving such a system. This prototype focused on LSB steganography and the suggested improvements to enhance the capacity of its payload data. However, there are other VoIP steganography techniques that to the best of my knowledge have not yet been implemented in practice. This availability of theoretical, untested techniques makes VoIP steganography an interesting field for further study and improvement.
Changing the network environment from the ideal environment used in this research to a real-life network environment with a client-server based connection would be a huge leap toward building a practical VoIP steganography system. Another modification is to use nicknames instead of IP addresses to perform the VoIP call, which would enhance call security, increase portability and protect users' identities.
Using both network steganography and audio steganography would increase the overall capacity of hidden data and decrease the time needed to transfer a specific amount of secret data from one host to another. As discussed in chapter 5, implementing a VoIP steganography system with the capability of compressing secret data before embedding would greatly increase the capacity of the VoIP steganography. Using hashing or encryption would greatly increase the robustness of such a system.
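The suggestion to compress secret data before embedding can be sketched with Python's standard `zlib` module. The helper names `pack_secret`, `unpack_secret` and `carrier_bytes` are hypothetical, and the gain depends entirely on how compressible the secret is; data that is already compressed or encrypted will not shrink.

```python
import zlib


def pack_secret(secret: bytes) -> bytes:
    """Compress the secret before it is handed to the embedding step."""
    return zlib.compress(secret, 9)


def unpack_secret(packed: bytes) -> bytes:
    """Restore the original secret after extraction at the receiver."""
    return zlib.decompress(packed)


def carrier_bytes(n_secret_bytes: int, k: int = 1) -> int:
    """Voice bytes needed to hide n_secret_bytes using k LSBs per byte."""
    return -(-n_secret_bytes * 8 // k)  # ceiling division


secret = b"meet at the usual place at noon. " * 32
packed = pack_secret(secret)
# Fewer bytes to hide means fewer voice bytes are modified.
print(carrier_bytes(len(secret)), carrier_bytes(len(packed)))
```

If encryption is used as well, compression would normally be applied first, because encrypted data is not compressible.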
Using synchronization, hashing and encryption mechanisms based on real-time systems would greatly increase the robustness and flexibility of the covert communication system and allow secret messages to be recovered accurately at the receiver side. Improved silence detection mechanisms might be used to enhance embedding transparency. This silence detection might be expanded to cover analogue silence as well, a step which might further improve the transparency but requires hardware-dependent calibration.
6.6 Summary
In this chapter, the final summary and conclusion of this project report were discussed. The chapter started with an introduction that demonstrates the significance of this study. Then a comprehensive summary of the project was given, from the start until the last objective of the research. Suggested future work and improvements were also presented in this chapter. The objectives of this research project were shown to be met.
REFERENCES
1. International Engineering Consortium. Voice over Internet Protocol. Definition and Overview. http://www.iec.org/online/tutorials/int_tele/. White Paper. Accessed May 5, 2009.
2. Juniper Networks, Inc. (May 2007). Voice over IP 101: Understanding the Basic Networking Functions, Components, and Signaling Protocols in VoIP Networks. CA, US.
3. U.S. Department of Justice. (May 2007). Voice over Internet Protocol. http://www.ncjrs.gov/pdffiles1/nij/217864.pdf. Accessed May 7, 2009.
4. Altera. (September 2000). Implementing Voice over Internet Protocol. Application Note 128. Ver. 1.1.
5. I)ruid. Real-time Steganography with RTP. (September 2007). p. 5-15. http://www.uninformed.org/?v=8&a=3&t=pdf. Accessed May 7, 2009.
6. Social And Ethical Issues In Telephony Voice over IP. http://www.cse.ohiostate.edu/~koppal/601/601Paper-VoIP.pdf. Accessed May 21, 2009.
7. Genwright. A Short History of VoIP. http://www.fsb-media.com/pdf/article2296.pdf. Accessed May 18, 2009.
8. http://www.whichvoip.com/voip/articles/voip_history.htm. Accessed April 22, 2009.
9. Endler, D., Collier, M. Hacking Exposed VoIP: Voice Over IP Security Secrets & Solutions, McGraw-Hill/Osborne, 2007, ISBN: 9780072263640.
10. Liesenborgs, J. Voice over IP in networked virtual environments. Master Thesis. http://research.edm.uhasselt.be/jori/thesis/onlinethesis/chapter5.html. Accessed April 3, 2009.
11. Bsiger, C., Schilt, P. (2000). Polyphemus Project Book. Biel School of Engineering and Architecture Computer Science Department.
12. Rosenberg, et al. RFC 3261, SIP: Session Initiation Protocol. June 2002. http://tools.ietf.org/html/rfc3261. Accessed May 5, 2009.
13. Porter, T., Gough, M. (2007). How to Cheat at VoIP Security, Syngress Publishing, Inc. ISBN 13: 978-1-59749-169-3.
14. Cole, E. Hiding in Plain Sight: Steganography and the Art of Covert Communication, Wiley Publishing, Inc., 2003. ISBN: 0-471-44449-9.
15. Roberto, A. (August 2002). VoIP Howto. V1.7. (C) Roberto Arcomano.
16. Szczypiorski, K., Mazurczyk, W. (2008). Steganography of VoIP Streams. White paper. Warsaw University of Technology, Poland. http://www.springerlink.com/index/060t41w2r608h181.pdf.
17. Kristy, W. Steganography Revealed. (April 2003). http://www.securityfocus.com/infocus/1684. Accessed April 2, 2009.
18. Tian, H. et al. An M-Sequence Based Steganography Model for Voice over IP. White paper. http://ponca.unl.edu/facdb/csefacdb/TechReportArchive/TRUNL-CSE-2008-0007.pdf
19. Pahati, OJ. (2001). Confounding Carnivore: How to Protect Your Online Privacy. AlterNet. Archived from the original on 2007-07-16. Accessed September 2, 2008.
20. Fortini, M. Steganography and Digital Watermarking: A Global View. http://www.lia.deis.unibo.it/Courses/RetiDiCalcolatori/Progetti98/Fortini/history.html.
21. Johnson N., Jajodia S. Exploring Steganography: Seeing the Unseen. George Mason University. February 1998. IEEE.
22. Steganography Analysis and Research Center (SARC). Steganography Application Fingerprint Database (SAFDB). http://www.sarc-wv.com/safdb.aspx. Accessed April 2, 2009.
23. Leyden, J. Hidden messages buried in VoIP chatter: Skype covert channel confounds snoops. The Register. June 2008. http://www.theregister.co.uk/2008/06/03/voip_steganography/
24. Bender, W. et al. Techniques for Data Hiding. IBM Systems Journal, Vol. 35, Nos. 3 & 4, pp. 313-336, 1996.
25. Swanson M. et al. Multimedia Data-Embedding and Watermarking Technologies. Proc. IEEE, Vol. 86, pp. 1064-1087, June 1998.
26. Gopalan, K. and Wenndt, S. Audio Steganography for Covert Data Transmission by Imperceptible Tone Insertion. GopalanKali_422_049.
27. Gopalan, K. et al. Covert Speech Communication Via Cover Speech By Tone Insertion. Proc. of the 2003 IEEE Aerospace Conference, Big Sky, MT, Mar. 2003 (on CD).
28. Thurston, R. Steganography developers turn their attention to hiding information in VoIP. SC Magazine. July 2008.
29. http://www.snotmonkey.com/work/school/405/overview.html. Accessed April 5, 2009.
30. Schulzrinne H. et al. RTP: A Transport Protocol for Real-Time Applications. RFC 1889, 1996.
31. Agaian, S. et al. Two Algorithms In Digital Audio Steganography Using Quantized Frequency Domain Embedding And Reversible Integer Transforms. University of Texas at San Antonio. USA.
32. Gopalan, K. and Wenndt, S. Audio Steganography for Covert Data Transmission by Imperceptible Tone Insertion. US. http://www.calumet.purdue.edu/engr/docs/GopalanKali_422_049.pdf
33. Kraetzer, C. and Dittmann, J. Mel-Cepstrum Based Steganalysis for VoIP-Steganography. Security, Steganography, and Watermarking of Multimedia Contents IX. Bellingham, Washington. ISBN 978-0-8194-6618-1.
34. Telecommunication Standardization Sector, International Telecommunication Union ITU-T. General Aspects of Digital Transmission Systems Terminal Equipments. Pulse Code Modulation of Voice Frequencies. ITU-T Recommendation G.711. ITU 1993.
35. VoipForo - codecs. http://www.voipforo.com/en/codec/codecs.php, 2009. Accessed April 5, 2009.
36. Wang, C. and Wu, Q. Information hiding in real-time VoIP streams, in Proc. 9th IEEE Int. Symposium on Multimedia, pp. 255-262, 10-12 Dec. 2007.
37. Kratzer, C. et al. Design and evaluation of steganography for voice-over-IP, in Proc. of 2006 IEEE Int. Symposium on Circuits and Systems, pp. 2397-2340, 21-24 May 2006.
38. Dittmann, J. et al. Steganography and steganalysis in voice over IP scenarios: operational aspects and first experiences with a new steganalysis tool set. In Proceedings of SPIE, vol. 5681, Security, Steganography, and Watermarking of Multimedia Contents VII, March 2005, pp. 607-618.
39. I)ruid. An Analysis of VoIP Steganography Research Efforts. http://druid.caughq.org/papers/An-Analysis-of-VoIP-SteganographyResearch-Efforts.pdf. September 2007.
40. Szczypiorski, K. and Mazurczyk, W. (2008). Covert Channels in SIP for VoIP Signalling. Warsaw University of Technology, Poland. http://www.springerlink.com/index/q122877q01005027.pdf.
41. Gummesson E. (2000). Qualitative methods in management research. Sage Publications. Thousand Oaks. CA. US.
42. Mazurczyk, W. and Kotulski, Z. New security and control protocol for VoIP based on steganography and digital watermarking. Poland. http://www.ippt.gov.pl/~zkotulsk/IBIZA_2006_en.pdf
43. Lubacz, J. et al. (2008). Hiding Data in VoIP. Warsaw University of Technology, Poland. http://www.asc2008.com/manuscripts/B/BP-13.pdf
44. Mazurczyk, W. and Szczypiorski, K. (2006). New VoIP Traffic Security Scheme with digital Watermarking. Lecture notes in computer science. ISSN: 0302-9743.
45. Cvejic, N. and Seppänen, T. Increasing Robustness of LSB Audio Steganography by Reduced Distortion LSB Coding. Finland. http://www.mediateam.oulu.fi/publications/pdf/618.pdf
46. Salomon, D. (2005). Coding for Data and Computer Communications. Springer Science + Business Media. ISBN: 0-387-21245-0.
47. Yin R. (2003). Case study research: Design and Methods. Third edition. Sage Publications, Inc.
48. Heindl, E. Mobile Network. E-Business Technology. http://webuser.hs-furtwangen.de/~heindl/ebte-09ss/Mobile-network.pdf. Accessed June 9, 2009.
49. Johnson, N. et al. (2003). Information Hiding: Steganography and Digital Watermarking, Third Edition. Kluwer Academic Publisher.
