Ijetae 0413 14

International Journal of Emerging Technology and Advanced Engineering
Website: www.ijetae.com (ISSN 2250-2459, ISO 9001:2008 Certified Journal, Volume 3, Issue 4, April 2013)
77

Source Codec for Multimedia Data Hiding
K. Somasundaram¹, M. Sukumar², K. P. Vigneshkumar³

¹,³M.Tech-Information Technology, SNS College of Engineering, Coimbatore
²B.Tech-Information Technology, University College of Engineering, Tindivanam

Abstract: Covert communication can be achieved through steganography in streaming media, where LSB (Least Significant Bit) based methods are the most popular strategies. This paper presents a steganography algorithm for embedding data in the active and inactive frames of audio streams encoded by an XOR source codec, which is used extensively in the channel. The secret information is then embedded by LSB embedding in the flat regions of the speech. The method divides the stream into two types of frames, represented by 0 and 1 respectively. The active and inactive frames of audio are both well suited for data embedding, and steganography in both attains a larger data embedding capacity with the same imperceptibility. An improved voice activity detection (VAD) algorithm is suggested for distinguishing active voice from inactive voice: for each frame, the VAD function returns a 0/1 indication of whether the frame contains active or inactive voice. Experimental results show that the proposed steganography algorithm not only achieves perfect imperceptibility but also preserves the integrity of hidden messages in the case of packet loss.
Keywords: Adaptive Steganography, Block Steganography, Steganalysis, Covert Communication, VAD
I. INTRODUCTION
Information hiding is the technology of embedding a secret message into an ordinary cover object. The output of such an operation is called the stegano object, which is transmitted through the channel, and the receiver can extract the secret message from it. This technology hides not only the content of the message but also the existence of the transmission. Information hiding techniques have recently become important in a number of application areas. Digital audio, video, and pictures are increasingly furnished with distinguishing but imperceptible marks. Military communication systems make increasing use of traffic security techniques which, rather than merely concealing the content of a message using encryption, seek to conceal its sender, its receiver, or its very existence. Similar techniques are used in some mobile phone systems and in schemes proposed for digital elections.
Criminals try to use whatever traffic security properties are provided, intentionally or otherwise, in the available communications systems, and police forces try to restrict their use. However, many of the techniques proposed in this young and rapidly evolving field can trace their history back to antiquity, and many of them are surprisingly easy to circumvent.
In this article, we try to give an overview of the field: what we know, what works, what does not, and what the interesting topics for research are.
Much research on audio-based information hiding has been reported, most of it on audio files in high bit-rate formats such as WAV and MP3. However, information hiding in low bit-rate audio signals, such as compressed speech in Voice over IP (VoIP), is still an emerging problem. A low bit-rate codec has specific signal processing models as well as specific bit stream definitions and codebooks. These restrictions may be the reasons that have slowed down research on information hiding in low bit-rate speech. In addition, two aspects should be considered. The first is the requirement of real-time communication: additional delay cannot be tolerated, especially if the cover speech is spoken live rather than previously recorded. The speech is usually segmented into frames of about 20 ms, so the message should also be embedded frame by frame. The second is robustness. The Least Significant Bit (LSB) method may be efficient, but it cannot survive low bit-rate compression: if the secret message is embedded in the original signal before the low bit-rate compression, bit errors are likely during extraction. Hence embedding directly in the bit stream of low bit-rate speech is probably a better solution. Embedding in low bit-rate speech is more challenging, however, because the redundancy of the original waveform is eliminated by parameter-model-based coding.
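To make the robustness problem concrete, the following Python sketch (an illustration, not taken from any cited codec) writes message bits into the least significant bits of 16-bit PCM samples and reads them back. The helper `coarse_requantize` is a hypothetical stand-in for lossy low bit-rate coding, not a real codec: it simply rounds every sample to a multiple of `step`, which is enough to show why an LSB payload embedded before compression does not survive.

```python
from typing import List


def embed_lsb(samples: List[int], bits: List[int]) -> List[int]:
    """Overwrite the least significant bit of the first len(bits) samples."""
    if len(bits) > len(samples):
        raise ValueError("message is longer than the cover signal")
    stego = list(samples)
    for i, bit in enumerate(bits):
        stego[i] = (stego[i] & ~1) | bit  # clear the LSB, then set it to the secret bit
    return stego


def extract_lsb(samples: List[int], n_bits: int) -> List[int]:
    """Read back the least significant bit of the first n_bits samples."""
    return [s & 1 for s in samples[:n_bits]]


def coarse_requantize(samples: List[int], step: int = 16) -> List[int]:
    """Hypothetical stand-in for lossy coding: snap each sample to a multiple of step."""
    return [step * round(s / step) for s in samples]


cover = [1000, -1200, 333, 42, 7, -8, 0, 129]
secret = [1, 0, 1, 1, 0, 1, 0, 0]
stego = embed_lsb(cover, secret)
print(extract_lsb(stego, len(secret)))                     # recovers the secret bits
print(extract_lsb(coarse_requantize(stego), len(secret)))  # all zeros: the payload is lost
```

Each sample changes by at most one quantization step, which is why LSB replacement is imperceptible in high bit-rate PCM; the same property makes it fragile once the waveform is requantized.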
Generally speaking, most low bit-rate speech codecs are based on the LPC model, which uses an autoregressive filter to predict the current sample value from past samples. The filter that removes this prediction from the signal is called the LPC analysis filter, and its inverse is called the LPC synthesis filter. The output of the former, that is, the input of the latter, is called the residual or excitation. Both the LPC filter coefficients and the residual are encoded and transmitted. The Analysis-by-Synthesis (ABS) method is also widely used in speech coding; it minimizes the distortion by decoding the encoded signal and choosing the codeword with the least error. Taking advantage of ABS, the distortion caused by embedding in the vector quantization index of the LPC coefficients is adaptively reduced during the encoding of the residual. Our solution employs the Quantization Index Modulation (QIM) method in the LPC (or LPC-derived) domain.
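The analysis/synthesis relationship can be sketched in a few lines of Python. This is a minimal sketch assuming the autocorrelation method with the Levinson-Durbin recursion for the predictor coefficients; real codecs such as G.723.1 add windowing, bandwidth expansion and LSF quantization, none of which is shown. `demo_signal` is a hypothetical two-tone test input, not speech.

```python
import math
from typing import List, Sequence, Tuple


def autocorrelation(x: Sequence[float], order: int) -> List[float]:
    """r[k] = sum_n x[n] * x[n - k] for k = 0..order."""
    n = len(x)
    return [sum(x[i] * x[i - k] for i in range(k, n)) for k in range(order + 1)]


def levinson_durbin(r: Sequence[float], order: int) -> Tuple[List[float], float]:
    """Predictor a_1..a_p with x[n] ~ sum_k a_k x[n-k], plus the final prediction error."""
    a = [0.0] * (order + 1)  # a[0] is unused
    err = r[0]
    for i in range(1, order + 1):
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err  # reflection coefficient
        prev = a[:]
        a[i] = k
        for j in range(1, i):
            a[j] = prev[j] - k * prev[i - j]
        err *= 1.0 - k * k
    return a[1:], err


def analysis_filter(x: Sequence[float], a: Sequence[float]) -> List[float]:
    """LPC analysis filter A(z): residual e[n] = x[n] - sum_k a_k x[n-k]."""
    return [x[n] - sum(a[k - 1] * x[n - k] for k in range(1, len(a) + 1) if n >= k)
            for n in range(len(x))]


def synthesis_filter(e: Sequence[float], a: Sequence[float]) -> List[float]:
    """LPC synthesis filter 1/A(z): y[n] = e[n] + sum_k a_k y[n-k]."""
    y: List[float] = []
    for n in range(len(e)):
        y.append(e[n] + sum(a[k - 1] * y[n - k] for k in range(1, len(a) + 1) if n >= k))
    return y


def demo_signal(n: int = 160) -> List[float]:
    """Hypothetical test input: two sinusoids (160 samples = 20 ms at 8 kHz)."""
    return [math.sin(0.3 * i) + 0.5 * math.sin(0.9 * i) for i in range(n)]


x = demo_signal()
a, _ = levinson_durbin(autocorrelation(x, 10), 10)
e = analysis_filter(x, a)   # residual (excitation)
y = synthesis_filter(e, a)  # feeding the residual through 1/A(z) restores x
print(max(abs(u - v) for u, v in zip(x, y)))
```

The residual carries much less energy than the input, which is why codecs can spend few bits on it; the synthesis filter reconstructs the input exactly (up to floating-point rounding) only if decoder and encoder use identical coefficients.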
One secret bit is embedded in each index. The essence of the QIM method is to divide the whole codebook into two parts and assign a label of 0 or 1 to every codeword. When a secret bit is embedded, only the part of the codebook with the corresponding label is used. On the receiving side, the hidden bit is extracted by checking which part of the codebook the received codeword belongs to. Provided the channel is reliable, the receiver can extract the message directly from the compressed speech stream.
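A minimal Python sketch of QIM embedding and extraction over a labelled codebook follows. The tiny two-dimensional codebook and its 0/1 labels are hypothetical; a real codec would use its own LPC (or LSF) vector quantization codebook, and choosing the labels well is the partition problem described in the next paragraph.

```python
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]


def sq_dist(u: Sequence[float], v: Sequence[float]) -> float:
    return sum((p - q) ** 2 for p, q in zip(u, v))


def qim_embed(vector: Sequence[float], codebook: List[Vector], labels: List[int], bit: int) -> int:
    """Quantize `vector` using only codewords labelled `bit`; return the chosen index."""
    candidates = [i for i, lab in enumerate(labels) if lab == bit]
    return min(candidates, key=lambda i: sq_dist(vector, codebook[i]))


def qim_extract(index: int, labels: List[int]) -> int:
    """The hidden bit is simply the label of the received codeword index."""
    return labels[index]


# Hypothetical 4-entry codebook with a checkerboard labelling.
codebook: List[Vector] = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
labels = [0, 1, 1, 0]

target = (0.2, 0.9)
for bit in (0, 1):
    idx = qim_embed(target, codebook, labels, bit)
    print(bit, idx, qim_extract(idx, labels), sq_dist(target, codebook[idx]))
```

Because only half of the codebook is searched for each bit, the quantization error can grow: here, hiding a 0 forces the codeword (1, 1) instead of the much closer (0, 1). This is the extra distortion that a good partition scheme tries to limit.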
Meanwhile, the embedding algorithm searches only half of the codebook rather than the entire one, so it does not cause additional delay. However, since the number of codewords used in quantization is halved, the distortion increases. To reduce the distortion, one of the most important tasks is to find a good codebook partition scheme. Voice activity detection (VAD) refers to the ability to distinguish speech from noise and is an integral part of a variety of speech communication systems, such as speech coding, speech recognition, hands-free telephony, audio conferencing and echo cancellation. In GSM-based wireless systems, for instance, a VAD module is used for discontinuous transmission to save battery power. Similarly, a VAD device is used in variable bit rate codecs to control the average bit rate and the overall coding quality of speech. In wireless systems based on code division multiple access (CDMA), this scheme is important for enhancing system capacity by minimizing interference.
In early VAD algorithms, short-term energy, zero-crossing
rate and LPC coefficients were among the common
features used for speech detection. Formant shape and
least-square periodicity measure are some of the more
recent metrics used in VAD designs.
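The two classical features named above can be computed per 10 ms frame (80 samples at 8 kHz) as in the Python sketch below. Exact definitions vary between VAD designs; this sketch assumes mean-square energy and a zero-crossing rate normalized by the number of adjacent sample pairs, and it treats a zero sample as non-negative.

```python
from typing import List, Sequence

FRAME_LEN = 80  # 10 ms at 8 kHz


def split_frames(x: Sequence[float], frame_len: int = FRAME_LEN) -> List[Sequence[float]]:
    """Non-overlapping frames; a trailing partial frame is dropped."""
    return [x[i:i + frame_len] for i in range(0, len(x) - frame_len + 1, frame_len)]


def short_term_energy(frame: Sequence[float]) -> float:
    """Mean-square energy of one frame."""
    return sum(s * s for s in frame) / len(frame)


def zero_crossing_rate(frame: Sequence[float]) -> float:
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


signal = [0] * FRAME_LEN + [1000 if n % 2 == 0 else -1000 for n in range(FRAME_LEN)]
for frame in split_frames(signal):
    print(short_term_energy(frame), zero_crossing_rate(frame))
```

A silent frame gives zero energy and zero crossings, while the alternating frame gives high energy and a crossing at every sample pair; real speech falls between these extremes, which is why early detectors combined the two features with thresholds.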
In the ITU-T G.729B VAD, a set of metrics including line spectral frequencies (LSF), low-band energy, zero-crossing rate and full-band energy is used along with heuristically determined regions and boundaries to make a VAD decision for each 10 ms frame. Higher-order statistics (HOS) have shown promising results in a number of signal processing applications and are of particular value when dealing with a mixture of Gaussian and non-Gaussian processes and with system nonlinearity. The application of HOS to speech processing has been primarily motivated by their inherent Gaussian suppression and phase preservation properties. Work in this area has been based on the assumption that speech has certain HOS properties that are distinct from those of Gaussian noise.

While previous work in speech analysis, such as voicing detection, voicing classification or pitch estimation, has attempted to exploit some of the observed features of the HOS of speech signals, little has been done to provide an analytical framework for using these cumulants in speech detection. In one approach, a voiced/unvoiced detector using the bispectrum is developed, based on the observation that unvoiced phonemes are produced by a Gaussian-like excitation and thus result in a small bispectrum, whereas the same is not true for voiced phonemes. In another method, Gaussianity tests based on the bispectrum and the triple correlation are used to discriminate voiced and unvoiced segments; that method exploits the Gaussian blindness of HOS but not the peculiarities of the HOS of voiced speech to better classify the segments. Elsewhere, the normalized skewness and kurtosis of short-term speech segments are used to detect transitional speech events (termed innovations), based on the observation that these two statistics take nonzero values at the boundaries of speech segments, but no analytical grounds are given to support the results. A pitch estimation method based on the periodicity of the diagonal slice of the third-order cumulant has also been described; it yields more reliable pitch estimates than the autocorrelation, but the claim that the third-order cumulant slice has the same periodicity as the underlying speech is not clearly demonstrated. A robust VAD algorithm based on newly established HOS properties of speech has been proposed. The first part of that work establishes the characteristics of the third- and fourth-order cumulants of the LPC residual of speech signals. The flat spectral envelope of this residual results in distinct characteristics for these cumulants in terms of phase, periodicity and harmonic content, and yields closed-form expressions for the skewness and kurtosis. It is shown that, in the case of voiced speech, these cumulants have zero phase, a harmonic nature similar to that of the underlying speech, and harmonic amplitudes that are a function of speech energy.
The expressions for the skewness and kurtosis of voiced speech show that these statistics may be expressed in terms of speech energy and that the normalized metrics have values greater than zero, regardless of the speech magnitude. In addition, experimental results show that while sustained unvoiced speech has zero HOS, this is seldom the case in real-life utterances, since unvoiced segments are short and occur at transitional speech boundaries, resulting in nonzero HOS.
The properties and experimental findings thus established show that the HOS of speech are in general nonzero and sufficiently distinct from those of Gaussian noise to be used as a basis for speech detection. Because these statistics are immune to Gaussian noise, they form a set of robust metrics that are particularly effective in low-SNR conditions.
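The normalized skewness and kurtosis on which such an HOS-based VAD relies can be estimated from a frame (for example, an LPC residual frame) with plain sample moments, as in the Python sketch below. This is only an illustration: it uses the simple biased moment estimators, not the unbiased kurtosis estimator mentioned in the text, and the pulse train and Gaussian noise are hypothetical stand-ins for a voiced excitation and for background noise.

```python
import random
from typing import List, Sequence, Tuple


def central_moments(x: Sequence[float]) -> Tuple[float, float, float]:
    """Second, third and fourth central sample moments."""
    n = len(x)
    mu = sum(x) / n
    m2 = sum((v - mu) ** 2 for v in x) / n
    m3 = sum((v - mu) ** 3 for v in x) / n
    m4 = sum((v - mu) ** 4 for v in x) / n
    return m2, m3, m4


def normalized_skewness(x: Sequence[float]) -> float:
    m2, m3, _ = central_moments(x)
    return m3 / m2 ** 1.5


def normalized_kurtosis(x: Sequence[float]) -> float:
    """Excess kurtosis: 0 for a Gaussian process."""
    m2, _, m4 = central_moments(x)
    return m4 / m2 ** 2 - 3.0


def gaussian_noise(n: int, seed: int = 1) -> List[float]:
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


def pulse_train(n: int, period: int) -> List[float]:
    """Crude voiced-excitation stand-in: unit pulses every `period` samples."""
    return [1.0 if i % period == 0 else 0.0 for i in range(n)]


noise = gaussian_noise(4000)
pulses = pulse_train(320, 40)
print(normalized_skewness(noise), normalized_kurtosis(noise))    # both close to 0
print(normalized_skewness(pulses), normalized_kurtosis(pulses))  # clearly nonzero
```

For the pulse train both statistics are large (skewness about 6, excess kurtosis about 35), while for Gaussian noise they stay near zero, which is the contrast the HOS-based detector exploits.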
The second part of that work builds on the HOS properties of speech thus established and presents a new VAD algorithm that combines HOS metrics with classical second-order measures to classify short frames as speech or noise. A necessary condition for voicing is derived based on the relation between the skewness and kurtosis of voiced speech. Practical issues related to HOS analysis, such as the bias and variance of the estimators, are addressed. Using the assumption that the noise in the LPC residual is white and Gaussian, a new unbiased estimator for the kurtosis is proposed, and the variances of the HOS estimators are derived and expressed in terms of the underlying process variance (i.e., the noise energy). Knowledge of these variances makes it possible to quantify the noise likelihood of a given frame from the values of these two estimates. The algorithm is tested using a variety of noise types and different SNR levels, and its performance is compared to that of the ITU-T G.729B VAD. To quantify performance, the probabilities of correctly classifying speech and noise frames, as well as the probability of false classification, are computed with reference to truth marker files obtained in clean speech conditions. To compute these metrics and generate the noisy speech test cases, we used the material in the TIA database proposed for the evaluation of VAD algorithms.
Eighty test cases were used, each consisting of a different combination of speech normalization level, noise type and SNR. Four SNR levels are used: dB, 18 dB, 12 dB and 6 dB, with the SNR computed as the ratio of the total energy of speech to that of the noise over the entire utterance, according to the procedure. The results show that the proposed algorithm performs better overall than G.729B, with noticeable improvement for Gaussian-like noises, such as street and parking garage noise, and at moderate to low SNR.
Digital steganography in low bit rate audio streams is commonly regarded as a challenging topic in the field of data hiding. Several steganography methods for embedding data in audio streams have been proposed: a G.711-based adaptive speech information hiding approach [14], lossless steganography in G.711-encoded speech [2], and a steganography method for embedding data in G.721-encoded speech [8]. All these methods adopt high bit rate audio streams encoded by waveform codecs as cover objects, in which plenty of least significant bits exist. However, VoIP speech is usually transmitted as low bit rate audio streams encoded by source codecs such as ITU-T G.723.1 to save network bandwidth. Low bit rate audio streams are less likely to be used as cover objects for steganography, since they have fewer least significant bits than high bit rate audio streams, and little effort has been made to develop algorithms for embedding data in them. Information has been embedded in G.729 and MELP audio streams, and a steganography algorithm for embedding information in low bit rate audio streams has been proposed. However, these steganography algorithms have constraints on the data embedding capacity; that is, their data embedding rates are too low for practical applications. Thus the main focus of this study is to work out how to increase the data embedding capacity of steganography in low bit rate audio streams. Related work has discussed the possibility of embedding data in the inactive frames of low bit rate audio streams and has analyzed the imperceptibility of a steganography algorithm for embedding data in inactive audio frames.
II. LITERATURE REVIEW
ITU-T Recommendation G.723.1 Annex A [1]: Dual rate speech coder for multimedia communications transmitting at 5.3 and 6.3 kbit/s. G.723.1 specifies a coded representation that can be used for compressing the speech or other audio signal component of multimedia services at a very low bit rate. In the design of this coder, the principal application considered was very low bit rate visual telephony as part of the overall H.324 family of standards. G.723.1 has two bit rates associated with it: 5.3 and 6.3 kbit/s. The higher bit rate gives greater quality; the lower bit rate gives good quality and provides system designers with additional flexibility. Both rates are a mandatory part of the encoder and decoder, and it is possible to switch between the two rates at any 30 ms frame boundary. An option for variable rate operation using discontinuous transmission and noise fill during non-speech intervals is also possible.
N. Aoki [2] Steganography may be employed for secretly transmitting side information in order to improve the performance of signal processing such as packet loss concealment and band extension of telephony speech. Previous studies employ the LSB replacement technique for embedding steganogram information into speech data. Instead of such a lossy steganography technique, this study investigated a lossless steganography technique for G.711, the most common codec for telephony systems such as VoIP. The proposed technique exploits the redundancy of G.711 to embed steganogram information into speech data without degradation. The paper also investigates the possibility of a semi-lossless steganography technique for increasing the capacity of the lossless technique.
C. Bao, et al [4] Based on an analysis of the redundancy of coded parameters in G.723.1, a novel approach to detecting hidden information is proposed. By using the statistical value of increased entropy, this scheme can not only detect hidden messages embedded in compressed speech but also estimate the embedded message length accurately. The experimental results show that the proposed scheme is effective.
M. U. Celik, et al [5] A novel lossless (reversible) data-embedding technique is presented, which enables exact recovery of the original host signal upon extraction of the embedded information. A generalization of the well-known least significant bit (LSB) modification is proposed as the data-embedding method, which introduces additional operating points on the capacity-distortion curve. Lossless recovery of the original is achieved by compressing portions of the signal that are susceptible to embedding distortion and transmitting these compressed descriptions as part of the embedded payload. A prediction-based conditional entropy coder, which utilizes unaltered portions of the host signal as side information, improves the compression efficiency and thus the lossless data-embedding capacity.
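The generalized LSB idea can be sketched as follows, assuming the usual formulation in which an L-level quantizer Q_L(s) = L·floor(s/L) clears room for one base-L symbol w per sample, giving s_w = Q_L(s) + w. Range (overflow) handling and the entropy coding of the residuals, which are essential parts of the cited scheme, are omitted; the sketch only shows that keeping the residual s - Q_L(s) makes the host exactly recoverable.

```python
from typing import List, Sequence


def quantize(s: int, L: int) -> int:
    """Q_L(s) = L * floor(s / L); Python's // already floors toward minus infinity."""
    return L * (s // L)


def glsb_embed(samples: Sequence[int], symbols: Sequence[int], L: int) -> List[int]:
    """Write one base-L symbol into each sample: s_w = Q_L(s) + w."""
    if len(symbols) != len(samples):
        raise ValueError("need exactly one symbol per sample")
    if any(not 0 <= w < L for w in symbols):
        raise ValueError("symbols must lie in 0..L-1")
    return [quantize(s, L) + w for s, w in zip(samples, symbols)]


def glsb_extract(stego: Sequence[int], L: int) -> List[int]:
    return [s % L for s in stego]


def glsb_residual(samples: Sequence[int], L: int) -> List[int]:
    """The part overwritten by embedding; the cited scheme compresses this into the payload."""
    return [s - quantize(s, L) for s in samples]


def glsb_restore(stego: Sequence[int], residual: Sequence[int], L: int) -> List[int]:
    return [quantize(s, L) + r for s, r in zip(stego, residual)]


host = [10, 11, 200, 0, -7]
payload = [2, 0, 1, 2, 1]  # base-3 symbols, about log2(3) = 1.58 bits per sample
stego = glsb_embed(host, payload, 3)
print(stego, glsb_extract(stego, 3))
print(glsb_restore(stego, glsb_residual(host, 3), 3) == host)
```

With L = 2 this reduces to ordinary LSB replacement; other values of L give the additional capacity-distortion operating points mentioned above.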
L. Ma, Z. Wu and W. Yang [8] proposed an approach for speech information hiding based on the G.721 scheme. Dynamic secret speech information data bits can be embedded into original carrier speech data with high steganographic efficiency and good output speech quality. The method is superior to available classical algorithms in hiding capacity and robustness. The paper implements the approach on the G.721 speech coding scheme, and the experiments show that it meets the requirements of information hiding, satisfies the speech quality constraints of secure communication, and achieves a high hiding capacity of 1.6 kbps with excellent speech quality while complicating speaker recognition.
F. A. P. Petitcolas, et al [9] Information-hiding techniques have recently become important in a number of application areas. Digital audio, video, and pictures are increasingly furnished with distinguishing but imperceptible marks. Military communications systems make increasing use of traffic security techniques which, rather than merely concealing the content of a message using encryption, seek to conceal its sender, its receiver, or its very existence. Similar techniques are used in some mobile phone systems and in schemes proposed for digital elections. Criminals try to use whatever traffic security properties are provided, intentionally or otherwise, in the available communications systems, and police forces try to restrict their use.
Z. Wu and W. Yang [14] suggested a G.711-based adaptive LSB (Least Significant Bit) algorithm that embeds dynamic secret speech information data bits into public G.711-PCM (Pulse Code Modulation) speech for secure communication, according to the energy distribution, with high steganographic efficiency and good output speech quality. It is superior to the classical LSB algorithm. It embeds up to 20 kbps of secret speech data into G.711 speech at an average embedding error rate of 10^-5. It meets the requirements of information hiding and satisfies the speech quality constraints of secure communication, with excellent speech quality while complicating speaker recognition.
B. Xiao, et al [15] proposed a QIM-based method for information hiding in instant low bit-rate speech streams. The QIM method divides the codebook into two parts, representing 0 and 1 respectively. Instead of partitioning the codebook randomly, the relationship between codewords is considered. The proposed algorithm, Complementary Neighbor Vertices (CNV), guarantees that every codeword is in the opposite part to its nearest neighbor, and the distortion is limited by a bound. The feasibility of CNV is proved with graph theory. Moreover, the secret message is embedded in the vector quantization indices of the LPC coefficients, with the benefit that the distortion due to QIM is adaptively lightened by the rest of the encoding procedure. Experiments on iLBC and G.723.1 verify the effectiveness of the method: both objective and subjective assessments show that it decreases speech quality only slightly, to an indistinguishable degree, and the hiding capacity is no less than 100 bps. The authors state that, to the best of their knowledge, this is the first work adopting graph theory to improve the codebook partition when using QIM in low bit-rate streaming media.
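The CNV property can be stated as a simple check: for every codeword, all of its nearest neighbors must carry the opposite label. The Python sketch below only verifies that property for a given labelling; it is not the CNV partition-construction algorithm of Xiao et al., and the example codebooks are hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]


def sq_dist(u: Sequence[float], v: Sequence[float]) -> float:
    return sum((p - q) ** 2 for p, q in zip(u, v))


def nearest_neighbors(codebook: List[Vector], i: int, tol: float = 1e-12) -> List[int]:
    """Indices of all codewords (other than i) at the minimum distance from codeword i."""
    dists = [(sq_dist(codebook[i], c), j) for j, c in enumerate(codebook) if j != i]
    best = min(d for d, _ in dists)
    return [j for d, j in dists if d <= best + tol]


def satisfies_cnv(codebook: List[Vector], labels: List[int]) -> bool:
    """True if every codeword's nearest neighbors all lie in the opposite part."""
    return all(labels[j] != labels[i]
               for i in range(len(codebook))
               for j in nearest_neighbors(codebook, i))


line: List[Vector] = [(float(v),) for v in range(6)]  # hypothetical 1-D codebook 0, 1, ..., 5
print(satisfies_cnv(line, [0, 1, 0, 1, 0, 1]))  # alternating labels
print(satisfies_cnv(line, [0, 0, 1, 1, 0, 0]))  # codewords 0 and 1 share a label
```

When the property holds and the best unconstrained codeword c for an input v has the wrong label, its nearest neighbor has the required label, so by the triangle inequality the forced codeword lies no farther from v than d(v, c) + d(c, nn(c)). This is one way to see why a neighbor-complementary partition bounds the extra distortion.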
III. PROPOSED SYSTEM
The PCM codec is based on the waveform model, which samples, quantizes, and encodes audio signals directly; each sample value represents the original volume of the signal. In PCM streams, inactive voice cannot be used to embed information, since doing so leads to obvious distortion. A low bit rate source codec, by contrast, compresses the speech at a very low bit rate and operates frame by frame: each frame is encoded into various parameters rather than sample volumes. Thus the volume of the speech changes only imperceptibly even though the inactive audio frames contain hidden information. The theoretical analysis above suggests that steganography in the inactive frames of low bit rate audio streams would attain a larger data embedding capacity if an appropriate steganography algorithm were used. The two types of frames are then handled by their respective steganography algorithms to embed the secret information. The low bit rate stream with hidden information is called stegano speech, which is transmitted using VoIP. The stegano speech is then decoded, and the extraction of the secret information from the stegano speech is the inverse process of the embedding algorithm. Finally, the receiver obtains the secret information as well as the PCM-formatted audio stream.
The proposed steganography algorithm not only achieves perfect imperceptibility but also achieves a high data embedding capacity, using an XOR operation at 8 kb/s. The data embedding capacity of the proposed algorithm is much larger than those of previously suggested algorithms.
A. VAD Algorithm
The input speech and noise data are read from WAV files with a sampling rate of 8 kHz. The speech data are divided into frames of 80 samples (10 ms). For each frame, the VAD function (supplied) returns a 0/1 indication of whether the frame is active or inactive voice: if the frame energy is below the threshold (Enr < Thresh), the frame is called inactive voice, and if Enr >= Thresh, it is called active voice. For an active speech frame, the samples are converted to μ-law codes (8 bits per sample). For the first inactive speech frame, a silence descriptor (SID) frame is transmitted, containing at most 11 parameters. Subsequent inactive speech frames send only a placeholder to indicate that no information follows. The information for each frame is written to a file:
• Active frames contain a flag (a byte with value 1), followed by 80 bytes of speech data.
• SID frames contain a flag (a byte with value 2), followed by 11 floating-point values.
• Silence frames contain a flag (a byte with value 0) and nothing else.
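The transmitter side of this frame format can be sketched in Python as follows. Several details are assumptions not fixed by the text: Enr is taken as the mean-square frame energy, the threshold value is left to the caller, μ-law coding follows the widely used 16-bit-input G.711 formulation (bias 0x84, clip 32635), the 11 SID values are packed as little-endian 32-bit floats, and `example_sid_params` is a hypothetical placeholder because the text does not say which 11 parameters a SID frame carries. A new SID frame is sent at the start of every inactive run, and a trailing partial frame is ignored.

```python
import math
import struct
from typing import Callable, List, Sequence

FRAME_LEN = 80                                # 10 ms at 8 kHz
FLAG_SILENCE, FLAG_ACTIVE, FLAG_SID = 0, 1, 2
MULAW_BIAS, MULAW_CLIP = 0x84, 32635


def mulaw_encode(sample: int) -> int:
    """G.711 mu-law: one 16-bit linear sample -> one 8-bit code."""
    sign = 0x80 if sample < 0 else 0
    magnitude = min(abs(sample), MULAW_CLIP) + MULAW_BIAS
    exponent = max(0, magnitude.bit_length() - 8)       # segment number 0..7
    mantissa = (magnitude >> (exponent + 3)) & 0x0F     # 4 bits within the segment
    return ~(sign | (exponent << 4) | mantissa) & 0xFF  # mu-law codes are bit-inverted


def frame_energy(frame: Sequence[int]) -> float:
    return sum(s * s for s in frame) / len(frame)


def vad(frame: Sequence[int], thresh: float) -> int:
    """1 = active voice (Enr >= Thresh), 0 = inactive voice (Enr < Thresh)."""
    return 1 if frame_energy(frame) >= thresh else 0


def example_sid_params(frame: Sequence[int]) -> List[float]:
    """Hypothetical SID content: frame RMS followed by ten unused zeros."""
    return [math.sqrt(frame_energy(frame))] + [0.0] * 10


def encode_stream(samples: Sequence[int], thresh: float,
                  sid_params: Callable[[Sequence[int]], List[float]] = example_sid_params) -> bytes:
    out = bytearray()
    in_silence = False
    for start in range(0, len(samples) - FRAME_LEN + 1, FRAME_LEN):
        frame = samples[start:start + FRAME_LEN]
        if vad(frame, thresh):
            out.append(FLAG_ACTIVE)
            out.extend(mulaw_encode(s) for s in frame)            # 80 bytes of speech data
            in_silence = False
        elif not in_silence:
            out.append(FLAG_SID)
            out.extend(struct.pack("<11f", *sid_params(frame)))  # 11 floats = 44 bytes
            in_silence = True
        else:
            out.append(FLAG_SILENCE)                             # placeholder only
    return bytes(out)


pcm = [0] * 80 + [1000] * 80 + [0] * 160
stream = encode_stream(pcm, thresh=1000.0)
print(len(stream), stream[0], stream[45], stream[46])
```

For these four frames (inactive, active, inactive, inactive) the file holds 45 + 81 + 45 + 1 = 172 bytes instead of 640 bytes of 16-bit PCM, which illustrates the size reduction described below.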
The amount of data is reduced because of the conversion from 16-bit samples to 8-bit codes and because of the "silence compression" afforded by DTX (discontinuous transmission). The transmitter must generate the file; the receiver has only the file available and must generate speech samples from the data in it. Even in this simple setup, the intermediate file will be significantly smaller than the original file because of the μ-law coding. The receiver reads the first byte of each frame from the data file and, based on the flag value, operates as follows.

For active frames, the speech data are decoded and converted to 80 speech samples. For SID frames, 80 comfort noise samples are generated based on the information in the 11 parameters contained in the SID frame. For silence frames, 80 comfort noise samples are generated based on the information received in SID frames.
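A matching receiver can be sketched in Python for the frame format described above (flags 1/2/0, 80 μ-law bytes per active frame, 11 SID values assumed to be little-endian 32-bit floats). How comfort noise is derived from the SID parameters is not specified in the text; here it is a hypothetical choice that treats the first SID value as the RMS level of white Gaussian noise, and silence frames reuse the most recent SID frame.

```python
import random
import struct
from typing import List, Optional, Sequence

FRAME_LEN = 80
FLAG_SILENCE, FLAG_ACTIVE, FLAG_SID = 0, 1, 2
MULAW_BIAS = 0x84
SID_FORMAT = "<11f"
SID_SIZE = struct.calcsize(SID_FORMAT)  # 44 bytes


def mulaw_decode(code: int) -> int:
    """G.711 mu-law: one 8-bit code -> one 16-bit linear sample."""
    code = ~code & 0xFF                  # undo the bit inversion
    sign = code & 0x80
    exponent = (code >> 4) & 0x07
    mantissa = code & 0x0F
    magnitude = (((mantissa << 3) + MULAW_BIAS) << exponent) - MULAW_BIAS
    return -magnitude if sign else magnitude


def comfort_noise(sid: Sequence[float], rng: random.Random) -> List[int]:
    """Hypothetical: treat sid[0] as the RMS level of white Gaussian noise."""
    return [round(rng.gauss(0.0, sid[0])) for _ in range(FRAME_LEN)]


def decode_stream(data: bytes, seed: int = 0) -> List[int]:
    rng = random.Random(seed)
    out: List[int] = []
    last_sid: Optional[Sequence[float]] = None
    pos = 0
    while pos < len(data):
        flag = data[pos]
        pos += 1
        if flag == FLAG_ACTIVE:
            out.extend(mulaw_decode(c) for c in data[pos:pos + FRAME_LEN])
            pos += FRAME_LEN
        elif flag == FLAG_SID:
            last_sid = struct.unpack_from(SID_FORMAT, data, pos)
            pos += SID_SIZE
            out.extend(comfort_noise(last_sid, rng))
        elif flag == FLAG_SILENCE:
            # Before any SID frame has arrived, fall back to digital silence.
            out.extend(comfort_noise(last_sid, rng) if last_sid is not None else [0] * FRAME_LEN)
        else:
            raise ValueError(f"unknown frame flag {flag} at byte {pos - 1}")
    return out
```

Active frames come back as μ-law-quantized samples (for example, the code 0xCE decodes to 988), while inactive frames are replaced by synthetic noise, so the decoded inactive segments are not the original samples.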
B. Data Embedding
The audio signal and the secret message are then encoded uniformly by XOR into a low bit rate stream, and the output forms the encrypted message. The low bit rate stream contains inactive and active frames, and the two types of frames are handled by their respective steganography algorithms to embed the secret information. The low bit rate stream with hidden information is called stegano speech, which is transmitted over the channel. The encrypted message is then hidden again in the original audio, generating a new audio stream. This provides the security.
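The data embedding step is described only loosely, so the following Python sketch is one possible reading rather than the authors' implementation. The secret message is XORed with a repeating shared key (a hypothetical `key`; repeating-key XOR is weak and stands in only for the XOR step named in the text), and the resulting bits are written, most significant bit first, into the least significant bits of 8-bit audio codes. With one bit per code at 8000 codes per second, this layout would carry 8000 bit/s, which is one way to arrive at the 8 kb/s figure quoted for the proposed system.

```python
from typing import List, Sequence


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR data with a repeating key; applying it twice with the same key undoes it."""
    if not key:
        raise ValueError("key must not be empty")
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


def to_bits(data: bytes) -> List[int]:
    """Most significant bit of each byte first."""
    return [(byte >> (7 - i)) & 1 for byte in data for i in range(8)]


def embed_message(codes: Sequence[int], message: bytes, key: bytes) -> List[int]:
    """Encrypt `message` with XOR and hide its bits in the LSBs of 8-bit `codes`."""
    bits = to_bits(xor_bytes(message, key))
    if len(bits) > len(codes):
        raise ValueError("cover stream too short for the message")
    stego = list(codes)
    for i, bit in enumerate(bits):
        stego[i] = (stego[i] & 0xFE) | bit  # replace only the least significant bit
    return stego


cover_codes = [0xFF] * 16
stego_codes = embed_message(cover_codes, b"Hi", key=b"k")
print([c & 1 for c in stego_codes])
```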
C. Data Extracting
The stegano speech is then decoded, and the extraction of the secret information from the stegano speech is the inverse process of the embedding algorithm. The new audio is transmitted to the receiver through the channel. The receiver extracts the audio signal and the encrypted message, and then applies the inverse XOR (De-XOR) to the encrypted message and the audio signal. Finally, the receiver obtains the secret information as well as the PCM-formatted audio stream.
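The inverse steps (read the LSBs, regroup them into bytes, De-XOR with the same key) can be sketched in Python under the same kind of assumptions: a repeating shared key, one hidden bit per 8-bit code, most significant bit first. The receiver is assumed to know the message length `n_bytes` in advance, which the text does not specify; in practice it might be sent in a header. This is an illustration, not the authors' implementation.

```python
from typing import Sequence


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR data with a repeating key (the same operation encrypts and decrypts)."""
    if not key:
        raise ValueError("key must not be empty")
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


def from_bits(bits: Sequence[int]) -> bytes:
    """Pack bits (most significant bit first) back into bytes."""
    if len(bits) % 8:
        raise ValueError("bit count must be a multiple of 8")
    out = bytearray()
    for i in range(0, len(bits), 8):
        byte = 0
        for bit in bits[i:i + 8]:
            byte = (byte << 1) | bit
        out.append(byte)
    return bytes(out)


def extract_message(stego_codes: Sequence[int], n_bytes: int, key: bytes) -> bytes:
    """Read 8 * n_bytes LSBs, rebuild the encrypted bytes, then De-XOR them."""
    if 8 * n_bytes > len(stego_codes):
        raise ValueError("stream too short for the requested message length")
    bits = [c & 1 for c in stego_codes[:8 * n_bytes]]
    return xor_bytes(from_bits(bits), key)


# LSBs carrying the XOR-encrypted bytes 0x23 0x02 (b"Hi" under the key b"k").
received = [0xFE | b for b in [0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0]]
print(extract_message(received, 2, key=b"k"))
```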
IV. CONCLUSION
A high-capacity steganography algorithm has been presented for embedding data in the active and inactive frames of low bit rate audio streams encoded by an XOR-operation source codec. VAD algorithms are used to separate the active frames from the inactive frames of the audio. The data are hidden in both active and inactive frames, and the encrypted messages are hidden again in the same audio. The experimental results show that the proposed steganography algorithm achieves a larger data embedding capacity with imperceptible distortion of the original speech, compared with three other algorithms. We have also demonstrated that the proposed steganography algorithm is more suitable for embedding data in inactive audio frames than in active audio frames. However, before the proposed algorithm comes into practical use in covert VoIP communications, the integrity of the hidden message must be preserved, which requires that no packets be lost. VoIP streams are more suitable for data embedding than the active and inactive frames of streams. In future work, we plan to make this method more efficient by achieving zero packet loss and higher data security.
REFERENCES
[1] ITU-T (2009) Dual rate speech coder for multimedia communications transmitting at 5.3 and 6.3 kbit/s, ITU-T Recommendation G.723.1, Annex A [Online]. Available: http://www.itu.int/net/itut/sigdb/speaudio/AudioForms.aspx?val=11172.
[2] Aoki, N. (2008) A technique of lossless steganography for G.711 telephony speech, in Proc. 4th Int. Conf. Intelligent Inf. Hiding Multimedia Signal Process. (IIH-MSP), Harbin, Aug. 2008, pp. 608-611.
[3] Bai, L. Y. and Xiao, B. (2008) Covert channels based on jitter field of the RTCP header, in Proc. IEEE Int. Conf. Intelligent Inf. Hiding Multimedia Signal Process., pp. 1388-1391.
[4] Bao, C. and Zhu, C. (2006) Steganalysis of compressed speech, in Proc. IMACS Multiconf. Computational Eng. Syst. Applicat. (CESA), pp. 5-10.
[5] Celik, M. U., Sharma, G. and Saber, E. (2005) Lossless generalized-LSB data embedding, IEEE Trans. Image Process., vol. 14, no. 2, pp. 253-266.
[6] Chen, B. and Wornell, G. W. (2001) Quantization index modulation: a class of provably good methods for digital watermarking and information embedding, IEEE Trans. Inf. Theory, vol. 47, no. 4, pp. 1423-1443.
[7] Kitawaki, N., Nagabuchi, H. and Itoh, K. (1988) Objective quality evaluation for low-bit-rate speech coding systems, IEEE J. Sel. Areas Commun., vol. 6, no. 2, pp. 242-248.
[8] Ma, L., Wu, Z. and Yang, W. (2007) Approach to hide secret speech information in G.721 scheme, Lecture Notes Comput. Sci., vol. 4681, pp. 1315-1324.
[9] Petitcolas, F. A. P. and Kuhn, M. G. (1999) Information hiding: a survey, Proc. IEEE, vol. 87, no. 7, pp. 1062-1078.
[10] Sallee, P. (2004) Model-based steganography, IWDW 2003, LNCS 2939, pp. 154-167.
[11] Quatieri, T. F. (2002) Discrete-Time Speech Signal Processing: Principles and Practice, Prentice Hall PTR.
[12] Tian, H., Zhou, K., Feng, D. and Liu, J. (2008) A covert communication model based on least significant bits steganography in voice over IP, in Proc. 9th Int. Conf. for Young Comput. Scientists, pp. 647-652.
[13] Wang, C. and Wu, Q. (2007) Information hiding in real-time VoIP streams, in Proc. Ninth IEEE Int. Symp. Multimedia, pp. 255-262.
[14] Wu, Z. and Yang, W. (2006) G.711-based adaptive speech information hiding approach, Lecture Notes Comput. Sci., vol. 4113, pp. 1139-1144.
[15] Xiao, B., Huang, Y. F. and Tang, S. (2008) An approach to information hiding in low bit rate speech stream, in Proc. IEEE GLOBECOM, pp. 371-375, IEEE Press.
