
TWO ALGORITHMS IN DIGITAL AUDIO STEGANOGRAPHY USING QUANTIZED FREQUENCY DOMAIN EMBEDDING AND REVERSIBLE INTEGER TRANSFORMS
Sos S. Agaian¹, David Akopian¹ and Sunil A. DSouza¹
¹Non-linear Signal Processing Lab, University of Texas at San Antonio,
6900 North Loop 1604 West, San Antonio, Texas 78249, USA.
sagaian@utsa.edu, dakopian@utsa.edu, sdsouza@lonestar.utsa.edu


ABSTRACT
Steganography is the art of hiding secret messages within another innocuous message or carrier. Steganography in digital audio has received considerable interest, and in this paper we present two algorithms for secure digital audio steganography. The first algorithm uses classical unitary transforms with quantization in the transform domain: the secure data is embedded in the transform domain coefficients. The second algorithm uses a reversible integer transform to obtain the transform domain coefficients; in the integer domain we examine the binary representation of each integer coefficient and embed the secure information as an extra bit. We also introduce a capacity measure to select audio carriers that will introduce minimum distortion after embedding.

Keywords: reversible, steganography, data hiding


1. INTRODUCTION
In the last few years the infrastructure for distributing digital media has grown rapidly. The distribution of such digital media provides an excellent opportunity for the transmission of hidden information. Recently, digital steganography has received considerable interest; it is used to hide information in carriers such as digital audio, images or video so that only the intended recipient is capable of retrieving the hidden information.
Typically, steganography algorithms can be broadly classified into two classes: 1) spread spectrum based approaches, in which a pseudo-noise sequence is added to the carrier, and 2) quantization based approaches, in which the carrier is quantized and some of the quantized data is replaced with steganographic information [2]. Each of these classes includes techniques that can be applied in the time and transform domains. In [1], Cox et al. require the embedded information to be constructed as an independent and identically distributed (i.i.d.) Gaussian random vector that is inserted in spread spectrum fashion into the carrier. A common example of the second class is manipulation of the least significant bit to represent the embedded information.
Steganography in digital audio has attracted considerable research interest, and many techniques [3-5] have been proposed based on the characteristics of digital audio signals and the human auditory system (HAS). The HAS effects relevant to steganography are temporal masking and frequency masking. In temporal masking, a weaker audible signal on either side (pre and post) of a strong masker becomes imperceptible. Similarly, in frequency masking, if two signals occurring simultaneously are close together in frequency, the stronger masking signal may make the weaker signal inaudible [5].
Wang et al. [3] propose an audio watermarking algorithm based on HAS principles. The procedure involves selecting an audio clip immediately after a loud sound. The clip is transformed to the frequency domain and spectral components adjacent to high peaks are selected. A combination of a pn-sequence and secure data is embedded in a frequency band. Detection exploits the autocorrelation properties of pn-sequences, and high embedding capacity is achieved with QAM modulation techniques. In [10], Tian proposes an integer wavelet transform based watermarking algorithm. The algorithm uses the integer wavelet transform to obtain integer coefficients in the transform domain; the binary representation of each coefficient is examined and an additional bit representing secure information is added, thereby expanding the coefficient. A location map that identifies the changed pixels is also embedded. Tilki and Beex [4] propose an algorithm for encoding a 35-bit digital signature onto the audio component of a television signal. The digital signature is encoded using 167 sinusoids in the 2.4 to 6.4 kHz range, specifically chosen because human hearing sensitivity there declines compared to its peak at 1 kHz. The 167 frequencies are chosen to correspond to the bin frequencies of the 4096-point FFT of the original audio segment. The digital signature is then added to the audio component, and it is detected by comparing the magnitudes of adjacent FFT bins to a threshold and making a decision. Swanson et al. [5] propose a robust audio watermarking algorithm in which the power spectrum of an audio block is calculated and tonal components below the absolute hearing threshold are removed. A frequency-masking threshold for each block is calculated and used to weight a noise-like watermark, and a temporal mask is used to further shape the time domain representation of the watermark. Adding the two signals creates the watermarked segment of audio. Detection requires the original signal and is accomplished by hypothesis testing.
In this paper, we present two algorithms for secure digital audio steganography. Neither algorithm requires the original signal or information about the secure content for detection; only a few parameters are necessary for detection and extraction of the steganographic data. Both algorithms add a short pseudo-noise sequence to the carrier for detection purposes, and both frameworks incorporate characteristics of the human auditory system: the frequency domain based algorithm exploits temporal and frequency-masking characteristics, while the integer transform based algorithm exploits temporal characteristics. We also introduce a simple capacity measure, based on the capacity measure formula developed in [7], for selecting the cover audio clip that suffers the least distortion after the embedding process. Experimental results show that the changes in the embedded audio section are imperceptible.
The remainder of the paper is organized as follows: Section 2 introduces the block diagrams and the encoding and decoding steps of the proposed algorithms, Section 3 presents the simulation results, and Section 4 provides the conclusion.


2. PROPOSED ALGORITHMS
In this section, we present two algorithms for secure digital audio steganography. The first algorithm, the Quantized-frequency Secure Audio Steganography (QSAS) algorithm, is based on classical unitary transforms with quantization in the transform domain and extends the watermarking work presented in [3]. The difference is that our algorithm is developed for steganography, where higher embedding capacity is relatively more important than the robustness requirements of watermarking. We also present a simple capacity measure to select the audio clip with the best embedding capacity.
In the QSAS algorithm we select the Fourier transform, since we are dealing with audio signals and its properties in the frequency domain are well known. The DFT of an N-point discrete-time signal x(n) is defined by
$$X(k) = \sum_{n=0}^{N-1} x(n)\,W_N^{kn}, \qquad k = 0, 1, \ldots, N-1 \qquad (1)$$
where $W_N = e^{-j2\pi/N}$. Similarly, the IDFT is given by
$$x(n) = \frac{1}{N}\sum_{k=0}^{N-1} X(k)\,W_N^{-kn}, \qquad n = 0, 1, \ldots, N-1 \qquad (2)$$
The coefficients appearing in the FFT and IFFT structures are complex numbers with magnitude one, and the corresponding forward and inverse factors are complex conjugates of each other. The bin frequencies for an N-point FFT are multiples of fs/N, where fs is the sampling frequency. Among the N bins only N/4 bins are modified, and the embedded information includes the pseudo-random sequence and the secure data.
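As a small illustration of how the embedding band can be located, the sketch below (Python/NumPy) computes the one-sided spectrum of a block and the indices of the bins whose frequencies k·fs/N fall inside the band. The 512-sample block size, 44.1 kHz sampling rate and the 5.5-11 kHz band limits are taken from Section 3; the function name and interface are our own.

```python
import numpy as np

def embedding_band(block, fs=44100.0, P=5500.0, Q=11000.0):
    """Return the one-sided spectrum of `block` and the indices of the
    FFT bins whose frequency k*fs/N lies inside the band [P, Q] Hz."""
    N = len(block)
    X = np.fft.rfft(block)                  # one-sided spectrum, N/2 + 1 bins
    freqs = np.arange(len(X)) * fs / N      # bin k sits at k*fs/N Hz
    band = np.where((freqs >= P) & (freqs <= Q))[0]
    return X, band

# Example: a 512-sample block at 44.1 kHz has a bin spacing of about 86 Hz,
# so the 5.5-11 kHz band covers roughly 64 of the positive-frequency bins.
X, band = embedding_band(np.random.randn(512))
```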
Our goal in the development of the second algorithm was a simple and robust integer based reversible audio steganography algorithm. The algorithm is a variation of the watermarking algorithm presented in [10]. The key differences are as follows:
1. The algorithm in [10] was implemented for watermarking images, while our algorithm is developed for audio steganography, in which higher embedding capacity is relatively more important than the robustness requirements of watermarking.
2. Unlike images, audio samples are not represented by integer values, and preprocessing of the audio signal plays a significant role.
3. We use a pseudo-random sequence to detect the location of the secure data and do not require a location map to identify the samples that have been modified by the algorithm.
4. Our algorithm uses the temporal characteristics of HAS to select the locations at which we embed the secure data.
We term this algorithm the Integer Transform based Secure Audio Steganography (ITSAS) algorithm.
The output of an analog-to-digital converter (ADC) is a quantized Q-bit integer value that typically represents the range of the input analog signal. For example, for audio files in the Microsoft .wav format this range is mapped to (-1, 1). To process these signals with integer transforms, each value must be converted to integer format.
Computers typically process real numbers using a floating-point representation, and converting from floating point to integer values is often slow. An alternative is the fixed-point format, in which a Q-bit number is represented as M:N, where M = Q is the total number of bits that determine the range and N is the number of bits that determine the precision. In some articles the fixed-point format is referred to as the Q-format and is written as Q-15, Q-30, etc., implying that the precision (fractional part) is represented by 15 or 30 bits, respectively. A number represented in fixed-point format can be converted to an integer by multiplying it by 2^N; the original Q-format value can be obtained by reversing the process.
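A minimal sketch of this conversion, assuming normalized samples in (-1, 1) and a Q-15 fractional part; the rounding step is our implementation choice (truncation would also work), and the function names are illustrative:

```python
import numpy as np

def to_fixed_point_int(samples, frac_bits=15):
    """Scale Q-format samples by 2**N and round to obtain integers."""
    return np.round(np.asarray(samples, dtype=float) * (1 << frac_bits)).astype(np.int64)

def from_fixed_point_int(int_samples, frac_bits=15):
    """Reverse the process: divide by 2**N to recover the Q-format values."""
    return np.asarray(int_samples, dtype=float) / (1 << frac_bits)

x = np.array([-0.5, 0.12345, 0.99996])
xi = to_fixed_point_int(x)         # e.g. [-16384, 4045, 32767]
x_back = from_fixed_point_int(xi)  # equals x up to the Q-15 quantization step
```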
The reversible integer transform [10] for an adjacent sample pair (x, y) is given as
$$l = \left\lfloor \frac{x+y}{2} \right\rfloor, \qquad h = x - y \qquad (3)$$
where $\lfloor \cdot \rfloor$ is the floor function, i.e., the greatest integer less than or equal to its argument. The l coefficient is simply the average of two adjacent samples, while h is their difference. Note that for audio signals the difference coefficient is typically very small compared to the average coefficient. The inverse transform of (3) is given as
$$x = l + \left\lfloor \frac{h+1}{2} \right\rfloor, \qquad y = l - \left\lfloor \frac{h}{2} \right\rfloor \qquad (4)$$
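The transform pair (3)-(4) is exactly invertible for any integer samples, which can be checked directly; a minimal sketch (function names are ours):

```python
def forward_integer_transform(x, y):
    """Eq. (3): integer average and difference of an adjacent sample pair."""
    l = (x + y) // 2      # floor of the average
    h = x - y             # difference, typically small for audio
    return l, h

def inverse_integer_transform(l, h):
    """Eq. (4): exact reconstruction of the original pair."""
    x = l + (h + 1) // 2
    y = l - h // 2
    return x, y

# Reversibility check on a few integer pairs:
for x, y in [(7, 3), (-4, 9), (123, 122)]:
    l, h = forward_integer_transform(x, y)
    assert inverse_integer_transform(l, h) == (x, y)
```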



2.1. Quantized Frequency Secure Audio Steganography
In this section, we give a system description of the quantized frequency domain audio steganography algorithm.


2.1.1. Encoding block
A system block diagram of the encoding process is given
in Figure 1. The basic encoding steps are as follows:

Inputs: cover audio segment, secured data.
Step 1: Find all potential embedding blocks in the time
domain of the audio signal based on temporal
masking characteristics.
Step 2: Generate the frequency spectrum of the selected
block.
Step 3: Generate an m-sequence of suitable length.
Step 4: Combine the m-sequence and the secure data to create a new sequence.
Step 5: Select the best block in which we wish to embed
the secured data. The frequency range of the
band lies between parameters P and Q, based on
frequency masking characteristics.

Figure 1. Block diagram of Encoding Block - Frequency Domain based Algorithm

Step 6: Quantize the spectral content of the band.
Step 7: Quantize the new sequence and additively embed it into the cover (see the sketch after this procedure).
Step 8: Take the inverse Fourier transform of the block.
Step 9: Replace the time domain block in the original
audio segment with the encoded block.
Step 10: Repeat steps 3-9 for all blocks.
Output: Audio signal with secure information.
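A highly simplified, single-block sketch of steps 6-8 is given below. The quantization step `delta`, the use of a one-sided FFT (which keeps the spectrum conjugate-symmetric so the inverse transform stays real), and the placement of each bit at half the quantization step are illustrative assumptions, not the exact procedure of the paper.

```python
import numpy as np

def embed_block(block, bits, band, delta=0.05):
    """Quantize the magnitudes of the bins in `band` and additively embed
    one bit of the combined m-sequence/secure-data stream per bin."""
    X = np.fft.rfft(block)
    mag, phase = np.abs(X), np.angle(X)
    for k, b in zip(band, bits):
        q = np.round(mag[k] / delta) * delta   # step 6: quantize spectral content
        mag[k] = q + (delta / 2.0) * b         # step 7: additive embedding
    X_mod = mag * np.exp(1j * phase)
    return np.fft.irfft(X_mod, n=len(block))   # step 8: back to the time domain
```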

2.1.2. Decoding block
The system block diagram for the decoding block is
given in Figure 2. The block size, temporal threshold
value as a percentage of maximum amplitude, order of
the m-sequence and the quantization step are fixed val-
ues and are available to the decoding block. The basic
decoding steps are as follows:

Input: Audio signal with secure information.
Step 1: Locate all potential blocks in the time domain of the audio signal based on the threshold value. These are the same blocks located in the encoding process.
Step 2: Generate an m-sequence of given order. The
generator polynomial is the same in the encoding
and decoding block.

Figure 2: Block diagram of decoding block - Frequency
Domain based Algorithm

Step 3: Generate the frequency spectrum of the selected
block.
Step 4: Quantize the block with the same quantization
step.
Step 5: Correlate the block and the m-sequence obtained in step 2 to locate the starting point of the embedded band (see the correlation sketch after this procedure).
Step 6: Extract the embedded information.
Step 7: Repeat steps 3-6 for each of the other blocks.
Output: Secure information.
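Locating the embedded band in step 5 exploits the sharp autocorrelation peak of m-sequences. The sketch below shows the basic idea of sliding correlation against a known pilot; mapping the extracted values to a bipolar sequence and the function name are our assumptions.

```python
import numpy as np

def locate_pilot(extracted_bits, pilot_bits):
    """Slide the bipolar pilot over the extracted bit stream and return the
    offset with the highest normalized correlation (the start of the band)."""
    s = 2.0 * np.asarray(extracted_bits, dtype=float) - 1.0   # {0,1} -> {-1,+1}
    p = 2.0 * np.asarray(pilot_bits, dtype=float) - 1.0
    scores = [float(np.dot(s[i:i + len(p)], p)) / len(p)
              for i in range(len(s) - len(p) + 1)]
    start = int(np.argmax(scores))
    return start, scores[start]
```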


2.2. Integer Transform based Secure Audio Steganography

2.2.1. Encoding block
In this section, we give a system description of the inte-
ger transform based secure audio steganography algo-
rithm. The system block diagram for the embedding
block is given in Figure 3.

Inputs: cover audio segment, secured data.
Step 1: Find all blocks where data can be embedded
based on temporal masking characteristics of the
audio signal.
Step 2: The pre-process block is used to convert the audio signal in each block to the integer domain.
Step 3: Compute the forward integer transform and obtain the transform domain coefficients as per (3).
Step 4: Generate an m-sequence of suitable length (m). This is the pilot sequence.
Step 5: Combine the pilot with the secure data to form a new sequence.
Step 6: Look at the binary representation of each h (difference) coefficient. If the addition of a bit does not cause overflow or underflow, which is unlikely given the small difference values, embed a bit from the sequence created in step 5 at the MSB+1 position. This creates a new value for the difference coefficient h' (see the embedding sketch after this procedure).

Figure 3. Block diagram of Encoding Block - Integer.

Step 7: Compute the reverse integer transform as per
(4).
Step 8: Replace the audio frame into the original audio
stream.
Step 9: Repeat steps 2-8 for each of the blocks.
Output: Audio Signal with secure information.
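The sketch below illustrates how a bit can be carried in the difference coefficient of a sample pair. Rather than the paper's MSB+1 placement, it uses the closely related difference-expansion form of [10] (the secret bit becomes the new least significant bit of the difference, h' = 2h + b) together with a simple overflow check, so it should be read as a sketch of the idea rather than the authors' exact step 6; the function name and the q_bits parameter are ours.

```python
def embed_bit_in_pair(x, y, bit, q_bits=16):
    """Embed one bit in the difference coefficient of an integer sample pair
    (difference-expansion variant [10]); skip the pair if it would overflow."""
    l = (x + y) // 2
    h = x - y
    h_new = 2 * h + bit                       # expanded difference carries the bit
    x_new = l + (h_new + 1) // 2              # inverse transform, eq. (4)
    y_new = l - h_new // 2
    limit = 1 << (q_bits - 1)                 # Q-bit signed sample range
    if -limit <= x_new < limit and -limit <= y_new < limit:
        return x_new, y_new, True             # bit embedded
    return x, y, False                        # would overflow: leave pair unchanged
```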


2.2.2. Decoding block
The system block diagram for the decoding block of the
integer-based transform is given in Figure 4. The length
of the m-sequence and the block size are available to the
decoding block. The basic decoding steps are as follows:

Inputs: Audio signal with secure information.
Step 1: Select all blocks based on temporal masking
characteristics of the audio signal.
Step 2: The pre-process block is used to convert the audio signal in each block to the integer domain.
Step 3: Compute the forward integer transform of the
frame and obtain the transform domain coeffi-
cients as per (3).
Step 4: Generate an m-sequence of length (m).
Step 5: Look at the MSB+1 bit of the binary representation of the difference coefficients over a window of m coefficients.
Step 6: Correlate the bit stream obtained in step 5 with the m-sequence. Repeat step 5 until the pilot sequence is located in the frame.
Step 7: Extract the MSB+1 bit from the difference coefficients to recover the hidden bits and restore the original difference coefficients h (see the extraction sketch after this procedure).
Figure 4. Block diagram of Decoding Block - Integer.

Step 8: Compute the inverse integer transform as per (4)
and replace the audio frame into the original au-
dio stream.
Step 9: Repeat steps 2-8 for all blocks.
Output: Extracted information and original audio signal.
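The matching extraction for the difference-expansion sketch of Section 2.2.1 recovers both the hidden bit and the original samples, mirroring steps 7-8; again this follows the LSB-expansion reading rather than the exact MSB+1 rule, and the round-trip check uses the `embed_bit_in_pair` helper defined above.

```python
def extract_bit_from_pair(x_new, y_new):
    """Recover the hidden bit and the original sample pair embedded with
    the difference-expansion sketch above."""
    l = (x_new + y_new) // 2
    h_expanded = x_new - y_new
    bit = h_expanded & 1            # the appended bit is the parity
    h = h_expanded >> 1             # drop it to restore the original difference
    x = l + (h + 1) // 2            # inverse transform, eq. (4)
    y = l - h // 2
    return bit, x, y

# Round trip: embedding then extraction returns the original pair and bit.
x_new, y_new, ok = embed_bit_in_pair(10, 7, 1)
assert ok and extract_bit_from_pair(x_new, y_new) == (1, 10, 7)
```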


3. COMPUTER SIMULATION
In this section we introduce a capacity measure formula for selecting the audio clip that introduces the least distortion after the embedding process. The formula is given as
$$M_{\mathrm{audio}} = \mathrm{MSE} \times \frac{\text{No. of samples in the embedding band}}{\text{No. of bits in the secure data}}$$
The capacity measure is the mean square error multiplied by the ratio of the number of samples in the embedding band of the cover audio to the number of bits in the secure message. For the computer simulations we tested the algorithms on many audio pieces drawn from classical, pop, country and speech material.
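Read literally, the capacity measure above is just the mean squared error scaled by the sample-to-bit ratio; a direct helper is sketched below (the argument names are ours, and a smaller value corresponds to less distortion in Tables 1 and 2):

```python
import numpy as np

def m_audio(cover, stego, band_samples, secret_bits):
    """Capacity measure: MSE between cover and stego signals, weighted by
    the ratio of embedding-band samples to secret-data bits."""
    cover = np.asarray(cover, dtype=float)
    stego = np.asarray(stego, dtype=float)
    mse = np.mean((cover - stego) ** 2)
    return mse * band_samples / secret_bits
```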
All cover audio signals are about 3 seconds long and are sampled at 44.1 kHz with 16-bit resolution. The classical pieces are Beethoven's Symphony No. 4
in B flat (Adagio - Allegro vivace) and Walter Piston's "Turnbridge Fair". The country songs are Lyle Lovett's "Long Tall Texan" and Nanci Griffith's "If I had a Hammer". The pop pieces are Paul Simon's "That was your mother" and the Christmas classic "Away in a manger" recorded by Peter Jacobs.
In the frequency domain based algorithm the range of the embedding band is chosen as P = 5.5 kHz and Q = 11 kHz (BW/4 to BW/2), and the block size is chosen as 512 samples. As noted earlier, only N/4 = 128 frequencies are modified; since the Fourier coefficients occur in complex conjugate pairs, we actually embed information in half of these coefficients. The time domain plot of the original signal is shown in Figure 5, in which the potential embedding blocks based on HAS characteristics are identified. The threshold is chosen to be 55% of the maximum amplitude of the chosen cover audio signal. Figure 6 shows additional time domain plots of the cover audio signals. The embedding capacity differs across audio cover signals depending on the availability of embedding blocks that satisfy the HAS criteria.
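One plausible reading of the block selection described above (and reflected in Figures 5 and 6) is to keep the 512-sample blocks whose peak amplitude exceeds the threshold, so that the embedded changes sit next to a strong masker; the paper's temporal-masking rule may be more elaborate, so this is only the simplest thresholding interpretation.

```python
import numpy as np

def potential_blocks(signal, block_size=512, threshold_ratio=0.55):
    """Return the start indices of blocks whose peak amplitude exceeds
    threshold_ratio * max(|signal|)."""
    signal = np.asarray(signal, dtype=float)
    threshold = threshold_ratio * np.max(np.abs(signal))
    starts = [s for s in range(0, len(signal) - block_size + 1, block_size)
              if np.max(np.abs(signal[s:s + block_size])) >= threshold]
    return starts, threshold
```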

Figure 5: Cover audio signals with potential blocks
identified with threshold at 55% of maximum value of
cover audio.
Figure 6: Cover audio signals: a) Classical, 13 blocks; b) Country, 10 blocks; c) Pop, 18 blocks; d) Speech, 23 blocks.
The analysis of sample audio signals is provided in Table 1. As can be seen from the data, the M_audio measure is a simple method for selecting an audio clip that introduces the least distortion.
Signal      Blocks / Payload (bits)   SNR     RMS      M_audio (10^4)
Classical   21 / 1323                 34.98   0.0027   12.25
Classical   13 / 819                  34.87   0.0028   8.15
Country     10 / 630                  34.46   0.0017   2.31
Country     28 / 1764                 39.70   0.0023   11.85
Pop         18 / 1134                 39.47   0.0020   5.76
Pop         33 / 2079                 46.56   0.0005   0.66
Speech      23 / 1449                 43.02   0.0016   4.71

Table 1: SNR, RMS and M_audio for audio signals embedded using quantization in the frequency domain.

The computer simulations for the integer-based
algorithm were performed on the same set of audio sig-
nals. The block size is 512 and the number of bits that
can be embedded per block is equal to 256. The analysis
is provided in Table 2.

Signal      Blocks / Payload (bits)   SNR     RMS     M_audio (10^4)
Classical   21 / 4557                 19.9    0.026   334.95
Classical   13 / 2821                 17.95   0.022   148.46
Country     10 / 2170                 17.47   0.024   135.90
Country     28 / 6076                 25.47   0.015   148.65
Pop         18 / 3906                 16.48   0.013   71.77
Pop         33 / 7161                 27.95   0.006   27.10
Speech      23 / 4991                 25.86   0.012   78.14

Table 2: SNR, RMS and M_audio for audio signals embedded using the reversible integer transform.

4. CONCLUSION
In conclusion, we presented two algorithms for digital
audio steganography with embedding in the frequency
domain and the integer transform domain. Experimental
results for both methods indicate that the changes in the
embedded audio section are inaudible. The QSAS algo-
rithm has lower embedding capacity but has much better
SNR values. The ITSAS algorithm is preferred as it is
reversible, simple, and efficient with acceptable SNR
values. We also introduced a capacity measure that can
be used to select an audio clip that introduces the least
distortion after the embedding process.


5. ACKNOWLEDGMENTS
This research was partially funded by the Center for In-
frastructure Assurance and Security.

[Figures 5 and 6 show the waveforms (amplitude vs. samples) of the cover audio signals with 512-sample blocks: beethovensym4.wav, 21 blocks, threshold 0.42539; turnbrgefair.wav, 13 blocks, threshold 0.47898; nineseclyle.wav, 10 blocks, threshold 0.29623; paulsimon.wav, 18 blocks, threshold 0.55; myspeech.wav, 23 blocks, threshold 0.55.]
6. REFERENCES
[1] I. J. Cox, J. Kilian, F. T. Leighton and T. Shamoon, "Secure Spread Spectrum Watermarking for Multimedia," IEEE Trans. Image Processing, vol. 6, no. 12, pp. 1673-1687, December 1997.
[2] B. Chen and G. W. Wornell, "Digital Watermarking and Information Embedding using Dither Modulation," in Proc. IEEE Second Workshop on Multimedia Signal Processing, pp. 273-278, December 1998.
[3] S. Wang, X. Zhang and K. Zhang, "Data Hiding in Digital Audio by Frequency Domain Dithering," MMM-ACNS, Springer-Verlag, Berlin Heidelberg, 2003, pp. 383-394.
[4] J. F. Tilki and A. A. Beex, "Encoding a Hidden Digital Signature onto an Audio Signal Using Psychoacoustic Masking," in Proc. 7th International Conference on Signal Processing Applications & Technology, Boston, MA, October 1996, pp. 476-480.
[5] M. D. Swanson, B. Zhu, A. H. Tewfik and L. Boney, "Robust audio watermarking using perceptual masking," Signal Processing, vol. 66, 1998, pp. 337-355.
[6] S. Oraintara, Y. Chen and T. Nguyen, "Integer Fast Fourier Transform," IEEE Trans. Signal Processing.
[7] Sos S. Agaian and Juan P. Perez, "New Pixel Sorting Method for Palette Steganography and Steganographic Capacity Measure," GSteg Pacific Rim Workshop on Digital Steganography, ACROS Fukuoka, 1-1 Tenjin 1-chome, Chuo-ku, Fukuoka 810-0001, Japan, November 17-18, 2004, pp. 37-35.
[8] P. Bassia, I. Pitas and N. Nikolaidis, "Robust Audio Watermarking in the Time Domain," IEEE Trans. Multimedia, vol. 3, 2001, pp. 232-241.
[9] J. F. Tilki, "Encoding a Hidden Digital Signature Using Psychoacoustic Masking," thesis, Bradley Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, June 9, 1998.
[10] J. Tian, "High Capacity Reversible Data Embedding and Content Authentication," in Proc. IEEE International Conference on Acoustics, Speech and Signal Processing, vol. 3, pp. 517-520, 2003.
