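$$X(k) = \sum_{n=0}^{N-1} x(n)\, W_N^{kn}, \quad k = 0, 1, \ldots, N-1 \qquad (1)$$
where $W_N = e^{-j2\pi/N}$. Similarly, the IDFT is given by
$$x(n) = \frac{1}{N} \sum_{k=0}^{N-1} X(k)\, W_N^{-kn}, \quad n = 0, 1, \ldots, N-1 \qquad (2)$$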
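Equations (1) and (2) can be checked numerically with a direct O(N²) evaluation. The following is a minimal Python sketch; the function names `dft` and `idft` are illustrative and not from the source. In practice an FFT routine such as `numpy.fft.fft` would be used instead; it follows the same exp(-j2πkn/N) sign convention and applies the 1/N factor in its inverse.

```python
import cmath

def dft(x):
    """Evaluate eq. (1) directly: X(k) = sum_n x(n) * W_N**(k*n), W_N = exp(-j*2*pi/N)."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
            for k in range(N)]

def idft(X):
    """Evaluate eq. (2) directly: x(n) = (1/N) * sum_k X(k) * W_N**(-k*n)."""
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * n / N) for k in range(N)) / N
            for n in range(N)]

# The twiddle factor has magnitude one and its inverse is its complex conjugate.
W = cmath.exp(-2j * cmath.pi / 8)
assert abs(abs(W) - 1) < 1e-12
assert cmath.isclose(1 / W, W.conjugate())
```

For a real input such as [1, 2, 3, 4], `dft` gives X(0) = 10 and X(1) = -2 + 2j, with X(3) the complex conjugate of X(1), and `idft` returns the original samples up to rounding error.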
The coefficients $W_N^{kn}$ appearing in the FFT and IFFT structures are complex numbers with magnitude one, and the inverse of each is its complex conjugate. The bins of an N-point FFT are spaced fs/N apart, so bin k corresponds to the frequency k·fs/N, where fs is the sampling frequency. Of the N bins, only N/4 are modified, and the embedded information includes the pseudo-random sequence and the secure data.
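As a worked example of the bin-frequency relation, the sketch below lists which bins of a 512-point FFT at fs = 44.1 kHz fall in the 5.5 to 11 kHz band used in the simulations, together with their conjugate mirror bins N - k. Treating the band as half-open, [f_lo, f_hi), is an assumption, and `bin_frequency` and `band_bins` are hypothetical helpers.

```python
def bin_frequency(k, N, fs):
    """Frequency in Hz represented by FFT bin k (bin spacing is fs / N)."""
    return k * fs / N

def band_bins(N, fs, f_lo, f_hi):
    """Positive-frequency bins k with f_lo <= k*fs/N < f_hi, plus the mirror
    bins N - k that hold their complex conjugates for a real signal."""
    pos = [k for k in range(1, N // 2) if f_lo <= bin_frequency(k, N, fs) < f_hi]
    mirrors = [N - k for k in pos]
    return pos, mirrors

pos, mirrors = band_bins(N=512, fs=44100, f_lo=5500, f_hi=11000)
print(len(pos), pos[0], pos[-1])   # 64 64 127
print(len(pos) + len(mirrors))     # 128
```

The 64 positive-frequency bins plus their 64 mirrors account for the N/4 = 128 modified bins, while only the 64 independent positive-frequency coefficients can carry distinct information.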
Our goal in developing the second algorithm was a simple and robust integer-based reversible audio steganography algorithm. The algorithm is a variation of the watermarking algorithm presented in [10]. The key differences are as follows:
1. The algorithm in [10] was implemented for watermarking images, while our algorithm is developed for audio steganography, in which high embedding capacity is relatively more important than the robustness requirements of watermarking.
2. Unlike image pixels, audio samples are usually not represented by integer values during processing, so preprocessing of the audio signal plays a significant role.
3. We use a pseudo-random sequence to detect the location of the secure data and do not require a location map to identify the samples that have been modified by the algorithm.
4. Our algorithm uses the temporal characteristics of the HAS to select the locations at which we embed the secure data.
We call this algorithm the Integer Transform based Secure Audio Steganography (ITSAS) algorithm.
The output of an analog-to-digital converter (ADC) is a quantized Q-bit integer value that typically represents the range of the input analog signal. For example, for audio files in the Microsoft .wav format, this range is mapped to (-1, 1). To process these signals with integer transforms, each value must be converted to integer format.
Computers typically process real numbers using floating-point representation, and converting from floating-point to integer values is often very slow. An alternative is the fixed-point format, in which a Q-bit number is represented as M:N, where M = Q is the total number of bits, which determines the range, and N is the number of bits that determine the precision. In some articles, the fixed-point format is referred to as the Q-format and is written as Q-15, Q-30, etc., meaning that the precision (fractional part) is represented by 15 or 30 bits, respectively. A number in fixed-point format can be converted to an integer by multiplying it by $2^N$. The original fixed-point value is recovered by reversing the process, i.e. dividing by $2^N$.
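A minimal sketch of this conversion for 16-bit audio, assuming Q-15 (N = 15 fractional bits), round-to-nearest, and saturation at the 16-bit limits. The text only says to multiply by 2^N, so the rounding and clipping choices are assumptions, and `to_fixed_int` and `from_fixed_int` are hypothetical names.

```python
FRAC_BITS = 15      # Q-15: assumed for 16-bit audio
TOTAL_BITS = 16

def to_fixed_int(x: float, frac_bits: int = FRAC_BITS, total_bits: int = TOTAL_BITS) -> int:
    """Scale a sample in [-1, 1) by 2**frac_bits, round, and saturate to the integer range."""
    q = round(x * (1 << frac_bits))
    lo = -(1 << (total_bits - 1))
    hi = (1 << (total_bits - 1)) - 1
    return max(lo, min(hi, q))

def from_fixed_int(q: int, frac_bits: int = FRAC_BITS) -> float:
    """Reverse the scaling: divide by 2**frac_bits."""
    return q / (1 << frac_bits)

print(to_fixed_int(0.5), to_fixed_int(-1.0), to_fixed_int(1.0))   # 16384 -32768 32767
print(from_fixed_int(16384))                                       # 0.5
```

Because every 16-bit sample read as q / 2^15 is exactly representable as a float, converting it back with `to_fixed_int` returns the original integer, so the pre-process step loses nothing for such input.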
The reversible integer transform [10] for an adjacent sample pair (x, y) is given as
$$l = \left\lfloor \frac{x + y}{2} \right\rfloor, \qquad h = x - y \qquad (3)$$
where $\lfloor \cdot \rfloor$ is the floor function, i.e. the greatest integer less than or equal to its argument. The l coefficient is the (integer) average of two adjacent samples, while h is their difference. Note that for audio signals the difference coefficient is typically very small compared to the average coefficient. The inverse transform of (3) is given as
$$x = l + \left\lfloor \frac{h + 1}{2} \right\rfloor, \qquad y = l - \left\lfloor \frac{h}{2} \right\rfloor \qquad (4)$$
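Equations (3) and (4) translate directly into integer arithmetic. A minimal Python sketch with illustrative function names; Python's `//` operator is floor division, so it matches ⌊·⌋ for negative values as well.

```python
def forward_int_transform(x, y):
    """Eq. (3): l = floor((x + y) / 2), h = x - y."""
    l = (x + y) // 2   # floor division, also correct for negative sums
    h = x - y
    return l, h

def inverse_int_transform(l, h):
    """Eq. (4): x = l + floor((h + 1) / 2), y = l - floor(h / 2)."""
    x = l + (h + 1) // 2
    y = l - h // 2
    return x, y

print(forward_int_transform(5, 2))     # (3, 3)
print(forward_int_transform(2, 5))     # (3, -3)
print(inverse_int_transform(3, -3))    # (2, 5)
```

The pair (l, h) loses nothing: h carries the parity of x + y that the floor in l discards, which is why (4) restores x and y exactly.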
2.1. Quantized Frequency Secure Audio Steganography
In this section, we give a system description of the quan-
tized frequency domain audio steganography algorithm.
2.1.1. Encoding block
A system block diagram of the encoding process is given
in Figure 1. The basic encoding steps are as follows:
Inputs: cover audio segment, secured data.
Step 1: Find all potential embedding blocks in the time
domain of the audio signal based on temporal
masking characteristics.
Step 2: Generate the frequency spectrum of the selected
block.
Step 3: Generate an m-sequence of suitable length.
Step 4: Combine the m-sequence and the secure data to create a new sequence.
Step 5: Select the best block in which we wish to embed
the secured data. The frequency range of the
band lies between parameters P and Q, based on
frequency masking characteristics.
Figure 1. Block diagram of Encoding Block - Frequency Domain based Algorithm
Step 6: Quantize the spectral content of the band.
Step 7: Quantize the new sequence and additively embed
it into the cover.
Step 8: Take the inverse Fourier transform of the block.
Step 9: Replace the time domain block in the original
audio segment with the encoded block.
Step 10: Repeat steps 3-9 for all blocks.
Output: Audio signal with secure information.
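Steps 3 and 4 can be sketched as follows. The paper does not state its generator polynomial, so this sketch assumes order 6 with the primitive polynomial x^6 + x + 1 (period 63), implemented as a Fibonacci LFSR. The `combine` helper, which simply places the pilot before the secure data, is a hypothetical framing and not the authors' stated method. The periodic autocorrelation of the ±1-mapped sequence is 63 at zero shift and -1 at every other shift, which is what makes correlation-based location of the pilot work.

```python
def m_sequence(order=6, taps=(0, 1)):
    """Binary m-sequence of length 2**order - 1 from a Fibonacci LFSR.

    Recurrence: s[n + order] = XOR of s[n + t] for t in taps.
    The defaults encode x^6 + x + 1 (assumed polynomial, primitive).
    """
    length = (1 << order) - 1
    s = [1] + [0] * (order - 1)     # any nonzero seed yields a cyclic shift of the same sequence
    while len(s) < length:
        n = len(s) - order
        bit = 0
        for t in taps:              # XOR of the tapped earlier bits
            bit ^= s[n + t]
        s.append(bit)
    return s

def periodic_autocorr(bits, shift):
    """Cyclic autocorrelation after mapping 0 -> +1 and 1 -> -1."""
    b = [1 - 2 * v for v in bits]
    n = len(b)
    return sum(b[i] * b[(i + shift) % n] for i in range(n))

def combine(pilot, data):
    """Hypothetical framing for Step 4: pilot bits first, then secure data bits."""
    return list(pilot) + list(data)

pilot = m_sequence()
print(len(pilot), sum(pilot))                                     # 63 32
print(periodic_autocorr(pilot, 0), periodic_autocorr(pilot, 5))   # 63 -1
```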
2.1.2. Decoding block
The system block diagram for the decoding block is
given in Figure 2. The block size, temporal threshold
value as a percentage of maximum amplitude, order of
the m-sequence and the quantization step are fixed val-
ues and are available to the decoding block. The basic
decoding steps are as follows:
Input: Audio signal with secure information.
Step 1: Locate all potential blocks in the time domain of the audio signal based on the threshold value. These blocks are the same as the blocks located in the encoding process.
Step 2: Generate an m-sequence of given order. The
generator polynomial is the same in the encoding
and decoding block.
Figure 2: Block diagram of decoding block - Frequency
Domain based Algorithm
Step 3: Generate the frequency spectrum of the selected
block.
Step 4: Quantize the block with the same quantization
step.
Step 5: Correlate the block and the m-sequence obtained
in step 2 to locate the starting point of the em-
bedded band.
Step 6: Extract the embedded information.
Step 7: Repeat steps 3-6 for each of the other blocks.
Output: Secure information.
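Decoding Step 5 locates the embedded band by correlating the recovered bits with the known m-sequence. A minimal sketch, assuming both are mapped to ±1 and the start is taken as the offset of maximum correlation; `locate_pilot` is a hypothetical helper. The same search applies to Step 6 of the ITSAS decoder.

```python
def locate_pilot(stream, pilot):
    """Return (offset, score) of the best match of `pilot` inside `stream`.

    Bits are mapped 0 -> +1, 1 -> -1, so a perfect match scores len(pilot).
    Returns (-1, None) if the stream is shorter than the pilot.
    """
    p = [1 - 2 * b for b in pilot]
    s = [1 - 2 * b for b in stream]
    best_off, best_score = -1, None
    for off in range(len(s) - len(p) + 1):
        score = sum(pi * si for pi, si in zip(p, s[off:off + len(p)]))
        if best_score is None or score > best_score:   # keep the first maximum
            best_off, best_score = off, score
    return best_off, best_score
```

Under the pilot-then-data framing assumed for encoding, the secure bits would start at `offset + len(pilot)`.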
2.2. Integer Transform based Secure Audio Steganography
2.2.1. Encoding block
In this section, we give a system description of the inte-
ger transform based secure audio steganography algo-
rithm. The system block diagram for the embedding
block is given in Figure 3.
Inputs: cover audio segment, secured data.
Step 1: Find all blocks where data can be embedded
based on temporal masking characteristics of the
audio signal.
Step 2: The pre-process block converts the audio signal in each block to the integer domain.
Step 3: Compute the forward integer transform and ob-
tain the transform domain coefficients as per (3).
Step 4: Generate an m-sequence of suitable length (m). This is the pilot sequence.
Step 5: Combine the pilot with the secure data to form a
new sequence.
Step 6: Examine the binary representation of each difference coefficient h. If adding a bit does not cause overflow or underflow (which is unlikely, given the small difference values), embed a bit of secure data from the sequence created in step 5 at the MSB+1 position. This creates a new value h′ for the difference coefficient.
Figure 3. Block diagram of Encoding Block - Integer.
Step 7: Compute the reverse integer transform as per
(4).
Step 8: Insert the audio frame back into the original audio stream.
Step 9: Repeat steps 2-8 for each of the blocks.
Output: Audio Signal with secure information.
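The text places each secure bit at the MSB+1 position of h but gives no formula for that placement. The sketch below therefore substitutes the difference-expansion rule of Tian [10], h′ = 2h + b, which is a different bit placement, together with the overflow/underflow check of Step 6, assuming 16-bit samples. It illustrates reversibility only; how a decoder recognizes skipped pairs is left open, and all names are hypothetical.

```python
LO, HI = -32768, 32767   # assumed 16-bit sample range

def forward(x, y):
    """Eq. (3): integer average and difference."""
    return (x + y) // 2, x - y

def inverse(l, h):
    """Eq. (4): x = l + floor((h + 1) / 2), y = x - h = l - floor(h / 2)."""
    x = l + (h + 1) // 2
    return x, x - h

def embed_bit(x, y, bit):
    """Embed one bit in the pair (x, y) by Tian-style difference expansion.

    Returns the marked pair, or None when the expanded pair would leave [LO, HI].
    """
    l, h = forward(x, y)
    h_marked = 2 * h + bit           # substitute rule, not the paper's MSB+1 placement
    x2, y2 = inverse(l, h_marked)
    if not (LO <= x2 <= HI and LO <= y2 <= HI):
        return None                  # overflow/underflow: leave this pair unmodified
    return x2, y2

def extract_bit(x2, y2):
    """Recover the embedded bit and the original pair from a marked pair."""
    l, h_marked = forward(x2, y2)    # l is unchanged by embedding
    bit = h_marked & 1               # low bit, valid for negative ints in Python
    h = h_marked >> 1                # arithmetic shift = floor(h_marked / 2)
    return bit, inverse(l, h)
```

For example, the pair (5, 2) has l = 3 and h = 3; embedding a 1 gives h′ = 7 and the marked pair (7, 0), from which `extract_bit` returns the bit 1 and the pair (5, 2).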
2.2.2. Decoding block
The system block diagram for the decoding block of the
integer-based transform is given in Figure 4. The length
of the m-sequence and the block size are available to the
decoding block. The basic decoding steps are as follows:
Inputs: Audio signal with secure information.
Step 1: Select all blocks based on temporal masking
characteristics of the audio signal.
Step 2: The pre-process block converts the audio signal in each block to the integer domain.
Step 3: Compute the forward integer transform of the
frame and obtain the transform domain coeffi-
cients as per (3).
Step 4: Generate an m-sequence of length (m).
Step 5: Read the MSB+1 bit of the binary representation of the difference coefficients over a window of m coefficients.
Step 6: Correlate the bit stream obtained in step 5 with the m-sequence. Repeat step 5 until the sequence is located in the frame.
Step 7: Extract the MSB+1 bit from the difference coefficients to recover the original values of the coefficients h.
Figure 4. Block diagram of Decoding Block - Integer.
Step 8: Compute the inverse integer transform as per (4) and insert the audio frame back into the original audio stream.
Step 9: Repeat steps 2-8 for all blocks.
Output: Extracted information and original audio signal.
3. COMPUTER SIMULATION
In this section we introduce a capacity measure for selecting the cover audio that introduces the least distortion after the embedding process. The formula is given as:
$$M_{audio} = \mathrm{MSE} \times \frac{\text{No. of samples in the embedding band}}{\text{No. of bits in the secure data}}$$
The capacity measure is the mean square error (MSE) multiplied by the ratio of the number of samples in the embedding band of the cover audio to the number of bits in the secure message. For the computer simulations, we tested the algorithms on many audio pieces from classical, pop, country, and speech recordings.
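A minimal sketch of M_audio and of the quality figures reported in the tables. The text does not define SNR or RMS, so this assumes SNR = 10 log10(Σx² / Σ(x - x′)²) in dB and RMS as the square root of the MSE between the cover and the stego signal; the band-sample count and payload size are supplied by the caller, and all function names are hypothetical.

```python
import math

def mse(cover, stego):
    """Mean square error between cover and stego samples."""
    return sum((a - b) ** 2 for a, b in zip(cover, stego)) / len(cover)

def rms_error(cover, stego):
    """Assumed RMS figure: square root of the MSE."""
    return math.sqrt(mse(cover, stego))

def snr_db(cover, stego):
    """Assumed SNR definition: cover energy over embedding-noise energy, in dB."""
    signal = sum(a * a for a in cover)
    noise = sum((a - b) ** 2 for a, b in zip(cover, stego))
    return 10 * math.log10(signal / noise)

def capacity_measure(cover, stego, band_samples, payload_bits):
    """M_audio = MSE x (samples in the embedding band) / (bits of secure data)."""
    return mse(cover, stego) * band_samples / payload_bits
```

A lower M_audio means less distortion per embedded bit, so among candidate cover clips the one with the smallest value would be preferred.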
All cover audio signals are about 3 seconds long and are sampled at 44.1 kHz with 16-bit resolution.
The classical pieces are Beethoven's Symphony No. 4
in B flat, Adagio - Allegro vivace, and Walter Piston's "Tunbridge Fair", respectively. The country songs are Lyle Lovett's "Long Tall Texan" and Nanci Griffith's "If I Had a Hammer". The pop pieces are Paul Simon's "That Was Your Mother" and the Christmas classic "Away in a Manger" recorded by Peter Jacobs.
In the frequency-domain-based algorithm, the embedding band is chosen as P = 5.5 kHz to Q = 11 kHz (BW/4 to BW/2), and the block size is 512 samples. As noted earlier, only N/4 = 128 frequency bins are modified. Because the Fourier coefficients of a real signal occur in complex-conjugate pairs, information is actually embedded in only half of these coefficients. Figure 5 shows the time-domain plot of the original signal, with the potential embedding blocks identified from HAS characteristics. The threshold is chosen to be 55% of the maximum amplitude of the chosen cover audio signal. Figure 6 shows additional time-domain plots of the cover audio signals. The embedding capacity differs between cover signals because it depends on how many blocks satisfy the HAS criteria.
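The block selection can be sketched as below, assuming non-overlapping 512-sample blocks and that a block qualifies when its peak absolute amplitude reaches 55% of the signal's maximum absolute amplitude. The text specifies only the threshold, so the per-block criterion is an assumption, and `select_blocks` is a hypothetical name.

```python
def select_blocks(samples, block_size=512, threshold_ratio=0.55):
    """Return (indices of qualifying blocks, threshold).

    Hypothetical criterion: a block qualifies when its peak |sample| reaches
    threshold_ratio times the peak |sample| of the whole signal.
    A trailing partial block is ignored.
    """
    threshold = threshold_ratio * max(abs(s) for s in samples)
    selected = []
    for b in range(len(samples) // block_size):
        block = samples[b * block_size:(b + 1) * block_size]
        if max(abs(s) for s in block) >= threshold:
            selected.append(b)
    return selected, threshold
```

Since the threshold depends only on quantities the decoder can recompute from the received signal, the decoder can run the same function with the same parameters to find the blocks.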
Figure 5: Cover audio signals with potential blocks
identified with threshold at 55% of maximum value of
cover audio.
Figure 6: Cover audio signals: a) Classical, 13 blocks; b) Country, 10 blocks; c) Pop, 18 blocks; d) Speech, 23 blocks.
The analysis of the sample audio signals is provided in Table 1. As the data show, the M_audio measure is a simple way to select the audio clip that introduces the least distortion; for example, the clip with the lowest M_audio (Pop, 33 blocks) also has the highest SNR.
Signal      Blocks / Payload (bits)   SNR     RMS      M_audio (10^4)
Classical   21 / 1323                 34.98   0.0027   12.25
Classical   13 / 819                  34.87   0.0028    8.15
Country     10 / 630                  34.46   0.0017    2.31
Country     28 / 1764                 39.70   0.0023   11.85
Pop         18 / 1134                 39.47   0.0020    5.76
Pop         33 / 2079                 46.56   0.0005    0.66
Speech      23 / 1449                 43.02   0.0016    4.71
Table 1: SNR, RMS and M_audio for audio signals embedded using quantization in the frequency domain.
The computer simulations for the integer-based algorithm were performed on the same set of audio signals. The block size is 512 samples, so up to 256 bits, one per sample pair, can be embedded per block. The analysis is provided in Table 2.
Signal      Blocks / Payload (bits)   SNR     RMS     M_audio (10^4)
Classical   21 / 4557                 19.9    0.026   334.95
Classical   13 / 2821                 17.95   0.022   148.46
Country     10 / 2170                 17.47   0.024   135.90
Country     28 / 6076                 25.47   0.015   148.65
Pop         18 / 3906                 16.48   0.013    71.77
Pop         33 / 7161                 27.95   0.006    27.10
Speech      23 / 4991                 25.86   0.012    78.14
Table 2: SNR, RMS and M_audio for audio signals embedded using the reversible integer transform.
4. CONCLUSION
In conclusion, we presented two algorithms for digital
audio steganography with embedding in the frequency
domain and the integer transform domain. Experimental
results for both methods indicate that the changes in the
embedded audio section are inaudible. The QSAS algo-
rithm has lower embedding capacity but has much better
SNR values. The ITSAS algorithm is preferred as it is
reversible, simple, and efficient with acceptable SNR
values. We also introduced a capacity measure that can
be used to select an audio clip that introduces the least
distortion after the embedding process.
5. ACKNOWLEDGMENTS
This research was partially funded by the Center for In-
frastructure Assurance and Security.
[Plot panels for Figures 5 and 6: waveform of the audio signal versus samples (x 10^4), block length 512:
beethoven_sym4.wav, 21 blocks, threshold = 0.42539;
turnbrge_fair.wav, 13 blocks, threshold = 0.47898;
paulsimon.wav, 18 blocks, threshold = 0.55;
nineseclyle.wav, 10 blocks, threshold = 0.29623;
myspeech.wav, 23 blocks, threshold = 0.55.]
6. REFERENCES
[1] I. J. Cox, J. Kilian, F. T. Leighton, and T. Shamoon, "Secure Spread Spectrum Watermarking for Multimedia," IEEE Trans. Image Processing, vol. 6, no. 12, pp. 1673-1687, December 1997.
[2] B. Chen and G. W. Wornell, "Digital Watermarking and Information Embedding Using Dither Modulation," in Proc. IEEE Second Workshop on Multimedia Signal Processing, pp. 273-278, December 1998.
[3] S. Wang, X. Zhang, and K. Zhang, "Data Hiding in Digital Audio by Frequency Domain Dithering," MMM-ACNS, Springer-Verlag, Berlin Heidelberg, 2003, pp. 383-394.
[4] J. F. Tilki and A. A. Beex, "Encoding a Hidden Digital Signature onto an Audio Signal Using Psychoacoustic Masking," in Proc. 7th International Conference on Signal Processing Applications & Technology, Boston, MA, October 1996, pp. 476-480.
[5] M. D. Swanson, B. Zhu, A. H. Tewfik, and L. Boney, "Robust Audio Watermarking Using Perceptual Masking," Signal Processing, vol. 66, 1998, pp. 337-355.
[6] S. Oraintara, Y. Chen, and T. Nguyen, "Integer Fast Fourier Transform," IEEE Trans. Signal Processing.
[7] S. S. Agaian and J. P. Perez, "New Pixel Sorting Method for Palette Steganography and Steganographic Capacity Measure," GSteg Pacific Rim Workshop on Digital Steganography, ACROS Fukuoka, Fukuoka, Japan, November 17-18, 2004, pp. 37-35.
[8] P. Bassia, I. Pitas, and N. Nikolaidis, "Robust Audio Watermarking in the Time Domain," IEEE Trans. Multimedia, vol. 3, 2001, pp. 232-241.
[9] J. F. Tilki, "Encoding a Hidden Digital Signature Using Psychoacoustic Masking," Thesis, Bradley Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, June 9, 1998.
[10] J. Tian, "High Capacity Reversible Data Embedding and Content Authentication," in Proc. IEEE International Conference on Acoustics, Speech and Signal Processing, vol. 3, pp. 517-520, 2003.