
Mobile Application Security for Video Streaming Authentication and Data Integrity Combining Digital Signature and Watermarking Techniques
Stefano Chessa, Roberto Di Pietro, Erina Ferro, Gaetano Giunta, Gabriele Oligeri
ISTI-CNR, Area della Ricerca di Pisa, Via G. Moruzzi 1, 56124 Pisa, Italy
Dipartimento di Informatica, Università di Pisa, Largo B. Pontecorvo 3, 56127 Pisa, Italy
Dipartimento di Matematica, Università di Roma Tre, L.go S. Murialdo 1, 00146 Rome, Italy
Dipartimento di Elettronica Applicata, Università di Roma Tre, Via della Vasca Navale 84, 00146 Rome, Italy

Abstract: Satellite links present peculiar characteristics, such as no packet reordering and a low bit error rate. In this paper we leverage these characteristics, combined with watermarking techniques, to propose a novel authentication algorithm for multicast video streaming. This algorithm combines a single digital signature with a hash chain pre-computed on the transmitter side; the hash chain is embedded in the video stream by means of a watermarking technique. Our proposal shows several interesting features: authentication is enforced, as well as integrity of the received multicast stream; received blocks can be authenticated on the fly; no storage is required on the receiver side, except for the memory needed to store a single hash; the computational overhead on the receiver amounts to a single hash per block, while the digital signature verification is amortized over the whole received stream. Finally, the bandwidth overhead introduced is negligible, since the applied watermarking technique introduces virtually no modifications (at least, none recognizable by humans) to the original video stream pictures.

I. INTRODUCTION

In a secure video-streaming system, sender authentication must be performed to prevent unauthorised use of the shared media [1]. Conventional methods for securely identifying images use hash functions. The digital signature standard, used in cryptosystems to authenticate documents in case of dispute, also involves the use of hash functions. A digital signature is a bit stream dependent on a private key and on the content of the document to be signed. For each document, the digital signature algorithm provides a unique output bit stream [2].

Authentication schemes assert that an adversary cannot compute the same signature for two different messages. In cryptosystems, this type of process is largely used to ensure data integrity, data origin authentication and non-repudiation. The digital signature is based on a hash function (or a one-way hash function, OWH) and an encryption algorithm [3].

One-way functions and cryptographic hash functions are computed by cryptographic primitives without a key. For example, MD5 [4] and SHA-1 [5] are customized hash functions used in cryptographic processes.

Another method for securely identifying images is watermarking. In order to be efficient for images [1], the digital signature should be different if and only if the image contents are different. Digital watermarking [6] is a multidisciplinary field that combines media and signal processing with cryptography, communication theory, coding theory, signal compression, theory of human perception, and quality of service requirements [7]. Techniques based on watermarking are largely used for copyright protection and fast search of images in databases [6]. In an authentication or copyright watermark, the embedded signature watermarks the data with the owner's or producer's identification. This is the enabling technology to prove ownership of copyrighted material, either to monitor the use of the copyrighted multimedia data, or to analyse where the data is in use over networks and servers. This technology builds on the idea of hiding meta-information, like an owner's signature, in the physical material in which content is represented, such as the pixels of an image. From the digital form, authorised recipients can extract the embedded information when necessary to show proof of ownership. In general, we have a watermarking embedding function that inserts a watermark $h$ into media data $B$ and generates the watermarked data $B^w$. A detector can check for the existence of the watermark and extract it ($h$) from the data $B^w$. The embedding and detector functions are usually public, but they are initialised by keys to achieve security. Referring to the existing literature, to the best of our knowledge, no published papers deal with proposals of a combined procedure involving digital watermarking together with a public/private key security mechanism.
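As a minimal illustration of the hash-based identification mentioned in this section, the following Python sketch (not part of the original paper) uses the standard `hashlib` module to show that flipping a single bit of the input changes its SHA-1 digest. The byte string `frame` is a made-up stand-in for picture data, chosen only for the example.

```python
import hashlib

# Hypothetical stand-in for raw picture data (not real video).
frame = bytes(range(256)) * 4

# Flip one bit of one byte to simulate a manipulation of the content.
altered = bytearray(frame)
altered[10] ^= 0x01

digest_original = hashlib.sha1(frame).hexdigest()
digest_altered = hashlib.sha1(bytes(altered)).hexdigest()

print(digest_original)
print(digest_altered)
print("digests differ:", digest_original != digest_altered)
```

A receiver that knows the expected digest can therefore detect the alteration by recomputing the hash and comparing the two values.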
Digital watermarking techniques should be robust and fragile at once [6]. Copyright, fingerprint, and copy-control watermarks should have a high robustness, consisting of resistance to blind, non-targeted modifications, or common media operations. For manipulation recognition, the integrity watermark

This work has been supported by the CNR/MIUR program Legge 449/97 (project IS-Manet) and by the European Commission within the 6th Research Framework Programme in the framework of the European Satellite Communications Network of Excellence (SatNEx II, IST-27393) for the FT4 Video Streaming Authentication in satellite communications (ViStA) of the JA 2240 Security.

1550-2252/$25.00 2007 IEEE 634


[Figure 2: three blocks B1, B2, B3, labelled RSA(H(B1)), H(B2), H(B3).]

Fig. 2. Hash chain constructed on three blocks.
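The chain of Fig. 2 can be sketched in a few lines of Python. This is only an illustrative sketch under strong simplifying assumptions that are not in the paper: the watermark embedding is replaced by appending the hash bytes to the block, MPEG2 compression is omitted, and the RSA signature uses toy textbook parameters (p = 61, q = 53) that are far too small for real use. All function names are hypothetical.

```python
import hashlib

# Toy textbook-RSA key (p=61, q=53): illustration only, NOT secure.
N_MOD, E_PUB, D_PRIV = 3233, 17, 2753
HLEN = hashlib.sha1().digest_size  # 20 bytes for SHA-1


def H(data: bytes) -> bytes:
    """SHA-1 hash of a block."""
    return hashlib.sha1(data).digest()


def sign(digest: bytes) -> int:
    """Toy signature: RSA private-key operation on the reduced digest."""
    return pow(int.from_bytes(digest, "big") % N_MOD, D_PRIV, N_MOD)


def verify(sig: int, digest: bytes) -> bool:
    """Toy verification: RSA public-key operation, compared with the digest."""
    return pow(sig, E_PUB, N_MOD) == int.from_bytes(digest, "big") % N_MOD


def build_chain(blocks):
    """Sender: walk from the last block back to the first one."""
    marked = [blocks[-1]]                       # the last block carries no hash
    for block in reversed(blocks[:-1]):
        # Stand-in for watermarking: append the hash of the following block.
        marked.insert(0, block + H(marked[0]))
    return sign(H(marked[0])), marked           # single signature on the first block


def verify_chain(sig, marked):
    """Receiver: check the signature once, then one hash per block."""
    if not verify(sig, H(marked[0])):
        return False
    for cur, nxt in zip(marked, marked[1:]):
        if cur[-HLEN:] != H(nxt):               # stand-in for watermark extraction
            return False
    return True


blocks = [b"block-1 pictures", b"block-2 pictures", b"block-3 pictures"]
sig, stream = build_chain(blocks)
ok = verify_chain(sig, stream)

tampered = stream[:2] + [b"forged pictures"]    # replace the last block
bad = verify_chain(sig, tampered)
print(ok, bad)
```

The receiver only needs to keep the hash extracted from the current block in order to check the next one, which is the storage property claimed in the abstract.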

must be fragile (it disappears if someone has altered the file) to detect altered media. Secure watermarking implies that embedded information cannot be removed by targeted attacks without these modifications being detected. Moreover, the watermark must be transparent according to the properties of the human sensory system, so that a transparent watermark causes no artifacts or quality loss. Furthermore, either a private verification (by a private-key function) or a public verification (by a public-key algorithm) of the embedded watermark has to be implemented, depending on the particular application. All these characteristics should operate at a reasonable level of complexity: the overhead required to embed and retrieve a watermark should be low, especially for real-time applications such as video streaming.

II. SYSTEM MODEL

Multicast satellite networks present peculiar characteristics like high round trip time, no packet reordering and low bit error rate (BER). For instance, geostationary satellites placed in equatorial orbit at 36,000 km altitude introduce a transmission delay between transmitter and receiver of about 250 ms, while the satellite link presents a low BER of about $10^{-8}$ in the majority of cases [8]. This low BER is due to design requirements and structural constraints. Leveraging the low BER and the absence of packet reordering, we propose a novel schema for authenticating multicast video streaming that does not affect the available bandwidth. Our scenario is shown in Figure 1: the center has to distribute an uncompressed video stream through a multicast satellite network. Both transmitter and receiver can perform preprocessing, like MPEG2 coding/decoding, before the transmission and after the reception respectively. The use of the MPEG2 [9] algorithm is justified if we assume that our satellite network uses DVB-S [10].

Fig. 1. Multicast satellite scenario.

The sender preprocesses the uncompressed video stream, inserting the authentication information in it. This is achieved with a watermarking technique. The next step for the sender is first to perform MPEG2 compression and then to transmit the signal to the various receiver stations through the satellite network. It is worth noticing that using a watermarking technique to embed authentication information within the video stream avoids incurring any bandwidth overhead. Indeed, the authentication information introduces just transparent modifications to the video pictures (that is, modifications that do not affect the perceived quality of the video).

Each receiver station decompresses the video stream and then authenticates it. This goal is obtained by combining a digital signature like RSA [11] with a hash chain. Figure 2 summarises the authentication schema, where each hash is computed over the compressed version (GOP) of the corresponding block $B_x$.

This approach also amortises the computational overhead due to the single asymmetric signature over the whole video stream, trading off the overhead related to the digital signature against the overhead required by one-way hash functions, like SHA-1 [12], which are characterised by a low computational overhead.

III. THE AUTHENTICATION PROCEDURE

In this section we present a novel authentication procedure for multicast video streaming. We propose a new authentication schema based on a single digital signature like RSA [11], authenticating a one-time signature chain. The hash chain (which could be built over SHA-1 [12]) is hidden in the video stream pictures by means of the watermarking technique described in Section IV.

An MPEG2 video stream can be considered as a GOP (Group Of Pictures) sequence, as in Figure 3. A GOP is composed of three different types of pictures: I-picture (Intra), P-picture (Predictive), and B-picture (Bidirectionally predictive). Inside a GOP there is only one I-picture and one or more B- and P-pictures. Compression is computed over two different levels: the spatial and the temporal ones [9]. Spatial compression is performed only over the I-picture, while temporal compression is carried out over the B- and the P-pictures. These pictures are obtained via difference from the I-picture. A GOP, hereafter $G_i$, represents in the MPEG2 domain a well-known picture sequence (block) in the uncompressed domain.

Let $H()$ and $Signature()$ be the functions that implement a SHA-1 hash and an RSA signature respectively. The function $Mpeg2()$ computes the GOP associated with the specific input block using the MPEG2 algorithm. Finally, $W(x, y)$ inserts the data $y$ into the block of pictures $x$ according to the watermarking algorithm described in Section IV.

Table I describes in detail how the sender builds up the hash chain. The transmitter is provided with an uncompressed video stream $S$ and interprets it as a sequence of blocks of pictures

$[B_1, \ldots, B_n]$, so that each block will be codified afterwards as a GOP by the MPEG2 algorithm. The last block of the video sequence is compressed: $G_c = Mpeg2(B_n)$. The construction of the hash chain starts from the block of index $n-1$. The hash value $h_i$ is calculated over the GOP $G_c$ through the function $h_i = H(G_c)$. This value ($h_i$) is inserted into the block $B_i$ by means of the watermarking function $W(B_i, h_i)$. The watermarked block is now encoded with $G_c = Mpeg2(B_i^w)$, obtaining the GOP $G_c$ that will be the argument for the next hash calculation. The procedure is iterated over all the blocks up to the first one ($B_1$). Finally, the first GOP ($G_c$) is authenticated through a digital signature with $Signature(H(G_c))$. Note that this is the only asymmetric operation performed by the sender. The marked block sequence is now codified with MPEG2 by the function $Mpeg2([B_1^w, \ldots, B_n^w])$. Note that each block belonging to $S$ is codified into a GOP in $S_c$. Hence, each block in the uncompressed domain corresponds to a GOP in the compressed domain.

[Figure 3: the stream S as a sequence of GOPs G1, G2, G3, G4, ..., Gn-1, Gn; each GOP is a picture sequence such as I B B P B ...]
Fig. 3. Video stream as a picture sequence.

TABLE I
Construction of the single hash chain

    let S = [B_1, ..., B_n] be the uncompressed video stream block sequence;
    G_c = Mpeg2(B_n);
    for i = n-1 to 1
    {
        h_i = H(G_c);
        B_i^w = W(B_i, h_i);
        G_c = Mpeg2(B_i^w);
    }
    Sig = Signature(H(G_c));
    S_c = Mpeg2([B_1^w, ..., B_n^w]);
    Transmit(Sig, S_c);

On the receiver side, the authentication procedure works as follows. Once the receiver has stored $(Sig, S_c)$, it decodes the MPEG2 stream and extracts a GOP sequence from the compressed video stream $S_c$ via $ExtGOP()$, that is: $[G_1, \ldots, G_n] = ExtGOP(S_c)$.

Now, the receiver has to authenticate the anchor of the authentication chain. Note that this operation is the only operation based on asymmetric key cryptography on the receiver side. Hence, once the hash over the first GOP is computed, $h = H(G_1)$, this value is employed to verify the digital signature ($Sig$) with $VerSignature(Sig, h)$.

If the test succeeds, the receiver can subsequently authenticate the video stream by checking the authenticity of each block of the hash chain. The hash value $h_i$ inserted by the sender is extracted from the block $B_u$ through the watermark retrieval function $E(B_u)$. $B_u$ is obtained repeatedly from each GOP belonging to the video stream through $B_u = UnMpeg2(G_i)$. The value $h_i$ is then compared with the hash calculated by the receiver over the GOP $G_{i+1}$, that is $H(G_{i+1})$. The execution proceeds iteratively only if the test is successful for all of the blocks. If a picture is corrupted (for instance, forged or changed during the channel transmission) its hash changes, and the authentication test fails. Note that once a failure is triggered, none of the subsequent pictures can be authenticated.

Table II shows in detail the algorithm run by the receiver.

TABLE II
Verification of the authenticity of sequence (Sig, S_c)

    Receive(Sig, S_c);
    [G_1, ..., G_n] = ExtGOP(S_c);
    h = H(G_1);
    if not ( VerSignature(Sig, h) )
    {
        authentication failed;
        exit;
    }
    for i = 1 to n-1
    {
        B_u = UnMpeg2(G_i);
        h_i = E(B_u);
        if h_i ≠ H(G_{i+1})
        {
            authentication failed;
            exit;
        }
    }

Without loss of generality, we assumed that the sequence $(Sig, S_c)$ is received correctly. While this assumption is acceptable due to the low BER of the satellite link, it should be noted that if message corruption should become an issue, it could be tackled by considering a redundant hash chain or by using FEC [13]. For instance, robustness could be obtained by embedding into the same block more than one hash, calculated over several GOPs.

IV. THE EMBEDDING PROCEDURE

A. Embedding information as watermark

In this section we describe the technique used to embed the hash chain into the video stream. In particular, we describe the functions $W()$ and $E()$ used in the previous section.

Note that our use of watermarking techniques is not standard; classical watermarking is used to embed a logo (generally a picture) within a video stream. This embedding is hidden in the video and it can be extracted to witness the ownership of the video. It is acceptable that the extraction

could be lossy, that is, the logo can be extracted with errors, provided it remains recognisable. Instead, in our case, we use watermarking to embed the result of a hash computation. The extraction of the hash must not be affected by errors, otherwise the hash chain would break and the receiver could not authenticate the remainder of the video stream. For this reason, in this section we propose an embedding technique that provides error-free hash extraction.

An uncompressed video stream can be considered as a three-dimensional signal, whose components are a bi-dimensional matrix (the picture) and the time. Each picture is formed by a $w \times h$ matrix of pixels; each pixel expresses a colour that can be represented by three values: R (red), G (green), B (blue) or, equivalently, by Y (luminance) and U, V (chrominance) components. In our algorithm, we consider the Y-U-V representation, and we embed a mark (hash) in each picture of the video stream on its luminance component (Y).

We use a Fourier domain method based on the DCT (discrete cosine transform). This method assures robustness with regard to various types of compression methods like MPEG2 or MPEG4 [14], [15]. This procedure works by modifying the image frequency domain coefficients; it thus has a minimal impact in the spatial domain of the picture.

Let us consider $B$ as a block of pictures and $h$ the hash value that should be embedded in each picture belonging to the block $B$. Note that, due to specific features of the mark extraction (discussed in Section IV-B), the mark must be represented as a sequence of $\{-1, 1\}$. For this reason $h$ (which is a sequence of $\{0, 1\}$) is first simply converted into a sequence of $\{-1, 1\}$. Then, for each picture $p_i$ (with $1 \le i \le m$) in $B$, the mark $h$ is embedded in $p_i$ using the following equations:

$$P_i = \mathrm{DCT}\{p_i(Y)\} \qquad (1)$$

$$P_i^{H} = P_i^{HF} + \alpha\,|P_i^{HF}|\,n_i\,h^{W} \qquad (2)$$

Equation 1 computes the DCT transformation over the luminance (Y) component $p_i(Y)$, obtaining $P_i$. Equation 2 embeds $h^W$, a repeated and reshaped version of the vector $h$, in the high frequencies of $P_i$, denoted $P_i^{HF}$. To this end, we define the parameters $\alpha$, an amplification factor, and $n_i$, a pseudo-noise sequence which is changed for each picture. In particular, $n_i$ is a matrix of binary ($\{-1, 1\}$) pseudo-noise sequences.

The spread sequence $n_i h$ is multiplied by an amplification factor ($\alpha$), replicated according to a replication factor, and finally reshaped to a matrix of the same dimensions as the picture.

The mark is scaled by the luminance frequency coefficients, so that it is spread proportionally to the DCT coefficients. Finally, the resulting sequence is added to the highest frequency coefficients $P_i^{HF}$ of the DCT-transformed picture $P_i$. The pseudo code of the embedding algorithm is shown in Table III.

TABLE III
Function W() embedding a hash value into video stream pictures

    let B be a video block belonging to the stream S;
    let h be the hash value to be embedded in B;
    let p_1, ..., p_m be the pictures belonging to B;
    let α be an amplification factor;
    let PRSG() be a pseudo random sequence generator algorithm;
    hRep = [h, h, ..., h]                      // concatenation of the h replications
    hRepM = Reshape hRep to a square matrix
    h^W = [ 0  0 ; 0  hRepM ]                  // place data into the high frequency DCT domain
    for i = 1 to m
    {
        n_i = PRSG()
        P_i = DCT{p_i(Y)}
        P_i^H = P_i^HF + α |P_i^HF| n_i h^W
        p_i(Y) = DCT^-1{P_i^H(Y)}
    }

TABLE IV
Function E() retrieving the hash value from a block of video pictures

    let B be a video block belonging to the stream S;
    let p_1, ..., p_m be the pictures belonging to B;
    let PRSG() be a pseudo random sequence generator algorithm;
    let SIGN{} be the mathematical signum function;
    x = 0
    for i = 1 to m
    {
        n_i = PRSG()
        P_i = DCT{p_i(Y)} n_i
        P_i^HF = Extract high frequency from P_i
        P_i^HFV = Reshape P_i^HF to a vector
        // consider P_i^HFV as the replicas [h, ..., h_j, ..., h]
        h_i = Σ_j h_j
        x = x + h_i
    }
    h = SIGN{x}

B. Retrieval of information

Since the mark has been embedded in each picture of the block, the receiver extracts the mark from each picture. This is necessary because the extraction from a single picture is, in general, affected by errors. Thus, the mark is obtained by combining the information extracted from all the pictures in the block.

Given a picture $p_i$ in an incoming block, the receiver performs a DCT transform over the luminance component (Y) of the picture, denoted $P_i$. Then, it extracts the mark from the high frequencies of $P_i$ (which are the frequencies where the mark was embedded). This is achieved using the following equations:

$$P_i^{H}\,n_i = P_i^{HF}\,n_i + \alpha\,|P_i^{HF}|\,n_i\,h^{W}\,n_i$$
$$\phantom{P_i^{H}\,n_i} = P_i^{HF}\,n_i + \alpha\,|P_i^{HF}|\,h^{W}$$
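The embedding of Equation 2 and the extraction of Table IV can be tried out with a small, self-contained Python sketch. It is not the authors' implementation and rests on several assumptions made only for illustration: pictures are 8x8 matrices of random luminance values, the "high frequencies" are taken to be the bottom-right 4x4 block of DCT coefficients, the amplification factor is set to an exaggerated 0.5, Python's `random.Random` plays the role of PRSG(), and neither MPEG2 compression nor pixel quantization is modelled.

```python
import math
import random

N = 8        # toy picture side; real frames are far larger
ALPHA = 0.5  # amplification factor, exaggerated for this toy demo
# Assumed high-frequency region: bottom-right 4x4 block of DCT coefficients.
POS = [(u, v) for u in range(4, 8) for v in range(4, 8)]

# Orthonormal DCT-II matrix: C[k][j] = s(k) * cos(pi * (2j + 1) * k / (2N))
C = [[math.sqrt((1 if k == 0 else 2) / N)
      * math.cos(math.pi * (2 * j + 1) * k / (2 * N))
      for j in range(N)] for k in range(N)]
CT = [list(row) for row in zip(*C)]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(N)) for j in range(N)]
            for i in range(N)]


def dct2(x):
    """2-D DCT of a picture: P = C x C^T."""
    return matmul(matmul(C, x), CT)


def idct2(p):
    """Inverse 2-D DCT: x = C^T P C."""
    return matmul(matmul(CT, p), C)


def embed(pictures, bits, seed):
    """Embed the bit sequence in every picture: P^H = P + alpha*|P|*n*h."""
    mark = [2 * b - 1 for b in bits]             # {0,1} -> {-1,+1}
    rng = random.Random(seed)                    # stand-in for PRSG()
    out = []
    for pic in pictures:
        P = dct2(pic)
        for k, (u, v) in enumerate(POS):
            n = rng.choice((-1, 1))              # pseudo-noise value
            P[u][v] += ALPHA * abs(P[u][v]) * n * mark[k % len(mark)]
        out.append(idct2(P))
    return out


def extract(pictures, nbits, seed):
    """Despread with the same noise, sum over replicas and pictures, take the sign."""
    rng = random.Random(seed)                    # same seed, same noise sequence
    x = [0.0] * nbits
    for pic in pictures:
        P = dct2(pic)
        for k, (u, v) in enumerate(POS):
            n = rng.choice((-1, 1))
            x[k % nbits] += P[u][v] * n
    return [1 if s > 0 else 0 for s in x]


pix = random.Random(1)
block = [[[pix.uniform(0, 255) for _ in range(N)] for _ in range(N)]
         for _ in range(200)]
bits = [1, 0, 1, 1, 0, 0, 1, 0]                  # toy 8-bit "hash"
marked = embed(block, bits, seed=42)
recovered = extract(marked, len(bits), seed=42)
print(recovered == bits)
```

With 200 pictures and two replicas of each bit per picture, every bit is decided from 400 despread coefficients; using far fewer pictures makes wrong bits likely, which matches the remark that extraction from a single picture is in general affected by errors.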

Let $h_i(j)$ be the $j$-th replica of the hash inside picture $i$ and $\mathit{Red}$ be the number of hash replications in each picture. The receiver combines the marks extracted from each picture using the following equation:

$$\sum_{i=1}^{m} \sum_{j=1}^{\mathit{Red}} \left\{ P_i(j)\,n_i(j) + \alpha\,|P_i(j)|\,h_i(j) \right\} \qquad (3)$$

where the inner sum accumulates the mark replicated within a given picture, and the outer sum accumulates all the marks extracted from all the pictures in the block. The choice of the number of pictures to be embedded in a block is essential: if the number of pictures is too small, the introduced mark redundancy could be insufficient and the extraction could fail. Block dimensioning is an essential operation that must be done by the sender in the preprocessing phase of the video streaming.

Since the first term in Equation 3, $P_i(j)\,n_i(j)$, expresses uncorrelated signals, their sum tends to 0. Thus, if the number of pictures in a block is sufficiently high, Equation 3 becomes:

$$\sum_{i=1}^{m} \sum_{j=1}^{\mathit{Red}} \left\{ \alpha\,|P_i(j)|\,h_i(j) \right\}. \qquad (4)$$

Thus, mark extraction is performed over Equation 4 using the signum function as:

$$\mathrm{sign}\left\{ \sum_{i=1}^{m} \sum_{j=1}^{\mathit{Red}} \alpha\,|P_i(j)|\,h_i(j) \right\} = \mathrm{sign}\left\{ \sum_{i=1}^{m} \sum_{j=1}^{\mathit{Red}} \alpha\,|P_i(j)| \right\} h,$$

but

$$\mathrm{sign}\left\{ \sum_{i=1}^{m} \sum_{j=1}^{\mathit{Red}} \alpha\,|P_i(j)| \right\} = 1,$$

so the extracted value coincides with $h$. The pseudo code of the extraction algorithm is shown in Table IV.

V. CONCLUSIONS

We have presented a novel scheme for authenticating video streaming in a multicast satellite network. The proposed scheme leverages peculiar features of the satellite link, like low BER and no packet reordering. In particular, this work shows how it is possible to use watermarking to embed security-related information. The security properties enforced are authentication and integrity of the multicast stream. These features are achieved using just one digital signature, combined with a pre-computed hash chain. In particular, we embed the hash chain inside the video by means of a watermarking technique. This solution provides several benefits. On the sender side: the computation overhead amounts to a digital signature for the whole stream and a hash per block; as for bandwidth, security-related information does not consume this resource, since these data are inserted in the video stream pictures through negligible modifications (at least, not recognizable by humans). On the receiver side: authentication is performed on a per-block basis; no extra storage is required on the receiver, except for the storing of a hash value; the computational overhead on the receiver side is limited to one digital signature verification, which is amortized over the whole multicast stream, and one extra hash computation per block.

Future work will aim to extend the proposed scheme to provide resilience to increasingly lossy channels.

REFERENCES

[1] F. Lefebvre, J. Czyz, and B. M. Macq, "A robust soft hash algorithm for digital image signature," in ICIP (2), 2003, pp. 495-498.
[2] O. Goldreich, "Two remarks concerning the Goldwasser-Micali-Rivest signature scheme," in CRYPTO, 1986, pp. 104-110.
[3] Digital Signature Standard (DSS). Washington: National Institute of Standards and Technology, 2000, Federal Information Processing Standard 186-2.
[4] R. Rivest, "The MD5 message-digest algorithm," RFC 1321, 1992.
[5] National Bureau of Standards, "Digital signature standard," National Bureau of Standards, Tech. Rep. FIPS Publication 186, 1994.
[6] J. Dittmann and F. Nack, "Copyright-copywrong," IEEE MultiMedia, vol. 07, no. 4, pp. 14-17, 2000.
[7] F. Benedetto, G. Giunta, and A. Neri, "End-to-end quality of service assessment in mobile applications for user-centric multimedia networks by tracing watermarking," in IEEE 12th Symp. on Communication and Vehicular Technology, Twente, Netherlands, 2005.
[8] A. Annese, P. Barsocchi, N. Celandroni, and E. Ferro, "Performance evaluation of UDP multimedia traffic flows in satellite-WLAN integrated paths," in Proceedings of the 11th Ka and Broadband Communications Conference, Sept. 2005, pp. 267-275.
[9] D. L. Gall, "MPEG: a video compression standard for multimedia applications," Commun. ACM, vol. 34, no. 4, pp. 46-58, 1991.
[10] R. de Bruin and J. Smits, Digital video broadcasting: technology, standards, and regulations. Norwood, MA, USA: Artech House, Inc., 1999.
[11] R. L. Rivest, A. Shamir, and L. M. Adelman, "A method for obtaining digital signatures and public-key cryptosystems," Tech. Rep. MIT/LCS/TM-82, 1977.
[12] D. Eastlake 3rd and P. Jones, "US Secure Hash Algorithm 1 (SHA1)," RFC 3174, Sept. 2001.
[13] M. Luby, L. Vicisano, J. Gemmell, L. Rizzo, M. Handley, and J. Crowcroft, "Forward error correction (FEC) building block," United States, 2002.
[14] M. Barni, F. Bartolini, and N. Checcacci, "Watermarking of MPEG-4 video objects," IEEE Transactions on Multimedia, vol. 7, no. 1, pp. 23-32, 2005.
[15] S. Biswas, S. R. Das, and E. M. Petriu, "An adaptive compressed MPEG-2 video watermarking scheme," IEEE Transactions on Instrumentation and Measurement, vol. 54, no. 5, pp. 1853-1861, 2005.

