Imaging For Forensics PDF

Springer Series on
S IGNALS AND C OMMUNICATION T ECHNOLOGY

Imaging for Forensics and Security: Human Factors and Voice Interactive Systems,
From Theory to Practice Second Edition
A. Bouridane D. Gardner-Bonneau and H. Blanchard
ISBN 978-0-387-09531-8 ISBN 978-0-387-25482-1
Multimedia Content Analysis: Theory Wireless Communications: 2007 CNIT
and Applications Thyrrenian Symposium
A. Divakaran (Ed.) S. Pupolin
ISBN 978-0-387-76567-9 ISBN 978-0-387-73824-6
Grid Enabled Remote Instrumentation Adaptive Nonlinear System Identification:
F. Davoli, N. Meyer, R. Pugliese, S. Zappatore The Volterra and Wiener Model Approaches
ISBN 978-0-387-09662-9 T. Ogunfunmi
ISBN 978-0-387-26328-1
Usability of Speech Dialog Systems
T. Hempel Wireless Network Security
ISBN 978-3-540-78342-8 Y. Xiao, X. Shen, and D.Z. Du (Eds.)
ISBN 978-0-387-28040-0
Handover in DVB-H
X. Yang Satellite Communications and Navigation
ISBN 978-3-540-78629-0 Systems
E. Del Re and M. Ruggieri
Multimodal User Interfaces ISBN: 0-387-47522-2
D. Tzovaras (Ed.)
ISBN 978-3-540-78344-2 Wireless Ad Hoc and Sensor Networks
A Cross-Layer Design Perspective
Wireless Sensor Networks and Applications R. Jurdak
Y. Li, M.T. Thai, W. Wu (Eds.) ISBN 0-387-39022-7
ISBN 978-0-387-49591-0
Cryptographic Algorithms on Reconfigurable
Passive Eye Monitoring Hardware
R.I. Hammoud (Ed.) F. Rodriguez-Henriquez, N.A. Saqib, A. Dı́az
ISBN 978-3-540-75411-4 Pérez, and C.K. Koc
Digital Signal Processing ISBN 0-387-33956-6
S. Engelberg Multimedia Database Retrieval
ISBN 978-1-84800-118-3 A Human-Centered Approach
Digital Video and Audio Broadcasting P. Muneesawang and L. Guan
Technology ISBN 0-387-25627-X
W. Fischer Broadband Fixed Wireless Access
ISBN 978-3-540-76357-4 A System Perspective
Satellite Communications and Navigation M. Engels and F. Petre
Systems ISBN 0-387-33956-6
E. Del Re, M. Ruggieri (Eds.) Distributed Cooperative Laboratories
ISBN 978-0-387-47522-6 Networking, Instrumentation,
Three-Dimensional Television and Measurements
H.M. Ozaktas, L. Onural (Eds.) F. Davoli, S. Palazzo
ISBN 978-3-540-72531-2 and S. Zappatore (Eds.)
ISBN 0-387-29811-8
Foundations and Applications of Sensor
Management The Variational Bayes Method in Signal
A.O. Hero III, D. Castañón, D. Cochran, Processing
and K. Kastella (Eds.) V. Šmı́dl and A. Quinn
ISBN 978-0-387-27892-6 ISBN 3-540-28819-8
(continued after index)

Ahmed Bouridane
Imaging for Forensics

and Security
From Theory to Practice
123
Ahmed Bouridane
Queen’s University, Belfast
Department of Computer Science
Faculty Engineering
Belfast
United Kingdom BT7 1NN
a.bouridane@qub.ac.uk
ISSN 1860-4862
ISBN 978-0-387-09531-8 e-ISBN 978-0-387-09532-5
DOI 10.1007/978-0-387-09532-5
Springer Dordrecht Heidelberg London New York
Library of Congress Control Number: 2009927770
c Springer Science+Business Media, LLC 2009

All rights reserved. This work may not be translated or copied in whole or in part without the written
permission of the publisher (Springer Science+Business Media, LLC, 233 Spring Street, New York,
NY 10013, USA), except for brief excerpts in connection with reviews or scholarly analysis. Use in
connection with any form of information storage and retrieval, electronic adaptation, computer
software, or by similar or dissimilar methodology now known or hereafter developed is forbidden.
The use in this publication of trade names, trademarks, service marks, and similar terms, even if
they are not identified as such, is not to be taken as an expression of opinion as to whether or not
they are subject to proprietary rights.
While the advice and information in this book are believed to be true and accurate at the date of
going to press, neither the authors nor the editors nor the publisher can accept any legal responsi-
bility for any errors or omissions that may be made. The publisher makes no warranty, express or
implied, with respect to the material contained herein.
Printed on acid-free paper
Springer is part of Springer Science+Business Media (www.springer.com)

To my Wife Saida, Daughters Asma and
Alaa, and Son Abbudi
For their love and support
Preface
The field of security has witnessed an explosive growth during the last years, as
phenomenal advances in both research and applications have been made. Biometric
and forensic imaging applications often involve photographs, videos and other im-
age impressions that are fragile and include subtle details that are difficult to see. As
a developer, one needs to be able to quickly develop sophisticated imaging applica-
tions that allow for an accurate extraction of precious information from image data
for identification and recognition purposes. This is true for any type of biometric
and forensic image data.
The applications covered in this book relate to Biometrics, Watermarking and
Shoeprint recognition for forensic science. Image processing transforms using Dis-
crete Fourier Transform, Discrete Wavelet Transforms Gabor Wavelets, Complex
Wavelets, Scale Invariant Feature Transforms and Directional Filter banks are used
in data modelling process for either feature extraction or data hiding tasks. The
emphasis is on the methods and the analysis of data sets including comparative
studies against existing and similar techniques. To make the underlying methods
accessible to a wider audience, we have stated some of the key mathematical results
given in a logical structure of the development.
For example, biometric based methods are emerging as the most reliable solu-
tions for authentication and identification applications where traditional passwords
(knowledge-based security) and ID cards (token-based security) have been used so
far to access restricted systems. Automated biometrics deal with physiological or
behavioural characteristics such as fingerprints, iris, voice and face that can be used
to authenticate a person’s identity or establish an identity within a database. With
rapid progress in electronic and Internet commerce, there is also a growing need
to authenticate the identity of a person for secure transaction processing. Current
biometric systems make use of fingerprints, hand geometry, iris, retina, face, facial
thermograms, signature gait, and voiceprint to establish a person’s identity. While
biometric systems have their limitations they have an edge over traditional security
methods in that they cannot be easily stolen or shared. Besides bolstering security,
biometric systems also enhance user convenience by alleviating the need to design
and remember passwords.
Driven by the urgent need to protect digital media content that is being widely
and wildly distributed and shared through the Internet by an ever-increasing number
vii
viii Preface
of users, the field of digital watermarking has witnessed an extremely fast-growing

development since its inception almost a decade ago. The main purpose of dig-
ital watermarking, information embedding, and data hiding systems is to embed
auxiliary information, usually called digital watermarks, inside a host signal (audio,
image, and video) by introducing small and minor perturbations into the host signal.
The quality of the host signal should not be degraded unacceptably and the intro-
duced changes lie below the minimum perception threshold of the intended recipi-
ent. Watermark detection and extraction from the composite host signal should be
possible in the presence of a variety of intentional and unintentional manipulations
and attacks. It is obvious that these attacks and manipulations do not corrupt the
composite host signal at an unacceptable level.
Watermarking systems are expected to play an important role in meeting at least
two major challenges that resulted from the widespread use of Internet for the
distribution and exchange of digital media: (i) error-free perfect copies of digital
multimedia and (ii) availability of free and affordable tools for the manipulation
and alteration of digital content. The first challenge was the driving force that led
the combined efforts of academic and industrial research to produce first-generation
watermarking algorithms. These algorithms were mainly concerned with the “copy-
right protection” of the digital content. For instance, illegal distribution and copying
of digital music is causing the music industry massive gain losses. The second chal-
lenge has guided the research efforts to develop what are so-called “tamper-proof”
or “fragile” watermarking algorithms. This class of watermarking schemes aims at
detecting any “intentional” manipulation or corruption of the media.
Following the emergence and success of forensic science as a powerful and
irrefutable tool for solving many enigmatic crime puzzles, images collected from
crime scenes are abounding and, therefore, large image collections are being cre-
ated. Shoeprint images are no exception and it has been indicated, recently, that
shoeprint evidence, at crime scenes, is more frequently present than fingerprints.
Very recently, it has been suggested that shoeprint evidence should be made com-
parable to that of fingerprint and DNA evidence. It is also true that shoeprint in-
telligence remains an untapped potential forensic source (usually overshadowed by
the accepted fingerprint and DNA evidence). However, there is no practical tech-
nology to efficiently search shoeprint on large databases. Existing commercial sys-
tems still require manual involvement (manual annotation of both the impression
under investigation and the primary database). The task of automated scenemark
matching is a tedious one and researching the use of existing image processing
and pattern recognition techniques is desired before an underpinning technology
is developed.
One of the most distinctive features of the book is that it covers in detail a number
of imaging applications and their deployment in security problems. In addition, the
book appeals to both undergraduate and postgraduate students since each applica-
tion problem includes a detailed description of the mathematical background and its
implementation.
Most of the material of the book is derived from very recent research output gen-
erated by various researchers at doctoral level under the supervision of the author.
Preface ix
This brings some novelty of the topics through a thorough analysis of the results
of the implementation. My indebtedness to those students, in particular W R Bouk-
abou, M Gueham, M Laadjel, M Nabti, O Nibouche, I Thompson, H Su, K Zebbiche
and A Baig of the Speech, Image and Vision Systems (SIVS) group at the School
of Electronics, Electrical Engineering and Computer Science, Queen’s University
Belfast.
The book is organised as follows. Chapter 1 starts by defining the biometric tech-
nology including the characteristics required for a viable deployment using various
operation modes such as verification, identification and watch-list. A number of
currently used biometric modalities are also described with some emphasis of few
emerging ones. Then the various steps of a typical biometric recognition system
are discussed in detail. For example, data acquisition, image localisation, feature
extraction and matching are all defined and the current methods employed for their
implementation and deployment discussed and contrasted. The chapter concludes
by briefly highlighting the need to use appropriate datasets for the evaluation of a
biometric system.
Chapter 2 introduces the notion of data representation in the context of biomet-
rics. The various stages of a typical biometric system are also enumerated and dis-
cussed and the most commonly deployed biometric modalities are stated. The chap-
ter also examines various aspects related to image data representation and modelling
for feature extraction and matching. Various methods are then briefly discussed and
brought within the context of a biometric system. For example, image data formats,
feature sets and system testing and performance evaluation metrics are detailed.
In Chapter 3 recent advances in enhancing the performance of face recognition
using the concept of directional filter banks is discussed. In this context, the di-
rectional filter banks are investigated as pre-processing phase in order to improve
the recognition rates of a number of different and existing algorithms. The chapter
starts by reviewing the basic face recognition principles and enumerates the various
steps of a face recognition system. Four algorithms representing both Component
and Discriminant Analysis approaches, namely: PCA, ICA (FastICA), LDA and
SDA are chosen for their proven popularity and efficiency to demonstrate the use-
fulness of the directional filter bank method. The mathematical models behind these
approaches are also detailed. Then the proposed directional filter bank method is
described and its implementation discussed. The results and their analysis are finally
assessed using two well known face databases.
Chapter 4 is concerned with recent advances in iris recognition using a mutiscale
approach. State of the art works in the area is first highlighted and discussed and a
detailed review of the various steps of an automatic iris recognition system enumer-
ated. Proposed developments are then detailed for both iris localisation and classifi-
cation using an integrated multiscale wavelet approach. Extensive experimentation
is carried out and a comparative analysis with some state of the art approaches
given. The chapter concludes by giving some future directions to further enhance
the results obtained.
In chapter 5, the use of complex wavelets for image and video watermarking is
described. The theory of complex wavelets and their features are first highlighted.
x Preface
The concept of spread transform watermarking is then given in detail and its combi-
nation with the complex wavelet transforms detailed. Information theoretic capacity
analysis for watermarking with complex wavelets is then elucidated. The chapter
concludes with some experiments and their analysis to demonstrate the improved
levels of capacity that can be achieved through the superior feature representation
offered by complex wavelet transforms.
Chapter 6 discusses the problem of one-bit watermark detection for protect-
ing fingerprint images. Such a problem is theoretically formulated based on the
maximum-likelihood scheme, which requires an accurate modeling of the host data.
The watermarking is applied into the Discrete Wavelet Transform (DWT) due to the
vavious advantages provided by this transform. First, a statistical study of DWT co-
efficients is carried out by investigating and comparing three distributions, namely,
the generalized Gaussian, Laplacian and Cauchy models. Then, the performances
of the detectors based on these models are assessed and evaluated through extensive
experiments. The results show that the generalized Gaussian is the best model and
its corresponding detector yields the best detection performance.
Chapter 7 is intended to introduce the emerging shoemark evidence for forensic
use. It starts by giving a detailed background of the contribution of shoemark data to
scene of crime officers including a discussion of the methods currently in use to col-
lect shoeprint data. Methods for the collection of shoemarks will also be detailed and
problems associated with each method highlighted. In addition, the chapter gives a
detailed review of existing shoemark classification systems.
In Chapter 8, methods for automatically classifying shoeprints for use in forensic
science are presented. In particular, we propose two correlation based approaches
to classify low quality shoeprints: i) Phase-Only Correlation (POC) which can be
considered as a matched filter, and ii) Advanced Correlation Filters (ACFs). These
techniques offer two primary advantages: the ability to match low quality shoeprints
and translation invariance. Experiments were conducted on a database of images of
100 different shoes available on the market. For the experimental evaluation, chal-
lenging test images including partial shoeprints with different distortions (such as
noise addition, blurring and in-plane rotation) were generated. Results have shown
that the proposed correlation based methods are very practical and provide high
performance when processing low quality partial-prints.
Chapter 9 is concerned with the retrieval of scene-of-crime (or scene) shoeprint
images from a reference database of shoeprint images by using a new local feature
detector and an improved local feature descriptor. Similar to most other local feature
representations, the proposed approach can also be divided into two stages: (i) a set
of distinctive local features is selected by first detecting scale adaptive Harris corners
where each corner is associated with a scale factor. This allows for the selection of
the final features whose scale matches the scale of blob-like structures around them
and (ii) for each feature, an improved Scale Invariant Feature Transform (SIFT)
descriptor is computed to represent it. Our investigation has led to the development
of two novel methods which are referred to as the Modified Harris-Laplace (MHL)
detector and the Modified SIFT descriptor, respectively.
Preface xi
Contributions:
Chapter 2: “Data Representation and Analysis”

A. Baig and A. Bouridane
Chapter 3: “Improving Face Recognition Using Directional Faces”

W. R. Boukabou and A. Bouridane
Chapter 4: “Recent Advances in Iris Recognition: A Multiscale Approach”

M. Nabti and A. Bouridane
Chapter 5: “Spread Transform Watermarking Using Complex Wavelets”

I. Thompson and A. Bouridane
Chapter 6: “Protection of Fingerprint Data Using Watermarking”

K. Zebbiche and A. Bouridane
Chapter 7: “Shoemark Recognition for Forensic Science: An Emerging

Technology”
H. Su and A. Bouridane
Chapter 8: “Techniques for Automatic Shoeprint Classification”

M. Gueham and A. Bouridane
Chapter 9: “Automatic Shoeprint Image Retrieval Using Local Features”

H. Su and A. Bouridane
Belfast, United Kingdom, 2008 Ahmed Bouridane

Contents
1 Introduction and Preliminaries on Biometrics

and Forensics Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Definition of Biometrics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2.1 Biometric Characteristics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.2.2 Biometric Modalities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.3 Recognition/Verification/Watch-List . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.3.1 Verification: Am I Who I Claim to Be? . . . . . . . . . . . . . . . . . . 5
1.3.2 Recognition: Who Am I? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.3.3 The Watch-List: Are You Looking for Me? . . . . . . . . . . . . . . . 6
1.4 Steps of a Typical Biometric Recognition Application . . . . . . . . . . . . . 6
1.4.1 Biometric Data Localisation . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.4.2 Normalisation and Pre-processing . . . . . . . . . . . . . . . . . . . . . . 7
1.4.3 Feature Extraction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
1.4.4 Matching . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
1.4.5 Databases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
1.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
2 Data Representation and Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11

2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.2 Data Acquisition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
2.2.1 Sensor Module . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.2.2 Data Storage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.3 Feature Extraction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
2.4 Matcher . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.5 System Testing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
2.6 Performance Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
2.7 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
3 Improving Face Recognition Using Directional Faces . . . . . . . . . . . . . . . . 21

3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
xiii
xiv Contents
3.2 Face Recognition Basics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22

3.2.1 Recognition/Verification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
3.2.2 Steps of a Typical Face Recognition Application . . . . . . . . . . 23
3.3 Previous Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
3.3.1 Principal Component Analysis (PCA) . . . . . . . . . . . . . . . . . . . 26
3.3.2 Independent Component Analysis (ICA) . . . . . . . . . . . . . . . . . 27
3.3.3 Linear Discriminant Analysis (LDA) . . . . . . . . . . . . . . . . . . . . 28
3.3.4 Subspace Discriminant Analysis (SDA) . . . . . . . . . . . . . . . . . . 29
3.4 Face Recognition Using Filter Banks . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
3.4.1 Gabor Filter Bank . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
3.4.2 Directional Filter Bank: A Review . . . . . . . . . . . . . . . . . . . . . . 33
3.5 Proposed Method and Results Analysis . . . . . . . . . . . . . . . . . . . . . . . . . 37
3.5.1 Proposed Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
3.5.2 PCA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
3.5.3 ICA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
3.5.4 LDA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
3.5.5 SDA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
3.5.6 FERET Database Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
3.6 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
4 Recent Advances in Iris Recognition: A Multiscale Approach . . . . . . . . 49

4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
4.2 Related Work: A Review . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
4.3 Iris Localisation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
4.3.1 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
4.3.2 Iris Segmentation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
4.3.3 Existing Methods for Iris Localisation . . . . . . . . . . . . . . . . . . . 53
4.4 Proposed Method for Iris Localisation . . . . . . . . . . . . . . . . . . . . . . . . . . 55
4.4.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
4.4.2 The Multiscale Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
4.4.3 Results and Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
4.5 Texture Analysis and Feature Extraction . . . . . . . . . . . . . . . . . . . . . . . . 67
4.5.1 Wavelet Maxima Components . . . . . . . . . . . . . . . . . . . . . . . . . . 68
4.5.2 Special Gabor Filter Bank . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
4.5.3 Proposed Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
4.6 Matching . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
4.7 Experimental Results and Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
4.7.1 Database . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
4.7.2 Combined Multiresolution Feature Extraction Techniques . . 72
4.7.3 Template Computation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
4.7.4 Comparison with Existing Methods . . . . . . . . . . . . . . . . . . . . . 73
4.8 Discussion and Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
4.9 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
Contents xv
5 Spread Transform Watermarking Using Complex Wavelets . . . . . . . . . . 79

5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
5.2 Wavelet Transforms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
5.2.1 Dual Tree Complex Wavelet Transform . . . . . . . . . . . . . . . . . . 80
5.2.2 Non-redundant Complex Wavelet Transform . . . . . . . . . . . . . 83
5.3 Visual Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
5.3.1 Chou’s Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
5.3.2 Loo’s Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
5.3.3 Hybrid Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
5.4 Watermarking as Communication with Side Information . . . . . . . . . . 94
5.4.1 Quantisation Index Modulation . . . . . . . . . . . . . . . . . . . . . . . . . 96
5.4.2 Spread Transform Watermarking . . . . . . . . . . . . . . . . . . . . . . . 97
5.5 Proposed Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
5.5.1 Encoding of Watermark . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
5.5.2 Decoding of Watermark . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
5.6 Information Theoretic Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
5.6.1 Decoding of Watermark . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
5.6.2 Parallel Gaussian Channels . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
5.6.3 Watermarking Game . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
5.6.4 Non-iid Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
5.6.5 Fixed Embedding Strategies . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
5.7 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
6 Protection of Fingerprint Data Using Watermarking . . . . . . . . . . . . . . . . 117

6.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117
6.2 Generic Watermarking System . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119
6.3 State-of-the-Art . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123
6.4 Optimum Watermark Detection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124
6.5 Statistical Data Modelling and Application to Watermark
Detection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127
6.5.1 Laplacian and Generalised Gaussian Models . . . . . . . . . . . . . 128
6.5.2 Alpha Stable Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129
6.6 Experimental Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130
6.6.1 Experimental Modelling of DWT Coefficients . . . . . . . . . . . . 132
6.6.2 Experimental Watermarking Results . . . . . . . . . . . . . . . . . . . . 135
6.7 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 138
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139
7 Shoemark Recognition for Forensic Science: An Emerging

Technology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143
7.1 Background to the Problem of Shoemark Forensic Evidence . . . . . . . 143
7.1.1 Applications of a Shoemark in Forensic Science . . . . . . . . . . 144
7.1.2 The Need for Automating Shoemark Classification . . . . . . . . 146
7.1.3 Inconsistent Classification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 147
xvi Contents
7.1.4 Importable Classification Schema . . . . . . . . . . . . . . . . . . . . . . . 148

7.1.5 Shoemark Processing Time Restrictions . . . . . . . . . . . . . . . . . 149
7.2 Collection of Shoemarks at Crime Scenes . . . . . . . . . . . . . . . . . . . . . . . 149
7.2.1 Shoemark Collection Procedures . . . . . . . . . . . . . . . . . . . . . . . 150
7.2.2 Transfer/Contact Shoemarks . . . . . . . . . . . . . . . . . . . . . . . . . . . 150
7.2.3 Photography of Shoemarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151
7.2.4 Making Casts of Shoemarks . . . . . . . . . . . . . . . . . . . . . . . . . . . 152
7.2.5 Gelatine Lifting of Shoemarks . . . . . . . . . . . . . . . . . . . . . . . . . 153
7.2.6 Electrostatic Lifting of Shoemarks . . . . . . . . . . . . . . . . . . . . . . 153
7.2.7 Recovery of Shoemarks from Snow . . . . . . . . . . . . . . . . . . . . . 154
7.2.8 Recovery of Shoemarks using Perfect Shoemark Scan . . . . . . 154
7.2.9 Making a Cast of a Shoemark Directly
from a Suspect’s Shoe . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 155
7.2.10 Processing of Shoemarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 155
7.2.11 Entering Data into a Computerised System . . . . . . . . . . . . . . . 157
7.3 Typical Methods for Shoemark Recognition . . . . . . . . . . . . . . . . . . . . . 157
7.3.1 Feature-Based Classification . . . . . . . . . . . . . . . . . . . . . . . . . . . 158
7.3.2 Classification Based on Accidental Characteristics . . . . . . . . . 159
7.4 Review of Shoemark Classfication Systems . . . . . . . . . . . . . . . . . . . . . 160
7.4.1 SHOE-FIT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 160
7.4.2 SHOE© . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 160
7.4.3 Alexandre’s System . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 161
7.4.4 REBEZO . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 161
7.4.5 TREADMARK TM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 162
7.4.6 SICAR . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 162
7.4.7 SmART . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 162
7.4.8 De Chazal’s System . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 163
7.4.9 Zhang’s System . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 163
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 163
8 Techniques for Automatic Shoeprint Classification . . . . . . . . . . . . . . . . . . 165

8.1 Current Approaches . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 165
8.2 Using Phase-Only Correlation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 166
8.2.1 The POC Function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 166
8.2.2 Translation and Brightness Properties
of the POC Function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 168
8.2.3 The Proposed Phase-Based Method . . . . . . . . . . . . . . . . . . . . . 168
8.2.4 Experimental Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 170
8.3 Deployment of ACFs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 172
8.3.1 Shoeprint Classification Using ACFs . . . . . . . . . . . . . . . . . . . . 173
8.3.2 Matching Metrics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 175
8.3.3 Optimum Trade-Off Synthetic Discriminant
Function Filter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 176
8.3.4 Unconstrained OTSDF Filter . . . . . . . . . . . . . . . . . . . . . . . . . . . 177
8.3.5 Tests and Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 178
Contents xvii
8.4 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 179

References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 180
9 Automatic Shoeprint Image Retrieval Using Local Features . . . . . . . . . . 181

9.1 Motivations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 181
9.2 Local Image Features . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 181
9.2.1 New Local Feature Detector: Modified Harris–Laplace
Detector . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 182
9.2.2 Local Feature Descriptors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 186
9.2.3 Similarity Measure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 188
9.3 Experimental Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 189
9.3.1 Shoeprint Image Databases . . . . . . . . . . . . . . . . . . . . . . . . . . . . 189
9.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 199
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 200
Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 203
Chapter 1
Introduction and Preliminaries on Biometrics
and Forensics Systems
1.1 Introduction
Biometric-based security has been researched and tested for a few decades, but has
only recently entered into the public consciousness because of high-profile applica-
tions especially since the events of 9/11. Many companies and government depart-
ments are now implementing and deploying biometric technologies to secure areas,
maintain security records, protect borders and maintain law enforcement at borders
and entry points. Biometrics is the science of verifying the identity of an individual
through his/her physiological measurements, e.g. fingerprints, hand geometry, etc.
or behavioural traits, e.g. voice and signature. Since biometric identifiers are asso-
ciated permanently with the user they are more reliable than token- or knowledge-
based authentication methods such as identification card (that can be lost or stolen),
password (that can be forgotten), etc.
Biometric recognition is concerned with methods and tools for the verification
and recognition of a person’s identity by means of unique appearance or behavioural
characteristics. This chapter starts by defining the biometric technology including
the characteristics required for a viable deployment using various operation modes
such as verification, identification and watch-list. A number of currently used bio-
metric modalities are also described with some emphasis on a few emerging ones.
Various steps of a typical biometric recognition system are then discussed in detail.
For example, data acquisition, image localisation, feature extraction and matching
are all defined and the current methods employed for their implementation and
deployment are assessed and contrasted. The chapter concludes by briefly high-
lighting the need to use appropriate data sets for the evaluation of a biometric
system.
1.2 Definition of Biometrics

Biometric technologies can be defined as “automated methods of verifying or recog-
nising the identity of a person based on a physiological and/or behavioural charac-
teristic” [1, 2].
A. Bouridane, Imaging for Forensics and Security, Signals and Communication 1

Technology, DOI 10.1007/978-0-387-09532-5 1,

C Springer Science+Business Media, LLC 2009
2 1 Introduction and Preliminaries on Biometrics and Forensics Systems
The term “automated methods” means that biometric technologies are imple-
mented completely by a machine (but not always), generally a digital computer.
The second important part from the definition is “physiological or behavioural
characteristic”, meaning that biometrics tends to recognise people from their bio-
logical and behavioural characteristics. In other words, biometrics defines some-
thing you are, in contrast to other methods of identification such as something you
have (e.g. cards, keys) or something you know (password, PIN number).
1.2.1 Biometric Characteristics
There exist several characteristics that physical or behavioural traits need to fulfil in
order to be considered as a viable biometric application and the most agreed upon
are [3]
• Universality: Every individual accessing the application should possess the trait.
• Uniqueness: The given trait should be sufficiently different across individuals
comprising the population.
• Permanence: The biometric trait of an individual should be sufficiently invariant
over a period of time with respect to the matching algorithm.
• Measurability: It should be possible to acquire and digitise the biometric trait
using suitable devices that do not cause undue inconvenience to the individual.
Furthermore, the acquired raw data should be amenable to processing in order to

extract representative feature sets:
• Performance: The recognition accuracy and the resources required to achieve

that accuracy should meet the constraints imposed by the application.
• Acceptability: Individuals in the target population that will utilise the application
should be willing to present their biometric trait to the system.
• Circumvention: This refers to the ease with which the trait of an individual can
be imitated using artefacts in the case of physical traits, and mimicry in the case
of behavioural traits.
1.2.2 Biometric Modalities
In order to establish the identity of an individual, a variety of physical and

behavioural characteristics can be used by biometric systems (Fig. 1.1). These
include fingerprint, face, hand/finger geometry, iris, retina, signature, gait, palm-
print, voice pattern, ear, hand vein, odour or DNA information. In the biometric
literature, these characteristics are referred to as traits, identifiers or modalities.
No single biometric is expected to effectively meet all the requirements imposed
by all applications. In other words, no biometric is ideal but a number of them are
1.2 Definition of Biometrics 3
Fig. 1.1 Examples of biometric traits that can be used to recognise an individual. Illustrations
in the figure include ear, iris, hand geometry, face, speech, vein, fingerprint, gait and palmprint
traits
admissible [2]. The following sections briefly describe some of the most commonly
used including some emerging biometric traits:
Fingerprint recognition has been used as a biometric trait for many decades. The identifi-
cation accuracy using fingerprints has been shown to be very high [4]. A fingerprint is the
pattern of ridges and valleys on the surface of a fingertip whose formation is determined
during the first seven months of foetal development. It has been empirically determined that
the fingerprints of identical twins are different and so are the prints on each finger of the
same person [5].
Fingerprint biometrics currently has three main applications: (i) large-scale automated fin-
ger imaging systems (AFIS) generally used for law enforcement purposes; (ii) fraud preven-
tion in entitlement programs; and (iii) physical and computer access. The main problems
with fingerprint identification are related to the huge amount of computational resources
required for large-scale systems, and to the number of cuts and bruises that people can have
on their fingers [3].
• Iris recognition uses the iris patterns which are the coloured part of the eye,
although the colour has nothing to do with the biometric trait. Iris patterns of a
person’s left and right eyes are different, and so are the iris patterns of different
individuals including identical twins [6]. Iris recognition is usually employed as
a verification process due to its low false acceptance rate.
• Hand geometry recognition is based on a number of measurements taken from
the human hand such as its shape, size of palm (but not its print), and the lengths
and widths of the fingers. This method is very easy to deploy and is not compu-
tationally expensive. However, its low distinctiveness degree and the variability
of its size with age pose major problems [7]. This technology is not very suitable
for identification applications.
• Voice recognition is both a physical and behavioural biometric modality. The
physical features of an individual’s voice are based on the shape and size of the
appendages (vocal tracts, mouth, nasal cavities, and lips) which are invariant for
an individual, but the behavioural aspect of the speech changes over time due
to age, medical conditions, emotional state, etc. [8]. Speaker recognition is most
appropriate in telephone-based applications but the quality of the voice signal
degrades by the communication channel. The disadvantages of this biometric
trait are (i) it is not suitable for large-scale recognition and (ii) the speech features
are sensitive to the background noise.
• Signature recognition is defined as the process of verifying the writer’s identity
by checking his/her signature against samples kept in a database. The result of
this process is usually a number between 0 and 1 which represents a fit ratio (1
for match and 0 for mismatch). The threshold used for the confirmation/rejection
decision depends on the nature of the application. The distinctive biometric pat-
terns of this modality are the personal rhythm, acceleration and pressure flow
when a person types a specific word or group of words (usually the hand signa-
ture of the individual).
• Keystroke recognition attempts to assess the user’s typing style such as the dwell
time (how long each key is depressed), flight time (time between key strokes)
and typical typing errors. Usually this security technology is deployed for com-
puter access within an organisation. The distinctive and behavioural characteris-
tics measured by keystroke recognition also include the cumulative typing speed;
the frequency of the individual in using other keys on the keyboard, such as the
number pad or function keys; and the sequence utilised by the individual when
attempting to type a capital letter.
• Gait recognition is the process of identifying an individual by the manner in
which they walk. This modality is less unobtrusive than most others and as such
offers the possibility to identify people at distances without any interaction or
co-operation from the subject thus making it an attractive solution for identi-
1.3 Recognition/Verification/Watch-List 5
fication applications. This technology is still at an early stage of research and

development.
• Ear recognition is yet a new biometric technology and is useful during crime
scene investigations in the absence of valid fingerprints. This modality is gaining
interest from biometric community and its operation is similar to that of face
recognition.
1.3 Recognition/Verification/Watch-List
It is commonly known that a typical biometric recognition scenario, as all biometric

applications, can be classified into one of two types: verification (or authentication)
and identification (or recognition). In some applications, a third scenario may be
added. For example, Phillips et al. in the Face Recognition Vendor Test (FRVT) [9]
define another type called the “watch-list”.
1.3.1 Verification: Am I Who I Claim to Be?
This scenario can be employed in a control access point application. It is used when
a subject provides an alleged identity. The system then performs a one-to-one match
that compares a query biometric image against the template image, of the person
whose identity is being claimed, stored in the database. If a match is made, the
identity of the person is verified.
In other words, the verification test is conducted by dividing the subjects into two
groups:
• Clients: people trying to gain access using their own identity.

• Imposters: people trying to gain access using a false identity, i.e. an identity
known to the system but not belonging to them.
The percentage of imposters gaining access and clients rejected access are
reported as the false acceptance rate (FAR) and the false rejection rate (FRR).
1.3.2 Recognition: Who Am I?
This mode is used when the identity of the individual is not known in advance. The
entire template database is then searched for a match to the individual concerned in
a one-to-many search. If a match is made the individual is identified.
The recognition test works on the assumption that all biometric images being
tested are of known persons. The percentage of correct identifications is reported as
the correct (or Genuine) identification rate (CIR) or the percentage of false identifi-
cations is reported as the false identification rate (FIR).
1.3.3 The Watch-List: Are You Looking for Me?

One application example of a watch-list task is to compare a suspect flight passenger
against a database of known terrorists. In this scenario, the person does not claim
any identity; it is an open-universe test. The test person may or may not be in the
system database. The biometric sample of this individual is compared to the stored
samples in a watch-list to determine if the individual concerned is present in the
watch-list and a similarity score is reported for each comparison. These similarity
scores are then numerically ranked and if a score is higher than a preset threshold, an
alarm is raised and the system assumes that this person is present in the watch-list.
There are two terms of interest for watch-list applications [10]:
• Detection and identification rate: the percentage of times the system raises the
alarm when correctly identifying a person on the watch-list.
• False alarm rate: the percentage of times the system raises the alarm when an
individual is not on the watch-list.
It is worth noting that, in an ideal system, one wants the false alarm rate and the
detection and identification rate to be 0 and 100%, respectively.
1.4 Steps of a Typical Biometric Recognition Application
In general, biometric recognition applications, regardless of the specific method

used, consist of the following steps as shown in Fig. 1.2.
1.4.1 Biometric Data Localisation

This is the first module in a biometric recognition process where it is assumed that
an image or a video containing image data is available to the system as an input.
Localisation is a very important step in order to obtain satisfying recognition results
and the challenges associated with this process can be attributed to many factors
such as pose, presence or absence of structural components, occlusions, image ori-
entation, imaging conditions (lighting, camera characteristics).
Many approaches have been proposed to address biometric image localisation
problems.
In general, single image detection and localisation methods are classified into
four categories:
• Feature-invariant approaches: These algorithms aim to find the structural features

that exist even when pose, viewpoint or lighting conditions vary in order to locate
images. These methods are mainly designed for biometric image localisation.
1.4 Steps of a Typical Biometric Recognition Application 7
Fig. 1.2 Steps of a typical Image

biometric image recognition Acquisition
Biometric image
Localisation
Biometric
Sub-image
Normalisation
and
Pre-processing
Normalise
Image
Feature
Extraction
Feature
Vector
Matching
Database
Result
• Template matching methods: Several standard patterns of a biometric image are

stored to describe the image as a whole or the biometric features separately. The
correlations between an input image and the stored patterns are computed for
use in the detection process. These methods are used for both localisation and
detection techniques.
• Appearance-based methods: In contrast to template matching, the models (or
templates) are learned from a set of training images which capture the represen-
tative variability of image appearance. These learned models are then used for
biometric image detection.
1.4.2 Normalisation and Pre-processing
The aim of this step is to enhance the quality of the captured images due to distor-
tions previously described with a view to lead to a better recognition power for the
system. Depending on the application, some or all of the following pre-processing
techniques may be implemented in a biometric recognition system:
• Geometrical alignment: In some cases, the position of the image is not located
at its optimum position; it might be somehow rotated or shifted. Since the main
body of the biometric image plays a key role in the determination of biomet-
ric features, especially for face/iris/palmprint recognition systems based on the
frontal views of images, it may be very helpful if the pre-processing module
normalises the shifts and rotations in the main position.
• Image size normalisation: This process aims to align images such that they are
of the same size and are located at the same position and orientation. Resizing is
then performed to set the size of an acquired image to a default image size, say
of 128×128, 256×256, etc. This step is mostly encountered in systems where
images are processed globally.
• Enhancement: This step is not always required but it can be highly useful in two
cases: (i) median filtering for noisy images especially obtained from a camera or
from a frame grabber and (ii) high-pass filtering to highlight the contours of the
image to further improve edge detection performances.
• Background removal: This process deals primarily with the most useful informa-
tion where background should be removed. Masking also can be used to elimi-
nate the sections of the image that are not part of the main image area. This is
done to ensure that the biometric recognition system does not respond to features
corresponding to background, hair, clothing, etc.
1.4.3 Feature Extraction
This is the key step in any biometric recognition system and all pattern recognition
systems in general. Once the detection/localisation process has targeted a biometric
image data and normalised it, image can be analysed. The recognition process anal-
yses the spatial geometry of the distinguishing features of the image. There exist
different methods to extract distinguishing features of an image, but in general they
can be classified into three approaches:
• Feature-based approaches: They are based on the extraction of the properties of

individual organs located on a biometric trait such as eyes, nose and mouth for a
face, wrinkles, lines for a palmprint, eye lashes for an iris, etc., as well as their
relationships with each other.
• Appearance-based approaches: They are based on information theory concepts
and seek a computational model that best describes a biometric image. This is
performed by extracting the most relevant information contained in the image
without dealing with the individual properties of organs such as eyes or mouth.
In appearance-based approaches, an image is considered as a high-dimensional
vector, i.e. a point in a high-dimensional vector space. Many of these approaches
use statistical techniques to analyse the distribution of the object image vectors
in the vector space in order to derive an efficient and effective representation
(feature space) depending on the targeted application. Given a test image, the
similarity between the stored prototypes and the test view is then carried out in
the feature space.
1.5 Summary 9
• Hybrid approaches: Just as the human perception system uses both local features
and the whole image region to recognise a biometric image, a machine recogni-
tion system should use both.
1.4.4 Matching
The fourth step of a biometric recognition system is to compare the template gener-
ated in step three against a database of known features of the biometric application.
In an identification application, this process yields scores that indicate how closely
the generated template matches each of those in the database. In a verification appli-
cation, the generated template is only compared to one template in the database to
claim the true or false identity of the person.
Finally, the system should determine if the produced score is sufficiently large to
declare a match. The rules governing the declaration of a match are of two types:
(i) manual, where the end-user has to determine if the result is satisfying or not, and
(ii) automatic, in which case the measured distance (the matching score) should be
compared to a predefined threshold so that a match is declared only if the measured
score is higher than the threshold.
1.4.5 Databases
To build/train a biometric recognition algorithm, it is necessary to use a standard test
data set as used by researchers and end-users in order to be able to directly assess and
compare the results. A database is a collection of one or more computer files. For
biometric systems, these files could consist of biometric sensor readings, templates,
match results, related end- user information etc. While there exist many databases
currently in use and which can be found on the Internet or available from academic
or industrial institutions, the choice of an appropriate database should be made based
on the targeted biometric application (face, iris, palmprint, speech, etc.). Another
way is to select the data set specific to the application at hand; for example, how the
algorithm responds to biometric images under varying environment conditions or
how the algorithm operates under different operating scenarios by varying the setup
variables/values.
1.5 Summary
Biometrics aims to automatically identify individuals based on their unique phys-
iological or behavioural traits. A number of civilian and commercial applications
of biometrics-based identification have been deployed in real problems and many
are emerging. These deployments are intended to strengthen the security and conve-
nience in their respective environments. However, a number of legitimate concerns
are also being raised against the use of biometrics for various applications such as
loss of privacy and performance limitations. To address these issues, appropriate

legislations will need to be brought about in order to protect the privacy rights of
the individuals while at the same time endorsing the use of biometrics for legiti-
mate uses especially if much improved and non-invasive solutions are researched
and developed at low cost.
References
1. J. Wayman, A. K. Jain, D. Maltoni and D. Maio, Eds., “Biometric Systems:Technology,
Design and Performance Evaluation” Springer-Verlag, London, UK, 2005.
2. A. K. Jain, P. Flynn and A. A. Ross, Eds., “Handbook of Biometrics” Springer Science Busi-
ness Media, LLC, New York, USA, 2008.
3. A. K. Jain, R. Bolle and S. Pankanti, Eds., “Biometrics: Personal Identification in Networked
Society” Kluwer Academic Publishers, London, UK, 1999.
4. C. Wilson, A. R. Hicklin, M. Bone, H. Korves, P. Grother, B. Ulery, R. Micheals, M. Zoep, S.
Otto and C. Watson, “Fingerprint Vendor Technology Evaluation 2003: Summary of results
and analysis report” Tech. Report, NIST Technical Report NISTIR 7123, National Institute of
Standards and Technology, June 2004.
5. D. Maltoni, D. Maio, A. K. Jain and S. Prabhakar, “Handbook of Fingerprint Recognition”
Springer-Verlag, London, UK, 2003.
6. J. D. Woodward, C. Horn, J. Gatune and A. Thomas, “Biometrics: A Look at Facial recogni-
tion” RAND Public Safety and Justice for the Virginia StateCrime Commission, 2003.
7. R. Zunkel, “Biometrics: Personal Identification in Networked Society” Chapter Hand Geom-
etry Based Authentication, pp. 87–102, Kluwer Academic Publishers, London, UK, 1999.
8. J. P. Campbell, “Speaker recognition: a tutorial” Proceedings of the IEEE, vol. 85, no. 9, pp.
1437–1462, September 1997.
9. P. J. Phillips, G. Grother, R. J. Micheals, D. M. Blackburn, E. Tabassi and J.M. Bone “FRVT
2002: Overview and summary, http://www.frvt.org/frvt2002/documents.htm”.
10. D. M. Blackburn, “Biometrics 101, version 3.1, vol. 12” Federal Bureau of Investigation,
March 2004.
Chapter 2
Data Representation and Analysis
2.1 Introduction
The last few years have witnessed the emergence of new tools and means for the
scientific analysis of image-based information for security and forensic science and
crime prevention applications. For instance, images can now be captured, viewed
and analysed at the scenes or in laboratories within minutes whilst simultaneously
making the images available to other experts via fast and secure communication
links on the Internet, thereby making it possible to share information for forensic
and security intelligence and crime linking purposes. In addition, these tools have
a strong link with other aspects of investigation, such as image capture, informa-
tion interpretation and evidence gathering. They are helpful for minimization of
human error and analysis of data. Although there exist a number of application sce-
narios, the analysis of data is usually based on a conventional biometric system.
Therefore, the following discussion on a biometric system is given as it would be
a starting point for any other imaging system for use in security and/or forensic
science.
A standard Biometric Identification System consists of the following three
phases: Data Acquisition, Feature Extraction and Matching, and operates in two
distinct modes: Enrolment Mode or Identification Mode [1]. The Data Acquisi-
tion stage is used in the enrolment mode to establish the database of users and
their related biometric data whereas in Identification mode it is used to obtain a
reference biometric from the user. This reference biometric is then processed at
the Feature Extraction phase to obtain unique and comparable features. These fea-
tures are then compared in the Matching phase with the related features of all
the biometric templates in the database to establish or refute the identity of the
user. Figure 2.1 depicts a block diagram view of a basic Biometric Identification
System.
The design of any biometric system is based on decisions regarding the selection
of appropriate modules for each of these processes [1, 3]. Details of these processes
and modules included within these processes along with the critical issues that need
to be addressed before a design decision is made are described below.

Technology, DOI 10.1007/978-0-387-09532-5 2,

12 2 Data Representation and Analysis
Fingerprint,
Biometric
Iris, ..etc Feature Extraction
Sensor
Enrolment
Fingerprint, Biometric Database

Sensor Feature Extraction
Iris, ..etc
ID Matching
Result
Identification/Authentication
Fig. 2.1 Block diagram of a biometric identification system
2.2 Data Acquisition
In the data acquisition process the first and foremost decision to make pertains to the
selection of an appropriate biometric trait. A lot of thought has to go into the selec-
tion of human physical and physiological traits for use by the biometric recognition
system. The selected trait has to be universal but unique so that the trait exists in all
users but also varies from subject to subject. It has to be permanent and resistant to
changes so that the stored biometric data is usable over a long period of time. It has
to be measurable and socially and economically acceptable so that the data can be
gathered and matching can be performed and results quantified within a reasonable
time and cost constraints. It should also be very difficult to circumvent or forge the
trait. In addition, attention should be paid to ensure that machine-readable repre-
sentations should completely capture the invariant and discriminatory information
in the input measurements. This representation issue is by far an important require-
ment and has far reaching implications on the design of the rest of the system. The
unprocessed measurement values should not be invariant over the time of capture
and there is a need to determine those peculiar/salient features of the input measure-
ment which both discriminate between the identities as well as remain invariant for
a given individual.
The acquisition module should also aim to capture salient features since it is
accepted that more distinctive biometric signals offer more reliable identity authenti-
cation while less complex measurement signals inherently offer a less reliable iden-
tification results. Therefore, quantification at an earlier stage would lead to much
improved and effective results of a biometric recognition system.
It is important to note that no single biometric is expected to meet all the
above mentioned requirements. Therefore, developers are required to identify
the best possible biometric trait for each application e.g. for an application that
2.2 Data Acquisition 13
Table 2.1 Some commonly used biometrics
Physical Behavioural
Fingerprint: Gait:
Most commonly used Useful in distant Surveillance
Higher false accept rate Changes with age and surface
Face: Voice:
Easiest to acquire Useful in absence of visual data
Difficult to compar Changes with age and Health
Hand Geometry: Handwriting & Signature:
Robust under different conditions Useful in detecting emotions as well
Changes occur with age Changes with age, health and stress
Palm Print:
Bigger area of interest
Availability of data set
Iris:
Very Low false accept rate
Difficult to acquire
Ear:
Robust to change
Difficult to acquire
focuses on access control to critical area or application cost may not necessi-
tate a significant consideration but uniqueness and circumvention may be impor-
tant. Some of the most commonly used biometric traits are identified in the
following Table 2.1.
2.2.1 Sensor Module

Once a biometric trait is identified a suitable sensor module is required to capture
the data. The sensor module is the interface between the user and the system, there-
fore the selection of an appropriate sensor module is critical for the success of the
system. A poor quality sensor module would not be able to capture the biometric
data properly, thus increasing the failure to capture error. This will cause user dis-
satisfaction and low acceptance rate which may eventually cause the system to fail.
It is also important to decide if the sensor module would be “overt” or “covert”.
If the user is aware of the fact that biometric identification is taking place the system
is called overt but if they are not aware that any identification is taking place then the
system is called covert. Overt systems are mostly used in access control applications
whereas covert systems are used in surveillance and continuous monitoring systems.
For example, a computer may contain a fingerprint scanner to identify the user at
logon (which is the example of an overt system) but then constantly keep acquiring
facial images from an attached webcam to verify that the same user that logged on
is still accessing the system (which is the example of a covert system).
As most biometric systems are imaging based, the quality and maintenance of
the raw captured biometric image also plays an important role in the development
of a strong biometric identification system.
2.2.2 Data Storage
Maintenance of the template data store ‘or’ template database is often an over looked
area of the biometric identification system. A well established, secure and effective
database can improve the performance and user acceptance of the system. As the
database has to store the biometric data along with other personal details of the user,
it has to be kept very secure. The size of the database also has to be kept as small as
possible to maintain the speed of access.
The type of data to be stored depends upon the kind of application that will utilize
the data. Raw images are stored for research and feature sets are usually stored
for real world applications. Both type of data storage provides some interesting
challenges [4].
2.2.2.1 Raw Images

Raw Images are stored for research and development applications. When storing
raw images the following points have to be kept in mind.
Image Size – Image Size should be kept small to reduce the database size but if
the image size is very small a lot of important information may be lost.
Image Format – As the data is being stored for research it is important to keep
the images in a standard file format so that they can also be accessed by other
researchers. The image should be stored in a format that does not use a strong com-
pression algorithm because compression can cause change to pixel values in the
image. This can cause corruption of data.
Image DPI – A high DPI image will have more information available and thus
will provide more data for researchers but on the other hand it will also have a large
size thus increasing the size of the database. A balance needs to be obtained between
the DPI and Image size for optimal performance of the database.
Usually an image with a size of around 640 × 480 having 8 bits per pixel
and 500 dpi stored either as a TIFF image or a BMP image is considered to be a
good biometric image. Figure 2.2 shows some samples of raw fingerprint and iris
images.
2.2.2.2 Feature Sets

Real World Applications of biometric systems require quick access to data and very
small storage space, therefore, the feature sets are usually stored instead of raw
images. Processed feature sets usually take up less space than raw images, thus
2.3 Feature Extraction 15
Fig. 2.2 A visual spectrum

fingerprint image, a thermal
fingerprint image and an IR
spectrum iris image
reducing the database size and also increasing the access speed. One of the lesser
known advantages of using feature sets is that it is not possible to recreate the actual
raw biometric data from the feature set therefore saving the feature set only provides
personal data protection.
It should be kept in mind that to maintain system openness, the feature sets should
be stored in one of the standard formats like the ones defined in ANSI/NBS – ICST
1-1986 for minutiae, ANSI/NIST – ITL 1a-1997 for Faisal Feature Set, ANSI/NIST
– ITL 1-2006 for Iris, etc. [4, 5]. Using these standards allows for an easy expansion
and upgrade of the system at later times.
2.3 Feature Extraction

As mentioned in Chapter 1, one of the first steps in feature extraction process is
pre-processing. Pre-processing can consist of multiple steps e.g.
Image Enhancement – To reduce the noise in the image and/or to enhance the
features to be extracted.
Image Formatting – Image is converted to a format that will allow for better
feature extraction performances; e.g. some minutiae extraction algorithms require
the image to be converted into a binary form.
Image Registration and Alignment – Images are aligned and centred so that the
extracted features from all images are within the same frame of reference thus mak-
ing the matching process much simpler.
Image Segmentation – Raw image is segmented into Region of Interest (ROI) and
background. All the processing is carried out on the ROI, it is therefore imperative
that the best possible segmentation is obtained.
Once a raw image is properly processed the feature extraction algorithm
can then be used to extract the relevant features. Feature Extraction Algorithms
can be classified into two groups: Global Feature Extractors and Local Feature
Extractors.
Global Feature Extractors aim to locate usable features from the raw data at the
overall image level. They process the image as a whole and try to extract the fea-
tures, e.g. Gabor Wavelets-based approach for iris and fingerprint recognition [6].
On the other hand, Local Feature Extractors focus on the chunks of image data.
These algorithms work on small windows within the images and extract the rele-
vant features, e.g. minutiae extraction from skeletonised and binarised fingerprint
images.
A feature extractor algorithm selection is governed mainly by the type of appli-
cation that the system is being designed for. Applications requiring more accuracy
and security should have a robust and exhaustive feature extractor. However, for
faster applications a simpler algorithm might be the best option. Ideally, the feature
extractor should be very robust, accurate and fast but practically this is not possible.
It is therefore almost always a compromise between accuracy and speed. It is advis-
able to evaluate multiple feature extraction algorithms to find the optimal algorithm
for the desired application.
The feature extractor algorithm selection also depends upon the type of matcher
being used in the system. The feature extractor should generate output in the format
that the matcher is able to comprehend and process.
As mentioned before, to maintain openness of the system it is prudent to ensure
that the output of the feature extractor should follow a standard format.
2.4 Matcher
A matcher algorithm takes the reference feature set and compares it with all the
template feature sets in the database to provide a matching score for each pair. It
then selects the best template–reference pair and outputs the details as its decision.
Different types of matchers are usually used depending upon the type and format
of the feature set as well as the type of application at hand. Matchers are commonly
categorised into two categories: Time Domain Matchers and Frequency Domain
Matchers [1–3].
Time Domain Matchers work in the spatial domain and the feature sets for these
types of matchers are generated directly from the raw images.
Frequency Domain Matchers operate in the frequency domain and the feature
sets for these types of matchers are generated by first transforming the image into
the frequency domain and then selecting the features, e.g. wavelets-based matchers,
Fourier transform-based matchers, etc.
However, it is worth noting that correlation-based matchers are the most com-
monly used matcher algorithms. Similarly, distance-based matchers and supervised
learning or pattern recognition-based matching is also widely used.
Pattern recognition-based matching finds the correct match by training on known
correct and incorrect matching feature sets. In this type of matching the training
process is usually computationally intensive, but if this process is efficiently done,
matching can work very fast and provide highly accurate results.
The selection of the optimal matcher depends upon the application for which the
system is being developed as well as the type of feature sets available for matching.
In addition, information regarding the desired accuracy and the speed required also
plays an important role when selecting a matcher algorithm.
2.6 Performance Evaluation 17
2.5 System Testing

When a biometric identification system is developed it should be thoroughly tested
before going live. Appropriate testing is critical to locate, identify and eradicate
errors before the system is implemented in a real world scenario.
Testing is conducted on a test data set and for impartial testing of biometric
recognition systems the industry and research community need to have access to
large public data sets. Test data set is generated by collecting data with known val-
ues (called the ground truth) and the testing is carried out by matching a set of
known reference feature sets with the test database and evaluating the result of the
matcher against the ground truth [7]. Evaluation of testing is usually conducted in
two overlapping categories: technology and operation scenarios. Technology evalu-
ation is used to measure the performance of the recognition algorithm and is usually
conducted using standard data sets. In general, the results from this type of evalu-
ation are used to further improve the algorithm. On the other hand, the aim of the
operation evaluation is to measure and assess the performance of the recognition
system in a particular application; for example iris authentication at an airport. This
type of evaluation usually takes weeks/months to account for environment changes
including varying test samples.
The results are then analysed to locate the bugs and remove them. The perfor-
mance of the system is also evaluated and presented in a standard format for the
users.
2.6 Performance Evaluation
Determining the best biometric system for a specific operational environment and
how to set up that system for optimal performance requires an understanding of
the evaluation methodologies and statistics used in the biometrics community. The
degree of similarity between two biometric images is usually measured by a simi-
larity score. The similarity score is called genuine score if the similarity is measured
between the feature sets of the same user. On the other hand, it is called imposter
score if it is between feature sets of different users.
An end-user is often interested in determining the performance of the biomet-
ric system for his specific application. For example, he/she would like to know
whether the system makes accurate identification. Although, there exists a few cri-
teria, no metric is sufficiently adequate to give a reliable and convincing indication
of the identification accuracy of a biometric system. However, one criterion gen-
erally accepted by biometric community uses either a genuine individual type of
decision or an impostor type of decision, which can be represented by two statisti-
cal distributions called genuine distribution and impostor distribution, respectively.
The performance of a biometric identification system can then be evaluated based
on resulting Genuine Score and Imposter Score generated by the system [7]. The
following are usually employed:
Matcher accuracy – Accuracy is measured on the test data set to discover how
many of the feature sets are correctly matched by the system. If the genuine score
is above an operating threshold of the system, the feature set is considered to
be correctly matched. Matcher accuracy is usually displayed as a percentage of
matches.
False accept rate (FAR) – If an imposter score is above the operating thresh-
old it is called a false accept. FAR, therefore, means that the system accepted an
imposter as a genuine user. FAR is one of the major performance matrixes that have
to be closely evaluated. In fact, effort should be made to keep it as close to zero as
possible.
False reject rate (FRR) – If a genuine score is below the threshold then it is called
a false reject. Thus, false reject rate means that the system rejected a genuine user
as an imposter. FRR should ideally be as close to zero as possible but in most access
control applications it is not as critical as FAR. If a user is rejected as an imposter
he/she can always try again but if an imposter is accepted as a genuine user the
integrity of the complete system is compromised.
False alarm rate – A statistic used to measure biometric performance when oper-
ating in the watch-list (sometimes referred to as open-set identification) task. This is
the percentage of times an alarm is incorrectly sounded on an individual who is not
in the biometric system’s database (the system alarms on John when John isn’t in
the database) or an alarm is sounded but the wrong person is identified (the system
alarms on Peter when Peter is in the database, but the system thinks Peter is Daniel).
Equal error rate (ERR) – It is the point on the ROC curve where FAR and FRR
are equal. For a high-performance system ERR should be as low as possible.
Most vendors provide the performance evaluation in terms of accuracy and ERR.
Some other evaluation criteria are
Failure to capture rate (FCR) – FCR pertains to the amount of times a sensor
is unable to capture an image when a biometric trait is presented to it. The FCR
increases with wear and tear to the sensor module. If the FCR increases above a
certain threshold it is advisable to replace the sensor module.
Failure to enrol (FTE) – FTE indicates the number of users that were not enrolled
in the system. FTE is usually related to the quality of the biometric image. In most
cases, a system is trained to reject poor quality images. This helps in improving the
accuracy of the system and reducing the FAR and FRR. Every time an image is
rejected the FTE is increased. A trade-off between quality and FTE is required if
the system is to be accepted by the users.
2.7 Conclusion
To develop a strong biometric system it is imperative to select a very stable data

acquisition system and a very secure, fast and robust database. Feature Extractor
and Matcher selection will directly impact the user acceptance of the system and
the selection is based on the type of application.
References 19
References
1. Arun A. Ross, Patrick Flynn and Anil K. Jain, “Handbook of Biometrics” ISBN: 978-0-387-
71040-2.
2. A. K. Jain, A. Ross and S. Prabhakar, “An introduction to biometric recognition,” IEEE Trans-
actions on Circuits and Systems for Video Technology, vol. 14, no. 1, January 2004.
3. J. G. Daugman, “Biometric Decision Landscape”, Technical Report No. TR482. University of
Cambridge Computer Laboratory, 1999.
4. Data Format for Information Interchange – Fingerprint Identification, ANSI/NBS – ICST 1-
1986.
5. Data Format for Information Interchange – Data Format for the Interchange of Fingerprint,
Facial & SMT Information, ANSI/NIST – ITL 1a-1997.
6. H. Meng and C. Xu, “Iris Recognition Algorithm Based on Gabor Wavelet Transform,” IEEE
International Conference on Mechatronics and Automation, 2006.
7. J. Wayman, A. Jain, D. Maltoni and D. Maio, “Biometric Systems Technology, Design and
Performance Evaluation,” ISBN: 1852335963.
Chapter 3
Improving Face Recognition Using Directional
Faces
3.1 Introduction
Face recognition is one of the most popular applications in image processing and
pattern recognition. It plays a very important role in many applications such as card
identification, access control, mug shot searching, security monitoring and surveil-
lance problems.
There are several problems that make automatic face recognition a very challeng-
ing task. The input of a person’s face to a recognition system is usually acquired
under different conditions from those of the corresponding image in the database.
Therefore, it is important that an automatic face recognition system can deal with
numerous variations of images of a face. The image variations are usually due to
changes in pose, illumination, expression, age, disguise, facial hair, glasses and
background.
Much progress has been made towards recognising faces under controlled con-
ditions as reported in [1, 2], especially for faces under normalised pose and lighting
conditions and with neutral expression.
The Eigenfaces’ method [3], based on Principal Component Analysis (PCA), is
one of the most popular methods in face recognition. Its principal idea is to find
a set of orthogonal basis images (called eigenfaces) so that in this new basis, the
image coordinates (the PCA coefficients) are uncorrelated. Independent Component
Analysis (ICA) [4] is one generalisation of PCA and assumes that image data is
independent, and not only uncorrelated as in PCA. Fisherface technique [5] based
on Linear Discriminant Analysis (LDA) is an other popular method. It considers that
each face image in the training set is of a known class and uses this information in
the classification step. Subclass Discriminant Analysis (SDA) is a recent algorithm
devised by Zhu and Martinez [6] where each class of the LDA method is subdivided
into a number of subclasses.
However, recognition of face images acquired in an outdoor environment with
changes in illumination and/or pose remains problematic. Researchers have pro-
posed the utilisation of a pre-processing step in order to extract more discriminant
features for use in the recognition step. Gabor Filter Bank (GFB) is one of the most
well-known methods used for this purpose and many algorithms have been pro-
posed [7, 2]. However, as described in [8], the use of a GFB inherently results in

Technology, DOI 10.1007/978-0-387-09532-5 3,

22 3 Improving Face Recognition Using Directional Faces
some overlapping and missing subband regions. The Directional Filter Bank (DFB),
on the other hand, is a contiguous subband representation that preserves all image
information. Accordingly, a DFB can represent linear patterns, such as those avail-
ble around eyes, nose and mouth area, more effectively, than a GFB [9].
This chapter discusses the use of a DFB pre-processing phase in order to improve
the recognition rates of a number of different algorithms. Four algorithms represent-
ing both Component and Discriminant Analysis approaches have been selected to
demonstrate the efficiency of the DFBs. In this work, the algorithms PCA, ICA
(FastICA [10]), LDA and SDA are chosen for their popularity and efficiency.
3.2 Face Recognition Basics
3.2.1 Recognition/Verification
It is commonly known that a typical face recognition system can be classified into
one of two modes: face verification (or authentication) and face identification (or
recognition). However, Phillips et al. in the Face Recognition Vendor Test (FRVT)
2002 [11] define another mode referred to as the “Watch-list”.
3.2.1.1 Face Verification: Am I Who I Claim to be?

This scenario can be employed as a control access point application. It is used when
a person provides an alleged identity. The system then performs a one-to-one match
that compares a query face image against the template face image stored in the
database whose identity is being claimed. If a match is made the identity of the
person is verified.
In other words, the verification test is conducted by dividing the subjects into two
groups:
• Clients, people trying to gain access using their own identity.

• Imposters, people trying to gain access using a false identity, i.e. an identity
known to the system but not belonging to them.
The percentage of imposters gaining access and clients rejected access are
referred to as the False Acceptance Rate (FAR) and False Rejection Rate (FRR)
for a given threshold, respectively.
3.2.1.2 Face Recognition: Who am I?

This mode is used when the identity of the individual is not known in advance. The
entire template database is then searched for a match to the individual concerned in
a one-to-many search. If a match is made the individual is identified.
3.2 Face Recognition Basics 23
The recognition test works from the assumption that all faces being tested are
of known persons. The percentage of correct identifications is reported as the Cor-
rect(or Genuine) Identification Rate (CIR) while the percentage of false identifica-
tions is reported as the False Identification Rate (FIR).
3.2.1.3 The Watch-List: Are You Looking for Me?

One important application of the Watch-List task could be comparing a suspect
flight passenger against a database of known terrorists. In this scenario, the person
does not claim any identity, it is an open-universe test. The test person may or may
not be in the system database. The biometric sample of this individual is compared
with the stored samples in a Watch-List database to determine whether the individ-
ual concerned is present in the Watch-List. A similarity score is reported for each
comparison. These similarity scores are then numerically and orderly ranked. If a
similarity score is higher than a preset threshold, an alarm is raised and the system
assumes that this person is present in the Watch-List.
There are two factors of interest for a Watch-List application [12]:
• Detection and identification rate: the percentage of times the system raises the
alarm and correctly identifies a person on the Watch-List.
• False alarm rate: the percentage of times the system raises the alarm for an indi-
vidual that is not on the Watch-List.
In an ideal system, one wants the false alarm and the detection and identification
rates to be 0 and 100%, respectively.
3.2.2 Steps of a Typical Face Recognition Application
Facial recognition applications, regardless of the specific method used, in general

consist of the following steps (Fig. 3.1):
3.2.2.1 Face Localisation

This is the first module in a face recognition process. It assumes that an image or a
video containing face images is available as an input to the system.
In Yang et al. [13], a difference is made between face localisation and face detec-
tion:
• Face detection: given an arbitrary image, the goal is to determine whether or not
there are any faces in the image and, if present, return the image location and
extent of each face.
• Face localisation is the process of localising one face in a given image, i.e. the
image is assumed to contain one, and only one face.
Fig. 3.1 Steps of a typical

face recognition application
Therefore, face detection is a very important task of any face recognition system
and an efficient detection would enhance the recognition results. The challenges
associated with face detection can be attributed to many factors such as: pose, pres-
ence or absence of structural components (facial hair, glasses, et.), facial expression,
occlusions, image orientation, imaging conditions (lighting, camera characteristics).
Many approaches have been proposed to address the face detection problems
[14, 13], and summary is depicted in Table 3.1
3.2.2.2 Normalisation and Pre-processing

The aim of this step is to enhance the quality of the captured images due to one or
many of the factors mentioned in the previous section with a view to allow for a
better recognition power of the recognition system. Depending on the application at
hand, some or all of the following pre-processing techniques may be implemented
in a face recognition system:
• Geometrical alignment (translation, rotation).

• Image size normalisation.
• Filtering (median filtering, high-pass filtering).
3.2 Face Recognition Basics 25
Table 3.1 Categorisation of methods for face detection in a single image
Approach Representative work
Knowledge-based Multi-resolution rule-based method [15]

Feature invariant
– Facial Features Grouping of edges [16]
– Texture Space Gray-Level Dependence matrix
of face pattern [17]
– Skin Colour Mixture of Gaussian [18]
– Multiple Features Integration of skin colour size and shape [19]
Template matching
– Predefined face templates Shape templates [20]
– Deformable Templates Active Shape Models [21]
Appearance based
– Eigenfaces Eigenvector decomposition and clustering [3]
– Distribution based Gaussian distribution and multi-layer perception [22]
– Neural Network Ensemble of neural networks and arbitration
schemes [23]
– Support Vector Machine SVM with polynomial kernel [24]
– Naive Bayes Classifier Joint statistics of local appearance and position [25]
– Hidden Markov Model Higher order statistics with HMM [26]
– Information-Theoretical Kullback relative information [27]
Approach
• Illumination normalisation.
• Background removal.
3.2.2.3 Feature Extraction

This is the key step in face recognition in particular and in all pattern recognition
applications in general. Once the face detection task has detected and normalised a
face, the analysis can then take place by capturing the spatial geometry of distin-
guishing features of the face. There exist different methods to extract identifying
features of a face, but in general they can be classified into three approaches:
Feature-based approaches are based on the extraction of the properties of
individual organs located on a face such as eyes, nose and mouth including
their relationships with each other.
Appearance-based approaches are based on information theory concepts.
These approaches seek a computational model that best describes a face by
extracting the most relevant information contained in the face without deal-
ing with the individual properties of facial organs such as eyes or mouth.
Hybrid approaches are similar to the human perception system which uses
both local features and the whole face region to recognise a face. A machine
recognition system should use both.
Table 3.2 presents some of the principal algorithms developed for feature extrac-
tion as described by Zhao and Chellappa [28].
Table 3.2 Categorisation of features extraction techniques
Approach Representative work
Appearance-based methods
– Eigenfaces Direct application of PCA [3]
– Probabilistic Eigenfaces Two-class problem with prob. measure [29]
– Fisherfaces FLD on Eigenspace [5]
– SVM Two-class problem based on SVM [30]
– Evolution pursuit Enhanced GA learning [31]
– Feature lines Based on point to line distance [32]
– ICA ICA-based feature analysis [33, 4]
– Kernel faces Kernel methods [34]
Feature-based methods
– Pure geometry methods Earlier methods [35–37]; recent methods [38, 39]
– Dynamic link architecture Graph matching methods [40, 41]
– Hidden Markov model HMM methods [42, 43]
Hybrid methods
– Modular Eigenfaces Eigenfaces and Eigenmodules [44]
– Hybrid LFA Local feature method [45]
– Shape normalised Flexible appearance models [21]
– Component-based Face region and component [46]
3.2.2.4 Matching
The fourth step of a face recognition system is to compare the template generated in
step three with those in a database of known faces. In an identification application,
this process yields scores indicating how closely the generated template matches
each of those in the database. In a verification application, the generated template is
only compared with one template in the database, that of the claimed identity.
Finally, the system should determine if the produced score is high enough to
declare a match. The rules governing the declaration of a match are of two types: a
manual one where the end user has to determine if the result is satisfying or not and
an automatic type in which the measured distance (the matching score) should be
compared to a predefined threshold and a match is declared if the measured score is
higher than this threshold.
3.3 Previous Work
3.3.1 Principal Component Analysis (PCA)
The well-known Eigenface algorithm proposed by Turk and Pentland [3, 47] uses
PCA for dimensionality reduction in order to find the vectors which best account
for the distribution of face images within the entire image space. These vectors
define the subspace of the face images (face space). All faces in the training set are
3.3 Previous Work 27
projected onto the face space to find a set of weights that describes the contribution
of each vector in the face space. To identify a test image, the projection of the test
image onto the face space is required to obtain the corresponding set of weights. By
comparing the weights of the test image with the set of weights of the faces in the
training set, the face in the test image can be identified.
The key procedure in PCA is based on Karhunen–Loeve (KL) transformation. If
the image elements are considered to be random variables, then the image may be
seen as a sample of a stochastic process. The PCA basis vectors are defined as the
eigenvectors of the covariance matrix C:
C = E[X X T ] (3.1)
Since the eigenvectors associated with the largest eigenvalues have face-like
images, they also are referred to as Eigenfaces. Specifically, suppose the eigenvec-
tors of C are u 1 , u 2 , . . . , u n and are associated respectively with the eigenvalues
λ1 ≥ λ2 ≥ . . . ≥ λn . Then

n
X= x̂i u i (3.2)
i=1
The dimensionality reduction can be achieved by letting

m
X≈ x̂i u i (3.3)
i=1
where X̂ = [x̂1 , x̂2 , . . . , x̂m ] and m is usually selected such that λi is small for
i > m.
Since the eigenfaces’ method directly applies PCA, it does not destroy any
information of the image by exclusively processing only certain points, generally
providing more accurate recognition results. However, the technique is sensitive to
variation in position and scale. Some serious issues relate to the effect of back-
ground, head size and orientation. The change of head size of an input image can
be problematic because a neighbourhood pixel’s correlation is lost under head size
change. Note that variation of light can also be a problem if the light source is posi-
tioned in some specific directions.
3.3.2 Independent Component Analysis (ICA)
ICA is a widely used algorithm in statistical signal processing. It is defined as fol-

lows: having an observed m-dimensional vector X = (x1 , . . . , xm )T , the problem
is to find a linear transform A that maps observation X into a n-dimensional vector
S = (s1 , . . . , sn )T where the components si are as independent as possible:
X = AS (3.4)
where A is a m×n matrix of full rank, called the mixing matrix. In feature extraction,
the columns of A represent features, and si is the coefficient of the ith feature in the
observed data vector X .
There are several methods to compute the ICA. Here FastICA [10] is used
because of its fast convergence during the estimation of the parameters.
The FastICA method computes the independent components by maximising non-
Gaussianity of whitened data distribution using a kurtosis maximisation process.
The kurtosis measures the non-Gaussianity and the sparseness of the face represen-
tations [48]. The idea is to estimate the independent source signals U by computing
a separating matrix W where U = W X = W AS. First, the observed samples are
centred and whitened, this means that the data has a mean equal to zero and a stan-
dard deviation equal to one. Let us denote the centered and whitened samples by Z.
Then, one needs to search for the W matrix such that the linear projection of the
whitened samples by the matrix W has maximum non-Gaussianity of data distribu-
tion. The kurtosis of Ui = WiT Z is computed as:

K (Ui ) = E(Ui )4 − 3(E(Ui )2 )2 (3.5)
the separating vector Wi is obtained by maximising the kurtosis.
3.3.3 Linear Discriminant Analysis (LDA)

PCA constructs the face space without using face class (category) information where
training considers the whole face data. However, in an LDA approach the goal is
to find an efficient way to represent the face vector space [5, 2] by exploiting the
class information which can be helpful for the identification task. The Fisherface
algorithm [5] is derived from the Fisher Linear Discriminant (FLD), which uses
class specific information. By defining different classes with different statistics, the
images in the learning set are divided into the corresponding classes. Then, tech-
niques similar to those used in the Eigenface algorithm are applied. The Fisherface
algorithm results in a higher accuracy rate in recognising faces when compared with
Eigenface algorithm. The LDA finds a transform W L D A , such that the ratio of the
between-class scatter and the within-class scatter is maximised as follows:
|W T S B W |
W L D A = argmax W (3.6)
|W T SW W |
where S B is the between-class scatter matrix and SW is the within-class scatter

matrix and are defined as

c
SB = Ni (μi − μ)(μi − μ)T (3.7)
i=1
3.3 Previous Work 29

c
SW = (xk − μi )(xk − μi )T (3.8)
i=1 x∈X i
Ni is the number of training samples in class i, c is the number of distinct classes,

μi is the mean vector of samples belonging to class i and X i represents the set of
samples belonging to class i.
3.3.4 Subspace Discriminant Analysis (SDA)

The problem with traditional discriminant analysis methods is that they assume that
the sample vectors of each class are generated from underlying multivariate normal
distributions of common covariance matrix but with different mean values. Many
authors have addressed this problem by introducing extensions of LDA, for example
non-parametric DA [49] and Penalised DA [50]. However, these algorithms assume
that each class is represented by a single cluster and, therefore, none of them solve
the problem related by non-linearly separable classes. To solve this problem, one
can use non-linear methods such as Flexible DA [51] and Generalised DA [52].
However, they have two major problems: first, they require a very large number of
samples to obtain satisfactory results, and second a high computational cost in the
training and testing phases is also needed [6]. One method that addresses the above
problems is SDA. Its principal idea is to devise a solution which describes a large
number of data distributions, regardless of whether these correspond to compact sets
or not [6]. A method to achieve this is to approximate the underlying distribution of
each class as a mixture of Gaussians where each Gaussian will represent a subclass.
Figure 3.2 shows a two class problem (class of circles and class of stars) where
the second class is represented by the mixture of two Gaussians. It can clearly be
seen that there exist no direct linear method which can separate the two classes.
However, if data distribution of each class is approximated using a mixture of Gaus-
sians, the following generalised eigenvalue decomposition equation can be used to
calculate those discriminant vectors that best (linearly) classify the data:
ΣB V = ΣX V Λ (3.9)
where Σ B is the between-subclass scatter matrix, Σ X is the covariance matrix of the

data, V is a matrix whose columns correspond to the discriminant vectors where Λ
is a diagonal matrix whose elements are the corresponding eigenvalues.
3.3.4.1 Dividing Classes into Subclasses

As mentioned in the previous section, the essence of SDA is to divide each class
into different subclasses. The first question one may ask relates to how many sub-
classes should each class have? and which clustering approach is best suited in
Fig. 3.2 A two-class problem when one of the classes is a mixture of two Gaussians
order to divide the samples into a set of subclasses (clusters). Although, there exist
many clustering methods, it is accepted that the Nearest Neighbour (NN) method
yields superior or equivalent results when compared against other parametric meth-
ods such as K-means and Gaussian mixtures; or non-parametric clustering methods
such as the Valley-seeking algorithm of Koontz and Fukunaga [49]. In addition, the
NN-clustering is efficient because it can also be used when the number of samples
in each class is either large or small, and it does not require large computational
resources [6].
3.3.4.2 NN-Clustering
In a NN-Clustering approach the first step consists of sorting the feature vectors (i.e.
face images in our case) so that a set {xi1 , xi2 , . . . , xini } is constructed as follows: if
xi1 and xini are the two most distant feature vectors: arg max jk xi j − xik 2 where
x2 is a norm-2 length of x with xi2 being the closest feature vector to xi1 and
xi(n c−1 ) the closest feature vector to xin c . In general, xi j is the ( j − 1)th closest
feature vector to xi1 .
Once this done, the sorted set {xi1 , xi2 , . . . , xini } is divided into M subclasses Hi
where i = 1, . . . M. For example, data can be divided into two equally balanced (in
the sense of having the same number of samples) clusters (H1 and H2 ) by simply
partitioning the sorted set into two parts:{xi1 , . . . , xi,ni /2 } and {xi,(ni /2)+1 , . . . , xini }.
More generally, one can divide each class into h (equally balanced) subclasses;
i.e. Hi = h∀i. This is suitable for such a case where the underlying distribution
3.4 Face Recognition Using Filter Banks 31
of each class is not Gaussian, but can be represented as a combination of two

or more Gaussians. Another case is when the classes are not separable, but the
subclasses are.
3.4 Face Recognition Using Filter Banks
3.4.1 Gabor Filter Bank
The processing of facial images by a Gabor filter has been widly used for its biolog-
ical relevance and technical properties. The Gabor filter kernels have similar shapes
as the receptive fields of simple cells in the primary visual cortex [41]. They are
multiscale and multi-orientation kernels. The Gabor transformed face images yield
features that display scale, locality and differentiation properties. These properties
are quite robust to variability of face image formation, such as the variations of
illumination, head rotation and facial expressions.
3.4.1.1 Gabor Functions and Wavelets

The two-dimensional Gabor Wavelets function g(x, y) and its Fourier transform
G(u, v) can be defined as follows [53]:

1 1 x2 y2
g(x, y) = exp − + 2 + 2π j W x (3.10)
2π σx σ y 2 σx2 σy

1 (μ − W )2 υ2
G(μ, υ) = exp − + (3.11)
2 σμ2 συ2
Where σμ = 12 π σx and συ = 12 π σ y . Gabor functions can form a complete but

non-orthogonal basis set. Expanding a signal using this basis provides localised fre-
quency description. A class of self-similar functions, referred to as Gabor wavelets
in the following discussion, is now considered. Let g(x, y) be the mother Gabor
wavelet, then this self-similar filter dictionary can be obtained by appropriate dila-
tions and rotations of g(x, y) through the generating function:
gmn (x, y) = a −m G(x , y ) (3.12)
x = a −m (x cos θ + y sin θ ) (3.13)
y = a −m (−x sin θ + y cos θ ) (3.14)
where a > 1, m, n = integer and θ = nπ/k is the orientation(k is the number of

orientations) and a −m is the scale factor.
3.4.1.2 Gabor Filter Dictionary Design

The non-orthogonality of the Gabor wavelets implies that there is redundant infor-
mation in the filtered images, and the following strategy is used to reduce this redun-
dancy. Let Ul and Uh denote the lower and upper centre frequencies of interest.
Let K be the number of orientations and S be the number of scales in the multi-
resolution decomposition. As proposed by Manjunath and Ma [53], the design strat-
egy is to ensure that the half-peak magnitude support of the filter responses in the
frequency spectrum touch each other. This results in the following formulas for
computing the filter parameters σμ and συ (and thus σx and σ y ):
− S−1
1
Uh
a= (3.15)
Ul
(a − 1)Uh
σmu = √ (3.16)
(a + 1) 2 ln(2)
− 1

π 2σμ2 [2 ln(2)]2 σμ2 2
συ = tan Uh − 2 ln 2 ln(2) − (3.17)
2k Uh Uh2
where W = Uh , and m = 0, 1, . . . , S − 1. To eliminate the sensitivity of the filter

response to absolute intensity values, the real (even) components of the 2D Gabor
filters are biased by adding a constant to make them zero mean.
3.4.1.3 Augmented Gabor-Face Vector

Given any image I(x,y), its Gabor wavelet transformation is

∗
Wmn = I (x1 , y1 )gmn (x − x1 , y − y1 )d x1 dy1 (3.18)
∗
where gmn indicates the complex conjugate of gmn . The Gabor wavelet transfor-
mation of the facial image is calculated at S scales, m ∈ {0, 1, 2, . . . , S} and K
different orientations, n ∈ {0, 1, 2, . . . , K } and let us set Ul = 0.05 and Uh = 0.4.
Wmn denotes a Gabor wavelet transformation of a face image at the scale m and
orientation n. Figure 3.3 shows a sample face image from the database and its forty
filtered images (five scales: S = 5 and eight orientations: K = 8 have been taken).
The augmented Gabor-face vector can then be defined as follows [54]:
χ = (W0,0
t
, . . . , W S,K
t
)t (3.19)
where t is the transpose operator. The augmented Gabor-face vector can encompass
all facial Gabor wavelet transformations, and has important discriminatory informa-
tion that can be used in the classification step.
Fig. 3.3 Gabor filters (a) A face image from the database, (b) The filtered images: five scales and
eight orientations
3.4.2 Directional Filter Bank: A Review
A digital filter bank is a collection of digital filters with a common input or out-
put. The DFB is composed of an analysis bank (analysis filter bank) and a synthesis
bank. The analysis bank of the DFB splits the original image into 2n directionally
passed subband images (n is the order of the DFB) while the synthesis bank com-
bines the subband images into one image. A diagram of a DFB structure can be
given as a tree with two-band splits at the end of each stage (Fig. 3.4), where each
split increases the angular resolution by a factor of two.
In the analysis section of the DFB, the original image is split into two directional
subband images, then each subband image is split into two more directional images,
and so on until the order n, where 2n directional subband images are obtained.
Fig. 3.4 DFB structure

(a) 2 (b)
2
2 1
7 6 1
8 5 1 2
1 4
3 4
2 3 1 3 4
3 2
5 6 7 8
4 1
5 6 7 8 5 6 7 8
Fig. 3.5 The frequency partition map for an eight-band DFB. (a) Input (b) Eight subband outputs
At this point, the output is used as the input for the next stage. Each of the sub-
bands in the analysis part extracts frequency components based on the associated
frequency partition map as shown in Fig. 3.5.
In the synthesis bank, the dual operation is performed, i.e. the directional sub-
band images are combined into a reconstructed image in the reverse order of the
analysis stage to enable a perfect reconstruction of the signal. However, it is impor-
tant to mention that, in this work, we are only interested in the analysis section
since our goal is to extract discriminant features from each directional image. The
components of the analysis part are the downsampler D and the analysis filters
H0 and H1 .
3.4.2.1 Analysis Filters

One of the attractive features of the DFB is the fact that it can be implemented by
one filter prototype only. By using carefully designed unimodular matrices, the filter
design process can be reduced to require only one filter prototype H0 (ω). Therefore,
if the unimodular matrices which change the frequency components from R0i (ω) to
H0 (ω), for i = 1, 2, 3, and 4, respectively, are determined (Fig. 3.6), then the sys-
tems in Fig. 3.7(a,b) are identical and only one filter prototype H0 (ω) is required.
Consequently, H0 (ω) can replace the four remaining filters R0i (ω) using the unimod-
ular matrices.
Fig. 3.6 Five passbands for DFB

Fig. 3.7 Two identical structures in a DFB. (a) using R0i (ω) alone and (b) using a unimodular
matrix with H0 (ω)
3.4.2.2 Quincunx Downsampling

Quincunx downsampling uses quincunx 2×2 resampling matrices whose entries are
±1 so that their determinant is equal to 2 [55]. There are eight quincunx resampling
matrices and the most commonly used is:

1 1
Q1 = (3.20)
−1 1
Simply speaking, a quincunx downsampling corresponds to a rotated downsam-

pling. Figure 3.8 shows the original Lena image and its corresponding quincunx
downsampled image by Q 1 .
3.4.2.3 Overview of –band DFB

The Four-band DFB: A four-band DFB is composed of two-band DFBs (Fig. 3.9)
arranged in a tree-like structure. After the modulator, the constituent frequency com-
ponents are shifted, resulting in a diamond-like shape. Then, via the diamond fil-
ters, H0 (ω) and H1 (ω), each of the four frequency regions is filtered then down-
sampled by a quincunx downsampler. By cascading another set of two-band DFBs
at the ends of the first two-band DFB, a four-band directional decomposition is
obtained.
The 2n -band DFB: Two-band and four-band DFBs lead to 2n -band extensions.
To expand to eight bands, one can apply a third stage in a cascade fashion. With
an input whose directional frequencies are labeled as shown in Fig. 3.5(a), an
Fig. 3.8 The Lena image and

its quincunx downsampled
image by Q 1
Fig. 3.9 A two-band DFB structure
eight-band DFB generates the eight subband outputs shown in Fig. 3.5(b). It is worth
noting that each of the subband images is smaller than the original input, which is
necessary to ensure a maximal DFB decimation.
3.4.2.4 Directional Images

Directional images are obtained by applying all directional filters (as described
above). Practical experiments show (Fig. 3.12) that better results are achieved when
applying a two level DFB design, so four directional images are obtained at the end
of the DFB pre-processing. These directional images can be regarded as a decom-
position of the original image in four directions. Directional images contain features
associated with global directions rather than local directions. By creating directional
Fig. 3.10 Some samples from Yale Face Database

3.5 Proposed Method and Results Analysis 37
images, noise in the original image is divided into four different directions, thus
reducing its energy by a factor of four [9].
3.5 Proposed Method and Results Analysis
3.5.1 Proposed Method

Experimental tests have been performed using two different databases: FERET
Database [56] and YALE Face Database [57]. The FERET database was collected
in 15 sessions between August 1993 and July 1996 at George Mason University
and the US Army Research Laboratory facilities as part of the FERET programme
which was sponsored by the US Department of Defense Counter- drug Technol-
ogy Development Program. The database contains 1,564 sets of images for a total
of 14,126 images and includes 1,199 individuals and 365 duplicate sets of images.
A duplicate set is a second set of images of a person already in the database but
usually taken on a different day. Images were recorded with a 35 mm camera, sub-
sequently digitised and then converted to 8-bit gray-scale images. The resulting
images are stored as 256 × 384 images. The Yale database is a collection of 165
images of 15 different individuals where images belonging to a person (i.e. same
class) present variations in expressions and illumination conditions. In this database,
11 images of each individual are available (with different expressions: happy, sad,
sleepy... and different lighting sources: center, left and right), three are randomly
chosen to be used as reference faces while the eight remaining are used as input data
(test images). Figure 3.10 shows some face image samples from Yale Face Database.
The main contribution of this work is to improve the recognition rates of the
existing face recognition algorithms, such as PCA, LDA, ICA and SDA, by applying
a DFB pre-processing, thus demonstrating their suitability in capturing discriminant
information.
First, directional images were generated by applying the DFB to each face image
from the database. Figure 3.11 illustrates an example of an original face image
from the database and its directional images generated by the DFB. To evaluate
the effect of the level of DFB decomposition on the ability to capture discrimi-
nating information and hence recognition rate, extensive experiments were carried
out using both databases with varying levels of decomposition. The experiments
show that the best results are obtained when the level of the DFB equals either
two or four (Fig. 3.12). However, since the time of execution grows rapidly when
the order of the filter bank is increased, it makes sense to choose a two level DFB
decomposition. Thus four directional images are obtained for each face image in our
analysis.
In order to assess the efficiency of the proposed method, extensive experimen-
tation has been carried out using state-of-the-art face recognition algorithms such
as PCA, LDA, ICA and SDA. Experiments were conducted on data with and with-
out the DFB pre-processing step as follows: first, the four methods are applied in
isolation, and then in combination.
Fig. 3.11 Directional images generated by DFB. (a) Directional Image 1, (b) Directional Image 2,
(c) Directional Image 3, (d) Directional Image 4
0.9
0.8
0.7
Recognition Rate
0.6
0.5
0.4
0.3
0.2
0.1
0
N=2 N=3 N=4 N=5 N=6 N=7
DFB Decomposition Level
Fig. 3.12 Recognition rates for different orders of the DFB
3.5.2 PCA
In this experiment the original face database is used to extract features using the tra-
ditional Eigenfaces algorithm. The recognition rate is calculated for all the remain-
ing faces in the database. The same system is maintained and applied to a new
database obtained after DFB pre-processing. An NN algorithm using Euclidian
distance is used to compute the distances between the different feature vectors.
Table 3.3 Experiment results for the DFB–PCA method and comparison with the PCA algorithm.
Faces PCA(%) DFB–PCA(%) Improvement(%)
Normal 53.33 86.67 +62.50

No glasses 60 86.67 +44.45
Wink 53.33 86.67 +62.50
Glasses 60 73.33 +22.22
Sleepy 60 86.67 +33.33
Surprised 60 80 +33.33
Sad 53.33 86.67 +62.50
Left-light 13.33 33.33 +150.04
Global recognition rate 51.67 77.50 +49.99
Table 3.3 shows the results of this experiment over all the different expressions
and lighting conditions of the face images in the database.
Note that the improvement mentioned in Table 3.3 is a relative improvement and
can be calculated from the following equation:
Rate(D F B S D A) − Rate(S D A)
I mpr ovement = (3.21)
Rate(S D A)
It can be seen from Table 3.3 that low recognition accuracies are obtained for
both methods (i.e. PCA alone PCA with DFB pre-processing). However, it is inter-
esting to remark that the worst results are obtained for faces with changes in light-
ing conditions (only 13% for PCA), but while using the Directional Filters, the
recognition rate has been improved by more than 150%. A general increase in
the recognition accuracy of around 50% over all the faces is enough to conclude
that a DFB implementation outperforms significantly its Eigenfaces counterpart
algorithm.
Figure 3.13 illustrates the results of an experiment conducted to show how the
database size affects the recognition accuracy. To do so, 15 image faces are ran-
domly chosen from the Yale Database as test images while the number of reference
images per person is increased each time by one. A comparison with the GFB [7]
approach has been made to demonstrate that the proposed method clearly outper-
forms the other pre-processing algorithms even when database size is important.
3.5.3 ICA
This experiment is performed as with the PCA but using the FastICA algorithm
instead of the Eigenfaces algorithm. The results obtained are reported in Table 3.4
and the effect of the database size with a comparison with the GFB approach is
showed in Fig. 3.14.
0.9
0.8
0.7
Recognition Rate
0.6
0.5
0.4
PCA
0.3 GFB−PCA
DFB−PCA
0.2
0.1
0
30 45 60 75 90
Database Size
Fig. 3.13 Recognition rate of PCA-based algorithms
The ICA technique using both approaches significantly outperforms the PCA. In
addition, it can also be seen that DFB is able to improve further the ICA especially
for situations in which large facial changes occur (light source, glasses, etc.). An
overall recognition rate of 80.83% is obtained for the combined ICA–DFB method
with an overall improvement of 12.78%. This result clearly demonstrates the dis-
criminating strength of a DFB pre-processing step.
Table 3.4 Experiment results for the DFB–ICA method and comparison with the ICA algorithm
Faces ICA(%) DFB–ICA(%) Improvement(%)
Normal 80 93.33 +16.67

No glasses 73.33 80 +09.10
Wink 86.67 86.67 0
Glasses 66.67 73.33 +10
Sleepy 93.33 93.33 0
Surprised 66.67 86.67 +30
Sad 93.33 100 +7.15
Left-light 13.33 33.33 +150.04
0.9
0.8
0.7
Recognition Rate
0.6
0.5
0.4
0.3
0.2
ICA
0.1 GFB−ICA
DFB−ICA
0
30 45 60 75 90
Database Size
Fig. 3.14 Recognition rate of ICA-based algorithms
3.5.4 LDA
It is well known that the main problem with principal component methods (PCA and
ICA) is the fact that they have no information about the class of each vector in the
training database. This means that each face image is treated separately. This disad-
vantage has been resolved when using the LDA method since all the face images for
one person are considered as one class. The same procedure is used as in the previ-
ous cases and the results obtained are showed in Table 3.5. A comparison with the
Gabor approach is also illustrated in Fig. 3.15. The results obtained clearly show the
LDA technique using both approaches (with and without DFB pre-processing) sig-
nificantly outperforms the PCA. In addition, it can also be seen that DFB is able to
improve further the LDA especially when significant changes in the image occur. An
overall recognition rate of 91.67% is obtained for the combined LDA–DFB method
with an overall improvement of 4.77% which clearly demonstrates the discriminat-
ing strength of a DFB pre-processing step.
3.5.5 SDA
The principal idea of SDA is to divide each class (of the original LDA algorithm)
into multiple subclasses. This property is very interesting in our method, since, from
Table 3.5 Experiment results for the DFB–LDA method and comparison with the ICA algorithm
Faces LDA(%) DFB–LDA(%) Improvement(%)
Normal 93.33 93.33 0

No glasses 80 93.33 +16.67
Wink 93.33 100 +8.22
Glasses 80 86.67 +8.34
Sleepy 93.33 93.33 0
Surprised 80 93.33 +16.67
Sad 100 93.33 −6.67
Left-light 80 80 0
each face image in the database, 2n directional images are generated (with n being
the order of the DFB). The best application of the SDA is to place all the direc-
tional faces of a person into the same subclass. Figure 3.16 shows the proposed
scheme for this method. To assess the performance of the method, the same steps
used in the previous approaches are followed: the original face database is used to
extract the features using the SDA algorithm as proposed in [6] and the recognition
rate is calculated for all remaining faces in the database. A combined DFB–SDA
method is used as illustrated in Fig. 3.16 to compute the new recognition rates.
0.9
0.8
0.7
Recognition Rate
0.6
0.5
0.4
0.3
0.2 LDA
GFB−LDA
0.1
DFB−LDA
0
30 45 60 75 90
Database Size
Fig. 3.15 Recognition rate of LDA-based algorithms

Fig. 3.16 Proposed scheme for the DFB–SDA method
Table 3.6 Experiment results for the DFB–SDA method and comparison with the SDA algorithm
Faces SDA(%) DFB–SDA(%) Improvement(%)
Normal 93.33 93.33 0

No glasses 86.67 86.67 0
Wink 93.33 100 +8.22
Glasses 93.33 100 +8.22
Sleepy 100 100 +0
Surprised 93.33 93.33 0
Sad 100 93.33 −6.67
Left-light 73.33 93.33 +27.27
The results obtained for both SDA and DFB–SDA methods and the improvement
observed for different poses in the database are depicted in Table 3.6. The results
obtained demonstrate that a combined DFB–SDA approach improves the recogni-
tion rate obtained when applying the SDA algorithm alone by 4.54%. In addition,
with an overall recognition rate of 95.83%, it can also be concluded that the idea of
dividing the classes into subclasses is compatible with DFB-based pre-processing.
Figure 3.17 shows the effect of database growing on the global recognition rate and
a comparison with the Gabor approach.
3.5.6 FERET Database Results

To demonstrate the efficiency of the proposed method, similar experiments using the
FERET Face Database (Fig. 3.18 ) have been carried out. The following steps: ref-
erence and test databases are constructed from the original FERET database; then
0.9
0.8
0.7
Recogntion rate
0.6
0.5
0.4
0.3
0.2
SDA
0.1 GFB−SDA
DFB−SDA
0
30 45 60 75 90
Database size
Fig. 3.17 Recognition rate of SDA-based algorithms
PCA, LDA, ICA and SDA (alone and pre-processed by the DFB) algorithms are
applied on the following database sizes: 50, 100, 200 and 300 using only one image
by person as reference. The average recognition rate is then calculated for all tests.
Table 3.7 depicts the experimental results obtained. From the table, it can be seen
that DFBs improve the results obtained using a larger database with varying condi-
tions such as head rotation and face sizes. Overall, the improvements for the differ-
ent algorithms are all over 13% which is very satisfying.
Fig. 3.18 Some image

samples from FERET
Database
References 45
Table 3.7 Experiment results for the different methods with the FERET Database
Method PCA(%) ICA(%) LDA(%) SDA(%)
Without DFB 72.33 61.77 71.67 74.22

With DFB 84.89 74.00 81.11 84.90
Improvements +17.36 +19.80 +13.17 +14.39
3.6 Conclusion
This chapter proposes a new method to enhance existing face recognition methods
such as PCA, ICA, LDA and SDA by using a DFB pre-processing. The results have
shown that this pre-processing step yields robustness against changes in expressions
and illumination conditions. This step can also be very helpful when the number of
face images in the database is insufficient since the number of images will increase
by a factor of 2n (n is the order of the DFB), thus providing more discriminant power
for the classification phase. It has been shown that this method is at least as good as
all the other approaches including those with GFB pre-processing.
The effect of DFB pre-processing is significant for the YALE and FERET
databases. This is demonstrated by overall recognition rate improvements varying
from 4.54% for the SDA algorithm to 49.99% for the PCA.
The efficiency of the proposed method has been demonstrated by improvements
of (Yale=49.99%, FERET=17.36%) for PCA, (Yale=12.78%, FERET=19.80%)
for ICA, (Yale=4.77%, FERET=13.17%) for LDA and (Yale=4.54%,
FERET=14.39%) for SDA. A recognition rate of 95.83% has been obtained
for the SDA algorithm combined with DFB pre-processing.
References
1. P. J. Phillips, P. J. Flynn, T. Scruggs, K. W. Bowyer, J. Chang, K. Hoffman, J. Marques,
J. Min and W. Worek, “Overview of the face recognition grand challenge,” IEEE Com-
puter Society Conference on Computer Vision and Pattern Recognition, vol. 1, pp. 947–954,
June 2005.
2. A. Rosenfeld, W. Zhao, R. Chellappa and P. J. Phillips, “Face recognition: A literature survey,”
ACM Computing Surveys, vol. 35, no. 4, pp. 399–458, 2003.
3. M. Turk and A. Pentland, “Eigenfaces for recognition,” Journal of Cognitive Neuroscience,
vol. 3. no. 1, pp. 71–86, 1991.
4. M. S. Bartlett, J. R. Movellan and T. J. Sejnowski, “Face recognition by independent com-
ponent analysis,” IEEE Transactions on Neural Networks, vol. 13, no. 6, pp. 1450–1464,
November 2002.
5. P. N. Belhumeur, J. P. Hespanha and D. J. Kriegman, “Eigenfaces vs. fisher-faces: Recogni-
tion using class specific linear projection,” IEEE Transactions Pattern Analysis and Machine
Intelligence, vol. 19, no. 7, pp. 711720, July 1997.
6. M. Zhu and A. M. Martinez, “Subclass discriminant analysis,” IEEE Transactions on Pattern
Analysis and Machine Intelligence, vol. 28, no. 8, pp. 1274–1286, August 2006.
7. W. R. Boukabou, L. Ghouti and A. Bouridane, “Face recognition using a Gabor filter bank
approach,” First NASA/ESA Conference on Adaptive Hardware and Systems, pp. 465–468,
June 2006.
8. C. H. Park, J. J. Lee, M. Smith, S. Park and K. H. Park, “Directional filter bank based finger-
print feature extraction and matching,” IEEE Transactions on Circuits and Systems For Video
Technology, vol. 14, pp. 74–85, January 2004.
9. M. A. U. Khan, M. K. Khan, M. A. Khan, M. T. Ibrahim, M. K. Ahmed and J. A.
Baig, “Improved pca based face recognition using directional filter bank,” IEEE INMIC,
pp. 118–124, December 2004.
10. X. Yi qiong, L. Bi cheng and W. Bo, “Face recognition by fast independent component anal-
ysis and genetic algorithm,” IEEE on Computer and Information Technology, vol. 28, no. 8,
pp. 194–198, August 2004.
11. P. J. Phillips, G. Grother, R. J. Micheals, D. M. Blackburn, E. Tabassi and J. M. Bone, Frvt
2002: Overview and summary. http://www.frvt.org/FRVT2002/documents.htm, March 2003.
12. D. M. Blackburn, Biometrics 101, version 3.1, volume 12. Federal Bureau of Investigation,
March 2004.
13. M. H. Yang, D. J. Kriegman and N. Ahuja, “Detecting faces in images: A survey,” IEEE
Transactions on Pattern Analysis and Machine Intelligence, vol. 24, no. 1, p. 3458, January
2002.
14. J. Fagertun, Face Recognition. PhD thesis, Technical University of Denmark, 2006.
15. G. Yang and T. S. Huang, “Human face detection in complex background,” Pattern Recogni-
tion, vol. 27, no. 1, pp. 53–63, 1994.
16. K. C. Yow and R. Cipolla, “Feature-based human face detection,” Image and Vision Comput-
ing, vol. 15, no. 9, pp. 713–735, 1997.
17. Y. Dai and Y. Nakano, “Face-texture model based on sgld and its application in face detection
in a color scene,” Pattern Recognition, vol. 29, no. 6, pp. 1007–1017, 1996.
18. S. McKenna and S. Gong and Y. Raja, “Modelling facial colour and identity with Gaussian
mixtures,” Pattern Recognition, vol. 31, no. 12, pp. 1883–1892, 1998.
19. R. Kjeldsen and J. Kender, “Finding skin in color images,” Automatic Face and Gesture
Recognition, pp. 312–317, 1996.
20. T. Craw, D. Tock and A. Bennett, “Finding face features,” Proceeding of Second European
Conference on Computer Vision, pp. 92–96, 1992.
21. A. Lanitis, C. J. Taylor and T. F. Cootes, “An automatic face identification system using flex-
ible appearance models,” Image and Vision Computing, vol. 13, no. 5, pp. 393–401, 1995.
22. K.-K. Sung and T. Poggio, “Example-based learning for view-based human face detection,”
IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 20, no. 1, pp. 39–51,
January 1998.
23. H. Rowley, S. Baluja and T. Kanade, “Neural network-based face detection,” IEEE Transac-
tions on Pattern Analysis and Machine Intelligence, vol. 20, no. 1, pp. 23–38, January 1998.
24. E. Osuna, R. Freund and F. Girosi, “Training support vector machines: An application to face
detection,” Proceeding of IEEE Conference on Computer Vision and Pattern Recognition,
pp. 130–136, 1997.
25. H. Schneiderman and T. Kanade, “Probabilistic modeling of local appearance and spatial
relationships for object recognition,” Proceeding of IEEE Conference on Computer Vision
and Pattern Recognition, pp. 45–51, 1998.
26. A. Rajagopalan, K. Kumar, J. Karlekar, R. Manivasakan, M. Patil, U. Desai, P. Poonacha
and S. Chaudhuri, “Finding faces in photographs,” Proceeding of Sixth IEEE International
Conference on Computer Vision, pp. 640–645, 1998.
27. A. J. Colmenarez and T. S. Huang, “Face detection with information-based maximum dis-
crimination,” Proceedings of IEEE Conference on Computer Vision and Pattern Recognition,
pp. 782–787, 1997.
28. W. Zhao and R. Chellappa, “Face Processing: Advanced Modeling and Methods,” Academic
Press, New York, 2006.
References 47
29. B. Moghaddam and A. Pentland, “Probabilistic visual learning for object representation,”
July 1997.
30. P. J. Phillips, “Support vector machines applied to face recognition,” Proceedings of the 1998
conference on Advances in neural information processing systems, pp. 803–809, 1998.
31. C. Liu and H. Wechsler, “Evolutionary pursuit and its application to face recognition,” IEEE
Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 6, pp. 570–582, June
2000.
32. S. Z. Li and J. Lu, “Face recognition using the nearest feature line method,” IEEE Transactions
on Neural Networks, vol. 10, no. 2, pp. 439–443, March 1999.
33. M. S. Bartlett, H. M. Lades and T. J. Sejnowski, “Independent component representation
for face recognition,” Proceedings of SPIE Symposium on Electronic Imaging: Science and
Technology, pp. 528–539, 1998.
34. M.-H. Yang, “Kernel eigenfaces vs. kernel fisherfaces: Face recognition using kernel meth-
ods,” FGR ’02: Proceedings of the Fifth IEEE International Conference on Automatic Face
and Gesture Recognition, IEEE Computer Society, Washington, DC, USA, p. 215, 2002.
35. M. D. Kelly, “Visual identification of people by computer,” Tech. rep. AI-130, Stanford AI
Project, Stanford, CA, 1970.
36. T. Kanade, “Picture processing system by computer complex and recognition of human faces,”
In doctoral dissertation, Kyoto University, November 1973.
37. T. Kanade, “Computer recognition of human faces,” Interdisciplinary Systems Research,
vol. 47, 1977.
38. I. J. Cox, J. Ghosn and P. N. Yianilos, “Feature-based face recognition using mixture-
distance,” Proceedings of IEEE Conference on Computer Vision and Pattern Recognition,
pp. 209–216, 1996.
39. B. S. Manjunath, R. Chellappa and C. V. D. Andmalsburg, “A feature based approach to face
recognition,” Proceedings of IEEE Conference on Computer Vision and Pattern Recognition,
pp. 373–378, 1992.
40. K. Okada, J. Steffans, T. Maurer, H. Hong, E. Elagin, H. Neven and C. Von der Mals-
burg, “The Bochum/USC Face Recognition System And How it Fared in the FERET
Phase III test.” In H. Wechsler, P. J. Phillips, V. Bruce, F. Fogeman Soulié and T.
S. Huang, editors, Face Recognition: From Theory to Applications. Springer-Verlag,
pp. 186–205, 1998.
41. L. Wiskott, J.-M. Fellous and C. Von Der Malsburg, “Face recognition by elastic bunch graph
matching,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 19, no. 7,
pp. 775–779, January 1997.
42. F. Samaria, Face Recognition Using Hidden Markov Models. PhD thesis, University of Cam-
bridge, UK, 1994.
43. A. V. Nefian and M. H. Hayes, “Hidden Markov models for face recognition,” In Proceed-
ings of International Conference on Acoustics, Speech and Signal Processing, pp. 2721–2724,
1998.
44. A. Pentland, B. Moghaddam and T. Starner, “View-based and modular eigenspaces for face
recognition,” Proceeding of IEEE Conference on Computer Vision and Pattern Recognition,
pp. 84–91, June 1994.
45. P. Penev and J. Atick, “Local feature analysis: A general statistical theory for object represen-
tation,” Network: Computation in Neural Systems, vol. 7, pp. 477–500, June 1996.
46. J. Huang and B. Heisele, “Scomponent-based face recognition with 3d morphable models,” In
Proceedings of International Conference on Audio-and Video-Based Person Authentication,
2003.
47. M. Turk and A. Pentland, “Face recognition using eigenfaces,” IEEE Conference on Computer
Vision and Pattern Recognition, pp. 586–591, June 1991.
48. A. J. Bell and T. J. Sejnowski, “The independent components of natural scenes are edge filters”
Vision Research, vol. 37, no. 23, pp. 3327–3338, 1997.
49. K. Fukunaga, “Introduction to Statistical Pattern Recognition,” (2nd edition). Academic Press,
New York, 1990.
50. A. Buja, T. Hastie and R. Tibshirani, “Penalized discriminant analysis,” Annals of Statistics,
vol. 23, pp. 73–102, 1995.
51. T. Hastie, R. Tibshirani and A. Buja, “Flexible discriminant analysis by optimal scoring,”
Journal of the American Statistical Association, vol. 89, pp. 1255–1270, 1994.
52. G. Baudat and F. Anouar, “Generalized discriminant analysis using a kernel approach,” Neural
Computation, vol. 12, pp. 2385–2404, 2000.
53. B. S. Manjunath and W. Y. Ma, “Texture features for browsing and retrieval of image data,”
August 1996.
54. G. Dai and C. Zhou, “Face recognition using support vector machines with the robust feature,”
The 2003 IEEE International Workshop on Robot and Human interactive Communication
Millbrae, California, USA, 2003.
55. S. Park, “New Directional Filter Banks and Their Applications in Image Processing.” PhD
thesis, Georgia Institute of Technology, 1999.
56. P. J. Phillips, H. Moon, P. J. Rauss S. and Rizvi, “The feret evaluation methodology for face
recognition algorithms,” IEEE Transactions on Pattern Analysis and Machine Intelligence,
vol. 22, no. 10, pp. 1090–1104, October 2000.
57. Department of Computer Science Yale University. The Yale face database.
http://cvc.yale.edu/projects/yalefaces/yalefaces.html.
Chapter 4
Recent Advances in Iris Recognition:
A Multiscale Approach
4.1 Introduction
Consistent and protected identification of a person is a key subject in security. In

government and conventional environments, security is usually provided through
badges, provision of information for visitors and issuing of keys. These are the most
common means of identification since they are the easiest to remember and the eas-
iest to confirm. However, these solutions are the most unreliable putting all compo-
nents of security at risk. IDs can be stolen, passwords can be forgotten or cracked. In
addition, security breaches resulting in access to restricted areas of airports or other
sensitive areas are a source of concern for terrorist activities. Although there are
laws against false identification, incidents of invasions and unauthorised modifica-
tions to information occur daily with catastrophic effects. For example, credit card
fraud is rapidly increasing and traditional technologies are not sufficient to reduce
the impact of counterfeiting and/or security breaches therefore a more secure tech-
nology is needed to cope with the drawbacks and pitfalls [1]. Biometrics, the use of
biology, which deals with data statistically, provides a powerful answer to this need,
since the uniqueness of an individual arises from his/her personal or behavioural
characteristics with no passwords or numbers to remember. These include finger-
print, retinal and iris scanning, hand geometry, voice patterns, facial recognition
and other techniques. Typically a biometric recognition system records data from a
user and performs a comparison each time the user attempts to claim his/her identity.
Biometric systems can be classified into two operating modes: verification and
identification modes. In the “verification” mode, the user claims an identity and
the system compares the extracted features with the stored template of the asserted
identity to determine if the claim is true or false. In the “identification” mode, no
identity is claimed and the extracted feature set is compared with the templates of all
the users in the database in order to recognise the individual. For such approaches
to be widely applicable, they must be highly reliable [2]. Reliability relates to the
ability of the approach to support a signature that is unique to an individual and
that can be captured in an invariant manner over time. The use of biometric traits
require that a particular biometric factor be unique for each individual that it can be
readily measured, and that it is invariant over time. Biometrics such as signatures,

Technology, DOI 10.1007/978-0-387-09532-5 4,

50 4 Recent Advances in Iris Recognition
photographs, fingerprints, voiceprints and retinal blood vessel patterns, all have sig-
nificant drawbacks. Although signatures and photographs are cheap and easy to
obtain and store, they are insufficient to identify automatically with assurance, and
can be easily forged. Electronically recorded voiceprints are susceptible to changes
in a person’s voice, and they can be counterfeited. Fingerprints or handprints require
physical contact, and they also can be counterfeited and marred by artifacts [2].
It is currently accepted within the biometric community that biometrics has the
potential for high reliability because it is based on the measurement of an intrin-
sic physical property of an individual. Fingerprints, for example, provide signa-
tures that appear to be unique to an individual and reasonably invariant with age,
whereas faces, while fairly unique in appearance can vary significantly with time
and place. Invasiveness, the ability to capture the signature while placing as few
constraints as possible on the subject of evaluation, is another constraint. In this
regard, acquisition of a fingerprint signature is invasive as it requires that the sub-
ject makes physical contact with a sensor, whereas images of a subject’s face or
iris that are sufficient for recognition can require a comfortable distance. Consid-
erations of reliability and invasiveness suggest that the human iris is a particularly
interesting structure on which to base a biometric approach for personnel verifica-
tion and identification [3]. From the point of view of reliability, the special patterns
that are visually apparent in the human iris are highly distinctive to an individual;
the appearance of a subject’s iris suffers little from day to day variations. In addi-
tion, the method is non-invasive since the iris is an overt body that can be imaged
at a comfortable distance from a subject with the use of extant machine vision tech-
nology. Owing to these features of reliability and non invasiveness, iris recogni-
tion is a promising approach to biometric-based verification and identification of
people [2].
An authentication system on iris recognition is reputed to be the most accurate
among all biometric methods because of its acceptance, reliability and accuracy.
Ophthalmologists originally proposed that the iris of the eye might be used as a
kind of optical fingerprint for personal identification [4]. Their proposal was based
on clinical results that every iris is unique and it remains unchanged in clinical
photographs. The human iris begins to form during the third month of gestation and
is complete by the eighth month, though pigmentation continues into the first year
after birth. It has been discovered that every iris is unique since two people (even
two identical twins) have uncorrelated iris patterns [5], and yet stable throughout
the human life. It is suggested in recent years that the human irises might be as
distinct as fingerprint for different individuals, leading to the idea that iris patterns
may contain unique identification features.
In 1936, Frank Burch, an ophthalmologist, proposed the idea of using iris patterns
for personal identification [6]. However, this was only documented by James Dog-
garts in 1949. The idea of iris identification for automated recognition was finally
patented by Aran Safir and Leonard Flom in 1987 [6]. Although they had patented
the idea, the two ophthalmologists were unsure as to a practical implementation of
the system. They commissioned John Daugman to develop the fundamental algo-
rithms in 1989. These algorithms were patented by Daugman in 1994 and now
4.2 Related Work: A Review 51
form the basis for all current commercial iris recognition systems. The Daugman
algorithms are owned by Iridian Technologies and they are licensed to several other
companies [6].
4.2 Related Work: A Review

Research in the area of iris recognition has been receiving considerable attention
and a number of techniques and algorithms have been proposed over the last few
years.
Flom and Safir first proposed the concept of automated iris recognition in 1987
[7]. Since then, a number of researchers have worked on iris representation and
matching, and significant progress has been made [8, 9–11]. Daugman [4, 12] made
use of multiscale Gabor filters to demodulate texture phase structure information
of the iris. In his work, filtering an iris image with a family of filters resulted in
1,024 complex-valued phasors which denote the phase structure of the iris at dif-
ferent scales. Each phasor was then quantised to one of the four quadrants in the
complex plane. The resulting 2,048-component iris code was used to describe an
iris. The difference between a pair of iris codes was measured by their Hamming
distance. Sanchez-Reillo et al. [13] provided a partial implementation of the algo-
rithm proposed by Daugman. Wildes et al. [3] represented the iris texture with a
Laplacian pyramid constructed with four different resolution levels and used the
normalised correlation to determine whether the input image and the model image
are from the same class. Boles and Boashash [14] calculated a zero-crossing repre-
sentation of one-dimensional (1D) wavelet transform at various resolution levels of
a concentric circle on an iris image to characterise the texture of the iris. Iris match-
ing was based on two dissimilarity functions. In [15], Sanchez-Avila et al. further
developed the iris representation method proposed by Boles et al. [14]. They made
an attempt to use different similarity measures for matching, such as Euclidean and
Hamming distances. Lim et al. [16] decomposed an iris image into four levels using
2D Haar wavelet transform and quantised the fourth-level high-frequency informa-
tion to form an 87-bit code. A modified competitive learning neural network (LVQ)
was adopted for the classification process. Tisse et al. [17] analysed the iris charac-
teristics using an analytic image constructed by the original image and its Hilbert
transform. Emergent frequency functions for feature extraction were in essence
samples of the phase gradient fields of the analytic image’s dominant components
[18–20]. Similar to the matching scheme of Daugman, they sampled the binary
emergent frequency functions to form a feature vector and used a Hamming dis-
tance for matching. Kumar et al. [20] utilised correlation filters to measure the con-
sistency of iris images from the same eye. The correlation filter of each class was
designed using the two-dimensional (2D) Fourier transforms of training images.
If the correlation output (the inverse Fourier transform of the product of the input
image’s Fourier transform and the correlation filter) exhibited a sharp peak, the input
image was determined to be from an authorised subject, otherwise an imposter one.
Bae et al. [10] projected the iris signals onto a bank of basis vectors derived by
independent component analysis and quantised the resulting projection coefficients
as features. In another approach by Li Ma et al., Multichannel [9] and Even Symme-
try Gabor filters [4] were used to capture local texture information of the iris, which
is used to construct a fixed length feature vector. Nearest feature line method is used
for iris matching. In [21] a set of 1D intensity signals is constructed to effectively
characterise the most important information of the original 2D image using a par-
ticular class of wavelets; a position sequence of local sharp variation points in such
signals is recorded as features. A fast matching scheme based on an exclusive OR
operation is used to compute the similarity between a pair of position sequences.
4.3 Iris Localisation
4.3.1 Background
The eye is essentially made up of two parts: the scalera or “white” portion of the
eye and cornea. The scalera consists of closely interwoven fibres and a small section
in the front and centre known as the cornea. The cornea consists of fibres arranged
in regular fashion. Conveniently this makes the cornea transparent, allowing light
to filter in. Behind the cornea is the anterior chamber filled with a fluid known as
the aqueous humor. A spongy tissue, the ciliary bodies, arranged around the edge
of the cornea, constantly produces the aqueous humor. Immersed in the aqueous
humor is a ring of muscles commonly referred to as the iris. The word iris is most
likely derived from the Latin word for rainbow. It appears that the term was first
applied in the sixteenth century, making reference to this multicoloured portion of
the eye [2, 3]. The iris itself extends out in front of the lens, forming a circular array,
with a variable opening in the centre, otherwise known as the pupil. The pupil is
not located exactly in the centre of the iris, but rather slightly nasally and inferiorly
(below the centre) [4]. The iris, which is made up of two bands of muscles, controls
the pupil, the dilator, which contracts to enlarge the pupil, and the sphincter, which
contracts to reduce the size of the pupil. The visual appearance of the iris is directly
related to its multilayered construction.
4.3.2 Iris Segmentation
Image acquisition captures the iris as part of a larger image that also contains data
derived from the immediately surrounding eye region. Therefore, prior to perform-
ing iris pattern matching, it is important to localise that portion of the acquired image
that corresponds to the iris [3]. This corresponds to the portion of the image derived
from inside the limbus (the border between the sclera and the iris) and outside the
pupil (Fig. 4.1 from CASIA iris database). If the eyelids are an occluding part of the
iris, then only that portion of the image below the upper eyelid and above the lower
4.3 Iris Localisation 53
Fig. 4.1 Eye image
eyelid should be included. The eyelid boundary can also be irregular due to the pres-
ence of eyelashes. From these suggestions, it can be said that in iris segmentation a
wide range of edge contrasts must be taken into consideration, and iris segmentation
must be robust and effective.
4.3.3 Existing Methods for Iris Localisation
Methods such as the Integro-differential, Hough transform and active contour mod-
els are well known techniques in use for iris localisation. These methods are
described below including their strengths and weaknesses.
4.3.3.1 Daugman s Integro-Differential Operator

In order to localise an iris, Daugman proposed an Integro-differential operator. The
operator assumes that the pupil and limbus are circular contours and operate as a
circular edge detector [4]. Detecting the upper and lower eyelids is also carried out
using the Integro-differential operator by adjusting the contour search from circu-
lar to a designed arcuate shape [22]. The Integro-differential operator is defined as
follows:

∂ I (x, y)

max(r, x0 , y0 ) G σ (r ) ∗ ds . (4.1)
∂r (r,x0 ,y0 ) 2πr
The operator pixel-wise searches throughout the raw input image, I(x,y), and
obtains the blurred partial derivative of the integral over normalised circular con-
tours in different radii. The pupil and limbus boundaries are expected to maximise
the contour integral derivative, where the intensity values over the circular borders
would make a sudden change.
Gσ (r) is a smoothing function controlled by σ that smoothes the image intensity
for a more precise search.
This method can result in false detection due to noise such as strong boundaries
of upper and lower eyelids since it works only on a local scale.
4.3.3.2 Hough Transform

Hough transform is a standard image analysis tool for finding curves that can be
defined in a parametrical form such as lines and circles. The circular Hough trans-
form can be employed to deduce the radius and centre coordinates of the pupil and
iris regions.
Wildes [3], Kong and Zhang [23], Tisse et al. [17] and Ma et al. [24] have all used
Hough transform to localise irises. The localisation method, similar to Daugman s
method, is also based on the first derivative of the image. In the method proposed by
Wildes, an edge map of the image is first obtained by thresholding the magnitude of
the image intensity gradient:
|∇G(x, y) ∗ I (x, y)|

1 − (x−x0 )2 +(y−y0)
2 (4.2)
where ∇ ≡ ∂/∂x , ∂/∂ y and G(x, y) = e 2σ2
2π σ 2
G(x,y) is a Gaussian smoothing function with scaling parameter σ to select the

proper scale of edge analysis. The edge map is then used in a voting process to
maximise the defined Hough transform for the desired contour. A maximum point
in the Hough space will correspond to the radius r and centre coordinates xc and yc
of the circle best defined by the edge points according to equation
x2c + y2c + r2c = 0. (4.3)
Wildes et al. and Kong and Zhang also make use of the parabolic Hough trans-
form to detect the eyelids by approximating the upper and lower eyelids with
parabolic arcs.
The Hough transform method requires the threshold values to be chosen for edge
detection, and this may result in critical edge points being removed, thus resulting
in failures to detect circles/arcs. In addition, Hough transform is computationally
intensive due to its “brute-force” approach, and thus may not be suitable for real-
time applications.
4.3.3.3 Discrete Circular Active Contours

Ritter proposed an active contour model to localise iris in an image [25]. The model
detects pupil and limbus by activating and controlling the active contour using two
defined forces: internal and external forces.
The internal forces are designed to expand the contour and keep it circular. This
force model assumes that pupil and limbus are globally circular, rather than locally,
to minimise the undesired deformations due to peculiar reflections and dark patches
near the pupil boundary. The contour detection process of the model is based on
4.4 Proposed Method for Iris Localisation 55
the equilibrium of the defined internal forces with the external forces. The exter-
nal forces are obtained from the grey level intensity values of the image and are
designed to push the vertices inward.
The movement of the contour is based on the composition of the internal and
external forces over the contour vertices. Each vertex is moved between time t and
(t+1) by
Vi (t + 1) = Vi (t) + Fint,i + Fext,I (4.4)
where Fint,i i is the internal force, Fext,i is the external force and Vi is the position of
vertex i.
A point interior to the pupil is located from a variance image and then a discrete
circular active contour (DCAC) is created with this point as its centre. The DCAC
is then moved under the influence of internal and external forces until it reaches
equilibrium, and the pupil is then localised.
4.4 Proposed Method for Iris Localisation
To the best knowledge of the authors, there exist no previous work on iris seg-
mentation and localisation using a multiscale approach. In this chapter we pro-
pose a novel approach for iris segmentation with multiscale edge detection based
on wavelet maxima which can provide significant edges where noise disappears
with an increase of the scales (to a certain level), with less texture points produc-
ing local maxima thus enabling us to find the real geometrical edges of the image
thereby yielding an efficient detection of the significant circles for inner and outer
iris boundaries and eyelids.
In our proposed method, multistage edge detection is used to extract the points of
sharp variations (edges) with modulus maxima where the local maxima are detected
to produce only single pixel edges. Depending on the requirement of details desired
in the edges the level of decomposition can be selected.
4.4.1 Motivation
4.4.1.1 Edge Detector Using Wavelets

Edges in images can be mathematically defined as local singularities. Until recently,
the Fourier transform was the main mathematical tool for analysing singularities.
However, the Fourier transform is global and as such not well adapted to local singu-
larities and it is hard to find the location and spatial distribution of singularities with
Fourier transforms. On the other hand, Wavelet transforms provide a local analysis;
they are especially suitable for time-frequency analysis [26] such as for singular-
ity detection problems. With the growth of wavelet theory, the wavelet transforms
have been found to be remarkable mathematical tools to analyse the singularities
including the edges, and further, to detect them effectively. Mallat, Hwang, and
Zhong [27, 28] proved that the maxima of the wavelet transform modulus can detect
the location of the irregular structures. The wavelet transform characterises the local
regularity of signals by decomposing them into elementary building blocks that are
well localised both in space and frequency. This not only explains the underlying
mechanism of classical edge detectors, but also indicates a way of constructing opti-
mal edge detectors under specific working conditions.
A remarkable property of the wavelet transform is its ability to characterise the
local regularity of functions. For an image f(x, y), its edges correspond to singular-
ities of f(x, y), and thus are related to the local maxima of the wavelet transform
modulus. Therefore, the wavelet transform can be used as an effective method for
edge detection.
Assume f(x, y) is a given image of size M × N. At each scale j with j>0 and
S0 f = f(x, y), the wavelet transform decomposes S j−1 f into three wavelet bands: a
low-pass band S j f, a horizontal high-pass band W Hj f and a vertical high-pass band
WVj f. The three wavelet bands (S j f, W Hj f, WVj f) at scale j are of size M×N, which is
the same as the original image, and all filters used at scale j (j>0) are upsampled by
a factor of 2 j compared with those at scale zero. In addition, the smoothing function
used in the construction of a wavelet reduces the effect of noise. Thus, the smoothing
step and edge detection step are combined together to achieve the optimal result.
4.4.1.2 Multiscale Edge Detection

The resolution of an image is directly related to the appropriate scale for edge detec-
tion. High resolution and a small scale will result in noisy and discontinuous edges;
low resolution and a large scale will result in undetected edges. The scale controls
the significance of edges to be shown. Edges of higher significance are more likely
to be preserved by the wavelet transform across the scales. Edges of lower signifi-
cance are more likely to disappear when the scale increases.
Since an edge separates two different regions, an edge point is a point where the
local intensity of the image varies rapidly – more rapidly than in the neighbouring
points which are close from the edge; such a point could therefore be characterised
as a local maximum of the gradient of the image intensity. The problem is that such
a characterisation is to be applied to differentiable images, and above all that, it also
detects all noise points. All techniques used so far to resolve the problem are based
on smoothing the image first [15, 3, 29, 30]. However, a problem with smoothing
arises: how much and what smoothing should one choose? A strong smoothing will
lead to the detection of fewer points while a lighter one will be more permissive.
That is why Mallat defined, in his work with Zhong [6], the concept of multiscale
contours. In this case, every edge point of an image is characterised by a whole chain
of the scale-space plane: the longer the chains, the more important the smoothing
imposed is, and the smaller the number of edge points we will get. In addition, this
allows us to extract useful information about the regularity of the image at the edge
point it characterises. This can be very attractive in terms of a finer characterisation
of edge map.
The multiscale edge detection method described in [31] is used to find the edges.
This wavelet is a nonsubsampled wavelet decomposition and essentially implements
the discretised gradient of the image at different scales. At each level of the wavelet
transform the modulus M j f of the gradients can be computed by
2 2

M j f = W jH f + W jV f (4.5)
and the associated phase A j f is obtained by

W jV
A j f = tan−1 . (4.6)
W jH
The sharp variation points of the image f(x, y) smoothed by S j f (f(x, y) × S j f)

are the points (x, y), where the modulus M j f has a local maxima in the direction of
the gradient given by A j f.
4.4.2 The Multiscale Method
Multilevel wavelet decomposition is applied and an edge detection is carried out

before computing the local maxima to produce both iris outer and pupil boundary
edge maps.
A Hough transform is then used for the detection of circles (outer and pupil
boundaries) followed by a conversion of iris images from Cartesian to polar coordi-
nates system for normalisation purpose. The diagram of this method is depicted in
Fig. 4.2.
4.4.2.1 Edge Map Detection

In our work, we have used the algorithm described in [27] to obtain the wavelet
decomposition using a pair of discrete filters H, G as shown in Table 4.1. A detailed
description of the technique can be found in [36–37].
At each scale s, the algorithm decomposes the eye image I(x,y) into I(x,y,s),
Wv (x, y, s) and Wh (x, y, s)
– I(x,y,s): the image smoothed at scale s.
– Wh (x, y, s) and Wv (x, y, s) can be viewed as the two components of the gradi-
ent vector of the analysed image I(x,y) in the horizontal and vertical directions,
respectively.
In each scale s (s < S) where S is the number of scales or decomposition level,
the image I(x, y) is smoothed by a low-pass filter: s = 0,
I (x y, s + 1) = I (x, y, s) × (Hs , Hs ). (4.7)

Fig. 4.2 Diagram of the Multiscale

proposed method edge detection
Local
maxima
Iris outer Pupil

boundary edge boundary
map edge map
Pupil
Iris outer
circle
circle detection
detection
Eyelids and
eyelashes isolating
Iris
normalization
Table 4.1 Response of filters H, G
H G
0 0
0 0
0.12 0
5 –2.0
0.37 2.0
5 0
0.37 0
5
0.12
5
0
Both horizontal and vertical details are obtained by
1
Wh (x, y, s) = .I (x, y, s) × (G s , D), (4.8)
λs
1
Wv (x, y s) = .I (x, y, s) × (D, G s ). (4.9)
λs
We denote by
– D the Dirac filter whose impulse response is equal to 1 at 0 and 0 otherwise.

– A × (H, L) the separable convolution of the rows and columns, respectively, of
the image A with the 1D filters H and L.
– Gs , Hs are the discrete filters obtained by putting 2s –1 zeros between consecutive
coefficients of H and G.
– λs , as explained in [27] due to discretisation, the wavelet modulus maxima of
a step edge do not have the same amplitude at all scales as they should fit in
a continuous model. The constants λs compensate for this discrete effect. The
values of λs are given in Table 4.2.
Figure 4.3 clearly shows the application of the algorithm on an eye image where it
can be observed that the edges of the image in both horizontal and vertical directions
and at different scales are efficiently computed.
From Fig. 4.3 it can also be observed that there is significant information about
the edge in an eye image, with Wh (x, y, s) eyelids and that the horizontal pupil’s
lines are clearer than the outer boundary circle, and with Wv (x, y, s) useful infor-
mation about both pupil and outer boundary circles. After computing the two
components of the wavelet transform, we compute the modulus at each scale as
follows:

M(x, y, s) = |wh (x, y, s)|2 + |wv (x, y, s)|2 (4.10)
The modulus M(x, y, s) has local maxima in the direction of the gradient given by
A(x, y, s) = arctan (wv (x, y, s) / wh (x, y, s)) (4.11)
Table 4.2 Normalisation coefficient λs .for s > 5, λs = 1
s λs
1.50
2 1.12
3 1.03
4 1.01
5 1.00
Fig. 4.3 Original image at

the top, the first column on
the left shows Wh (x, y, s) for
1 ≤ s ≤ 3, and the second
column on the right shows
Wv (x, y, s) for 1 ≤ s ≤ 3
From the modulus M(x,y,s) one can see how edges across the scales change and
only real edges remaining at all the scales, for example if the intensities along spec-
ified column (see Fig. 4.4) are compared one can determine how well the edges are
detected.
A thresholding operation is then applied to the modulus M(x,y, s). This is carried
out on the modulus maxima MAX(M(x, y, s)) and then multiplied by a factor α to
obtain a threshold value that yields an edge map. The threshold value T is computed
as follows:
T = α ∗ MAX(M(x, y, s)). (4.12)
Therefore only values of M(x,y,s) greater or equal to T are considered edge

points. The constant α takes different values for pupil edge detection and outer
boundary edge detection (see Figs. 4.5 and 4.6).
The use of vertical coefficients for outer boundary edge detection will reduce the
influence of the eyelids when performing a circular Hough transform because the
eyelids are usually horizontally aligned [3].
80
70
60
50
40
30
20
10
0
0 50 100 150 200 250 300
80
70
60
50
40
30
20
10
0 50 100 150 200 250 300

80
70
60
50
40
30
20
10
0
100 0 50 100 150 200 250 300
90
80
70
60
50
40
30
20
10
0
0 50 100 150 200 250 300
120
100
80
60
40
20
0
0 50 100 150 200 250 300
Fig. 4.4 The first column on the left shows the modulus images M(x, y, s) for 1 ≤ s ≤ 5, and the
second column on the right displays intensities along specified column
Fig. 4.5 Pupil edge detection

in scale s = 3 with α = 0.66
Fig. 4.6 Outer boundary

edge detection using only the
vertical coefficients Wv
4.4.2.2 Iris Outer and Pupil Circle Detection

The Hough transform locates contours in an n-dimensional parameter space by
examining whether they lie on curves of specified shapes. For Iris outer and pupil
circles detection (see Fig. 4.7) a circular Hough transform has been used, The
Hough transform locates contours in an n-dimensional parameter space by examin-
ing whether they lie on curves of specified shape. For iris outer or pupillary bound-
aries and a set of recovered edge points (xi , yi ), i = 1, . . . , n, a Hough transform is
defined as:

n
H(xc , yc , r) = h(xi , yi , xc , yc , r ) (4.13)
i=1
where H(xc , yc , r) shows a circle through a point, the coordinates xc , yc , r define a

circle by the following equation:
x2c + y2c + r2 = 0 (4.13a)
For edge detection for iris boundaries the above equation will become
(xi − xc )2 + (yi − yc )2 − r2 = 0 (4.13b)
Fig. 4.7 Iris localised: (a)

pupil detected, (b) outer
circle detected
4.4.2.3 Eyelids and Eyelashes Isolating

The horizontal coefficients Wh (x, y, s) are only used for multiscale edge detection
to create an edge map as shown in Fig. 4.8. The eyelids are then isolated by first
fitting a line to the upper and lower eyelid parts using a linear Hough transform.
A second horizontal line is then drawn, which intersects with the first line at the
iris edge that is closest to the pupil; the second horizontal line allows a maximum
isolation of eyelid regions while the thresholding operation is used to isolate the
eyelashes (Fig. 4.9).
4.4.2.4 Iris Normalisation and Polar Transformation

Once the iris region is segmented the next stage is to normalise this part to enable
generation of the iris code and the subsequent comparison step. Since variations
in the eye, such as optical size of the iris, position of pupil in the iris, and the iris
orientation change from person to person, it is required to normalise the iris image,
so that the representation is common to all, with similar dimensions [36–37].
Fig. 4.8 Edges for eyelids

detection: the first column on
the left shows the original
images and the second
column on the right shows
the edges detected using the
horizontal coefficients
Wh (x, y, 3)
Fig. 4.9 Iris localisation

without noise
Fig. 4.10 Unwrapping the

iris θ
0 1
r r
The normalisation process involves unwrapping the iris and converting it into its
polar equivalent. It is done using Daugman’s Rubber sheet model (Fig. 4.10). The
centre of the pupil is considered as the reference point and a remapping formula is
used to convert the points from the Cartesian scale to the polar scale.
The remapping of iris image I(x, y) from raw Cartesian coordinates to polar
coordinates (r, θ ) can be represented as
I (x(r, θ ), y(r, θ )) → I (r, θ ) (4.14)
where r is on the interval [0,1] and θ is angle [0,2π] and with:
x(r, θ ) = (1 − r )x p (θ ) + r xl (θ ) (4.14a)
y(r, θ) = (1 − r)y p (θ) + ryl (θ) (4.14b)
where x p (θ), y p (θ) and xl (θ), yl (θ) are the coordinates of the pupil and iris bound-
aries along the direction θ.
In this model a number of data points are selected along each radial line (defined
as the radial resolution). The number of radial lines going around the iris region is
defined as the angular resolution as shown in Fig. 4.11. The normalisation process
proved to be successful as demonstrated by Fig. 4.12 showing the normalised iris of
image 4.11.
Fig. 4.11 Normalised

portion with radial resolution
of 15 pixels and angular
resolution of 60 pixels
Fig. 4.12 Normalised iris

image
4.4.3 Results and Analysis
The proposed algorithm has been evaluated using the CASIA iris image database,
which consists of 80 persons, 108 set eye images and 756 eye images.
A perfect segmentation was obtained as shown in Fig. 4.13 with a success rate
of 99.6% which is very attractive when compared with the basis for all current iris
recognition systems such as Daugman and Wildes methods.
A multiscale approach can provide a complete and stable description of signals
since it is based on a wavelet formalisation multiscale approach. This character-
isation provides a new approach to classical iris edge detection problems since all
existing research in iris localisation is based either on the integro-differential method
proposed by Daugman or the derivatives of the images proposed by Wildes. For
example, a problem with Daugman’s algorithm [15] is that it can fail in the pres-
ence of noise (i.e. from reflections, etc.) since the algorithm operates only on a local
scale basis.
However, in the proposed algorithm a multiscale approach provides more useful
information about the sharp variations (images at each scale with a horizontal and a
vertical decomposition) as shown in Fig. 4.2 and demonstrated in [27, 31].
It is clear from Fig. 4.14 that the proposed algorithm is capable to detect pupil
and outer boundary circles even with poor quality iris images because of the efficient
edge map detected with the multiscale edge detection.
On the other hand, there are problems with threshold values to be chosen for
edge detection. First, this may result in critical edge points being removed, resulting
in a failure to detect circles/arcs. Secondly, there is no precise criterion to choose
a threshold value. Wildes [3] chose a hard threshold value and applied the Hough
transform; however, the choice of threshold was not based on solid ground.
Fig. 4.13 Illustration of a

perfect iris segmentation
Fig. 4.14 Poor quality iris

image is efficiently localised
In the proposed algorithm the threshold value is selected by computing the max-
imum of the modulus at a given scale s which provides a solid criterion, because the
sharp variation points of the image smoothed by h(x, y, s) are the pixels at locations
(x, y), where the modulus M(x, y, s) has a local maxima in the direction of the gradi-
ent A(x, y, s) [31]. It can be clearly seen from Fig. 4.15 that edges are well detected
and the pupil is clearer as shown in (b) and (c) than the edge and pupil as shown in
(a). It can also be seen that, as a result, the pupil’s circle is well localised as shown
in (e). This is the reason why the proposed algorithm outperforms other algorithms
which used a local scale and canny edge detector.
This analysis confirms and explains the effectiveness of our proposed method
based on multiscale edge detection using wavelet maxima for iris segmentation,
provides a precise detection of circles (iris outer boundary and pupil boundary)
and obtains a precise edge map from the wavelet decomposition in the horizontal
and vertical directions. This in turn greatly reduces the search space for the Hough
transform and performs well in the presence of noise, thereby improving the overall
performance with a better success rate than that of Daugman and Wildes methods
(Fig. 4.16).
Fig. 4.15 Edge influence in iris segmentation: (a) pupil edge map using Canny edge detector
and threshold value (T1 = 0.25 and T2 = 0.25), (b) and (c) pupil edge obtained with a multiscale
edge detection using wavelet maxima for α = 0.4 and α = 0.6, (d) result of iris segmentation
using Canny edge detector of example (a), (e) result of iris segmentation using a multiscale edge
detection of example (c)
4.5 Texture Analysis and Feature Extraction 67
100
99
98
97
Success rate
96
95
94
93
Daugman's method
92 Proposed method
Wildes' method
91
0 10 20 30 40 50 60
Noise
Fig. 4.16 Success rate of iris segmentation
4.5 Texture Analysis and Feature Extraction
Since iris has an interesting structure with plenty of texture information it makes
sense to search for efficient methods that aim to capture crucial iris information
locally. The distinctive spatial characteristics of the human iris are manifest at a
variety of scales [2]. For example, distinguishing structures range from the overall
shape of the iris to the distribution of tiny crypts and detailed texture. To capture
this range of spatial details, it is advantageous to make use of a multiscale represen-
tation. A few researchers have investigated the use of multiresolution techniques for
iris feature extraction [15, 3, 14] and a high recognition accuracy has been achieved
so far.
At the same time it has been observed that each multiresolution technique has
its own specification and situation where it is suitable. For example, a Gabor fil-
ter bank is the most known multiresolution method used for iris feature extraction
and Daugman [15], in his proposed iris recognition system, has demonstrated the
accuracy of using Gabor filters. Moreover we have investigated the use of wavelet
maxima components as part of a multiresolution technique for iris feature extraction
by analysing iris textures in both horizontal and vertical directions. Since the iris has
a rich structure with a very complex textures, it is important to analyse iris textures
by combining all information extracted from iris region by taking orientation and
both horizontal and vertical details.
For this purpose, we have proposed a new combined multiresolution iris feature
extraction scheme by analysing the iris using wavelet maxima components before
applying a dedicated Gabor filter bank to extract all dominant texture features.
4.5.1 Wavelet Maxima Components

Wavelet decomposition provides a very good approximation of images and natural
setting for the multilevel analysis. Since wavelet transform maxima provide useful
information about textures and edges analysis [31], we propose to employ wavelet
components for a fast and effective feature extraction. Wavelet maxima have been
shown to work well in detecting edges which are key features in a query. Moreover,
this method provides useful information about texture features by using horizontal
and vertical details.
As described in [27] to obtain the wavelet decomposition a pair of discrete filters
H, G has been used (Table 4.1).
At each scale s, the algorithm decomposes the iris image I(x,y) into I(x,y,s), Wv
(x,y,s) and Wh (x,y,s) as shown in Figs. 4.17 and 4.18, respectively.
4.5.2 Special Gabor Filter Bank
The 2D Gabor Wavelets function g(x,y) and its Fourier transform G(u,v) can be
defined as follows [32]:
1 1 x2 y2
g(x, y) = exp[− ( 2 + 2 ) + 2πj W x] (4.15)
2π σx σ y 2 σx σy
1 (u − W )2 v2
G(u, v) = exp[− ( + )] (4.16)
2 σu2 σv2
where σ u = 1/2π σ x and σ v = 1/2π σ y . Gabor functions can form a complete but
non-orthogonal basis set. Expanding a signal using this basis provides a localised
0.12
0.1
0.08
0.06
0.04
0.02
-0.02
-0.04
-0.06
-0.08
0 20 40 60 80 100
Fig. 4.17 Wavelet maxima vertical component at scale 2 with intensities along specified column
0.08
0.06
0.04
0.02
-0.02
-0.04
-0.06
-0.08
-0.1
0 10 20 30 40 50 60 70 80 90 100
Fig. 4.18 Wavelet maxima horizontal component at scale 2 with intensities along specified column
4.5 Texture Analysis and Feature Extraction 69
frequency description. A class of self-similar functions, referred to as Gabor

wavelets in the following discussion, is now considered. Let g(x,y) be the mother
Gabor wavelet, then this self-similar filter dictionary can be obtained by appropriate
dilations and rotations of g(x,y) through the generating function
gmn (x, y) = a −m G(x , y ), (4.17)
x = a −m (x cos θ + y sin θ ), (4.17a)
y = a −m (−x sin θ + y cos θ ), (4.17b)
where a>1, m, n = integer and θ = nπ /k is the orientation (k is the number of

orientations) and a–m is the scale factor.
The non-orthogonality of the Gabor wavelets implies that there is redundant
information in the filtered images, and the following strategy is used to reduce this
redundancy [32]. Let Ul and Uh denote the lower and upper centre frequencies of
interest, respectively. Let K be the number of orientations and S be the number of
scales in the multiresolution decomposition. Then the design strategy is to ensure
that the half-peak magnitude support of the filter responses in the frequency spec-
trum touch each other as shown in Fig. 4.19.
This results in the following formula for computing the filter parameters σu and
σv (and thus σx and σy ):
a = (Uh /Ul )− S−1

1
(4.18)
(a − 1)Uh
σu = √ (4.19)
(a + 1) 2 ln 2
0.6
0.5
0.4
0.3
0.2
0.1
–0.1
–0.2
–0.6 –0.4 –0.2 0 0.2 0.4 0.6
Fig. 4.19 Gabor filter dictionary, the filter parameters used are Uh = 0.4, Ul = 0.05,K = 6
and S = 4
π σ2 (2 ln 2)2 σu2 −1
σv = tan( )[Uh − 2 ln( u )][2 ln 2 − ]2 (4.20)
2k Uh Uh2
where W = Uh , and m = 0, 1, . . . , S–1. In order to eliminate the sensitivity of the

filter response to absolute intensity values, the real (even) components of the 2D
Gabor filters are biased by adding a constant to make them with a zero mean.
4.5.3 Proposed Method

• Compute wavelet maxima components in the horizontal and vertical directions
using 5 scales.
• For each component, a special Gabor filter bank is applied with 4 scales and 6
orientations to obtain ((4×6)×5)×2 = 120×2 = 240 filtered images.
• The feature vector is computed by using two different techniques: the first uses a
statistical measure (mean and variance) while the second experiment is based on
moments invariant (Fig. 4.20).
Normalized iris
Wavelet maxima components
Horizontal details {5 scales} Vertical details {5 scales}
Special Gabor filter Special Gabor filter

bank 6 orientations {4 scales} bank 6 orientations {4 scales}
4*6*5 = 120 filtered image 4*6*5 = 120 filtered image
240 filtered images
Feature vector extracted using Feature vector extracted using

statistical features (mean, variance) moment invariants (7)
2*240 = 480 feature vector elements 7*240 = 1680 feature vector elements
Fig. 4.20 Proposed combined multiscale feature extraction techniques diagram

4.6 Matching 71
4.5.3.1 Template Generation

Statistical Features
For each filtered image the statistical features, the mean, μmn , and standard devia-
tion, σ mn , are computed in order to construct a feature vector. In the experiments, we
show that the best results are obtained when four scales (S = 4) and six orientations
(K = 6) are used. The resulting feature vector is constructed as follows:
f = [μ0,0, σ0,0, μ0,1, σ0,1, . . . , μ3,5, σ3,5 ]. (4.21)
After applying wavelet maxima components on the normalised iris images, a

Gabor filter bank has been used with 4 scales and 6 orientations, thus resulting in
120 images for horizontal and vertical details (wavelet maxima (5 images)→ Gabor
filters (4×6×5) = 120 images.
Therefore, by employing statistical features (mean and variance), there will be
(120×2) = 240 images to thus produce a feature vector of 480 elements.
Moments Invariant
The theory of moments provides an interesting series expansion for representing
objects. This is also suitable to mapping the filtered images to vectors so that their
similarity distance can be measured [33].
Certain operations of moments are invariant to geometric transformations such
as translations, rotations and scaling. Such features are useful in the identification of
objects with unique signatures regardless of their location, size and orientation [33].
A set of seven 2D moments invariant that are insensitive to translations, rotations
and scaling have been computed for each image analysed by the horizontal and ver-
tical components wavelet maxima and Gabor filters. This has produced 240 filtered
images for each image. Therefore, for seven moments a feature vector of size 1680
(240×7) elements is constructed.
4.6 Matching
It is very important to present the obtained vector in a binary code because it is
easier to compute the difference between two binary code-words than between two
number vectors since Boolean vectors are always easier to compare and manipulate.
A Hamming distance matching algorithm for the recognition of two samples has
been employed. It is basically an Exclusive OR (XOR) operation between two bit
patterns. Hamming distance is a measure, which delineates the differences, of iris
codes. Every bit of a presented iris code is compared to the every bit of a referenced
iris code: if the two bits are the same (i.e. two 1’s or two 0’s) the system assigns
a value “0” to that comparison while a value of “1” is assigned if the two bits are
different. The formula for iris matching is therefore as follows:
1
HD = Pi ⊕ Ri (4.22)
N
where N is dimension of feature vector, Pi is the ith component of the presented
feature vector while Ri is the ith component of referenced feature vector. The Match
Ratio between two iris templates is then computed by

Tz
Ratio = ∗ 100 (4.23)
Tb
where Tz is the total number of zeros calculated by Hamming distance vector and
Tb is the total number of bits in iris template.
4.7 Experimental Results and Analysis
A series of experiments have been conducted to evaluate the performance of the

proposed technique. In addition, the proposed iris identification system has been
compared with some existing methods for iris recognition and detailed discussions
on the overall experimental results are presented.
4.7.1 Database
The Chinese Academy of Sciences – Institute of Automation (CASIA) eye image

database [34] containing 756 greyscale eye images with 108 unique eyes or classes
and 7 different images of each unique eye has been used in the analysis. Images from
each class are taken from two sessions with 1 month interval between the sessions.
The images were captured especially for iris recognition research using specialised
digital optics developed by the National Laboratory of Pattern Recognition, China.
The eye images are mainly from persons of Asian decent, whose eyes are charac-
terised by irises that are densely pigmented, and with dark eyelashes.
4.7.2 Combined Multiresolution Feature Extraction Techniques

In our new approach, we have introduced the combined multiresolution technique-
based feature extraction which provides more significant iris texture in terms of iris
structure and texture’s nature. Since the visual appearance of the iris is a direct result
of its multilayered structure [2], the structure of the iris texture should be defined
in all directions and orientations. In fact, some existing methods in iris recognition
have used a multiscale approach to process iris features. However, these methods
only address one class of iris representation: phase-based methods [4, 12], zero-
crossing representation [14, 35], texture analysis [3, 16] and intensity variation anal-
ysis [29, 10]. Since each technique has a strong effect on one class, it fails to analyse
4.7 Experimental Results and Analysis 73
some types of texture if some conditions are not met. A solution is therefore required
to combine a number of techniques to balance this problem with a view to analyse
all types of texture.
Our proposed approach has demonstrated that a combined multiscale technique
is effective and robust for the analysis of iris texture and a high system performance
can be achieved (Table 4.4).
4.7.3 Template Computation
Two different experiments to compute the feature vector were conducted. In the first
case, statistical features (mean and variance) have been used to compute the feature
vector elements leading to 480 elements. The second method employs a set of seven
moment invariants that are insensitive to translations, scaling and rotations. This has
led to a feature vector of 1680 elements. From the experimental results depicted in
Table 4.3, it has been found that the accuracy is higher with the second method.
Therefore, it can be concluded that moment invariants are useful in identifying and
efficiently representing images especially that they are compact and can be easily
used to compute similar distances. Also, these moments are invariant to all affine
transformations.
4.7.4 Comparison with Existing Methods
Our comparative study was against the methods proposed by Daugman [4], Boles
and Boashash [14] and Li Ma et al. [21] which are the best known among exist-
ing schemes for iris recognition. These methods characterise local details of the iris
based on phase, texture analysis, zero-crossing and local sharp variations’ represen-
tation. However, it is worth noting that Wildes et al.[3] method only operates in a
verification mode and as such a comparative study against it is not appropriate since
our proposed method is more useful in an identification mode.
From the results shown in Table 4.4, it can be seen that Daugman’s method and
the proposed method have the best performances followed by Tan’s method [21] and
then Boles [14]. Daugman’s method is slightly better than the proposed method in
identification tests. Daugman’s method operates by demodulating the phase infor-
mation of each small local region using multiscale quadrature wavelets. The result-
ing phasor, denoted by a complex-valued coefficient to one of the four quadrants
in the complex plane, is then quantised. To achieve high accuracy, the size of each
Table 4.3 Performance evaluations according to the size of feature Vectors
Feature vector representation Statistical features Moment invariants
Feature vector size (bits) 480 1680

Correct recognition rate (%) 99.60 99.52
Table 4.4 Comparison of methods’ correct recognition rate
Methods Correct recognition rate
Daugman 99.90
LiMa and Tan 99.23
Boles et boashashe 93.2
Proposed method (statistical features) 99.52
Proposed method (moments invariant) 99.60
local region must be small enough, which results in a high dimensionality of the fea-
ture vector (2048 components). This means that Daugman’s method captures much
more information in the much smaller local regions. This makes his method slightly
better than ours. Boles [14] and Li Ma et al. [21] used a kind of 1D ordinal measures
thereby losing much information when compared with 2D ordinal representations.
This directly leads to a worsening of their performances when compared with our
method. Boles and Boashash [14] only employed extremely little information along
a virtual circle on the iris to represent the whole iris, which in turn, results in a
relatively low accuracy. Li Ma et al.’s [21] method uses local features, so that the
performance may be affected by iris localisation, noise and inherent iris deforma-
tions caused by pupil movements.
Table 4.5 depicts the computational cost involved for the computation of the
feature extraction using the methods described in [4, 14, 21] including our proposed
algorithm. These experiments are carried out using Matlab 7.0. Since Boles’ method
[14] is based on 1D signal analysis, the computational cost incurred is smaller than
that of the other methods. However, our proposed approach is faster than both Daug-
man and Li Ma and Tan’s methods because it employs a compact feature vector
representation with a high recognition rate.
Table 4.5 Comparison of the computational complexity
Proposed Proposed
Li Ma Boles and (statistical (moment
Methods Daugman and Tan Boashash features) invariants)
Feature 285 95 55 74 81
extraction
complexity
(ms)
4.8 Discussion and Future Work
From the analysis and comparative study given above, the following conclusions
can be drawn:
• The proposed multiscale approach has introduced a technique to detect the edges
for precise and effective iris region localisation. This approach uses modulus
References 75
wavelet maxima to define pupil and iris edges. This in turn greatly reduces the
search space for the Hough transform, thereby improving the overall perfor-
mance.
• A combination of Gabor filters with wavelet maxima components provide more
textured information, since wavelet maxima allow us to efficiently detect hor-
izontal and vertical details through scale variations. By applying Gabor filters
to the resulting components with varying orientations and scales more precise
information can be captured for use in iris recognition accuracy.
• Moment invariants are useful and are efficient to capture iris features since they
are insensitive to affine transformations (i.e. translations, rotations and scaling)
thereby providing a complete and compact feature vector which can improve the
matching process quickly.
The experimental results also show that our proposed method is reasonable and
promising for the analysis of iris texture. Future work will include:
• Analysis of local variations to precisely capture local fine changes of the iris with
a view to further improve the accuracy.
• Analysis of a combined local and global texture analysis for robust iris
recognition.
4.9 Conclusion
Iris recognition, as a biometric technology, has great potential for security and iden-
tification applications. This is mainly due to its variability and stability features. This
chapter has discussed an iris localisation method based on a multiscale edge detec-
tion approach using wavelet maxima as a preprocessing step which is highly suitable
for the detection of iris outer and inner circles. This approach yields attractive iris
localisation a necessary step to achieve higher recognition accuracy. The chapter
has also introduced a novel and efficient multiscale approach for iris recognition
based on a combined feature extraction methods that consider both the textural and
topological features of an iris image. These features being invariant to translations,
rotations and scaling yield a superior performance in terms of recognition accuracy
and computational cost when compared against the algorithms proposed by Boles
[14] and Li Ma et al. [21]. However, it performs with a marginally less accuracy, but
with a lower complexity when compared against Daugman’s method [4].
References
1. M. K. Khan, J. Zhang and S. J. Horng, “An effective iris recognition system for identification
of humans”, INMIC Multitopic Conference, pp. 114–117, 24–26 December 2004.
2. J. Wayman, A. Jain, D. Maltoni and D. Maio, “Biometric systems, Technology, Design and
Performance Evaluation”, Springer-Verlag London, UK, 2005.
3. R. Wildes, “Iris recognition: an emerging biometric technology”, Proceedings of the IEEE,
vol. 85, pp. 1348–1363, 1997.
4. J. Daugman, “High confidence visual recognition of persons by a test of statistical inde-

pendence”, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 15,
pp. 1148–1161, 1993.
5. A. Muron and J. Pospisil, “The human iris structure and its usages”, Physica, vol. 39,
pp. 87–95, 2000.
6. P. C. Kronfeld, “The gross and embryology of the eye”, The Eye, vol. 1, pp. 1–66, 1968.
7. L. Flom and A. Safir, “Iris Recognition System,” U.S. Patent 4 641 394, 1987.
8. R. Wildes, J. Asmuth, G. Green, S. Hsu, R. Kolczynski, J. Matey and S. McBride, “A machine-
vision system for iris recognition”, Machine Vision and Applications, vol. 9, pp. 1–8, 1996.
9. R. Johnson, “Can Iris Patterns be Used to Identify People?” Chemical and Laser Sciences
Division LA-12 331-PR, Los Alamos National Laboratory, Los Alamos, CA, 1991.
10. K. Bae, S. Noh and J. Kim, “Iris feature extraction using independent component analysis,”
Proceedings of the 4th International Conference Audio- and Video-Based Biometric Person
Authentication, pp. 838–844, 2003.
11. J. Daugman, “Biometric Personal Identification System Based on Iris Analysis,” U.S. Patent
5 291 560, 1994.
12. J. Daugman, “Demodulation by complex-valued wavelets for stochastic pattern recogni-
tion,” International Journal of Wavelets, Multiresolution and Information Processing, vol. 1,
pp.1 –17, 2003.
13. R. Sanchez-Reillo and C. Sanchez-Avila, “Iris recognition with low template size,” Proceed-
ings of the International Conference Audio and Video-Based Biometric Person Authentica-
tion, pp. 324–329, 2001.
14. W. Boles and B. Boashash, “A human identification technique using images of the iris
and wavelet transform”, IEEE Transactions on Signal Processing, vol. 46, pp. 1185–1188,
1998.
15. J. Daugman, “How iris recognition works”, IEEE Transactions on Circuits and Systems for
Video Technology, vol. 14, pp. 21–30, 2004.
16. S. Lim, K. Lee, O. Byeon and T. Kim, “Efficient iris recognition through improvement of
feature vector and classifier,” ETRI Journal, vol. 23, no. 2, pp. 1–70, 2001.
17. C. Tisse, L. Martin, L. Torres and M. Robert, “Person identification technique using human
iris recognition”, Proceedings of the Vision Interface, pp. 294–299, 2002.
18. T. Tangsukson and J. Havlicek, “AM-FM image segmentation,” Proceedings of the IEEE
International Conference Image Processing, pp. 104–107, 2000.
19. J. Havlicek, D. Harding and A. Bovik, “The mutli-component AM-FM image representation,”
IEEE Transactions on Image Processing, vol. 5, pp. 1094–1100, June 1996.
20. B. Kumar, C. Xie and J. Thornton, “Iris verification using correlation filters,” Pro-
ceedings 4th International Conference Audio- and Video-Based Biometric Person
Authentication, pp.697–705, 2003.
21. L. Ma, T. Tan, et al. “Efficient iris recognition by characterizing key local variations,” IEEE
Transactions on Image Processing, vol. 13, pp. 739–750, 2004.
22. J. G Daugman, “The importance of being random: statistical principles of iris recognition”, In
Pattern Recognition, vol. 36, no. 2, pp. 279–291, 2003.
23. W. Kong and D. Zhang, “Accurate iris segmentation based on novel reflection and eyelash
detection model”, Proceedings of 2001 International Symposium on Intelligent Multi-media,
Video and Speech Processing, Hong Kong, 2001.
24. L. Ma, Y. Wang and T. Tan, “Iris Recognition Using Circular Symmetric Filters”, National
Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences,
Beijing, 2002.
25. N. J. Ritter and J. R. Cooper, “Locating the Iris: A First Step to Registration and Identifica-
tion,” Proceedings of the 9th IASTED International Conference on Signal and Image Process-
ing, IASTED, pp. 507–512, August 2003.
26. J. C. Goswami and A. K. Chan, “Fundamentals of Wavelets: Theory, Algorithms, and Appli-
cations”, John Wiley & Sons, New York 1999.
References 77
27. S. Mallat and S. Zhong, “Characterization of signals from multiscale edges”, IEEE Transac-
tions on Pattern Analysis and Machine Intelligence, vol. 14, pp. 710–732, 1992.
28. S. Mallat and W. Hwang, “Singularity detection and processing with wavelets”, IEEE Trans-
actions on Information Theory, vol. 38, pp. 617–643, 1992.
29. L. Ma, T. Tan, Y. Wang and D. Zhang, “Personal identification based on iris texture analysis”,
IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 25, pp. 1519–1533,
2003.
30. L. Pan and M. Xie, “Research on iris image preprocessing algorithm”, IEEE International
Symposium on Machine Learning and Cybernetics, vol. 8, pp. 5220–5224, 2005.
31. S. Mallat, “A Wavelet Tour of Signal Processing”, Second Edition, Academic Press, New
York, 1998.
32. B. S. Manjunath and W. Y. Ma, “Texture features for browsing and retrieval of image data,”
IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 18, no. 8, August 1996.
33. A. K. Jain. “Fundamentals of Digital Image Processing,” Prentice-Hall Inc., Upper Saddle
River, 1989.
34. Chinese Academy of Sciences – Institute of Automation. Database of 756 Greyscale Eye
Images. http://www.sinobiometrics.com Version 1.0, 2003.
35. C. Sanchez-Avila and R. Sanchez-Reillo, “Iris-based biometric recognition using dyadic
wavelet transform,” IEEE Aerospace and Electronic Systems Magazine, vol. 17, pp. 3–6, Oct.
2002.
36. M. Nabti and A. Bouridane, “An improved iris recognition system using feature extrac-
tion based on wavelet maxima moment invariants,” Advances in Biometrics, Springer
Berlin/Heidelberg, vol. 462, pp. 988–996, 2007.
37. M. Nabti and A. Bouridane, “An effective and fast iris recognition system based on a combined
multiscale feature extraction technique,” Pattern Recognition, vol. 41, pp. 868–879, 2008.
Chapter 5
Spread Transform Watermarking
Using Complex Wavelets
5.1 Introduction
The use of wavelets in digital watermarking has increased dramatically over the last
decade replacing previously popular domains such as the Discrete Cosine Transform
(DCT) and the Discrete Fourier Transform (DFT). The main reason for this relates
to several advantages which wavelets offer over these domains, such as better energy
compaction and efficiency of computation. The Discrete Wavelet Transform (DWT)
however suffers from some disadvantages, it lacks directional selectivity so it cannot
differentiate between opposing diagonals. It also lacks shift invariance meaning that
small geometrical changes in the input signal can cause large shifts in the wavelet
coefficients. To overcome these shortcomings complex wavelets have been devel-
oped. This chapter describes two complex wavelet transform implementations and
their properties. The benefit of these properties to watermarking is also detailed.
Watermarking schemes can be roughly categorised into two main methodologies;
spread spectrum and quantisation-based schemes. Balado terms these as interference
non-rejecting and interference rejecting schemes, respectively [1]. Spread transform
has been developed as a combination between these two methodologies, “spreading”
the quantisation over multiple host samples through the use of a vector projection.
Spread transform embedding therefore combines the robustness gained from using
multiple host samples with the host interference rejecting nature of quantisation-
based schemes allowing for higher levels of capacity to be reached.
Further as watermarking has matured as a subject area an urgent need has arisen
to objectively find the absolute performance limits of watermarking systems. To
this end the process of deriving the capacity of watermarking algorithms has been
developed by Moulin [23]. Through statistical modelling of wavelet coefficients and
the application of information and game theory it is possible to derive an estimate of
the maximum achievable performance of any given watermarking system and host
data.
This chapter first introduces the concept of spread transform watermarking and
then applies this algorithm and information theoretic capacity analysis to the case
of watermarking with complex wavelets. This will demonstrate the improved levels
of capacity that can be achieved through the superior feature representation offered
by complex wavelet transforms.

Technology, DOI 10.1007/978-0-387-09532-5 5,

80 5 Spread Transform Watermarking Using Complex Wavelets
5.2 Wavelet Transforms

The DWT applies filtering to the source data with both a high-pass (wavelet) filter
(h1) and low-pass (scaling) filter (h0), and then down-sampling the result by 2. The
process is then repeated recursively on the low-pass section of the resulting signal
as shown in Fig. 5.1. The DWT can be extended to the case of 2D data by applying
the filters on the horizontal lines of the image and then repeating the process in the
vertical direction at each level on both the coarse and detail subbands to create four
subbands – the low-pass subband (LL) and the horizontal (HL), vertical (LH) and
diagonal (HH) detail subbands. Due to the down-sampling at each stage an N pixel
image will result in N wavelet coefficients and so the DWT is non-redundant.
h0(n) ↓2
h0(n) ↓2
h0(n) ↓2 h1(n) ↓2
h1(n) ↓2
h1(n) ↓2
Fig. 5.1 DWT filterbank
The disadvantage of the DWT is its lack of directional selectivity as shown in

Fig. 5.2. The HL and LH subbands are oriented at 0◦ and 90◦ , respectively while
the HH subband is oriented at both +45◦ and –45◦ (Fig. 5.2). The HH subband thus
has no dominant diagonal orientation and is unable to separate the diagonal features
of an image. This has important implications for digital watermarking as it creates
difficulty in properly adapting the watermark to the host image. For features in the
image that have a dominant diagonal orientation, part of the watermark signal will
consequently be perpendicular to the host image feature increasing the visibility of
the watermark. This is demonstrated in Fig. 5.2.
In addition, the DWT coefficients tend to produce “checkerboard” artefacts in
the embedded watermark. This can make the watermark look unnaturally blocky
(Fig. 5.2) making it harder to fulfil the imperceptibility requirement. DWT coeffi-
cients also suffer from a lack of phase information which makes them unsuitable for
geometric modelling.
5.2.1 Dual Tree Complex Wavelet Transform
To overcome the deficiencies of the DWT, Kingsbury [19, 20] proposed the use of a
complex wavelet filterbank and a dual tree filterbank implementation of the complex
wavelet. This involves the application of two DWTs acting in parallel on the same
data, each of which can be viewed as one of the two trees of a dual tree complex
wavelet transform. The two trees can then be modelled as the real and complex parts
5.2 Wavelet Transforms 81
Fig. 5.2 Top: DWT

coefficients, Middle: DWT
level 2 diagonal watermark
features added to detail of
F16 image (×8), Bottom:
DTWT level 2 diagonal
watermark features added to
detail of F16 image (×8)
of the wavelet transform, respectively. The two DWTs act in parallel on the same
data, one DWT acts upon the even samples of the data while the other acts upon the
odd samples. The difference and sum of these two DWT decompositions are then
taken to produce the two trees of the dual tree wavelet transform (DTWT).
If the two DWTs used are the same then no advantage is gained. However if the
DWTs are designed so as to be an approximate Hilbert transform of each other then
it is possible to obtain a directionally selective complex wavelet transform (Fig. 5.4).
This process is demonstrated in Fig. 5.3, with the scaling (h0 ) and wavelet (h1 ) filters
of the upper DWT and the scaling (g0 ) and wavelet (g1 ) filter of the lower DWT
applied recursively on their respective low-pass outputs at each level. The sum and
the difference of the high-pass subbands produced at each level are then calculated
to obtain the dual tree coefficients of the dual tree wavelet transform.
The application of the transform to 2D data follows the same methodology as
that of the DWT. Although the complex version has the advantage of excellent shift
invariance, this comes at the cost of 4:1 redundancy for 2D signals. This is due
to the use of four DWTs acting in parallel in the case of 2D data leading to 12
Fig. 5.3 DTWT filterbank
Fig. 5.4 DTWT wavelet, real (blue solid) and imaginary (red dashed)
different subbands at each level of decomposition. This places restrictions upon the
embedding algorithm as the watermark in the wavelet domain must have a valid rep-
resentation in the spatial domain. As a result of the redundancy, much of the power
added to wavelet coefficients will be in the softy space of the wavelet transform and
will be lost upon re-composition. For this reason in this thesis the lower redundancy
version of the dual tree complex wavelets transform developed by Selesnick et al.
[28] is used instead. This uses only two DWTs acting in parallel for 2D data and so
has a much more manageable redundancy of 2:1 (Fig. 5.5) for 2D signals allowing
for more freedom when embedding. This decreased redundancy also makes it an
attractive option for use in compression [14].
The DTWT overcomes the problem of the DWT lacking directional selectiv-
ity. The DTWT can discriminate between opposing diagonals with six different
Fig. 5.5 DTWT decomposition
Fig. 5.6 DTWT coefficients
subbands orientated at 15◦ , 75◦ , 45◦ , –15◦ , –75◦ and –45◦ (Fig. 5.6). This allows
the watermark embedding to adapt to diagonal features in the host image better.
The DTWT also represents horizontal and vertical features better giving two direc-
tional subbands for each. Also the DTWT is free from the checkerboard artefacts
that characterise the coefficients of the DWT.
5.2.2 Non-redundant Complex Wavelet Transform
The non-redundant complex wavelet transform (NRCWT) has been developed by

Fernandes et al. [13] as an alternative to the class of over complete, redundant
complex wavelet transforms. It makes use of a tri-band filterbank where the data
is down-sampled by 3 at each stage.
Fig. 5.7 NCWTR and NCWTC filterbanks for real and complex inputs, respectively
There are two filterbanks NCWTR and NCWTC and both consist of a real scaling
filter (h0) and two complex wavelet filters (h+ and h–). The NCWTR and NCWTC
are applied to real and complex inputs, respectively (Fig. 5.7). The complex filters
h+ and h– (Fig. 5.9) when applied to real input will produce wavelet coefficients
that are complex conjugates of each other, as a result one set of these complex
coefficients can be discarded. In the case of the NCWTR the output consists of one
real part and two complex outputs. As a result of their being conjugates of each
other one of these can be discarded. In the case of the NCWTC the output consists
of three complex outputs. The two complex outputs in this case are unique and both
must be kept.
The NCWTR is first applied to the real value rows of the image to be decom-
posed. This results in the creation of one subband of real value columns and two
subbands of complex value columns. The complex value columns are conjugates
of each other and so one can be discarded as redundant. An input of N coefficients
will thus produce 5N/3 coefficients (N/3 real and 2N/3 complex coefficients). How-
ever after the discarding of one of the complex subbands N coefficients will remain,
hence the NCWTR is non-redundant. The NCWTR is then applied to the real-valued
columns to produce one real and two complex-valued outputs. Again, one of these
complex outputs can be discarded as a conjugate leaving the real-valued LL band
and a complex value subband consisting of the horizontal features of the image.
The NCWTC is applied to the complex value rows to create three complex
value outputs consisting of the vertical and opposing diagonal features of the
image, respectively. Due to down-sampling by three the storage space required
for the three complex subbands will be the same as the original complex sub-
band and so the NCWTC is non-redundant. The 2D NRCWT decomposition is
illustrated in Fig. 5.8.
The process is repeated on the LL band at each level to produce one real-valued
subband and four complex-valued subbands at each level of decomposition. The
subbands produced are orientated at 0◦ , 90◦ , 45◦ and –45◦ , in both real and imag-
inary subbands. While this offers less directional subbands than the DTWT, the
NRCWT maintains the directional selectivity of the DTWT with regard to diagonal
features (Fig. 5.11). However unlike the DTWT the transform will produce as many
coefficients as there are pixels in the original image and is therefore non-redundant
(Fig. 5.10). As a result there will be no loss of information in the wavelet coefficients
upon re-composition.
Fig. 5.8 NRCWT filterbank for 1 level of decomposition
Fig. 5.9 NRCWT wavelet, real (solid blue) and imaginary (dashed red)
Fig. 5.10 NRCWT

decomposition
Fig. 5.11 NRCWT

coefficients
In addition the NRCWT coefficients have a high degree of phase coherency. This
means that the phase of the coefficients is coherent in places where the coefficients
have strong directional tendency.
5.3 Visual Models

To fulfil the imperceptibility requirement of watermarking systems it is necessary to
derive a just-noticeable-distortion (JND) model for the watermark embedding. The
JND model assigns a value to each wavelet coefficient that quantifies the maximum
distortion that can be applied to that coefficient before creating an unacceptable level
of visual distortion. As noted by the psycho-visual studies in [30], the visibility of
noise in an image is dependent on two main factors, these are
Luminance Masking: The average background luminance of a region affects the
visibility of watermark distortions. Distortions in bright and very dark areas of the
image are less visible than those in areas of the image with middling levels of
brightness.
Contrast Masking: Textured areas and edges in an image where spatial varia-
tions are large are much better at masking distortions than smoother areas where the
spatial variation is much smaller.
In addition the visibility of wavelet noise can depend on the orientation of the
wavelet coefficients under consideration with diagonal features generally less visi-
ble than horizontal and vertical ones. Also, low frequency features will generally be
more visible than higher frequency features. JND models have been derived for use
with discretely sampled wavelets such as those by Watson et al. [29] and Wolfgang
et al. [27]. However these are not suitable for the more complicated decomposition
produced by the complex wavelet transforms.
Two methods of deriving the JND model are adapted for use in watermarking
with complex wavelets. The first is a “universal” JND model that adapts a spatial
JND profile to fit the subband structure under consideration. The second uses a series
5.3 Visual Models 87
of visual tests to derive JND values directly from the coefficients of the wavelet
decomposition. A combination of both these methods is also considered.
5.3.1 Chou’s Model

Chou’s method [8] operates by composing a full-band JND model in the spatial
domain and then decomposing it into separate subband JND profiles. The advan-
tage of this is that extensive visual tests are not required to derive the JND values
and the visual model can be applied to any wavelets filter. As covered in the previous
section the JND values are modelled as the dominant effects of both overall lumi-
nance and luminance contrast. The full-band JND model is constructed from the
following Eq. [22]:
JND f b (x, y) = max{ f 1 (bg (x, y), m g (x, y)), f 2 (bg (x, y))} (5.1)
f 1 (bg (x, y), m g (x, y)) = m g (x, y)α(bg (x, y)) + β(bg (x, y)) (5.2)

bg (x, y) 1/2
f 2 (bg (x, y)) = T0 . 1 − + 3; f or bg (x, y) ≤ 127
127 (5.3)

f 2 (bg (x, y)) = γ . bg (x, y) − 127 + 3; f or bg (x, y) > 127
α(bg (x, y)) = bg (x, y).0.0001 + 0.115 (5.4)
β(bg (x, y)) = λ − bg (x, y).0001 (5.5)
Through visual experiments Chou derived T0 ; γ and λ were found to be 17, 3/128
and 1/2, respectively. The values bg (x, y) and mg (x, y) are the average background
luminance and luminance contrast around the pixel at (x, y), respectively. They are
obtained using the following filters:
⎡ ⎤ ⎡ ⎤
0 0 0 0 0 0 0 1 0 0
⎢ 1 3 8 3 1⎥ ⎢ 0 0 3 8 0⎥
⎢ ⎥ ⎢ ⎥
G1 = ⎢ ⎥
⎢ 0 0 0 0 0⎥ G 3 = ⎢−1
⎢ −3 0 3 1⎥⎥
⎣−1 −3 −8 −3 −1⎦ ⎣ 0 −8 −3 0 0⎦
0 0 0 0 0 0 0 −1 0 0
⎡ ⎤ ⎡ ⎤
0 0 1 0 0 0 1 0 −1 0
⎢ 0 8 3 0 0 ⎥ ⎢ 0 3 0 −3 0⎥
⎢ ⎥ ⎢ ⎥
G2 = ⎢
⎢ 1 3 0 −3 −1 ⎥ ⎢
⎥ G4 = ⎢ 0 8 0 −8 0⎥⎥
⎣ 0 0 −3 −8 0 ⎦ ⎣ 0 3 0 −3 0⎦
0 0 −1 0 0 0 1 0 −1 0
m g (x, y) = max {|gradk (x, y)|} (5.6)

k=1,2,3,4
1
5 5
|gradk (x, y)| = p(x − 3 + i, y − 3 + j).G k (i, j) (5.7)
16 i=1 j=1
And for average background luminance,

⎡ ⎤
1 1 1 1 1
⎢1 2 2 2 1⎥
⎢ ⎥
B(i, j) = ⎢⎢1 2 0 2 1⎥⎥
⎣1 2 2 2 1⎦
1 1 1 1 1
1 5 5
bg (x, y) = p(x − 3 + i, y − 3 + j).B(i, j) (5.8)
32 i=1 j=1
Finally, the individual subband JND profiles are calculated as follows:

⎡ ⎤

3
3
JNDq2 (x, y) = ⎣ JND 2f b (i + x.4, j + y.4)⎦ . ωq (5.9)
i=0 j=0
for q = 0,1,. . .,15 and 0 <= x <= N/4, 0 <= y <= N/4.
JNDq (x, y) represents the JND value at position (x, y) of the qth subband. The
factor ωq is calculated as follows:
−1

15
ωq = Sq . Sk−1 (5.10)
k=0
Sk denotes the average sensitivity of the HVS to distortions in the kth subband.
It is calculated as
16 ( pk +1)w−1
(εk +1)h−1
Sk = ε(u, v) for k = 0, 1, . . . ., 15 (5.11)
N .N u=εk .h v= pk .w
ε(u, v)denotes the response curve of the modulation transfer function (MTF)
for 0<=u<=N, 0<=v<=N. Chou [22] proposes the following formula for its
calculation:

Ω(u, v) Ω(u, v) c
ε(u, v) = a. b + . exp − (5.12)
Ω0 Ω0
where
2 2 1/2
32v 24u
Ω(u, v) = + (5.13)
N N
Fig. 5.12 Linear subband

structure
Fig. 5.13 DWT subbands
is derived from the MTF curve modelled by a=2.6, b=0.0192, c=1.1 and
Ω 0 =8.772.
The model as originally proposed by Chou has a linear subband structure that is
not suitable for the subband structure of the discrete and complex wavelet transforms
(Fig. 5.12).
This linear subband structure must first be altered to fit the multi-resolution
nature of wavelet decomposition subbands that vary in size according to level of
decomposition. Ghouti and Bouridane [18] propose the subband structure shown in
Fig. 5.13 for the balanced multi-wavelet transform.
Due to the similarity of the DWT and BMW wavelet decompositions this sub-
band decomposition is also used for DWT embedding. The linear subbands are
resized through a variation of Eq. 5.9 given by Eq. 5.14. However this decompo-
sition is not applicable to the subbands produced by the complex wavelets and so
different channel decompositions are proposed.
⎡ ⎤
2
t
−1 2
t
−1
JNDq2 (x, y) = ⎣ JND 2f b (i + x.2 , j + y.2 )⎦ .ωq
t t
(5.14)
i=0 j=0
for q = 0,1,. . .,15 and 0 <= x <= N/2t , 0 <= y <= N/2t

t = 5 − p−1
3
, if 0 < p ≤ 15
and
t = 5, i f p=0
For the DTWT each section of the dual tree decomposition is treated as being
of the same channel subband. The same subband weight is applied to opposing
halves of the dual tree composition as they are of opposing orientations and the
same frequency and so can be treated identically in this respect (Fig. 5.14).
Besides, to take into account the improved directionality of the DTWT a different
set of filters are used to obtain the value of m with G1 and G2 orientated at –15◦ and
+ 15◦ , respectively and G5 and G6 orientated at –75◦ and +75◦ , respectively. G3 and
G4 are the same diagonally orientated filters used for the case of the DWT.
⎡ ⎤ ⎡ ⎤
0 0 −1 0 0 0 0 −1 0 0
⎢−1 −3 −3 0 0 ⎥ ⎢0 0 −3 −3 −1 ⎥
⎢ ⎥ ⎢ ⎥
G1 = ⎢
⎢ 0 0 0 0 0 ⎥ G2 = ⎢ 0
⎥ ⎢ 0 0 0 1⎥⎥
⎣ 0 0 3 3 1 ⎦ ⎣1 3 3 0 0⎦
0 0 1 0 0 0 0 1 0 0
⎡ ⎤ ⎡ ⎤
0 0 1 0 0 0 0 1 0 0
⎢0 8 3 0 0 ⎥ ⎢ 0 0 3 8 0⎥
⎢ ⎥ ⎢ ⎥
G3 = ⎢
⎢1 3 0 −3 −1 ⎥ ⎢
⎥ G 4 = ⎢−1 −3 0 3 1⎥⎥
⎣0 0 −3 −8 0 ⎦ ⎣ 0 −8 −3 0 0⎦
0 0 −1 0 0 0 0 −1 0 0
⎡ ⎤ ⎡ ⎤
0 0 0 1 0 0 −1 0 0 0
⎢ 0 0 0 3 0⎥ ⎢ 0 −3 0 0 0⎥
⎢ ⎥ ⎢ ⎥
G5 = ⎢
⎢−1 −3 0 3 1⎥ ⎢
⎥ G 6 = ⎢−1 −3 0 3 1⎥⎥
⎣ 0 −3 0 0 0 ⎦ ⎣ 0 0 0 3 0⎦
0 −1 0 0 0 0 0 0 1 0
Fig. 5.14 DTWT subbands

m is then calculated using Eq. 5.15
m g (x, y) = max {|gradk (x, y)|} (5.15)

k=1,2,3,4,5,6
As the NRCWT down-samples by three at each level and has only four sub-
bands at each level the decreased number of subbands must be taken into account.
Imaginary and real parts of each subband are considered as being in the same
channel leading to four different channels at each level and one low-pass subband
for a total of 13 channels. Therefore the subband structure shown in Fig. 5.15 is
used.
Equation 5.16 is then altered to take into account the reduction in the number of
channels in the subband decomposition and the down-sampling by three instead of
two at each level
⎡ ⎤
3
t
−1 3
t
−1
JNDq2 (x, y) = ⎣ JND 2f b (i + x.3t , j + y.3t )⎦ .ωq (5.16)
i=0 j=0
for q = 0,1,. . .,12 and 0 <= x <= N/3t , 0 <= y <= N/3t

t = 3 − p−1
4
, if 0 < p ≤ 12
and
t = 3, i f p=0
Fig. 5.15 NRCWT subbands

Fig. 5.16 Chou’s JND profile (top) and Loo’s JND profile (bottom) for DTWT of Lena image
The factor ωq is also calculated to take into account the reduction in channels
from 15 to 12.
−1

12
ωq = Sq . Sk−1 (5.17)
k=0
The main disadvantage of Chou’s model is that it is less capable at adapting the
watermark to complex textured areas of the image due to the simplicity of the filters
employed by Chou. A visual description of both models is shown in Fig. 5.16.
5.3.2 Loo’s Model

Loo’s model [21] is obtained through a series of visual tests that determine the
maximum imperceptible distortion at differing levels of brightness and localised
coefficient intensity. A series of visual tests were conducted to obtain the JND value
for each of the directional subbands down to the fifth level of decomposition and
the third level in the case of the NRCWT. Reference [21] proposed a JND model
calculated as shown in Eq. 5.18.

jndl,θ (u, v) = B k 2 x(u, v)l,θ
2
+ Cl,θ
2
(5.18)
where k and C are subband-dependent constants, dependent on the level l and ori-
entation θ , respectively. The value x is the absolute mean value of a 3×3 Gaussian
window of standard deviation 0.5 centred round the coefficient at position (u, v).
B is a measure of the spatial brightness corresponding to the coefficient at position
(u, v).
In regions with a lot of texture the term k2 x(u, v)2 dominates the equation sub-
stantially increasing the value of the corresponding JND value. In the absence of
texture the JND will be dependent upon the term C, which decreases sharply when
level of decomposition increases. The factor B is a measure of the local brightness
and is calculated as detailed in Eqs. 5.19, 5.20 and 5.21 for the DWT, DTWT and
NRCWT, respectively where y represents the value of the level 5 low-pass coef-
ficient corresponding to position (u, v) normalised to fall within the range [0,1].
The rest of the equations are approximated using quadratic regression based upon
measurement of the visibility of watermark noise at different levels of background
brightness.
B = 2.12(y(u, v) − 0.56)2 + 1 (5.19)
B = 2.03(y(u, v) − 0.53)2 + 1 (5.20)
B = 2.43(y(u, v) − 0.55)2 + 1 (5.21)
The visual tests were conducted by setting all subbands of an image to 0. Then all
the values in the subband under consideration were set randomly distributed in the
range [0, n]. The image was then recomposed and added to a sine wave grating of
the appropriate frequency and orientation. The value n was then increased uniformly
until the distortion became visible. Using the value of n and the average value of the
coefficients composing the sine wave grating, an estimate of k for each level and
orientation was derived. The tests were repeated with different amplitudes of sine
wave gratings to obtain varied results for multiple values of x. The results for the
DWT, DTWT and NRCWT are shown in Table 5.1. All visual tests were conducted
with a gamma correction value of 2.1, a resolution of 32 pixels/cm and a viewing
distance of 30 cm. Three subjects took part in the tests and the results obtained from
each averaged to get the final factors.
Table 5.1 Loo’s JND factors
Subband DWT-k DWT-C DTDWT-k DTDWT-C NRCWT-k NRCWT-C
Level 1 – Diag. 0.33 5 1.00 6 0.25 3

Level 1 – Hor./Ver. 0.20 3 0.60 4.5 0.16 2
Level 2 – Diag. 0.25 4 0.50 1.5 0.15 2
Level 2 – Hor./Ver. 0.14 2 0.25 1 0.145 1
Level 3 – Diag. 0.25 1 0.25 1 0.145 1
Level 3 – Hor./Ver. 0.11 1 0.21 1 0.14 1
Level 4 – Diag. 0.18 1 0.20 1 – –
Level 4 – Hor./Ver. 0.11 1 0.195 1 – –
Level 5 – Diag. 0.18 1 0.195 1 – –
Level 5 – Hor./Ver. 0.11 1 0.19 1 – –
However it also suffers from a “spreading” effect around finer details such as
edges which can increase watermark visibility around these features. An additional
drawback is that a separate set of visual tests must be conducted for each individual
wavelet transform used.
5.3.3 Hybrid Model
In this thesis, a novel Hybrid model has been proposed and used. To create this
model, a combination of both Chou and Loo’s models to combine the benefits of
both approaches has been adopted. The JND factors obtained through both JND
models are averaged as shown in Eq. 5.22.
JND H ybrid = (JNDChou + JND Loo ) /2 (5.22)
This Hybrid model combines both Chou’s model’s precise approximation of fea-
ture edges combined with Loo’s model’s excellent approximation of textured image
regions. However it comes at the computational cost of having to calculate both JND
models.
5.4 Watermarking as Communication with Side Information
In the case of non-blind watermarking in a Gaussian channel, which is a water-

marking scenario in which the decoder has access to the host data, the host data
interference x can be subtracted from the received data received at the decoder z
causing the capacity to be equal to the well-known Eq. 5.23.

1 σ2
C= log2 1 + w2 (5.23)
2 σv
5.4 Watermarking as Communication with Side Information 95
where σ 2 w is the variance of the watermark and σ 2 v is the variance of the attack. The
statistics of the host therefore have no effect on the capacity in this case. However
in blind scenarios where knowledge of the host data is not available the capacity
is limited by interference from the original host data. As in most watermarking
scenarios the encoder will have knowledge of the host, the watermarking scenario
can be viewed as a communications problem with side information at the encoder.
In his landmark paper Costa [9] likened the problem of communication across
a noisy channel in the face of host interference to writing on dirty paper. It was
demonstrated that if the encoder has knowledge of the host channel then the capacity
of a blind communication scenario can be independent of the host interference, and
does not depend on whether the decoder has access to knowledge of the original
host channel. Eggers and Girod [12] extend Costa’s scenario to watermarking by
modelling the watermarking process as communication over a noisy channel with
side information at the encoder.
The result obtained by Costa can be adapted to the case of watermarking as illus-
trated in Fig. 5.17. The host data x and the distortion v are assumed to be indepen-
dent and identically distributed (iid) and Gaussian distributed, i.e. x∼N(0, σ2 x ) and
v∼N(0, σ2 v ), respectively, and of length N. The message to be encoded m is taken
from the alphabet M. The algorithm proceeds as follows:
1. First an N-dimensional codebook U is generated. This codebook consists of

2N·I(w; z) entries where I(w; z) is the mutual information between the watermark
sequence w and the received data z. These entries are partitioned into N M sub-
codebooks so that each possible message m is allocated approximately the same
amount of sequences. This code book is made available at both the encoder and
the decoder.
2. For the transmission of a given message m a sequence u is chosen from the
corresponding sub-codebook Um so that w = u−αx is nearly orthogonal to x.
The chance of there being no such sequence approaches 0 as N − > ∞.
code book U
x U1
U2
.....
UM
X α v
–
m w y z m'
Search in Search in
Um + + +
U
Fig. 5.17 Costa’s solution

3. At the decoder stage a search is conducted for the sequence u that is orthogonal
to the sequence received after the attack has been applied, z. The index m of the
sub-codebook in which this sequence is found then determines the message that
has been transmitted.
Using this method it can be shown that the capacity of the blind watermarking
system is equal to Eq. 5.23 and so the capacity is not limited by the absence of
knowledge of x at the decoder stage. However Costa’s scheme is not realisable in
practise as the codebook U tends to become extremely large even for moderate data
lengths N and alphabets M.
5.4.1 Quantisation Index Modulation
In their highly influential papers Chen and Wornell [3–6] provide a practical applica-
tion of Costa’s solution which they term Quantisation Index Modulation (or QIM).
Costa’s optimal random codebook is replaced with a uniform scalar quantiser which
is applied universally during encoding with a step size Δ that encodes a bit by shift-
ing the data to the appropriate quantisation bin.
The encoding of a bit b (b ∈ {−1, 1}) into the host data can then be accom-
plished by using one of two scalar quantisers Q–1 (.) and Q1 (.). The bit is encoded
by quantising the host data to the nearest quantisation bin of the appropriate scalar
quantiser. Given a host sample x the watermarked host sample y can then be obtained
by adding w, the watermark energy required to shift x into the appropriate quanti-
sation bin. A scalar quantiser of step size Δ for encoding either 0 or 1 is shown in
Fig. 5.18.
y = x +w (5.24)
The watermark w is then the quantisation error from applying the appropriate
scalar quantiser which can be calculated as in Eq. 5.25.
w = (Q b (x) − x) (5.25)
The estimate of the decoded bit b, can then be obtained through the use of a min-
imum Euclidean distance decoder using the same scalar quantiser and the received
data y.
b = arg min y − Q b (y)2 (5.26)

0,1
0 1 0 1
Fig. 5.18 Embedding of 1
bit using scalar quantiser of
step size Δ
5.4 Watermarking as Communication with Side Information 97
Eggers and Girod then further extended QIM to create what they termed the
Scalar Costa Scheme (SCS) [11]. This involves the use of a distortion compensation
factor α. Chen and Wornell [7] added a similar extension to their scheme terming it
distortion compensated QIM (DC-QIM). This increases the size of the quantisation
bins used in return for a decrease in the accuracy of the quantisation and is best
used for high levels of attack. Once quantised with the expanded quantisation bin
the extra distortion is compensated for by adding part of the quantisation back to the
host data x which results in the same overall distortion as the case where α=1.
x = q(x; Δ/α) + (1 − α)[x − q(x; Δ/α)] (5.27)
This sub-optimal implementation of Costa’s scheme QIM has the same advan-
tages as Costa’s approach in that its performance is independent of the interference
from the host data.
5.4.2 Spread Transform Watermarking

The QIM embedding covered in the previous section is uni-dimensional; it can only
be applied to one sample at a time which is highly vulnerable to noise. The robust-
ness of QIM can be vastly improved by watermarking multiple host samples using
repetition encoding by applying QIM repeatedly to multiple host samples. It was
soon realised that further improvements could be gained by extending QIM to the
multi-dimensional case. One of the earliest attempts at this method was by Chae
et al. who used high-dimensional lattice quantisers to embed data [2]. While allow-
ing for large amounts of data to be embedded this approach also suffers from a low
level of robustness.
Spread transform data hiding was originally proposed by Chen and Wornell [4] as
an extension of QIM. It applies quantisation in a reduced dimensional space to a vec-
tor projection consisting of several host samples. This aims to combine quantisation-
based methods with the spread spectrum-based methodologies by spreading the
quantisation over multiple host samples. The host samples x are drawn from a set
S={x1 , x2 ,. . .. . .., xr } of length r. The vector projection u is obtained by multiplying
the host samples with the key dependent vector t also of length r, which is termed
the spreading factor.
r
u= x n tn (5.28)
n=1
The vector projection is then quantised in a similar fashion to that of the uni-
dimensional QIM case with the host samples being quantised in the direction of the
vector v. The addition of the watermark vector uw creates the watermarked vector uy
which lies in the appropriate quantisation bin for the bit to be embedded.
u y = u x + uw (5.29)
Fig. 5.19 Quantisation Index Modulation and Spread Transform encoding of symbols X and O in
2 host samples for Δ=2 and Spread Transform vector t
u w = Q b (u x ) − u x (5.30)
r r r
yn tn = x n tn + w n tn (5.31)
n=1 n=1 n=1
The difference between spread transform embedding and uni-dimensional QIM

is illustrated in Fig. 5.19 for the embedding of two symbols X and O (0 and 1 in
the case of embedding a bit). Noise applied in the case of QIM can easily shift
the quantised data into the wrong quantisation bin. However in the case of spread
transform embedding to get the same effect the attacker will have to apply all noise
in the direction of the embedding vector t creating an effective watermark-to-noise-
ratio (WNR) advantage for the embedder. As such the vector t must remain secret
or the spread transform advantage will be lost.
5.5 Proposed Algorithm

The proposed algorithm that will be considered in the rest of the chapter applies the
spread transform embedding principles detailed above to watermarking. It applies
the wavelet transforms and visual models detailed previously as shown in Fig. 5.20.
The bit sequence m of length L is embedded into the original image I. Once
the watermark is encoded into the wavelet coefficients the image is recomposed to
create the watermarked image J. The watermarked image is then passed through
the attack channel where some distortion is applied in an attempt to remove the
watermark, this creates the watermarked and attacked image J’ that is then passed
on to the decoding stage.
5.5 Proposed Algorithm 99
I m = {m1,m2,.....,mL}
Forward
WT
μ Calculate
Calculate
Watermark K
JND profile
x Vectors
μ
K
w' w
μ' Calculate
+ x Calculate
Watermark
JND profile
Vectors
y
J J' z
Inverse Attack Forward De-
WT channel WT quantization
m'={m1',m2',.....,mL'}
Fig. 5.20 Proposed watermarking system
5.5.1 Encoding of Watermark
1. A binary pseudo-random key K of the same size as the host image is generated
(an optional dither can be added for increased security).
2. The image is decomposed to five levels using the DWT and the DTWT or to
three levels using the NRCWT.
3. The JND values for each individual coefficient in the decomposition are calcu-
lated using an appropriate visual model.
4. Coefficients are selected from all subbands to compose the host vectors that will
carry the individual watermark bits.
5. The quantisation step Δ is calculated based upon the JND values of the coeffi-
cients to ensure that embedding does not exceed the perceptual limit.
6. The current value of the vector projection is calculated by multiplication of the
host coefficients with the key values corresponding to the host coefficients as
shown in Eq. 5.28.
7. The vector projection ux is calculated as in Eq. 5.28. The quantisation error
required to encode the bit b is then calculated using the quantisation step Δ
obtained in step 5 and multiplied by the corresponding vector value so that the
data is quantised in the direction of the vector t.

wn = (Q b (u x ) − u x )∗ tn (5.32)
Individual elements of the watermark vector are then scaled by the size of their
corresponding JND value (μ) as shown in the following equation.
wn
wn = ∗ μn

r
μn
r
n=1
This novel method of perceptually shaping the watermark vector ensures that
coefficients with higher JND values will contribute more towards the vector pro-
jection, thereby visually masking the introduced watermark distortion. Finally,
the watermark vector elements are added to those of the host data to create the
watermarked data y.
yn = xn + wn
8. The watermarked wavelet subbands are then inverse transformed to give the
watermarked image in the spatial domain.
5.5.2 Decoding of Watermark
1. The image is decomposed to five levels using the DWT and the DTWT or to
three levels using the NRCWT.
2. The original visual mask used is estimated from the received image. The visual
mask generated will not be exactly the same as the original and so there is a
possible source of error being introduced.
3. The original quantisation step Δ is estimated based upon the estimated JND
values obtained in step 2.
4. The current value of the vector projection is calculated by a multiplication of the
host coefficients with the key values corresponding to the host coefficients.
5. The original watermark sequence generated is then estimated using the minimum
Euclidean distance decoder given in Eq. 5.26.
5.6 Information Theoretic Analysis

In order to objectively derive the theoretical capacity limits of the algorithms consid-
ered, an information theoretic analysis has been conducted. The framework applied
is that used in [12] where the capacity is calculated under the assumption that a sub-
optimal embedding scheme is being used at the encoder. Although general capacity
frameworks have been developed, Eggers et al. [12] show that capacity estimates
for sub-optimal schemes such as spread transform actually, some cases, exceed the
general capacity estimates. For this reason, and because of the focus on spread trans-
form embedding techniques in this thesis, the capacity assuming optimised spread
transform embedding at the encoder is considered.
5.6 Information Theoretic Analysis 101
5.6.1 Decoding of Watermark

It has been shown [12] that spread transform provides an effective WNR bonus
over uni-dimensional QIM embedding. The Mean Squared Error (MSE) distortion
measure is used throughout to quantify the watermark and attack power, that is d(x,y)
= x−y 2 . Letting the watermark MSE be equal to D 1 , the attack MSE equal to D 2
and the WNR equal to 10 log 10 (D1 /D2 ), the effective gain in WNR over QIM when
using spread transform is equal to
WNRr = WNR1 + 10 log10 r (5.35)
where WNR1 is the WNR when using only one sample and WNRr is the WNR
when using r samples. It should be noted that increasing the size of the spread
vector r comes at the cost of decreasing the maximum number of bits that can be
embedded. For example with r = 2 the maximum capacity possible is limited to 0.5
bits/elements. The capacity when faced with additive white Gaussian noise (AWGN)
C AWGN of spread transform (ST) data hiding can be calculated from the capacity of
embedding without spread transform (QIM, r = 1) as follows:
AW G N
C ST,1 (WNR + 10 log10 r )
AW G N
C ST, r = (5.36)
r
The spread factor is considered to be optimised according to the magnitude of
the WNR. For high WNR large spread vectors will be optimal as the WNR will
outweigh the disadvantage of decreasing the maximum achievable capacity. At low
WNR short spread factors will be optimal as no WNR advantage is needed. There-
fore for low WNR, QIM may be more optimal. The capacity of QIM then is given
by
AW G N
C ST,1 = max I (y; d) (5.37)
α
where y is the data received by the decoder, d is the alphabet of messages that may
be embedded equal to 0 or 1 for binary data embedding and I is the mutual infor-
mation. The optimum value of the distortion compensation factor alpha is applied.
The solution to (5.37) is obtained through a comparison of the PDFs of the trans-
mitted data assuming different alphabet values d were transmitted. This solution to
obtaining the mutual information is given by [10] as
I (y; d) = h(y) − h(y|d) (5.38)
The PDFs used in the calculation are illustrated in Fig. 5.21. Finally the power
of the watermark distortion introduced by quantisation is calculated by [7] as
Δ2
E q2 = (5.39)
12
d=0 d=0
d=1 d=1
0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7
Fig. 5.21 PDFs of data before and after attack is applied, Δ = 7, σv = 1 for two different possible
transmitted watermark values 0 and 1
The advantage of spread transform is its complete independence of interference

from the host image. It also offers capacities that are generally higher than spread
spectrum for low to moderate levels of attack.
5.6.2 Parallel Gaussian Channels

To derive capacity limits it is necessary to divide the source image into separate
channels. To divide the wavelet coefficients of an image into separate channels, the
EQ (estimation quantisation) model proposed by Lopresto et al. [22] has been used.
Within this scheme wavelet subbands are modelled as Gaussian distributions with
zero mean and variance dependent upon the coefficient’s location within the wavelet
subbands to create independent parallel Gaussian channels. The coefficients’ vari-
ances lie in a quantisation band k where 1 ≤ k ≤ K. These channels are assigned as
follows:
1. Apply 5 levels of the DWT or DTWT to the image under consideration. Due to
the greater down-sampling by 3 at each level of the NRCWT only 3 levels of
decomposition are used.
2. Calculate the local variance in a 5×5 window for finer detail levels (1,2,3) and
3×3 window for coarser levels (4,5). For the NRCWT the 5×5 window is applied
to levels 1 and 2 while the 3×3 window is applied to the coarser level 3.
3. The natural logarithm of each variance is quantised using K levels and step size
Δ. A channel then consists of all coefficients with the same quantised variance
within every subband.
The quantiser step size Δ is determined by the range of variances in the subband
decomposition. In this work, K equal to 256 is used. The estimated 256 parallel
Gaussian channels are shown in Figs. 5.22 and 5.23 for the DTWT and NRCWT
decompositions of the “Lena” image and Figs. 5.24 and 5.25 for the DTWT and
NRCWT decompositions of the Baboon image, where the higher the channel the
greater the variance quantisation band.
Fig. 5.22 EQ 256 parallel Gaussian channels for DTWT decomposition of Lena image
Fig. 5.23 EQ 256 parallel Gaussian channels for NRCWT decomposition of Lena image
Fig. 5.24 EQ 256 parallel Gaussian channels for DTWT decomposition of Baboon image
Fig. 5.25 EQ 256 parallel Gaussian channels for NRCWT decomposition of Baboon image
Each channel is assumed to be iid and Gaussian with zero mean and variance
σ 2 k . Each channel has an inverse sub-sampling rate R k . For all transforms channels
are critically sampled so that

K
Rk = 1 (5.40)
k=1
Smoother less detailed images like “Lena” will tend to have lower rates for higher
power channels with most of their coefficients being concentrated in lower energy
channels particularly at high frequency levels. However more complex textured
images like “Baboon” will tend to have high power channels consisting of many
coefficients. For all images the higher energy channels will tend to be concentrated
at lower frequency levels, and in textured areas of the image where the coefficients
are larger due to the concentration of image content in these regions. As will be seen
in the next section these channels tend to offer the highest capacities.
5.6.3 Watermarking Game

The problem of finding the capacity can be viewed as a game across the parallel
Gaussian channels derived in the previous section [23, 24] where both embedder
and attacker attempt to maximise their advantage within a game-theoretic frame-
work across every channel. The host is modelled as a series of independent parallel
Gaussian channels (x) indexed by k (1 ≤ k ≤ K, K = 256) of variance σ k 2 and mean
0 while the embedder and attacker are modelled as intelligent agents, the embedder
(w) attempting to maximise the final capacity and the attacker (v) attempting to min-
imise it. Both embedder and attacker are limited by distortion constraints with the
attacker being allocated a greater allowance than the embedder.
The MSE error distortion is used throughout so the power of the original chan-
nel pk , the distortion applied by the embedder ek and the distortion applied by the
attacker ak for each channel k are then given by

pk = E x 2 (5.41)

ek = E w 2 (5.42)

ak = E v 2 (5.43)
where x is the host data vector, w the watermark vector and v the attack vector. The
power of the Gaussian channels pk and the number of coefficients they contain is
shown in Figs. 5.26 and 5.27. Note that more textured images such as Baboon and
Barbara will tend to have more high power channels as the textured regions are
represented by large coefficients in the wavelet decomposition. Smoother images
such as Lena and Pepper will tend to have less detail and so smaller coefficients
when decomposed.
Fig. 5.26 EQ 256 parallel Gaussian channels for Lena and Peppers image for DWT (solid), DTWT
(dotted) and NRCWT (dashed)
Fig. 5.27 EQ 256 parallel Gaussian channels for Barbara and Baboon image for DWT (solid),
DTWT (dotted) and NRCWT ( dashed)
Also for all images the NRCWT decompositions tend to have more high power
channels, this is due to the ability of the NRCWT to well represent image features,
especially textured regions. The same trend can be seen for the DTWT which will
also have higher power channels as its improved directional nature allows for better
representation of diagonally orientated features within the host image. More tex-
tured images like Baboon and Barbara will also tend to have flatter power distribu-
tions as they represent a wider range of wavelet coefficients.
For the capacity estimates to be meaningful distortion constraints are imposed
upon both the embedder and the attacker. For the channel model under consideration
the global embedder and attacker distortions across all channels are given as

K
rk θk ek = D1 (5.44)
k=1

K
rk θk ak = D2 (5.45)
k=1
where θ is the distortion modifier for channel k dependent upon the orientation and
level of the coefficients in the channel, e and a are the weighted MSE of the attack
and embedding strategy, respectively. Wavelet coefficients are normalised before the
analysis so θ is 1 in all cases.
The three local distortion constraints placed upon the embedder and attacker
within each channel are
0 ≤ ek (5.46)
ek ≤ ak (5.47)
a k ≤ pk (5.48)
The attack is applied as the additive white Gaussian noise with amplitude scaling
attack (SAWGN). The SAWGN attack involves the application of an optimised mix
of both amplitude scaling and then the addition of AWGN to the watermarked data.
If the attack distortion a is equal to p then the attacker can scale by 0 effectively
erasing the channel completely and so the attack distortion never needs to be greater
than pk . The attack model applied differs from the analysis in [23] in that amplitude
scaling is also applied by the embedder after the watermark has been added making
the watermark process a more general one. But in practical situations embedding
distortion is a small fraction of the original power in a channel, this has little effect
on the results.
The total capacity of all the parallel Gaussian channels and so the images as a
whole is then given by the maximisation–minimisation relation shown in (5.49).

K
C = max min rk CkS AW G N ( pk , ek , ak ) (5.49)
ek ak
k=1
Equation 5.49 is solved through the application of game theory. The maxmin
relation is viewed as a game across the parallel Gaussian channels where each side
attempts to maximise their advantage in every channel. An optimisation algorithm
is applied to find the saddle point condition where both embedder and attacker are
applying their optimal strategy across all channels and it is no longer beneficial for
either side to alter its strategy. While the optimal attack can be calculated numeri-
cally for any given embedding strategy, the embedding strategy must be calculated
through simulated annealing. Allocation of pk , ek and ak as well as per channel
capacity are shown in Fig. 5.28 for the case of a moderate attack applied to the Lena
image and with the DWT decomposition. Graphs using the other wavelet transforms
and images are of similar shape and are neglected here.
As demonstrated in Fig. 5.28 the attacker is able to match and erase most of the
low power channels. As such it is not beneficial for the embedder to concentrate
much watermark strength in these channels. By contrast in the higher power chan-
nels the attacker can only afford to allocate a fraction of the channel power. These
are the channels in which it is also beneficial for the embedder to concentrate most
Fig. 5.28 Allocations of p, e and a and capacity for channels 1–256 of Lena image when decom-
posed using the DWT and experiencing a moderate attack
of the watermark energy to take advantage of the relatively weak attack applied in
these channels. This runs counter to the strategy employed in many watermark-
ing algorithms where low frequency features are ignored [21]. Capacity results
suggest that this results in a drastic lowering of the possible capacity of watermark-
ing algorithms.
Results for the capacity analysis are given in Table 5.2 as the total capacity of the
image (NC). These results are compared against those obtained by Ghouti for the
DWT [16] and NRCWT [15] when applying the general capacity analysis model
in Table 5.3. It should be noted that the analysis in [15] applies the NRCWT to
four levels of decomposition rather than three, but the low number of coefficients
Table 5.2 Total spread transform data hiding capacities in bits for images of size 512×512
D2 = 2D1 D2 = 5D1
Image D1 NC NC-Spike NC NC-Spike
Lena (Daub-8) 10 20138 17140 2042 3873

Lena (9/7 Linear phase filters) 19196 17861 1847 4087
Lena (DTWT) 25859 26717 2806 4698
Lena (NRCWT) 37086 31216 4812 7596
Baboon (Daub-8) 25 48580 49858 7953 10970
Baboon (9/7 Linear phase filters) 48300 50142 7813 11077
Baboon (DTWT) 52229 53038 9315 11991
Baboon (NRCWT) 60324 66044 11842 14894
Peppers (Daub-8) 10 26963 30064 2581 4574
Peppers (9/7 Linear phase filters) 27026 29702 2470 4668
Peppers (DTWT) 33304 35121 3741 5944
Peppers (NRWT) 49523 50151 7370 10876
Barbara (Daub-8) 20 21181 27301 2767 5166
Barbara (9/7 Linear phase filters) 21528 27642 2794 5112
Barbara (DTCWT) 25150 29737 3503 5452
Barbara (NRCWT) 37663 38538 5785 8703
Table 5.3 General data hiding capacities in bits for images of size 512×512 [16]
D2 = 2D1 D2 = 5D1
Image D1 NC NC-Spike NC NC-Spike
Lena (Daub-8) 10 27664 22080 3677 4818

Lena (9/7 Linear phase filters) 27233 21714 3651 4589
Lena (NRCWT) 37512 30979 6061 6674
Baboon (Daub-8) 25 26347 26148 4018 5455
Baboon (9/7 Linear phase filters) 24212 25218 3781 5842
Baboon (NRCWT) 61394 57473 12555 11976
Peppers (Daub-8) 10 19422 20708 3042 4344
Peppers (9/7 Linear phase filters) 16922 17852 2790 3962
Peppers (NRCWT) 44004 33917 7127 6875
Barbara (Daub-8) 20 22840 24495 3683 5475
Barbara (9/7 Linear phase filters) 18289 20026 2868 4531
Barbara (NRCWT) 39045 37118 7041 8081
in the low-pass level 3 NRCWT subband means that this will have little effect on
the results. Also given are results for the NC-Spike model [23] where a 2-channel
rather than 256-channel model is considered instead.
The subjective levels of distortion allocated to the embedder D1 are the same
as those employed in [23, 17]. These are 10 for Lena and Peppers, 20 for Barbara
and 25 for Baboon. More textured images can tolerate more noise before the noise
becomes visible. The attacker is then allowed to apply two different attack strengths
D2 . These are adjusted relative to the embedding distortion and are 2D1 and 5D1 .
The NRCWT produces the highest capacity estimates. This is a direct result of it
producing more high power channels than the other wavelet transforms. The DTWT
produces the next highest capacity estimates as it still produces more high power
channels than the DWT. This is due to the improved ability of these wavelet trans-
forms to represent the host image in the wavelet domain. Higher power channels
allow for greater robustness against the scaling introduced by the attacker and so
higher per channel capacity.
The Baboon image produces the highest capacity results, followed by the Pep-
pers image and then the Barbara and Lena image. This can be explained by ref-
erence to the characteristics produced by the wavelet decompositions of these
images. The large textured areas of the Baboon image produce a lot of large coef-
ficients that lead to many high power and hence high capacity channels. By con-
trast the smoother images will have smaller coefficients and so fewer high power
channels.
It should be noted that a deficiency of this analysis is that it employs a simplifi-
cation in that the host data is assumed to be uniform within each spread transform
quantisation cell. Essentially this is the equivalent of regarding the host power pk as
being infinite in each channel; an assumption that leads to an under-estimation of
the true performance of ST watermarking. This deficiency will be addressed in the
next section.
5.6.4 Non-iid Data

The capacity analysis in the previous section assumes that the host power is evenly
distributed across all quantisation cells and the data is equally likely to be quan-
tised to any centroid in the scalar quantiser. However, when embedding at low
document-to-watermark-ratios (DWR), the embedding strength will overwhelm the
host power. As a result, the probability that the host data will be quantised to any-
thing other than the two centroids adjacent to the origin becomes negligible. In such
cases the performance can be improved by using very small values for the distortion
compensation factor α. This is because the centroids can be made very large but the
host data will still be close to the boundary between two adjacent cells.
This has the effect of dramatically increasing the size of the centroids at the
origin and hence the robustness. As the host data will be gathered around the origin
between the two possible centroids the accuracy of the embedding becomes much
less important as the data will always be close to or on the border of both possible
quantisation cells. As a result, when the embedder needs to shift the data into the
correct quantisation bin the strength needed to do so is very low. Effectively taking
advantage of the low host power the algorithm begins to resemble a spread spectrum
based one. It is possible to view spread spectrum embedding methods as being a case
of spread transform embedding with two infinite quantisation cells [25, 26].
Perez-Freire et al. [26] term the use of just two quantisation cells adjacent to
the origin as distortion compensated spread spectrum (DC-SS). It should be noted
that this differs from classical SS schemes in that the strength of the watermark
embedding is not fixed, but adjusted according to the distance to the nearest centroid.
They show that the effective SNR (signal-to-noise ratio) when employing such a
scheme can be calculated as [26]

ξ 1 − λα 2
SNR DC−SS = (5.50)
1 + λξ (1 − α)2
where
ξ = 10(WNR/10) (5.51)
λ = 10(DWR/10) (5.52)
The optimum value of α for DC-SS can then be calculated as that which min-
imises the probability of error:
1
1 + ξ + λξ − (1 + ξ + λξ )2 − 4λξ 2 2
α DC−SS =
2λξ
Finally the capacity of a DC-SS channel is calculated as

1
C DC−SS (λ, ξ, α) ∼
= log2 (1 + S NR DC−SS ) (5.54)
2
As shown in Eq. 5.55 the spreading factor r affects the WNR. It also has a corre-
sponding effect on the DWR, effectively decreasing the DWR:
DWRr = DWR1 − 10 log10 r (5.55)
At sufficiently low DWR, DC-SS will offer improved levels of performance over
that of standard ST embedding. Table 5.4 shows capacity estimates obtained taking
into account the improved performance offered by DC-SS in the case of low DWR
channels. In all cases where the capacity is increased, the increase is relatively more
significant for the DWT, as it will have more low power channels. However, the
same trend of the NRCWT and the DTWT producing superior capacity estimates
remains.
5.6.5 Fixed Embedding Strategies
A drawback with the distortion measure used in the capacity analysis is that the dis-
tortion measure used allows for unacceptably large local distortions to be globally
compensated. For example placing as much watermark energy in the low frequency
Table 5.4 Total DC-SS data hiding capacities in bits for images of size 512×512
D2 = 2D1 D2 = 5D1
Image D1 NC % Increase NC % Increase
Lena (Daub-8 ST) 10 22923 13.83 2380 16.55

Lena (9/7 Linear phase ST) 22191 15.60 2164 17.16
Lena (DTCWT ST) 29101 12.58 3321 18.35
Lena (NRCWT ST) 40079 8.07 5560 15.54
Baboon (Daub-8) 25 49890 2.70 8830 11.03
Baboon (9/7 Linear phase) 49447 2.37 8733 11.78
Baboon (DTCWT) 53061 1.59 10088 8.30
Baboon (NRCWT) 60770 0.74 12408 4.78
Peppers (Daub-8) 10 31319 16.16 3078 19.26
Peppers (9/7 Linear phase) 31449 16.37 2964 20.00
Peppers (DTCWT) 35938 7.91 4616 23.39
Peppers (NRCWT) 50562 2.10 8728 18.43
Barbara (Daub-8) 20 22742 7.37 3149 13.81
Barbara (9/7 Linear phase) 23168 7.62 3184 13.96
Barbara (DTCWT) 27270 8.43 3915 11.76
Barbara (NRCWT) 39360 4.51 6371 10.13
components as indicated in Section 5.6.3 will usually lead to too great a perceptible
distortion. However, when using the basic MSE distortion metric these perceptible
distortions can be compensated for by neglecting the higher frequency components.
The optimised embedding strategies take no account of the requirement for imper-
ceptibility, concentrating the watermark energy into areas where it may become
visible while neglecting lower power channels that may have higher perceptual lim-
its. For this reason, in this section, the JND models derived earlier in the chapter
are taken into account when applying the capacity analysis, rather than applying the
optimised embedding strategies derived.
The embedder can take perceptual constraints into account by allocating the
embedding strength ek to channels based on a fixed rather than optimal embed-
ding strategy. In addition to the two JND models described earlier, both PSC (power
spectrum condition) compliant watermarking and white embedding are also taken
into account for comparison. The optimal attack is calculated for the fixed embed-
ding strategy used and the capacity is then calculated as detailed earlier. The fixed
embedding strategy is restricted to the same amount of distortion used in Section
5.6.3. The five embedding strategies analysed are:
1. Embedding energy allocated optimally as calculated in Section 5.6.3.

2. Embedding energy allocated proportionally to JND profile derived by Chou’s
method.
3. Embedding energy allocated proportionally to JND profile derived by Loo’s
method.
4. Embedding energy allocated proportionally to the original power of the host
channel.
5. Embedding energy allocated evenly across all channels. This creates a “flat” allo-
cation of embedding distortion across all channels with each receiving the same
amount of embedding distortion regardless of host channel power.
The capacities produced by each of these embedding strategies for all four
images and the different wavelet transforms is shown in Fig. 5.29. For low details
images like Peppers and Lena Chou’s JND is closer to the optimum embedding
allocation. This is due to the ability of Chou’s JND to more effectively isolate
edges in the images. By contrast, for higher detail images like Baboon and Bar-
bara Loo’s JND is closer to the optimal allocation. This is due to the weak-
ness of Chou’s JND when it comes to modelling the large areas of texture in
these images. However due to being based on the wavelet coefficients, Loo’s
JND is able to take advantage of the coefficient’s accurate modelling of textured
regions.
It is also interesting to note that in the cases where Loo’s JND performs bet-
ter than Chou’s JND the white embedding performs better than the PSC compli-
ant embedding. This is due to the flatter host power distributions found in textured
images rather than smoother images which tend to have a peak in the power distri-
bution instead. White embedding better approximates this “flat” power distribution.
References 113
9000 9000
8000 8000
7000 7000
6000 6000
Optimized Optimized
5000 Chou 5000
Bits
Bits
Chou
Loo Loo
4000 PSC 4000 PSC
White White
3000 3000
2000 2000
1000 1000
0 0
Lena Baboon Peppers Barbara Lena Baboon Peppers Barbara
Image Image
(a) DWT (9/7 Linear phase filters) (b) DWT (Debauchies 8)
9000 14000
8000 12000
7000
10000
6000 Optimized
Optimized
Bits
5000 Chou 8000 Chou
Bits
Loo Loo
4000 PSC
PSC 6000 White
White
3000
4000
2000
1000 2000
0 0
Lena Baboon Peppers Barbara Lena Baboon Peppers Barbara
Image Image
(c) DTWT (d) NRCWT
Fig. 5.29 Capacity estimates for fixed embedding and optimised embedding strategies
5.7 Conclusion
By applying the principles of spread transform embedding the benefits of both quan-
tisation and spread spectrum are combined in the proposed system. This chapter
has demonstrated theoretically the improved levels of performance offered by the
DTWT and NRCWT combined with the high capacities offered by spread trans-
form embedding. This is due to the higher power channels offered by the NRCWT
and DTWT due to their superior ability to represent the features of the host image.
Further, the analysis clearly shows the areas of the image such as textured and low
frequency components, into which watermarks should be embedded to maximise
the capacity.
Finally, the case of non-iid data was considered as well as the application of fixed
embedding strategies to the theoretical analysis.
References
1. F. Balado, “Digital Image Data Hiding Using Side Information”, PhD thesis, University of
Vigo, Spain, 2003.
2. J. J. Chae, B. S. Manjunath, “A robust data hiding technique using multidimensional lattices”,
Proceedings of the IEEE Conference on Advances in Digital Libraries, April 1998.
3. B. Chen, G. Wornell, “Digital Watermarking and Information Embedding Using Dither Mod-
ulation”, Proceedings of the IEEE Workshop on Multimedia Signal Processing (MMSP-98),
pp. 273–278, Redondo Beach, CA, USA, December 1998.
4. B. Chen, G. Wornell, “Achievable performance of digital watermarking systems”, Proceedings

of the IEEE International Conference on Multimedia Computing and Systems (ICMCS ’99),
vol. 1, pp. 13–18, Florence, Italy, June 1999.
5. B. Chen, G. Wornell, “Dither Modulation: A New Approach to Digital Watermarking and
Information Embedding”, Proceedings of the SPIE volume 3657: Security and Watermarking
of Multimedia Contents, pp. 342–353, San Jose, January 1999.
6. B. Chen, G. Wornell. “An Information-Theoretic Approach to the Design of Robust Digital
Watermarking Systems,” Proceedings of the IEEE International Conference on Speech and
Signal Processing 1999 (ICASSP’99), Phoenix, USA, April 1999.
7. B. Chen, G. Wornell, “Quantization index modulation: a class of provably good methods for
digital watermarking and information embedding”, IEEE Transactions on Information Theory,
vol. 47, no. 4, May 2001.
8. C. H. Chou and Y. C. Li, “A perceptually tuned sub-band image coder based on the measure
of just-noticeable-distortion profile,” IEEE Transactions on Circuits and Systems for Video
Technology, vol. 5, pp. 467–476, December 1995.
9. M. Costa, “Writing on dirty paper”, IEEE Transaction on Information Theory, vol. IT–29,
pp. 439–441, May 1983.
10. T. M. Cover and J. A. Thomas, “Elements of Information Theory”, John Wiley & Sons,
New York, 1991.
11. J. Eggers, B. Girod, “Quantization Watermarking”, Proceedings in SPIE Vol. 3971: Security
and Watermarking of multimedia Contents II, San Jose Ca. January 2000.
12. J. Eggers and B. Girod, “Informed Watermarking”, Kluwer Academic Publishers, Dordrecht,
May 2003, ISBN 1402070713.
13. F. C. A. Fernandes, M. B. Wakin and R. G. Baraniuk, “Non-redundant, Linear Phase, Semi-
Orthogonal, Directional Complex Wavelets,” Proceedings of the IEEE International Confer-
ence on Speech and Signal Processing (ICASSP), Phoenix, USA, 2004.
14. J. E. Fowler, J. B. Boettcher and B. Pesquet-Popescu, “Image Coding Using a Complex
Dual-Tree Wavelet Transform,” Proceedings of the European Signal Processing Conference,
Poznan, Poland, September 2007.
15. L. Ghouti, “Data-Hiding Capacities of Non-Redundant Complex Wavelets”, VIE 2006.
16. L. Ghouti, A. Bouridane, M. K. Ibrahim and S. Boussakta, “Digital image watermark-
ing using balanced multi-wavelets”, IEEE Transactions on Signal Processing, vol. 54,
no. 4, April 2006.
17. L. Ghouti, “Multiresolution-Based Digital Multimedia Watermarking”, PhD thesis, Queen’s
University, Belfast, UK.
18. L. Ghouti, A. Bouridane “A Just-Noticeable Distortion (JND) Profile for Balanced Multi-
wavelets,” 13th European Signal Processing Conference (EUSIPCO 2005), Istanbul, Turkey,
September 2005.
19. N. Kingsbury, “The Dual Tree Complex Wavelet Transform: A New Efficient Tool for Image
Restoration and Enhancement”, EUSIPCO, pp. 319–322, Rhodes, Greece, 1998.
20. N. G. Kingsbury: “Image Processing with Complex Wavelets”, Phil. Trans. Royal Society
London A, September 1999, on a Discussion Meeting on “Wavelets: the key to intermittent
information?” London, February 24–25, 1999.
21. P. Loo, “Digital Watermarking using Complex Wavelets”. PhD thesis, University of Cam-
bridge, March 2002.
22. S. M. Lopresto, K. Rachandran and M. T. Orchard, “Image coding based on mixture modeling
of wavelet coefficients and a fast estimation-quantization framework”, Proceedings DCC’97
(IEEE Data Compression Conference), Snowbird, Utah, USA, March 1997.
23. P. Moulin, “The role of information theory in watermarking and its application to image water-
marking”, Signal Processing, Special Issue on Information-Theoretic Issues in Digital Water-
marking, vol. 81, no. 6, June 2001.
24. P. Moulin and M. C. Mihcak, “The parallel-gaussian watermarking game”, IEEE Transactions
on Information Theory, vol. 50, pp. 272–289, February 2004.
References 115
25. L. Perez-Freire, F. Perez-Gonzalez, “Spread-Spectrum vs. Quantization Based Data Hiding:

Misconceptions and Implications”, Security, Steganography and Watermarking of Multimedia
Contents VII, San Jose CA, 17–20 January 2005.
26. L. Perez-Freire, F. Perez-Gonzalez and S. Voloshynovskiy, “An accurate analysis of scalar
quantization-based data hiding”, IEEE Transactions on Information Forensics and Security,
vol. 1, no. 1, March 2006.
27. R. B. Wolfgang, C. I. Podilchuk and E. J. Delp, “Perceptual Watermarks for Digital Images
and Video,” Proceedings of the IEEE, pp. 1108–1126, July 1999.
28. I. W. Selesnick, R. G. Baraniuk and N. G. Kingsbury, “The dual-tree complex wavelet trans-
form,” IEEE Signal Processing Magazine, vol. 22, no 6, pp. 123–151, November 2005.
29. A. B. Watson, G. Y. Yang, J. A. Solomon, J. Villasenor, “Visual thresholds for wavelet quan-
tization error,” SPIE Proceedings, Human Vision and Electronic Imaging, 1996.
30. T. R. Hsing, K. H. Zou and J. G. Dunham, “Applications of physiological human visual system
model to image compression,” Proceedings of the SPIE Conference Apllication Digital Image
Processing VII, vol. 504, pp. 419–424, 1984.
Chapter 6
Protection of Fingerprint Data Using
Watermarking
6.1 Introduction
Although biometric-based systems require more challenges (time, hardware,
software, etc.) to crack when compared to traditional systems, several breaches
of security exist and hackers can apply different attacks in order to get illegal
access, especially in remote unattended applications such as e-commerce appli-
cations, where they have enough time to make numerous attempts before being
noticed. In addition, biometric-based systems are much more complicated than tra-
ditional systems and as a result, there exist several critical points that can be com-
promised and used to violate the security of such systems. Ratha et al. [1] describe
eight basic types of possible attacks on such systems and the position of each type
of attack in the system is illustrated in Fig. 6.1. These eight types of attack are:
• Type 1: a fake biometric is presented to the sensor (e.g. dummy fingerprint, lens
with fake iris, mask face). The attacker creates a copy the biometric of a genuine
user with or without his/her cooperation.
• Type 2: also called replay attack because an old digitally recorded biometric data
is replayed into the system bypassing the sensor and this is done with or without
the cooperation of the owner of the biometric data.
• Type 3: the feature extractor is attacked with a Trojan horse programme which
produces features chosen according to the hacker s specification.
• Type 4: the genuine features extracted by the feature extractor are replaced by
other features (synthesised or real) given by the attacker.
• Type 5: the matcher can be attacked with a Trojan horse programme to produce
the desired score of the attacker.
• Type 6: an attacker tries to get access to the database in order to insert, modify
or delete the stored templates.
• Type 7: the templates are intercepted and tampered with while transmitted from
the database to the matcher.
• Type 8: the final score is overridden with the score chosen by the attacker, who
can either allow access to himself or/and other intruders (forcing the score to
accept) or denying the access of legitimate users (forcing the score to reject).
Technology, DOI 10.1007/978-0-387-09532-5 6,

118 6 Protection of Fingerprint Data Using Watermarking
Fig. 6.1 Possible attack Type 6 Database

points in a generic
biometric-based system Type 7
Feature Yes/No
Sensor Matcher
Extractor
Type 1 Type 2 Type 3 Type 4 Type 5 Type 8
Schneier [2] finds that the biometric data does not provide secrecy because it is
not a secret (e.g. leaving fingerprint impressions on almost everything we touch,
face images can be easily obtained by hidden cameras, iris features can be observed
everywhere we look) and not replaceable (e.g. once someone gets or steals your
biometric data, it remains stolen and cannot be replaced like passwords or cards).
Also, Schneier points out that using the same biometric trait (due to the limited
number of useful biometric traits and available biometric systems) across different
applications makes them unsecure once this biometric trait is stolen (e.g. if someone
uses his fingerprint to start his car, open his office door and read his emails, all these
functions are accessible by an attacker once he/she steals or forges that fingerprint).
In their work, Maltoni et al. [3] describes six threats for a typical biometric-based
system. In circumvention, an attacker gets access to a part of the system protected
by the biometric application. This threat can be cast as a privacy attack, where
an attacker accesses data that he/she has no right to access (e.g. medical records,
personal details), or as a subversive attack, where an attacker can manipulate the
accessed data (e.g. delete some records, changing personal details of other users).
In repudiation, a user denies accessing the system and then claims that an attacker
has circumvented the system (e.g. a corrupt bank clerk can illegally modify some
costumers’ records and then claim that his biometric was stolen and used by some-
one else, or he/she can argue that the False Accept Rate (FAR) associated with the
system, allowed an intruder to access his/her account). In contamination (covert
acquisition), an attacker can surreptitiously obtain the biometric data of genuine
users (e.g. lifting latent fingerprint from object, taking face pictures by hidden cam-
era) and use it to construct a digital or physical artifact of that data (e.g. construct
dummy finger using the lifted latent fingerprint, make face mask using the face
pictures). In collusion, a legitimate user with super access privileges, such as sys-
tem administrator, can be the attacker and illegally modify the system parameters
allowing the access of intruders. In coercion, attackers force legitimate users (e.g. at
gunpoint) to grant them access to the system. In denial of service (DoS), an attacker
corrupts the authentication system to a point where the legitimate users cannot use
it (e.g. an attacker bombards an online server that processes access requests with
large number of requests to a point where it cannot process the requests of genuine
users).
Deployment of watermarking techniques can be useful to increase the security to
the biometric data at different levels. This is achieved by embedding a watermark
6.2 Generic Watermarking System 119
signal to the host data to be watermarked such that the watermark signal is unobtru-
sive and secure in the signal mixture but can partly or fully be recovered from the
signal mixture later on.
For example, watermarking of fingerprint images can be deployed to: (i) pro-
tecting the originality of fingerprint images stored in databases against intentional
and unintentional attacks; (ii) fraud detection in fingerprint images by means of
fragile watermarks (which do not resist to any operations on the data and get lost,
thus indicating possible tampering of the data); (iii) guaranteeing secure transmis-
sion of acquired fingerprint images from intelligence agencies to a central image
database, by watermarking data prior to transmission and checking the watermark
at the receiver site.
This chapter discusses a comparative study of the generalised Gaussian,
Laplacian and Cauchy models for use in the 1-bit multiplicative watermarking of
fingerprint images. The optimum watermark detection is based on information the-
ory, in which the decision rule is derived by using the maximum-likelihood (ML)
scheme, while the decision threshold is derived by using Neyman–Pearson criterion.
Such an optimum detection is based on the parameters of a probability distribution
function (pdf) which must model the statistical behaviour of the DWT coefficients
accurately.
In the next section, a brief introduction to watermarking is given. In Section
6.3, state-of-the-art fingerprint image watermarking available in the literature is
reviewed. Then, the problem of optimum detection is formulated based on informa-
tion theory in Section 6.4. Section 6.5 describes the different distributions used to
model the DWT coefficients namely the generalised Gaussian distribution (GGD),
the Laplacian distribution and the Cauchy distribution. The DWT modelling and the
detection performance are evaluated through extensive experiments of Section 6.6.
Finally, conclusions are drawn in Section 6.7.
6.2 Generic Watermarking System

Watermarking is the process of embedding a secret imperceptible signal called
watermark into the original data in such a way that it cannot be removed. There
are three main issues when designing a watermarking system:
• Design of the watermark signal W to be added to the host signal. Typically, the
watermark signal depends on a key K and watermark information I,
W = f 0 (I, K ) (6.1)
Possibly, it may also depend on the host image X into which it is embedded:
W = f 0 (I, K , X ) (6.2)
Fig. 6.2 Digital Watermark

watermarking encoder W
Image Watermarking Watermarked image
X encoder f 1 Y
Secret key
K
Watermark W and /
or original image X
Watermarked image Watermarking ∧

decoder f 2 Watermark I
Y
Secret key
K
Fig. 6.3 Digital watermarking decoder
• Design of the embedding strategy/method to incorporate the watermark signal W

into the host image X to produce the watermarked image Y (see Fig. 6.2)
Y = f 1 (W, X ) (6.3)
• Design of the corresponding extraction method to recover the watermark infor-

mation from the signal mixture using the key K and the original image X (see
Fig. 6.3),
Î = f 2 (X, Y, K ) (6.4)
or without the original image
Î = f 2 (Y, K ) (6.5)
Although, every watermarking system has its own requirements, there is no set
of requirements that meets all watermarking techniques. Nevertheless, some general
requirements can be given for wide range of systems. These requirements are:
• Perceptual transparency: is one of the main requirements in most applications.

The data embedding process should not introduce any perceptible artifacts into
the host data. A watermark embedding is truly imperceptible if humans cannot
make the difference between the original and the watermarked data. However, the
modifications introduced by watermarking are only noticeable when the original
data is directly compared against the watermarked data, which is not the case in
most applications since the users of the watermarked data are normally unable
to access the original data. For this purpose, some form of masking is usually
used. For example, in image watermarking system, the Human Visual System
(HVS) characteristics can be used. Similarly, the frequency masking properties
6.2 Generic Watermarking System 121
of the Human Auditory System (HAS) can be considered when designing audio
watermarking systems.
• Robustness: is another main requirement in the design of many watermarking
applications, especially those requiring a permanent presence of the watermark
in the host data, even if the quality of the host data is degraded through manipula-
tions and modifications which can be applied intentionally or unintentionally. In
the first case, the watermarked data is subject to some data processing to extract
or remove the watermark. In the second case, alterations are introduced with-
out the intention of removing the watermark. For example, applications which
involve data storage or transmission usually use lossy compression to reduce the
size of data. However, lossy compression affects the quality of the watermark
and can remove the embedded watermark. Note that some applications use frag-
ile watermarks for use to prove the authenticity of the host data and as such are
not robust against manipulations, since failure to detect the watermark proves
that the host data has been tampered with and is thereby no longer authentic.
• Capacity: refers to the number of bits that can be embedded in an image and
it depends on the application at hand. In the literature, two different types of
watermarking systems can be found. In the first type referred to as 1-bit water-
marking, systems embed a specific information or pattern and check the existence
of that information later on in the watermark recovery. This is usually achieved
by employing some hypothesis testing method. The second type referred to as
multi-bit watermarking, embeds an arbitrary information such as serial number,
ID, tracking number, etc., into the host data and a full extraction of the hidden
information is necessary at the watermark recovery stage. A 1-bit watermarking
is sufficient for most copyright-protection applications while a multi-bit water-
marking is usually used in applications such as the protection of intellectual prop-
erty rights, fingerprinting, copy tracking, etc. Although most existing methods or
systems are developed for either watermark extraction or watermark detection, it
should be noted that in fact both approaches are inherently equivalent. A scheme
that considers a 1-bit watermarking can be extended to any number of bits and
the inverse is true [4].
• Security: similar to the case of encryption techniques, a watermarking technique
is truly secure if knowing the exact algorithms for embedding and extracting the
watermark does not help an unauthorised party to detect or remove the watermark
[5]. In most cases, the security of a watermarking technique is guaranteed by
using one or several secret and cryptographically secure keys in the embedding
and extraction process. These secret keys can be used to generate the watermark
sequence and/or determine the locations where the watermark is embedded.
• Blind vs. non-blind watermarking: in some applications, such as copyright pro-
tection and data monitoring, the use of original, unwatermarked data to recover
the embedded watermark is required. This is called non-blind or non-oblivious
watermarking and in this case, the watermark recovery is easier and more robust.
Furthermore, the availability of the original data in the recovery process allows
for the detection and the inversion of the distortions which change the data
geometry. However, access of the original data is not possible in most cases.
For example, copy tracking and indexing applications make the recovery process
more difficult. In fact, most recent applications do not require the original image
at the watermark recovery process. This kind of application is referred to as blind
or oblivious watermarking.
These requirements are conflicting and are also related to each other. For instance,
embedding high watermark sequence values leads to robust watermark, but this
introduces large modifications which in turn affect the perceptual quality of the host
data. In addition, the more information bits one wants to embed, the lower is the
watermark robustness.
In order to design a watermarking system that meets the desired requirements,
some criteria are usually used. For instance, to ensure imperceptibility caused by
the watermark embedding, the individual samples used for watermark embedding
can only be modified by an amount relatively small to their average amplitude. Also,
to ensure robustness while still allowing small changes, the watermark information
is usually redundantly distributed over many samples of the host data, thereby pro-
viding a holographic robustness. This means that the watermark can be recovered
from a small fraction of the watermarked data, but the recovery is more robust if
more watermarked data is available and used at the recovery stage.
Existing image watermarking algorithms operate either in the spatial domain
[6,7] or in a transform domain such as Discrete Cosine Transform (DCT) [8,9], Dis-
crete Wavelet Transform (DWT) [10,11], Discrete Fourier Transform (DFT) [12,13]
or the Fourier-Mellin Transform [14,15]. While, spatial domain methods are simple
and easy to deploy, embedding in a transform domain is more advantageous espe-
cially in terms of visibility and robustness. This is due to its energy compaction
property which suggests that the distortion introduced by the watermark into a num-
ber of transform coefficients will spread over all the pixels in the spatial domain so
as the changes introduced in these pixel values are visually less significant.
In the literature, the watermark embedding makes use of either an additive rule
or a multiplicative one. In the former, the watermark is simply added to the host
data, whereas, in the latter, the watermark is inserted with respect to the host data.
The commonly used additive rule is
yi = xi + λwi ; i = 1, . . . , N , (6.6)
where x = {x1 , . . . , x N } is a sequence of data from the original image, w =

{w1 , . . . , w N } is a sequence of watermark signals, λ is a gain factor used to control
the strength of the watermark, and y = {y1 , . . . , y N } is a sequence of watermarked
data. Using the same notation, the commonly used multiplicative embedding
rule is
yi = xi (1 + λwi ); i = 1, . . . , N . (6.7)
Due to its simplicity, the additive rule is widely adopted in the literature. How-
ever, multiplicative watermarking offers a data-dependent watermark casting and
6.3 State-of-the-Art 123
exploits the HVS characteristics in a better way. Nevertheless, most recent additive-
based watermarking methods use some perceptual masks obtained by using psycho-
visual models to take into account the HVS properties.
6.3 State-of-the-Art
There have been few published works on watermarking fingerprint images in the
literature. Pankanti and Yeung [16] proposed a fragile watermarking for fingerprint
image verification. The watermark, in the form of a spatial image, is embedded in
the spatial domain of a fingerprint image by employing a verification key. Before
embedding the watermark, the authors proposed to mix the watermark image to
increase its security level. The proposed method can localise any region of image
that has been tampered with; therefore, it can be used to check the integrity of fin-
gerprint images stored in a database. Experiments were conducted on database of
1,000 fingerprints (4 images each for 250 fingers) and the reported results indicate
that this technique does not lead to a significant performance loss in fingerprint
verification.
Sonia [17] proposed a method to detect any alterations or changes introduced to
an image while being transmitted. The method is based on a local average scheme
where corresponding block by block local average of the transmitted and received
images are compared. The author applied this method to fingerprint and face images.
Ratha et al. [1] proposed a data hiding method for fingerprint images compressed
with the WSQ (Wavelet Scalar Quantisation) compression scheme. The discrete
wavelet transform coefficients are changed during WSQ encoding, by taking into
consideration possible image degradations. This method operates on the quantised
indices to embed the watermark before the final step (Huffman encoder) is applied.
The proposed method here has the advantage of working in compressed domain
which suggests that the distortions introduced by the watermark into a number of
transform coefficients will spread over all the pixels in the spatial domain so as the
changes introduced in these pixel values are visually less significant.
Gunsel et al. [18] described two spatial domain watermarking methods for finger-
print images. The first method utilises gradient orientation analysis for watermark
embedding where the pixel values are changed in a way to keep the quantised gra-
dient orientations around these pixels unchanged. This method is applied before the
feature extraction process. The second method preserves the singular points in the
fingerprint image; hence, it preserves the classification of the watermarked finger-
print image.
Jain and Uludag [19] proposed two application scenarios for hiding biometric
data. The basic data hiding is based on amplitude modulation watermarking method
and it is the same for both applications. The first scenario involves steganography-
based application: the biometric data (fingerprint minutiae) that need to be trans-
mitted are hidden in a host image. However, in this scenario, the security of the
system is based on the secrecy of the communication. The second scenario is based
on hiding facial features, i.e. Eigen-face coefficients, into fingerprint images.
Ahmed and Moskowitz [20] proposed the use of a composite signature-based

watermarking technique for fingerprint authentication. Their technique extends the
PhasemarkTM watermarking technique [21], which was originally developed for
image authentication. In the embedding stage, a signature is first extracted from the
Fourier phase of the original image, and an encoded signature is then hidden back
into the original image to form the watermarked image. The detection process com-
putes the Fourier transform of the watermarked image, extracts the hidden signature
and then correlates it with a calculated signature.
Zebbiche et al. [22] proposed a multi-resolution wavelet-based method to hide
the minutiae data into fingerprint images. The proposed method can be used to trans-
mit the minutiae from point to point, such as from a database to the matcher or from
the feature extractor to the database. The watermark (i.e. minutiae data) is embed-
ded using Quantisation Index Modulation (QIM) method. The method is blind since
the original image is not required when extracting the embedded data. The authors
claim that their method is robust to common attacks such as mean filtering, additive
noise and compression.
Khan et al. [23] proposed a Non-uniform Discrete Fourier Transform (NDFT)-
based watermarking technique to hide fingerprint templates into audio signal. Fin-
gerprint templates are first encrypted by a chaotic encryption, encoded by BCH error
correcting code and modulated by chaotic parameter modulation (CPM) scheme.
Then, the resulting templates are hidden into chaotically selected frequency points
of the original audio signal. The authors claimed that encrypting and modulating
fingerprint templates, and selecting the frequency points chaotically using a secret
key ensure the robustness against steganalysis attack.
6.4 Optimum Watermark Detection

Watermark detection aims to decide whether a given watermark has been inserted or
not into the watermarked image. During the detection process, two types of errors
may occur. In the first case, it detects the presence of a watermark, even if none was
embedded (false alarm). In the second, the detector fails to detect the watermark,
although it is embedded (missed detection). The probabilities of these two types
of errors (i.e. the probability of false alarm (PF A ) and the probability of missed
detection (PM D )) are used to assess the performance of the detection system. It is
worth noting that the probability of true detection (PDet ) is often used instead of the
probability of missed detection and it is defined as PDet = 1− PM D . The probability
function for the detection process is presented in Fig. 6.4.
Early methods were based on computing the correlation between the given water-
mark w ∗ = {w1∗ , . . . , w ∗N } and image coefficients in which the actual watermark was
embedded y = {y1 , . . . , y N }, and then, comparing the calculated correlation with
some threshold T. During the detection process, the correlation value will be very
high for the embedded watermark and would be very low otherwise. However, the
6.4 Optimum Watermark Detection 125
Fig. 6.4 Probability function Threshold

for the detection process
No watermark Watermark
exist
False alarm
Missed detection
commonly used correlation-based detectors are only optimal in the additive case by
assuming that the host data follows a Gaussian model [24, 25].
Recently, the watermark detection is considered as a binary hypothesis test where
the problem is one of taking measurements and then estimating the results in which
of the finite number of states an underlying system resides. More precisely, the
system is the possibly watermarked image and the system observation variable is
the set y = {y1 , . . . , y N } of the possibly watermarked coefficients. Two hypotheses
can be defined: the given image is watermarked by the candidate watermark w∗ =
{w1∗ , . . . , w ∗N } (hypothesis H1 ) or the given image does not contain the candidate
watermark!(hypothesis H0 ). Consequently, the watermark space can be defined as
W = W0 W1 , where W1 = {w ∗ } and W 0 = {w j = w ∗ }, including w = 0 that
corresponds to the case where no watermark exists. The issue now is to define a test
of the simple hypothesis H1 versus the composite alternative H0 that is optimum
with respect to certain criterion.
The likelihood-ratio test, which is a statistical test for making a decision between
two hypotheses based on the value of the likelihood ratio, is adopted. The likelihood
ratio, usually denoted by Λ(y), is defined as
f y (y|W1 )
Λ(y) = (6.8)
f y (y|W0 )
where f y(y|W 1), f y (y|W 0) are the pdfs of the set y conditioned to W1 and
W0 , respectively. By relying on the fact that the coefficients in y are statistically
independent," the pdf of y conditioned to an event " N W1 and W0 can be written as
N
f y (y|W1 ) = i=1 f yi (yi |W1 ) and f y (y|W0 ) = i=1 f yi (yi |W0 ).
Assuming that the watermark components are uniformly distributed in [−1, +1],
W0 is composed by an infinite number of watermarks. Therefore, by using the total
probability theorem [26], the pdf f y (y|W0 ) can be written as:
+1
f yi (yi |W0 ) = f yi (yi |wi ) f wi (wi )dwi (6.9)
−1
where f wi = 1
2
is the pdf of wi . Hence, Eq. 6.8 becomes:
"N ∗
i=1 f yi (yi |wi )
Λ(y) = " N # +1 (6.10)
i=1 −1 f yi (yi |wi )dwi
1
2N
Using the multiplicative rule given by Eq. 6.7, the pdf f yi (yi |wi ) of a water-
marked coefficient yi conditioned to a watermark value wi is given by
1 y1
f yi (yi |wi ) = f xi ( ) (6.11)
1 + λwi 1 + λwi
where f x (x) indicates the PDF of the original unwatermarked coefficients.

Under the assumption that the gain factor is very small (i.e. λ << 1), which is
a reasonable assumption used to respect the transparency requirement of the water-
mark, f y (y|W0 ) can be approximated by f y (y|0), or simply f y (y|W0 ) ≈ f x (y). By
relying on this assumption and using Eq. 6.11, the likelihood-ratio can be written as
yi
$
N
1 f xi ( 1+λw ∗)
Λ(y) = i
(6.12)
i=1
1 + λwi∗ f yi (yi )
The decision rule reveals that the hypothesis H1 is accepted if and only if Λ(y)
exceeds certain threshold η. Further simplification can be made by taking the log-
likelihood ratio, which is defined as the natural logarithm of the likelihood ratio,
l(y) = ln(Λ(y)), so the decision rule becomes

N
yi 1
l(y) = [ln( f xi ( )) − ln( f xi (yi ))] ≷ H
H0 η (6.13)
i=1
1 + λwi∗
N ∗
where η = ln(η) + i=1 −ln(1 + λwi ) is the new threshold.
By employing the Neyman–Pearson criterion [27], the threshold η is obtained
in such a way that the probability of detection PDet is maximised, subject to a fixed
false alarm probability PF A [28]
PF A = P(l(y) > η |W0 )

+∞
(6.14)
= fl(y) (l(y))dl(y)
η
where fl(y) (l(y)) is the pdf of l(y) conditioned to W0 . The variable l(y) is a sum of
statistically independent terms. Therefore, by using the central limit theorem [26],
it can be modelled by a Gaussian distribution with mean μ0 = E[l(y)|W0 ] and
variance σi2 = V ar [l(y)|W0 ]. Finally, the PF A can be written as
6.5 Statistical Data Modelling and Application to Watermark Detection 127

+∞
1 l(y) − μ0 2
PF A = exp − dl(y)
η 2π σ02 2σ02
⎛ ⎞ (6.15)

1 η − μ0 ⎠
= er f c ⎝
2 2σ 2 0
where
# erfc(.) is the complementary error function, given by er f c(x) =
2 +∞ −t 2
π x
e dt. By fixing the value of PF A , the threshold η can be obtained using
the equation

η = er f c−1 (2PF A ) 2σ02 + μ0 (6.16)
6.5 Statistical Data Modelling and Application

to Watermark Detection
The most apparent structural characteristic of a fingerprint is a pattern of inter-

leaved ridges and valleys. In a fingerprint image, ridges are dark whereas valleys
are bright. Ridges and valleys often run in parallel; sometimes they terminate and
sometimes they bifurcate. Consequently, a fingerprint is rich in textures and detail
information. It has been found that edges and textures are usually well confined
to the DWT coefficients of the high-frequency subbands. Furthermore, watermark-
ing in the DWT is also very robust to wavelet-based compression methods such as
WSQ system which is the standard adopted by the FBI and many other investiga-
tion agencies. It is well known that a DWT at level l produces: (i) a low resolution
subband (LL), (ii) high-resolution horizontal subbands (H L l , H L l−1 , . . . , H L 1 ),
(iii) high-resolution vertical subbands (L Hl , L Hl−1 , . . . , L H1 ) and (iv) high-
resolution diagonal subbands (H Hl , H Hl−1 , . . . , H H1 ). It is worth noting that for
the reason of imperceptibility, a watermark should only be embedded in high-
resolution subbands where the human eye is less sensitive to noise and distortions
[29, 30].
As shown in the previous section, the structure of the detector will lead to reliable
results if the data (i.e. the DWT coefficients) are modelled as accurately as possible.
Therefore, the optimality of the optimum detector depends on the distribution used
to model the statistical behaviour of the host data. In this section, an investigation of
the best statistical model that best characterise the DWT coefficients of fingerprint
images is carried out on the well-known and used distributions: Laplacian, gener-
alised Gaussian and Cauchy distributions. The Gaussian model is not used in this
work because it has been shown that such a distribution is a poor model for the DWT
coefficients.
6.5.1 Laplacian and Generalised Gaussian Models

A generalised Gaussian distribution (GGD) is a flexible parametric distribution fam-
ily that incorporates various distribution shapes such as uniform, Gaussian, Lapla-
cian and even more highly peaked distributions with exponentially decaying heavy
tails. It was widely used in the literature to model the statistical behaviour of the
DWT coefficients [31, 10]. The pdf of a zero-mean GGD is given by

β |xi | β
f X (xi ; α, β) = exp − (6.17)
2αΓ(1/β) α
#∞
where Γ(.) is the Gamma function, Γ(z) = 0 e−t t z−1 dt, z > 0. The parameter α is
referred to as the scale parameter and it models the width of the PDF peak (standard
deviation) and β is called the shape parameter and it is inversely proportional to the
decreasing rate of the peak (see Fig. 6.5). Note that β = 1 and β = 2 yield Laplacian
and Gaussian distributions, respectively. The value β = 0.5 is widely used in the
literature; however, an accurate estimate of the parameters α and β can be found as
described in [31]. By replacing the pdf of the GGD in Eq. 6.13, the detector can be
defined by
N
|yi | βi
l(y) = 1 − |1 + λwi∗ |−βi (6.18)
i=1
αi
The threshold η can be obtained using Eq. 6.16 where
N
1
μ0 = [1 − |1 + λwi∗ |−βi ] (6.19)
i=1
βi
Fig. 6.5 pdf of the

generalised Gaussian
distribution
6.5 Statistical Data Modelling and Application to Watermark Detection 129
and
N
1
σ02 = [1 − |1 + λwi∗ |−βi ]2 . (6.20)
i=1
βi
Laplacian model is simpler than the generalised Gaussian one since the latter
requires the use of interpolation methods to estimate the shape parameter. It has
been used to model the DWT coefficients in [32,33]. In this chapter, the Laplacian
pdf is obtained by letting β = 1 in Eq. 6.17. Also, the Laplacian detector can be
obtained by substituting β by 1 in equations Eq. 6.18, 6.19, and 6.20.
6.5.2 Alpha Stable Model

The symmetric alpha-stable (SαS) distribution family has recently gained con-
siderable interest due to both empirical and theoretical reasons and also due to
their capability of modelling heavy-tailed data in various applications. It has been
used in [34] to model the DWT coefficients, especially to take into account the
heavy-tailed coefficients. The SαS can be best determined by its characteristic
function
ϕ(ω) = exp( jδω − γ |ω|α ) (6.21)
where α (0 < α ≤ 2) is the characteristic exponent, δ (−∞ < δ < ∞) corresponds

to the location parameter, and γ (γ > 0) represents the scale parameter, known
also as the dispersion. For values of α in the interval [1,2], the location parame-
ter δ corresponds to the mean of the SαS distribution, while for 0 < α < 1, it
determines its median. The dispersion parameter γ determines the spread of the dis-
tribution around the location parameter δ, similarly to the variance of the Gaussian
distribution.
The characteristic exponent α is the most important parameter of the SαS distri-
bution since it determines its shape. The smaller the α is, the heavier the tails of the
SαS density and the corresponding random process displays high impulsiveness,
while higher values of α correspond to distributions that approach the Gaussian dis-
tribution. Actually, no closed-form expressions for pdf of SαS random variables are
known except for α = 2 and α = 1, which correspond to the Gaussian and Cauchy
distributions, respectively. The Cauchy pdf is given by
1 γ
f X (x; γ , δ) = (6.22)
π γ 2 + (x − δ)2
where δ(−∞ < δ < ∞) corresponds to the location parameter, and (ξ > 0) repre-
sents the scale parameter, also known as the dispersion parameter. The peak shape
of a Cauchy distribution is controlled by ξ . The smaller the value of, the narrower
is the peak shape and vice versa (see Fig. 6.6.)
Fig. 6.6 Pdf of Cauchy

distribution
The two parameters γ and δ can be estimated from the data set using the con-
sistent ML method described by Nolan [35], which gives reliable estimates and
provides the most tight confidence intervals. The use of the Cauchy distribution in
Eq. 6.13 leads to the following watermark detector:
2

N
yi
l(y) = ln(γ + (yi − δ) ) − ln γ +
2 2 2
−δ (6.23)
i=1
1 + λwi∗
For the sake of simplicity, the mean μ0 and the variance σ02 are estimated numer-
ically by evaluating l(y) for n fake sequences {w j : w j ∈ [−1, +1]; 1 ≤ j ≤ n}, so
that the estimated mean and variance of l(y) are given by
1
n
μ0 = lj (6.24)
n j=1
1
n
σ02 = (l j − μ0 )2 (6.25)
n − 1 j=1
where l j represents the log-likelihood ratio corresponding to the sequence w j .

Through experiments, we found that a good estimation of μl0 and σl02 , with rea-
sonable computational complexity, can be obtained by letting n = 100.
6.6 Experimental Results
In this section, modelling the DWT coefficients and the performance of the
detectors discussed earlier are evaluated. A wide range of real fingerprint
6.6 Experimental Results 131
(a) (b)
(c) (d)
Fig. 6.7 Test images with different visual quality: (a) “Image 22 1: good quality with normal
ridges area”, (b) “Image 83 1: good quality with large ridges area”, (c) “Image 43 8: small ridges
area (latent fingerprint)” and (d) “Image 68 7: poor quality”
images from Fingerprint Verification Competition “FVC 2000, DB3” database

http://biometrics.cse.msu.deu/fvc00db/index.html and FVC 2004 are examined
[36]. Without loss of generality, the results on sample images shown in Fig. 6.7
are plotted since the results obtained with other images are very similar. These test
images are chosen in order to take into account the different visual qualities, rang-
ing from high quality to low quality, thus, allowing the modelling results to be more
general and reliable. In these experiments, a three-level DWT using Daubechies’
linear-phase 9/7 wavelet is used because it has been adopted as part of the WSQ
compression standard.
6.6.1 Experimental Modelling of DWT Coefficients

A set of experiments have been carried out to investigate the best distribution that
can model the statistical behaviour of the DWT coefficients of fingerprint images.
To do so, two different sets of tests were carried out. In the first test, the similarity
between the real distribution and the of DWT coefficients against those obtained
by the GGD, Laplacian and Cauchy distributions is evaluated by using the relative
entropy or Kullback–Leibler (K–L) divergence, while in the second set of tests, the
Quantile–Quantile (Q–Q) plots are examined.
The relative entropy is a measure of the difference between two probability dis-
tributions: from the real distribution p to an arbitrary probability distribution q. The
smaller the K–L divergence is, the more similar are the distributions and vice versa.
The K–L divergence is given by

pi
D K −L ( p||q) = Σi pi ln (6.26)
qi
The results obtained are reported in Table 6.1 and clearly show that the GGD pro-
vides the smallest K–L divergence for all images. On the other hand, this divergence
is larger when using a Cauchy model.
A Q–Q plot is a graphical technique for determining if two data sets are gen-
erated from populations having a common distribution. It is a plot of the quantiles
of the first data set against the quantiles of the second data set. If the two data sets
are taken from two populations with the same distribution, the points should fall
approximately along a reference line. The greater the departure from this reference
line, the greater the evidence that the two data sets have been generated from popu-
lations with different distributions. In our experiments, for a given fingerprint image
we first estimate the parameters for each model from the DWT coefficients and then
generate a large number of random samples drawn from the corresponding model
having the estimated parameters. The quantiles of the real DWT coefficients against
the quantiles of the random generated samples are plotted.
For the Q–Q plot corresponding the GGD, most of the ‘+’ marks have a straight
line shape for Image 22 1 (Fig. 6.8a) and Image 86 7 (Fig. 6.9b), deviating slightly
from the reference line for Image 83 1 (Fig. 6.8b) but with more significant devia-
tion for Image 43 8 (Fig. 6.9a). For the Laplacian model, most of the ‘+’ marks of
Table 6.1 K–L divergence of the high-resolution DWT subbands obtained using Daubechies 9/7
wavelet at the 3rd level. LH : horizontal subband; LH: vertical subband; HH: diagonal subband.
Image 22 1 Image 83 1 Image 43 8 Image 86 7
GGD 0.0582 0.0661 0.1530 0.0376

Laplacian 0.1267 0.1808 0.7098 0.1224
Cauchy 0.1741 0.2332 0.0893 0.1542
GGD Q−Q Plot for Image 22__1 GGD Q−Q Plot for Image 83__1
−400 −200 0 200 400 600 −600 −400 −200 0 200 400 600 800
DWT Coefficients Quantiles DWT Coefficients Quantiles
(a) (b)
Laplacian Q−Q Plot for Image 22__1 Laplacian Q−Q Plot for Image 83__1
−400 −200 0 200 400 600 −600 −400 −200 0 200 400 600 800
(c) (d)
Cauchy Q−Q Plot for Image 22__1 Cauchy Q−Q Plot for Image 83__1
−400 −200 0 200 400 600 −600 −400 −200 0 200 400 600 800
(e) (f)
Fig. 6.8 Q–Q plots of DWT coefficients of sample images (Left: “Image 22 1”, Right: “Image
83 1”) for different models (Top: GGD; Middle: Laplacian; Bottom: Cauchy.)
GGD Q−Q Plot for Image 43__8 GGD Q−Q Plot for Image 86__7
−600 −400 −200 0 200 400 600 800 −300 −200 −100 0 100 200 300 400
(a) (b)
Laplacian Q−Q Plot for Image 43__8 Laplacian Q−Q Plot for Image 86__7
−600 −400 −200 0 200 400 600 800 −300 −200 −100 0 100 200 300 400
(c) (d)
Cauchy Q−Q Plot for Image 43__8 Cauchy Q−Q Plot for Image 86__7
−3000 −2000 −1000 0 1000 2000 3000 4000 −600 −400 −200 0 200 400 600 800
DWT Coefficients Quantiles DWT Coefficinets Quantiles
(e) (f)
Fig. 6.9 Q–Q plots of DWT coefficients of sample images (Left: “Image43 1”, Right: “Image
86 7”) for different models (Top: GGD; Middle: Laplacian; Bottom: Cauchy.)
the Q–Q plot follow a straight line but with a significant deviation from the reference
line for Image 22 1 (Fig. 6.8c), Image 83 1 (Fig. 6.8d) and Image 86 7 (Fig. 6.9d).
However, the marks follow a curve shape for Image 43 8 (Fig. 6.9c). For the Cauchy
model, the ‘+’ marks in the plot have also curve like shape which does not follow
a straight line for all test images (Figs. 8e,f and 9e,f). In conclusion, the Q–Q plots
for all fingerprint images show that the GGD provide the best fit for the DWT coef-
ficients.
The results obtained for modelling the DWT coefficients reveal that the detector
based on the GGD is expected to yield better watermark detection performances
than those based on Laplacian and Cauchy models. Moreover, the Laplacian is
expected to provide good/acceptable detection results. It is worth noting that none of
the three used distributions model accurately the coefficients distribution of Image
43 8. The reason is that in this image, the region of interest (or the ridges’ area) is
somewhat small when compared to the overall size of the image (i.e. most of it is
composed by smooth area or background).
6.6.2 Experimental Watermarking Results
In these experiments, the watermarks are cast in all coefficients of the high-
resolution horizontal (H L), vertical (L H ) and diagonal (H H ) subbands at level 3.
Two main issues are considered here. First, the imperceptibility of the watermark
is quantitatively evaluated by using the Peak Signal-to-Noise Ratio (PSNR). Sec-
ond, the detection performance is assessed by the probability of false alarm and the
probability of true detection. It is worth mentioning that the watermark consists of
12,090 (4,030 coefficients/subband) random real numbers uniformly distributed in
the range [−1, +1].
6.6.2.1 Imperceptibility Analysis

First, the dispersion of the watermark in the spatial domain is assessed. Figure 6.10
shows the difference image between the original image and its corresponding water-
marked image. As can be observed from the difference image, the watermark is
concentrated in the ridges area. This is justified by the fact that the DWT provides
an image-dependent distortion which is mostly concentrated at edges and textured
areas.
The second set of experiments were conducted to evaluate the fidelity which
is a measure of the similarity between the watermarked data and the original one.
This is done by using the PSNR, which is the widely used distortions measure in
the literature. Figure 6.11 shows the PSNR of test images with different watermark
strength λ. As can be seen, bad quality images and images with less textures (i.e.
small ridges area) provide higher values of PSNR. This is justified by the fact that
the DWT coefficient of such images are smaller. Indeed, since the watermark cast-
ing depends on the coefficient amplitude in the multiplicative case, the watermark
magnitude added to the image will be smaller. However, this fact does not reflect
(a) (b)
(c) (d)
Fig. 6.10 Difference image between the original image and its corresponding watermarked one:
(a) “Image 22 1”, (b) “Image 83 1”, (c) “Image 43 8” and (d) “Image 68 7”
the true fidelity because it is well known that the human visual system is less sen-
sitive to changes in such regions compared to smooth and non textured areas. For
instance, the PSNR, as a perceptual model, suggests that the watermarked image
of Image 43 8 should be perceptually better than the watermarked image of Image
83 1; however, the watermarked image 43 1 shows more visible distortions when
compared to the watermarked Image 83 1.
6.6.2.2 Detection Performance

In order to evaluate the performance of the detectors, the test images were water-
marked using λ = 0.10. The Receiver Operating Characteristics (ROC) curves,
which are widely adopted in the literature, were used to assess the performance
Image 22__1
Image 83__1
Image 43__8
Image 86__7
0.10 0.15 0.20 0.25 0.30 0.35 0.40 0.45

Watermark Strength
Fig. 6.11 PSNR of watermerked images
of the detectors. The ROC curves represent the variation of the probability of true
detection (PDet ) against the probability of false alarm (PF A ). A perfect detection
yields a point at coordinate (0,1) of the ROC space meaning that all given water-
marks were detected without any false alarm. The theoretical false alarm is set to
the range 10−4 to 10−1 . The experimental ROC curves are computed by measuring
the performance of the actual watermark detection system by calculating the proba-
bility of detection from real watermarked images. Experiments are then conducted
by comparing the likelihood ratio with the corresponding threshold for each value
of the false alarm probability and for 1000 randomly generated watermarks. If the
likelihood ratio is above the threshold under H1 , the watermark is detected and if it
is above the threshold under H0 , a false alarm occurs.
A blind detection is used so that the parameters of each detector are directly
estimated from the DWT coefficients of the watermarked image. It is worth noting
that the optimal parameter values for both the GGD and the Cauchy distribution
may be different for each DWT coefficient, but for practical purposes a constant
value over all coefficients suffices. The results on the sample images are plotted
in Fig. 6.12.
As can be seen and as it was expected for all images, the performance of the
GGD detector is significantly better than that obtained for the Laplacian and Cauchy
detectors. In addition, the Laplacian detector provides close results to the GGD one
for images (Image 22 1, 83 1 and 86 7). It is worth noting that all detectors generate
very high false alarm for Image 43 8. In general, images with large and well-defined
ridge area provide good detection performance because such images have higher
GGD GGD
Laplacian model Laplacian model
Cauchy model Cauchy model
10−2 10−1 100 10−2 10−1 100

Probability of False Alarm Probability of False Alarm
(a) (b)
GGD GGD
Laplacian model Laplacian model
Cauchy model Cauchy model
10−1 100 10−2 10−1 100

Probability of False Alarm Probability of False Alarm
(c) (d)
Fig. 6.12 Difference image between the original image and its corresponding watermarked one:
(a) “Image 22 1”, (b) “Image 83 1”, (c) “Image 43 8” and (d) “Image 68 7”
DWT coefficients allowing the embedding of watermarks with higher amplitudes.

Indeed, higher values for watermark amplitude make the hypotheses H0 and H1
more distinguishable.
6.7 Conclusions
This chapter addresses the 1-bit multiplicative watermark detection for fingerprint
images. Watermarking technique can be a solution for securing fingerprint data
and thwart against some attacks that may affect the reliability and the secrecy of
fingerprint-based systems. Watermarking system can be divided in two main pro-
cesses: embedding and detection. In this chapter, we have focussed on the water-
mark detection which aims to detect whether a given watermark was embedded into
the host data. The problem of detection is formulated theoretically based on a ML
References 139
estimation scheme requiring an accurate statistical modelling of the host data. This
theoretical formulation allows for the derivation of optimal detector structures; this
optimality of the detector structure depends on the accuracy of the statistical distri-
bution used to model the statistics of the host data.
The watermark is embedded into the DWT domain because the ridges and tex-
tures are usually well confined to the DWT coefficients of the high-frequency sub-
bands. In addition, watermarking in the DWT is very robust to compression methods
such as WSQ compression which is the standard adopted by the FBI and many other
investigation agencies.
First, the modelling of the DWT coefficients is carried out to determine the best
model. The generalised Gaussian, Laplacian and Cauchy models were investigated
and compared and the experimental results reveal that the GGD distribution provides
the best model that can represent the distribution of the DWT coefficients.
Then, the structures of the optimum detector of the three models were derived
and the performance of the detectors were assessed through extensive experiments.
It has been found that the detector based on the GGD outperforms the Laplacian-
based detector, which in turn, significantly outperforms the Cauchy detector. The
overall performance of the detectors is dependent on the fingerprint characteristics.
This dependence is related to the size of the ridge area relative to the size of the
fingerprint image. The bigger the ridge area, the higher the detection performance.
References
1. N. K. Ratha, H. H. Connell and R. M. Bolle, “An analysis of minutiae matching strength,” In
The 3rd International Conference on Audio-and Video-Based Biometric Person Authentica-
tion (AVBPA2001), vol. 2091, pp. 223–228, 2001.
2. B. Schneier, “The uses and abuses of biometrics,” Communications of the ACM The
3rd International Conference on Audio-and Video-Based Biometric Person Authentication
(AVBPA2001), vol. 42, no. 8, pp. 136, August 1999.
3. D. Maltoni, D. Maio, A. K. Jain and S. Prabhakar, “Handbook of Fingerprint Recognition,”
Springer, New York, 2003.
4. N. Hartung and M. Kutter, “Multimedia watermarking techniques,” Proceeding of IEEE, vol.
42, no. 8, pp.1079–1107, 1999.
5. M. D. Swanson, M. Kobayashi and A. H. Tewk, “Multimedia data-embedding and watermark-
ing technologies,” Proceeding of IEEE, vol. 86, pp. 1064–1087, 1998.
6. M. Yoshida, T. Fujita and T. Fujiwara, “A new optimum detection scheme for additive water-
marks embedded in spatial domain,” International Conference on Intelligent Information Hid-
ing and Multimedia Signal Processing (IIH-MSP 2006), pp. 101–104, December 2006.
7. I. G. Karybali and K. Berberidis, “Efficient spatial image watermarking via new perceptual
masking and blind detection schemes,” IEEE Transaction Information Forensics and Security,
vol. 1, no. 2, pp. 256–274, June 2006.
8. J. R. Hernandez, M. Amado and F. Perez-Gonzales, “Dct-domain watermarking techniques
for still images: Detector performance analysis and a new structure,” IEEE Transactions on
Image Processing, vol. 9, no. 1, pp. 55–68, January 2000.
9. A. Briassouli, P. Tsakalids and A. Stouraitis, “Hidden messages in heavy-tails: Dct-domain
watermark detection using alpha-stable models,” IEEE Transactions on Image Processing,
vol. 7, no. 4, pp. 700–715, August 2005.
10. T. M. Ng and H. K. Garg, “Wavelet domain watermarking using maximum-likelihood detec-

tion,” Proceeding of SPIE Security, Steganography, and Watermarking of Multimedia Con-
tents, vol. 5306, pp. 816–826, June 2004.
11. F. Kheli, A. Bouridane, F. Kurugollu and I. Thompson, “An improved wavelet-based image
watermarking technique,” Proceeding of IEEE International Conference on Advanced Video
and Signal Based Surveillance (AVSS2005), pp. 588–592, August 2005.
12. M. Barni, F. Bartolini, A. De Rosa and A. Piva, “A new decoder for the optimum recovery of
nonadditive watermarks,” IEEE Transaction on Image Processing, vol. 10, no. 5, pp. 755–765,
May 2001.
13. Q. Cheng and T. S. Huang, “Optimum detection and decoding of multiplicative watermarks in
dft domain,” Proceeding of IEEE International Conference on Acoustic, Speech, and Signal
Processing (ICASSP2002), pp. IV–3477–IV–3480, May 2002.
14. J. J. K. Ruanaidh and T. Q. Pun, “Rotation, scale and translation invariant spread spectrum
digital image watermarking,” Signal Processing, vol. 66, no. 3, pp. 303–318, 1998.
15. C. Y. Lin, M. Wu, J. A. Bloom, I. J. Cox, M. Miller and Y. M. Lui, “Rotation, scale, and trans-
lation resilient public watermarking for images,” IEEE Transactions on Image Processing,
vol. 10, no. 5, pp. 767–782, May 2001.
16. S. Pankanti and M. M. Yeung, “Verication watermarks on fingerprint recognition and
retrieval,” Proceeding SPIE, Security and Watermarking of Multimedia Contents, vol. 3657,
pp. 66–78, 1999.
17. S. Jain, “Digital watermarking techniques: A case study in fingerprints and faces,” Proceeding
of Indian Conference on Computer Vision, Graphics and Image Processing, pp. 139–144,
2000.
18. B. Gunsel, U. Umut and A. M. Tekalp, “Robust watermarking of fingerprint images,” Pattern
Recognition, vol. 35, no. 12, pp. 2739–2747, 2002.
19. A. K. Jain and U. Uludag, “Hiding biometric data,” IEEE Transactions on Pattern Analysis
and Machine Intelligence, vol. 25, no. 11, pp. 1494–1498, November 2003.
20. F. Ahmed and I. S. Moscowit, “Composite signature based watermarking for fingerprint
authentication,” Proceeding of 7th Workshop Multimedia and Security, pp. 137–142, 2005.
21. F. Ahmed and I. S. Moscowit, “A correlation-based watermarking method for image authen-
tication application,” Optical Engineering Journal, vol. 43, no. 8, pp. 1833–1838, 2004.
22. K. Zebbiche, L. Ghouti, F. Kheli and A. Bouridane, “Protecting fingerprint data using water-
marking,” Proceeding of the 1st AHS conference, pp. 451–456, June 2006.
23. M. K. Khan, L. Xie and J. Zhang, “Robust hiding of fingerprint-biometric data into audio
signals,” Proceeding of the 2nd International Conference on Biometrics (ICB2007), vol.
4642/2007, pp. 702–712, August 2007.
24. G. F. Elmasri and Y. Q. Shi, “Maximum likelihood sequence decoding of digital image
watermarks,” Proceeding of SPIE Security and Watermarking of Multimedia Contents
pp. 425–436, 1999.
25. Q. Cheng and T. S. Huang, “An additive approach to transform-domain information hiding and
optimum detection structure,” IEEE Transaction on Multimedia, vol. 3, no. 3, pp. 273–284,
September 2001.
26. A. Papoulis, “Probability, Random Variables, and Stochastic Processes,” McGraw-Hill, New
York, 1991.
27. J. V. Di Franco and W. L. Rubin, “Radar Detection,” SciTech Publishing, Raleigh, January
2004.
28. T. Ferguson, “Mathematical Statistics: A Decision Theoretical Approach,” Academic Press,
New York, 1967.
29. X. G. Xia, C. G. Bonklet and G. R. Acre, “Wavelet transform based watermark for digital
images,” Optics Express, vol. 3, no. 12, pp. 497–511, December 1998.
30. G. C. Langelaar, I. Styawan and R. L. Lagendijk, “Watermark digital image video data:
A state-of-art overview,” IEEE Signal Processing Magazine, vol. 17, no. 5, pp. 20–46,
September 2000.
References 141
31. M. N. Do and M. Vetterli, “Wavelet-based texture retrieval using generalized Gaussian density
and Kullback-Leibler distance,” IEEE Transaction on image processing, vol. 11, no. 2, pp.
146–158, February 2002.
32. Y. Hu, S. Kwong and Y. K. Cha, “The design and application of dwtdomain
optimum decoders,” In First International Workshop, IWDW2003, vol. 2613/2003,
pp. 25–28, 2003.
33. T. M. Ng and H. K. Garg, “Maximum likelihood detection in dwt domain image watermarking
using Laplacian modeling,” IEEE Signal Processing Letters, vol. 12, no. 4, pp. 285–288, April
2005.
34. G. Tzagkarakis and P. Tsakalides, “A statistical approach to texture image retrieval via alpha-
stable modeling of wavelet decomposition,” In 5th International Workshop on Image Analysis
for Multimedia Interactive Services (WIAMIS), pp. 21–23, 2004.
35. J. P. Nolan, “Maximum Likelihood Estimation and Diagnostics for Stable Distributions,”
Technical report, American University, Washington June 1999.
36. R. Cappelli, D. Maio, D. Maltoni, J. L. Wayman and A. K. Jain, “Performance Evaluation
of Fingerprint Verification Systems,” IEEE Transactions on Pattern Analysis Machine Intelli-
gence, vol. 28, no. 1, pp. 3–18, January 2006.
Chapter 7
Shoemark Recognition for Forensic Science:
An Emerging Technology
SoleMate, Foster & Freeman’s database of shoeprints, proved

its worth recently during the investigation of a murder of a
woman in the kitchen of her Coventry home in the U.K. Officers
from West Midlands police station was confirmed by this
database that the suspect’s shoeprint had been produced by a
Mountain RidgeTM shoe, unique to JJB Sports, a UK nationwide
shoe retailer, and that there were two models using that
particular sole, and this footwear was first available during the
summer of 2002. With this information, police officers were able
to locate the store in which the footwear was purchased and to
trace the shop’s copy of the actual receipt issued to the suspect,
later confirmed when an empty shoe box was found at the
suspect s home.
Foster and Freeman Ltd.
It is generally understood that marks left by an offender’s footwear at a crime scene

may be helpful in the subsequent investigation of the crime. For example, shoeprints
can be useful to likely link crime scenes that have been committed by the same per-
son. They can also be used to target the most prolific offenders by allowing officers
to be alerted to watch out for particular shoemarks. Also, shoeprints can provide use-
ful intelligence during interviews so that other crimes can be brought to an offender.
Finally, they can provide evidence of a crime if a shoeprint and suspect’s shoe match.
This chapter is intended to introduce the emerging shoemark evidence for foren-
sic use. It starts by giving a detailed background of the contribution of shoemark
data to scene of crime officers including a discussion of the methods currently in
use to collect shoeprint data. Methods for the collection of shoemarks will also be
detailed and problems associated with each method highlighted. Finally, the chapter
gives a detailed review of existing shoemark classification systems.
7.1 Background to the Problem of Shoemark Forensic Evidence
A shoemark is a mark made when the sole of a shoe comes into contact with a sur-
face. People committing crimes inevitably leave their shoemarks at the crime scene.
Technology, DOI 10.1007/978-0-387-09532-5 7,

144 7 Shoemark Recognition for Forensic Science: An Emerging Technology
Each time a person takes a step, there is no doubt that some sort of interaction
between his (her) shoes and the surface occurs. This could be a deformation of
that surface or the exchange of trace materials and the residue from the shoe to the
surface. In the case where the surface is deformable e.g. snow and sand, a three-
dimensional impression is created as a result of the pressure exerted on that surface.
When the surface is solid a visible pattern may still be transferred to the surface
from a sole. This is as a result of an exchange of materials between the shoe and the
surface. It should be noted that not all of the shoemarks are visible or detectable with
the limit of the current technologies, but the chances are excellent that a great num-
ber of them will be. The author [6] claimed that there should be equal and perhaps
even greater chance that footwear impressions could be present at a crime scene,
compared with the presence of latent fingerprints. So far, the later has been widely
accepted as a powerful tool in forensic applications, while the footwear impressions
also are being realised as a potential assist in forensic investigations. A study in [2]
suggests that footwear impressions could be located and retrieved at approximately
35% of all crime scenes. Statistics from Home Office of United Kingdom show that
14.8% of crime scenes attended by crime scene investigators in 2004–2005 yielded
shoe print evidence. The crimes investigated consisted primarily of burglaries. It
is also reported that by emphasising the potential evidence of shoemarks to crime
scene personnel and by teaching the basics of locating and recovering footwear
impressions, the percentage of cases in which footwear impression evidence was
now being submitted to the laboratory had increased from less than 5% to approx-
imately 60% [6]. Figure 7.1 shows some examples of shoemarks recovered from
different crime scenes.
7.1.1 Applications of a Shoemark in Forensic Science

As a form of physical evidence, shoemarks provide an important link between the
criminals and the place where the crime occurred. In some cases, shoemarks can be
positively identified as having been made by a specific shoe to the exclusion of all
other shoes, which very often relies on a physical match of random individual char-
acteristics the shoe has acquired with those respective features in the impressions
(Fig. 7.2). Here, the physical match means that the questioned impression and the
known shoe share one or more confirmed random characteristics, such as accidental
cuts, tears, or other random features of varied size and shape. This match could not
be repeated on another outsole in the opinion of a qualified shoemark expert.
In other cases, the detail revealed in a shoemark may not be sufficient to pos-
itively identify a specific shoe, but the information may still be very significant,
and can reveal the type, make, description, and approximate or precise size of the
footwear that made them, so may still be very useful for forensic investigations. The
following is the list of some of the forensic applications of shoemarks:
7.1 Background to the Problem of Shoemark Forensic Evidence 145
Fig. 7.1 Examples of shoeprint images retrieved from scene of crimes (Foster & Freeman Ltd.)
Fig. 7.2 Examples of

random characteristics of a
shoemark
• Assisting in the process of developing a suspect. There is no doubt that, in many

cases, a positive identification can be applied as a sound evidence in the court of
justice. Furthermore, some less positive identification can also be used to verify
or rebut the information provided by witnesses or suspects. It can also provide
investigators with a clue of who may be considered as a suspect.
• Assisting in the reconstruction of a crime. The locations, characteristics, and the
orientations of a shoemark can often help in determining the number of suspects,
their path into, through, and away from the crime scene.
• Assisting in building the link of a number of crimes, especially for burglaries.
It is known that the majority of crimes are committed by repeat offenders and
that it is common for burglars to commit a number of offences in the same day
[11]. As it would be unusual for an offender to discard their footwear between
committing different crimes, timely identification and matching of shoemarks
allows various crime scenes to be linked. The linking of crime scenes provides
more information about the activities of an offender and therefore increases the
chances of identifying the responsible party, and secondly, it is more efficient to
investigate the crimes of an offender as a whole than individually [14].
7.1.2 The Need for Automating Shoemark Classification

In this section, justification is provided for the research works described in the fol-
lowing chapters. The section begins by describing some of the problems that occur
with existing systems. These problems result in shoemarks being poorly classified
and/or misidentified.
Many systems already exist for the purpose of classifying and identifying shoe-
marks. Some are computerised and some are manual, but no system, commercial or
otherwise, performs the identification process automatically.
7.1 Background to the Problem of Shoemark Forensic Evidence 147
Currently only about 14% of shoemarks collected from a crime scene are actu-
ally examined and a much smaller number of these actually identified. The best
existing systems only recognise about 25% of recovered shoemarks which is only
about 3.5% of the total available shoemarks at crime scenes [10]. The statistic pro-
vided relates to shoemarks collected and examined in Holland. Informal conver-
sation with British forensic scientists suggests that figures in the UK are roughly
the same.
There are several reasons why more impressions are not identified. They fall
mainly into one of three categories: inconsistent classification, importable classifi-
cation schema and insufficient recognition time.
7.1.3 Inconsistent Classification
Inconsistent classification primarily occurs because of the large number of shoemark

descriptors required to uniquely classify a shoemark. Shoemark descriptors are the
names used to describe the particular patterns seen in a shoemark image. This type
of classification is called feature-based classification and its success relies on finding
and identifying the key features of the shoemark pattern. Until the late 1970s shoe
soles did not contain much variation. Often several different manufacturers used the
same sole for a number of different models of shoes and the complexity of the sole
pattern was low. Different size soles were simply cut from the same sheet of rubber
rather than using scaled patterns. This meant that when classifying a shoemark it
was often impossible to distinguish between several different shoemarks, although
it did mean that the investigator responsible for classifying and cataloguing the mark
had a comparatively easy task. When subsequently searching for a shoemark, it was
likely that a large number of possible matches would be found. This required the
forensic scientist to compare accidental characteristics of each of them to achieve a
final match. Accidental characteristics are defects in the shoemark caused either by
the manufacturing process or shoe sole wear that occurs over time. When trying to
identify a particular suspect’s shoemark, rather than just a brand or model, accidental
characteristics must be examined. However, this process is very time consuming
and effective classification of shoemarks found at crime scenes will help reduce the
number of shoemarks that will have to be compared in this way.
With the advent of injection moulding and modern manufacturing techniques
the number of different shoe soles on the market rose dramatically. To be able to
distinguish between the different soles more and more descriptors are needed to be
added to the existing classification schema to prevent two different soles generating
the same classification. Whereas this theoretically increased the resolving power of
the classification scheme, it introduced a new source of error. When the number
of features was small an individual could consistently classify a shoemark. As the
number of features increased, the possibility of inconsistently describing the feature
also increased. For example, if the only descriptors available were straight and wavy
line it would be easier to consistently classify a sole with wavy lines patterns than if
the descriptors also included zigzag line and curvy line (Fig. 7.3).
Fig. 7.3 Example of features

and descriptors
While shoemark classification remains a human subjective process it is unlikely

to be consistent. When a single operator is responsible for classifying all shoemarks
that are submitted to a laboratory this problem is minimised as the operator learns to
classify consistently. However, in practice, a single laboratory often has to deal with
too many impressions for it to be practical for a single operator to classify them all.
7.1.4 Importable Classification Schema

In the past most forensic laboratories had paper catalogues containing stock images
of manufacturers’ shoe soles. These catalogues were often organised in a propri-
etary way making it difficult for different laboratories to share information. It also
required a skilled operator who knew the system making it difficult to operate if that
person is absent.
Later, as laboratories began to migrate their existing paper systems to computers,
experts acknowledged that there was a need to create a standardised classification
scheme. They saw this migration as an ideal time to implement it. As such, many
new schemes were proposed but universal agreement on a standard was not agreed.
Many of the new schemes were implemented and used in various forensic laborato-
ries around the world. A major problem now exists because the experts missed the
opportunity to create a single all encompassing standard. Most of the major police
7.2 Collection of Shoemarks at Crime Scenes 149
forces in Europe use their own proprietary classification scheme and in some coun-
tries they use more than one. In an age where there is very little restriction on where
people can travel, it has become ever more desirable that laboratories in different
locations can quickly and efficiently share data. For example if a crime is commit-
ted in Holland and the suspect crosses the border into Belgian, the Dutch police
force has no way of sharing their shoemark intelligence with the Belgian police
force other than to share the original shoemark and allow the Belgium police to
reclassify it. The problem of proprietary classification schemes has not been solved
but merely exacerbated. In addition where different laboratories use the same classi-
fication schema they are still unlikely to produce the same classification when given
identical shoemarks to classify [12].
7.1.5 Shoemark Processing Time Restrictions
In areas with a high crime rate the number of shoemarks produced is often too large
for the investigating team to practicably examine all of them. This is true for both the
number of shoemarks created at the crime scene and often for the number of shoe-
marks actually collected. For this reason valuable shoeprint evidence, especially in
lower profile cases such as property crime, is often not utilised. For example within
a couple of months of starting a new shoemark identification programme in one East
London borough a backlog of 1,500 shoemarks built up [13].
7.2 Collection of Shoemarks at Crime Scenes
In the previous section we identified several problems that occur during the identi-
fication of shoemarks. It can be seen that many of these arise from problems asso-
ciated with the initial classification of the shoemark.
One approach to minimising inconsistent classification due to human subjective
interpretation of the marks has been to limit the number of descriptors used during
classification. This is in stark contrast to the natural evolution of shoemark classifi-
cation schema that has tended to add more and more descriptors as the complexity of
shoemarks increased. Some modern classification systems under development have
invested time in determining the smallest set of descriptors that can be used and still
provide effective classification. However, reducing the number of descriptors gen-
erally does reduce the resolving power of the system and therefore its usefulness.
There are a number of researchers working on the problem of reducing classifi-
cation error and their particular approaches are discussed in Section 7.4 including
a description of a number of the most common existing shoemark classification
schemes.
Parallel to the effort being made by researchers in forensic science there is also
a considerable effort being made in the image processing domain to develop algo-
rithms for general purpose image classification and recognition. It seems natural that
tools being developed in image processing and computer vision could be applied to
the problem of shoemark classification and identification. The use of image pro-
cessing in real world situations outside of computer science is also increasing. This
includes its use in fingerprint and DNA identification. Unfortunately due to the
nature of shoemark evidence, in particular the fact that it is often only suitable to
be used as corroborative evidence in court [13], there has been less interest in the
media and scientific community about it than other areas of forensic imaging.
7.2.1 Shoemark Collection Procedures
The examination of shoemarks in manual systems consists of three activities: collec-

tion, classification and matching. A forensic scientist, or more commonly a Scene
of Crimes Officer (SOCO), will collect the shoemark from the crime scene. This
will then be classified and analysed in a controlled laboratory environment at a later
date. The techniques used for collecting shoemarks differ according to the process
involved in creating the mark such as
• Types of Shoemarks.
• Shoemarks can be broadly split into two types, impressions and transfer/contact
prints.
• Shoemark Impressions.
Impressions occur when the shoe comes into contact with a soft malleable surface
such as soil, snow or sand. The result of the contact will be an impression showing
details of the “tread pattern” of the shoe in three dimensions. In burglaries in the UK
it is common to find this type of shoemark in soil outside the point of entry.
7.2.2 Transfer/Contact Shoemarks
Transfer/contact prints are created by the transfer of a material such as blood, mud,
paint, and water, to the sole of the shoe and then in turn to the contact surface. In
burglaries this type of shoemark is often found on windowsills where a burglar has
placed his/her foot while climbing through an open or broken window. It is also
common to find this type of shoemark on fragments of broken glass that lie just
inside the point of forced entry through a window or door. This is often how partial
shoemark images are created.
Sometimes the action of the sole of the shoe on a surface may remove something
from that surface. This is often the case when walking on dusty surfaces.
Shoemarks can be found on many different surfaces. Tables 7.1 and 7.2 show the
likelihood of shoemarks being found on various floor surfaces.
Table 7.1 Likelihood of detectable marks occurring on different two-dimensional shoe/surface

combinations [5]
Shoe with
Damp and wet blood, grease, Dry shoes with
Surface shoes oil dust or residue Clean dry shoe
Carpet Unlikely Very likely Likely Unlikely

Dirty Floor with Likely Very likely Unlikely Unlikely
accumulation of
dust, dirt or
residue.
Relatively clean but Likely Very likely Very likely Unlikely
unwaxed floor.
Clean Waxed Tile or Likely Very likely Very likely Likely
Wood Floor
Waxed Bank Counter, Likely Very likely Very likely Likely
Desk Top, etc.
Glass Very likely Very likely Very likely Likely
Kicked in Door Very likely Very likely Very likely Likely
Paper, Cardboard. Very likely Very likely Very likely Likely
Table 7.2 Detection of footwear marks after walking on various floor surfaces for five minutes [3]
Premises Floor type Area Footware mark
Household Kitchen Vinyl General surfaces Detected

Fish and Chip shop Tiles Customer area and Detected
behind counter
Sandwich Bar Tiles Customer area and Detected
behind counter
Butchers Shop Paint with sawdust Customer area Detected
covering
House Carpet Dining room Nothing detected
House Carpet Living room Nothing detected
Kitchen Tiles General surfaces Detected
Garage Concrete Oily area Detected
Office Vinyl General surfaces Nothing detected
Specialised techniques have been developed for recording or “lifting” shoemarks

found on these surfaces and these are described in detail during the remainder of
this section.
7.2.3 Photography of Shoemarks
The most common technique used when collecting shoemarks is photography. It

is usual to photograph all visible shoemarks before attempting other collection
techniques. When photographing a shoemark, a slow speed (typically ISO 100)
black and white film is used. Slower films have typically higher resolution than
faster films that may appear “grainy”. The use of high-resolution film is necessary
as the characteristics that the forensic examiner may be interested in are often so
small that they are not visible to the naked eye [7]. The use of black and white
film is often preferred because some experts believe that the additional layers of
emulsion in coloured films result in a lower contrast, less easily examined image.
This consideration is still very important even when using computerised systems.
Although the digitisation of shoemark photographs before they are entered into a
computerised system results in a loss of resolution, wherever possible the original
images will be presented as evidence in court.
When photographing a shoemark a scale is positioned adjacent to, and on the
same plane as, the shoemark. This allows accurate measurements of the shoemark
to be made in the laboratory, or allow easier verification that the shoemark was
printed at 1:1 scale.
A label identifying the impression, orientation and location is also placed within
the image frame. This process minimises the possibility that different shoemarks
photographed in the same location become confused with each other. It also helps
to provide substantiation of continuity of evidence.
The physical process of taking the photograph requires that the camera be posi-
tioned, mounted on a tripod directly over the impression. The film’s plane should be
adjusted so that it is parallel to the plane of the shoemark. This helps to minimise the
amount of perspective distortion. The frame in the camera’s viewfinder is adjusted
so that it includes the shoemark, the scale and the label and the camera focus is set
to the shoemark and not the scale as often the scale is at a slightly different focal
depth.
Wherever possible strong sources of ambient lighting are disconnected to remove
shadow. When flash is needed it is diffused or positioned at a distance from the
shoemark so that it does not result in unwanted glare.
7.2.4 Making Casts of Shoemarks
When an impression is left in soil, snow or other easily deformable material, it is

common that a cast is made using Plaster of Paris1 or Dental Stone.2 In the case
where the shoemark is left in snow, the site of the impression may require “setting”
with Snow Print Wax3 before Plaster of Paris can be used; or the cast may be made
with liquid sulphur.
1 Plaster of Paris is a mixture of powdered and heat-treated gypsum that when mixed with water
flows freely but hardens to a smooth solid as it dries.
2 Dental Stone is a plyable material used in the dental industry for making impressions of teeth
and gums.
3 SnowPrint Wax is a commercial product used to add strength to an impression left in snow.
If any lightweight debris has fallen into the impression since it was created it
should be removed with tweezers. Care should be taken not to remove any extrane-
ous matter that is part of the impression itself.
In the case where the impression has been made in loose sandy soil a fixing agent
such as hairspray may be used as a means of “binding” loose particles together prior
to casting.
The liquid casting material should be poured into the impression from a height
of only a few centimetres; this helps prevent damage to the surface of the impres-
sion. The material should be poured from a position to the side of the part of the
impression caused by the shoe arch. This area contains the least useful information
when classifying the shoemark. The liquid should be poured until the mixture over-
flows the impression and onto the ground surface. As the material starts to harden
information pertinent to the case should be scratched into the cast surface. The cast
should be left for at least 20 min to dry in warm weather, longer when it is cold. The
cast may then be lifted using a thin bladed spatula inserted into the soil well beneath
the cast.
The cast should then be left to air dry for at least 48 h before it may be cleaned
of extraneous soil by soaking the cast in a solution of potassium sulphate.
In the case of a serious crime the cast may be stored and kept as evidence. How-
ever, it is more common that the cast is photographed and the photograph be used as
evidence. A technique often used when photographing a cast is to use oblique rather
than direct lighting. This throws a slight shadow that brings certain detail in the cast
into sharp relief helping the examiners to see small details.
7.2.5 Gelatine Lifting of Shoemarks
Some shoemarks may be collected using gelatine lifting techniques. A “Gelatine

Lift” consists of an inert backing material, usually a polyester film, coated with a
low adhesive gelatine layer.
This type of capture technique can be used to collect shoemarks from a wide
variety of surfaces including porous materials such as paper and cardboard; how-
ever, the surface must be smooth and hard, such as floor coverings and table tops.
Gelatine lifts are often able to capture shoemarks from surfaces where shoemarks
are not visible to the naked eye.
The process of lifting can take up to 10 min for the mark to transfer to the gelatine
depending on the substrate and the transfer material.
7.2.6 Electrostatic Lifting of Shoemarks
Electrostatic shoemark lifting operates by collecting the dust left behind by the shoe-
mark. An electrically charged sheet is placed over the area of the shoemark and the
dust particles are attracted to the surface. When the voltage is removed the dust
remains on the sheet and may be fixed there using sticky plastic sheeting.
The power supply generates an electric field, the stronger the field the stronger
the device’s ability to attract a dust mark. The devices are often used to recover
shoemarks left on paper, linoleum, wood, carpet and concrete but cannot be used to
collect shoemarks from wet surfaces.
The image resulting from electrostatic lifting is the negative of the impression,
i.e. the contact between the shoe sole and the substrate removed something (usually
dust) and the collection process has collected what was not removed.
7.2.7 Recovery of Shoemarks from Snow
Special mention of the procedures used for collecting shoemarks left in snow is
made here. This is because although they are not commonly found within the United
Kingdom shoemark impressions left in snow are usually of very high quality, and
are commonly found elsewhere in Europe. They show fine, clear and distinctive
detail.
The process for collecting shoemarks left in snow principally follows the proce-
dures already given for taking photographs or making casts of shoemarks. However,
a number of common problems can occur when trying to make a cast in snow using
the normal casting techniques. The most common of these is that the casting material
often freezes before it flows into every feature of the impression. Another problem
is that the walls of the impression are prone to collapsing, depositing snow into the
bottom of the impression and obscuring detail. Photography also has drawbacks, the
resulting images lacking contrast.
Spraying the impression with a coloured aerosol from a low angle will reveal
more detail in the resulting image when the impression is photographed. This tech-
nique produces an effect similar to using oblique lighting while photographing shoe-
marks. Spraying the impression with Snowprint Wax will seal the impression with
a thin wax coating that greatly increases its strength prior to making a cast. When
using Snowprint Wax care must be taken to prevent the pressure from the propellant
from damaging the impression.
If a cast has to be made using Plaster of Paris the addition of potassium sulphate
to the mixture lowers its freezing point. This prevents ice crystals from forming and
increases the setting time so that the liquid has time to flow before it hardens. In
some circumstances liquid sulphur may be used as the casting material as it exhibits
similar properties.
In the same way as described earlier with other types of casts it is common to
take photographs of shoemarks impression left in snow. As is the case with other
types of cast it is also common to use oblique lighting to highlight relief in casts
made from shoemarks in snow.
7.2.8 Recovery of Shoemarks using Perfect Shoemark Scan
A simple technique for capturing shoemarks directly from a shoe sole is to use
a commercial product called Perfect Shoemark Scan. The Perfect Shoemark Scan
equipment consists of a chemical drenched sponge and paper that is reactive to that
chemical.
The shoe is pressed into the sponge ensuring that the toe and heel are coated, this
may require that the shoe is rocked end to end, and then pressed on to the reactive
paper. After a few minutes an image of the shoemark will develop on the paper.
This technique is comparatively cheap and simple but is only useful if the actual
physical shoe is available.
7.2.9 Making a Cast of a Shoemark Directly from a Suspect’s Shoe
If the forensic scientist is comparing a photograph of an impression from a crime

scene to a suspect’s shoe sole, it is usual for a cast of the suspect’s shoe to be made.
This new cast can then be lit in the same manner as the recovered cast and pho-
tographed. When making a cast from a suspect’s shoe several techniques may be
used to create the impression. The shoe may be depressed into Birofoam4 ; the foam
records the indentations and ridges of the shoe sole. Zetalabor5 may also be used. It
is warmed by kneading in the palm of the hand; a hardening gel is added before the
shoe is pressed firmly into the substance. When the substance dries a cast may be
made. Vinylpolysiloxane and Microsil6 may also be used in a similar way [6].
7.2.10 Processing of Shoemarks

There are a number of possible objectives that an investigator tries to achieve when
he/she processes a shoemark. These include
• Determining if a particular shoemark was made by a particular shoe (in particular

matching the shoemark against a suspect’s shoe).
• Matching the shoemark with other shoemark impressions, possibly from other
crime scenes.
• Determining the make and model of a shoe that made the mark.
• Classifying the shoemark for archival and possible matching later (this job is
often performed by the SOCO).
The first task in this list can only be performed by a qualified forensic scientist.
Similar to how fingerprints are processed, a number of points of comparison must
be made between the shoe sole and the shoemark. These usually include accidental
characteristics, that is, characteristics of the sole that are caused by wear and tear.
4 Birofoam is a foam material similar to floral foam.

5 Zetalabor is a silicone substance used in the dental industry
6 Vinylpolysiloxane and Microsil are two other silicone dental products used in the creation of
dental impressions.
Unless the investigators have a suspect in custody the shoemark will be checked
against the database and a small number of the most likely matches selected. These
preliminary matches will then be passed to the forensic scientist for analysis and
hopefully a close match at this point will provide a suspect.
The process involved for 2, 3 and 4 each require that the shoemark is classified
and the classification is used to search the database. As such, these processes are all
subject to the problems associated with inconsistent classification.
The procedure used for classification and identification differs slightly in
detail between laboratories but generally follows the following sequence of
events:
• The shoemark is inspected visually and analytically measured.

• The descriptors that best match the pattern are selected and recorded.
Sometimes the shoemark is split into three sections, the heel, arch and toe. The
heel and toe may be classified separately while the arch may be ignored as it rarely
contains any useful information.
When comparing an unknown shoemark with a suspect’s shoe the procedure used
will be similar to the following. For manual shoemark recognition the examiner
needs
• The shoemark (the original mark, a 1:1 scale photograph or a cast) from the crime
scene.
• The suspect’s shoe or a second shoemark from either a photograph or cast.
When this information is available the examiner proceeds with the following
sequence:
• Compare the images/casts looking for dissimilarities.

• Take measurements.
• Compare overall wear.
• Mark any characteristics (place, size and form).
• Compare those characteristics (place, size and form).
• Check back to the shoe outsole to determine if the characteristics are class or
individual. That is are they accidental characteristics or part of the original pat-
tern.
• Draw conclusions, is this the suspect’s shoe or not?
The success of the last three items in the first list is dependent on how accurately
the shoemark has been classified by the SOCO.
7.3 Typical Methods for Shoemark Recognition 157
7.2.11 Entering Data into a Computerised System

Many forensic laboratories have migrated their paper shoemark storage systems to
computerised ones. It is important that when shoemark images are stored on a com-
puter, the amount of information lost during the digitisation should be minimal.
When the images are being processed certain artefacts may be introduced. For exam-
ple if images are stored in a compressed format such as JPEG, it must be ensured
that the compression applied does not,
• Introduce artefacts that affect the classification/identification of the shoemark.
• Prohibit the use of the image under local law.
For the evidence to be admissible in court any artefact must be predictable and
explainable. This, however, is not always possible. For this reason a lossless high-
resolution copy of the image should be kept in the database. A lower resolution
working copy is used by the image processing algorithms in the application. When
an image is to be used as evidence the original high-quality version is used.
7.3 Typical Methods for Shoemark Recognition
There are two main methods of classifying shoemarks. The first was already men-
tioned and relies on identifying features of the shoemark and labelling them from
a set of predefined descriptors. A common scheme of this type is the Birkett
Component Descriptors
A PLAIN/RE-HEEL
B RANDOM/IRREGULAR
C LATTICE/NETWORK
D STRAIGHT
E CURVED/WAVY
F ANGLED/ZIG-ZAGGED
G CIRCULAR - forming basic pattern or a section thereof
H CIRCULAR – interspersed with other components
I ANY OTHER SHAPE
J GEOMETRIC – with three to six straight sides
L ANY OTHER SHAPE
M LETTERS/NUMERALS – as part of a name or number
N “TARGETS” – concentric circles, ovals etc or part thereof
Q COMPLEX/DIFFICULT
R Same descriptor applies to different principal components.
Fig. 7.4 Shows the key features of two shoemarks and in Fig. 7.5 the relative importance of each
section the classification of the mark are indicated. Under each shoemark the Birkett classification
is given
Fig. 7.5 Indicates the key features of two shoemarks. The classification of these features is shown
in Fig. 7.6. This diagram is reproduced from the “Scottish Police Detective Training” manual
system. A second way of classifying shoemarks is to look for and record accidental
characteristics. An example of each is shown in Figs. 7.4 and 7.5.
7.3.1 Feature-Based Classification

Many of the current shoemark databases are based on the Birkett system of shoe-
mark feature classification. Figure 7.6 contains a list of the descriptors used in that
classification schema.
7.3 Typical Methods for Shoemark Recognition 159
Fig. 7.6 Shows how the shoemark may be divided into four sections for classification. Each
section has a classification priority indicated. The hand written letters below each shoemark show
the Birkett classification. This diagram is reproduced from the “Scottish Police Detective Training”
manual
7.3.2 Classification Based on Accidental Characteristics
In Holland the forensic scientists at the Netherlands Forensic Institute have been
using a classification system which includes descriptors representing “accidental
characteristics”, i.e. characteristics caused by damage and wear to the shoe sole.
This type of characteristic is very important when trying to identify a particular
instance of a shoe sole. This is because it is only the accidental characteristics that
differ between shoe sole instances, i.e. all shoe sole leave the factory more or less
identical and it is the random damage that occurs during normal wear that results
in differing patterns. The shoemarks are still classified manually and stored by clas-
sification in a computer database. When searching the database accidental charac-
teristics identified on the shoemark are used as well as the standard classification
patterns.
7.4 Review of Shoemark Classfication Systems
This section reviews some of the shoemark classification/retrieval systems, includ-

ing both semi-automatic systems, such as SHOE-FIT, SHOE©, Alexandre’s Sys-
tem, REBEZO, TREADMARKTM , and SICAR, and also some automated ones, like
SmART, deChazal’s System and Zhang’s System. It should be noted that some of
the semi-automatic systems have been used in real forensic investigation by various
agents, and some like TREADMARKTM , and SICAR even have already been suc-
cessfully commercialised. However, so far, all of the automated systems appearing
in the literature over the last few years are still in their initial stages, i.e. without any
real application examples in forensic investigations.
7.4.1 SHOE-FIT
SHOE-FIT is one of the earliest computerised shoemark databases, developed by

Sawyer and Monckton [15]. It is based on another existing system developed by
Birkett, which codes the shoeprint patterns with a number of letters, followed by a
numerical sequence. SHOE-FIT prefixes the coded letter with 2 numerical digits for
the year and suffixes it with a 3-digit number to uniquely identify a shoemark. A
typical code is as follows: 94FNM011, which means that the footwear is from 1994,
and has a zigzag (F) target (N) and letters or numbers (M). SHOE-FIT also concerns
itself with transferring a footwear impression to a PC in terms of various of forms,
such as faxing, scanning and photographing, and with combining a number of tools
for image handling, like format conversion, rotation, resizing, masking and so on.
Apart from these, the authors also identify the consistency and the compatibility
of any coding system of shoemarks as important.
7.4.2 SHOE©
SHOE© is a shoeprint capturing, recording, retrieving system developed by Victo-

ria Forensic Science Centre [4]. The system comprises two parts: SHOEView, and
7.4 Review of Shoemark Classfication Systems 161
SHOEAdmin, and has 4,000 shoemarks in its database. SHOE© codes a shoemark
based on a manual identification and recording of patterns found in that shoemark.
Some of the popular patterns are categorised into different groups, and can be dis-
played on the screen for reference when one records and searches a shoemark. One
of the attractive points for this system is that it combines the position information
into the processes of recording and retrieving by dividing a shoemark into four parts:
Toe, Ball, Instep, and Heel. This can increase the accuracy of the searching process.
Another advantage is that each of these partitions can be separately classified and
searched against independently, so it is possible to search for images that share
only characteristics seen in any combination of the four partitions. In this way their
system is able to search for partial shoemarks.
7.4.3 Alexandre’s System
In his paper, Alexandre [2] has described a shoemark classification system devel-
oped by the police department of Neuchatel. In this system, the SOCO (Scene of
Crime Officer) footwear manual was selected as the basic coding reference. The
system extends this reference with more letters, denoting groups, and some num-
bers, denoting subgroups, following the letters. Similar to SHOE©, this system
also divides a shoeprint into different partitions, sole, instep, and heel, so it can
search a partial shoemark. In total, the classification system has 12 groups and
40 subgroups, and it contains 12,000 shoemarks. However, the process of cod-
ing is still carried out manually, and a professional officer is needed to record a
shoemark.
7.4.4 REBEZO
REBEZO was designed by Geradts et al. in the National Forensic Science Labora-
tory of the Ministry of Justice in the Netherlands with the cooperation of the Dutch
Police. Similar to the systems described above, shoemarks in this system are also
classified using a set of pattern descriptors that the investigator selects from. One of
the problems with this system, like that of other manual systems, is the inconsistent
classification, which motivated Geradts et al. to develop an automatic classification
approach using Fourier analysis and a neural network system [10]. It thresholds
a shoemark first, and then applies some morphological techniques to segment the
patterns of the image, before the Fourier descriptors and the moments of each pat-
tern are imported into a neural network system for classification. However, their
experimental results [9] suggested that this attempt was not able to give a sound
classification because of the unreliable segmentation, caused by noise and artefacts
in a shoemark.
7.4.5 TREADMARK TM
TREADMARKTM is a shoemark analysis and identification system developed by

CSI Equipment Ltd. The manufacturer of this system claimed it as the only system
so far available today which utilises all four parameters of Patterns, Size, Damage
and Wear to identify individual footwear impressions, and compare them automat-
ically with impressions from both a suspect’s database and a SoC database. Here,
the automation refers only to the progress of matching and searching of a database.
Actually, it still needs users to manually code shoemarks by patterns or other charac-
teristics. One point different from other systems is that TREADMARKTM requires
that the user indicates the position of accidental or random characteristics on the
shoe sole. It records the positions of these characteristics and uses them to search
for other shoemarks with accidental characteristics in similar positions. More details
about TREADMARKTM are available at the website of CSI Equipment Ltd. (2006,
http://www.k9sceneofcrime.co.uk/systems.aspx).
7.4.6 SICAR
SICAR is one of the most successful commercial systems for shoemark archiv-
ing and classification/retrieval, developed by Foster and Freeman Ltd, London, UK.
It has been widely used in British police forces and forensic laboratories in the
UK. The most recent version is SICAR 6, which is claimed to be able to archive
shoemarks from both suspects and SoC. Combined with SoleMate – a reference
database of shoemarks from shoemakers developed by the same company, SICAR
can be used to identify the information of a scene image, such as the manufac-
turers, the release date, and so on. Like other semi-automatic systems, this sys-
tem requires an operator to classify the shoemark by assigning codes to individual
features in the shoemark. The classification is then stored in the database and can
be searched again later. SICAR adopts a simple coding technique to characterise
shoeprints which forms the basis of many of the database search and match oper-
ations. The process enables the operator to create a coded description of the pat-
tern of a shoe sole by identifying elemental pattern features such as lines, waves,
zigzags, circles, diamonds and blocks, etc., each of which bears a unique code. Like
SHOE©, this is a straightforward selection process, as each type of elemental pat-
tern is displayed, with variants, for the operator to choose from. SICAR has also
been extended to other databases like tyre tread, (Foster and Freeman Ltd., 2008,
http://www.fosterfreeman.co.uk/sicar.html).
7.4.7 SmART
Although the Geradts’ attempt to develop an automated shoemark classification sys-

tem was not successful, the work in this area continued. Alexander et al. in their
References 163
paper [1] propose a fully automated shoemark classification system. As the first
automatic shoemark classification system, SmART can automatically search against
a database of shoeprints. The authors apply fractal codes to represent a shoeprint,
and a mean square noise error method is used for determining the match results. The
algorithm has been tested on a database of shoemarks, containing 32 marks.
7.4.8 De Chazal’s System

DeChazal et al. in their paper [8] describe another fully automated shoemark clas-
sification system, which can automatically search against a database of shoemarks
based on the outsole patterns in the Fourier domain. The algorithm proposed in
this paper utilises the Power Spectral Density (PSD) function of a shoemark as
the pattern descriptor. For orientation invariance consideration, a set of PSDs are
computed for different rotations of the query shoemark. Meanwhile, low- and high-
frequency components are removed to reduce the grey-level variance and keep the
unique information of a shoemark. DeChazal et al. have tested their approach on
a database of shoemarks, containing more than 1,000 marks from Forensic Sci-
ence Laboratory, Garda Headquarters, Dublin, Ireland. They also generated a partial
shoemark database to test the robustness of their approach. A reported 67% accu-
racy, for when the first ranked mark is in the same category of the query mark, has
been obtained on their database.
7.4.9 Zhang’s System
In their paper [16], Zhang et al. have proposed to represent a shoemark based on sole
pattern edge information. The demo system developed by Zhang et al. can automati-
cally archive and retrieve a shoemark in terms of an edge direction histogram, which
is claimed to be inherent scale, translation and rotation invariance. The system pro-
vides a user with a short ranked list of best-matched shoemarks in response to a
query impression. The retrieval test of their approach is performed on a database of
512 full shoemarks. The authors also generate some degraded images like rotated,
scaled, and Gaussian noisy images, to evaluate the robustness of their approach to
different degradations. They reported that the accuracy of 85.4% can be obtained on
their databases.
References
1. A. G. Alexander, A. Bouridane and D. Crookes, “Automatic classification and recognition of
shoeprints,” Special Issue of the Information Bulletin for Shoeprint/Toolmark Examiners, vol.
6. no. 1, pp. 91–104, 2000.
2. G. Alexandre, “Computerized classification of the shoeprints of burglar’s soles.” Forensic
Science International, vol. 82, pp. 59–65, 1996.
3. R. Ashe, R. M. E. Griffin and B. Bradford, “The enhancement of latent footwear marks present
as grease or oil residues on plastic bags,” Science and Justice, vol. 40 no. 3, pp. 183–187, 2000.
4. W. Ashley, “What shoe was that the use of computerised image database to assist in identifi-
cation,” Forensic Science International, vol. 82, pp. 67–79, 1996.
5. W. J. Bodziak, “Footwear Impression Evidence,” New York: Elsevier, 1990.
6. W. J. Bodziak, “Footwear Impression Evidence, Detection, Recovery, and Examination,” 2nd
Edition. CRC Press, 2000, ISBN: 0-8493-1045-8.
7. Davis, M. (1998) Details on shoeprint photography provided in communication by email with
Davis, M., MSSC Regional Crime Lab, Joplin MO, US., Newton County Sheriff’s Depart-
ment, Neosho MO.
8. P. D. De Chazal, J. Flynn and R. B. Reilly, “Automated processing of shoeprint images based
on the Fourier transform for use in forensic science,” IEEE Transactions on Pattern Analysis
and Machine Intelligence, vol. 27, no. 3, pp. 341–350, 2005.
9. Z. Geradts, “Content-Based Information Retrieval from Forensic Image Databases,” PhD The-
sis, The Netherlands Forensic Institute of the Ministry of Justice in Rijswijk, The Netherlands,
2002.
10. Z. Geradts and J. Keijzer, “The image-database REBEZO for shoeprints with developments
on automatic classification of shoe outsole designs,” Forensic Science International, vol. 82,
pp. 21–31, 1996.
11. A. Girod, “Computerised classification of the shoeprints of burglars’ shoes,” Forensic Science
International, vol. 82 pp. 59–65, 1996.
12. H. Majamaa, “Survey of the conclusions drawn of similar foorwear cases in various crime
laboratories,” Forensic Science International, vol. 82, pp. 109–120, 1996.
13. R. Milne, “Operation Bigfoot – a volume crime database project,” Science and Justice, vol.
41, no. 3, pp. 215–217, 2001.
14. T. J. Napier, “Scene linking using footwear mark databases,” Science and Justice, vol. 42, no.
1, pp. 39–43, 2002.
15. N. E. Sawyer and C. W. Monckton, “SHOE-FIT a Computerised Shoe Print Image Database.”
IEE European Convention and Security and Detection, Brighton, UK, pp. 86–89, 1995.
16. L. Zhang and N. M. Allinson, “Automatic Shoeprint Retrieval System for use in Forensic
Investigations,” 5th Annual UK Workshop on Computational Intelligence, 2005.
Chapter 8
Techniques for Automatic Shoeprint
Classification
8.1 Current Approaches
Research in automatic shoeprint classification has been reported a little over a

decade. Some of the techniques reported have focused on representing the small
shape components which make up a pattern, whereas others extract features from
the shoeprint without any subdivision. Similarly, some techniques are deployed in
the spatial domain, while others operate in the transform domain, or a combination
of the two. Early research used techniques such as Fourier descriptors to model the
pattern components, or fractals to model the complete shoe pattern [1–3].
More recent work by a group at University College Dublin (Ireland) [4] has used
the Power Spectral Density (PSD) coefficients of the image, which are calculated
using the Fourier Transform and used as features. Rotation invariance is achieved
using a “brute force” approach in which the query image is rotated in 1◦ steps. The
group at Sheffield University (UK) uses the technique of matching Edge Directional
Histograms (EDH) [5]. The authors describe the interior shape information of a
shoeprint image in terms of its significant edges, and use a histogram of the edge
direction as the signature of a shoeprint image. This method first extracts the edges
using a Canny edge detector. To obtain rotation invariance, they compute the 1D
Fast Fourier Transform (FFT) of the normalised edge direction histogram and take
it as the final signature of the shoeprint image. However, tests on partial shoeprints
using this technique have not been reported.
In this chapter, methods for automatically classifying shoeprints for use in
forensic science are presented. In particular, we propose two correlation-based
approaches to classify low-quality shoeprints: (i) Phase-Only Correlation (POC)
which can be considered as a matched filter and (ii) Advanced Correlation Fil-
ters (ACFs). These techniques offer two primary advantages: the ability to match
low-quality shoeprints and translation invariance. Experiments were conducted on
a database of images of 100 different shoes available on the market. For the
experimental evaluation, challenging test images including partial shoeprints with
different distortions (such as noise addition, blurring and in-plane rotation) were
generated. Results have shown that the proposed correlation-based methods are very
practical and provide high performance when processing low-quality partial-prints.
Technology, DOI 10.1007/978-0-387-09532-5 8,

166 8 Techniques for Automatic Shoeprint Classification
8.2 Using Phase-Only Correlation

In this section, a phase-based method, which has been developed by the team
at Queen’s University Belfast, is presented. The technique uses the POC tech-
nique for shoeprint matching. The main advantage of this method is its capa-
bility to match low-quality shoeprint images accurately and efficiently. In order
to achieve superior performance, the use of a spectral weighting function is also
proposed.
The use of the phase information was mainly motivated by the fact that, in the
Fourier domain, the phase is much more important than the magnitude in preserving
the features of image patterns, as proved by Oppenheim and Lim [6]. A simple
illustration for shoeprint images is given in Fig. 8.1.
Fig. 8.1 (a) Original

shoeprint image A. (b)
Original shoeprint image B.
(c) Image synthesised from
the Fourier transform phase
of image B and the magnitude
of image A. (d) Image
synthesised from the Fourier
transform phase of image A
and the magnitude of image B
(a) (b)
(c) (d)
8.2.1 The POC Function
Consider two images g1 (x,y) and g2 (x,y). The Fourier transform of g1 and g2 are
G1 (u,v)=A(u,v)ejφ(u,v) and G2 (u,v)=B(u,v)ejθ(u,v) where A(u,v) and B(u,v) are ampli-
tude spectral functions while φ(u,v) and θ (u,v) are phase spectral functions, respec-
tively. The POC function qg1 g2 (x, y) of the two images g1 and g2 is defined as
8.2 Using Phase-Only Correlation 167
)
−1 G (u, v)G 2 ∗ (u, v)
qg1 g2 (x, y) = F 1 (8.1)
G 1 (u, v)G 2 ∗ (u, v)

= F −1 e j(φ(u,v)−θ(u,v)) (8.2)
where F-1 denotes the inverse Fourier transform and G2 ∗ is the complex conjugate of
G2 . The term Q g1 g2 (u, v) = e j(φ(u,v)−θ(u,v)) is termed cross-phase spectrum between
g1 and g2 [7].
If the two images g1 and g2 are identical, their POC function will be a Dirac δ-
function centred at the origin and having the peak value 1. When matching similar
images, the POC approach produces a sharper correlation peak compared to the
conventional correlation as shown in Fig. 8.2.
(a) (b)
0.2
0.15
2.38
0.1
2.375
0.05
0 2.37
-0.05
N/2
N/2 N/2
0 N/2
0
0
0
-N/2 -N/2
-N/2 -N/2
(c)
(d)
Fig. 8.2 (a) Original shoeprint image A. (b) Noisy partial shoeprint B generated from A. (c)
Phase-only correlation (POC) between A and B. (d) Conventional correlation between A and B
8.2.2 Translation and Brightness Properties of the POC Function

Consider an image g3 that differs from g2 by a displacement (x0 , y0 ) and a brightness
scale a>0. Then, g3 and g2 will be related by
g3 (x, y) = ag2 (x − x0 , y − y0 ) (8.3)
In the frequency domain, this will appear as a phase shift and a magnitude
scaling:
G 3 (u, v) = ae− j2π(x0 u+y0 v) G 2 (u, v) (8.4)
According to (8.1), (8.2) and (8.4), the POC function between g1 and g3 is given
by

qg1g3 (x, y) = F −1 e− j2π(x0 u+y0 v) e j(φ(u,v)−θ(u,v)) (8.5)
= qg1g2 (x − x0 , y − y0 ) (8.6)
Equation (8.6) shows that the POC function between g1 and g3 is only a translated
version of the POC function between g1 and g2 . The two POC functions have the
same peak value which is invariant to translation and brightness change.
8.2.3 The Proposed Phase-Based Method
The proposed method uses the POC approach combined with a spectral weighting
function.
8.2.3.1 Spectral Weighting Function

Spectral weighting functions have already been used with the POC technique in
image registration to enhance the registration accuracy [7]. In this work, we pro-
pose to use a band-pass-type spectral weighting function (Fig. 8.3) to improve the
recognition rate by eliminating high frequency components which have low reliabil-
ity, but without significantly decreasing the correlation peak sharpness as very low
frequency components will also be eliminated. The proposed weighting function
W(u,v) has the same shape as the spectrum of a Laplace of Gaussian (LoG) function
and is given by
−u
2 +v2
u 2 + v2 2β 2
W (u, v) = e (8.7)
α
where β is a parameter which controls the function width and α is used for nor-
malisation purposes. Thus, the modified phase-only correlation (MPOC) function
q̃g1 g2 (x, y) of images g1 and g2 is given by
1
0.9
0.9
0.8
0.8
0.7 0.7
0.6
0.6
0.5
0.4 0.5
0.3
0.4
0.2
0.1
0.3
0
N/2 0.2
N/2
0.1
0
0
0
-N/2 0 N/2
-N/2 -N/2
(a) (b)
Fig. 8.3 The proposed band-pass-type spectral weighting function with β=50. (a) 3D representa-
tion. (b) 2D representation
)
G 1 (u, v)G 2 ∗ (u, v)
q̃g1 g2 (x, y) = F −1 W (u, v) (8.8)
G 1 (u, v)G 2 ∗ (u, v)
The peak value of the MPOC function q̃g1 g2 (x, y) is also invariant to translation
and brightness change.
8.2.3.2 Shoeprints Matching Algorithm

A schematic of the proposed shoeprint matching algorithm is shown in Fig 8.4.
In response to an unknown shoeprint image gi , the algorithm matches gi to each
database image gn (n=1 . . . M where M is the size of the database) and determines
the corresponding matching score. The matching algorithm consists of the following
steps:
g1 G1
FFT G1/ |G1|
e jφ ~
Qg1g2 Qg1g2 q~g1g2
Database image Matching
IFFT Peak value
score
g2 G2
FFT * * e –jθ W
G2/ |G2|
Weighting
Input image
Fig. 8.4 Diagram of the proposed matching algorithm

1. Calculate the Fourier transform of gi and gn using the FFT to obtain Gi and Gn .
2. Extract the phases of Gi and Gn and calculate the cross-phase spectrum Q gn gi .
3. Calculate the modified cross-phase spectrum Q̃ gn gi by modifying Q gn gi using the
spectral weighting function W.
4. Calculate the inverse Fourier transform of Q̃ gn gi using the Inverse FFT (IFFT) to
obtain the MPOC function q̃gn gi .
5. Determine the maximum value of q̃gn gi . This value will be considered as the
matching score between images gi and gn .
The use of the band-pass-type weighting function W (defined in Eq. (8.7)) will
eliminate meaningless high frequency components without significantly affecting
the sharpness of the correlation peak (since very low frequency components will be
also attenuated).
In this work, the peak value of the MPOC function has been considered as the
similarity measure for image matching: if two images are similar, their MPOC func-
tion will give a distinct sharp peak, if they are dissimilar, then the peak drops signif-
icantly.
After matching the input image gi against all database images, using the algo-
rithm described above, the resulting matching scores are used to produce a list of l
shoeprints (l<<M) from the database, ranked from the best match (with the highest
matching score) to the worst (the lowest matching score). This list can be reviewed
later by a forensic scientist to determine the correct match visually.
8.2.4 Experimental Results
The algorithm was extensively assessed using a database containing 100 complete
shoeprint images (256 grey scale images of size 512×512), provided by Foster &
Freeman Ltd [http://www.fosterfreeman.co.uk]. To evaluate the robustness of the
method to different alterations, 52 test images (all of them were partial prints) were
generated from each complete shoeprint image, giving a total of 5200 test images.
We considered only partial shoeprints, since this is the most challenging problem in
shoeprint classification. Figure 8.5 shows some example of noisy shoeprint images
generated.
The generated test images were grouped into four main sets:
Set1- contains 400 clean partial shoeprint images obtained by dividing each orig-
inal complete shoeprint (from the original database) into four quarters: (i) left toes
and midsole, (ii) right toes and midsole, (iii) left heel and (iv) right heel.
Set2- contains 1600 noisy partial shoeprint images obtained by adding a white
Gaussian noise (with zero mean and standard deviations σ = 20, 40, 60 and 80) to
each partial shoeprint image from set1 (using the MATLAB function imnoise).
Set3- contains 1600 blurred partial shoeprint images obtained by blurring each
partial shoeprint image from set1. We considered a motion blur of length L (L = 10,
20, 30, 40 pixels) and angle θ = 90◦ (vertical blur) to simulate shoeprint blurring
(a) (b) (c)

Fig. 8.5 Examples of generated test images and their original shoeprint. (a) Original shoeprint
image. (b) Noisy partial-print (σ = 80). (c) Blurred partial-print (L = 40)
caused by foot slippage in the real world. The MATLAB functions fspecial and
imfilter were used to generate the blurred images.
Set4- contains 1600 rotated partial shoeprint images obtained by digitally
rotating each partial shoeprint image from set1 by an angle θ (θ = 2.5◦ , 5◦ ,
7.5◦ , 10◦ ).
During the evaluation process, each test image was used as input to the algo-
rithm and matched against all 100 original images and the rank of the correct match
determined. This process was performed 5200 times. Then, for each type of pertur-
bations, the proportion of times during tests a correct match appeared first (first rank
recognition) is determined.
In order to compare the method to the PSD-based algorithm [4], since the
database used in [4] was not available, the PSD-based algorithm was implemented
and tested using the same procedure as the proposed method. The results obtained
are depicted in Table 8.1. MPOC and POC denote the POC algorithms with and
without the spectral weighting function, respectively. The parameters of the weight-
ing function used during the tests are β = 10, 20, 30, 40, 50 and 60, with α =
4πβ 4 (to normalise the maximum of the MPOC function to 1, when matching two
identical images). Only results corresponding to β = 50 (the best value) are shown
in Table 8.1.
From these results, it can be seen that the phase-based algorithms (POC and
MPOC) outperform the PSD method even without the use of the spectral weighting
function. It can be also observed that the PSD-based algorithm is very sensitive to
blurring and rotation. For the phase-based approaches, the use of a weighting func-
tion (MPOC algorithm) introduced clear improvements in the recognition rate for
blurred and rotated partial-prints without affecting the performance of the method
when processing clean or noisy images. The best results were obtained for a weight-
ing function with β = 50, where 100% of the time a correct match was ranked first
for clean, noisy and blurred images.
However, the main disadvantage encountered so far with the POC-based method
is that it is not rotation invariant. Methods of addressing this issue may include
Table 8.1 First rank recognition rates (%) using PSD- [4], POC- and MPOC-based algorithms
Algorithms
PSD [4] POC MPOC
β=50
Test images
Clean partial prints 96.25 100 100
σ=20 95.75 100 100
Noisy par- σ=40 93.5 100 100
tial prints σ=60 88.5 100 100
σ=80 76.5 100 100
L=10 28.5 100 100
Blurred
L=20 11 100 100
partial
L=30 13 100 100
prints
L=40 13 97.75 100
θ=2.5° 60.25 96.75 98.75
Rotated θ =5° 35.5 43.25 52.75
partial
θ =7.5° 25.5 27.5 29.75
prints
θ=10° 19 15.75 21.25
the “brute force” approach. In this case, one MPOC function is required for each
possible orientation of a shoeprint, making the use of the method computationally
demanding. Another solution consists in using “advanced correlation filters” as pre-
sented in the following section.
8.3 Deployment of ACFs
The ACFs, also called composite filters, have been mostly used for Automatic Tar-
get Recognition (ATR) applications [8] and have been successfully applied in recent
years to other applications such as biometrics [9, 10], document analysis [11] and
road sign detection [12]. The success of these filters is due to their attractive prop-
erties: shift invariance, robustness to noise, distortion-tolerance (e.g. in-plane and
out-of-plane rotations, Scale changes and illumination variations) and the ability to
detect and classify partial objects. To achieve the distortion-tolerance of a correla-
tion filter, most of the design techniques that have been proposed in the literature
are based on the use of a set of training images which represent the expected dis-
tortions. Examples of ACFs’ designs include the Minimum Average Correlation
Energy (MACE) filter [13], the Maximum Average Correlation Height (MACH)
filter [14], the Optimal Trade-off Synthetic Discriminant Function (OTSDF) filter
[15], the Distance Classifier Correlation Filter (DCCF) [16] and the Polynomial
DCCF (PDCCF) [17].
An evaluation of the performance of advanced correlation filters for automatic
shoeprint classification is discussed. In particular, the OTSDF filter and the Uncon-
strained OTSDF (UOTSDF) filter are used for the classification of low-quality par-
tial shoeprints.
8.3 Deployment of ACFs 173
8.3.1 Shoeprint Classification Using ACFs

A block diagram illustrating the correlation process for shoeprint classification using
ACFs in general is given in Fig. 8.6.
During the design phase, and for each reference shoeprint image gm (x,y), a set of
training images that best represent the anticipated set of distortions is constructed
and used to synthesise a correlation filter Hm (u,v), where u and v denote the spatial
frequencies. Thus, for each class m (containing the generated training images from
gm (x,y)) a correlation filter Hm (u,v) is designed in the frequency domain using the
Fourier Transform of the generated training images. This training is carried out off-
line and the resulting correlation filters are stored and used later to classify unknown
shoeprints.
During the on-line stage, when an input shoeprint image gn (x,y) (an unknown
shoeprint) is presented to the system, its 2D Fourier Transform Gn (u,v) is computed
and multiplied by each synthesised correlation filter Hm (u,v). Then, the inverse 2D
Fourier Transform of the resulting product is taken to obtain the correlation output
cm (x,y), i.e.
cm (x, y) = F −1 {G n (u, v)Hm ∗ (u, v)} (8.9)
where F–1 denotes the inverse Fourier transform and Hm ∗ is the complex conjugate
of Hm .
Training images from class m
•• •
Filter design
Correlation Filter hm
0.45
0.4
0.35
0.3
0.25
0.2
FFT IFFT 0.15

0.1
0.05
0
N/2
N/2
0
0
Input Image –N/2 –N/2
Correlation output
Fig. 8.6 Block diagram of correlation process

(a) (b) (c)
1 1
0 0
N/2 N/2
N/2 N/2
0 0
0 0
–N/2 –N/2 –N/2 –N/2
(d) (e)
Fig. 8.7 (a) Shoeprint image A. (b) Shoeprint image B. (c) Noisy rotated partial-print C generated
from A. (d) Correlation between C and a correlation filter designed using image A and its rotated
versions. (e) Correlation between C and a correlation filter designed using image B and its rotated
versions
The correlation output cm (x,y) is searched for the largest value (correlation peak)
and the height of the peak, as well as other metrics such as Peak-to-Correlation
Energy (PCE) or Peak-to-Sidelobe Ratio (PSR), are computed and used as the
matching score related to the class m.
For a well-designed correlation filter, it is expected that its cross correlation with
an input test image will produce a distinct sharp peak if the input image is similar to
the training images used to synthesise the filter (as shown in Fig. 8.7). Furthermore,
if the test image is translated with respect to the training images, the correlation
peak will be also translated by the same amount. Of course, there will be no large
distinct peaks in the correlation output, if the input image and the training ones are
dissimilar.
After cross correlating the input image with all stored filters, as described above,
the resulting matching scores are used to produce a list of l shoeprints from the ref-
erence database (where l<<M and M is the size of the reference database), ranked
from the best match (with the highest matching score) to the worst (the lowest
matching score). This output list of candidates can be reviewed later by a foren-
sic expert to determine the final match visually.
8.3.2 Matching Metrics

As mentioned in the previous section, after cross correlating an input image with
a stored filter, different metrics can be used as matching scores. In this work, we
compare the use of three different metrics: Peak Height (PH), PCE and PSR.
For an output correlation c(x,y), the PCE is defined by
P H2 |c(x0 , y0 )|2
PC E = = (8.10)
Energy |c(x, y)|2
x y
where (x0 ,y0 ) and PH indicate the peak location and the largest value in the correla-
tion output, respectively.
P H = max {|c(x, y)|} = |c(x0 , y0 )| (8.11)

x,y
The PSR is defined as
P H − μ A,B
P S R A,B = (8.12)
σ A,B
where μA,B and σ A,B are the mean and standard deviation, respectively, which are
computed in the sidelobe area: an annular region around the peak (as shown in
Fig. 8.8).
PH is the most widely used metric mainly due to its computation simplicity.
However, it will change if the illumination in the input image changes. The PCE
and the PSR parameters measure the sharpness of the correlation peak. They are
expected to give better classification performance than the PH, since they are com-
puted using multiple points from the correlation output. Further, unlike the PH met-
ric, they are insensitive to uniform brightness changes (uniform amplification or
attenuation) of the input image.
B
Sidelobe Area
Fig. 8.8 Illustration of the A
sidelobe area used for Peak Peak Area
Peak-to-Sidelobe Ratio (PSR)
computation. A is the inner
width of the sidelobe region
and B is its outer width
8.3.3 Optimum Trade-Off Synthetic Discriminant Function Filter

In this section we briefly describe the design of the first filter we used for shoeprint
classification.
The OTSDF filter [15] optimally trades off between noise tolerance and sharp-
ness of the correlation peak while ensuring that the correlation outputs at the origin
(due to training images) take pre-specified values. This is performed by finding a
compromise between minimising the Average Correlation Energy (ACE) for peak
sharpness and the Output Noise Variance (ONV) which controls noise tolerance.
The ACE and the ONV are defined in Eqs. (8.15) and (8.16), respectively.
In the following analysis, all vectors and matrices are projected in the frequency
domain unless specified otherwise. Consider N training shoeprint images, each of
size d1 × d2 , corresponding to a class m. The goal is to construct an OTSDF fil-
ter Hm (u,v) in the frequency domain using these N images. Let S1 (u,v), S2 (u,v),. . .
SN (u,v), represent the 2D Fourier transforms of the training images. Let d be the
total number of pixels contained in each training image (i.e. d = d1 × d2 ). Instead
of using a matrix representation, we use a vector notation by means of lexicographic
ordering. A d-element column vector si is obtained for each matrix Si by rearranging
the rows of the matrix. This operation is performed from left to right and from top
to bottom. Then, we construct a training data matrix S, of size d × N, having the
vector si as its ith column (S = [s1 s2 . . . sN ]). The filter Hm (u,v) is also rearranged
and represented by a d-element column vector hm .
Using Parseval’s Theorem, the energy Ei of the ith correlation (between the filter
and the ith training image) can be written as
d1 −1 d
2 −1
1
Ei = |Hm (u, v)|2 |Si (u, v)|2 = h m + Di h m (8.13)
d u = 0v = 0
where + is the transpose conjugate operation and Di is a d × d diagonal matrix

containing the power spectrum of training image i along its diagonal, i.e.
Di (k, k) = |si (k)|2 (8.14)
The ACE is defined as follows:
1
N
AC E = E i = h m + Dh m (8.15)
N i=1
1 N
where D = Di is a d × d diagonal matrix containing the average power
N i=1
spectrum of the training images placed along its diagonal. Minimising the ACE
measure will provide sharp correlation peaks, thereby making peak detection and
location relatively easily [13].
If the noise in the input images is of zero mean, additive and stationary, then the
ONV [15] is as follows:
O N V = h m + Ch m (8.16)
where C is a d × d diagonal matrix containing the elements of the input noise power
spectral density along its diagonal. In many applications where noise power spectral
density is unknown a good model uses white noise which assumes C = I (i.e. the
identity matrix).
The OTSDF filter [15] finds a compromise between reducing the ACE and reduc-
ing the ONV by minimising the following energy function:
E(h m ) = α(AC E) + β(O N V ) = h m + (α D + βC)h m (8.17)
where α 2 + β 2 = 1 and 0 ≤α, β ≤ 1.

Finally, and by using the method of Lagrange multipliers as described in [13],
the OTSDF filter hm that minimises the energy function E(hm ) while ensuring that
the correlation outputs at the origin take pre-specified values is given by
h m = T −1 S(S + T −1 S)−1 w (8.18)

√
where T = α D + 1 − α 2 C and w is a N × 1 column vector (i.e. w =
[w1 w2 . . .wN ]T ) containing the desired peak values for the training images. Vary-
ing the value of the parameter α allows us to trade-off between noise tolerance and
peak sharpness. When α = 1, the OTSDF filter reduces to the minimum average
correlation energy (MACE) filter [13] and when α = 0, it leads to the minimum
variance synthetic discriminant function (MVSDF) filter [8].
8.3.4 Unconstrained OTSDF Filter
The OTSDF filter described above constrains the correlation outputs at the origin to
be the same for the training images from the same class. However, these imposed
pre-specified values are designed only for training images and not for test images;
the correlation peaks are expected to decrease considerably for non-training images.
Further imposing such hard constraints will decrease the number of possible solu-
tions during the filter design and will reduce the chances of finding a filter with
better performances [14]. Thus, using these constraints may be counterproductive
and unjustifiable. For this reason, other filter designs have focused on removing
these constraints on the correlation peaks and requiring instead that the average cor-
relation height (ACH) [14], defined below, be large.

1 N
+
AC H = h m si = h m + m s (8.19)
N
i =1
where ms is the average of the N vectors s1 , s2 , . . ., sN . The UOTSDF filter is

obtained by maximising
|AC H |2 hm +ms ms +hm

J (h m ) = = (8.20)
α AC E + β O N V h m + (α D + βC)h m
where α 2 + β 2 = 1 and 0 ≤α,β ≤1. The maximisation of J(hm ) results in a filter of

the form [14]:

* −1
hm = α D + 1 − α2C ms (8.21)
Similar to an OTSDF filter, varying the value of the parameter α in Eq. (8.21)
allows us to trade-off between the correlation peak sharpness and noise tolerance of
the UOTSDF filter. Additionally, by comparing Eqs. (8.18) and (8.21), one can note
that Eq. (8.21) is simpler to implement since it only involves inverting a diagonal
matrix, while Eq. (8.18) requires the inversion of a N × N matrix, which makes the
UOTSDF filter more attractive from a computational standpoint.
8.3.5 Tests and Results

To gauge the filter’s effectiveness, the same reference database and the same test
images, as employed in Section 8.2.4, were used. The filters’ training stage was
performed using complete shoeprint images while the testing stage was carried out
using the generated partial shoeprints.
During the training, each complete shoeprint image is rotated in increments of
1◦ from –10◦ to 10 giving a set of 21 training images. These 21 images were used
to construct both the OTSDF and UOTSDF filters using Eqs. (8.18) and (8.21),
respectively (with α close to 1 and C = I, the identity matrix).
During the evaluation process, and for each type of filters, every test image was
used as input and matched against all 100 filters. Then, a list of reference shoeprints
ranked from the best match (with the highest matching metric) to the worst, was
produced and the rank of the correct match determined.
Finally, for each type of filters and for each matching metric (PH, PCE and PSR)
the proportion of times a correct match appeared first (first rank classification) was
calculated. For the PSR parameters (the sidelobe area widths), we have used the
following values: A = 5,20,40,60,80, B = 20,40,60,80,100 and A<B. The best per-
formance was obtained for A = 20 and B = 100 for both filters.
For different types of alterations and the three different metrics, the filter’s per-
formances are detailed in Table 8.2. From this, it can be seen that the PSR yields the
best performance for both filters. The results are also compared against the MPOC
and PSD-based [4] methods. From these results, it can be seen that the ACFs outper-
form the PSD-based method for all types of distortions/alterations when using the
appropriate metric (ACFs–PSR). Also, they provide a comparable performance to
the MPOC method for noisy shoeprints but are outperformed in the case of blurred
8.4 Conclusion 179
Table 8.2 Rank-1 recognition rates (%) using ACFs-, MPOC- and PSD [4]-based methods for
different alterations
Methods ACFs
PSD MPOC
OTSDF UOTSDF
[4] β=50
Test Images PH PCE PSR PH PCE PSR
Set1: clean partial 96.25 100 91.75 99.75 100 99.5 100 100
Set2: σ = 20 95.75 100 90.5 99.5 100 99.5 100 100
Noisy σ = 40 93.5 100 89.5 99 100 99.25 99.75 100
σ = 60 88.5 100 85.75 97.25 100 98.25 98.5 100
partial
σ= 80 76.5 100 75.75 95 99.25 95.75 97.5 100
Set3: L = 10 28.5 100 12.5 41.25 84.75 22 62.5 92.5
Blurred L = 20 11 100 5 15 49.25 8 25 63.5
L = 30 13 100 3.75 10 30.75 5 12.7 5 42.25
partial
L = 40 13 100 3.5 7.5 19.25 4.25 9 28.5
Set4: θ = 2.5° 60.25 98.75 72.5 97 99.5 85.5 96.25 99.5
Rotated θ = 5° 35.5 52.75 94.25 99.75 100 97.25 99.75 100
θ = 7.5° 25.5 29.75 76 97.75 99.5 87.5 97 99.75
partial
θ = 10° 19 21.25 96 100 100 93.25 99.75 100
Overall average 50.65 84.80 61.28 73.75 83.25 68.84 76.75 86.61
shoeprints. As expected, both filters provide better results than the MPOC method
when matching rotated shoeprints: the filters had recognition rates of over 99% for
all rotated shoeprints when using the best metric. It can also be observed that the
performance of the UOTSDF filter is generally better than that of an OTSDF filter.
The best overall performance was obtained when using the unconstrained filter with
the PSR metric, where 86.61% of the time a correct match was ranked first for all
the 5200 test images.
8.4 Conclusion
In this chapter, we have described how correlation-based techniques can be used

for automatic shoeprint classification problems. In particular, we have demonstrated
that the MPOC method and ACFs (i.e. OTSDF and UOTSDF filters) can be suc-
cessfully used for classifying low- quality partial shoeprints. The two approaches
are shift invariant and have high noise immunity. In the noisy case, 100% of the time
a correct match was ranked first for all noisy test images when using the MPOC or
the UOTSDF-PSR methods.
Furthermore, the MPOC method is very robust to blurring distortions but sensi-
tive to small rotations. The ACFs on the other hand, are less robust to blurring than
the MPOC but are tolerant to small rotations.
Both MPOC and ACFs-PSR outperform the PSD-based method regardless of the
degradations’ type. Finally, as future work, we propose to extend these methods on
larger databases and to investigate the use of other more advanced correlation filters.
Shoeprint’s alignment and/or enhancement before the matching process can also be
considered.
References
1. G. Alexandre, “Computerized classification of the shoeprints of burglar’s soles”, Forensic
Science International, vol. 82, pp. 59–65, 1996.
2. Z. Geradts and J. Keijzer, “The image data REBEZO for shoeprint with developments for
automatic classification of shoe outsole designs”, Forensic Science International, vol. 82, pp.
21–31, 1996.
3. A. G. Alexander, A. Bouradine and D. Crookes, “Automatic classification and recognition of
shoeprints,” Special Issue of the Information Bulletin for Shoeprint/Toolmark Examiners, vol.
6. no. 1, pp. 91–104, March 2000.
4. P. D. Chazal, J. Flynn and R. B. Reilly, “Automated processing of shoeprint images based on
the Fourier transform for use in forensic science,” IEEE Transactions on Pattern Analysis and
Machine Intelligence, vol. 27, no. 3, pp. 341–350, March 2005.
5. L. Zhang and N. M. Allinson, “Automatic Shoeprint Retrieval System for use in Forensic
Investigations,” 5th Annual UK Workshop on Computational Intelligence, 2005.
6. A. V. Oppenheim and J. S. Lim, “The importance of phase in signals,” IEEE Proceedings, vol.
69, no. 5, pp. 529–541, 1981.
7. K. Takita, T. Aoki, Y. Sasaki, T. Higuchi and K. Kobayashi, “High-accuracy subpixel
image registration based on phase-only correlation,” IEICE Transactions on Fundamentals,
vol. E86-A, no. 8, pp. 1925–1934, August 2003.
8. B. V. K. V. Kumar, “Tutorial survey of composite filter designs for optical correlators,”
Applied Optics, vol. 31, pp. 4773–4801, 1992.
9. K. Venkataramani, S. Qidwai and B. V. kumar, “Face authentication from cell phone cam-
era images with illumination and temporal variations,” IEEE Transactions on Systems, Man,
Cybernetics, vol. 35, no. 3, pp. 411–418, August 2005.
10. P. Hennings, B. V. Kumar and M. Savvides, “Palmprint classification using multiple advanced
correlation filters and palm-specific segmentation,” IEEE Transaction on Information Foren-
sics and Security, vol. 2, no. 3, pp. 613–622, September 2007.
11. Y. Li, Z. Wang and H. Zeng, “Correlation filter: an accurate approach to detect and locate
low contrast character strings in complex table environment,” IEEE Transactions on Pattern
Analysis and Machine Intelligence, vol. 26, no. 12, pp. 1639–1644, December 2004.
12. E. Perez and B. Javidi, “Nonlinear distortion-tolerant filters for detection of road signs in
background noise,” IEEE Transactions on Vehicular Technology, vol. 51, no. 3, pp. 567–576,
May 2002.
13. A. Mahalanobis, B. V. Kumar and D. Casasent, “Minimum average correlation energy filters,”
Applied Optics, vol. 26, pp. 3633–3640, 1987.
14. A. Mahalanobis, B. V. Kumar, S. R. F. Sims and J. F. Epperson, “Unconstrained correlation
filters,” Applied Optics, vol. 33, pp. 3751–3759, 1994.
15. B. V. Kumar, D. Carlson and A. Mahalanobis, “Optimal trade-off synthetic discriminant func-
tion filters for arbitrary devices,” Optics Letters, vol. 19, no. 19, pp. 1556–1558, 1994.
16. A. Mahalanobis, B. V. K. V. Kumar and S. R. F. Sims, “Distance classifier correlation filters
for multi-class target recognition,” Applied Optics, vol. 35, pp. 3127–3133, 1996.
17. M. Alkanhal and B. V. K. V. Kumar, “Polynomial distance classifier correlation filter for
pattern recognition,” Applied Optics, vol. 42, pp. 4688–4708, 2003.
Chapter 9
Automatic Shoeprint Image Retrieval
Using Local Features
9.1 Motivations
Currently, only a very few advanced techniques on shoeprint image noise reduction,
robust thresholding (segmentation) and pattern descriptors have been proposed for
use in shoeprint image matching and retrieval. However, a number of existing and
elegant techniques of pattern (shape) descriptors have been ruled out due to the
difficulty of the segmentation, i.e. to separate shoe mark profiles from backgrounds,
and to further separate patterns in a profile from each other.
Local image features are computed from distinctive local regions and do not
require a priori segmentation. They have proved to be very successful in applica-
tions such as image retrieval and matching [6, 9, 15, 22], object recognition and
classification [5, 12, 14, 18, 21] and wide baseline matching [24]. Consequently,
many different scale and affine invariant local feature detectors, robust local fea-
ture descriptors and their evaluations have been widely investigated in the literature
[1, 2, 4, 8, 12, 13, 16, 17, 19, 20, 22].
This chapter is concerned with the retrieval of scene-of-crime (or scene)
shoeprint images from a reference database of shoeprint images by using a new
local feature detector and an improved local feature descriptor. Similar to most other
local feature representations, the proposed approach can also be divided into two
stages: (i) a set of distinctive local features is selected by first detecting scale adap-
tive Harris corners where each corner is associated with a scale factor. This allows
for the selection of the final features whose scale matches the scale of blob-like
structures around them and (ii) for each feature, an improved Scale Invariant Fea-
ture Transform (SIFT) descriptor is computed to represent it. Our investigation has
led to the development of two novel methods which are referred to as the Modified
Harris–Laplace (MHL) detector and the Modified SIFT descriptor, respectively.
9.2 Local Image Features

K. Mikolajczyk et al. [20] have given a detailed comparison of six state-of-the-
art local feature detectors and have concluded that all detectors involved were
complementary, and if combined, the performance can be improved. They have also
Technology, DOI 10.1007/978-0-387-09532-5 9,

182 9 Automatic Shoeprint Image Retrieval Using Local Features
stated that the Harris–Affine [17] and Hessian–Affine [20] detectors provide more
features than other detectors. This can be particularly useful when matching scenes
with occlusion and clutter, though the Maximally Stable Extremal Region (MSER)
detector [13] achieves the highest score in many cases in terms of repeatability. In
our case, an affine invariant local feature refers only to the translation-, rotation- and
scale-invariant (covariant) local regions. We shall not consider other general affine
or even perspective invariant cases, which rarely happen in the case of shoeprint
images. Furthermore, K. Mikolajczyk and Schmid in [16, 19] have evaluated the
performance of ten state-of-the-art local feature descriptors in the presence of real
geometric and photometric transformations. They have claimed that an extension of
the SIFT descriptor [11, 12], called Gradient Location and Orientation Histogram
(GLOH) [20] performs slightly better than the SIFT itself while both outperform the
other descriptors. The authors have also suggested that local feature detectors, such
as Hessain–Affine and Hessian–Laplace, which mainly detect the blob-like struc-
tures, can only perform well with larger neighbourhoods. However, this conflicts
with one of the basic properties of local image features – the locality.
Typically, a local image feature should have four properties: locality, repeatabil-
ity, distinctiveness and robustness against different degradations. The above studies
suggest that none of the current local feature representation can outperform others
in terms of all the above four properties. Therefore an efficient local feature rep-
resentation should be a trade-off of these properties. The work described in this
chapter aims to firstly detect a set of distinctive local features from an image by
combining a scale adaptive Harris corner detector with an automatic Laplace-based
scale selection. Here, the location of the features is determined by the scale adaptive
Harris corner detector where the characteristic size of a local feature depends on
the scale of the blob-like structure around this corner, which is determined by the
automatic Laplace-based scale selection. Then, for each local feature, an improved
SIFT descriptor is computed to represent the feature. This descriptor actually further
enhances the GLOH method by using a circle binary template for rotation invari-
ance, and by binning the SIFT histogram into a range of 180◦ rather than the original
360◦ for complement image robustness. Finally, the matching of the descriptors is
carried out by combining nearest neighbour measure with threshold-based screen-
ing; two descriptors match only if one is the nearest neighbour of the other (i.e. they
are distant by a value smaller than a threshold). Then, the distance between two
shoeprint images is computed from the matched pairs.
9.2.1 New Local Feature Detector: Modified Harris–Laplace

Detector
A local feature here refers to any point and its neighbourhood in an image where
the signal changes significantly in two dimensions. Conventional “corners”, such
as L-corners, T-junctions and Y-junctions satisfy this, but so do isolated points,
the endings of branches and any location with significant 2D texture. Also, all of
9.2 Local Image Features 183
these local structures have a characteristic size. Mikolajczyk and Schmid [15] have
extended the Harris corner detector to a multiscale form to detect the corners at
different scales [7]. In an earlier work [10], Lindeberg has presented in detail a fea-
ture detector with an automatic scale selection where a Laplace of Gaussians (LoG)
transform has been demonstrated to be successful in scale selection of blob-like
structures. Likewise in [15], the authors have proposed a new Harris–Laplace detec-
tor by exploiting (i) the high accuracy of location of a Harris corner detector and
(ii) the robust scale selection of the LoG detector. However, the way in which they
are combined does not necessarily result in an accurately located and stable scaled
local feature detector, since the detector is actually required to determine when the
response of the Harris measure reaches a maximum in the spatial domain and so
does the response of the LoG at the same location but in the scale direction. In most
cases, the unstable component of such a detector is related to the scale selection
since the stability of a scale selection based on LoG is conditional upon this mea-
sure being computed at the centre of a blob structure, rather than at locations with
the Harris maxima. In this section, we will propose a solution to this problem. Fol-
lowing the name of the Harris–Laplace detector, we call this detector a Modified
Harris–Laplace detector.
9.2.1.1 Modified Harris–Laplace (MHL) Detector

A scale adaptive Harris detector is based on an extension of the second moment
matrix of Eq. (9.1), where, σi , σd and f α are the integration scale, the differen-
tiation scale and the derivative computed in the direction of α, respectively [15].
The strength of a scale adaptive corner can be measured by det( A(x, σi , σd )) −
κ · t r ace2 ( A(x, σi , σd )).

f 2x (x, σd ) f x f y (x, σd )
A(x, σi , σd ) = σd2 · g(σi ) ∗ (9.1)
f y f x (x, σd ) f 2y (x, σd )

LoG(x, σ ) = σ 2 L x x (x, σ ) + L yy (x, σ ) (9.2)
An example of a scale-adaptive Harris corner detection on a synthetic image is

shown in Fig. 9.1, where the figure suggests that for one corner, there might be a
series of points detected by this approach depending on the diffusion and the spatial
extent of the corner. Obviously, only a few of them (normally one or two) represent
the characteristic size of this corner. Here, a LoG-based automatic scale selection
can be applied to remove these redundant points which are actually the drift of the
same corner due to scale-space smoothing. The normalised LoG response of an
image is defined by Eq. (9.2), where σ is the scale and L αα is the second derivative
computed in the direction of α. The principle of this technique stems from the fact
that the characteristic scale of a local structure very often corresponds to the local
extremum of the LoG along the scale direction at its centre (see the right curves of
Fig. 9.2). However, it must be noted that, in most cases, this principle does not work
well for structures like corners. Figure 9.2 shows the LoG responses at the centre of
Fig. 9.1 Example of adaptive

Harris corner detection, and
LoG-based scale selection
a blob structure (red cross) and a corner (blue cross) over the scales on two synthetic
128 × 128 images, where the sizes of white squares for (a) and (b) are 11 × 11 and
21 × 21, respectively. The maxima of the red curves clearly reflect the scales of the
white square in (a) and (b). The figure also illustrates why the scale selection of
the Harris–Laplace detector is unstable, i.e. the middle curves (blue) have too many
extrema, thus leading to redundant and unstable scales.
It is also noted from Fig. 9.1 that the scale of a blob structure (red circle) selected
by LoG can be related to the scale of the corners around the structure of this blob.
Actually, in most cases, it is reasonable to assume that a corner can be associated
with a blob structure around this corner. Based on this assumption, only those can-
didate corners whose scale has a predefined relationship with the scale of a blob
structure around them can be selected as a characteristic scale of that corner. There
are two factors which should be considered for this strategy: (i) the search region
and (ii) the relationship between the scale of the blob structure and that of the cor-
ner. Figure 9.3 illustrates this strategy, where the red solid circle (r ad i us = r)
denotes a blob structure while the red dashed circles denote the search region with
the radius of r 1 and the reference circle with the radius of r 0 . The green circles are
the candidates of the same corner located at the top-left of the square, and the blue
circle represents the selected characteristic scale of this corner. (here only the can-
didate scale whose value is nearest to the reference radius (r 0 ) will be selected as
the characteristic scale of the corner). In all of our experiments, Eq. (9.3) is applied
to relate the reference scale r 0 , the search region r 1 with the blob scale r.
√
√ 2
2 · r0 = r = · r1 (9.3)
2
Fig. 9.2 The responses of the 1 1

LoG measures at different
0.9 0.9
locations along the scale
direction, where the x-axis 0.8 0.8
denotes the scale direction.
The middle curves correspond 0.7 0.7
to the top-left corner of the
0.6 0.6
white square, and the right
curves correspond to the 0.5 0.5
centre of the white square
0.4 0.4
0.3 0.3
0.2 0.2
0.1 0.1
0 0
5 10 15 5 10 15
(a)
1 1
0.9 0.9
0.8 0.8
0.7 0.7
0.6 0.6
0.5 0.5
0.4 0.4
0.3 0.3
0.2 0.2
0.1 0.1
0 0
5 10 15 5 10 15
(b)
9.2.1.2 Repeatability Evaluation

The repeatability score of a local feature detector is given by computing the ratio
between the number of correct matches and the smaller number of features detected
in one of the images. A typical definition of repeatability has considered the over-
lap error, which is defined as the error of the corresponding features in terms of
area [20]. Two features are claimed to correspond if the overlap error is less than
a predefined threshold. Here, to put our detector into the context of the studies
in this problem area, we apply the codes and the benchmark images from [5] to
Fig. 9.3 Illustration of

automatic scale selection
based on the scale of a
blob-like structure
r r0
r1
measure the repeatability of detectors, and compare the performance of our detector
with three other similar detectors (Fig. 9.4), namely Harris Laplace (HarLap), Hes-
sian Laplace (HesLap) and Harris Hessian Laplace (HarHesPal) detectors. Two sets
of images (Boats and Bikes) have been tested in this experiment, each containing
six images with either scale decreasing (zoom out) or blur increasing. For each set
of image sequence, five repeatability scores have been computed between the first
image and the reminders, and the results are shown in Fig. 9.4. It is worth noting
that Harris and Hessian measures can detect two different structures (corner-like
and blob-like). Therefore, by simply combining them one can build another detec-
tor, referred to as the Harris–Hessian–Laplace detector, which considers the most
significant responses of the Harris and Hessian measures, since the spatial location
of a feature, while the scale is still determined by LoG measure.
Figure 9.4 suggests that, in most cases, our proposed detector outperforms the
other three in terms of repeatability. Here, it should be noted that we have limited
the number of the raw features to under 400 by a universal significance measure of
a feature (this measure is defined as the multiplication of the response of LoG and
the area of the local region).
Figure 9.5 shows an example of image matching based on the proposed MHL
detector. The local feature descriptor and the matching strategy used in this matching
process will be detailed in the following sections. The main transformations between
two images are a scale change (scale ratio = 2.8) and an in-plane rotation [20]. In
this example, 23 out of 32 matches are correctly computed thus outperforming the
Harris–Laplace detector (where only 6 out of 26 matches are correctly computed),
provided that all other conditions are same.
9.2.2 Local Feature Descriptors

The local photometric descriptors computed in this work, as mentioned in the first
section, are a further extension of the GLOH, originally from SIFT descriptors. Our
method is different from GLOH in that
70
HarLap
HesLap
60 HarHesLap
MHL
50
repeatability (%)
40
30
20
10
1 1.5 2 2.5 3 3.5 4 4.5 5
scale change
(a)
85
HarLap
80 HesLap
HarHesLap
MHL
75
70
repeatability (%)
65
60
55
50
45
1 1.5 2 2.5 3 3.5 4 4.5 5
increasing blur
(b)
Fig. 9.4 The repeatability comparison of four detectors. (a) is on the images with scale and
rotation changing. (b) is on the images with increasing blur. (Referring to the Boat and the Bike
images from [5])
Fig. 9.5 Matching result of two images, 23 out of 32 matches are correct. (Refer to the Boat
images from [5])
(i) First, we apply a circular binary template on each normalised local region to
increase the rotation invariance of the descriptor. Both SIFT and GLOH obtain
the rotation robustness by weighting the local region with a Gaussian window.
However, very often, when one chooses a larger sigma for the Gaussian kernel,
the descriptors computed for the region are distinctive but rotation sensitive.
On the other hand, when one chooses a smaller sigma for the Gaussian kernel,
the descriptors are rotation robust but not distinctive. In most cases, it is hard
to choose a proper sigma. In this work, we apply a binary template to limit the
region to a circular one, and meanwhile use a larger sigma for the Gaussian
window to keep the distinctiveness of the descriptors.
(ii) Second, for complement image robustness, we bin the histogram with the orien-
tation range of 180◦ rather than the original 360◦ . In our application of shoeprint
image matching, the complement image robustness is very important, since
often the query shoeprint image from scenes of crime is the complement of
the shoeprint image in the reference database. Complement robustness can be
easily obtained by binning the histogram with the orientation range of 180◦ , i.e.
without considering the polarity of the gradients.
The construction of our local descriptors is similar to GLOH, i.e. we bin the gradi-
ents in a log-polar location grid with three bins in the radial direction and four bins
in the angular direction (the central grid does not apply angular binning), resulting
in a nine location grid. Noting the orientation range of 180◦ , four bins are applied
in the gradient orientations. Finally, the descriptors of an image comprise a N × 36
(4 × 9) matrix, where N is the number of the local features detected in the working
image.
9.2.3 Similarity Measure
The similarity between two images depends on the matching strategy of the local
features. For the sake of retrieval speed, we apply the nearest neighbour and
thresholding jointly to compute the distance between two images, i.e. for each
descriptor in one image, the nearest neighbour in another image is found as a poten-
tial match, then only those matches whose distance is below a threshold are selected
as the final matches. The similarity of two images is computed from the summation
of exp(−d), where d denotes the distance of a match. Of course, there are many
other strategies for computing the similarity or matching score between two images.
The example of image matching in Fig. 9.5 applies the nearest neighbour to obtain
the initial matches, and then the RANSAC (Random Sample Consensus) algorithm
is used to reject mismatches. RANSAC is a general algorithm for robustly fitting
models in the presence of many data outliers. Here, the model is a 3 × 3 fundamen-
tal matrix. The final matches/correspondences are shown in Fig. 9.5.
9.3 Experimental Results
9.3.1 Shoeprint Image Databases

The test databases of shoeprint images used in this work are based on images
provided by Foster and Freeman Ltd (UK) [http://www.fosterfreeman.com] and
Forensic Science Services (UK) [http://www.forensic.gov.uk]. A small selection of
scene-of-crime shoeprint images was also supplied. These are of very poor quality,
and it is not known if the shoe soles which made the scene impression are contained
in the databases.
The starting point for the databases is a dataset of high-quality images collected
from shoemakers, called dClean, consisting of 500 shoeprint images. dClean con-
tains prints from shoes made by most shoe sellers, such as Nike, Adidas, Fila, etc.
and these images are different from each other in terms of patterns (or structures).
With dClean as the base data set, we have generated a number of degraded data
sets using different types of degradations which attempt to emulate some of the
deformations which result in real scene images:
• Gaussian noise shoeprint data set: dGaussian

This data set, called dNoise, consists of 2500 noisy prints. Five different levels
of Gaussian noise is added to each of the 500 shoeprints in the base data set. The
noise level (σ ) varies from 5 to 25 with a step of 5, considering the range of the
grey level is from 0 to 255.
• Partial shoeprint data set: dPartial
This data set, called dPartial, is produced to emulate the fact that scene images
often only contain part of a shoemark. It consists of 2500 partial prints – 5 partial
shoemarks generated from each shoemark in dClean. To create a partial shoeprint
image, the silhouette of a complete shoeprint is extracted first, and then two
points on the silhouette perimeter are chosen to generate a cross-sectional line
joining the two points. To avoid the end points being too close to each other
(resulting in a meaningless partial print), a minimum ratio can be set to bind the
‘BC’
‘S’
‘L’ ‘BP’
Fig. 9.6 ‘S’ – sample, ‘L’ – line, ‘BC’ – complete boundary, ‘BP’ – partial boundary. Left image
on the second row is a “complete” one, while right is a “partial” one
length of the complete boundary. Then several random points around the line are
selected as the samples of the partial boundary. (Figure 9.6 illustrates complete
and partial boundaries.) With these samples, a spline interpolation is applied to
produce the partial boundary. The pixels on one side of the curve are set to 1 or 0
thus producing a partial mask, which is then used to generate a partial shoeprint.
Five partial shoeprint images are generated for each of shoeprint in the dClean
data set. The percentage of the partial shoeprint which remains is varied from
40% to 95%. An illustration of the partial shoeprint creation and one example
are shown in Fig. 9.6.
• Rescaled shoeprint data set: dRescale
This data set, termed dRescale, consists of 2500 rescaled prints where each
shoeprint in dClean has been rescaled with five random scale ratios in the range
of 0.35–0.65. Here, we did not use a scale ratio larger than 1.0, because (i) up-
sampling always has a similar influence to down-sampling on the scale robust-
ness of an approach; and (ii) the original size of the shoeprint images in dClean is
large enough, so any expansion will bring much trouble in the feature extraction
computation.
• Rotated shoeprint data set: dRotate
This data set, called dRotate, is used to test algorithms for rotation invariance
and consists of 2500 rotated prints. Each shoeprint in the base data set has been
rotated with five random orientations in the range of 0◦ –90◦ . The selection of this
range (rather than 0◦ –360◦ ) is based on the fact that the algorithms developed in
this thesis are flip invariant both in horizontal and vertical directions.
• Scene shoeprint data set: dScene

Unfortunately, the scene images from F&F Ltd. do not have corresponding ref-
erence images in the dClean data set. So to test the algorithm on shoemarks with
some of the background and artefacts which occur in real scene images, we have
simulated scene images by combining the actual scene images with the base
data set dClean. For each image in the dClean, we randomly select one of our
small set of actual scene images as the background, and superimpose the “clean”
shoeprint on the background scene, resulting in 500 scene images, called dScene.
(Figure 9.7 shows some examples of the resulting scene images.)
• Complex scene shoeprint image data set: dComplexDegrade
To simulate other kinds of real scene images with several degradations, a small
set of 50 degraded images has been generated. The combined degradations,
totalling 50 images, are produced as follows: (i) Scene background additions
(complex) – the weights for a scene and a clean image are 0.7 and 0.3, respec-
tively, (ii) Partial prints + scene background additions, (iii) Rotations + scene
background additions, (iv) Rescales + scene background images and (v) Patterns
+ scene background images – the patterns are manually extracted from clean
images (normally the size of a pattern is less than 30% of the total size of the
print). The weights for the scene and the clean image (pattern) are set to 0.6 and
0.4 for all of the above combined degradations, respectively. Figure 9.8 shows
some examples of the complex degraded shoeprint images.
The following experiments were carried out to compare the performance, in

terms of cumulative matching curve (CMC), of four signatures: i.e. Edge Direc-
tional Histogram (EDH), Power Spectral Distribution (PSD), Pattern and Topolog-
ical Spectra (PTS), and our Local Image Feature (LIF) method. The experiments
are conducted as follows: we first consider a shoeprint image from the six degraded
data sets as a query image. However, because of the computation requirements, we
randomly choose just 50 images from each of the degraded data sets, except for
dComplexDegrade, where we take all 50 (therefore, 50 trails for each degraded data
set). Then we proceed with a search against the reference data set dClean (contain-
ing 500 shoeprints). For each data set, we compute the CMC curve and the results
are depicted in Fig. 9.9. Besides the above quantitative comparisons, a few retrieval
examples are shown in Fig. 9.10, and Table 9.1 also lists the signature sizes of the
compared methods.
Edge Directional Histogram (EDH) – (Zhang and Allinson [25]) – this technique
assumes that the interior shape information of a shoeprint image is described in
terms of its significant edges. Therefore, the authors have applied a histogram of the
edge direction as the signature of a shoeprint image. This method first extracts the
edges using a Canny edge detector followed by a total of 72 bins which are used
to bin the edge directions quantised at 5◦ intervals. To achieve rotation invariance,
a 1D FFT of the normalised edge direction histogram is computed and used as the
final signature of the shoeprint image.
Power Spectral Distribution (PSD; deChazal et al. [3]) – this method considers
the power spectral distribution as the signature of shoeprint image. To compute the
Fig. 9.7 Examples of shoeprint images from dScene
PSD, one needs to first down-sample an input image, and second take 2D DFT on
the down-sampled image; then the power spectral distribution is computed. Finally,
a masking step is taken to obtain the signature. In the similarity computation, the
(a) (b)
(c)
Fig. 9.8 Examples of synthetic scene shoeprint images; the images of (a), (b) and (c) correspond
to (scale + scene), (scene+complex), and (pattern + scene), respectively
dNoise
1
0.9
1
0.8
0.7
cumulative match score
0.95
0.6
0.9
0.5 0 5 10 15
0.4
EDH
0.3
PTS
0.2 PSD
LIF
0.1 RAND
0
0 20 40 60 80 100
percentage
dPartial
1
0.9
0.8
0.7
0.6
0.5
0.4
0.3
EDH
PTS
0.2
PSD
0.1 LIF
RAND
0
0 20 40 60 80 100
percentage
Fig. 9.9 (continued)

dRecale
1
0.9
0.8
0.7
0.6
0.5
0.4
0.3
EDH
0.2 PTS
PSD
0.1 LIF
data5
0
0 20 40 60 80 100
percentage
dRotate
1
0.9
0.8
0.7
0.6
0.5
0.4
0.3
EDH
PTS
0.2
PSD
0.1 LIF
RAND
0
0 20 40 60 80 100
percentage
Fig. 9.9 (continued)

dScene
1
0.9
0.8
0.7
0.6
0.5
0.4
0.3
EDH
0.2 PTS
PSD
0.1 LIF
RAND
0
0 20 40 60 80 100
percentage
dComplexDegrade
1
0.9
0.8
0.7
0.6
0.5
0.4
0.3
EDH
PTS
0.2
PSD
0.1 LIF
RAND
0
0 20 40 60 80 100
percentage
Fig. 9.9 Performance evaluation of four signatures (EDH, PTS, PSD and LIF) in terms of cumu-
lative matching score on six degraded image data sets. RAND here is the worst case, i.e. the rank
of the images in the reference data set is randomly assigned
d = 0.0304 d = 0.1116 d = 0.1255 d = 0.1434 d = 0.1434
d=0.0149 d = 0.2511 d = 0.3347 d = 0.3347 d = 0.3348
d = 0.0061 d = 0.0066 d = 0.0100 d = 0.0107 d = 0.0120

Fig. 9.10 Examples of shoeprint image retrieval. In each row, the leftmost image is a noisy
query shoeprint from dComplexDegrade, and the rest of the row shows the top ranked shoeprint
images in dClean. The distance is shown under each retrieved image, and the red squares denote
the corresponding patterns contained in the query images
Table 9.1 The signature size for the four techniques
Methods EDH PTS PSD LIF
Signature size 72 120 24,099 19,131
PSD of a query image has to be rotated 30 times, with a 1◦ step, and the largest
similarity value is considered over the 30 rotated versions. In our experiments, the
step of rotation is removed because it is computationally intensive and also a rotation
range of 30◦ is not suitable in most practical situations.
Pattern and Topological Spectra (PTS; Su et al. [23]) – this method consid-
ers the problem of automatic classification of noisy and incomplete shoeprint
images and employs the principle of topological and pattern spectra. A Topo-
logical Spectrum for a shoeprint image, based on repeated open operations with
increasing size of structuring elements, giving a distribution of Euler numbers
is computed. The normalised differential of this produces the topological spec-
trum and a hybrid algorithm, which uses a distance measure based on a combi-
nation of both spectra as the feature of a shoeprint image, is proposed and applied
successfully.
The above results suggest that
(i) For the data sets degraded with Gaussian noise, cutting-out, and rescaling, the
signatures of PSD and LIF can achieve almost perfect results. Further, LIF can
achieve similar performance for the data sets degraded by rotation and scene
background addition.
(ii) The performance of EDH and PTS is marginally worse than that of PSD and
LIF for the degradations of Gaussian noise, cutting-out, rescaling and rotation.
However, both methods are efficient, noting that the cost (signature size) of the
two signatures is significantly smaller than for the other two. Further it can be
observed that PTS outperforms EDH in most cases (with the exception on the
rescaled database).
(iii) The signature of LIF works very well for all kinds of degradations. It clearly
outperforms other signatures on the data set with the most complex degrada-
tions. However, LIF is more computationally intensive than both EDH, and
PTS. (For instance, it takes about 40 sec, on average, to compute the LIF of
a shoeprint image with the size of 768 × 280 on our machine – Pentium 4
CPU 2.40 GHz, 760 MB of RAM, while it takes less than 1 sec and around
2 seconds for computing the EDH and PTS of an image with the same size,
respectively).
Two further shoeprint matching examples based on local features are given in Fig.
9.11. The synthetic scene images contain degradations of rotation, rescaling, pattern
segmentation and scene addition. It can be seen from Fig. 9.11(a,b) that more than
80 percent of feature matches are correct.
9.4 Summary 199
Fig. 9.11 Examples of

shoeprint matching based on
local image features
(a)
(b)
9.4 Summary
This chapter has discussed a local feature detector (Modified Harris–Laplace detec-
tor) which employs a scale-adaptive Harris corner detector to determine the local
feature candidates and a Laplace-based automatic scale selection strategy in order
to select the final local features. We have further improved the widely used local fea-
ture descriptors to be more robust to rotation and complement operations. To assess
the performance of the system, a set of synthetic scene shoeprint images (modelling
real world degradations) were used. A number of experiments on shoeprint image
matching and retrieval were also conducted.
The experimental results have indicated that (i) compared with the Harris–
Laplace detector, the Modified Harris–Laplace detector provides more stable local
regions, (ii) the local image descriptors perform significantly better than the global
descriptors on shoeprint image matching and retrieval.
Further issues to be investigated include
• to further reduce the dimensions of a local feature descriptor. Even though we

have taken some measures to reduce this, such as applying 36 dimensions instead
of the original 128 dimensions, and limiting the number of detected local features
to under 400;
• to develop a fast and accurate matching strategy to deal with a large shoeprint
image database. The current matching based on nearest neighbour and thresh-
olding is fast but not accurate enough, while a more accurate matching strategy
based on RANSAC is computationally intensive;
• to develop more advanced local feature detectors and descriptors;
• to extend the evaluation of shoeprint image retrieval and matching using local
image features with real scene images.
References
1. H. Bay, T. Tuytelaars, and L. V. Gool, “SURF: Speed-Up Robust Features,” ECCV’06,
pp. 404–417, 2006.
2. G. Carneiro, and A. D. Jepson, “The distinctiveness, detectability, and robustness of local
image features,” CVPR’05, vol. 2, pp. 296–301, 2005.
3. P. D. deChazal, J. Flynn and R. B. Reilly, “Automated processing of shoeprint images based
on the Fourier transform for use in forensic science, ” IEEE Transactions on Pattern Analysis
and Machine Intelligence, vol. 27, no. 3, pp. 341–350, 2005.
4. G. Dorko and C. Schmid, “Maximally stable local descriptor for scale selection,” ECCV’06,
pp. 504–516, 2006.
5. R. Gal and D. Cohen-Or, “Salient Geometric Features for Partial Shape Matching and Simi-
larity,” ACM Transactions on Graphics, vol. 25, no.1, pp. 130–150, 2006.
6. V. Gouet and N. Boujemaa, “Object-based queries using colour points of interest,” IEEE
Workshop on Content-Based Access of Image and Video Libraries (CVPR/CBAIVL) Hawai,
USA, 2001.
7. C. Harris and M. Stephens, “A combined corner and edge detector,” Alvey Vision Conference,
Manchester, UK, pp. 147–151, 1988.
8. T. Kadir, A. Zisserman and M. Brady, “An affine invariant salient region detector,” ECCV’04,
pp. 404–416, 2004.
9. L. Ledwich and S. Williams, “Reduced SIFT Features for Image Retrieval and Indoor Local-
isation,” Australasian Conference on Robotics and Automation, Canberra, 2004.
10. T. Lindeberg, “Feature Detection with Automatic Scale Selection,” IJCV, vol. 30, no. 2,
pp. 77–116, 1998.
References 201
11. D. Lowe, “Object Recognition from Local Scale-Invariant Features,” ICCV’99,

pp. 1150–1157, 1999.
12. D. Lowe, “Distinctive image features from scale-invariant keypoints,” IJCV, vol. 60, no. 2,
pp. 91–110, 2004.
13. J. Matas, O. Chum, M. Urban and T. Pajdla, “Robust wide baseline stereo from maximally
stable extremal regions,” BMVC, vol. 1, pp. 384–393, 2002.
14. X. Ma and W. Eric, L. Grimson, “Edge-based rich representation for vehicle classification,”
ICCV’05, pp. 1550–5499, 2005.
15. K. Mikolajczyk and C. Schmid, “Indexing based on scale invariant interest points,” ICCV’01,
pp. 525–531, 2001.
16. K. Mikolajczyk and C. Schmid, “A performance evaluation of local descriptors,” CVPR’03,
vol. 2, pp. 257–264, 2003.
17. K. Mikolajczyk and C. Schmid, “Scale and Affine invariant interest point detectors,” IJCV,
vol. 60, no. 1, pp. 63–86, 2004.
18. K. Mikolajczyk, B. Leibe and B. Schiele, “Local Features for Object Class Recognition,”
ICCV’05, pp. 1792–1799, 2005.
19. K. Mikolajczyk and C. Schmid, “A performance evaluation of local descriptors,” IEEE Trans-
actions on PAMI, vol. 27, no. 10, pp. 1615–1630, 2005.
20. K. Mikolajczyk, T. Tuytelaars, C. Schmid, A. Zisserman, J. Matas, F. Schaffalitzky,
T. Kadir and L. Gool, “A comparison of affine region detectors,” IJCV, vol. 65, no. 1–2,
pp. 43–72, 2005.
21. B. Platel, E. Balmachnova, L. M. J. Florack and B. M. Ter Harr Romeny, “Top-points as
interest points for image matching,” ECCV’06, Part I, LNCS 3951, pp. 418–429, 2006.
22. C. Schmid, R. Mohr and C. Bauckhage, “Evaluation of interest point detectors,” IJCV,
vol. 37, pp. 151–172, 2000.
23. H. Su, C. Crookes and A. Bouridane, “Shoeprint image retrieval based on pattern and topo-
logical spectra,” Proceedings of the Irish Machine Vision and Image Processing Conference,
IMVIP 2007, Dublin, Ireland, Aug. 2007.
24. T. Tuytelaars and L. Van Gool, “Matching widely separated views based on affine invariant
regions,” IJCV, vol. 59, no. 1, pp. 61–85, 2004.
25. L. Zhang and N. M. Allinson, “Automatic shoeprint retrieval system for use in forensic
investigations,” 5th Annual UK Workshop on Computational Intelligence, 2005.
Index
A shoeprint matching algorithm in,

ACE, see Average correlation energy 169–170
ACH, see Average correlation height spectral weighting functions in,
Acquisition module, 12–13 168–169
Advanced Correlation Filters (ACFs), in translation and brightness properties of
shoeprints classification, 165, POC function, 168
172–174 See also Shoeprint image detection
matching metrics in, 175 Automatic target recognition applications, 172
OTSDF filter, 176–177 Average correlation energy, 176
outcomes of, 178–179 Average correlation height, 177
unconstrained OTSDF filter, 177–178
See also Phase-Only Correlation (POC) B
technique, in shoeprint classification 2n -Band DFB, 35–36
Alexandre’s shoemark system, 161 Band-pass-type spectral weighting function,
See also Shoemark classification systems usage, 168–170
Alpha stable model, in watermark detection, Between-class scatter matrix, 28
129–130 Biometric-based security, 1
See also Watermarking technique Biometric-based systems, attack types, 117
Amplitude modulation watermarking method, Biometric data
for fingerprint images, 123–124 limitations and threats of, 118
See also State-of-the-art security, watermarking techniques in,
Appearance-based approaches 118–119
for feature extraction, 25 Biometric identification system, phases
of recognition system, 7, 8 data acquisition process, 12–15
Aqueous humor, 52 feature extraction process, 15–16
ATR applications, see Automatic target matcher algorithms, 16
recognition applications performance evaluation, 17–18
Augmented Gabor-face vector, 32–33 testing of, 17
Automated finger imaging systems (AFIS), 3 Biometrics
Automatic shoeprint classification techniques, characterictics, 2
approaches, 165 defined, 1
ACF method in modalities, 2–5
matching metrics in, 175 recognition, 1, 5
OTSDF filter, 176–177 recognition application
outcomes of, 178–179 databases, 9
unconstrained OTSDF filter, 177–178 data localisation, 6–7
POC technique in methods to extract distinguishing
experimental outcomes of, 170–172 features of an image, 8–9
POC function, 166–167 normalisation techniques, 7–8
203
204 Index
Biometrics (cont.) See also Shoemark classification systems

pre-processing techniques, 7–8 Denial of service, 118
template matching, 9 DFT, see Discrete Fourier Transform
verification (or authentication), 5 DGaussian, see Gaussian noise shoeprint
watch-list applications, 6 data set
Birkett system, in shoemark feature Digital watermarking encoder and
classification, 158–159 decoder, 120
See also Forensic science See also Generic watermarking system
1-Bit watermarking, role, 121 Directional filter bank (DFB), 22, 33–37
Blind watermarking, usage, 122 Directional images, 36–37
See also Generic watermarking system Discrete circular active contour (DCAC),
54–55
C Discrete Cosine Transform (DCT), 79, 122
Chaotic parameter modulation, 124 Discrete Fourier Transform (DFT), 79, 122
Chinese Academy of Sciences – Institute of Discrete Wavelet Transform (DWT), 79, 122
Automation (CASIA) eye image Distance Classifier Correlation Filter, 172
database, 65, 72 DOS, see Denial of service
Chou’s method, of watermarking schemes, DPartial, see Partial shoeprint data set
87–92 DRescale, see Rescaled shoeprint data set
CMC, see Cumulative matching curve DRotate, see Rotated shoeprint data set
Combined multiresolution feature extraction DScene, see Scene shoeprint data set
techniques, 72–73 DWT, see Discrete Wavelet Transform
Complex scene shoeprint image data set, 191 DWT coefficients, modelling, 132–135
See also Local image features (LIF)
Composite filters, see Advanced correlation E
filters (ACFs), in shoeprints Edge directional histograms, 165, 191
classification Edge map detection, 57–62
Contour detection process, 54–55 for iris boundaries, 62
Contrast masking, 86 EDH, see Edge directional histograms
Cornea, 52 Eigenfaces’ method, 21
Correct (or Genuine) identification rate (CIR), Electrostatic shoemark lifting, 153–154
5, 23 See also Forensic science
Correlation-based approaches, in shoeprints Equal error rate (ERR), 18
classification, 165 Euclidean distance decoder, 96
Costa’s optimal random codebook, 96 Even Symmetry Gabor filters, 52
Covert systems, 13 Exclusive OR (XOR) operation, 71
CPM, see Chaotic parameter modulation Eyelashes, isolation, 63
Cumulative matching curve, 191 Eyelid regions, isolation, 63
D F
Data hiding method, for fingerprint Face detection, 23, 25
images, 123 Face localisation, 23
See also State-of-the-art Face recognition basics
Data storage, 14 application of Watch-List task, 23
Daugman’s algorithm, 65 with FERET database, 37, 43–45
Daugman’s Integro-Differential Operator, independent component analysis (ICA),
53–54 27–28, 39–41
Daugman’s method, for iris recognition, 73–74 linear discriminant analysis (LDA),
Daugman’s Rubber sheet model, 64 28–29, 41
DCCF, see Distance classifier correlation filter principal component analysis (PCA),
DComplexDegrade, see Complex scene 26–27, 38–39
shoeprint image data set recognition test, 22–23
DCT, see Discrete Cosine Transform steps
De Chazal’s system, 163 feature extraction, 25–26
Index 205
localisation and detection, 23–24 from crime scenes, 149–150

matching, 26 casts making of shoemarks, 152–153
normalisation and pre-processing, computerised system, data entry
24–25 into, 157
subspace discriminant analysis (SDA), electrostatic shoemark lifting, 153–154
29–31, 41–43 perfect shoemark scan in, 154–155
using filter banks photography of shoemarks, 151–152
directional filter bank, 33–37 procedures of, 150
Gabor filter bank, 31–33 processing of shoemarks, 155–156
verification test, 22 recovery of shoemarks from snow, 154
with YALE Face database, 37 shoemarks, gelatine lifting of, 153
Face recognition vendor test (FRVT), 22 suspect’s shoe, cast making in, 155
Failure to capture rate (FCR), 18 transfer/contact prints, 150–151
Failure to enrol (FTE), 18 limitations in, 143–144
False acceptance rate (FAR), 4, 5, 18, 22, 118 in applications of, 144–146
False alarm rate, 6, 18 automating shoemark classification,
False identification rate (FIR), 5, 23 146–147
False rejection rate (FRR), 5, 18, 22 importable classification schema,
FAR, see False acceptance rate (FAR) 148–149
Fast Fourier Transform, 165 inconsistent classification, 147–148
FastICA method, 28 shoemark processing time
Feature-based approaches restrictions, 149
for feature extraction, 25 shoemark recognition, methods in,
of recognition system, 8 157–158
Feature extraction process, 15–16 accidental characteristics, classification
categorisation, 26 on, 159–160
Feature extractor algorithm, 16 feature-based classification, 158–159
Feature sets, of biometric systems, 14–15 Four-band DFB, 35
FERET database, 37, 43–45 Fourier transform-based matchers, 16
FFT, see Fast Fourier Transform Fourier transforms, 55
Fingerprint authentication, signature-based Frequency Domain Matchers, 16
watermarking technique for, 124 FVC, see Fingerprint Verification Competition
See also State-of-the-art
Fingerprint biometrics, 3–4 G
Fingerprint data protection, watermarking, Gabor filter bank (GFB), 21, 31–33,
117–119, 130–131 67–70, 71
DWT coefficients, modelling of, 132–135 Gabor filter dictionary design, 32
generic watermarking system, 119–123 Gabor wavelets function, 31
optimum watermaking detection, 124–127 Gabor wavelet transformation, 32
outcome of, 135–138 Gait recognition, 4
state-of-the-art, 123–124 Gaussian noise shoeprint data set, 189
statistical data modelling in, 127–128 See also Local image features (LIF)
alpha stable model, 129–130 Gaussian smoothing function, 54
laplacian and GGD models, 128–129 Gelatine lifting, of shoemarks, 153
Fingerprint images See also Forensic science
DWT coefficients of, 132–135 Generalised Gaussian distribution, 119
generic watermarking system for, 119–123 Generic biometric-based system, attacks
verification, state-of-the-art for, 123–124 in, 118
Fingerprint recognition, 3 Generic watermarking system, for fingerprint
Fingerprint Verification Competition, 131 images, 119–123
Fisherface algorithm, 28 Genuine individual type decision, 17
Fisher Linear Discriminant (FLD), 28 Geometrical alignment, of image, 7–8
Forensic science, shoemark collections GGD, see Generalised Gaussian distribution
206 Index
GGD models, in watermark detection, 128–129 L

See also Watermarking technique Laplace of Gaussians, 168, 183
Global Feature Extractors, 15 Laplacian pyramid, 51
GLOH, see Gradient location and orientation Learning neural network (LVQ), 51
histogram Likelihood-ratio test, usage of, 125
Gradient location and orientation See also Watermarking technique
histogram, 182 Linear discriminant analysis (LDA), 21,
28–29, 41
H Local image features (LIF), in shoeprint image
Hamming distance matching algorithm, 51, 71 detection,
Hand geometry recognition, 4 181–182, 191
HAS, see Human auditory system experimental outcomes
Hough transform, 54, 62, 65 degraded data sets, 189–191
Human auditory system, 121 EDH and PSD techniques,
Human visual system, 120 191–192, 198
HVS, see Human visual system PTS method, 198–199
Hybrid approaches image similarity measurement, 188–189
for feature extraction, 25 local photometric descriptors, 186–188
of recognition system, 9 modified Harris–Laplace Detector,
182–186
I Local photometric descriptors, in shoeprint
Image authentication, phasemarkTM image retrieval, 186–188
watermarking technique for, 124 Local singularities, 55
Image enhancement, 15 LoG, see Laplace of Gaussians
Image formatting, 15 LoG-based automatic scale selection, usage,
Image localisation methods, 6–7 183–185
Image registration and alignment, 15 Loo’s model, of watermarking schemes, 93–94
Image segmentation, 15 Luminance masking, 86
Image size normalisation, 8
Imperceptibility analysis, in watermarking, M
135–136 MACE, see Minimum Average Correlation
See also Watermarking technique Energy filter
Impostor type of decision, 17 MACH filter, see Maximum Average
Independent component analysis (ICA), 21, Correlation Height filter
27–28, 39–41 Masking, 8
Integro-differential operator, 53 Matcher accuracy, 18
Iris recognition, 4 Matching process
approach for iris segmentation in face recognition system, 26
edge detector usingwavelets, 55–57 of iris, 71–72
multiscale method, 57–65 Maximally stable extremal region, 182
localisation, 52–55 Maximum average correlation height filter, 172
research, 51–52 Maximum-likelihood (ML) scheme, 119
texture analysis and feature extraction, MHL detector, see Modified Harris–Laplace
67–71 detector
Minimum average correlation energy filter, 172
J Minimum variance synthetic discriminant
Just-noticeable-distortion (JND) model, for function filter, 177
watermark embedding, 86–87 Modified Harris–Laplace detector, 181–186
See also Local image features (LIF)
K Modified phase-only correlation function, 168
Karhunen–Loeve (KL) transformation, 27 Modulation transfer function (MTF), 88
Keystroke recognition, 4 MPOC funtion, see Modified phase-only
Kullback–Leibler (K–L) divergence, role, 132 correlation function
Kurtosis maximisation process, 28 MSER, see Maximally stable extremal region
Index 207
Multi-bit watermarking, role, 121 See also Automatic shoeprint classification

See also Generic watermarking system techniques
Multichannel Gabor filters, 52 Polar transformation, of iris, 63–65
Multiscale edge detection, 56–57 Polynomial DCCF, 172
MVSDF filter, see Minimum variance synthetic Power spectral density, 163, 165
discriminant function filter Power spectral distribution, 191–192, 198
Principal component analysis (PCA), 21,
N 26–27, 38–39
NDFT, see Non-uniform Discrete Fourier PSD, see Power spectral density; Power
Transform spectral distribution
Nearest feature line method, 52 PSNR, see Peak Signal-to-Noise ratio
Nearest neighbour (NN) method, 30 PSR, see Peak-to-Sidelobe ratio
Clustering approach, 30–31 PTS, see Pattern and topological spectra
Non-blind watermarking, usage of, 121 Pupil, 52
See also Generic watermarking system
Non-redundant complex wavelet transform Q
(NRCWT), 83–86 QIM, see Quantisation index modulation
Non-uniform Discrete Fourier Transform, 124 Quantile–Quantile (Q–Q) plot, role, 132
Normalisation process, of iris, 63–65 Quantisation index modulation (or QIM),
96–97, 124
Quincunx downsampling, 35
O
ONV, see Output noise variance
R
Optimal Trade-off Synthetic Discriminant Radial resolution, 64
Function filter (OTSDF filter), 172, Random Sample consensus algorithm, 189
176–177 RANSAC, see Random sample consensus
Output noise variance, 176 algorithm
Overt systems, 13 Raw images, 14
REBEZO system, 161
P See also Shoemark classification systems
Partial shoeprint data set, 189–190 Receiver operating characteristics, 136
See also Local image features (LIF) Recognition tests, 5
Pattern and topological spectra, 191, 198 Region of interest (ROI), 15
Pattern recognition-based matching, 16 Relative entropy, definition, 132
PCE, see Peak-to-correlation energy Reply attack, defined, 117
PDCCF, see Polynomial DCCF See also Fingerprint data protection
Peak Signal-to-Noise ratio, 135 Rescaled shoeprint data set, 190
Peak-to-correlation energy, 174 See also Local image features (LIF)
Peak-to-Sidelobe ratio, 174 ROC, see Receiver operating characteristics
Perfect shoemark scan, 154–155 Rotated shoeprint data set, 190
See also Forensic science
PhasemarkTM watermarking technique, for S
image authentication, 124 SAWGN attack, 107
Phase-only correlation (POC) technique, in Scale-adaptive Harris corner, in synthetic
shoeprint classification, 165 image detection, 183–184
experimental outcomes of, 170–172 Scale invariant feature transform, 181
POC function, 166–167 Scene of Crimes Officer, 150
shoeprint matching algorithm in, 169–170 Scene shoeprint data set, 191
spectral weighting functions in, 168–169 See also Local image features (LIF)
translation and brightness properties of Sensor module, 13
POC function, 168 SHOE-FIT system, 160
See also Automatic shoeprint classification Shoemark classification systems
techniques Alexandre’s system, 161
POC function, defined, 166–167 De Chazal’s system, 163
208 Index
Shoemark classification systems (cont.) Spatial domain watermarking method, for

REBEZO, 161 fingerprint images, 123
SHOE, c 160–161 See also State-of-the-art
SHOE-FIT, 160 Spectral weighting functions, in POC
SICAR, 162 technique, 168–169
SmART, 162–163 See also Automatic shoeprint classification
TREADMARKTM , 162 techniques
Zhang’s system, 163 State-of-the-art, for fingerprint image
Shoemark forensic evidence, limitations, verification, 123–124
143–144 Subclass discriminant analysis (SDA), 21
in applications of, 144–146 Subspace discriminant analysis (SDA), 29–31,
automating shoemark classification, 41–43
146–147
importable classification schema, 148–149 T
inconsistent classification, 147–148 Tan’s method, 73
shoemark processing time restrictions, 149 Technology evaluation, 17
Shoemark recognition, methods, 157–158 Template matching methods, of recognition
accidental characteristics, classification on, system, 7
159–160 Test data set, 17
feature-based classification, 158–159 Time Domain Matchers, 16
Shoemarks collection in, crime scenes, TREADMARKTM system, 162
149–150 See also Shoemark classification systems
casts making of shoemarks, 152–153 Trojan horse programme, 117
computerised system, data entry into, 157
U
electrostatic shoemark lifting, 153–154
Unconstrained OTSDF filter, 172, 177–178
perfect shoemark scan in, 154–155
Unwrapping of iris, 64
photography of shoemarks, 151–152
UOTSDF filter, see Unconstrained OTSDF
procedures of, 150
filter
processing of shoemarks, 155–156
US Department of Defense Counter- drug
recovery of shoemarks from snow, 154
Technology Development
shoemarks, gelatine lifting of, 153
Program, 37
suspect’s shoe, cast making in, 155
transfer/contact prints, 150–151 V
Shoeprint image detection, local image Valley-seeking algorithm of Koontz and
features, 181 Fukunaga, 30
experimental outcomes of, 189–199 Verification tests, 5
image similarity measurement, 188–189 Voice recognition, 4
local photometric descriptors, 186–188
modified Harris-Laplace detector, 182–186 W
Shoeprint matching algorithm, in POC Watch-list task, 6
technique, 169–170 Watermarking schemes, 79
See also Automatic shoeprint classification Chou’s method, 87–92
techniques as communication with side information,
SICAR system, 162 94–98
See also Shoemark classification systems decoding process, 100–102
SIFT, see Scale invariant feature transform distortion compensated spread spectrum
Signature-based watermarking technique, for (DC-SS), 110–111
fingerprint authentication, 124 document-to-watermark-ratios (DWR), 110
See also State-of-the-art encoding process, 99–100
Signature recognition, 4 as game, 105–109
SmART system, 162–163 general data hiding capacities in bits,
See also Shoemark classification systems 108–109
SOCO, see Scene of Crimes Officer Hybrid model, 94
Index 209
just-noticeable-distortion (JND) model for outcome of, 135–138

watermark embedding, 86–87 state-of-the-art, 123–124
Loo’s model, 93–94 statistical data modelling in, 127–128
Mean Squared Error (MSE) distortion alpha stable model, 129–130
measure, 101 laplacian and GGD models, 128–129
parallel Gaussian channels, 102–105 Wavelet maxima components, 68, 70
proposed algorithm, 98–100 Wavelets-based matchers, 16
Quantisation Index Modulation (or QIM), Wavelet Scalar Quantisation, 123
96–97 Wavelet transforms, 55
SAWGN attack, 107 dual tree complex, 80–83
spread transform, 97–98 non-redundant complex, 83–86
theoretical capacity limits of algorithms, Wildes methods, 65–66
100–113 Within-class scatter matrix, 28
total spread transform data hiding WSQ, see Wavelet Scalar Quantisation
capacities in bits, 108–109
watermark-to-noise ratio (WNR)
advantage, 98 Y
Watermarking technique, in fingerprint data YALE Face database, 37
protection, 117–119, 130–131
DWT coefficients, modelling of, 132–135 Z
generic watermarking system, 119–123 Zhang’s system, 163
optimum watermaking detection, 124–127 See also Shoemark classification systems
(continued from page ii)
Topics in Acoustic Echo and Noise Control Electronic Noise and Interfering Signals
Selected Methods for the Cancellation Principles and Applications
of Acoustical Echoes, the Reduction G. Vasilescu
of Background Noise, and ISBN 3-540-40741-3
Speech Processing
E. Hänsler and G. Schmidt (Eds.) DVB
ISBN 3-540-33212-x The Family of International Standards for Digital
Video Broadcasting, 2nd ed.
EM Modeling of Antennas and RF U. Reimers
Components for Wireless Communication ISBN 3-540-43545-X
Systems
F. Gustrau, D. Manteuffel Digital Interactive TV and Metadata
ISBN 3-540-28614-4 Future Broadcast Multimedia
A. Lugmayr, S. Niiranen, and S. Kalli
Interactive Video Methods and Applications ISBN 3-387-20843-7
R. I Hammoud (Ed.) Adaptive Antenna Arrays
ISBN 3-540-33214-6 Trends and Applications
S. Chandran (Ed.)
ContinuousTime Signals
ISBN 3-540-20199-8
Y. Shmaliy
ISBN 1-4020-4817-3 Digital Signal Processing with Field
Programmable Gate Arrays
Voice and Speech Quality Perception U. Meyer-Baese
Assessment and Evaluation ISBN 3-540-21119-5
U. Jekosch
ISBN 3-540-24095-0 Neuro-Fuzzy and Fuzzy Neural Applications
in Telecommunications
Advanced ManMachine Interaction P. Stavroulakis (Ed.)
Fundamentals and Implementation ISBN 3-540-40759-6
K.-F. Kraiss
ISBN 3-540-30618-8 SDMA for Multipath Wireless Channels
Limiting Characteristics and Stochastic Models
Orthogonal Frequency Division Multiplexing I.P. Kovalyov
for Wireless Communications ISBN 3-540-40225-X
Y. (Geoffrey) Li and G.L. Stüber (Eds.)
Digital Television
ISBN 0-387-29095-8
A Practical Guide for Engineers
Circuits and Systems Based on Delta W. Fischer
Modulation ISBN 3-540-01155-2
Linear, Nonlinear and Mixed Speech Enhancement
Mode Processing J. Benesty (Ed.)
D.G. Zrilic ISBN 3-540-24039-X
ISBN 3-540-23751-8
Multimedia Communication Technology
Functional Structures in Networks Representation, Transmission and Identification of
AMLn—A Language for Model Driven Multimedia Signals
Development of Telecom Systems J.R. Ohm
T. Muth ISBN 3-540-01249-4
ISBN 3-540-22545-5
Information Measures
RadioWave Propagation for Information and its Description in Science and
Telecommunication Applications Engineering
H. Sizun C. Arndt
ISBN 3-540-40758-8 ISBN 3-540-40855-X
Processing of SAR Data Structured Cable Systems
Fundamentals, Signal Processing, Interferometry A.B. Semenov, S.K. Strizhakov,
A. Hein and I.R. Suncheley
ISBN 3-540-05043-4 ISBN 3-540-43000-8
Chaos-Based Digital Communication Systems UMTS

Operating Principles, Analysis Methods, and The Physical Layer of the Universal Mobile
Performance Evalutation Telecommunications System
F.C.M. Lau and C.K. Tse A. Springer and R. Weigel
ISBN 3-540-00602-8 ISBN 3-540-42162-9
Adaptive Signal Processing Advanced Theory of Signal Detection

Application to Real-World Problems Weak Signal Detection in Generalized
J. Benesty and Y. Huang (Eds.) Obeservations
ISBN 3-540-00051-8 I. Song, J. Bae, and S.Y. Kim
ISBN 3-540-43064-4
Multimedia Information Retrieval and
Management Technological Wireless Internet Access over GSM and UMTS
Fundamentals and Applications M. Taferner and E. Bonek
D. Feng, W.C. Siu, and H.J. Zhang (Eds.) ISBN 3-540-42551-9
ISBN 3-540-00244-8

Imaging For Forensics PDF

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Imaging For Forensics PDF

Uploaded by

Copyright:

Available Formats

Springer Series on

S IGNALS AND C OMMUNICATION T ECHNOLOGY

(continued after index)

Imaging for Forensics

Library of Congress Control Number: 2009927770

c Springer Science+Business Media, LLC 2009

Printed on acid-free paper

Springer is part of Springer Science+Business Media (www.springer.com)

of users, the field of digital watermarking has witnessed an extremely fast-growing

Chapter 2: “Data Representation and Analysis”

Chapter 3: “Improving Face Recognition Using Directional Faces”

Chapter 4: “Recent Advances in Iris Recognition: A Multiscale Approach”

Chapter 5: “Spread Transform Watermarking Using Complex Wavelets”

Chapter 6: “Protection of Fingerprint Data Using Watermarking”

Chapter 7: “Shoemark Recognition for Forensic Science: An Emerging

Chapter 8: “Techniques for Automatic Shoeprint Classification”

Chapter 9: “Automatic Shoeprint Image Retrieval Using Local Features”

Belfast, United Kingdom, 2008 Ahmed Bouridane

1 Introduction and Preliminaries on Biometrics

2 Data Representation and Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11

3 Improving Face Recognition Using Directional Faces . . . . . . . . . . . . . . . . 21

3.2 Face Recognition Basics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22

4 Recent Advances in Iris Recognition: A Multiscale Approach . . . . . . . . 49

5 Spread Transform Watermarking Using Complex Wavelets . . . . . . . . . . 79

6 Protection of Fingerprint Data Using Watermarking . . . . . . . . . . . . . . . . 117

7 Shoemark Recognition for Forensic Science: An Emerging

7.1.4 Importable Classification Schema . . . . . . . . . . . . . . . . . . . . . . . 148

8 Techniques for Automatic Shoeprint Classification . . . . . . . . . . . . . . . . . . 165

8.4 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 179

9 Automatic Shoeprint Image Retrieval Using Local Features . . . . . . . . . . 181

1.2 Definition of Biometrics

A. Bouridane, Imaging for Forensics and Security, Signals and Communication 1

1.2.1 Biometric Characteristics

Furthermore, the acquired raw data should be amenable to processing in order to

• Performance: The recognition accuracy and the resources required to achieve

1.2.2 Biometric Modalities

In order to establish the identity of an individual, a variety of physical and

fication applications. This technology is still at an early stage of research and

It is commonly known that a typical biometric recognition scenario, as all biometric

1.3.1 Verification: Am I Who I Claim to Be?

• Clients: people trying to gain access using their own identity.

1.3.2 Recognition: Who Am I?

1.3.3 The Watch-List: Are You Looking for Me?

1.4 Steps of a Typical Biometric Recognition Application

In general, biometric recognition applications, regardless of the specific method

1.4.1 Biometric Data Localisation

• Feature-invariant approaches: These algorithms aim to find the structural features

Fig. 1.2 Steps of a typical Image

• Template matching methods: Several standard patterns of a biometric image are

1.4.2 Normalisation and Pre-processing

1.4.3 Feature Extraction

• Feature-based approaches: They are based on the extraction of the properties of

loss of privacy and performance limitations. To address these issues, appropriate

A. Bouridane, Imaging for Forensics and Security, Signals and Communication 11

Fingerprint, Biometric Database

Fig. 2.1 Block diagram of a biometric identification system

2.2 Data Acquisition

2.2.1 Sensor Module

2.2.2 Data Storage

2.2.2.1 Raw Images

2.2.2.2 Feature Sets

Fig. 2.2 A visual spectrum

2.3 Feature Extraction

gmn (x, y) = a −m G(x , y ) (3.12)

x = a −m (x cos θ + y sin θ ) (3.13)

y = a −m (−x sin θ + y cos θ ) (3.14)