
DIMENSIONALITY REDUCTION TECHNIQUES FOR

FACE RECOGNITION
A MAJOR PROJECT REPORT
Submitted by
ADITYA RITESH B090609EC
ATUL KUMAR B090739EC
BHARAT RAJ MEENA B090780EC
GAURAV SINGH B090559EC
GOURAB RAY B090402EC
PRABHAT PRAKASH VERMA B090829EC
In partial fulfillment for the award of the Degree of
BACHELOR OF TECHNOLOGY
IN
ELECTRONICS AND COMMUNICATION ENGINEERING
Under the guidance of
DR. PRAVEEN SANKARAN
DEPARTMENT OF
ELECTRONICS AND COMMUNICATION ENGINEERING
NATIONAL INSTITUTE OF TECHNOLOGY CALICUT
NIT CAMPUS PO, CALICUT
KERALA, INDIA 673601.
APRIL 2013
ACKNOWLEDGEMENT
At the outset, we are indeed very grateful to Dr. G. Abhilash, Project Coordinator, for allowing us to undertake the project and for his useful guidance. We would also like to extend our heartfelt gratitude to Dr. Praveen Sankaran, our Project Guide, for guiding us throughout the project and, with his deep knowledge in the field, providing a correct orientation to our ideas, which led to the successful accomplishment of the project. Next, we would like to thank Mr. R. Suresh, Mr. Ameer P. M. and Mr. V. Sakthivel for evaluating our reports and giving us the necessary feedback. Finally, we also thank the ECE Department, NIT Calicut, for providing us the necessary facilities and resources. For all the mistakes that remain, the blame is entirely ours.
DECLARATION
We hereby declare that this submission is our own work and that, to the best of our knowledge and belief, it contains no material previously published or written by another person, nor material which has been accepted for the award of any other degree or diploma of the university or other institute of higher learning, except where due acknowledgement has been made in the text.
Date:
Name Roll no. Signature
ADITYA RITESH B090609EC
ATUL KUMAR B090739EC
BHARAT RAJ MEENA B090780EC
GAURAV SINGH B090559EC
GOURAB RAY B090402EC
PRABHAT PRAKASH VERMA B090829EC
CERTIFICATE
This is to certify that the MAJOR PROJECT entitled Dimensionality Reduction Techniques For Face Recognition, submitted by Aditya Ritesh B090609EC, Atul Kumar B090739EC, Bharat Raj Meena B090780EC, Gaurav Singh B090559EC, Gourab Ray B090402EC and Prabhat Prakash Verma B090829EC to the National Institute of Technology Calicut towards partial fulfillment of the requirements for the award of the Degree of BACHELOR OF TECHNOLOGY in Electronics and Communication Engineering, is a bona fide record of the work carried out by them under my supervision and guidance.
Signed by MAJOR PROJECT Supervisor
Dr. Praveen Sankaran
Assistant Professor, ECED, NIT Calicut
Place:
Date :
Signature of Head of the Department
(Office seal)
Copyright, 2013, by Aditya Ritesh B090609EC
Atul Kumar B090739EC
Bharat Raj Meena B090780EC
Gaurav Singh B090559EC
Gourab Ray B090402EC
Prabhat Prakash Verma B090829EC, All Rights Reserved
TABLE OF CONTENTS

List of Figures

CHAPTERS

1 Introduction
1.1 Background
1.2 Motivation
1.3 Problem Statement
1.4 Objectives
1.5 Literature review
1.6 Outline
2 Dimensionality Reduction
2.1 Curse of Dimensionality
2.2 A demonstration
2.3 Dimensionality reduction
3 Linear Dimensionality Reduction Techniques
3.1 Introduction
3.2 Statistics involved
3.2.1 Mean image from the face database
3.2.2 Covariance matrix from the face database
3.3 Principal Component Analysis
3.3.1 Mathematical Analysis of PCA
3.3.2 Dimensionality Reduction
3.3.3 Calculation of eigenvalues and eigenvectors
3.3.4 Formation of a Feature Vector
3.3.5 Derivation of a new data set
3.3.6 Eigen Faces
3.4 Linear Discriminant Analysis
3.4.1 Mathematical Analysis of LDA
3.5 Independent Component Analysis (ICA)
3.5.1 Mathematical analysis of ICA
3.5.2 ICA Algorithm
4 Non-Linear Dimensionality Reduction Techniques
4.1 Kernel Methods for face recognition
4.1.1 Kernel Principal Component Analysis
4.1.1.1 Algorithm
4.1.1.2 Dimensionality Reduction and Feature Extraction
4.1.2 Kernel Fisher Analysis
4.1.2.1 Algorithm
5 Face recognition using Gabor Wavelets
5.1 Wavelets
5.2 Gabor Filters
5.2.1 Extracting features with Gabor filters
6 Distance Measures, ROC and ORL database
6.1 Distance measures for Face Recognition
6.1.1 Euclidean distance
6.1.2 Mahalanobis distance
6.1.3 Cosine Similarity
6.1.4 City Block distance
6.1.5 Chessboard Distance
6.2 ROC
6.3 ORL Database
6.4 Summary
7 Real Time Face Recognition System using OpenCV
7.1 Face Detection
7.2 Preprocessing Facial Images
7.3 Face Recognition
8 Method of Implementation
8.1 PCA and LDA
8.2 ICA
8.3 KPCA
8.4 Gabor PCA
8.5 Gabor LDA
9 Results and Observations
9.1 PCA
9.2 LDA
9.3 ICA
9.4 Kernel PCA and Kernel LDA
9.5 Gabor Filter
9.6 Gabor PCA and Gabor LDA
9.7 Real time Face Recognition System
10 Conclusion
BIBLIOGRAPHY
ABSTRACT
DIMENSIONALITY REDUCTION TECHNIQUES FOR
FACE RECOGNITION
Aditya Ritesh B090609EC
Atul Kumar B090739EC
Bharat Raj Meena B090780EC
Gaurav Singh B090559EC
Gourab Ray B090402EC
Prabhat Prakash Verma B090829EC
National Institute of Technology Calicut 2013
Project Guide: Dr. Praveen Sankaran
Data dimensionality reduction algorithms try to project high dimensional input data to a lower dimensional space, providing better data classification ability. In this work, we aim to study various data dimensionality reduction algorithms with specific application to face recognition techniques. Our main aim is to treat the problem from a purely mathematical or computational point of view. The input of a face recognition system is always an image or video stream, which is then projected from the original vector space to a carefully chosen subspace. The next step, feature extraction, involves obtaining relevant facial features from the data. We implement linear (PCA, LDA), non-linear and wavelet based algorithms and carry out a comparative study under different circumstances simulated with standard face databases. The output is an identification or verification of the subject or subjects that appear in the image or video. In an identification task, the system reports an identity from a database.
LIST OF FIGURES

2.1 Curse of dimensionality
2.2 1 dimension
2.3 2 dimensions
2.4 3 dimensions
3.1 Linear subspace technique general algorithm
3.2 Matrix representation of N images present in the database
3.3 PCA classification of given data (a) Worse classification of data (b) The best classification of data
3.4 Principal component analysis algorithm for face recognition
3.5 Fisher's linear discriminant algorithm for face recognition
3.6 Flow chart of linear discriminant analysis algorithm
3.7 ICA algorithm
4.8 Explanation of KPCA
4.9 KDA
6.1 Typical Receiver Operating Characteristic Curve
6.2 Sample images of a single person from ORL database
8.1 Schematic of the Face Recognizer
8.2 Blind Source Separation model
8.3 Finding statistically independent basis images
9.1 Mean Face
9.2 Eigenfaces
9.3 Identifying similar faces
9.4 Performance of PCA based Face Recognition with ORL Face Database
9.5 ROC for PCA
9.6 Recognition rate vs number of eigen faces for PCA
9.7 Recognition rate vs number of training images for PCA
9.8 Fisher faces
9.9 Performance of LDA based Face Recognition with ORL Face Database
9.10 ROC for LDA
9.11 Recognition rate vs number of training images for PCA
9.12 Source images
9.13 Aligned faces - features extracted
9.14 Mixed images
9.15 Independent Components (Estimated Sources)
9.16 ROC for KPCA
9.17 ROC for KLDA
9.18 Magnitude response with no downsampling
9.19 Magnitude response with downsampling factor 64
9.20 ROC for Gabor PCA
9.21 ROC for Gabor LDA
9.22 OpenCV implementation
CHAPTER 1
INTRODUCTION
1.1 BACKGROUND
During the past decades, advances in data collection and storage capabilities have
led to an information overload in most sciences. Researchers working in domains as
diverse as engineering, astronomy, biology, remote sensing, economics, and consumer
transactions, face larger and larger observations and simulations on a daily basis.
Such data sets, in contrast with smaller, more traditional data sets that have been
studied extensively in the past, present new challenges in data analysis. Traditional
statistical methods break down partly because of the increase in the number of ob-
servations, but mostly because of the increase in the number of variables associated
with each observation. This is also the case with face recognition: as the resolution of images has increased drastically, pattern recognition on these images has become difficult.
Face recognition is one of the most efficient biometric techniques used for identifying humans from a database. Biometrics use physical and behavioral properties of the human. The face plays a primary role in identifying a person, and it is also the most easily remembered part of the human body. This skill is very robust, because we can identify a person even after many years, with different ageing, different lighting conditions and different viewing conditions. Biometric techniques like fingerprint, iris and signature need some cooperation from the person for identification. The major advantage of face recognition is that it does not need any physical cooperation of the person at the time of recognition.
1.2 MOTIVATION
Definitely, the wide applicability of face recognition is the main motivating factor. Face recognition plays an important role in various applications (e.g. computer vision, image processing and pattern recognition). The ideal computer vision models work more or less like human vision. In computer vision related applications, face recognition is useful in taking decisions based on the information present in the images or video. There are many computer vision based applications in image analysis, like recognizing and tracking humans in public and private areas. Even in future driver assistance systems, driver face observation will play an important role. In many security related applications like banking, border checks, etc., person identification and verification is one of the main issues; in these applications persons must be recognized or identified. Also, face recognition could be employed in intelligent PCs, for instance to automatically bring up a user's individual desktop environment. Even more advanced, when employed in a gesture and facial expression recognition environment, it could turn personal computers into personalized computers, able to interact with the user on a level higher than just mouse clicks and key strokes. Such recognition would make possible intelligent man-machine interfaces and, in the future, intelligent interaction with robots.
The face recognition system is very useful in criminal identification. In this application, the images of criminals can be stored in the face recognition system database. In recognition algorithms based on matching methods, image acquisition is one of the important tasks. The image of a person must be taken directly from a digital camera or from a video sequence such that it contains the maximum possible information about that person. The images must be taken quickly, with a small resolution or size, in order to speed up the algorithms; if we take high resolution images, it takes much more time to recognize the persons. The matching algorithms then compare the acquired image with the images in the database to identify the criminal. In real time face recognition, the system must analyze the images and recognize the person very fast. The face recognition system only recognizes persons stored in the database.
As such, face recognition is an extensive and ever evolving research area, with a wide range of algorithms for dimensionality reduction. All of this motivated us to carry out this project.
1.3 PROBLEM STATEMENT
While developing a face recognition system, the performance depends upon the given input data or image [6]. However, there are general problems faced in real time face recognition, such as illumination, head pose and facial expressions. Images vary under different environmental conditions, like lighting conditions and background variation; we then cannot extract the correct features, and so the recognition rate is lower. Similarly, with different facial expressions like smiling or tearing, and in different poses, it is difficult to recognize the person.
As such, an efficient method to represent the face image is required. Deriving a discriminative and compact representation of a face pattern is of paramount importance for the success of any face recognition approach. This is where the various dimensionality reduction methods come into play. Overall, the problem is one of choosing the method that gives an efficient image representation and in turn good recognition.
1.4 OBJECTIVES
Summing up, the overall objectives of the project are:
- Study of different linear techniques (PCA and LDA), non-linear methods (KPCA, KLDA) and wavelet based methods (Gabor).
- Implementation of PCA, LDA, KPCA, KLDA and Gabor based face recognition systems in Matlab.
- Comparison of the verification rate of different algorithms by varying parameters like the number of eigenfaces, the number of Fisher faces, the number of images in the training set and the distance metric used.
- Development of wavelet based linear techniques (Gabor) which have very high verification rates, and their comparison with the other techniques.
- Computation of Receiver Operating Characteristics for different algorithms.
- Implementation of a Real-Time Face Recognition System (PCA based) using OpenCV functions.
1.5 LITERATURE REVIEW
Worldwide, progressive efforts are being made for the efficient storage and retrieval of images using different techniques. Several techniques exist to tackle the curse of dimensionality, out of which some are linear methods and others are non-linear. PCA, LDA and LPP are some popular linear methods, while non-linear methods include ISOMAP and Eigenmaps. Face recognition makes it possible to use the facial images of a person to authenticate him into a secure system, for criminal identification and for passport verification. Face recognition approaches for still images can be broadly categorized into holistic methods and feature based methods. Holistic methods use the whole face region as the raw input to a recognition system. One of the most widely used representations of the face region is eigenpictures, which are based on principal component analysis. In feature-based (structural) matching methods, local features such as the eyes, nose and mouth are first extracted, and their locations and local statistics (geometric and/or appearance) are fed into a structural classifier. There is a third kind of method, known as hybrid methods. Just as the human perception system uses both local features and the whole face region to recognize a face, a machine recognition system should use both. One can argue that these methods could potentially offer the best of the two types of methods.
The usage of Gabor filters for face recognition is presented in depth by Vitomir Štruc, Rok Gajšek and Nikola Pavešić in their paper, Principal Gabor Filters for Face Recognition.
1.6 OUTLINE
Chapter 2: Why are we going for dimensionality reduction?
Chapter 3: Linear techniques for dimensionality reduction.
Chapter 4: Non-linear techniques for dimensionality reduction.
Chapter 5: Face recognition using Gabor Wavelets.
Chapter 6: Distance Measures, ROC and ORL database.
Chapter 7: Real Time Face Recognition System using OpenCV.
Chapter 8: Method of Implementation.
Chapter 9: Results and Observations.
Chapter 10: Conclusion.
CHAPTER 2
DIMENSIONALITY REDUCTION
2.1 CURSE OF DIMENSIONALITY
Putting it simply, dimensionality reduction is the process of reducing the number of
random variables under consideration. The number of samples required to accurately estimate a function grows exponentially with the dimensionality; this is called the curse of dimensionality. In practice, the curse of dimensionality means that for a given sample size, there is a maximum number of features above which the performance of a classifier will degrade rather than improve. In most cases, the information that is lost by discarding some features is compensated for by a more accurate mapping in the lower-dimensional space.
2.2 A DEMONSTRATION
Consider three types of objects, shown in figure 2.2, that have to be classified based on the value of a single feature. A simple procedure would be to:
a) Divide the feature space into uniform bins,
b) Compute the ratio of examples for each class at each bin and,
c) For a new example, find its bin and choose the predominant class in that bin.
We decide to start with one feature and divide the real line into 3 bins. Notice that there exists a lot of overlap between classes, i.e. to improve discrimination, we decide to incorporate a second feature.
Figure 2.1: Curse of dimensionality
Figure 2.2: 1 dimension
Figure 2.3: 2 dimensions
Moving to two dimensions increases the number of bins from 3 to 3^2 = 9. So, which should we maintain constant?
a) The density of examples per bin? This increases the number of examples from 9 to 27.
b) The total number of examples? This results in a 2D scatter plot that is very sparse.
Moving to three features:
a) The number of bins grows to 3^3 = 27.
b) To maintain the initial density of examples, the number of required examples grows to 81.
c) For the same number of examples, the 3D scatter plot is almost empty.
So, there is an exponential growth with dimensionality in the number of examples required to accurately estimate a function.
Figure 2.4: 3 dimensions
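The bin and sample counts quoted above can be reproduced with a short Matlab fragment; this is purely illustrative, using the three-bins-per-feature setup and the initial density of three examples per bin from the example:

% Bins and examples needed to keep a density of 3 examples per bin when
% each of d features is split into 3 bins (the example discussed above).
d        = 1:3;               % number of features (dimensions)
bins     = 3 .^ d;            % 3, 9, 27 bins
examples = 3 .* bins;         % 9, 27, 81 examples for a constant density
disp([d; bins; examples]);    % one column per dimensionality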
2.3 DIMENSIONALITY REDUCTION
There are two approaches to perform dimensionality reduction from N to M dimensions (M < N):
a) Feature selection: choosing a subset of all the features.
b) Feature extraction: creating new features by combining the existing ones.
In either case, the goal is to find a low-dimensional representation of the data that preserves (most of) the information or structure in the data. In the following chapters we will look into various linear and non-linear techniques.
CHAPTER 3
LINEAR DIMENSIONALITY REDUCTION TECHNIQUES
3.1 INTRODUCTION
Several techniques exist to tackle the curse of dimensionality out of which some
are linear methods and others are nonlinear. For reasons of computational and
conceptual simplicity, the representation of low-dimension is often sought as a linear
transformation of the original data. That is, each component of the representation is
a linear combination of the original variables. Such a representation seems to capture
the essential structure of the data in many applications, including feature extraction
and signal separation.
Thus, to reduce the dimensionality and redundancy without sacrificing accuracy, linear subspace techniques are useful. They take only the important features from the data set; if we select inadequate features, the accuracy will be reduced, so we need to acquire full knowledge about the similarities and differences in the data. In our project, linear techniques are implemented and some statistical measures like the mean and covariance are used to reduce the redundancy. In these methods, the average image of all the persons (and also of each person) in the database is calculated, and each image is translated to the average face by subtracting the average image from each face image.
Some well-known linear transformation methods include:
- Principal Component Analysis (PCA)
- Linear Discriminant Analysis (LDA)
- Independent Component Analysis (ICA)
3.2 STATISTICS INVOLVED
Statistical measurements like the mean and covariance are used. Statistics are based on analyzing a high amount of data in terms of the relationships between the variables in the data set. Each technique uses a train database and a test database. The train database contains different images of the same person; the test database contains one image per person. Each technique calculates the basis vectors by using some statistical properties. Take an image of size L x M: the pixel array can be represented as a point or vector in an LM dimensional space, or image space. If we apply face recognition techniques in this image space, it is computationally expensive because of the matching algorithms. Further, the number of parameters or features used increases exponentially (sometimes exceeding the number of images).
The basis vectors are of high dimension, so we need to reduce the dimensionality. After forming the basis vectors, the feature vector is calculated by projecting the train database images onto the basis vectors. The matching is then done by using distance measures. In our project, two data sets are taken: the first set is for training and the second set is for testing. The training set contains many folders depending upon the selection of the database, and each folder contains different pose-varying images of the same person. The test set contains one image per person; these test images are in different poses from the train set images. The basic working of the linear technique for face recognition is shown in figure 3.1.
Figure 3.1: Linear subspace technique general algorithm
Many faces contain similar features, because the face is generally smooth and of regular texture; the frontal-view appearances of many faces are similar around the eyes, nose, mouth, etc. So we classify the similar and different features among all the features. First we take an image from the training set, or from the image space, and convert this image into a column vector; in the same way, all images in the train set are converted to column vectors. For example, suppose our training set contains N images and the size of each image is L x M. We convert each image into a column vector of size LM x 1. After converting all images into column vectors, we append all the columns. This forms a matrix called the data matrix X, with size LM x N, as shown in figure 3.2. LM x N means a very high dimensionality, so we have to reduce this dimensionality by using linear subspace techniques. After forming the data matrix, we calculate the most commonly used statistical measures (mean and covariance).
Figure 3.2: Matrix representation of N images present in the database
3.2.1 Mean image from the face database
The mean of the random vector x is calculated as

m = E[x]    (3.1)

where E[x] is the expected value of the argument x, and x is a random sample corresponding to a column vector of the data matrix. Using the columns x_i of the data matrix, the mean is estimated as

m = \frac{1}{N} \sum_{i=1}^{N} x_i    (3.2)

where N is the number of images in the training set. m represents the mean vector of the data matrix; converted from a column vector back to a matrix, it also represents the mean face of the training set.
3.2.2 Covariance matrix from the face database
Covariance measures the linear relationship between two variables, so the dimension of a covariance is two. If we have a data set with more than two dimensions, we have many different covariance values; for an n-dimensional data set there are n(n-1)/2 distinct covariance values. For example, for a 3-dimensional data set with dimensions x, y, z, we calculate the covariance of (x, y), the covariance of (y, z) and the covariance of (z, x). The covariance matrix C is a matrix containing each such covariance as an entry. A high covariance value indicates high redundancy and a low value indicates low redundancy.
The covariance matrix C of the random vector x is calculated using equation 3.3 or 3.4:

C = E[(x - m)(x - m)^T]    (3.3)

C = \frac{1}{N} \sum_{i=1}^{N} (x_i - m)(x_i - m)^T    (3.4)

C = A A^T    (3.5)

where A is the mean-subtracted data matrix formed as described above, of size LM x N, so that C is of order LM x LM. If we calculate the covariance matrix using equation 3.5 directly, it takes a large amount of memory because of the dimensions of C: the size of A is LM x N and the size of C is LM x LM, which is very large. It is not practical to calculate C as shown in equation 3.5. Let us therefore consider the matrix L instead of C:

L = A^T A    (3.6)

The dimension of L is N x N, which is much smaller than the dimension of C.
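As a concrete illustration of these two statistics, a minimal Matlab sketch is given below. It is our own illustrative code, not the project implementation: the cell array trainImgs of equally sized grayscale training images is an assumption, and the sketch builds the data matrix X, the mean face m, the mean-subtracted matrix A and the small N x N surrogate matrix L = A^T A of equation 3.6.

% Minimal Matlab sketch (illustrative): data matrix, mean face and the
% N x N surrogate of the covariance matrix described above.
N = numel(trainImgs);                   % number of training images (assumed loaded)
[h, w] = size(trainImgs{1});            % image height and width (L and M in the text)
X = zeros(h*w, N);                      % data matrix, one image per column
for i = 1:N
    X(:, i) = double(trainImgs{i}(:));  % each image becomes an LM x 1 column vector
end
m    = mean(X, 2);                      % mean face, equation 3.2
A    = X - repmat(m, 1, N);             % mean-subtracted data matrix
Lmat = A' * A;                          % N x N surrogate of C = A*A', equation 3.6

The eigenvectors of the full covariance matrix C can later be recovered from those of this much smaller matrix, which is how the eigenfaces are computed in the next section.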
3.3 PRINCIPAL COMPONENT ANALYSIS
PCA is a primary technique and is regarded as the theoretical foundation of many dimension reduction techniques. It seeks a linear projection that best fits a data set in the least-squares sense, and it has been widely used for feature extraction in pattern classification due to its computational and analytical simplicity [13]. It is also called the Karhunen-Loeve transform (KLT) [12, 21]. It uses the covariance or correlation matrix of the given data, the basics of which were mentioned earlier under the statistics involved.
3.3.1 Mathematical Analysis of PCA
Let x be an m-dimensional random vector with E[x] = 0; in practical data the mean may not be zero, in which case the mean is subtracted from each sample to obtain a zero-mean random vector. Let q be an m-dimensional unit vector, and let A be the projection of x on q:

A = x^T q = q^T x,  subject to  ||q|| = 1

In the projected space, E[A] = q^T E[x] = 0, and

\sigma^2 = E[A^2] = E[(q^T x)(x^T q)] = q^T E[x x^T] q

E[x x^T] is the correlation matrix, denoted by R, so E[A^2] = q^T R q. R is symmetric, i.e. R^T = R. We now look for the extremal points of E[A^2] under a proper choice of q. Define the function

\psi(q) = q^T R q = \sigma^2    (3.7)

At an extremal point, slightly varying q does not change the function to first order:

\psi(q + \delta q) = \psi(q)    (3.8)

Expanding,

\psi(q + \delta q) = (q + \delta q)^T R (q + \delta q) = q^T R q + 2(\delta q)^T R q + (\delta q)^T R (\delta q)

The term (\delta q)^T R (\delta q) is not significant since \delta q is very small, so we can write

\psi(q + \delta q) = q^T R q + 2(\delta q)^T R q    (3.9)

Using equations 3.7, 3.8 and 3.9 it can be inferred that

(\delta q)^T R q = 0    (3.10)

We have changed q to q + \delta q, but q + \delta q must still be a unit vector:

||q + \delta q|| = 1
(q + \delta q)^T (q + \delta q) = 1
q^T q + (\delta q)^T q + q^T (\delta q) + (\delta q)^T (\delta q) = 1

Again neglecting (\delta q)^T (\delta q) and using the properties q^T q = 1 and (\delta q)^T q = q^T (\delta q), we obtain

(\delta q)^T q = 0    (3.11)

Thus q and \delta q are orthogonal, so only a change in the direction of q is possible. Equations 3.10 and 3.11 can be combined; a scale factor \lambda is introduced in equation 3.11 so as to match the dimensionality, and the two equations are combined as

(\delta q)^T R q - \lambda (\delta q)^T q = 0
(\delta q)^T [R q - \lambda q] = 0

Since \delta q is not equal to zero, we have

R q = \lambda q    (3.12)

Given a matrix R, only certain combinations of q and \lambda satisfy equation 3.12: \lambda is an eigenvalue of R. There are m solutions \lambda_1, \lambda_2, ..., \lambda_m, with q_1, q_2, ..., q_m the corresponding eigenvectors, i.e. there are m solutions to equation 3.12:

R q_j = \lambda_j q_j,   j = 1, 2, 3, ..., m    (3.13)

Place the eigenvalues in decreasing order, \lambda_1 \geq \lambda_2 \geq \lambda_3 \geq ... \geq \lambda_m, where \lambda_1 = \lambda_{max}. Now define a matrix Q such that

Q = [q_1, q_2, ..., q_m]

In compact form, equation 3.13 can be written as

R Q = Q \Lambda    (3.14)

where \Lambda = diag[\lambda_1, \lambda_2, \lambda_3, ..., \lambda_m]. The matrix Q is an orthogonal matrix satisfying

q_i^T q_j = 1 if i = j, and 0 otherwise

Therefore we can write Q^T Q = I and, since Q Q^{-1} = I, this gives Q^T = Q^{-1}. Using equation 3.14 we can write

Q^T R Q = \Lambda    (3.15)

or, in expanded form,

q_j^T R q_k = \lambda_j for k = j, and 0 otherwise

From equation 3.7 we therefore have

\psi(q_j) = \lambda_j  for  j = 1, 2, 3, ..., m    (3.16)

Pre-multiplying equation 3.15 by Q and post-multiplying it by Q^T, we get the spectral decomposition

R = \lambda_1 q_1 q_1^T + \lambda_2 q_2 q_2^T + .... + \lambda_m q_m q_m^T    (3.17)

Eliminating low values of \lambda_j therefore means eliminating terms with low variance. There are m possible solutions for the unit vector q. Let a_j be the projection of x on q_j, i.e. the projection along the principal directions:

a_j = q_j^T x = x^T q_j  for  j = 1, 2, ..., m    (3.18)

These a_j are called the principal components. Equation 3.18 represents the analysis, i.e. the derivation of the principal components from the input vector x. From these a_j we should be able to recover x. We define a projection vector

a = [a_1, a_2, a_3, ..., a_m]^T = [x^T q_1, x^T q_2, ..., x^T q_m]^T = Q^T x

Pre-multiplying both sides by Q, we get x = Q a, i.e.

x = a_1 q_1 + a_2 q_2 + ... + a_m q_m    (3.19)

Equation 3.19 represents the synthesis equation; the q_j act as basis vectors for the synthesis. Here we are forming basis vectors out of the random process.
3.3.2 Dimensionality Reduction
For reducing the dimensionality of the input vector we need to take only the l (l < m) largest eigenvalues of the correlation matrix; the reconstructed signal \hat{x} is then given by

\hat{x} = a_1 q_1 + a_2 q_2 + ... + a_l q_l

so at the encoder side we have

[a_1, a_2, a_3, ..., a_l]^T = [q_1, q_2, q_3, ..., q_l]^T [x_1, x_2, x_3, ..., x_m]^T    (3.21)

It can be seen that the dimension of the input vector is reduced from R^m to R^l. The total variance of the m components of x is given by

\sum_{i=1}^{m} \sigma_i^2 = \sum_{i=1}^{m} \lambda_i

the total variance of the retained l components is given by

\sum_{i=1}^{l} \sigma_i^2 = \sum_{i=1}^{l} \lambda_i

and the total variance of the discarded m - l components is given by

\sum_{i=l+1}^{m} \sigma_i^2 = \sum_{i=l+1}^{m} \lambda_i

Figure 3.3: PCA classification of given data (a) Worse classification of data (b) The best classification of data
3.3.3 Calculation of eigenvalues and eigenvectors
Eigenvectors and eigenvalues give important information regarding our data. The eigenvectors give the uncorrelated variables; these uncorrelated variables are called principal components. The first principal component describes the highest amount of variation [4].
The eigenvalues of the covariance matrix describe the variance of the corresponding principal components, i.e. the first principal component exhibits the highest amount of variation, the second principal component exhibits the second highest amount of variation, and so on. Figure 3.3 shows a poor selection of principal components and the best selection of principal components.
3.3.4 Formation of a Feature Vector
After calculating the eigenvectors of the covariance matrix, the dimensionality reduction takes place. Here we do not consider all the eigenvectors as principal components. We arrange all eigenvalues in descending order and take the first few highest eigenvalues and their corresponding eigenvectors. These eigenvectors e_1, e_2, and so on are the principal components, as shown in equation 3.22:

W_{pca} = [e_1 e_2 .... e_n]    (3.22)

We neglect or ignore the remaining, less significant eigenvalues and their corresponding eigenvectors. These neglected eigenvalues represent a very small information loss [16]. The principal component axes pass through the mean values. With these principal components (eigenvectors) we form a matrix called the feature vector (also called the eigenspace) [21, 4].
3.3.5 Derivation of a new data set
To derive a new data set with reduced dimensionality, we take the transpose of the feature vector matrix (so that each row of the matrix represents an eigenvector) and we project this matrix onto the original, mean-subtracted data set. The data set thus formed is a new representation called the face space. Based on the covariance matrix we try to minimize the mean square error between the feature vectors and their projections [2]. This transformation has a few important properties. The first property is that the mean of the new data set is zero and its covariance matrix is a diagonal matrix whose elements are the eigenvalues of the covariance matrix of the original data set. The second property is that the covariance matrix of the original data set is real and symmetric (an image contains only real values), so the eigenvalues of the covariance matrix are real and the corresponding eigenvectors form an orthonormal basis. This orthonormality is expressed in equation 3.23:

W^{-1} = W^T    (3.23)

Using this property the original data set can be reconstructed from the new data [4]. Here also we do not use all the eigenvectors; we take only the eigenvectors corresponding to the highest eigenvalues.
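To make the above steps concrete, a minimal Matlab sketch of the eigenface computation is given below. It continues from the mean face m, the mean-subtracted data matrix A and the surrogate matrix Lmat = A'*A of the earlier sketch (all illustrative names of our own), keeps only the l leading eigenvectors, and assumes xTest is a test image already reshaped into a column vector.

% Minimal eigenface sketch (illustrative), continuing from m, A and Lmat.
l = 50;                                  % number of principal components kept (illustrative)
[V, D]   = eig(Lmat);                    % eigenvectors/eigenvalues of the N x N surrogate
[~, ord] = sort(diag(D), 'descend');     % arrange eigenvalues in descending order
V        = V(:, ord(1:l));               % keep the l leading eigenvectors of Lmat
U        = A * V;                        % corresponding eigenvectors of C = A*A' (eigenfaces)
U        = U ./ repmat(sqrt(sum(U.^2, 1)), size(U, 1), 1);  % normalise each eigenface
Wpca     = U;                            % feature-vector matrix, equation 3.22
trainFeat = Wpca' * A;                   % l x N feature vectors of the training images
testFeat  = Wpca' * (xTest - m);         % projection of a test image column vector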
Figure 3.4: Principal component analysis algorithm for face recognition
3.3.6 Eigen Faces
Eigenfaces are a set of eigenvectors used in the computer vision problem of human face recognition. Eigenfaces assume a ghostly appearance. They refer to an appearance-based approach to face recognition that seeks to capture the variation in a collection of face images and uses this information to encode and compare images of individual faces in a holistic manner. Specifically, the eigenfaces are the principal components of a distribution of faces or, equivalently, the eigenvectors of the covariance matrix of the set of face images, where an image of size M by M is considered as a point in an M^2 dimensional space.
Eigenfaces are mostly used to:
1) Extract the relevant facial information, which may or may not be directly related to the human intuition of face features such as the eyes, nose and lips. One way to do so is to capture the statistical variation between face images.
2) Represent face images efficiently. To reduce the computation and space complexity, each face image can be represented using a small number of dimensions.
The eigenfaces may be considered as a set of features which characterize the global variation among face images. Each face image is then approximated using a subset of the eigenfaces, those associated with the largest eigenvalues. These features account for the most variance in the training set. In the language of information theory, we want to extract the relevant information in a face image, encode it as efficiently as possible, and compare one face with a database of models encoded similarly. A simple approach to extracting the information contained in an image is to somehow capture the variations in a collection of face images, and independently encode and compare individual face images [21].
Figure 3.5: Fishers linear discriminant algorithm for face recognition
3.4 LINEAR DISCRIMINANT ANALYSIS
The objective of LDA is to perform dimensionality reduction while preserving as much of the class discriminatory information as possible. It seeks to find directions along which the classes are best separated. It does so by taking into consideration not only the scatter within classes but also the scatter between classes. It is also more capable of distinguishing image variation due to identity from variation due to other sources such as illumination and expression [17].
The classical Linear Discriminant Analysis (LDA) is also called Fisher's Linear Discriminant (FLD). This method was developed by Ronald Fisher in 1936. In this method, training and test sets are projected into the same subspace and the similarities between these data sets are identified. The Fisher's linear discriminant algorithm is explained in figure 3.5.
LDA gives a better classification of data sets when compared to principal component analysis. The main differences between principal component analysis and linear discriminant analysis are:
- LDA uses class information, whereas PCA pays no attention to class information.
- PCA takes the complete data set as one entity. PCA compresses information efficiently, but not the discriminatory information.
- In PCA the shape and location of the data set change due to the translation of the original space into a new space, while LDA only tries to separate the classes by drawing a decision region between the given classes (it does not change the location of the data set).
Figure 3.6: Flow chart of linear discriminant analysis algorithm
3.4.1 Mathematical Analysis of LDA
The flow chart of the linear discriminant analysis algorithm is shown in figure 3.6. As stated in the previous section, the objective is to find projection directions along which the classes are best separated, taking into consideration both the within-class and the between-class scatter.
Suppose there are C classes. Let \mu_i be the mean vector of class i, i = 1, 2, ..., C, let M_i be the number of samples within class i, and let \sum_{i=1}^{C} M_i be the total number of samples.
The within-class scatter matrix is given by

S_W = \sum_{i=1}^{C} \sum_{j=1}^{M_i} (y_j - \mu_i)(y_j - \mu_i)^T

and the between-class scatter matrix by

S_B = \sum_{i=1}^{C} (\mu_i - \mu)(\mu_i - \mu)^T,  where  \mu = \frac{1}{C} \sum_{i=1}^{C} \mu_i

LDA computes a transformation that maximizes the between-class scatter while minimizing the within-class scatter. If S_W is nonsingular, the optimal projection W_{lda} is chosen as the one which maximizes the ratio of the determinant of the between-class scatter matrix of the projected samples to the determinant of the within-class scatter matrix of the projected samples:

W_{lda} = \arg\max_W \frac{|W^T S_B W|}{|W^T S_W W|} = [w_1, w_2, ..., w_m]

where the upper bound on m is C - 1, and the w_i are the generalized eigenvectors of S_B and S_W corresponding to the set of decreasing generalized eigenvalues:

S_B w_i = \lambda_i S_W w_i

In practice, however, S_W is often singular, since the data are image vectors of large dimensionality while the size of the data set is much smaller (M << N). To alleviate this problem [15, 11], we can perform two projections:
1. PCA is first applied to the data set to reduce its dimensionality.
2. LDA is then applied to further reduce the dimensionality to C - 1.
In the final step of the calculation the optimal transform matrix is given by

W_{opt} = W_{lda} \cdot W_{pca}

PCA thus reduces the dimension of the feature space first, and applying LDA on this reduced space further reduces the dimensionality. This technique is called subspace LDA.
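A minimal Matlab sketch of this subspace-LDA step is given below; it is illustrative only, and assumes the PCA-projected training features trainFeat (one column per image, as in the earlier sketch) and a class label vector labels, with the PCA dimension chosen small enough that S_W is nonsingular.

% Minimal subspace-LDA sketch (illustrative): scatter matrices on the
% PCA-projected features and the generalized eigenvalue problem.
classes = unique(labels);
C  = numel(classes);
mu = mean(trainFeat, 2);                       % overall mean of the projected features
d  = size(trainFeat, 1);
Sw = zeros(d); Sb = zeros(d);
for i = 1:C
    Xi  = trainFeat(:, labels == classes(i));  % samples of class i
    mui = mean(Xi, 2);                         % class mean
    Xc  = Xi - repmat(mui, 1, size(Xi, 2));
    Sw  = Sw + Xc * Xc';                       % within-class scatter S_W
    Sb  = Sb + (mui - mu) * (mui - mu)';       % between-class scatter S_B
end
[W, D]   = eig(Sb, Sw);                        % generalized eigenvectors: S_B w = lambda S_W w
[~, ord] = sort(diag(D), 'descend');
Wlda     = W(:, ord(1:C-1));                   % at most C-1 discriminant directions
fisherFeat = Wlda' * trainFeat;                % final Fisher-face features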
3.5 INDEPENDENT COMPONENT ANALYSIS (ICA)
Independent component analysis (ICA) is a statistical method whose goal is to decompose multivariate data into a linear sum of non-orthogonal basis vectors with coefficients (encoding variables, latent variables, hidden variables) that are statistically independent. The independent components are latent variables, meaning that they cannot be directly observed; the mixing matrix is also assumed to be unknown. The concept of ICA may be seen as an extension of Principal Component Analysis, which only imposes independence up to second order and consequently defines directions that are orthogonal. Applications of ICA include data compression, detection and localization of sources, and blind identification and deconvolution [7].
3.5.1 Mathematical analysis of ICA
Let us denote by X = [X_1, X_2, ..., X_m] the random observed vector whose m elements are mixtures of m independent elements of a random vector S = [S_1, S_2, ..., S_m], given by

X = AS

where A represents an m x m mixing matrix. The goal of ICA is to find the unmixing matrix W (i.e. the inverse of A) that will give Y, the best possible approximation of S:

Y = WX

1) Minimization of Mutual Information
The conditional entropy of Y given X measures the average uncertainty remaining about y when x is known, and is

H(Y|X) = - \sum P(x, y) \log_2 P(y|x)

The mutual information between Y and X is

I(Y, X) = H(Y) + H(X) - H(X, Y) = H(Y) - H(Y|X)

Entropy can be seen as a measure of uncertainty. By having an algorithm that seeks to minimize mutual information, we are searching for components that are maximally independent. Maximizing the joint entropy consists of maximizing the individual entropies while minimizing the mutual information.
2) Maximum Likelihood Estimation
It is possible to formulate the likelihood directly in the noise-free ICA model and then estimate the model by a maximum likelihood method:

L = \sum_{t=1}^{T} \sum_{i} \log f_i(w_i^T x(t)) + T \log|\det W|

where the f_i are the density functions of the s_i (here assumed to be known), and the x(t), t = 1, ..., T are the realizations of x.
3) Infomax Algorithm
The algorithm used in Infomax to compute the unmixing matrix W is:
1. Initialize W(0) (e.g. randomly).
2. W(t + 1) = W(t) + \eta (I - f(Y) Y^T) W(t), where t represents a given approximation step and f(Y) is a nonlinear function usually chosen according to the type of distribution (generally exponential) [3, 7].
Figure 3.7: ICA algorithm
3.5.2 ICA Algorithm
The ICA algorithm is shown in figure 3.7. We obtain the overall transformation matrix using

W_{ica} = W_{pca} W_k    (3.24)

The size of W_{ica} is n x N, where n is the number of pixels in the image and N is the total number of images in the data set. The test image set is taken, mean centered by subtracting the mean image m of the data matrix X, and then projected onto the new subspace W_{ica}:

Y = W_{ica}^T X    (3.25)

We compare the projections of the test images and the training images and find the best match using an appropriate classifier.
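A minimal Matlab sketch of the Infomax-style update quoted in the previous section is given below; the learning rate, iteration count and the tanh nonlinearity are illustrative choices of our own, Xw stands for the PCA-whitened data matrix, and Wpca is the PCA projection computed earlier (the exact matrix orientation depends on the conventions used).

% Minimal Infomax-style ICA sketch (illustrative). Xw is the whitened data
% matrix (m x T, one observation per column); Wpca is assumed available.
[mdim, T] = size(Xw);
W   = eye(mdim);                       % step 1: initialize W(0)
eta = 0.001;                           % learning rate (illustrative value)
for it = 1:500                         % fixed number of approximation steps
    Y  = W * Xw;                       % current source estimates
    fY = tanh(Y);                      % nonlinearity f(Y), an illustrative choice
    W  = W + eta * (eye(mdim) - (fY * Y') / T) * W;  % W(t+1) = W(t) + eta*(I - f(Y)Y^T)W(t), averaged over the T samples
end
Wica = Wpca * W';                      % overall transformation, cf. equation 3.24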
CHAPTER 4
NON-LINEAR DIMENSIONALITY REDUCTION TECHNIQUES
Like the linear subspace techniques, this is also an appearance based approach. A spatially sampled image representation can be fully nonlinear. The nonlinear nature may come from inherent nonlinearity in the data or from nonlinearity due to the choice of parameters. We must note that the mathematics of nonlinear techniques is not applicable to all types of data. There are many nonlinear techniques for principal manifolds; one such method is nonlinear PCA.
4.1 KERNEL METHODS FOR FACE RECOGNITION:
So far, we have seen that the PCA and LDA methods for face recognition have demonstrated their success. However, in both of them the representation is based on second order statistics of the image set, without considering higher order statistical dependencies among three or more pixels. We now move to the kernel methods, Kernel PCA and Kernel Fisher Analysis, which capture higher order correlations [25]. In the following, we give the detailed algorithms of these methods and implement them.
4.1.1 Kernel Principal Component Analysis:
Kernel PCA is a nonlinear form of Principal Component Analysis which efficiently computes principal components in a high dimensional feature space related to the input space by a nonlinear mapping. In some high dimensional feature space F (bottom right of figure 4.8), we perform linear PCA, just as PCA is performed in the input space (top). Since F is nonlinearly related to the input space via the nonlinear map, the contour lines of constant projections onto the principal eigenvector (drawn as an arrow) become nonlinear in the input space [9].
4.1.1.1 Algorithm:
1. Considering a set of M centered observations x_k, k = 1, ..., M, we first compute the kernel (dot product) matrix

K_{ij} = (k(x_i, x_j))_{ij}

2. Next, we solve the eigenvalue problem by diagonalizing K, and normalize the eigenvector expansion coefficients \alpha^k by requiring

\lambda_k (\alpha^k \cdot \alpha^k) = 1

3. Finally, we compute the projections onto the eigenvectors:

(kPC)_n = (V^n \cdot \Phi(x)) = \sum_i \alpha_i^n k(x_i, x)

Figure 4.8: Explanation of KPCA
4.1.1.2 Dimensionality Reduction and Feature Extraction:
In KPCA, the number of extracted components can exceed the input dimensionality. Suppose that the number of observations M exceeds the input dimensionality N. Linear PCA, even when it is based on the M x M dot product matrix, can find at most N nonzero eigenvalues; in contrast, kernel PCA can find up to M nonzero eigenvalues [24].
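A minimal Matlab sketch of these three steps with a Gaussian (RBF) kernel is given below; the kernel choice, its width sigma and the number of retained components are illustrative assumptions of our own, and Xtrain holds one training vector per column.

% Minimal kernel-PCA sketch (illustrative). Xtrain is d x M; an RBF kernel
% with width sigma is assumed for illustration.
M     = size(Xtrain, 2);
sigma = 1e3;                                   % kernel width (illustrative)
D2    = zeros(M);                              % pairwise squared distances
for i = 1:M
    for j = 1:M
        D2(i, j) = sum((Xtrain(:, i) - Xtrain(:, j)).^2);
    end
end
K    = exp(-D2 / (2 * sigma^2));               % step 1: kernel matrix K_ij = k(x_i, x_j)
oneM = ones(M) / M;
Kc   = K - oneM*K - K*oneM + oneM*K*oneM;      % center the kernel matrix in feature space
[Alpha, Lam] = eig(Kc);                        % step 2: diagonalize the centered K
[lam, ord]   = sort(diag(Lam), 'descend');
l     = 40;                                    % number of nonlinear components kept
Alpha = Alpha(:, ord(1:l));
for n = 1:l
    Alpha(:, n) = Alpha(:, n) / sqrt(lam(n));  % normalisation lambda_n (alpha^n . alpha^n) = 1
end
kpcFeat = Kc * Alpha;                          % step 3: projections of the training data (M x l)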
4.1.2 Kernel Fisher Analysis:
Kernel Fisher Analysis (KFA), also known as generalized discriminant analysis (GDA) and kernel discriminant analysis, is a kernelized version of linear discriminant analysis [24]. Using the kernel trick, LDA is implicitly performed in a new feature space, which allows non-linear mappings to be learned. KPCA and GDA are based on exactly the same optimization criteria as their linear counterparts, PCA and LDA.
The algorithm generalizes the strengths of D-LDA and the kernel techniques. Following the SVM paradigm, the algorithm first nonlinearly maps the original input space to an implicit high-dimensional feature space, where the distribution of face patterns is hoped to be linearized and simplified. Then, a new variant of the D-LDA method is introduced to effectively solve the small-sample-size (SSS) problem and derive a set of optimal discriminant basis vectors in the feature space. The kernel machines provide an elegant way of designing nonlinear algorithms by reducing them to linear ones in some high-dimensional feature space nonlinearly related to the input sample space.
Figure 4.9: KDA
4.1.2.1 Algorithm:
1. Calculate the kernel matrix K, as above:

K_{ij} = (k(x_i, x_j))_{ij}

2. Next, the objective is to find a transformation \varphi, based on the optimization of certain separability criteria, which produces a mapping y_i = \varphi(x_i) that leads to an enhanced separability of different face objects.

3. Calculate \Phi_b^T \Phi_b:

\Phi_b^T \Phi_b = \frac{1}{L} B \left( A_{LC}^T K A_{LC} - \frac{1}{L}(A_{LC}^T K 1_{LC}) - \frac{1}{L}(1_{LC}^T K A_{LC}) + \frac{1}{L^2}(1_{LC}^T K 1_{LC}) \right) B

where B = diag[\sqrt{c_1}, ..., \sqrt{c_C}], 1_{LC} is an L x C matrix with all terms equal to one, A_{LC} = diag[a_{c_1}, ..., a_{c_C}] is an L x C block diagonal matrix, and a_{c_i} is a C_i x 1 vector with all terms equal to 1/C_i.

4. Find E_m and \Lambda_b from \Phi_b^T \Phi_b.

5. Calculate U^T S_{WTH} U using the equation

U^T S_{WTH} U = (E_m \Lambda_b^{-1/2})^T (\Phi_b^T S_{WTH} \Phi_b)(E_m \Lambda_b^{-1/2})

6. Calculate y = \Theta \cdot \nu(z), where

\Theta = \frac{1}{\sqrt{L}} (E_m \Lambda_b^{-1/2} P \Lambda_w^{-1/2})^T \, B (A_{LC}^T - \frac{1}{L} 1_{LC}^T)

is an M x L matrix.

7. For an input pattern z, calculate its kernel vector \nu(z).

8. The optimal discriminant feature representation of z is then obtained as y = \Theta \cdot \nu(z).
CHAPTER 5
FACE RECOGNITION USING GABOR WAVELETS
5.1 WAVELETS
Wavelets are mathematical functions that cut up data into different frequency components, and then study each component with a resolution matched to its scale. Over traditional Fourier methods, they have advantages in analyzing physical situations where the signal contains discontinuities and sharp spikes [18]. Wavelets were developed independently in the fields of mathematics, quantum physics, electrical engineering etc. Interchanges between these fields during the last ten years have led to many new wavelets. We will use Gabor wavelets to implement our face recognition system.
5.2 GABOR FILTERS
Gabor filters are capable of deriving multi-orientational information from a face image at several scales, with the derived information being of a local nature. The amount of data in the Gabor face representation is commonly reduced to a more manageable size by exploiting various downsampling, feature selection and subspace projection techniques before it is finally fed to a classifier. Gabor filters are among the most popular tools for facial feature extraction. Their use in automatic face recognition systems is motivated by two major factors: their computational properties and their biological relevance. A 2D Gabor filter in the spatial domain is defined by the following expression:
\psi_{u,v}(x, y) = \frac{f_u^2}{\pi \gamma \eta} \, e^{-\left(\frac{f_u^2}{\gamma^2} x'^2 + \frac{f_u^2}{\eta^2} y'^2\right)} \, e^{j 2 \pi f_u x'}

where x' = x\cos\theta_v + y\sin\theta_v, y' = -x\sin\theta_v + y\cos\theta_v, f_u = f_{max}/2^{u/2} and \theta_v = v\pi/8.
Hence, in a way, Gabor filters represent Gaussian kernel functions modulated by a complex plane wave whose center frequency and orientation are defined by f_u and \theta_v, respectively, with the parameters \gamma and \eta determining the ratio between the center frequency and the size of the Gaussian envelope [14].
5.2.1 Extracting features with Gabor filters
Consider a grey-scale face image defined on a grid of size a x b, denoted by I(x, y), and let \psi_{u,v}(x, y) represent a Gabor filter determined by the parameters f_u and \theta_v. The filtering operation with the Gabor filter can then be written as

G_{u,v}(x, y) = I(x, y) * \psi_{u,v}(x, y)

where G_{u,v}(x, y) denotes the complex convolution result, which can be decomposed into a real and an imaginary part:

E_{u,v}(x, y) = Re[G_{u,v}(x, y)]
O_{u,v}(x, y) = Im[G_{u,v}(x, y)]

Based on the decomposed filtering result, both the magnitude A_{u,v}(x, y) and the phase \phi_{u,v}(x, y) filter responses can be computed as

A_{u,v}(x, y) = \sqrt{E_{u,v}^2(x, y) + O_{u,v}^2(x, y)}
\phi_{u,v}(x, y) = \arctan(O_{u,v}(x, y) / E_{u,v}(x, y))

Since the computed phase responses vary significantly even for spatial locations only a few pixels apart, Gabor phase features are considered unstable and are usually discarded. The magnitude responses, on the other hand, vary slowly with the spatial position, and are thus the preferred choice when deriving Gabor filter based features [26].
In our experiment, we use a simple rectangular sampling grid with 256 nodes for the initial dimensionality reduction, and a) PCA, b) LDA for the subspace projection of the feature vector built by concatenating the downsampled magnitude responses. To derive the Gabor face representation from a given face image I(x, y), the Gabor magnitude responses for the entire filter bank of 40 Gabor filters are commonly computed first. However, since each of the responses is of the same dimensionality as the input image, this procedure results in an inflation of the original pixel space to 40 times its initial size. To cope with this problem, the magnitude responses are typically downsampled using either a simple rectangular sampling grid or some kind of feature selection scheme. Nevertheless, even after the downsampling, any face representation constructed, for example, by a concatenation of the downsampled magnitude responses still resides in a high dimensional space. Hence, we use a subspace projection technique, such as principal component analysis or linear discriminant analysis, to further reduce the data's dimensionality.
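A minimal Matlab sketch of this feature extraction chain for a single filter of the bank is given below; the filter support, the scale and orientation indices, the parameter values and the downsampling factor are illustrative assumptions, and I is a grey-scale image stored as a double matrix.

% Minimal Gabor feature sketch (illustrative): one filter of the bank,
% its magnitude response and a rectangularly downsampled feature vector.
u = 2; v = 3;                               % illustrative scale and orientation indices
fmax = 0.25; gam = sqrt(2); eta = sqrt(2);  % commonly used parameter values (assumed)
fu  = fmax / 2^(u/2);  thv = v*pi/8;        % centre frequency and orientation
[xg, yg] = meshgrid(-15:16, -15:16);        % 32 x 32 spatial support (assumed)
xp =  xg*cos(thv) + yg*sin(thv);            % rotated coordinate x'
yp = -xg*sin(thv) + yg*cos(thv);            % rotated coordinate y'
psi = (fu^2/(pi*gam*eta)) .* exp(-(fu^2/gam^2)*xp.^2 - (fu^2/eta^2)*yp.^2) ...
      .* exp(1j*2*pi*fu*xp);                % Gabor filter psi_{u,v}(x, y)
G    = conv2(I, psi, 'same');               % complex filtering result G_{u,v}
A    = abs(G);                              % magnitude response A_{u,v}
Ad   = A(1:4:end, 1:4:end);                 % rectangular downsampling (factor 4 per axis)
feat = Ad(:);                               % this filter's part of the Gabor feature vector

Repeating this for all 40 filters of the bank and concatenating the vectors feat gives the (still high dimensional) Gabor face representation that is subsequently projected with PCA or LDA.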
CHAPTER 6
DISTANCE MEASURES, ROC AND ORL DATABASE
6.1 DISTANCE MEASURES FOR FACE RECOGNITION
To find the best matched image, we have several distance measuring techniques: Euclidean distance, Mahalanobis distance, city block, cosine and chessboard distances. Feature vectors are formed from the extracted features of the training database, and a feature vector is likewise calculated from the test image. The test image feature vector is matched with each of the training set feature vectors using a distance measure.
The training set feature vector with the least distance gives the best matched image for the test image. Let us take a training set of N images and calculate the feature vector set Y from these images, i.e. Y consists of N column vectors (each K x 1) y_1, y_2, y_3, ..., y_n. The feature vector of the test image is y_{tst}. We calculate the distance d between y_i and y_{tst} using the various distance measures.
6.1.1 Euclidean distance
The Euclidean distance is the most commonly used distance measure in many applications. This distance gives the shortest distance between two images or vectors; it is the same as the Pythagorean relation in two dimensions. It is the sum of squared differences of the two feature vectors (y_{tst}, y_i):

d^2 = (y_{tst} - y_i)^T (y_{tst} - y_i)    (6.1)

The Euclidean distance is sensitive both to adding to and to multiplying the vector by some factor or value.
6.1.2 Mahalanobis distance
The Mahalanobis distance comes from the Gaussian multivariate probability density function

p(x) = (2\pi)^{-d/2} |C|^{-1/2} \exp\left(-\frac{1}{2}(x - m)^T C^{-1} (x - m)\right)    (6.2)

where (x - m)^T C^{-1} (x - m) is called the squared Mahalanobis distance, which is very important in characterizing the distribution, and C is the estimated covariance matrix of y whose observations are the y_i. The Mahalanobis distance of two feature vectors y_{tst} and y_i is given by

d^2 = (y_{tst} - y_i)^T C^{-1} (y_{tst} - y_i)    (6.3)
6.1.3 Cosine Similarity
Cosine similarity is a measure of similarity between two vectors of an inner product space that measures the cosine of the angle between them. It is particularly used in positive spaces, where the outcome is neatly bounded in [0, 1]. Given two vectors, the cosine similarity $\cos(\theta)$ is expressed using a dot product and magnitudes as
$d = \cos(\theta) = \dfrac{y_{tst} \cdot y_i}{\|y_{tst}\| \, \|y_i\|}$ (6.4)
The resulting similarity ranges from $-1$, meaning exactly opposite, to $1$, meaning exactly the same, with $0$ usually indicating independence, and in-between values indicating intermediate similarity or dissimilarity.
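In sketch form (again NumPy, for illustration only), with the similarity turned into a distance so that smaller is better, like the other measures:

import numpy as np

def cosine_similarity(y_tst, y_i):
    # cos(theta) = y_tst . y_i / (||y_tst|| ||y_i||)
    return float(y_tst @ y_i / (np.linalg.norm(y_tst) * np.linalg.norm(y_i)))

def cosine_distance(y_tst, y_i):
    """1 - cos(theta): 0 for identical directions, larger for dissimilar vectors."""
    return 1.0 - cosine_similarity(y_tst, y_i)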
6.1.4 City Block distance
Also known as the Manhattan distance. This metric assumes that in going from one pixel to the other it is only possible to travel directly along pixel grid lines; diagonal moves are not allowed. The city block distance is therefore
$d_1(y_{tst}, y_i) = \|y_{tst} - y_i\|_1 = \sum_{k=1}^{K} |y_{tst,k} - y_{i,k}|$ (6.5)
The city block distance depends on the rotation of the coordinate system, but does not depend on its reflection about a coordinate axis or its translation.
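A one-line NumPy sketch of equation (6.5):

import numpy as np

def city_block(y_tst, y_i):
    # L1 (Manhattan) distance: sum of absolute component differences
    return float(np.sum(np.abs(y_tst - y_i)))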
6.1.5 Chessboard Distance
This metric assumes that moves on the pixel grid are made like a King in chess, i.e. a diagonal move counts the same as a horizontal move. The metric is therefore given by
$d_{chess}(y_{tst}, y_i) = \max_{k} |y_{tst,k} - y_{i,k}|$ (6.6)
The last two metrics are usually much faster to compute than the Euclidean
metric and so are sometimes used where speed is critical but accuracy is not too
important.
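A corresponding sketch of the chessboard (Chebyshev) distance of equation (6.6):

import numpy as np

def chessboard(y_tst, y_i):
    # Chebyshev distance: the largest absolute component difference
    return float(np.max(np.abs(y_tst - y_i)))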
6.2 ROC
Face recognition can be classified into face identification and face verification. In face identification, the identities of all enrolled images are stored, and the system identifies a test image by comparing it against the images in the database and returning the best-correlated image.
In face verification, some template images are stored in the database. A test image, which need not be stored in the database, is presented, and its feature vector is compared with the feature vectors of the images present in the database.
The Receiver Operating Characteristic (ROC) curve is plotted by varying the threshold value and recording the verification rate (how often persons are recognized correctly) against the false acceptance rate (how often wrong persons are accepted). These two rates should be balanced according to the application.
In verification, the camera captures an image of a person's face, which may or may not be stored in the database, and the person claims an identity. A linear subspace technique is used to calculate the feature vectors of the stored images and the feature vector of the acquired image. This acquired feature vector is compared with the stored feature vectors using a distance measure. If the distance is lower than the threshold, the system decides that the acquired image belongs to the claimed person; if the distance is greater than the threshold, the acquired image is judged not to belong to the claimed person.
One type of error in verification is accepting a wrong person as the correct one: an individual makes a false claim to an identity, yet the distance falls below the threshold, so the system accepts the claim even though the individual is not the claimed person. The number of times this occurs over all individuals is called the False Acceptance Rate. The second type of error occurs when an individual makes a genuine claim but the distance is higher than the threshold, so the system decides the individual is not the claimed person even though the claim is true. The number of times this error occurs is called the False Reject Rate. Subtracting the false reject rate from 1 gives the probability of verification. A typical ROC curve is shown in Figure 6.1.
Figure 6.1: Typical Receiver Operating Characteristic Curve
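To make the threshold sweep concrete, the sketch below computes the false acceptance rate and the verification rate (1 - FRR) from genuine and impostor distance scores. It is a simplified, illustrative stand-in for the toolbox function used later in the report; the function and variable names are our own.

import numpy as np

def roc_points(genuine_d, impostor_d, n_points=100):
    """FAR and verification rate obtained by sweeping a distance threshold.

    genuine_d  -- distances for true identity claims
    impostor_d -- distances for false identity claims
    """
    genuine_d = np.asarray(genuine_d, dtype=float)
    impostor_d = np.asarray(impostor_d, dtype=float)
    lo = min(genuine_d.min(), impostor_d.min())
    hi = max(genuine_d.max(), impostor_d.max())
    far, ver = [], []
    for t in np.linspace(lo, hi, n_points):
        far.append(float((impostor_d <= t).mean()))   # false acceptances
        ver.append(float((genuine_d <= t).mean()))    # 1 - false reject rate
    return np.array(far), np.array(ver)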
6.3 ORL DATABASE
The ORL face database contains a set of face images taken at the AT&T Laboratories. The laboratory was founded in 1986 at Cambridge as the Olivetti Research Laboratory, famously known as ORL. The laboratory took the face images between April 1992 and April 1994 for a face recognition project carried out in collaboration with the Speech, Vision and Robotics Group of the Cambridge University Engineering Department.
The ORL database contains images of 40 persons (subjects), with 10 images per person showing different facial expressions (open/closed eyes, smiling/not smiling), varying lighting conditions and facial details (with/without glasses, slight rotation of the face). Some images were taken at different times. All images were taken in a frontal position against a dark homogeneous background, with small variations in the background gray level. Figure 6.2 shows the sample data set of one person in different facial expressions.
The size of each image in the database is 92x112 pixels. The images are grayscale, with pixel values from 0 to 255, i.e. 256 gray levels per pixel.
Figure 6.2: Sample images of a single person from ORL database
6.4 SUMMARY
Different distance measures have been defined and compared. Of these, the best-performing distance measure is used to match the stored training images against a given test image.
CHAPTER 7
REAL TIME FACE RECOGNITION SYSTEM USING OPENCV
Face Recognition generally involves two stages:
Face Detection, where a photo is searched to find any face in it
Face Recognition, where that detected and processed face is compared to a
database of known faces, to decide who that person is.
7.1 FACE DETECTION
As mentioned above, the first stage in face recognition is face detection. The OpenCV library makes it fairly easy to detect a frontal face in an image using its Haar Cascade Face Detector (also known as the Viola-Jones method). The function cvHaarDetectObjects in OpenCV performs the actual face detection [10].
For frontal face detection, we can choose one of the Haar Cascade Classifiers that come with OpenCV (in the data\haarcascades\ folder):
haarcascade_frontalface_default.xml
haarcascade_frontalface_alt.xml
haarcascade_frontalface_alt2.xml
haarcascade_frontalface_alt_tree.xml
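The report's implementation uses the legacy C API (cvHaarDetectObjects). The following sketch shows the equivalent flow with OpenCV's Python bindings; the cascade path and input filename are illustrative placeholders.

import cv2

cascade = cv2.CascadeClassifier("haarcascade_frontalface_default.xml")  # adjust path

img = cv2.imread("photo.jpg")                        # hypothetical input image
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=4)

for (x, y, w, h) in faces:                           # one bounding box per detection
    face_roi = gray[y:y + h, x:x + w]                # cropped face for the next stages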
7.2 PREPROCESSING FACIAL IMAGES
It is extremely important to apply various image pre-processing techniques to standardize the images that are supplied to a face recognition system. Most face recognition algorithms are extremely sensitive to lighting conditions: if the system was trained to recognize a person in a dark room, it will probably not recognize them in a bright room. This is the problem of illumination dependence, and there are many other issues as well: the face should be in a very consistent position within the images (for example, the eyes at the same pixel coordinates), and of consistent size, rotation angle, hair and makeup, emotion (smiling, angry, etc.) and lighting direction (from the left, from above, etc.). This is why it is so important to apply good image preprocessing filters before performing face recognition.
Histogram equalization [1] is a very simple method of automatically standardizing the brightness and contrast of facial images.
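A minimal preprocessing sketch along these lines, cropping to a fixed size and equalizing the histogram (the 100x100 size follows the example used later in this chapter):

import cv2

def preprocess_face(gray_face, size=(100, 100)):
    """Resize a cropped grayscale face and standardize its brightness/contrast."""
    face = cv2.resize(gray_face, size)
    return cv2.equalizeHist(face)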
7.3 FACE RECOGNITION
Once a pre-processed facial image is available, Eigenfaces (PCA) can be applied for face recognition. OpenCV comes with the function cvEigenDecomposite(), which performs the PCA operation; however, we need a database (training set) of images for it to know how to recognize each of the people. So we should collect a group of preprocessed facial images of each person we want to recognize. For example, to recognize someone from a class of 10 students, we could store 20 photos of each person, for a total of 200 preprocessed facial images of the same size (say 100x100 pixels).
The eigenfaces method has been thoroughly explained in Chapter 2.
It is very easy to use a webcam stream as input to the face recognition system instead of a file list: we simply grab frames from a camera instead of from a file, and run until the user wants to quit instead of until the file list has run out. OpenCV provides the cvCreateCameraCapture() function (also known as cvCaptureFromCAM()) for this, and grabbing frames from a webcam can be implemented easily using it [10].
Once a face has been captured, the feature vector of this test image is compared with the features of the training set using a nearest-neighbour classifier; the training image with the least Euclidean distance is declared the match.
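The project's real-time system is written in C++ against the old OpenCV C API; the loop below is an illustrative Python sketch of the same flow (grab a frame, detect, preprocess, project, nearest neighbour). The eigenface matrix, mean face, training features and labels are assumed to come from the training stage.

import cv2
import numpy as np

def recognize_from_webcam(cascade_path, eigvecs, mean_face, train_feats, train_labels):
    """Grab webcam frames, detect faces and match them against trained eigenface features."""
    cap = cv2.VideoCapture(0)                         # default camera
    cascade = cv2.CascadeClassifier(cascade_path)
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        for (x, y, w, h) in cascade.detectMultiScale(gray, 1.1, 4):
            face = cv2.equalizeHist(cv2.resize(gray[y:y + h, x:x + w], (100, 100)))
            # Project onto the eigenfaces obtained in the training stage.
            feat = eigvecs.T @ (face.astype(np.float64).ravel() - mean_face)
            d2 = np.sum((train_feats - feat) ** 2, axis=1)   # Euclidean distances
            print("best match:", train_labels[int(np.argmin(d2))])
        cv2.imshow("camera", frame)
        if cv2.waitKey(1) & 0xFF == ord('q'):                # run until the user quits
            break
    cap.release()
    cv2.destroyAllWindows()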
CHAPTER 8
METHOD OF IMPLEMENTATION
8.1 PCA AND LDA
The whole recognition process involves two steps:
Initialization Process
Recognition Process
In the training phase we have to extract a feature vector from each image in the training set. Let $\Gamma_A$ be the training image of person A, which has a pixel resolution of $M \times N$. In order to extract the PCA features of $\Gamma_A$, we first convert the image into a pixel vector $\Phi_A$ by concatenating all the rows of $\Gamma_A$ into a single vector [3]. The length of this vector is $MN$, which is very large, so we use the PCA algorithm to reduce its dimensionality; we denote the new reduced vector by $\omega_A$, which has a dimensionality $d \ll MN$. For each training image $\Gamma_i$, the feature vector $\omega_i$ is calculated and stored.
In the recognition phase (or testing phase), a test image $\Gamma_j$ is given, and its feature vector $\omega_j$ is calculated using PCA. In order to identify $\Gamma_j$, the similarity between $\omega_j$ and all the feature vectors stored in the training set is found. This is done by measuring the Euclidean distance of $\omega_j$ to every feature vector in the set; the feature vector whose Euclidean distance to $\omega_j$ is minimum is selected, and the corresponding training image is said to match $\Gamma_j$.
To increase the class separability, we then project the feature vectors into a new subspace called the Fisher space, so as to make use of the class information. This method is called LDA.
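As a concrete illustration of the two phases described above, here is a minimal NumPy sketch of the eigenface training and matching steps. The project itself was implemented in MATLAB; the number of retained components is an illustrative choice, and LDA would add a further projection of these features onto the Fisher space.

import numpy as np

def train_pca(X, d=50):
    """X: (num_images, M*N) matrix of flattened training images.

    Uses the small eigenproblem of A A^T (the usual trick when M*N is much
    larger than the number of images) and returns the mean image, the top-d
    eigenfaces and the training feature vectors omega_i.
    """
    mean = X.mean(axis=0)
    A = X - mean                                   # centred data, one image per row
    vals, vecs = np.linalg.eigh(A @ A.T)           # (num_images x num_images) problem
    order = np.argsort(vals)[::-1][:d]             # keep the d largest eigenvalues
    eigfaces = A.T @ vecs[:, order]                # back-project into pixel space
    eigfaces /= np.linalg.norm(eigfaces, axis=0)   # normalise each eigenface
    feats = A @ eigfaces                           # omega_i for every training image
    return mean, eigfaces, feats

def recognize(x_test, mean, eigfaces, feats, labels):
    """Project a flattened test image and return the nearest training label."""
    w = (x_test - mean) @ eigfaces                 # omega_j of the test image
    d2 = np.sum((feats - w) ** 2, axis=1)          # Euclidean distances
    return labels[int(np.argmin(d2))]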
The following Figure shows the whole procedure in a compact form:
8.2 ICA
We performed ICA on the image set under two architectures. Architecture I treated the images as random variables and the pixels as outcomes, whereas Architecture II treated the pixels as random variables and the images as outcomes, as illustrated in Figures 8.2 and 8.3.
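The two architectures differ only in how the data matrix is oriented before ICA. The sketch below uses scikit-learn's FastICA purely as a stand-in for whichever ICA algorithm is employed; X is the (num_images x num_pixels) matrix of flattened faces, typically centred and PCA-reduced beforehand.

from sklearn.decomposition import FastICA

def ica_architecture_1(X, n_components=40):
    """Architecture I: images as random variables, pixels as observations.

    FastICA expects the variables in columns, so the matrix is transposed.
    Each column of `basis` is a statistically independent basis image, and
    each row of `coeffs` represents one face in that basis.
    """
    ica = FastICA(n_components=n_components, max_iter=1000)
    basis = ica.fit_transform(X.T)        # shape (num_pixels, n_components)
    coeffs = ica.mixing_                  # shape (num_images, n_components)
    return basis, coeffs

def ica_architecture_2(X, n_components=40):
    """Architecture II: pixels as random variables, images as observations.

    Each row of the returned matrix is an independent (factorial) code for one face.
    """
    ica = FastICA(n_components=n_components, max_iter=1000)
    return ica.fit_transform(X)           # shape (num_images, n_components)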
Figure 8.1: Schematic of the Face Recognizer
Figure 8.2: Blind Source Separation model
8.3 KPCA
We follow the same implementation strategy as in PCA, except that in KPCA we have to map the nonlinear input space into a feature space in which PCA can be applied. Polynomial and Gaussian kernels were used for this transformation.
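A minimal kernel-PCA sketch with the Gaussian (RBF) kernel, following the standard centre-the-kernel-matrix recipe; the kernel width and component count are illustrative values, and this is not the project's MATLAB implementation:

import numpy as np

def rbf_kernel(X, Y, sigma2=0.05):
    """Gaussian kernel k(x, y) = exp(-||x - y||^2 / sigma^2)."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / sigma2)

def kpca_fit(X, n_components=40, sigma2=0.05):
    """Return the normalised eigenvectors of the centred kernel matrix."""
    K = rbf_kernel(X, X, sigma2)
    n = K.shape[0]
    one = np.ones((n, n)) / n
    Kc = K - one @ K - K @ one + one @ K @ one        # centre in feature space
    vals, vecs = np.linalg.eigh(Kc)
    order = np.argsort(vals)[::-1][:n_components]
    alphas = vecs[:, order] / np.sqrt(np.maximum(vals[order], 1e-12))
    return alphas, K

def kpca_project(X_new, X_train, alphas, K_train, sigma2=0.05):
    """Project new samples onto the kernel principal components."""
    k = rbf_kernel(np.atleast_2d(X_new), X_train, sigma2)
    n = K_train.shape[0]
    one_m = np.ones((k.shape[0], n)) / n
    one = np.ones((n, n)) / n
    kc = k - one_m @ K_train - k @ one + one_m @ K_train @ one
    return kc @ alphas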
8.4 GABOR PCA
We propose a method that uses the Gabor filter responses, instead of the raw face image, as the input to PCA in order to overcome these shortcomings. With the Gabor filter responses as the input vector, the sensitivity to rotation and illumination can be reduced. If we use M gallery images, an $(N \times 40) \times M$ matrix A can be constructed, and the eigenvalues and eigenvectors can be calculated from the ensemble matrix $AA^T$. From the eigenvalues, we can select the effective Gabor filter responses and construct the eigenspace with the appropriate number of eigenvectors. The training set of face images is then projected into this eigenspace, and the testing set of face images is projected in the same way.
Figure 8.3: Finding statistically independent basis images
8.5 GABOR LDA
First, discriminant vectors are computed using LDA from the given training images. The function of the discriminant vectors is two-fold. First, the discriminant vectors are used as a transform matrix, and LDA features are extracted by projecting the gray-level images onto the discriminant vectors. Second, the discriminant vectors are used to select discriminant pixels, the number of which is much less than that of a whole image. Gabor features are extracted only at these discriminant pixels. Then, applying LDA to the Gabor features, one can obtain reduced Gabor-LDA features. Finally, a combined classifier is formed based on these two types of LDA features.
CHAPTER 9
RESULTS AND OBSERVATIONS
The algorithms were implemented in MATLAB, and the performance of the different techniques was evaluated on the ORL database. The images in the database were divided into training and testing sets: the first 4 images of each person were used for training and the rest for testing. A nearest-neighbour classifier was used with different distance metrics, and recognition rates and ROC curves were obtained for each of them.
9.1 PCA
The ORL Database of Faces was used to test the face recognition algorithm. There are ten different images of each of 40 distinct subjects. For some subjects, the images were taken at different times, varying the lighting, facial expressions (open/closed eyes, smiling/not smiling) and facial details.
The mean face of all 400 images is shown in Figure 9.1, and the eigenfaces that were generated are shown in Figure 9.2. An image was selected from the database and the face recognition algorithm was applied to recognize it from the database; the results are shown in Figure 9.3. The number above each image represents the Euclidean distance between the test image and the image from the database. The first image has zero Euclidean distance because it is identical to the test image, and the other eight are the images most similar to it.
For calculating the ROC, we compute the False Accept Rate, which is the probability that the system incorrectly matches the input image with an image stored in the database, and the False Rejection Rate, which is the ratio of the number of correct persons rejected to the total number of persons in the database. A function from the PhD toolbox [19] is used to generate the ROC curve data from genuine (client) and impostor matching scores. The function takes either two or three input arguments: the first is a vector of genuine matching scores (i.e., the client scores), the second is a vector of impostor matching scores, and the third is the number of points (i.e., the resolution) at which to compute the ROC curve data.
Figure 9.1: Mean Face
Figure 9.2: Eigenfaces
Figure 9.3: Identifying similar faces
The Receiver Operating Characteristic curves for PCA with the different distance metrics are shown in Fig. 9.5. We see that PCA gave better verification rates with the Mahalanobis distance.
The graph of recognition rate against the number of eigenvectors retained is shown in Fig. 9.6. The eigenvectors capture the uncorrelated information, or variation, present in the images; the eigenvectors with the highest eigenvalues contain more information than those with the lowest. In the dimensionality reduction we discard the information associated with some eigenvectors, and this information loss reduces the recognition rate.
Figure 9.4: Performance of PCA based Face Recognition with ORL Face Database
The graph of recognition rate against the number of training images used is shown in Fig. 9.7. The recognition rate increases up to a certain extent as the number of training images is increased.
Although the face recognition results were acceptable, a system using only eigenfaces might not be suitable for a real-time application; it needs to be more robust and to use more discriminant features.
9.2 LDA
The first 4 Fisher faces obtained are shown in Figure 9.8, and the recognition performance is shown in Fig. 9.9.
The Receiver Operating Characteristic for LDA is shown in Fig. 9.10. We see that LDA gives better performance than PCA because it uses the additional class information. Again, better verification rates were obtained using the Mahalanobis distance metric.
The graph of recognition rate against the number of training images used is shown in Fig. 9.11.
Figure 9.5: ROC for PCA
9.3 ICA
The ICA results are shown in the following figures: the source images (Fig. 9.12), the aligned faces with features extracted (Fig. 9.13), the mixed images (Fig. 9.14), and the independent components, i.e. the estimated sources (Fig. 9.15).
9.4 KERNEL PCA AND KERNEL LDA
The ROC for KPCA is shown in Fig. 9.16.
A Gaussian kernel of the form $k(x, y) = \exp(-\|x - y\|^2/\sigma^2)$ was used to extract the nonlinear features from the faces. The value of the variance was chosen to be 0.05, and different ROCs were plotted for the different distance metrics; the Mahalanobis distance metric again gave the best result.
The ROC for KLDA is shown in Fig. 9.17.
Figure 9.6: Recognition rate vs number of eigen faces for PCA
9.5 GABOR FILTER
Face images after Gabor filtering with no downsampling are shown in Figure 9.18, and the result with a downsampling factor of 64 is shown in Figure 9.19.
9.6 GABOR PCA AND GABOR LDA
It is difficult to use these linear techniques to classify nonlinear features in the data, so for nonlinear features wavelets are used. This work uses Gabor wavelets to capture the nonlinear features: the Gabor wavelet transform is applied before the linear subspace technique for nonlinear feature extraction.
The ROC for Gabor PCA is shown in Fig. 9.20, and the ROC for Gabor LDA is shown in Fig. 9.21.
Thus, from the ROC diagrams, we see that the accuracy of the Gabor wavelet based face recognition system is much higher than that of the other linear and nonlinear techniques studied and implemented.
9.7 REAL TIME FACE RECOGNITION SYSTEM
The C++ program was compiled using gcc, and face detection and recognition were performed on the image captured from the webcam, as shown in Figure 9.22.
Figure 9.7: Recognition rate vs number of training images for PCA
Figure 9.8: Fisher faces
Figure 9.9: Performance Of LDA Based Face Recognition With Orl Face Database
Figure 9.10: ROC for LDA
Figure 9.11: Recognition rate vs number of training images for PCA
Figure 9.12: Source images
Figure 9.13: Aligned faces-features extracted
Figure 9.14: Mixed images
Figure 9.15: Independent Components (Estimated Sources)
Figure 9.16: ROC for KPCA
Figure 9.17: ROC for KLDA
Figure 9.18: Magnitude response with no downsampling
Figure 9.19: Magnitude response with downsampling factor 64
Figure 9.20: ROC for Gabor PCA
Figure 9.21: ROC for Gabor LDA
Figure 9.22: Opencv implementation
CHAPTER 10
CONCLUSION
In this project, we looked into various linear and non-linear techniques for dimensionality reduction, with special emphasis on face recognition. The ROC characteristics of the various methods were found and plotted. Let us summarize the results:
Amongst the linear techniques, LDA is better for large databases. However, for a task with very high-dimensional data, the traditional LDA algorithm encounters several difficulties; hence we implemented it after reducing the dimensions by PCA. LDA is primarily used to reduce the number of features to a more manageable number before classification.
Besides the ROC, the analysis of PCA and LDA shows that as the number of eigenfaces retained is increased, the recognition rate increases. However, the feature vector size also increases, so there is a trade-off between dimensionality reduction and recognition rate.
ICA generalizes PCA and, like PCA, has proven a useful tool for reducing data dimensionality. It extracts features from natural scenes and can be adopted for image change detection. In addition, ICA is sensitive to lines and edges of varying thickness in images. Also, the ICA coefficients lead to efficient reduction of Gaussian noise.
We also saw that in many practical cases linear methods are not suitable. LDA and PCA can be extended for use in non-linear classification via the kernel method. Here, the original observations are effectively mapped into a higher-dimensional non-linear space. Linear classification in this non-linear space is then equivalent to non-linear classification in the original space.
Analyzing the kernel methods of dimensionality reduction, we find that Gaussian kernel PCA/LDA succeeded in revealing more complicated structures in the data than their linear counterparts and achieved much lower classification error rates, as is evident from the ROCs.
Analysis was also done based on various distance measures. In all cases, the methods implemented with the Mahalanobis distance gave better results.
We used Gabor wavelets to extract the nonlinear features in the images and subsequently applied PCA and LDA. The ROC curves show a considerable improvement in recognition rates; of all the methods we used, the wavelet-based face recognition techniques give the best results.
Finally, we implemented a real-time face recognition system in which the detected faces are recognized against a database of known faces.
BIBLIOGRAPHY
[1] Sapana Shrikrishna Bagade and Vijaya K Shandilya. Use of histogram equalization in image processing for image enhancement. International Journal of Software Engineering Research and Practices, 1(2):6-10, 2011.
[2] Robert J Baron. Mechanisms of human facial recognition. International Journal of Man-Machine Studies, 15(2):137-178, 1981.
[3] Marian Stewart Bartlett, Javier R Movellan, and Terrence J Sejnowski. Face recognition by independent component analysis. Neural Networks, IEEE Transactions on, 13(6):1450-1464, 2002.
[4] P.N. Belhumeur, J.P. Hespanha, and D.J. Kriegman. Eigenfaces vs. fisherfaces: recognition using class specific linear projection. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 19(7):711-720, July 1997.
[5] Anthony J Bell and Terrence J Sejnowski. The independent components of natural scenes are edge filters. Vision Research, 37(23):3327, 1997.
[6] Kevin Bowyer and P Jonathon Phillips. Empirical evaluation techniques in computer vision. IEEE Computer Society Press, 1998.
[7] Pierre Comon. Independent component analysis. Higher-Order Statistics, pages 29-38, 1992.
[8] Bruce A Draper, Kyungim Baek, Marian Stewart Bartlett, and J Ross Beveridge. Recognizing faces with pca and ica. Computer Vision and Image Understanding, 91(1):115-137, 2003.
[9] H.M. Ebied. Feature extraction using pca and kernel-pca for face recognition. In Informatics and Systems (INFOS), 2012 8th International Conference on, pages MM72-MM77, May 2012.
[10] Shervin Emami. Introduction to face detection and face recognition. http://www.shervinemami.info/faceRecognition.html. Accessed January 12, 2013.
[11] Kamran Etemad and Rama Chellappa. Discriminant analysis for recognition of human face images. JOSA A, 14(8):1724-1733, 1997.
[12] M. Kirby and L. Sirovich. Application of the karhunen-loeve procedure for the characterization of human faces. 12(1):103-108, January 1990.
[13] Michael Kirby and Lawrence Sirovich. Application of the karhunen-loeve procedure for the characterization of human faces. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 12(1):103-108, 1990.
[14] Chengjun Liu and Harry Wechsler. Gabor feature based classification using the enhanced fisher linear discriminant model for face recognition. Image Processing, IEEE Transactions on, 11(4):467-476, 2002.
[15] Gian Luca Marcialis and Fabio Roli. Fusion of lda and pca for face verification. In Biometric Authentication, pages 30-37. Springer, 2006.
[16] Ludwig Schwardt and Johan du Preez. Manipulating feature space. Lecture Materials, University of Stellenbosch, South Africa, 2003.
[17] M. Sharkas and M.A. Elenien. Eigenfaces vs. fisherfaces vs. ica for face recognition; a comparative study. In Signal Processing, 2008. ICSP 2008. 9th International Conference on, pages 914-919, October 2008.
[18] Linlin Shen and Li Bai. A review on gabor wavelets for face recognition. Pattern Analysis and Applications, 9(2-3):273-292, 2006.
[19] Vitomir Struc. PhD toolbox for face recognition.
[20] D.L. Swets and J.J. Weng. Using discriminant eigenfeatures for image retrieval. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 18(8):831-836, August 1996.
[21] M. A. Turk and A. P. Pentland. Face recognition using eigenfaces. In Proceedings. 1991 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pages 586-591. IEEE Comput. Soc. Press, 1991.
[22] Paul Viola and Michael Jones. Rapid object detection using a boosted cascade of simple features, 2001.
[23] Chandra Kiran Bharadwaj Tungathurthi Y. Vijaya Lata. Facial recognition using eigenfaces by pca. 1(1), May 1990.
[24] Jian Yang, Zhong Jin, Jing-yu Yang, David Zhang, and Alejandro F Frangi. Essence of kernel fisher discriminant: Kpca plus lda. Pattern Recognition, 37(10):2097-2100, 2004.
[25] Ming-Hsuan Yang. Kernel eigenfaces vs. kernel fisherfaces: Face recognition using kernel methods. In Proceedings of the Fifth IEEE International Conference on Automatic Face and Gesture Recognition, page 215, Washington, DC, 2002.
[26] Peng Yang, Shiguang Shan, Wen Gao, Stan Z Li, and Dong Zhang. Face recognition using ada-boosted gabor features. In Automatic Face and Gesture Recognition, 2004. Proceedings. Sixth IEEE International Conference on, pages 356-361. IEEE, 2004.
[27] W. Zhao, R. Chellappa, A. Rosenfeld, and P. J. Phillips. Face Recognition: A Literature Survey. ACM Computing Surveys, pages 399-458, 2003.
[28] Wenyi Zhao, Rama Chellappa, P Jonathon Phillips, and Azriel Rosenfeld. Face recognition: A literature survey. ACM Computing Surveys (CSUR), 35(4):399-458, 2003.