CHAPTER 1
INTRODUCTION

1.1 Introduction to face recognition

Face recognition by humans is a high level visual task for which it has
been extremely difficult to construct detailed neurophysiological and psychophysical models.
This is because faces are complex natural stimuli that differ dramatically from the artificially
constructed data often used in both human and computer vision research. Thus, developing a
computational approach to face recognition can prove to be very difficult indeed. In fact,
despite the many relatively successful attempts to implement computer-based face recognition
systems, we have yet to see one which combines speed, accuracy, and robustness to face
variations caused by 3D pose, facial expressions, and aging. The primary difficulty in
analyzing and recognizing human faces arises because variations in a single face can be very
large, while variations between different faces are quite small. That is, there is an inherent
structure to a human face, but that structure exhibits large variations due to the presence of a
multitude of muscles in a particular face. Given that recognizing faces is critical for humans
in their everyday activities, automating this process would be very useful in a wide range of
applications including security, surveillance, criminal identification, and video compression.
This paper discusses a new computational approach to face recognition that, when combined
with proper face localization techniques, has proved to be very efficacious. This section
begins with a survey of the face recognition research performed to date. The proposed
approach is then presented along with its objectives and
the motivations for choosing it. The section concludes with an overview of the structure of
the paper.

Face Recognition

A facial recognition system is a computer-driven application for automatically identifying a
person from a digital image. It does this by comparing selected facial features in the live
image with those in a facial database. It is typically used for security systems and can be
compared to other biometrics such as fingerprint or iris recognition systems. The great
advantage of a facial recognition system is that it does not require aid from the test subject. Properly

          




designed systems installed in airports, multiplexes and other public places can detect the
presence of criminals among the crowd.

History of Face Recognition

The development and implementation of face recognition systems is totally dependent on the
development of computers, since without computers the efficient use of the algorithms is
impossible. So the history of face recognition goes side by side with the history of computers.

Research in automatic face recognition dates back at least to the 1960s. Bledsoe, in 1966,
was the first to attempt semi-automated face recognition with a hybrid human-computer
system that classified faces on the basis of fiducial marks entered on photographs by hand.
Parameters for the classification were normalized distances and ratios among points such as
eye corners, mouth corners, nose tip and chin point. Later work at Bell
Laboratories (Goldstein, Harmon and Lesk, 1971; Harmon, 1971) developed a vector of up to
21 features and recognized faces using standard pattern classification techniques. The chosen
features were largely subjective evaluations (e.g. shade of hair, length of ears, lip thickness)
made by human subjects, each of which would be difficult to automate.

An early paper by Fischler and Elschlager (1973) attempted to measure similar features
automatically. They described a linear embedding algorithm that used local feature template
matching and a global measure of fit to find and measure facial features. This template
matching approach has been continued and improved by the recent work of Yuille, Cohen and
Hallinan (1989). Their strategy is based on "deformable templates", which are parameterized
models of the face and its features in which the parameter values are determined by
interaction with the image. The connectionist approach to face identification seeks to capture the
configurational, or gestalt-like, nature of the task. Kohonen (1989) and Kohonen and Lahtio
(1981) describe an associative network with a simple learning algorithm that can recognize
(classify) face images and recall a face image from an incomplete or noisy version input to
the network. Fleming and Cottrell (1990) extend these ideas using nonlinear units, training
the system by back propagation. Stonham's WISARD system (1986) is a general pattern
recognition device based on neural net principles. It has been applied with some success to
binary face images, recognizing both identity and expression. Most connectionist systems
dealing with faces treat the input image as a general 2-D pattern, and can make no explicit
use of the configurational properties of faces. Moreover, some of these systems require an
inordinate number of training examples to achieve a reasonable level of performance. Kirby

          




and Sirovich were among the first to apply principal component analysis (PCA) to face
images, and showed that PCA is an optimal compression scheme that minimizes the mean
squared error between the original images and their reconstructions for any given level of
compression. Turk and Pentland popularized the use of PCA for face recognition. They used
PCA to compute a set of subspace basis vectors (which they called "eigenfaces") for a
database of face images and projected the images in the database into the compressed
subspace. New test images were then matched to images in the database by projecting them
onto the basis vectors and finding the nearest compressed image in the subspace (eigenspace).
The initial success of eigenfaces popularized the idea of matching images in compressed
subspaces. Researchers began to search for other subspaces that might improve performance.
One alternative is Fisher's Linear Discriminant Analysis (LDA, a.k.a. "fisherfaces"). For any
N-class classification problem, the goal of LDA is to find the N-1 basis vectors that maximize
the interclass distances while minimizing the intraclass distances.
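The source gives no formula, but the standard matrix form of this criterion, with between-class scatter S_B and within-class scatter S_W, is:

$$W^{*} = \arg\max_{W} \frac{\left|W^{T} S_B\, W\right|}{\left|W^{T} S_W\, W\right|}$$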

1.2 Outline of a Typical Face Recognition System

The acquisition module

This is the entry point of the face recognition process. It is the module where the face image
under consideration is presented to the system. In other words, the user is asked to present a
face image to the face recognition system in this module. An acquisition module can request
a face image from several different environments: The face image can be an image file that is
located on a magnetic disk, it can be captured by a frame grabber and camera or it can be
scanned from paper with the help of a scanner.

The pre-processing module

In this module, by means of early vision techniques, face images are normalized and, if
desired, enhanced to improve the recognition performance of the system. Some or all
of the pre-processing steps may be implemented in a face recognition system.

The feature extraction module

After performing some pre-processing (if necessary), the normalized face image is presented
to the feature extraction module in order to find the key features that are going to be used for
classification. In other words, this module is responsible for composing a feature vector that
can adequately represent the face image.

          




The classification module

In this module, with the help of a pattern classifier, the extracted features of the face image
are compared with the ones stored in a face library (or face database). After this
comparison, the face image is classified as either known or unknown.
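To make the four modules concrete, here is a minimal Python sketch of the overall pipeline; the preprocess and extract_features placeholders and the gallery layout are our own illustrative choices, not the system described above:

```python
import numpy as np

def preprocess(image):
    # Placeholder normalization: scale pixel values to [0, 1]. A real
    # system would also geometrically normalize and enhance the face.
    return image.astype(np.float64) / 255.0

def extract_features(face):
    # Placeholder: flatten the image. A real system would compute
    # DCT or eigenface coefficients here.
    return face.ravel()

def classify(image, gallery, threshold=None):
    """Match a probe image against a {name: feature_vector} gallery."""
    probe = extract_features(preprocess(image))
    names = list(gallery)
    dists = [np.linalg.norm(probe - gallery[n]) for n in names]
    best = int(np.argmin(dists))
    if threshold is not None and dists[best] > threshold:
        return "unknown"            # classified as unknown
    return names[best]              # classified as a known individual
```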

Principal component analysis, based on information theory concepts, seeks a computational
model that best describes a face by extracting the most relevant information contained
in that face. The eigenfaces approach is a principal component analysis method, in which a
small set of characteristic pictures is used to describe the variation between face images.
The goal is to find the eigenvectors (eigenfaces) of the covariance matrix of the
distribution spanned by a training set of face images. Later, every face image is
represented by a linear combination of these eigenvectors.

Evaluating these eigenvectors directly is quite difficult for typical image sizes, but an
approximation that is suitable for practical purposes is also presented. Recognition is
performed by projecting a new image into the subspace spanned by the eigenfaces and
then classifying the face by comparing its position in face space with the positions of
known individuals.
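A minimal numpy sketch of this eigenface computation, including the small-matrix approximation alluded to above (function names are ours, not the source's):

```python
import numpy as np

def eigenfaces(images, k):
    """Compute k eigenfaces from an (M, h, w) stack of training faces.

    Because M (number of images) is far smaller than the number of
    pixels, the eigenvectors of the huge pixel-space covariance matrix
    are recovered from a small M x M matrix (the usual approximation).
    """
    M = images.shape[0]
    A = images.reshape(M, -1).astype(np.float64)  # one row per image
    mean_face = A.mean(axis=0)
    A -= mean_face                                # subtract the average face
    vals, vecs = np.linalg.eigh(A @ A.T)          # small M x M problem
    order = np.argsort(vals)[::-1][:k]            # largest eigenvalues first
    basis = A.T @ vecs[:, order]                  # back to pixel space
    basis /= np.linalg.norm(basis, axis=0)        # each column: one eigenface
    return mean_face, basis

def project(image, mean_face, basis):
    # Represent a face as a linear combination of the eigenfaces.
    return basis.T @ (image.ravel().astype(np.float64) - mean_face)
```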

The eigenfaces approach seems to be an adequate method for face recognition due
to its simplicity, speed and learning capability. Experimental results are given to demonstrate
the viability of the proposed face recognition method.

1.3 Introduction to Digital Image Processing

A signal is an information-carrying function of time. Real-time signals can be
audio (voice) or video (image) signals. A still video frame is called an image; a moving image
is called a video. The difference between digital image processing (DIP) and signals and
systems is that there is no time axis in DIP: the x and y coordinates in DIP are spatial
coordinates. There is no time axis because a photograph does not change with time.

What is an image?

Image : An image is defined as a two-dimensional function f(x, y), where x and y are spatial
coordinates and the amplitude f at any point (x, y) is known as the intensity of the image at
that point.

          




What is a pixel?

Pixel : A pixel (short for picture element) is a single point in a graphic image. Each such
information element is not really a dot, nor a square, but an abstract sample. Each element of
the image matrix is known as a pixel, where dark = 0 and light = 1. A pixel with only 1 bit
will represent a black and white image. If the number of bits is increased, then the number of
gray levels will increase and a better picture quality is achieved.

All naturally occurring images are analog in nature. The more pixels an image has,
the greater its clarity. An image is represented as a matrix in DIP, whereas in DSP we use
only row matrices. Naturally occurring images must be sampled and quantized to obtain a
digital image. A good image has about 1024 × 1024 pixels, which is known as 1k × 1k = 1M
pixels.

1.4 Fundamental steps in DIP

Image acquisition : Digital image acquisition is the creation of digital images, typically
from a physical object. A digital image may be created directly from a physical scene by a
camera or similar device. Alternatively it can be obtained from another image in an analog
medium such as photographs, photographic film, or printed paper by a scanner or similar
device. Many technical images acquired with tomographic equipment, side-looking radar, or
radio telescopes are actually obtained by complex processing of non-image data.

Image enhancement : The process of image acquisition frequently leads to image
degradation due to mechanical problems, out-of-focus blur, motion, inappropriate
illumination and noise. The goal of image enhancement is to start from a recorded image and
to produce the most visually pleasing image.

Image restoration : The goal of image restoration is to start from a degraded recorded image
and to recover the original scene as faithfully as possible. The goal of enhancement is beauty;
the goal of restoration is truth. The measure of success in restoration is usually an error
measure between the original and the estimated image. No mathematical error function is
known that corresponds to human perceptual assessment of error.

          




Colour image processing : Colour image processing is based on the fact that any colour can
be obtained by mixing the 3 basic colours red, green and blue. Hence 3 matrices are
necessary, one representing each colour.
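For illustration, a colour image stored as a numpy array can be split into its three matrices as follows (assuming the common RGB channel ordering):

```python
import numpy as np

# A colour image as three matrices, one per primary colour, assuming an
# RGB array of shape (height, width, 3) as produced by most image loaders.
rgb = np.random.randint(0, 256, size=(4, 4, 3), dtype=np.uint8)  # toy image
red, green, blue = rgb[..., 0], rgb[..., 1], rgb[..., 2]
print(red.shape, green.shape, blue.shape)  # three (4, 4) matrices
```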

Wavelet and multiresolution processing : Many times a particular spectral component
occurring at a particular instant can be of special interest. In these cases it may be very
beneficial to know the time intervals in which these spectral components occur. For example,
in EEGs the latency of an event-related potential is of particular interest.
The wavelet transform is capable of providing time and frequency information
simultaneously, hence giving a time-frequency representation of the signal. Although the
time and frequency resolution problems are the result of a physical phenomenon (the
Heisenberg uncertainty principle) and exist regardless of the transform used, it is possible to
analyze any signal by using an alternative approach called multiresolution analysis (MRA).
MRA analyzes the signal at different frequencies with different resolutions. MRA is designed
to give good time resolution and poor frequency resolution at high frequencies, and good
frequency resolution and poor time resolution at low frequencies.

Compression : Image compression is the application of data compression to digital images.
Its objective is to reduce the redundancy of the image data in order to be able to store or
transmit the data in an efficient form.

Morphological processing : Morphological processing is a collection of techniques for DIP
based on mathematical morphology. Since these techniques rely only on the relative ordering
of pixel values, not on their numerical values, they are especially suited to the processing of
binary images and grayscale images.

Segmentation : In the analysis of the objects in images it is essential that we can distinguish
between the objects of interest and "the rest". This latter group is also referred to as the
background. The techniques that are used to find the objects of interest are usually referred to
as segmentation techniques.

          




CHAPTER 2

FACE RECOGNITION USING DCT

2.1 Discrete Cosine Transformation (DCT)


The DCT is a well-known signal analysis tool used in compression standards due to its
compact representation power. Although the Karhunen-Loeve transform (KLT) is known to
be the optimal transform in terms of information packing, its data-dependent nature makes it
unfeasible for use in some practical tasks. Furthermore, the DCT closely approximates the
compact representation ability of the KLT, which makes it a very useful tool for signal
representation both in terms of information packing and in terms of computational
complexity, due to its data-independent nature.
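For illustration, assuming SciPy is available, the 2-D DCT and its inverse can be computed separably:

```python
import numpy as np
from scipy.fftpack import dct, idct

def dct2(block):
    # 2-D DCT-II, applied separably along columns and then rows.
    return dct(dct(block, norm='ortho', axis=0), norm='ortho', axis=1)

def idct2(coeffs):
    # Inverse 2-D DCT with the same orthonormal scaling.
    return idct(idct(coeffs, norm='ortho', axis=0), norm='ortho', axis=1)

block = np.random.rand(8, 8)
assert np.allclose(idct2(dct2(block)), block)  # perfect reconstruction
```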
Local Appearance Based Face Representation

Local appearance based face representation is a generic local approach and does not
require the detection of any salient local regions, such as the eyes, as in the modular or
component based approaches [5, 10] for face representation. Local appearance based face
representation can be performed as follows: a detected and normalized face image is divided
into blocks of 8x8 pixels. Each block is then represented by its DCT coefficients. The reason
for choosing a block size of 8x8 pixels is, on one hand, to have small enough blocks in which
stationarity is provided and transform complexity is kept simple, and on the other hand, to
have big enough blocks to provide sufficient compression. The top-left DCT coefficient is
removed from the representation since it only represents the average intensity value of the
block. From the remaining DCT coefficients, the ones containing the highest information are
extracted via a zig-zag scan.
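A minimal sketch of this block-based extraction (the zig-zag ordering helper and the keep parameter are our illustrative choices):

```python
import numpy as np
from scipy.fftpack import dct

def zigzag_indices(n=8):
    # Index pairs of an n x n block in zig-zag order, as in JPEG.
    return sorted(((i, j) for i in range(n) for j in range(n)),
                  key=lambda p: (p[0] + p[1],
                                 p[0] if (p[0] + p[1]) % 2 else p[1]))

def block_features(face, block=8, keep=10):
    """Per-block DCT features: drop the DC term, keep the next `keep`
    coefficients along the zig-zag scan of every 8x8 block."""
    h, w = face.shape
    order = zigzag_indices(block)[1:keep + 1]   # skip DC at (0, 0)
    feats = []
    for y in range(0, h - h % block, block):
        for x in range(0, w - w % block, block):
            b = face[y:y + block, x:x + block].astype(np.float64)
            c = dct(dct(b, norm='ortho', axis=0), norm='ortho', axis=1)
            feats.append([c[i, j] for i, j in order])
    return np.asarray(feats)    # one row of coefficients per block
```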
Fusion
To fuse the local information, the extracted features from the 8x8 pixel blocks can be
combined at the feature level or at the decision level.
Feature Fusion
In feature fusion, the DCT coefficients obtained from each block are concatenated to
construct the feature vector which is used by the classifier.

          




Decision Fusion
In decision fusion, classification is done separately on each block and the individual
classification results are later combined (for example, via a voting or score-sum rule over the
per-block decisions).

2.2 Definition

Ahmed, Natarajan, and Rao (1974) first introduced the discrete cosine transform (DCT) in
the early seventies. Ever since, the DCT has grown in popularity, and several variants have
been proposed (Rao and Yip, 1990). In particular, the DCT was categorized by Wang (1984)
into four slightly different transformations named DCT-I, DCT-II, DCT-III, and DCT-IV. Of
the four classes Wang defined, DCT-II was the one first suggested by Ahmed et al., and it is
the one of concern in this paper.
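For reference, the orthonormal one-dimensional DCT-II of a length-N sequence f(x), the variant of concern here, is commonly written as:

$$C(u) = \alpha(u) \sum_{x=0}^{N-1} f(x) \cos\!\left[\frac{(2x+1)u\pi}{2N}\right], \quad u = 0, \ldots, N-1$$

with $\alpha(0) = \sqrt{1/N}$ and $\alpha(u) = \sqrt{2/N}$ for $u \neq 0$; the two-dimensional version used for images applies this along rows and columns.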

Compression Performance in Terms of the Variance Distribution

The Karhunen-Loeve transform (KLT) is a statistically optimal transform based on a
number of performance criteria. One of these criteria is the variance distribution of transform
coefficients. This criterion judges the performance of a discrete transform by measuring its
variance distribution for a random sequence having some specific probability distribution
function (Rao and Yip, 1990). It is desirable to have a small number of transform coefficients
with large variances, such that all other coefficients can be discarded with little error in the
reconstruction of signals from the ones retained. The error criterion generally used when
reconstructing from truncated transforms is the mean-square error (MSE). In terms of pattern
recognition, it is noted that dimensionality reduction is perhaps as important an objective as
class separability in an application such as face recognition. Thus, a transform exhibiting
large variance distributions for a small number of coefficients is desirable. This is so because
such a transform would require less information to be stored and used for recognition. In this
respect, as well as others, the DCT has been shown to approach the optimality of the KLT
(Pratt, 1991). The variance distribution for the various discrete transforms is usually
measured when the input sequence is a stationary first-order Markov process (Markov-1
process). Such a process has an autocovariance matrix of the form shown in Eq. (2.6) and
provides a good model for the scan lines of gray-scale images (Jain, 1989). The matrix in Eq.
(2.6) is a Toeplitz matrix, which is expected since the process is stationary (Jain, 1989). Thus,

          




the variance distribution measures are usually computed for random sequences of length N
that result in an auto-covariance matrix of the form:

$$R = \begin{bmatrix} 1 & \rho & \rho^2 & \cdots & \rho^{N-1} \\ \rho & 1 & \rho & \cdots & \rho^{N-2} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \rho^{N-1} & \rho^{N-2} & \rho^{N-3} & \cdots & 1 \end{bmatrix} \qquad (2.6)$$

where $\rho$ is the correlation coefficient, $|\rho| < 1$.

Figure 2.1 Variance distribution for a selection of discrete transforms
for N = 16 and ρ = 0.9 (adapted from K.R. Rao and P. Yip, Discrete Cosine Transform:
Algorithms, Advantages, Applications, New York: Academic, 1990). Data is shown for the
following transforms: discrete cosine transform (DCT), discrete Fourier transform (DFT),
slant transform (ST), discrete sine transform (type I) (DST-I), discrete sine transform (type II)
(DST-II), and Karhunen-Loeve transform (KLT).

Figure 2.1 shows the variance distribution for a selection of discrete transforms given a
first-order Markov process of length N = 16 and ρ = 0.9. The data for this curve were
obtained directly from Rao and Yip (1990), in which

          




other curves for different lengths are also presented. The purpose here is to illustrate that the
DCT variance distribution, when compared to other deterministic transforms, decreases most
rapidly. The DCT variance distribution is also very close to that of the KLT, which confirms
its near optimality. Both of these observations highlight the potential of the DCT for data
compression and, more importantly, feature extraction.

Comparison with the KLT

The KLT completely decorrelates a signal in the transform domain, minimizes MSE in data
compression, contains the most energy (variance) in the fewest number of transform
coefficients, and minimizes the total representation entropy of the input sequence (Rosenfeld
and Kak, 1976). All of these properties, particularly the first two, are extremely useful in
pattern recognition applications. The computation of the KLT essentially involves the
determination of the eigenvectors of a covariance matrix of a set of training sequences
(images in the case of face recognition). In particular, given M training images of size, say,
N × N, the covariance matrix of interest is given by

$$C = A A^{T} \qquad (2.7)$$

where A is a matrix whose columns are the M training images (after having an average face
image subtracted from each of them) reshaped into N²-element vectors. Note that because of
the size of A, the computation of the eigenvectors of C may be intractable. However, as
discussed in Turk and Pentland (1991), because M is usually much smaller than N² in face
recognition, the eigenvectors of C can be obtained more efficiently by computing the
eigenvectors of another, smaller matrix (see (Turk and Pentland, 1991) for details). Once the
eigenvectors of C are obtained, only those with the highest corresponding eigenvalues are
usually retained to form the KLT basis set. One measure for the fraction of eigenvectors
retained for the KLT basis set is given by
$$\theta = \frac{\sum_{i=1}^{M'} \lambda_i}{\sum_{i=1}^{M} \lambda_i}$$

          




where λ_i is the ith eigenvalue of C and M' is the number of eigenvectors forming the KLT
basis set. As can be seen from the definition of C in Eq. (2.7), the KLT basis functions are
data-dependent. Now, in the case of a first-order Markov process, these basis functions can
be found analytically (Rao and Yip, 1990). Moreover, these functions can be shown to be
asymptotically equivalent to the DCT basis functions as ρ (of Eq. (2.6)) → 1 for any given N,
and as N → ∞ for any given ρ (Rao and Yip, 1990). It is this asymptotic
equivalence that explains the near optimal performance of the DCT in terms of its variance
distribution for first-order Markov processes. In fact, this equivalence also explains the near
optimal performance of the DCT based on a handful of other criteria such as energy packing
efficiency, residual correlation, and mean-square error in estimation (Rao and Yip, 1990).
This provides a strong justification for the use of the DCT for face recognition. Specifically,
since the KLT has been shown to be very effective in face recognition (Pentland et al., 1994),
it is expected that a deterministic transform that is mathematically related to it would
probably perform just as well in the same application.
As for the computational complexity of the DCT and KLT, it is evident from the above
overview that the KLT requires significant processing during training, since its basis set is
data-dependent. This overhead in computation, albeit occurring in a non-time-critical off-line
training process, is alleviated with the DCT. As for online feature extraction, the KLT of an
N × N image can be computed in M'N² time, where M' is the number of KLT basis vectors.
In comparison, the DCT of the same image can be computed in N² log₂ N time because of
its relation to the discrete Fourier transform, which can be implemented efficiently using the
fast Fourier transform (Oppenheim and Schafer, 1989). This means that the DCT can be
computationally more efficient than the KLT, depending on the size of the KLT basis set. It
is thus concluded that the discrete cosine transform is very well suited to application in face
recognition. Because of the similarity of its basis functions to those of the KLT,
the DCT exhibits striking feature extraction and data compression capabilities. In fact, coupled
with these, the ease and speed of the computation of the DCT may even favor it over the KLT
in face recognition.

          




CHAPTER 3
FACE NORMALIZATION AND RECOGNITION

3.1 Basic Algorithm

The face recognition algorithm discussed in this paper is depicted in Fig. 3.1. It involves both
face normalization and recognition. Since face and eye localization is not performed
automatically, the eye coordinates of the input faces need to be entered manually in order to
normalize the faces correctly. This requirement is not a major limitation because the
algorithm can easily be invoked after running a localization system such as the one presented
in Jebara (1996) or others in the literature. As can be seen from Fig. 3.1, the system receives
as input an image containing a face along with its eye coordinates. It then executes both
geometric and illumination normalization functions, as will be described later. Once a
normalized (and cropped) face is obtained, it can be compared to other faces, under the same
nominal size, orientation, position, and illumination conditions.
This comparison is based on features extracted using the DCT. The basic idea here is to
compute the DCT of the normalized face and retain a certain subset of the DCT coefficients
as a feature vector describing this face. This feature vector contains the low-to-mid
frequency DCT coefficients, as these are the ones having the highest variance. To recognize a
particular input face, the system compares this face's feature vector to the feature vectors of
the database faces using a Euclidean distance nearest-neighbor classifier (Duda and Hart,
1973). If the feature vector of the probe is v and that of a database face is f, then the
Euclidean distance between the two is
$$d = \sqrt{(v_0 - f_0)^2 + (v_1 - f_1)^2 + \cdots + (v_{M-1} - f_{M-1})^2} \qquad (3.1)$$

where

$$\mathbf{v} = [v_0 \; v_1 \; \cdots \; v_{M-1}]^T, \qquad \mathbf{f} = [f_0 \; f_1 \; \cdots \; f_{M-1}]^T \qquad (3.2)$$

and M is the number of DCT coefficients retained as
features. A match is obtained by minimizing d. Note that this approach computes the DCT on
the entire normalized image. This is different from the use of the DCT in the JPEG
compression standard (Pennebaker and Mitchell, 1993), in which the DCT is computed on
individual subsets of the image. The use of the DCT on individual subsets of an image, as

          




in the JPEG standard, for face recognition has been proposed in Shneier and Abdel-Mottaleb
(1996) and Eickeler et al. (2000). Also, note that this approach basically assumes no
threshold on d. That is, the system described always assumes that the closest match is the
correct match, and no probe is ever rejected as unknown. If a threshold t is defined on d,
then the gallery face that minimizes d would only be output as the match when d ≤ t.
Otherwise, the probe would be declared unknown. In this way, one can actually define a
threshold to achieve 100% recognition accuracy, but, of course, at the cost of a certain number
of rejections. In other words, the system could end up declaring an input face as unknown
even though it exists in the gallery. Suitable values of t can be obtained using the so-called
Receiver Operating Characteristic (ROC) curve (Grzybowski and Younger, 1997), as will be
illustrated later.
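As an illustrative sketch of how such a threshold could be swept to trace an ROC-style curve (the function and its inputs are our own construction, not the authors' code):

```python
import numpy as np

def roc_points(genuine_d, impostor_d, num=100):
    """Sweep the rejection threshold t over the distance d of Eq. (3.1).

    genuine_d  : distances from probes to their true gallery faces
    impostor_d : best-match distances for probes absent from the gallery
    Returns, for each t, the true-accept and false-accept fractions.
    """
    genuine_d = np.asarray(genuine_d, dtype=np.float64)
    impostor_d = np.asarray(impostor_d, dtype=np.float64)
    lo = min(genuine_d.min(), impostor_d.min())
    hi = max(genuine_d.max(), impostor_d.max())
    ts = np.linspace(lo, hi, num)
    tar = np.array([(genuine_d <= t).mean() for t in ts])   # true accepts
    far = np.array([(impostor_d <= t).mean() for t in ts])  # false accepts
    return ts, tar, far
```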

Figure 3.1 Face recognition system using the DCT.

          




Feature Extraction
To obtain the feature vector representing a face, its DCT is computed, and only a subset of
the obtained coefficients is retained. The size of this subset is chosen such that it can
sufficiently represent a face, but it can in fact be quite small. As an illustration, Fig. 3.2(a)
shows a sample image of a face, and Fig. 3.2(b) shows the low-to-mid frequency 8 × 8 subset
of its DCT coefficients. It can be observed that the DCT coefficients exhibit the expected
behavior, in which a relatively large amount of information about the original image is stored
in a fairly small number of coefficients. In fact, looking at Fig. 3.2(b), we note that the DC
term is more than 15,000 and the minimum magnitude in the presented set of coefficients is
less than 1: a reduction in coefficient magnitude on the order of 10,000 within the first 64
DCT coefficients. Most of the discarded coefficients have magnitudes less than 1. For the
purposes of this paper, square subsets, similar to the one shown in Fig. 3.2(b), are used for
the feature vectors. It should be noted that the size of the subset of DCT coefficients retained
as a feature vector may not be large enough for achieving an accurate reconstruction of the
input image. That is, in the case of face recognition, data compression ratios larger than the
ones necessary to render accurate reconstruction of input images are encountered.

          




Figure 3.2 Typical face image (a) of size 128 × 128 and an 8 × 8 subset of its DCT (b).

This observation, of course, has no ramifications on the performance evaluation
of the system, because accurate reconstruction is not a requirement. In fact, this situation was
also encountered in Turk and Pentland (1991), where the KLT coefficients used in face
recognition were not sufficient to achieve a subjectively acceptable facial reconstruction.
Figure 3.3 shows the effect of using a feature vector of size 64 to reconstruct a typical face
image. Now, it may be the case that one chooses to use more DCT coefficients to represent
faces. However, there could be a cost associated with doing so. Specifically, more
coefficients do not necessarily imply better recognition results, because by adding them, one
may actually be representing more irrelevant information (Swets and Weng, 1996).

          




Figure 3.3 Effect of reconstructing a 128 × 128 image using only 64 DCT coefficients: (a)
original, (b) reconstructed.
3.2 Normalization
Two kinds of normalization are performed in the proposed face recognition system. The first
deals with geometric distortions due to varying imaging conditions.

          




That is, it attempts to compensate for position, scale, and
minor orientation variations in faces. This way, feature vectors are always compared for
images characterized by the same conditions. The second kind of normalization deals with
the illumination of faces. The reasoning here is that the variations in pixel intensities between
different images of faces could be due to illumination conditions. Normalization in this case
is not very easily dealt with because illumination normalization could result in an artificial
tinting of light colored faces and a corresponding lightening of dark colored ones. In the
following two subsections, the issues involved in both kinds of normalization are presented,
and the stage is set for various experiments to test their effectiveness for face recognition.
These experiments and their results are detailed in Section 4.
Geometry
The proposed system is a holistic approach to face recognition. Thus it uses the image of a
whole face and, as discussed in Section 1, it is expected to be sensitive to variations in facial
scale and orientation. An investigation of this effect was performed in the case of the DCT to
confirm this observation. The data used for this test were from the MIT database, which is
described, along with the other databases studied, in a fair amount of detail in Section 4. This
database contains a subset of faces that only vary in scale. To investigate the effects of scale
on face recognition accuracy, faces at a single scale were used as the gallery faces, and faces
from two different scales were used as the probes. Figure 3.4 illustrates how scale can
degrade the performance of a face recognition system; 64 DCT coefficients were used for the
feature vectors, and 14 individuals of the MIT database were considered. In the figure, the
term "Training Case" refers to the scale in the gallery images, and the terms "Case 1" and
"Case 2" describe the two scales that were available for the probes.

Figure 3.4 Effect of varying scale on recognition accuracy.

Figure 3.5 Three faces from the MIT database exhibiting scale variations. The labels refer to
the experiments performed in Fig. 3.4.

          




Figure 3.5 shows examples of faces from the training set and from the two cases of scale
investigated. These results indicate that the DCT exhibits a sensitivity to scale similar to that
shown for the KLT (Turk and Pentland, 1991). The geometric normalization we have used
basically attempts to make all faces have the same size and the same frontal, upright pose. It
also attempts to crop face images such that most of the background is excluded. To achieve
this, it uses the input face eye coordinates and defines a transformation to place these eyes in
standard positions. That is, it scales faces such that the eyes are always the same distance
apart, and it positions these faces in the image such that most of the background is excluded.

Figure 3.6 Geometric normalization and the parameters used. The final image dimensions
are 128 × 128.

          




This normalization procedure is illustrated in Fig. 3.6, and it is similar to that proposed in Brunelli and Poggio
(1993). Given the eye coordinates of the input face image, the normalization procedure
performs the following three transformations: rotate the image so that the eyes fall on a
horizontal line, scale the image (while maintaining the original aspect ratio) so that the eye
centers are at a fixed distance apart (36 pixels), and translate the image to place the eyes at
set positions within a 128×128 cropping window (see Fig. 3.6). Note that we only require the
eye coordinates of input faces in order to perform this normalization. Thus no knowledge of
individual face contours is available, which means that we cannot easily exclude the whole
background from the normalized images. Since we cannot tailor an optimal normalization
and cropping scheme for each face without knowledge of its contours, the dimensions shown
in Fig. 3.6 were chosen to result in as little background, hair, and clothing information as
possible, and they seemed appropriate given the variations in face geometry among people.
Another observation we can make about Fig. 3.6 is that the normalization performed accounts
for only two-dimensional perturbations in orientation. That is, no compensation is done for
three-dimensional (in depth) pose variations. This is a much more difficult problem to deal
with, and a satisfactory solution to it has yet to be found. Of course, one could increase the
robustness of a face recognition system to 3-D pose variations by including several training
images containing such variations for a single person. The effect of doing this will be
discussed in the next section. Also, by two-dimensional perturbations in orientation, we mean
slight rotations from the upright position. These rotations are the ones that may arise
naturally, even if people are looking straight ahead (see Fig. 3.7 for an example). Of course,
larger 2-D rotations do not occur naturally and always include some 3-D aspect to them,
which obviously 2-D normalization does not account for. As for the actual normalization
technique implemented, it basically consists of defining and applying a 2-D affine
transformation, based on the relative eye positions and their distance. Figure 3.8 illustrates
the result of applying such a transformation on a sample face image.
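A minimal OpenCV sketch of this three-step normalization; the 36-pixel eye distance and the 128 × 128 crop follow the text, while the target eye row (eye_row) is an assumed value:

```python
import numpy as np
import cv2  # OpenCV, assumed available

def normalize_face(img, left_eye, right_eye,
                   out_size=128, eye_dist=36, eye_row=40):
    """Rotate, scale and translate a face so the eyes land at fixed
    positions in an out_size x out_size crop. eye_dist and out_size
    follow the text; eye_row is an assumed target position."""
    (x1, y1), (x2, y2) = left_eye, right_eye
    angle = np.degrees(np.arctan2(y2 - y1, x2 - x1))  # eyes -> horizontal
    scale = eye_dist / np.hypot(x2 - x1, y2 - y1)     # fix eye separation
    center = ((x1 + x2) / 2.0, (y1 + y2) / 2.0)       # midpoint of the eyes
    M = cv2.getRotationMatrix2D(center, angle, scale)
    # Translate so the eye midpoint lands at its target crop position.
    M[0, 2] += out_size / 2.0 - center[0]
    M[1, 2] += eye_row - center[1]
    return cv2.warpAffine(img, M, (out_size, out_size))
```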

          




3.3 Illumination

Illumination variations play a significant role in degrading the performance of a face
recognition system, even though Turk and Pentland indicate that the correlation between face
images under different lighting conditions remains relatively high (Turk and Pentland, 1991).
In fact, experience has shown that for large databases of images, obtained with different
sensors under different lighting conditions, special care must be taken to ensure that
recognition thresholds are not affected. To compensate for illumination variations in our
experiments, we apply Hummel's histogram modification technique (Hummel, 1975). That
is, we simply choose a target histogram and then compute a gray-scale transformation that
would modify the input image histogram to resemble the target.

Figure 3.7 An example of naturally arising perturbations in face orientations.

          




Figure 3.8 An illustration of the normalization performed on faces. Note the changes in
scale, orientation, and position.

It should be noted that
another interesting approach to illumination compensation can be found in Brunelli (1997), in
which computer graphics techniques are used to estimate and compensate for illuminant
direction. This alleviates the need to train with multiple images under varying pose, but it
also has significant computational costs. The key issue in illumination compensation is how
to select the target illumination. This is so because there could be tradeoffs involved in
choosing such a target, especially if the face database contains a wide variety of skin tones.
An extensive study of illumination compensation of faces for automatic recognition was
done in conjunction with these
experiments. The aim was to find an appropriate solution to this problem in order to improve
the performance of our system. The results of this study are documented in an unpublished
report available from the authors (Hafed, 1996). The main conclusion that can be drawn from
the study is that illumination normalization is very sensitive to the choice of target
illumination. That is, if an average face is considered as a target, then all histograms will be
mapped onto one histogram that has a reduced dynamic range (due to averaging), and the net
result is a loss of contrast in the facial images. In turn, this loss of contrast makes all faces
look somewhat similar, and some vital information about these faces, like skin color, is lost.
It was found that the best compromise was achieved if the illumination of a single face is
adjusted so as to compensate for possible non-uniform lighting conditions of the two halves

          




of the same face. That is, no inter-face normalization is performed, and in this way, no
artificial darkening or lightening of faces occurs due to attempts to normalize all faces to a
single target. Of course, the results of illumination normalization really depend on the
database being considered. For example, if the illumination of faces in a database is
sufficiently uniform, then illumination normalization techniques are redundant.
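One standard way to realize such a histogram modification in code is CDF matching; this is a generic sketch, not the authors' implementation:

```python
import numpy as np

def match_histogram(img, target_hist):
    """Remap the gray levels of an 8-bit image so its histogram
    resembles target_hist (a 256-bin histogram), via CDF matching."""
    src_hist = np.bincount(img.ravel(), minlength=256).astype(np.float64)
    src_cdf = np.cumsum(src_hist) / src_hist.sum()
    tgt_cdf = np.cumsum(np.asarray(target_hist, dtype=np.float64))
    tgt_cdf /= tgt_cdf[-1]
    # For each source level, pick the target level with the nearest CDF.
    lut = np.searchsorted(tgt_cdf, src_cdf).clip(0, 255).astype(np.uint8)
    return lut[img]
```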

3.4 Experiments

This section describes experiments with the developed face recognition system. These were
fairly extensive, and the hallmark of the work presented here is that the DCT was put to the
test under a wide variety of conditions. Specifically, several databases, with significant
differences between them, were used in the experimentation.
A flowchart of the system described in the previous section is presented in Fig. 3.9, which
shows the various modules used and the flow of operation. As can be seen, there is a
pre-processing stage in which the face codes for the individual database images are extracted
and stored for later use. This stage can be thought of as a modeling stage, which is necessary
even for human beings: we perform a correlation between what is seen and what is already
known in order to actually achieve recognition (Sekuler and Blake, 1994). At run-time, a test
input is presented to the system, and its face codes are extracted. The closest match is found
by performing a search that computes Euclidean distances and sorts the results using a fast
algorithm (Silvester, 1993). This section begins with a brief overview of the
various face databases used for testing the system; the differences among these databases are
highlighted. Then the experiments performed and their results are presented and discussed.
We compare the proposed local appearance-based approach with several well-known
holistic face recognition approaches: Principal Component Analysis (PCA) [15], Linear
Discriminant Analysis (LDA) [2], and Independent Component Analysis (ICA), as well as
another DCT-based local approach, which uses Gaussian mixture models for modeling the
distributions of feature vectors. This approach will be named "local DCT + GMM" in the
remainder of the paper. Moreover, we also test a local appearance based approach using PCA
for the representation instead of the DCT, which will be named "local PCA" in the paper.

          




Figure 3.9 Implementation of face recognition system

          




Fig. 3.10 Samples from the Yale database. First row: Samples from training set. Second row:
Samples from test set.
In all our experiments, except for the DCT+GMM approach, where the classification is
done with Maximum-Likelihood, we use the nearest neighbor classifier with the normalized
correlation d as the distance metric:
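The formula itself does not survive in this text; the standard normalized correlation score between a probe feature vector x₁ and a gallery feature vector x₂ (our notation) is:

$$d(\mathbf{x}_1, \mathbf{x}_2) = \frac{\mathbf{x}_1^{T}\mathbf{x}_2}{\lVert\mathbf{x}_1\rVert\,\lVert\mathbf{x}_2\rVert}$$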

3.5 Experiments on the Yale database

The Yale face database consists of 15 individuals, where for each individual there are 11
face images containing variations in illumination and facial expression. From these 11 face
images, we use 5 for training: the ones annotated "center light", "no glasses",
"normal", "sleepy" and "wink". The remaining 6 images - "glasses", "happy", "left light",
"right light", "sad" and "surprised" - are used for testing. The test images with illumination
from the sides and with glasses are put in the test set on purpose in order to harden the testing
conditions. The face images are closely cropped and scaled to 64x64 resolution. Fig. 3.10
depicts some sample images from the training and testing sets.
In the first experiment, the performances of PCA, global DCT, local DCT
and local PCA with feature fusion are examined with varying feature vector dimensions.
Fig. 3.11 plots the obtained recognition results for the four approaches for a varying number
of coefficients (holistic and local approaches are plotted in different figures due to the
difference in the dimension of the feature vectors used in classification). It can be observed
that while there is no significant performance difference between PCA, local PCA and global
DCT, local DCT with feature fusion outperforms these three approaches significantly.
Fig. 3.11 also shows that local DCT outperforms local PCA significantly at each feature
vector dimension, which

          




indicates that using the DCT for local appearance representation is a better choice than using
PCA. Next, the block-based DCT with decision fusion is examined, again with varying
feature vector dimensions. Table 3.1 depicts the obtained results. It can be seen that further
improvement is gained via decision fusion: using 20 DCT coefficients, 99% accuracy is
achieved. For comparison, the results obtained when using PCA for the local representation
are also depicted in Table 3.1. Overall, the results obtained with PCA for local appearance
representation are much lower than those obtained with the local DCT representation.

Fig. 3.11 Correct recognition rate versus number of used coefficients on the Yale database.
PCA vs. DCT.

40 eigenvectors are chosen, corresponding to 97.92% of the energy content. From the results
depicted in Table 3.1 below, it can be seen that the proposed approaches using local DCT
features outperform the holistic approaches as well as the local DCT features modeled with a
GMM, which ignores location information.
Table 3.1 Recognition rates of the different methods on the Yale database

Method                                   Recognition Rate
PCA (20)                                 75.6%
LDA (14)                                 80.0%
ICA 1 (40)                               77.8%
ICA 2 (40)                               72.2%
Global DCT (64)                          74.4%
Local DCT (18) + GMM (8) as in [12]      58.9%
Local DCT + Feature Fusion (192)         86.7%
Local DCT (10) + Decision Fusion (64)    98.9%

          




CHAPTER 4

INTRODUCTION TO NEURAL NETWORKS

4.1 What is a neural network?

An artificial neural network (ANN) is an information processing paradigm that is inspired by
the way biological nervous systems, such as the brain, process information. The key element
of this paradigm is the novel structure of the information processing system. It is composed
of a large number of highly interconnected processing elements (neurons) working in unison
to solve specific problems. ANNs, like people, learn by example. An ANN is configured for a
specific application, such as pattern recognition or data classification, through a learning
process. Learning in biological systems involves adjustments to the synaptic connections that
exist between the neurons. This is true of ANNs as well.

Historical background

Neural network simulations appear to be a recent development. However, this field was
established before the advent of computers, and has survived at least one major setback and
several eras.

Many important advances have been boosted by the use of inexpensive computer emulations.
Following an initial period of enthusiasm, the field survived a period of frustration and
disrepute. During this period, when funding and professional support were minimal, important
advances were made by relatively few researchers. These pioneers were able to develop
convincing technology which surpassed the limitations identified by Minsky and Papert.
Minsky and Papert published a book (in 1969) in which they summed up a general feeling
of frustration (against neural networks) among researchers, and it was thus accepted by most
without further analysis. Currently, the neural network field enjoys a resurgence of interest
and a corresponding increase in funding.

The first artificial neuron was produced in 1943 by the neurophysiologist Warren McCulloch
and the logician Walter Pitts. But the technology available at that time did not allow them to
do too much.

          




4.2 Why use neural networks?

Neural networks, with their remarkable ability to derive meaning from complicated or
imprecise data, can be used to extract patterns and detect trends that are too complex to be
noticed by either humans or other computer techniques. A trained neural network can be
thought of as an "expert" in the category of information it has been given to analyse. This
expert can then be used to provide projections given new situations of interest and answer
"what if" questions.

Other advantages

1. Adaptive learning: an ability to learn how to do tasks based on the data given for
training or initial experience.
2. Self-organisation: an ANN can create its own organization or representation of the
information it receives during learning time.
3. Real-time operation: ANN computations may be carried out in parallel, and special
hardware devices are being designed and manufactured which take advantage of this
capability.
4. Fault tolerance via redundant information coding: partial destruction of a network
leads to a corresponding degradation of performance. However, some network
capabilities may be retained even with major network damage.

4.3 Neural network versus conventional computers

Neural networks take a different approach to problem solving than that of conventional
computers. Conventional computers use an algorithmic approach, i.e. the computer follows a
set of instructions in order to solve a problem. Unless the specific steps that the computer
needs to follow are known, the computer cannot solve the problem. That restricts the
problem-solving capability of conventional computers to problems that we already
understand and know how to solve. But computers would be so much more useful if they
could do things that we don't exactly know how to do.

Neural networks process information in a similar way to the human brain. The network
is composed of a large number of highly interconnected processing elements (neurons)
working in parallel to solve a specific problem. Neural networks learn by example; they
cannot be programmed to perform a specific task. The examples must be selected carefully, otherwise

          




useful time is wasted, or even worse, the network might function incorrectly. The
disadvantage is that because the network finds out how to solve the problem by
itself, its operation can be unpredictable.

On the other hand, conventional computers use a cognitive approach to problem solving:
the way the problem is to be solved must be known and stated in small, unambiguous
instructions. These instructions are then converted into a high-level language program and
then into machine code that the computer can understand. These machines are totally
predictable; if anything goes wrong, it is due to a software or hardware fault.

Neural networks and conventional algorithmic computers are not in competition but
complement each other. There are tasks that are more suited to an algorithmic approach, like
arithmetic operations, and tasks that are more suited to neural networks. Moreover, a large
number of tasks require systems that use a combination of the two approaches (normally a
conventional computer is used to supervise the neural network) in order to perform at
maximum efficiency.

4.4 Architecture of neural networks

The commonest type of artificial neural network consists of three groups, or layers, of
units: a layer of "input" units is connected to a layer of "hidden" units, which is connected to
a layer of "output" units.

1. The activity of the input units represents the raw information that is fed into the
network.
2. The activity of each hidden unit is determined by the activities of the input units and
the weights on the connections between the input and the hidden units.
3. The behaviour of the output units depends on the activity of the hidden units and the
weights between the hidden and output units (see the sketch below).
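A minimal numpy sketch of such a three-layer network's forward pass (the layer sizes and random weights are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Three layers: 4 input units, 5 hidden units, 3 output units.
W_ih = rng.normal(scale=0.5, size=(4, 5))   # input-to-hidden weights
W_ho = rng.normal(scale=0.5, size=(5, 3))   # hidden-to-output weights

def forward(x):
    hidden = sigmoid(x @ W_ih)       # hidden activity: inputs and weights
    output = sigmoid(hidden @ W_ho)  # output behaviour: hidden and weights
    return output

print(forward(rng.random(4)))        # activity of the three output units
```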

          




Fig. 4.1 Back propagation in neural networks

          




Block Diagram


Figure 4.2 Block diagram of the face recognition system using the Eigenface algorithm

Perform image pre-processing and normalization

In image processing, normalization is a process that changes the range of pixel
intensity values. Applications include photographs with poor contrast due to glare, for
example. Normalization is sometimes called contrast stretching. In more general fields of
data processing, such as digital signal processing, it is referred to as dynamic range
expansion.

          




The purpose of dynamic range expansion in the various applications is usually
to bring the image, or other type of signal, into a range that is more familiar or normal to the
senses, hence the term normalization. Often, the motivation is to achieve consistency in
dynamic range for a set of data, signals, or images to avoid mental distraction or fatigue. For
example, a newspaper will strive to make all of the images in an issue share a similar range of
grayscale.

Normalization is a linear process. If the intensity range of the image is 50 to
180 and the desired range is 0 to 255, the process entails subtracting 50 from each pixel's
intensity, making the range 0 to 130. Then each pixel intensity is multiplied by 255/130,
making the range 0 to 255. Auto-normalization in image processing software typically
normalizes to the full dynamic range of the number system specified in the image file format.
The normalization process will produce iris regions, which have the same constant
dimensions, so that two photographs of the same iris under different conditions will have
characteristic features at the same spatial location.
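A small sketch of this linear stretch; the 50-180 example above maps to 0-255 exactly:

```python
import numpy as np

def stretch(img, lo=0, hi=255):
    """Linear contrast stretch to [lo, hi]; assumes img is not constant."""
    img = img.astype(np.float64)
    mn, mx = img.min(), img.max()
    out = (img - mn) * (hi - lo) / (mx - mn) + lo   # subtract, then rescale
    return out.astype(np.uint8)

a = np.array([[50, 100], [150, 180]])
print(stretch(a))   # 50 maps to 0 and 180 maps to 255, as in the example
```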

          




CHAPTER 5

EIGEN VECTOR VALUES OF IMAGE

An eigen vector is a vector that is scaled by a linear transformation, but not
moved. Think of an eigen vector as an arrow whose direction is not changed. It may stretch,
or shrink, as space is transformed, but it continues to point in the same direction. Most
arrows will move, as illustrated by a spinning planet, but some vectors will continue to point
in the same direction, such as the north pole.

The scaling factor of an eigen vector is called its eigen value. An eigen value only makes
sense in the context of an eigen vector, i.e. the arrow whose length is being changed. In the
plane, a rigid rotation of 90° has no eigen vectors, because all vectors move. However, the
reflection (x, y) → (x, −y) has the x and y axes as eigen vectors. In this function, x is scaled by 1 and y
by -1, the eigen values corresponding to the two eigen vectors. All other vectors move in the
plane. The y axis, in the above example, is subtle. The direction of the vector has been
reversed, yet we still call it an eigen vector, because it lives in the same line as the original
vector. It has been scaled by -1, pointing in the opposite direction. An eigen vector stretches,
or shrinks, or reverses course, or squashes down to 0. The key is that the output vector is a
constant (possibly negative) times the input vector.

These concepts are valid over a division ring, as well as a field. Multiply by K on the left to
build the K vector space, and apply the transformation, as a matrix, on the right. However,
the following method for deriving eigen values and vectors is based on the determinant, and
requires a field.

5.1 Finding Eigen Values and Vectors

Given a matrix M implementing a linear transformation, what are its eigen vectors and
values? Let the vector x represent an eigen vector and let l be the eigen value. We must
solve x*M = lx. Rewrite lx as x times l times the identity matrix and subtract it from both
sides. The right side drops to 0, and the left side is x*M-x*l*identity. Pull x out of both
factors and write x*Q = 0, where Q is M with l subtracted from the main diagonal. The eigen
vector x lies in the kernel of the map implemented by Q. The entire kernel is known as the
eigen space, and of course it depends on the value of l.

          




If the eigen space is nontrivial then the determinant of Q must be 0. Expanding the determinant
gives an nth-degree polynomial in l. (This is where we need a field, to pull all the entries to
the left of l and build a traditional polynomial.) This is called the characteristic polynomial
of the matrix. The roots of this polynomial are the eigen values. There are at most n eigen
values.

Substitute each root in turn and find the kernel of Q. We are looking for the set of vectors x
such that x*Q = 0. Let R be the transpose of Q and solve R*x = 0, where x has become a
column vector. This is a set of simultaneous equations that can be solved using Gaussian
elimination. In summary, a somewhat straightforward algorithm extracts the eigen values, by
solving an nth-degree polynomial, and then derives the eigen space for each eigen value. Some
eigen values will produce multiple eigen vectors, i.e. an eigen space with more than one
dimension. The identity matrix, for instance, has an eigen value of 1, and an n-dimensional
eigen space to go with it. In contrast, an eigen value may have multiplicity > 1, yet there is
only one eigen vector. This is illustrated by [1,1|0,1], a function that tilts the x axis
counterclockwise and leaves the y axis alone. The eigen values are 1 and 1, and the only eigen
vector is (0, 1), namely the y axis.
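This can be checked numerically; note that numpy uses the column-vector convention M @ v = l*v, whereas the text above uses row vectors (x*M = l*x), so we pass the transpose:

```python
import numpy as np

M = np.array([[1.0, 1.0],
              [0.0, 1.0]])          # the shear [1,1|0,1] from the text
vals, vecs = np.linalg.eig(M.T)     # transpose: numpy expects M @ v = l*v
print(vals)                         # [1. 1.]: eigenvalue 1, multiplicity 2
print(vecs)                         # both columns lie (numerically) on the
                                    # y axis (0, 1): only one eigen vector
```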

The Same Eigen Value

Let two eigen vectors have the same eigen value. Specifically, let a linear map multiply the
vectors v and w by the scaling factor l. By linearity, 3v+4w is also scaled by l. In fact every
linear combination of v and w is scaled by l. When a set of vectors has a common eigen
value, the entire space spanned by those vectors is an eigen space, with the same eigen value.
This is not surprising, since the eigen vectors associated with l are precisely the kernel of the
transformation defined by the matrix M with l subtracted from the main diagonal. This
kernel is a vector space, and so is the eigen space of l. Select a basis b for the eigen space of
l. The vectors in b are eigen vectors, with eigen value l, and every eigen vector with eigen
value l is spanned by b. Conversely, an eigen vector with some other eigen value lies outside
the span of b.

Different Eigen Values

Different eigen values always lead to independent eigen spaces. Suppose we have the shortest
counterexample, c1x1 + c2x2 + ... + ckxk = 0. Here x1 through xk are the eigen vectors,

          




and c1 through ck are the coefficients that prove the vectors form a dependent set.
Furthermore, the vectors represent at least two different eigen values. Let the first j vectors
share a common eigen value l. If these vectors are dependent then one of them can be
expressed as a linear combination of the other j − 1. Make this substitution and find a shorter
list of dependent eigen vectors that do not all share the same eigen value. The first j − 1 have
eigen value l, and the rest have some other eigen value. Remember, we selected the shortest
list, so this is a contradiction. Therefore the eigen vectors associated with any given eigen
value are independent. Scale all the coefficients c1 through ck by a common factor s. This
does not change the fact that the sum of cixi is still zero. However, other than this scaling factor, we
will prove there are no other coefficients that carry the eigen vectors to 0.
If there are two independent sets of coefficients that lead to 0, scale them so
the first coefficients in each set are equal, then subtract. This gives a shorter linear
combination of dependent eigen vectors that yields 0. More than one vector remains, else
cjxj = 0, and xj is the 0 vector. We already showed these dependent eigen vectors cannot
share a common eigen value, else they would be linearly independent; thus multiple eigen
values are represented. This is a shorter list of dependent eigen vectors with multiple eigen
values, which is a contradiction. If a set of coefficients carries our eigen vectors to 0, it must
be a scale multiple of c1 c2 c3 « ck. Now take the sum of cixi and multiply by M on the
right. In other words, apply the linear transformation. The image of 0 ought to be 0. Yet
each coefficient is effectively multiplied by the eigen value for its eigen vector, and not all
eigen values are equal. In particular, not all eigen values are 0.
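
The independence claim is easy to check numerically; a small MATLAB sketch with an
illustrative matrix follows.

M = [2 1; 0 3];     % eigen values 2 and 3
[V, D] = eig(M);    % columns of V are the eigen vectors
disp(diag(D)')      % prints 2 3
disp(rank(V))       % prints 2: eigen vectors for distinct eigen values span the plane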

5.2 Axis of Rotation

Here is a simple application of eigen vectors. A rigid rotation in 3-space always has an axis
of rotation. Let M implement the rotation. The determinant of M, with l subtracted from its
main diagonal, gives a cubic polynomial in l, and every cubic has at least one real root. Since
lengths are preserved by a rotation, l is ±1, and l = -1 would give a reflection. So l = 1, and the
space rotates through some angle θ about the eigen vector. That is why every planet, every
star, has an axis of rotation.
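
A minimal MATLAB sketch of this fact; the 60-degree rotation about the z axis is only an
example, and the axis is recovered as the eigen space of l = 1.

t = pi/3;                                   % rotate 60 degrees about the z axis
M = [cos(t) -sin(t) 0; sin(t) cos(t) 0; 0 0 1];
ax = null(M - eye(3));                      % kernel of M - 1*I
disp(ax')                                   % prints [0 0 1] (up to sign): the z axis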

Matching Algorithm

The matching stage takes two images, compares them, and displays the result: Match Found
or Not Found.

[Fig.5.1 Flow chart for finding whether the two images are the same or not]
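
A minimal MATLAB sketch of this flow; the file names are placeholders, and a pixel-level
equality test stands in for the project's matching step.

A = imread('image1.png');
B = imread('image2.png');
if isequal(size(A), size(B)) && isequal(A, B)
    disp('Match Found');
else
    disp('Not Found');
end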

5.3 Outline of a typical face recognition system

Modules in face recognition

The acquisition module

This is the entry point of the face recognition process. It is the module where the face image
under consideration is presented to the system. In other words, the user is asked to present a
face image to the face recognition system in this module. An acquisition module can request
a face image from several different sources: the face image can be an image file located on a
magnetic disk, it can be captured by a camera and frame grabber, or it can be scanned from
paper with the help of a scanner.

The pre-processing module

In this module, by means of early vision techniques, face images are normalized and, if
desired, enhanced to improve the recognition performance of the system. Some or all of the
following pre-processing steps may be implemented in a face recognition system (a short
MATLAB sketch of the first three steps follows the list):

1. Image size (resolution) normalization: usually done to change the acquired image
size to the default image size on which the face recognition system operates.
2. Histogram equalization: usually done on too-dark or too-bright images in order to
enhance image quality and improve face recognition performance. It modifies
the dynamic range (contrast range) of the image, and as a result some important facial
features become more apparent.
3. Median filtering: for noisy images, especially those obtained from a camera or frame
grabber, median filtering can clean the image without losing information.
4. High-pass filtering: feature extractors that are based on facial outlines may benefit
from the results obtained from an edge detection scheme. High-pass filtering
emphasizes the details of an image, such as contours, which can dramatically improve
edge detection performance.
5. Background removal: in order to deal primarily with the facial information itself, the
face background can be removed. This is especially important for face recognition systems
in which the entire image content is used; the pre-processing module should be capable of
determining the face outline.
6. Translational and rotational normalization: in some cases, it is possible to work on a
face image in which the head is somehow shifted or rotated. The head plays the key
role in the determination of facial features. Especially for face recognition systems
that are based on the frontal views of faces, it may be desirable that the pre-processing
module determine and, if possible, normalize the shifts and rotations in the head
position.
7. Illumination normalization: face images taken under different illuminations can
degrade recognition performance, especially for face recognition systems based on
principal component analysis, in which entire face information is used for recognition.
Hence, normalization is done to account for this.
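
The sketch below illustrates steps 1-3 (the Image Processing Toolbox is assumed; the file
name and the 112-by-92 target size are placeholders, not values fixed by this project).

I = imread('face.jpg');
if ndims(I) == 3
    I = rgb2gray(I);             % work on intensity values
end
I = imresize(I, [112 92]);       % 1. image size normalization
I = histeq(I);                   % 2. histogram equalization
I = medfilt2(I, [3 3]);          % 3. median filtering to remove noise
imshow(I)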
The feature extraction module

After performing some pre-processing (if necessary), the normalized face image is presented
to the feature extraction module in order to find the key features that are going to be used for
classification. In other words, this module is responsible for composing a feature vector that
represents the face image well.

The classification module

In this module, with the help of a pattern classifier, the extracted features of the face image
are compared with the ones stored in a face library (or face database). After this
comparison, the face image is classified as either known or unknown.

Training set

Training sets are used during the "learning phase" of the face recognition process. The
feature extraction and classification modules adjust their parameters in order to achieve
optimum recognition performance by making use of training sets.

5.4 Problems that may occur during Face Recognition

Due to the dynamic nature of face images, a face recognition system encounters various
problems during the recognition process. It is possible to classify a face recognition system as
either "robust" or "weak" based on its recognition performance under these circumstances.

The objectives of a robust face recognition system are given below:

1. Scale invariance: the same face can be presented to the system at different scales.
This may happen due to changes in the distance between the face and the camera: as
this distance gets smaller, the face image gets bigger.
2. Shift invariance: the same face can be presented to the system from different
perspectives and orientations. For instance, face images of the same person could be
taken from frontal and profile views. Besides, head orientation may change due to
translations and rotations.
3. Illumination invariance: face images of the same person can be taken under
different illumination conditions; for example, the position and the strength of the light
source can be modified.
4. Emotional expression and detail invariance: face images of the same person can
differ in expression, as when smiling or laughing. Also, some details such as dark
glasses, beards or moustaches can be present.
5. Noise invariance: a robust face recognition system should be insensitive to noise
generated by frame grabbers or cameras. Also, it should function under partially
occluded images.

CHAPTER 6

DEVELOPING TOOLS

6.1 MATLAB Introduction

MATLAB is a high-performance language for technical computing. It integrates computation,
visualization and programming in an easy-to-use environment.

MATLAB stands for matrix laboratory. It was originally written to provide easy access to
matrix software developed by the LINPACK (linear system package) and EISPACK (eigen
system package) projects.

MATLAB is therefore built on a foundation of sophisticated matrix software in which the
basic element is a matrix that does not require pre-dimensioning.

Typical uses of MATLAB

1. Math and computation
2. Algorithm development
3. Data acquisition
4. Data analysis, exploration and visualization
5. Scientific and engineering graphics
The main features of MATLAB

1. Advanced algorithms for high-performance numerical computation, especially in the
field of matrix algebra
2. A large collection of predefined mathematical functions and the ability to define one's
own functions
3. Two- and three-dimensional graphics for plotting and displaying data
4. A complete online help system
5. A powerful, matrix- or vector-oriented high-level programming language for individual
applications
6. Toolboxes available for solving advanced problems in several application areas
Features and capabilities of MATLAB

[Figure: block diagram of the features and capabilities of MATLAB]

6.2 DIP using MATLAB

MATLAB deals with

1. Basic flow control and the programming language
2. How to write scripts (main programs) with MATLAB
3. How to write functions with MATLAB
4. How to use the debugger
5. How to use the graphical interface
6. Examples of useful scripts and functions for image processing

After learning about MATLAB, we will be able to use it as a tool to help us with our
maths, electronics, signal and image processing, statistics, neural networks, and control and
automation work.

MATLAB resources

Language: a high-level matrix/vector language with

- Scripts and main programs
- Functions
- Flow statements (for, while)
- Control statements (if, else)
- Data structures (struct, cells)
- Inputs/outputs (read, write, save)
- Object-oriented programming

Environment

- Command window
- Editor
- Debugger
- Profiler (evaluate performance)

Mathematical libraries

- Vast collection of functions

API

- Call C functions from MATLAB
- Call MATLAB functions from C

Scripts and main programs

In MATLAB, scripts are the equivalent of main programs. The variables declared in a
script are visible in the workspace and they can be saved. Scripts can therefore take a lot of
memory if you are not careful, especially when dealing with images. To create a script, you
will need to start the editor, write your code and run it.

6.3 MATLAB functions

1. imread: read images from graphics files.

Syntax:

A = imread(filename,fmt)

[X,map] = imread(filename,fmt)

[...] = imread(filename)

[...] = imread(...,idx) (TIFF only)

[...] = imread(...,ref) (HDF only)

[...] = imread(...,'BackgroundColor',BG) (PNG only)

[A,map,alpha] = imread(...) (PNG only)

Description:

A = imread(filename,fmt) reads a grayscale or truecolor image named filename into
A. If the file contains a grayscale intensity image, A is a two-dimensional array. If the file
contains a truecolor (RGB) image, A is a three-dimensional (m-by-n-by-3) array.

[X,map] = imread(filename,fmt) reads the indexed image in filename into X and its
associated colormap into map. The colormap values are rescaled to the range [0,1]. X and
map are two-dimensional arrays.

[...] = imread(filename) attempts to infer the format of the file from its content.

filename is a string that specifies the name of the graphics file, and fmt is a string that
specifies the format of the file. If the file is not in the current directory or in a directory in the
MATLAB path, specify the full pathname for a location on your system. If imread cannot
find a file named filename, it looks for a file named filename.fmt. If you do not specify a
string for fmt, the toolbox will try to discern the format of the file by checking the file header.
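
A brief usage sketch of the syntaxes above (the file names are placeholders):

A = imread('photo.jpg');                  % format inferred from the file content
[X, map] = imread('indexed.bmp', 'bmp');  % indexed image and its colormap
F = imread('faces.tif', 3);               % third image of a multi-image TIFF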

Table 6.1 lists the possible values for fmt.

Format File type

'bmp' Windows Bitmap (BMP)

'hdf' Hierarchical Data Format (HDF)

'jpg' or 'jpeg' Joint Photographic Experts Group (JPEG)

'pcx' Windows Paintbrush (PCX)

'png' Portable Network Graphics (PNG)

'tif' or 'tiff' Tagged Image File Format (TIFF)

'xwd' X Windows Dump (XWD)

Special Case Syntax

TIFF-Specific Syntax:

[...] = imread(...,idx) reads in one image from a multi-image TIFF file. idx is an
integer value that specifies the order in which the image appears in the file. For example, if
idx is 3, imread reads the third image in the file. If you omit this argument, imread reads the
first image in the file; to read the other images of a multi-image TIFF file, call imread once
for each value of idx.

PNG-Specific Syntax:

The discussion in this section is only relevant to PNG files that contain transparent
pixels. A PNG file does not necessarily contain transparency data. Transparent pixels, when
they exist, will be identified by one of two components: a transparency chunk or an alpha
channel. (A PNG file can only have one of these components, not both.) The transparency
chunk identifies which pixel values will be treated as transparent, e.g., if the value in the
transparency chunk of an 8-bit image is 0.5020, all pixels in the image with the color 0.5020
can be displayed as transparent. An alpha channel is an array with the same number of pixels
as are in the image, which indicates the transparency status of each corresponding pixel in the
image (transparent or nontransparent). Another potential PNG component related to
transparency is the background color chunk, which (if present) defines a color value that can
be used behind all transparent pixels. This section identifies the default behavior of the
toolbox for reading PNG images that contain either a transparency chunk or an alpha channel,
and describes how you can override it.

HDF-Specific syntax:

[...] = imread(...,ref) reads in one image from a multi-image HDF file. ref is an integer
value that specifies the reference number used to identify the image. For example, if ref is 12,
imread reads the image whose reference number is 12. (Note that in an HDF file the reference
numbers do not necessarily correspond to the order of the images in the file. You can use
imfinfo to match up image order with reference number.) If you omit this argument, imread
reads the first image in the file.

Table 6.2 summarizes the types of images that imread can read.

Format  Variants

BMP     1-bit, 4-bit, 8-bit, and 24-bit uncompressed images; 4-bit and 8-bit run-length
        encoded (RLE) images

HDF     8-bit raster image datasets, with or without associated colormap; 24-bit raster
        image datasets

JPEG    Any baseline JPEG image (8 or 24-bit); JPEG images with some commonly used
        extensions

PCX     1-bit, 8-bit, and 24-bit images

PNG     Any PNG image, including 1-bit, 2-bit, 4-bit, 8-bit, and 16-bit grayscale images;
        8-bit and 16-bit indexed images; 24-bit and 48-bit RGB images

TIFF    Any baseline TIFF image, including 1-bit, 8-bit, and 24-bit uncompressed images;
        1-bit, 8-bit, 16-bit, and 24-bit images with packbits compression; 1-bit images
        with CCITT compression; also 16-bit grayscale, 16-bit indexed, and 48-bit RGB
        images

XWD     1-bit and 8-bit ZPixmaps; XYBitmaps; 1-bit XYPixmaps

2. imshow: display an image.

Syntax

imshow(I)

imshow(I,[low high])

imshow(RGB)

imshow(BW)

imshow(X,map)

imshow(filename)

himage = imshow(...)

imshow(..., param1, val1, param2, val2,...)

Description

imshow(I) displays the grayscale image I.

imshow(I,[low high]) displays the grayscale image I, specifying the display range for
I in [low high]. The value low (and any value less than low) displays as black; the value high
(and any value greater than high) displays as white. Values in between are displayed as
intermediate shades of gray, using the default number of gray levels. If you use an empty
matrix ([]) for [low high], imshow uses [min(I(:)) max(I(:))]; that is, the minimum value in I
is displayed as black, and the maximum value is displayed as white.

imshow(RGB) displays the truecolor image RGB.

imshow(BW) displays the binary image BW. imshow displays pixels with the value 0
(zero) as black and pixels with the value 1 as white.

imshow(X,map) displays the indexed image X with the colormap map. A color map
matrix may have any number of rows, but it must have exactly 3 columns. Each row is
interpreted as a color, with the first element specifying the intensity of red light, the second
green, and the third blue. Color intensity can be specified on the interval 0.0 to 1.0.

imshow(filename) displays the image stored in the graphics file filename. The file
must contain an image that can be read by imread or dicomread. imshow calls imread or
dicomread to read the image from the file, but does not store the image data in the MATLAB
workspace. If the file contains multiple images, the first one will be displayed. The file must
be in the current directory or on the MATLAB path.

Remarks

imshow is the toolbox's fundamental image display function, optimizing figure, axes,
and image object property settings for image display. imtool provides all the image display
capabilities of imshow but also provides access to several other tools for navigating and
exploring images, such as the Pixel Region tool, Image Information tool, and the Adjust
Contrast tool. imtool presents an integrated environment for displaying images and
performing some common image processing tasks.

Examples

Display an image from a file.

X = imread('moon.tif');

imshow(X)

6.4 MATLAB Desktop

Introduction

When you start MATLAB, the MATLAB desktop appears, containing tools (graphical user
interfaces) for managing files, variables, and applications associated with MATLAB. The
following illustration shows the default desktop. You can customize the arrangement of tools
and documents to suit your needs.

6.5 Implementations

1. Arithmetic operations: entering matrices

The best way for you to get started with MATLAB is to learn how to handle
matrices. Start MATLAB and follow along with each example. You can enter matrices into
MATLAB in several different ways:

- Enter an explicit list of elements.

- Load matrices from external data files.

- Generate matrices using built-in functions.

- Create matrices with your own functions in M-files.

Start by entering Dürer's matrix as a list of its elements. You only have to follow a few basic
conventions:

- Separate the elements of a row with blanks or commas.

- Use a semicolon to indicate the end of each row.

- Surround the entire list of elements with square brackets, [ ].

To enter the matrix, simply type in the Command Window

A = [16 3 2 13; 5 10 11 8; 9 6 7 12; 4 15 14 1]

MATLAB displays the matrix you just entered:

A =

    16     3     2    13
     5    10    11     8
     9     6     7    12
     4    15    14     1

This matrix matches the numbers in the engraving. Once you have entered the matrix, it is
automatically remembered in the MATLAB workspace. You can refer to it simply as A. Now
that you have A in the workspace, you can explore it with sum, transpose, and diag.

You are probably already aware that the special properties of a magic square have to do with
the various ways of summing its elements. If you take the sum along any row or column, or

along either of the two main diagonals, you will always get the same number. Let us verify
that using MATLAB. The first statement to try is sum(A)

MATLAB replies with

ans =34 34 34 34

When you do not specify an output variable, MATLAB uses the variable ans, short for
answer, to store the results of a calculation. You have computed a row vector containing the
sums of the columns of A. Sure enough, each of the columns has the same sum, the magic
sum, 34.

How about the row sums? MATLAB has a preference for working with the columns of a
matrix, so one way to get the row sums is to transpose the matrix, compute the column sums
of the transpose, and then transpose the result. For an additional way that avoids the double
transpose, use the dimension argument for the sum function. MATLAB has two transpose
operators. The apostrophe operator (e.g., A') performs a complex conjugate transposition. It
flips a matrix about its main diagonal, and also changes the sign of the imaginary component
of any complex elements of the matrix. The dot-apostrophe operator (e.g., A.') transposes
without affecting the sign of complex elements. For matrices containing all real elements, the
two operators return the same result.

So A' produces

ans =

16 5 9 4

3 10 6 15

2 11 7 14

13 8 12 1

And sum(A')' produces a column vector containing the row sums

ans =

34

34

34

34

The sum of the elements on the main diagonal is obtained with the sum and the diag
functions:

diag(A) produces

ans =

16

10

7

1

And sum(diag(A)) produces

ans =

34

The other diagonal, the so-called antidiagonal, is not so important mathematically, so
MATLAB does not have a ready-made function for it. But a function originally intended for
use in graphics, fliplr, flips a matrix from left to right:

sum(diag(fliplr(A)))

ans =

34

You have verified that the matrix in Dürer's engraving is indeed a magic square and, in the
process, have sampled a few MATLAB matrix operations.

Operators

Expressions use familiar arithmetic operators and precedence rules.

+ Addition

- Subtraction

* Multiplication

/ Division

\ Left division (described in "Matrices and Linear Algebra" in the

MATLAB documentation)

^ Power

' Complex conjugate transpose

( ) Specify evaluation order

Generating Matrices

MATLAB provides four functions that generate basic matrices.

zeros All zeros

ones All ones

rand Uniformly distributed random elements

randn Normally distributed random elements

Here are some examples:

Z = zeros(2,4)

Z =

     0     0     0     0
     0     0     0     0

F = 5*ones(3,3)

F =

     5     5     5
     5     5     5
     5     5     5

N = fix(10*rand(1,10))

N =

     9     2     6     4     8     7     4     0     8     4

R = randn(4,4)

R=

0.6353 0.0860 -0.3210 -1.2316

-0.6014 -2.0046 1.2366 1.0556

0.5512 -0.4931 -0.6313 -0.1132

-1.0998 0.4620 -2.3252 0.3792

M-Files

You can create your own matrices using M-files, which are text files containing MATLAB
code. Use the MATLAB Editor or another text editor to create a file containing the same
statements you would type at the MATLAB command line. Save the file under a name that
ends in .m. For example, create a file containing these five lines: A = [...

16.0 3.0 2.0 13.0

5.0 10.0 11.0 8.0

9.0 6.0 7.0 12.0

4.0 15.0 14.0 1.0 ];

Store the file under the name magik.m. Then the statement magik reads the file and creates a
variable, A, containing our example matrix.

6.6 Graph Components

MATLAB displays graphs in a special window known as a figure. To create a graph, you
need to define a coordinate system. Therefore every graph is placed within axes, which are
contained by the figure. The actual visual representation of the data is achieved with graphics
objects like lines and surfaces. These objects are drawn within the coordinate system defined
by the axes, which MATLAB automatically creates specifically to accommodate the range of
the data. The actual data is stored as properties of the graphics objects.

Plotting Tools

Plotting tools are attached to figures and create an environment for creating graphs. These
tools enable you to do the following:

- Select from a wide variety of graph types

- Change the type of graph that represents a variable

- See and set the properties of graphics objects

- Annotate graphs with text, arrows, etc.

- Create and arrange subplots in the figure

- Drag and drop data into graphs

Display the plotting tools from the View menu or by clicking the plotting tools icon in the
figure toolbar.

Editor/Debugger

Use the Editor/Debugger to create and debug M-files, which are programs you write to run
MATLAB functions. The Editor/Debugger provides a graphical user interface for text
editing, as well as for M-file debugging. To create or edit an M-file use File > New or File >
Open, or use the edit function.

CHAPTER 7

CONCLUSION AND FUTURE SCOPE

7.1 Conclusion

High information redundancy and strong correlations in face images result in inefficiencies
when such images are used directly in recognition tasks. In this project, Discrete Cosine
Transforms (DCTs) are used to reduce image information redundancy, because only a subset
of the transform coefficients is necessary to preserve the most important facial features, such
as hair outline, eyes and mouth. We demonstrate experimentally that when DCT coefficients
are fed into a backpropagation neural network for classification, high recognition rates can be
achieved using only a small proportion (0.19%) of the available transform components.
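
A hedged sketch of the feature extraction idea: a small low-frequency block of 2-D DCT
coefficients is kept as the feature vector. The file name is a placeholder, the 8-by-8 block is
illustrative rather than the exact subset behind the 0.19% figure, and dct2 requires the Image
Processing Toolbox.

I = im2double(imread('face.jpg'));
if ndims(I) == 3
    I = rgb2gray(I);
end
C = dct2(I);         % 2-D discrete cosine transform of the face image
f = C(1:8, 1:8);     % low-frequency coefficients carry the dominant features
feature = f(:);      % column vector fed to the neural network classifier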

7.2 Future scope

Based on energy probability, we propose a new feature extraction method for face
recognition. Our method consists of three steps. First, face images are transformed into the
DCT domain. Second, the DCT domain acquired from the face image is subjected to an
energy-probability criterion for the purpose of dimension reduction of data and optimization
of valid information. Third, in order to obtain the most salient and invariant features of face
images, LDA is applied to the data extracted through a frequency mask; this facilitates the
selection of useful DCT frequency bands for image recognition, because not all the bands are
useful in classification. Finally, the method extracts the linear discriminative features by LDA
and performs the classification with a nearest neighbor classifier. For the purpose of
dimension reduction of data and optimization of valid information, the proposed method has
shown better recognition performance than PCA plus LDA and the existing DCT method.
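
A toolbox-free sketch of the final matching step described above: nearest-neighbor
classification on masked DCT feature vectors. The random features here are stand-ins for
real data, and the LDA projection described above would precede this step.

rng(0);
train = randn(40, 64);                 % stand-in for 40 masked DCT feature vectors
labels = repelem(1:4, 10)';            % four subjects, ten images each
q = train(7, :) + 0.01*randn(1, 64);   % noisy query near a subject-1 image
d = vecnorm(train - q, 2, 2);          % Euclidean distance to each training vector
[~, k] = min(d);
fprintf('Predicted subject: %d\n', labels(k));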

7.3 References

[1] C. M. Bishop, Neural Networks for Pattern Recognition. Oxford: Oxford University
Press, 1995.
[2] B. Chalmond and S. Girard, "Nonlinear modeling of scattered multivariate data and its
application to shape change," IEEE Transactions on Pattern Analysis and Machine
Intelligence, vol. 21, no. 5, pp. 422-432, 1999.
[3] R. Chellappa, C. L. Wilson, and S. Sirohey, "Human and machine recognition of faces: A
survey," Proceedings of the IEEE, vol. 83, no. 5, pp. 705-740, 1995.
[4] C. Christopoulos, J. Bormans, A. Skodras, and J. Cornelis, "Efficient computation of the
two-dimensional fast cosine transform," in SPIE Hybrid Image and Signal Processing
IV, pp. 229-237, 1994.
[5] R. Gonzalez and R. Woods, Digital Image Processing. Reading, MA: Addison-Wesley,
1992.
[6] A. Hyvarinen, "Survey on independent component analysis," Neural Computing Surveys,
2, pp. 94-128, 1999; J. Karhunen and J. Joutsensalo, "Generalization of principal
component analysis, optimization problems and neural networks," Neural Networks,
vol. 8, no. 4, pp. 549-562, 1995.
[7] M. Kirby and L. Sirovich, "Application of the Karhunen-Loeve procedure for the
characterization of human faces," IEEE Transactions on Pattern Analysis and Machine
Intelligence, vol. 12, no. 1, pp. 103-108, 1990.
[8] S. Lawrence, C. Lee Giles, A. Tsoi, and A. Back, "Face recognition: A convolutional
neural network approach," IEEE Transactions on Neural Networks, vol. 8, no. 1,
pp. 98-113, 1997.
[9] C. Nebauer, "Evaluation of convolutional neural networks for visual recognition," IEEE
Transactions on Neural Networks, vol. 9, no. 4, pp. 685-696, 1998.
[10] Z. Pan, R. Adams, and H. Bolouri, "Dimensionality reduction of face images using
discrete cosine transforms for recognition," submitted to IEEE Conference on
Computer Vision and Pattern Recognition, 2000.
[11] F. Samaria, Face Recognition using Hidden Markov Models. PhD thesis, Cambridge
University, 1994.

[12] E. Saund, "Dimensionality-reduction using connectionist networks," IEEE Transactions
on Pattern Analysis and Machine Intelligence, vol. 11, no. 3, pp. 304-314, 1989.
[13] D. Valentin, H. Abdi, A. O'Toole, and G. Cottrell, "Connectionist models of face
processing: A survey," Pattern Recognition, vol. 27, pp. 1209-1230, 1994.
[14] M. S. Bartlett et al., "Face recognition by independent component analysis," IEEE
Transactions on Neural Networks, vol. 13, no. 6, pp. 1450-1454, 2002.
[15] P. N. Belhumeur et al., "Eigenfaces vs. fisherfaces: Recognition using class specific
linear projection," IEEE Transactions on PAMI, vol. 19, no. 7, pp. 711-720, 1997.
[16] R. Gottumukkal and V. K. Asari, "An improved face recognition technique based on
modular PCA approach," Pattern Recognition Letters, vol. 25, no. 4, 2004.
[17] Z. M. Hafed and M. D. Levine, "Face recognition using the discrete cosine transform,"
International Journal of Computer Vision, vol. 43, no. 3, 2001.
[18] B. Heisele et al., "Face recognition with support vector machines: Global versus
component-based approach," in ICCV, pp. 688-694, 2001.
[19] T. Kanade, "Picture processing by computer complex and recognition of human faces,"
Technical report, Kyoto Univ., Dept. Inform. Sci., 1973.