CHAPTER 1
INTRODUCTION
Face recognition by humans is a high level visual task for which it has
been extremely difficult to construct detailed neurophysiological and psychophysical models.
This is because faces are complex natural stimuli that differ dramatically from the artificially
constructed data often used in both human and computer vision research. Thus, developing a
computational approach to face recognition can prove to be very difficult indeed. In fact,
despite the many relatively successful attempts to implement computer-based face recognition
systems, we have yet to see one which combines speed, accuracy, and robustness to face
variations caused by 3D pose, facial expressions, and aging. The primary difficulty in
analyzing and recognizing human faces arises because variations in a single face can be very
large, while variations between different faces are quite small. That is, there is an inherent
structure to a human face, but that structure exhibits large variations due to the presence of a
multitude of muscles in a particular face. Given that recognizing faces is critical for humans
in their everyday activities, automating this process would be very useful in a wide range of
applications including security, surveillance, criminal identification, and video compression.
This paper discusses a new computational approach to face recognition that, when combined
with proper face localization techniques, has proved to be very efficacious. This section
begins with a survey of the face recognition research performed to date. The proposed
approach is then presented along with its objectives and
the motivations for choosing it. The section concludes with an overview of the structure of
the paper.
Face Recognition

Suitably designed systems installed in airports, multiplexes and other public places can detect the
presence of criminals among the crowd.
The development and implementation of face recognition systems is heavily dependent on the
development of computers, since without computers the efficient use of the algorithms is
impossible. The history of face recognition thus goes hand in hand with the history of computers.
Research in automatic face recognition dates back at least to the 1960s. Bledsoe, in 1966,
was the first to attempt semi-automated face recognition with a hybrid human computer
system that classified faces on the basis of fiducial marks entered on photographs by hand.
Parameters for the classification were normalized distances and ratios among points such as
eye corners, mouth corners, nose tip and chin point. Later work at Bell
laboratories (Goldstein, Harmon and Lesk, 1971; Harmon, 1971) developed a vector of up to
21 features and recognized faces using standard pattern classification techniques. The chosen
features were largely subjective evaluations (e.g. shade of hair, length of ears, lip thickness)
made by human subjects, each of which would be difficult to automate.
An early paper by Fischler and Elschlager (1973) attempted to measure similar features
automatically. They described a linear embedding algorithm that used local feature template
matching and a global measure of fit to find and measure facial features. This template
matching approach has been continued and improved by recent work of Yuille, Cohen and
Hallinan (1989). Their strategy is based on "deformable templates", which are parameterized
models of the face and its features in which the parameter values are determined by
interaction with the image. The connectionist approach to face identification seeks to capture the
configurational, or gestalt-like, nature of the task. Kohonen (1989) and Kohonen and Lahito
(1981) describe an associative network with a simple learning algorithm that can recognize
(classify) face images and recall a face image from an incomplete or noisy version input to
the network. Fleming and Cottrell (1990) extend these ideas using nonlinear units, training
the system by back propagation. Stonham's WISARD system (1986) is a general pattern
recognition device based on neural net principles. It has been applied with some success to
binary face images, recognizing both identity and expression. Most connectionist systems
dealing with faces treat the input image as a general 2-D pattern, and can make no explicit
use of the configurational properties of faces. Moreover, some of these systems require an
inordinate number of training examples to achieve a reasonable level of performance. Kirby
and Sirovich were among the first to apply principal component analysis (PCA) to face
images and showed that PCA is an optimal compression scheme that minimizes the mean
squared error between the original images and their reconstructions for any given level of
compression. Turk and Pentland popularized the use of PCA for face recognition. They used
PCA to compute a set of subspace basis vectors (which they called "eigenfaces") for a
database of face images and projected the images in the database into the compressed
subspace. New test images were then matched to images in the database by projecting them
onto the basis vectors and finding the nearest compressed image in the subspace (eigenspace).
The initial success of eigenfaces popularized the idea of matching images in compressed
subspaces. Researchers began to search for other subspaces that might improve performance.
One alternative is Fisher's Linear Discriminant Analysis (LDA, a.k.a. "fisherfaces"). For any
N-class classification problem, the goal of LDA is to find the N-1 basis vectors that maximize
the interclass distances while minimizing the intraclass distances.
Acquisition Module

This is the entry point of the face recognition process. It is the module where the face image
under consideration is presented to the system. In other words, the user is asked to present a
face image to the face recognition system in this module. An acquisition module can request
a face image from several different environments: The face image can be an image file that is
located on a magnetic disk, it can be captured by a frame grabber and camera or it can be
scanned from paper with the help of a scanner.
Pre-processing Module

In this module, by means of early vision techniques, face images are normalized and, if
desired, enhanced to improve the recognition performance of the system. Some or all
of the pre-processing steps may be implemented in a face recognition system.
Feature Extraction Module

After performing some pre-processing (if necessary), the normalized face image is presented
to the feature extraction module in order to find the key features that are going to be used for
classification. In other words, this module is responsible for composing a feature vector that
is able to represent the face image adequately.
Classification Module

In this module, with the help of a pattern classifier, the extracted features of the face image
are compared with the ones stored in a face library (or face database). After this
comparison, the face image is classified as either known or unknown.
Evaluation of these eigenvectors is quite difficult for typical image sizes, but an
approximation that is suitable for practical purposes is also presented. Recognition is
performed by projecting a new image into the subspace spanned by the Eigenfaces and
then classifying the face by comparing its position in face space with the positions of
known individuals.
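The eigenface procedure just described can be sketched as follows. This is a minimal NumPy sketch, not the original implementation; the function names and the test data are illustrative, while the small-matrix eigenvector trick follows Turk and Pentland (1991).

```python
import numpy as np

def train_eigenfaces(images, num_components):
    """images: (M, D) array, one flattened face per row (illustrative sketch)."""
    mean_face = images.mean(axis=0)
    A = (images - mean_face).T                 # columns are mean-subtracted faces
    # Eigenvectors of the small M x M matrix A^T A, as in Turk and Pentland (1991),
    # instead of the intractable D x D covariance matrix A A^T.
    vals, vecs = np.linalg.eigh(A.T @ A)
    order = np.argsort(vals)[::-1][:num_components]
    eigenfaces = A @ vecs[:, order]            # map back to image space
    eigenfaces /= np.linalg.norm(eigenfaces, axis=0)   # normalize basis vectors
    return mean_face, eigenfaces

def project(face, mean_face, eigenfaces):
    """Position of a face in face space (its eigenface weights)."""
    return eigenfaces.T @ (face - mean_face)

def recognize(probe, gallery_weights, mean_face, eigenfaces):
    """Index of the nearest known individual in face space."""
    w = project(probe, mean_face, eigenfaces)
    dists = np.linalg.norm(gallery_weights - w, axis=1)
    return int(np.argmin(dists))
```

In use, each known individual's image is projected once to build `gallery_weights`, and every new image is projected and compared against those stored positions.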
An information-carrying function of time is called a signal. Real-time signals can be
audio (voice) or video (image) signals. A still video frame is called an image; a moving image is
called a video. The difference between digital image processing (DIP) and signals and systems is
that there is no time axis in DIP: the x and y coordinates in DIP are spatial coordinates, and there
is no time axis because a photo does not change with time.
What is an image?

Image : An image is defined as a two dimensional function f(x, y) where x and y are spatial
coordinates and the amplitude f at any point (x, y) is known as the intensity of the image at that
point.
What is a pixel?

Pixel : A pixel (short for picture element) is a single point in a graphic image. Each such
information element is not really a dot, nor a square, but an abstract sample. Each element of
the image matrix is known as a pixel; in a binary image, dark = 0 and light = 1. A pixel with only
1 bit will represent a black and white image. If the number of bits is increased, then the number of
gray levels will increase and a better picture quality is achieved.
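The relation between bit depth and the number of gray levels is simply 2 raised to the number of bits, as this small sketch shows:

```python
def gray_levels(bits):
    """Number of distinct intensity levels for a given bit depth per pixel."""
    return 2 ** bits

print(gray_levels(1))   # 2 levels: a binary (black and white) image
print(gray_levels(8))   # 256 gray levels, typical for grayscale images
```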
All naturally occurring images are analog in nature. The more pixels an image has, the
greater its clarity. An image is represented as a matrix in DIP, whereas in DSP we use
only row matrices. Naturally occurring images must be sampled and quantized to obtain a
digital image. A good image has on the order of 1024 × 1024 pixels, which is known as
1k × 1k = 1M pixels.
Image acquisition : Digital image acquisition is the creation of digital images, typically
from a physical object. A digital image may be created directly from a physical scene by a
camera or similar device. Alternatively it can be obtained from another image in an analog
medium such as photographs, photographic film, or printed paper by a scanner or similar
device. Many technical images acquired with tomographic equipment, side-looking radar, or
radio telescopes are actually obtained by complex processing of non-image data.
Image restoration : The goal of image restoration is to start from a recorded image and to
produce the best possible estimate of the original image. Whereas the goal of enhancement is
beauty, the goal of restoration is truth. The measure of success in restoration is usually an error
measure between the original and the estimated image, although no mathematical error function
is known that corresponds to human perceptual assessment of error.
Colour image processing : Colour image processing is based on the fact that any colour can be
obtained by mixing the 3 basic colours red, green and blue. Hence 3 matrices are necessary, each
one representing each colour.
Segmentation: In the analysis of the objects in images it is essential that we can distinguish
between the objects of interest and "the rest". This latter group is also referred to as the
background. The techniques that are used to find the objects of interest are usually referred to
as segmentation techniques.
CHAPTER 2
Local appearance based face representation is a generic local approach and does not
require detection of any salient local regions, such as eyes, as in the modular or component
based approaches [5, 10] for face representation. Local appearance based face representation
can be performed as follows: A detected and normalized face image is divided into blocks of
8x8 pixels size. Each block is then represented
by its DCT coefficients. The reason for choosing a block size of 8x8 pixels is to have small-
enough blocks in which stationarity is provided and transform complexity is kept simple on
one hand, and to have big enough blocks to provide sufficient
compression on the other hand. The top-left DCT coefficient is removed from the
representation since it only represents the average intensity value of the block. From the
remaining DCT coefficients the ones containing the highest information are extracted via zig-
zag scan.
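This block-based extraction can be sketched as follows. This is a NumPy sketch, not the original code; the 8x8 block size, DC removal, and zig-zag scan follow the description above, while the function names and the `keep` parameter are illustrative.

```python
import numpy as np

def dct_matrix(n=8):
    """Orthonormal DCT-II basis matrix (n x n)."""
    k = np.arange(n)
    C = np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    C[0] *= np.sqrt(1.0 / n)
    C[1:] *= np.sqrt(2.0 / n)
    return C

def zigzag_indices(n=8):
    """(row, col) pairs of an n x n block in zig-zag (anti-diagonal) order."""
    return sorted(((i, j) for i in range(n) for j in range(n)),
                  key=lambda p: (p[0] + p[1],
                                 p[0] if (p[0] + p[1]) % 2 else p[1]))

def block_features(image, block=8, keep=10):
    """Per-block DCT features: drop the DC term, then keep the first
    `keep` zig-zag-ordered (highest-information) coefficients per block."""
    C = dct_matrix(block)
    zz = zigzag_indices(block)[1:1 + keep]      # skip the top-left DC term
    h, w = image.shape
    feats = []
    for r in range(0, h - block + 1, block):
        for c in range(0, w - block + 1, block):
            D = C @ image[r:r + block, c:c + block] @ C.T   # 2-D DCT-II
            feats.append([D[i, j] for i, j in zz])
    return np.array(feats)                      # one feature row per block
```

Feature fusion then simply concatenates the rows of the returned array into one long vector, while decision fusion classifies each row separately.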
Fusion
To fuse the local information, the extracted features from 8x8 pixels blocks can be
combined at the feature level or at the decision level.
Feature Fusion
In feature fusion, the DCT coefficients obtained from each block are concatenated to
construct the feature vector which is used by the classifier.
Decision Fusion
In decision fusion, classification is done separately on each block and, later, the
individual classification results are combined.
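The combination rule is left open here; a simple majority vote over the per-block decisions is one commonly used possibility (a sum rule over per-block similarity scores is another). A minimal sketch:

```python
from collections import Counter

def decision_fusion(block_predictions):
    """Majority vote over per-block classification results (one possible
    combination rule; summing per-block scores is another)."""
    votes = Counter(block_predictions)
    return votes.most_common(1)[0][0]

# Example: 64 blocks, most of which vote for subject 7.
print(decision_fusion([7] * 40 + [3] * 24))   # prints 7
```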
2.2 Definition
Ahmed, Natarajan, and Rao (1974) first introduced the discrete cosine transform (DCT) in
the early seventies. Ever since, the DCT has grown in popularity, and several variants have
been proposed (Rao and Yip, 1990). In particular, the DCT was categorized by Wang (1984)
into four slightly different transformations named DCT-I, DCT-II, DCT-III, and DCT-IV. Of
the four classes Wang defined, DCT-II was the one first suggested by Ahmed et al., and it is
the one of concern in this paper.
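For reference, the DCT-II of a sequence x(n), n = 0, 1, ..., N − 1, is given by the standard form (Rao and Yip, 1990):

```latex
C(k) = \alpha(k) \sum_{n=0}^{N-1} x(n)
       \cos\!\left[ \frac{\pi (2n+1) k}{2N} \right],
\qquad k = 0, 1, \ldots, N-1,
```
with the normalization factors \(\alpha(0) = \sqrt{1/N}\) and \(\alpha(k) = \sqrt{2/N}\) for \(k \neq 0\). The two-dimensional DCT used later in this chapter is the separable extension of this transform, applied along the rows and then the columns of an image block.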
For comparison purposes, the variance distribution measures are usually computed for
random sequences of length O that result in an auto-covariance matrix of the form

        |   1         ρ         ρ²      ...   ρ^(O−1) |
        |   ρ         1         ρ       ...   ρ^(O−2) |
  R  =  |   ρ²        ρ         1       ...   ρ^(O−3) |   ……………… (2.6)
        |   ...       ...       ...     ...   ...     |
        |   ρ^(O−1)   ρ^(O−2)   ...     ...   1       |

where ρ is the correlation coefficient, |ρ| ≤ 1.
Figure 2.1 shows the variance distribution for a selection of discrete transforms, given a
first-order Markov process of length O = 16 and ρ = 0.9 (adapted from K.R. Rao and P. Yip,
Discrete Cosine Transform: Algorithms, Advantages, Applications, New York: Academic, 1990).
Data are shown for the following transforms: discrete cosine transform (DCT), discrete Fourier
transform (DFT), slant transform (ST), discrete sine transform type I (DST-I), discrete sine
transform type II (DST-II), and Karhunen-Loeve transform (KLT). The data for this curve were
obtained directly from Rao and Yip (1990), in which
other curves for different lengths are also presented. The purpose here is to illustrate that the
DCT variance distribution, when compared to other deterministic transforms, decreases most
rapidly. The DCT variance distribution is also very close to that of the KLT, which confirms
its near optimality. Both of these observations highlight the potential of the DCT for data
compression and, more importantly, feature extraction.
The KLT completely decorrelates a signal in the transform domain, minimizes MSE in data
compression, contains the most energy (variance) in the fewest number of transform
coefficients, and minimizes the total representation entropy of the input sequence (Rosenfeld
and Kak, 1976). All of these properties, particularly the first two, are extremely useful in
pattern recognition applications. The computation of the KLT essentially involves the
determination of the eigenvectors of a covariance matrix of a set of training sequences
(images in the case of face recognition). In particular, given M training images of size, say,
O × O, the covariance matrix of interest is given by

  C = A A^T ……………… (2.7)

where A is a matrix whose columns are the M training images (after having an average face
image subtracted from each of them) reshaped into O²-element vectors. Note that because of
the size of C, the computation of its eigenvectors may be intractable. However, as discussed
in Turk and Pentland (1991), because M is usually much smaller than O² in face recognition,
the eigenvectors of C can be obtained more efficiently by computing the eigenvectors of
another, smaller matrix (see (Turk and Pentland, 1991) for details). Once the eigenvectors of
C are obtained, only those with the highest corresponding eigenvalues are usually retained to
form the KLT basis set. One measure for the fraction of eigenvectors retained for the KLT
basis set is given by

  J = ( Σ_{i=1}^{M′} λ_i ) / ( Σ_{i=1}^{M} λ_i )

where λ_i is the i-th eigenvalue of C and M′ is the number of eigenvectors forming the KLT
basis set. As can be seen from the definition of C in Eq. (2.7), the KLT basis functions are
data-dependent. Now, in the case of a first-order Markov process, these basis functions can
be found analytically (Rao and Yip, 1990). Moreover, these functions can be shown to be
asymptotically equivalent to the DCT basis functions as ρ (of Eq. (2.6)) → 1 for any given O,
and as O → ∞ for any given ρ (Rao and Yip, 1990). It is this asymptotic
equivalence that explains the near optimal performance of the DCT in terms of its variance
distribution for first-order Markov processes. In fact, this equivalence also explains the near
optimal performance of the DCT based on a handful of other criteria such as energy packing
efficiency, residual correlation, and mean-square error in estimation (Rao and Yip, 1990).
This provides a strong justification for the use of the DCT for face recognition. Specifically,
since the KLT has been shown to be very effective in face recognition (Pentland et al., 1994),
it is expected that a deterministic transform that is mathematically related to it would
probably perform just as well in the same application.
As for the computational complexity of the DCT and KLT, it is evident from the above
overview that the KLT requires significant processing during training, since its basis set is
data-dependent. This overhead in computation, albeit occurring in a non-time-critical off-line
training process, is alleviated with the DCT. As for on-line feature extraction, the KLT of an
O × O image can be computed in M′O² time, where M′ is the number of KLT basis vectors.
In comparison, the DCT of the same image can be computed in O² log₂ O time because of
its relation to the discrete Fourier transform, which can be implemented efficiently using the
fast Fourier transform (Oppenheim and Schafer, 1989). This means that the DCT can be
computationally more efficient than the KLT, depending on the size of the KLT basis set. It
is thus concluded that the discrete cosine transform is very well suited to application in face
recognition. Because of the similarity of its basis functions to those of the KLT, the DCT
exhibits striking feature extraction and data compression capabilities. In fact, coupled
with these, the ease and speed of the computation of the DCT may even favor it over the KLT
in face recognition.
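The O² log₂ O figure rests on the standard identity relating the DCT-II to the DFT of an even-symmetrically extended sequence. The following one-dimensional NumPy sketch is illustrative, not from the source; applying the same transform along rows and then columns yields the 2-D case.

```python
import numpy as np

def dct2_naive(x):
    """Direct O(N^2) DCT-II of a 1-D signal (unnormalized)."""
    N = len(x)
    n = np.arange(N)
    return np.array([np.sum(x * np.cos(np.pi * (2 * n + 1) * k / (2 * N)))
                     for k in range(N)])

def dct2_via_fft(x):
    """Same DCT-II computed in O(N log N) via an FFT of the mirrored signal."""
    N = len(x)
    y = np.concatenate([x, x[::-1]])            # even-symmetric extension
    Y = np.fft.fft(y)[:N]                       # one FFT of length 2N
    k = np.arange(N)
    # A half-sample phase shift recovers the cosine sums of the DCT-II.
    return np.real(Y * np.exp(-1j * np.pi * k / (2 * N))) / 2
```

Both functions return the same coefficients; only the second scales to large signals the way the complexity argument above requires.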
CHAPTER 3
FACE NORMALIZATION AND RECOGNITION
The face recognition algorithm discussed in this paper is depicted in Fig. 3.1. It involves both
face normalization and recognition. Since face and eye localization is not performed
automatically, the eye coordinates of the input faces need to be entered manually in order to
normalize the faces correctly. This requirement is not a major limitation because the
algorithm can easily be invoked after running a localization system such as the one presented
in Jebara (1996) or others in the literature. As can be seen from Fig. 3.2, the system receives
as input an image containing a face along with its eye coordinates. It then executes both
geometric and illumination normalization functions as will be described later. Once a
normalized (and cropped) face is obtained, it can be compared to other faces, under the same
nominal size, orientation, position, and illumination conditions.
This comparison is based on features extracted using the DCT. The basic idea here is to
compute the DCT of the normalized face and retain a certain subset of the DCT coefficients
as a feature vector describing this face. This feature vector contains the low-to-mid
frequency DCT coefficients, as these are the ones having the highest variance. To recognize a
particular input face, the system compares this face's feature vector to the feature vectors of
the database faces using a Euclidean distance nearest-neighbor classifier (Duda and Hart,
1973). If the feature vector of the probe is v and that of a database face is f, then the
Euclidean distance between the two is

  d = sqrt( (v_0 − f_0)² + (v_1 − f_1)² + ... + (v_{M−1} − f_{M−1})² ) ……………… (3.1)

where

  v = [v_0 v_1 ... v_{M−1}]^T
  f = [f_0 f_1 ... f_{M−1}]^T ……………… (3.2)

and M is the number of DCT coefficients retained as
features. A match is obtained by minimizing d. Note that this approach computes the DCT on
the entire normalized image. This is different from the use of the DCT in the JPEG
compression standard (Pennebaker and Mitchell, 1993), in which the DCT is computed on
individual subsets of the image. The use of the DCT on individual subsets of an image, as
in the JPEG standard, has been proposed for face recognition in Shneier and Abdel-Mottaleb
(1996) and Eickeler et al. (2000). Also, note that this approach basically assumes no
threshold on d. That is, the system described always assumes that the closest match is the
correct match, and no probe is ever rejected as unknown. If a threshold t is defined on d,
then the gallery face that minimizes d would only be output as the match when d < t.
Otherwise, the probe would be declared as unknown. In this way, one can actually define a
threshold to achieve 100% recognition accuracy, but, of course, at the cost of a certain number
of rejections. In other words, the system could end up declaring an input face as unknown
even though it exists in the gallery. Suitable values of t can be obtained using the so-called
Receiver Operating Characteristic (ROC) curve (Grzybowski and Younger, 1997), as will be
illustrated later.
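The matching rule just described can be sketched as follows; the NumPy implementation, function name, and threshold parameter are illustrative, not the authors' code.

```python
import numpy as np

def match(probe, gallery, threshold=None):
    """Nearest-neighbor match by Euclidean distance over DCT feature vectors.
    With no threshold, the closest gallery face is always returned; with a
    threshold, probes farther than it are rejected as unknown (a suitable
    threshold value would be chosen from an ROC curve)."""
    dists = np.linalg.norm(gallery - probe, axis=1)
    best = int(np.argmin(dists))
    if threshold is not None and dists[best] > threshold:
        return None                       # probe declared unknown
    return best
```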
Feature Extraction
To obtain the feature vector representing a face, its DCT is computed, and only a subset of
the obtained coefficients is retained. The size of this subset is chosen such that it can
sufficiently represent a face, but it can in fact be quite small. As an illustration,
Fig. 3.2(a) shows a sample image of a face, and Fig. 3.2(b) shows the low-to-mid
frequency 8 × 8 subset of its DCT coefficients. It can be observed that the DCT coefficients
exhibit the expected behavior in which a relatively large amount of information about the
original image is stored in a fairly small number of coefficients. In fact, looking at Fig. 3.2
(b), we note that the DC term is more than 15,000 and the minimum magnitude in the
presented set of coefficients is less than 1. Thus there is an order of 10,000 reduction in
coefficient magnitude in the first 64 DCT coefficients. Most of the discarded coefficients
have magnitudes less than 1. For the purposes of this paper, square subsets, similar to the one
shown in Fig. 3.2(b), are used for the feature vectors. It should be noted that the size of the
subset of DCT coefficients retained as a feature vector may not be large enough for achieving
an accurate reconstruction of the input image. That is, in the case of face recognition, data
compression ratios larger than the ones necessary to render accurate reconstruction of input
images are encountered.
Figure 3.2 Typical face image (a) of size 128 × 128 and an 8 × 8 subset of its DCT (b).
This observation, of course, has no ramifications on the performance evaluation
of the system, because accurate reconstruction is not a requirement. In fact, this situation was
also encountered in Turk and Pentland (1991) where the KLT coefficients used in face
recognition were not sufficient to achieve a subjectively acceptable facial reconstruction.
Figure 3.3 shows the effect of using a feature vector of size 64 to reconstruct a typical face
image. Now, it may be the case that one chooses to use more DCT coefficients to represent
faces. However, there could be a cost associated with doing so. Specifically, more
coefficients do not necessarily imply better recognition results, because by adding them, one
may actually be representing more irrelevant information (Swets and Weng, 1996).
(a)
(b)
Figure 3.3 Effect of reconstructing a 128 × 128 image using only 64 DCT coefficients: (a)
original (b) reconstructed.
3.2 Normalization
Two kinds of normalization are performed in the proposed face recognition system. The first
deals with geometric distortions due to varying imaging conditions. That is, it attempts to
compensate for position, scale, and
minor orientation variations in faces. This way, feature vectors are always compared for
images characterized by the same conditions. The second kind of normalization deals with
the illumination of faces. The reasoning here is that the variations in pixel intensities between
different images of faces could be due to illumination conditions. Normalization in this case
is not very easily dealt with because illumination normalization could result in an artificial
tinting of light colored faces and a corresponding lightening of dark colored ones. In the
following two subsections, the issues involved in both kinds of normalization are presented,
and the stage is set for various experiments to test their effectiveness for face recognition.
These experiments and their results are detailed in Section 4.
Geometry
The proposed system is a holistic approach to face recognition. Thus it uses the image of a
whole face and, as discussed in Section 1, it is expected to be sensitive to variations in facial
scale and orientation. An investigation of this effect was performed in the case of the DCT to
confirm this observation. The data used for this test were from the MIT database, which is
described, along with the other databases studied, in a fair amount of detail in Section 4. This
database contains a subset of faces that only vary in scale. To investigate the effects of scale
on face recognition accuracy, faces at a single scale were used as the gallery faces, and faces
from two different scales were used as the probes. Fig. 3.4 illustrates how scale can
degrade the performance of a face recognition system; in that experiment, 64 DCT
coefficients were used for feature vectors, and 14 individuals of the MIT database were
considered. In the figure, the term "Training Case" refers to the scale in the gallery images,
and the terms "Case 1" and "Case 2" describe the two scales that were available for the
probes. Figure 3.5 shows three faces from the MIT database exhibiting scale variations:
examples from the training set and from the two cases of scale investigated (the labels
refer to the experiments performed in Fig. 3.4). These results indicate that
the DCT exhibits sensitivity to scale similar to that shown for the KLT (Turk and Pentland,
1991). The geometric normalization we have used basically attempts to make all faces have
the same size and same frontal, upright pose. It also attempts to crop face images such that
most of the background is excluded. To achieve this, it uses the input face eye coordinates
and defines a transformation to place these eyes in standard positions. That is, it scales faces
such that the eyes are always the same distance apart (the final image dimensions are
128 × 128), and it positions
these faces in an image such that most of the background is excluded. This normalization
procedure is illustrated in Fig.3.6, and it is similar to that proposed in Brunelli and Poggio
(1993). Given the eye coordinates of the input face image, the normalization procedure
performs thefollowing three transformations: rotate the image so that the eyes fall on a
horizontal line, scale the image (while maintaining the original aspect ratio) so that the eye
centers are at a fixed distance apart (36 pixels), and translate the image to place the eyes at
set positions within a 128×128 cropping window (see Fig. 3.6). Note that we only require the
eye coordinates of input faces in order to perform this normalization. Thus no knowledge of
individual face contours is available, which means that we cannot easily exclude the whole
background from the normalized images. Since we cannot tailor an optimal normalization
and cropping scheme for each face without knowledge of its contours, the dimensions shown
in Fig. 3.6 were chosen to result in as little background, hair, and clothing information as
possible, and they seemed appropriate given the variations in face geometry among people.
Another observation we can make about Fig. 3.6 is that the normalization performed accounts
for only twodimensional perturbations in orientation. That is, no compensation is done for
three-dimensional (in depth) pose variations. This is a much more difficult problem to deal
with, and a satisfactory solution to it has yet to be found. Of course, one could increase the
robustness of a face recognition system to 3-D pose variations by including several training
images containing such variations for a single person. The effect of doing this will be
discussed in the next section. Also, by two-dimensional perturbations in orientation, we mean
slight rotations from the upright position. These rotations are the ones that may arise
naturally, even if people are looking straight ahead (see Fig. 3.7 for an example). Of course,
larger 2-D rotations do not occur naturally and always include some 3-D aspect to them,
which obviously 2-D normalization does not account for. As for the actual normalization
technique implemented, it basically consists of defining and applying a 2-D affine
transformation, based on the relative eye positions and their distance. Figure 3.8 illustrates
the result of applying such a transformation on a sample face image.
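The three transformations listed above (rotate, scale, translate) can be folded into a single 2-D similarity transform computed from the two eye coordinates. In the following NumPy sketch the target left-eye position is a hypothetical value; only the 36-pixel eye separation and the 128 × 128 window come from the text.

```python
import numpy as np

def eye_alignment_transform(left_eye, right_eye,
                            target_left=(46.0, 48.0), eye_distance=36.0):
    """2-D similarity transform (A, t) mapping the given eye coordinates to
    fixed positions in a 128 x 128 window. `target_left` is illustrative."""
    left = np.asarray(left_eye, dtype=float)
    right = np.asarray(right_eye, dtype=float)
    d = right - left
    scale = eye_distance / np.hypot(d[0], d[1])   # enforce eye separation
    angle = -np.arctan2(d[1], d[0])               # bring eyes onto a horizontal
    c, s = np.cos(angle), np.sin(angle)
    A = scale * np.array([[c, -s], [s, c]])       # rotation + uniform scale
    t = np.asarray(target_left) - A @ left        # place left eye at its target
    return A, t

def apply_transform(A, t, point):
    """Map one image point through the affine transform."""
    return A @ np.asarray(point, dtype=float) + t
```

In practice the inverse of this transform would be used to resample pixel values into the 128 × 128 cropping window.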
Figure 3.8 The result of applying the normalization transformation on a sample face image.

3.3 Illumination

Illumination normalization in the proposed system is performed using only images
of the same face. That is, no inter-face normalization is performed, and in this way, no
artificial darkening or lightening of faces occurs due to attempts to normalize all faces to a
single target. Of course, the results of illumination normalization really depend on the
database being considered. For example, if the illumination of faces in a database is
sufficiently uniform, then illumination normalization techniques are redundant.
3.4 Experiments
This section describes experiments with the developed face recognition system. These were
fairly extensive, and the hallmark of the work presented here is that the DCT was put to the
test under a wide variety of conditions. Specifically, several databases, with significant
differences between them, were used in the experimentation.
A flowchart of the system described in the previous section is presented in Fig.
3.9. As can be seen, there is a pre-processing stage in which the face codes for the individual
database images are extracted and stored for later use. This stage can be thought of as a
modeling stage, which is necessary even for human beings: we perform a correlation between
what is seen and what is already known in order to actually achieve recognition (Sekuler and
Blake, 1994). At run-time, a test input is presented to the system, and its face codes are
extracted. The closest match is found by performing a search that computes
Euclidean distances and sorts the results using a fast algorithm (Silvester, 1993). Figure 3.9
shows the various modules used and the flowchart of operation. This section begins with a brief overview of the
various face databases used for testing the system; the differences among these databases are
highlighted. Then the experiments performed and their results are presented and discussed.
We compare the proposed local appearance-based approach with several well-known
holistic face recognition approaches: Principal Component Analysis (PCA) [15], Linear
Discriminant Analysis (LDA) [2], and Independent Component Analysis (ICA).
Fig. 3.10 Samples from the Yale database. First row: Samples from training set. Second row:
Samples from test set.
In all our experiments, except for the DCT+GMM approach, where the classification is
done with maximum likelihood, we use the nearest neighbor classifier with the normalized
correlation as the distance metric.
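The normalized correlation formula is not reproduced here; the common definition, assumed in this sketch, scores two feature vectors by their inner product divided by the product of their norms (higher means more similar, so the nearest neighbor maximizes it):

```python
import numpy as np

def normalized_correlation(a, b):
    """Normalized correlation of two feature vectors, in [-1, 1]."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def nearest_neighbor(probe, gallery):
    """Index of the gallery vector most similar to the probe."""
    scores = [normalized_correlation(probe, g) for g in gallery]
    return int(np.argmax(scores))
```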
The Yale face database consists of 15 individuals, where for each individual, there are 11
face images containing variations in illumination and facial expression. From these 11 face
images, we use 5 for training: the ones with annotations "center light", "no glasses",
"normal", "sleepy" and "wink". The remaining 6 images - "glasses", "happy", "left light",
"right light", "sad", "surprised" - are used for testing. The test images with illumination from
the sides and with glasses are put in the test set on purpose in order to make the testing
conditions harder. The face images are closely cropped and scaled to 64x64 resolution. Fig. 3.10
depicts some sample images from the training and testing set.
In the first experiment, the performances of PCA, global DCT, local DCT
and local PCA with feature fusion are examined with varying feature vector dimensions.
Fig. 3.11 plots the obtained recognition results for the four approaches for varying numbers of
coefficients (holistic and local approaches are plotted in different figures due to the difference
in the dimension of the feature vectors used in the classification). It can be observed that, while
there is no significant performance difference between PCA, local PCA and global DCT, local
DCT with feature fusion significantly outperforms these three approaches. Fig. 3.11 also shows
that local DCT significantly outperforms local PCA at each feature vector dimension, which
indicates that using the DCT for local appearance representation is a better choice than using
PCA. Next, the block-based DCT with decision fusion is examined, again with varying
feature vector dimensions. Table 3.1 depicts the obtained results. It can be seen that further
improvement is gained via decision fusion. Using 20 DCT coefficients, 99% accuracy is
achieved. For comparison, the results obtained when using PCA for local representation are
also depicted in Table 3.1. Overall, the results obtained with PCA for local appearance
representation are much lower than those obtained with the local DCT representation.
Fig. 3.11 Correct recognition rate versus number of used coefficients on the Yale
database. PCA vs. DCT.
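Decision fusion, as used above, classifies each local block separately and combines the per-block decisions. A simple majority-vote sketch (Python/NumPy; the plain Euclidean block metric and the toy feature vectors are assumptions for illustration, not the report's exact scheme):

```python
import numpy as np
from collections import Counter

def classify_block(block_features, gallery_blocks, labels):
    # Nearest neighbor on one block (plain Euclidean distance is an
    # assumption for illustration; the report uses normalized correlation).
    dists = [np.linalg.norm(block_features - g) for g in gallery_blocks]
    return labels[int(np.argmin(dists))]

def decision_fusion(query_blocks, gallery, labels):
    # Each local block votes for an identity; the majority wins.
    votes = [classify_block(qb, [subject[i] for subject in gallery], labels)
             for i, qb in enumerate(query_blocks)]
    return Counter(votes).most_common(1)[0][0]

# Toy data: two subjects, two blocks each (real blocks would hold DCT coefficients).
gallery = [[np.array([1.0, 0.0]), np.array([1.0, 1.0])],   # subject A
           [np.array([0.0, 1.0]), np.array([0.0, 0.0])]]   # subject B
labels = ["A", "B"]
query = [np.array([0.9, 0.1]), np.array([0.8, 0.9])]
print(decision_fusion(query, gallery, labels))  # A
```

Feature fusion, by contrast, would concatenate all block features into one long vector before a single classification.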
40 eigenvectors are chosen, corresponding to 97.92% of the energy content. From the results
depicted in Table 3.1 it can be seen that the proposed approaches using local DCT
features outperform the holistic approaches, as well as the local DCT features modeled with a
GMM, which ignores location information.
Table 3.1 Recognition rates of different methods on the Yale database

Method                                  Reco. Rate
PCA (20)                                75.6%
LDA (14)                                80.0%
ICA 1 (40)                              77.8%
ICA 2 (40)                              72.2%
Global DCT (64)                         74.4%
Local DCT (18) + GMM (8) as in [12]     58.9%
Local DCT + Feature Fusion (192)        86.7%
Local DCT (10) + Decision Fusion (64)   98.9%
CHAPTER 4
Historical background
Neural network simulations appear to be a recent development. However, this field was
established before the advent of computers, and has survived at least one major setback and
several eras.
Many important advances have been boosted by the use of inexpensive computer emulations.
Following an initial period of enthusiasm, the field survived a period of frustration and
disrepute. During this period, when funding and professional support were minimal, important
advances were made by relatively few researchers. These pioneers were able to develop
convincing technology which surpassed the limitations identified by Minsky and Papert.
Minsky and Papert published a book (in 1969) in which they summed up a general feeling
of frustration (against neural networks) among researchers, and it was thus accepted by most
without further analysis. Currently, the neural network field enjoys a resurgence of interest
and a corresponding increase in funding.
The first artificial neuron was produced in 1943 by the neurophysiologist Warren McCulloch
and the logician Walter Pitts. But the technology available at that time did not allow them to
do too much.
Neural networks, with their remarkable ability to derive meaning from complicated or
imprecise data, can be used to extract patterns and detect trends that are too complex to be
noticed by either humans or other computer techniques. A trained neural network can be
thought of as an "expert" in the category of information it has been given to analyse. This
expert can then be used to provide projections given new situations of interest and answer
"what if" questions.
Other advantages
1. Adaptive learning: an ability to learn how to do tasks based on the data given for
training or initial experience.
2. Self-organisation: an ANN can create its own organisation or representation of the
information it receives during learning time.
3. Real-time operation: ANN computations may be carried out in parallel, and special
hardware devices are being designed and manufactured which take advantage of this
capability.
4. Fault tolerance via redundant information coding: partial destruction of a network
leads to a corresponding degradation of performance. However, some network
capabilities may be retained even with major network damage.
Neural networks take a different approach to problem solving than that of conventional
computers. Conventional computers use an algorithmic approach, i.e. the computer follows a
set of instructions in order to solve a problem. Unless the specific steps that the computer
needs to follow are known, the computer cannot solve the problem. That restricts the
problem-solving capability of conventional computers to problems that we already
understand and know how to solve. But computers would be so much more useful if they
could do things that we don't exactly know how to do.
Neural networks process information in a similar way to the human brain. The network
is composed of a large number of highly interconnected processing elements (neurons)
working in parallel to solve a specific problem. Neural networks learn by example; they
cannot be programmed to perform a specific task. The examples must be selected carefully,
otherwise useful time is wasted, or even worse, the network might function incorrectly. The
disadvantage is that because the network finds out how to solve the problem by itself, its
operation can be unpredictable.
On the other hand, conventional computers use a cognitive approach to problem solving:
the way the problem is to be solved must be known and stated in small, unambiguous
instructions. These instructions are then converted into a high-level language program and
then into machine code that the computer can understand. These machines are totally
predictable; if anything goes wrong, it is due to a software or hardware fault.
Neural networks and conventional algorithmic computers are not in competition but
complement each other. There are tasks that are more suited to an algorithmic approach, like
arithmetic operations, and tasks that are more suited to neural networks. Moreover, a large
number of tasks require systems that use a combination of the two approaches (normally a
conventional computer is used to supervise the neural network) in order to perform at
maximum efficiency.
The commonest type of artificial neural network consists of three groups, or layers, of
units: a layer of "input" units is connected to a layer of "hidden" units, which is connected to
a layer of "output" units.
- The activity of the input units represents the raw information that is fed into the
network.
- The activity of each hidden unit is determined by the activities of the input units and
the weights on the connections between the input and the hidden units.
- The behaviour of the output units depends on the activity of the hidden units and the
weights between the hidden and output units.
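The three-layer structure described above amounts to a forward pass; a sketch (Python/NumPy; the sigmoid activation and the specific weight values are assumptions chosen purely for illustration):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, W_hidden, W_output):
    # Hidden activity: determined by input activities and input-to-hidden weights.
    h = sigmoid(W_hidden @ x)
    # Output behaviour: depends on hidden activities and hidden-to-output weights.
    return sigmoid(W_output @ h)

x = np.array([1.0, 0.5])                 # input units: the raw information
W_hidden = np.array([[0.2, -0.4],        # 3 hidden units x 2 input units
                     [0.7,  0.1],
                     [-0.5, 0.9]])
W_output = np.array([[0.3, -0.6, 0.8]])  # 1 output unit x 3 hidden units
y = forward(x, W_hidden, W_output)
```

Training would adjust W_hidden and W_output from examples; only the forward computation is shown here.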
Block Diagram
Figure 4.2 Block diagram of the face recognition system using Eigenface algorithm
CHAPTER 5
The scaling factor of an eigenvector is called its eigenvalue. An eigenvalue only makes
sense in the context of an eigenvector, i.e. the arrow whose length is being changed. In the
plane, a rigid rotation of 90° has no eigenvectors, because all vectors move. However, the
reflection (x, y) -> (x, -y) has the x and y axes as eigenvectors. Under this map, x is scaled by 1
and y by -1, the eigenvalues corresponding to the two eigenvectors. All other vectors move in
the plane. The y axis, in the above example, is subtle. The direction of the vector has been
reversed, yet we still call it an eigenvector, because it lies in the same line as the original
vector. It has been scaled by -1, pointing in the opposite direction. An eigenvector stretches,
or shrinks, or reverses course, or squashes down to 0. The key is that the output vector is a
constant (possibly negative) times the input vector.
These concepts are valid over a division ring, as well as a field. Multiply by K on the left to
build the K vector space, and apply the transformation, as a matrix, on the right. However,
the following method for deriving eigenvalues and eigenvectors is based on the determinant,
and requires a field.
Given a matrix M implementing a linear transformation, what are its eigenvectors and
eigenvalues? Let the vector x represent an eigenvector and let λ be the eigenvalue. We must
solve x*M = λx. Rewrite λx as x times λ times the identity matrix and subtract it from both
sides. The right side drops to 0, and the left side is x*M - x*λ*identity. Pull x out of both
factors and write x*Q = 0, where Q is M with λ subtracted from the main diagonal. The
eigenvector x lies in the kernel of the map implemented by Q. The entire kernel is known as
the eigenspace, and of course it depends on the value of λ.
If the eigenspace is nontrivial then the determinant of Q must be 0. Expand the determinant,
giving a degree-n polynomial in λ. (This is where we need a field, to pull all the entries to
the left of λ and build a traditional polynomial.) This is called the characteristic polynomial
of the matrix. The roots of this polynomial are the eigenvalues. There are at most n
eigenvalues.
Substitute each root in turn and find the kernel of Q. We are looking for the set of vectors x
such that x*Q = 0. Let R be the transpose of Q and solve R*x = 0, where x has become a
column vector. This is a set of simultaneous equations that can be solved using Gaussian
elimination. In summary, a somewhat straightforward algorithm extracts the eigenvalues, by
solving a degree-n polynomial, then derives the eigenspace for each eigenvalue. Some
eigenvalues will produce multiple eigenvectors, i.e. an eigenspace with more than one
dimension. The identity matrix, for instance, has an eigenvalue of 1, and an n-dimensional
eigenspace to go with it. In contrast, an eigenvalue may have multiplicity > 1, yet there is
only one eigenvector. This is illustrated by [1,1|0,1], a function that tilts the x axis
counterclockwise and leaves the y axis alone. The eigenvalues are 1 and 1, and the
eigenvector is (0,1), namely the y axis.
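The shear example can be checked numerically. Because the text multiplies row vectors on the left (x*M), the matrix is transposed before being handed to NumPy's column-vector convention:

```python
import numpy as np

# The shear [1,1|0,1]: eigenvalue 1 with multiplicity 2, but only one
# eigenvector direction, (0, 1).  The text acts on row vectors (x*M), which
# in NumPy's column-vector convention means taking eigenvectors of M.T.
M = np.array([[1.0, 1.0],
              [0.0, 1.0]])
eigenvalues, eigenvectors = np.linalg.eig(M.T)
print(eigenvalues)               # both roots of the characteristic polynomial are 1
print(np.array([0.0, 1.0]) @ M)  # the y axis is fixed: stays (0, 1)
```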
Let two eigenvectors have the same eigenvalue. Specifically, let a linear map multiply the
vectors v and w by the scaling factor λ. By linearity, 3v+4w is also scaled by λ. In fact every
linear combination of v and w is scaled by λ. When a set of vectors has a common eigenvalue,
the entire space spanned by those vectors is an eigenspace, with the same eigenvalue.
This is not surprising, since the eigenvectors associated with λ are precisely the kernel of the
transformation defined by the matrix M with λ subtracted from the main diagonal. This
kernel is a vector space, and so is the eigenspace of λ. Select a basis b for the eigenspace of
λ. The vectors in b are eigenvectors, with eigenvalue λ, and every eigenvector with
eigenvalue λ is spanned by b. Conversely, an eigenvector with some other eigenvalue lies
outside of b.
Different eigenvalues always lead to independent eigenspaces. Suppose we have the shortest
counterexample. Thus c1x1 + c2x2 + ... + ckxk = 0. Here x1 through xk are the eigenvectors,
and c1 through ck are the coefficients that prove the vectors form a dependent set.
Furthermore, the vectors represent at least two different eigenvalues. Let the first j vectors
share a common eigenvalue λ. If these vectors are dependent then one of them can be
expressed as a linear combination of the other j-1. Make this substitution and find a shorter
list of dependent eigenvectors that do not all share the same eigenvalue. The first j-1 have
eigenvalue λ, and the rest have some other eigenvalue. Remember, we selected the shortest
list, so this is a contradiction. Therefore the eigenvectors associated with any given eigenvalue
are independent. Scale all the coefficients c1 through ck by a common factor s. This does not
change the fact that the sum of cixi is still zero. However, other than this scaling factor, we
will prove there are no other coefficients that carry the eigenvectors to 0.
If there are two independent sets of coefficients that lead to 0, scale them so that
the first coefficients in each set are equal, then subtract. This gives a shorter linear
combination of dependent eigenvectors that yields 0. More than one vector remains, else
cjxj = 0, and xj is the 0 vector. We already showed these dependent eigenvectors cannot
share a common eigenvalue, else they would be linearly independent; thus multiple
eigenvalues are represented. This is a shorter list of dependent eigenvectors with multiple
eigenvalues, which is a contradiction. If a set of coefficients carries our eigenvectors to 0, it
must be a scalar multiple of c1, c2, c3, ..., ck. Now take the sum of cixi and multiply by M on
the right. In other words, apply the linear transformation. The image of 0 ought to be 0. Yet
each coefficient is effectively multiplied by the eigenvalue for its eigenvector, and not all
eigenvalues are equal. In particular, not all eigenvalues are 0.
Here is a simple application of eigenvectors. A rigid rotation in 3-space always has an axis
of rotation. Let M implement the rotation. The determinant of M, with λ subtracted from its
main diagonal, gives a cubic polynomial in λ, and every cubic has at least one real root. Since
lengths are preserved by a rotation, λ is ±1. If λ is -1 we have a reflection. So λ = 1, and the
space rotates through some angle θ about the eigenvector. That's why every planet, every
star, has an axis of rotation.
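This can be illustrated numerically: take any rotation matrix, and the eigenvector for the real root λ = 1 is the axis. The 30° rotation about z below is an arbitrary choice for the sketch:

```python
import numpy as np

# An arbitrary rigid rotation: 30 degrees about the z axis.
theta = np.radians(30.0)
c, s = np.cos(theta), np.sin(theta)
M = np.array([[c, -s, 0.0],
              [s,  c, 0.0],
              [0.0, 0.0, 1.0]])

# The characteristic polynomial is cubic, so there is a real root; for a
# rotation that root is 1, and its eigenvector is the axis of rotation.
eigenvalues, eigenvectors = np.linalg.eig(M)
axis_index = int(np.argmin(np.abs(eigenvalues - 1.0)))
axis = np.real(eigenvectors[:, axis_index])
print(axis)  # the z axis, up to sign
```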
Matching Algorithm
This module compares two face images and displays the result: Match Found or Not Found.
This is the entry point of the face recognition process. It is the module where the face image
under consideration is presented to the system. In other words, the user is asked to present a
face image to the face recognition system in this module. An acquisition module can request
a face image from several different environments: The face image can be an image file that is
located on a magnetic disk, it can be captured by a frame grabber and camera or it can be
scanned from paper with the help of a scanner.
In this module, by means of early vision techniques, face images are normalized and if
desired, they are enhanced to improve the recognition performance of the system. Some or all
of the following pre-processing steps may be implemented in a face recognition system:
1. Image size (resolution) normalization: it is usually done to change the acquired image
size to a default image size on which the face recognition system operates.
2. Histogram equalization: it is usually done on too dark or too bright images in order to
enhance the image quality and to improve face recognition performance. It modifies
the dynamic range (contrast range) of the image and as a result, some important facial
features become more apparent.
3. Median filtering: for noisy images, especially those obtained from a camera or a frame
grabber, median filtering can clean the image without losing information.
4. High-pass filtering: feature extractors that are based on facial outlines may benefit
from the results of an edge detection scheme. High-pass filtering
emphasizes the details of an image, such as contours, which can dramatically improve
edge detection performance.
5. Background removal: in order to deal primarily with the facial information itself, the face
background can be removed. This is especially important for face recognition systems
that use the entire information contained in the image; the module should be capable of
determining the face outline.
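The first three pre-processing steps above can be sketched as follows (Python/NumPy rather than MATLAB, purely for illustration; the 64x64 default size matches the Yale experiments reported earlier):

```python
import numpy as np

def normalize_size(img, size=(64, 64)):
    # Step 1: nearest-neighbor resize to the system's default resolution.
    rows = (np.arange(size[0]) * img.shape[0] / size[0]).astype(int)
    cols = (np.arange(size[1]) * img.shape[1] / size[1]).astype(int)
    return img[np.ix_(rows, cols)]

def equalize_histogram(img):
    # Step 2: spread the dynamic range so facial features become more apparent.
    hist = np.bincount(img.ravel(), minlength=256)
    cdf = np.cumsum(hist) / img.size
    return (cdf[img] * 255).astype(np.uint8)

def median_filter3(img):
    # Step 3: 3x3 median filter to remove camera / frame-grabber noise.
    padded = np.pad(img, 1, mode='edge')
    stack = [padded[r:r + img.shape[0], c:c + img.shape[1]]
             for r in range(3) for c in range(3)]
    return np.median(np.stack(stack), axis=0).astype(np.uint8)

# A random stand-in for an acquired grayscale face image.
img = np.random.default_rng(0).integers(0, 256, (128, 128), dtype=np.uint8)
out = median_filter3(equalize_histogram(normalize_size(img)))
```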
After performing some pre-processing (if necessary), the normalized face image is presented
to the feature extraction module in order to find the key features that are going to be used for
classification. In other words, this module is responsible for composing a feature vector that
represents the face image well.
In this module, with the help of a pattern classifier, the extracted features of the face image
are compared with the ones stored in a face library (or face database). After this
comparison, the face image is classified as either known or unknown.
Training set
Training sets are used during the "learning phase" of the face recognition process. The
feature extraction and classification modules adjust their parameters in order to achieve
optimum recognition performance by making use of training sets.
Due to the dynamic nature of face images, a face recognition system encounters various
problems during the recognition process. It is possible to classify a face recognition system as
either "robust" or "weak" based on its recognition performance under these circumstances.
1. Scale invariance: the same face can be presented to the system at different scales.
This may happen due to the focal distance between the face and the camera: as this
distance decreases, the face image gets bigger.
2. Shift invariance: the same face can be presented to the system at different
perspectives and orientations. For instance, face images of the same person could be
taken from frontal and profile views. Besides, head orientation may change due to
translations and rotations.
3. Illumination invariance: face images of the same person can be taken under
different illumination conditions; for example, the position and strength of the light
source can vary.
4. Emotional expression and detail invariance: face images of the same person can
differ in expressions when smiling or laughing. Also, some details such as dark
glasses, beards or moustaches can be present.
5. Noise invariance: a robust face recognition system should be insensitive to noise
generated by frame grabbers or cameras. Also, it should function under partially
occluded images.
CHAPTER 6
DEVELOPING TOOLS
MATLAB is a high-performance language for technical computing. It integrates computation,
visualization and programming in an easy-to-use environment.
MATLAB stands for matrix laboratory. It was written originally to provide easy access to
matrix software developed by the LINPACK (linear system package) and EISPACK (eigen
system package) projects.
Block Diagram
Matlab resources
- Command window
- Editor
- Debugger
- Profiler (evaluates performance)
- Mathematical libraries
In matlab, scripts are the equivalent of main programs. The variables declared in a
script are visible in the workspace and they can be saved. Scripts can therefore take a lot of
memory if you are not careful, especially when dealing with images. To create a script, you
will need to start the editor, write your code and run it.
Syntax:
A = imread(filename,fmt)
[X,map] = imread(filename,fmt)
[...] = imread(filename)
Description:
[X,map] = imread(filename,fmt) reads the indexed image in filename into X and its
associated colormap into map. The colormap values are rescaled to the range [0,1]. X and
map are two-dimensional arrays.
[...] = imread(filename) attempts to infer the format of the file from its content.
filename is a string that specifies the name of the graphics file, and fmt is a string that
specifies the format of the file. If the file is not in the current directory or in a directory in the
MATLAB path, specify the full pathname for a location on your system. If imread cannot
find a file named filename, it looks for a file named filename.fmt. If you do not specify a
string for fmt, the toolbox will try to discern the format of the file by checking the file header.
TIFF-Specific Syntax:
[...] = imread(...,idx) reads in one image from a multi-image TIFF file. idx is an
integer value that specifies the order in which the image appears in the file. For example, if
idx is 3, imread reads the third image in the file. If you omit this argument, imread reads the
first image in the file.
PNG-Specific Syntax:
The discussion in this section is only relevant to PNG files that contain transparent
pixels. A PNG file does not necessarily contain transparency data. Transparent pixels, when
they exist, will be identified by one of two components: a transparency chunk
or an alpha channel. (A PNG file can only have one of these components, not both.) The transparency
chunk identifies which pixel values will be treated as transparent, e.g., if the value in the
transparency chunk of an 8-bit image is 0.5020, all pixels in the image with the color 0.5020
can be displayed as transparent. An alpha channel is an array with the same number of pixels
as are in the image, which indicates the transparency status of each corresponding pixel in the
image (transparent or nontransparent). Another potential PNG component related to
transparency is the background color chunk, which (if present) defines a color value that can
be used behind all transparent pixels. This section identifies the default behavior of the
toolbox for reading PNG images that contain either a transparency chunk or an alpha channel,
and describes how you can override it.
HDF-Specific Syntax:
[...] = imread(...,ref) reads in one image from a multi-image HDF file. ref is an integer
value that specifies the reference number used to identify the image. For example, if ref is 12,
imread reads the image whose reference number is 12. (Note that in an HDF file the reference
numbers do not necessarily correspond to the order of the images in the file. You can use
imfinfo to match up image order with reference number.) If you omit this argument, imread
reads the first image in the file.
Table 6.2 Types of images that imread can read
Format  Variants
BMP     1-bit, 4-bit, 8-bit, and 24-bit uncompressed images; 4-bit and 8-bit run-length
        encoded (RLE) images
HDF     8-bit raster image datasets, with or without associated colormap; 24-bit raster
        image datasets
JPEG    Any baseline JPEG image (8- or 24-bit); JPEG images with some commonly used
        extensions
PNG     Any PNG image, including 1-bit, 2-bit, 4-bit, 8-bit, and 16-bit grayscale images;
        8-bit and 16-bit indexed images; 24-bit and 48-bit RGB images
TIFF    Any baseline TIFF image, including 1-bit, 8-bit, and 24-bit uncompressed images;
        1-bit, 8-bit, 16-bit, and 24-bit images with packbits compression; 1-bit images
        with CCITT compression; also 16-bit grayscale, 16-bit indexed, and 48-bit RGB
        images
Syntax
imshow(I)
imshow(I,[low high])
imshow(RGB)
imshow(BW)
imshow(X,map)
imshow(filename)
himage = imshow(...)
Description
imshow(I,[low high]) displays the grayscale image I, specifying the display range for
I in [low high]. The value low (and any value less than low) displays as black; the value high
(and any value greater than high) displays as white. Values in between are displayed as
intermediate shades of gray, using the default number of gray levels. If you use an empty
matrix ([]) for [low high], imshow uses [min(I(:)) max(I(:))]; that is, the minimum value in I
is displayed as black, and the maximum value is displayed as white.
imshow(BW) displays the binary image BW. imshow displays pixels with the value 0
(zero) as black and pixels with the value 1 as white.
imshow(X,map) displays the indexed image X with the colormap map. A color map
matrix may have any number of rows, but it must have exactly 3 columns. Each row is
interpreted as a color, with the first element specifying the intensity of red light, the second
green, and the third blue. Color intensity can be specified on the interval 0.0 to 1.0.
imshow(filename) displays the image stored in the graphics file filename. The file
must contain an image that can be read by imread or dicomread. imshow calls imread or
dicomread to read the image from the file, but does not store the image data in the MATLAB
workspace. If the file contains multiple images, the first one will be displayed. The file must
be in the current directory or on the MATLAB path.
Remarks
imshow is the toolbox's fundamental image display function, optimizing figure, axes,
and image object property settings for image display. imtool provides all the image display
capabilities of imshow but also provides access to several other tools for navigating and
exploring images, such as the Pixel Region tool, Image Information tool, and the Adjust
Contrast tool. imtool presents an integrated environment for displaying images and
performing some common image processing tasks.
Examples
X = imread('moon.tif');
imshow(X)
Introduction
When you start MATLAB, the MATLAB desktop appears, containing tools (graphical user
interfaces) for managing files, variables, and applications associated with MATLAB. The
following illustration shows the default desktop. You can customize the arrangement of tools
and documents to suit your needs. For more information, see the documentation for the
desktop tools.
6.5 Implementations
The best way for you to get started with MATLAB is to learn how to handle
matrices. Start MATLAB and follow along with each example. You can enter matrices into
MATLAB in several different ways; one is to create matrices with your own functions in
M-files. Start by entering Dürer's matrix as a list of its elements. You only have to follow a
few basic conventions: surround the entire list of elements with square brackets, [ ]. To enter
the matrix, simply type in the Command Window
A = [16 3 2 13
5 10 11 8
9 6 7 12
4 15 14 1]
This matrix matches the numbers in the engraving. Once you have entered the matrix, it is
automatically remembered in the MATLAB workspace. You can refer to it simply as A. Now
that you have A in the workspace, you can operate on it with functions such as sum,
transpose, and diag.
You are probably already aware that the special properties of a magic square have to do with
the various ways of summing its elements. If you take the sum along any row or column, or
along either of the two main diagonals, you will always get the same number. Let us verify
that using MATLAB. The first statement to try is sum(A)
ans =34 34 34 34
When you do not specify an output variable, MATLAB uses the variable ans, short for
answer, to store the results of a calculation. You have computed a row vector containing the
sums of the columns of A. Sure enough, each of the columns has the same sum, the magic
sum, 34.
How about the row sums? MATLAB has a preference for working with the columns of a
matrix, so one way to get the row sums is to transpose the matrix, compute the column sums
of the transpose, and then transpose the result. An additional way that avoids the double
transpose is to use the dimension argument for the sum function. MATLAB has two transpose
operators. The apostrophe operator (e.g., A') performs a complex conjugate transposition. It
flips a matrix about its main diagonal, and also changes the sign of the imaginary component
of any complex elements of the matrix. The dot-apostrophe operator (e.g., A.') transposes
without affecting the sign of complex elements. For matrices containing all real elements, the
two operators return the same result.
So A' produces
ans =
16 5 9 4
3 10 6 15
2 11 7 14
13 8 12 1
and sum(A')' produces
ans =
34
34
34
34
The sum of the elements on the main diagonal is obtained with the sum and the diag
functions. diag(A) produces
ans =
16
10
7
1
and sum(diag(A)) produces
ans =
34
The other diagonal, the so-called antidiagonal, is not so important mathematically, so
MATLAB does not have a ready-made function for it. But a function originally intended for
use in graphics, fliplr, flips a matrix from left to right:
sum(diag(fliplr(A)))
ans =
34
You have verified that the matrix in Dürer's engraving is indeed a magic square and, in the
process, have sampled a few MATLAB matrix operations.
Operators
+ Addition
- Subtraction
* Multiplication
/ Division
\ Left division (described in the MATLAB documentation)
^ Power
Generating Matrices
Z = zeros(2,4)
Z =
0 0 0 0
0 0 0 0
F = 5*ones(3,3)
F =
5 5 5
5 5 5
5 5 5
N = fix(10*rand(1,10))
N =
9 2 6 4 8 7 4 0 8 4
R = randn(4,4)
R =
M-Files
You can create your own matrices using M-files, which are text files containing MATLAB
code. Use the MATLAB Editor or another text editor to create a file containing the same
statements you would type at the MATLAB command line. Save the file under a name that
ends in .m. For example, create a file containing these five lines: A = [...
Store the file under the name magik.m. Then the statement magik reads the file and creates a
variable, A, containing our example matrix.
MATLAB displays graphs in a special window known as a figure. To create a graph, you
need to define a coordinate system. Therefore every graph is placed within axes, which are
contained by the figure. The actual visual representation of the data is achieved with graphics
objects like lines and surfaces. These objects are drawn within the coordinate system defined
by the axes, which MATLAB automatically creates specifically to accommodate the range of
the data. The actual data is stored as properties of the graphics objects.
Plotting Tools
Plotting tools are attached to figures and create an environment for creating graphs.
Display the plotting tools from the View menu or by clicking the plotting tools icon in the
figure toolbar.
Editor/Debugger
Use the Editor/Debugger to create and debug M-files, which are programs you write to run
MATLAB functions. The Editor/Debugger provides a graphical user interface for text
editing, as well as for M-file debugging. To create or edit an M-file use File > New or File >
Open, or use the edit function.
CHAPTER 7
7.1 Conclusion
7.2 Future scope
7.3 References
[1] C. M. Bishop, Neural Networks for Pattern Recognition. Oxford: Oxford University
Press, 1995.
[2] B. Chalmond and S. Girard, "Nonlinear modeling of scattered multivariate data and its
application to shape change," IEEE Transactions on Pattern Analysis and Machine
Intelligence, vol. 21, no. 5, pp. 422-432, 1999.
[3] R. Chellappa, C. L. Wilson, and S. Sirohey, "Human and machine recognition of faces: A
survey," Proceedings of the IEEE, vol. 83, no. 5, pp. 705-740, 1995.
[4] C. Christopoulos, J. Bormans, A. Skodras, and J. Cornelis, "Efficient computation of the
two-dimensional fast cosine transform," in SPIE Hybrid Image and Signal Processing
IV, pp. 229-237, 1994.
[5] R. Gonzalez and R. Woods, Digital Image Processing. Reading, MA: Addison-Wesley,
1992.
[6] A. Hyvarinen, "Survey on independent component analysis," Neural Computing Surveys,
vol. 2, pp. 94-128, 1999; J. Karhunen and J. Joutsensalo, "Generalization of principal
component analysis, optimization problems and neural networks," Neural Networks,
vol. 8, no. 4, pp. 549-562, 1995.
[7] M. Kirby and L. Sirovich, "Application of the Karhunen-Loeve procedure for the
characterization of human faces," IEEE Transactions on Pattern Analysis and Machine
Intelligence, vol. 12, no. 1, pp. 103-108, 1990.
[8] S. Lawrence, C. Lee Giles, A. Tsoi, and A. Back, "Face recognition: A convolutional
neural network approach," IEEE Transactions on Neural Networks, vol. 8, no. 1, pp.
98-113, 1997.
[9] C. Nebauer, "Evaluation of convolutional neural networks for visual recognition," IEEE
Transactions on Neural Networks, vol. 9, no. 4, pp. 685-696, 1998.
[10] Z. Pan, R. Adams, and H. Bolouri, "Dimensionality reduction of face images using
discrete cosine transforms for recognition," submitted to IEEE Conference on
Computer Vision and Pattern Recognition, 2000.
[11] F. Samaria, Face Recognition using Hidden Markov Models. PhD thesis, Cambridge
University, 1994.