
State-of-the-art of 3D facial reconstruction methods for face recognition based on a single 2D training image per person


Martin D. Levine *, Yingfeng (Chris) Yu
Center for Intelligent Machines and Dept. of Elec. and Computer Eng., McGill University, Montreal, Quebec, Canada H3A 2A7
Article info
Article history:
Received 21 December 2007
Received in revised form 13 March 2009
Available online 29 March 2009
Communicated by G. Sanniti di Baja
Keywords:
State-of-the-art
3D reconstruction
Face recognition
Single 2D training image
3D morphable model
Abstract
3D facial reconstruction systems attempt to reconstruct 3D facial models of individuals from their 2D photographic images or video sequences. Currently published face recognition systems, which exhibit well-known deficiencies, are largely based on 2D facial images, although 3D image capture systems can better encapsulate the 3D geometry of the human face. Accordingly, face recognition research is gradually shifting from the legacy 2D domain to the more sophisticated 2D to 3D or 2D/3D hybrid domain. Currently there exist four methods for 3D facial reconstruction. These are: the Stochastic Newton Optimization method (SNO) [Blanz, V., Vetter, T., 1999. A morphable model for the synthesis of 3D faces. In: Proc. 26th Annu. Conf. on Computer Graphics and Interactive Techniques, SIGGRAPH, pp. 187–194; Blanz, V., Vetter, T., 2003. Face recognition based on fitting a 3D morphable model. IEEE Trans. Pattern Anal. Machine Intell. 25 (9), 1063–1074; Blanz, V., 2001. Automatische Rekonstruktion der Dreidimensionalen Form von Gesichtern aus einem Einzelbild. Ph.D. Thesis, Universität Tübingen, Germany], the inverse compositional image alignment algorithm (ICIA) [Romdhani, S., Vetter, T., 2003. Efficient, robust and accurate fitting of a 3D morphable model. In: IEEE Int. Conf. on Computer Vision, vol. 2, no. 1, pp. 59–66], the linear shape and texture fitting algorithm (LiST) [Romdhani, S., Blanz, V., Vetter, T., 2002. Face identification by fitting a 3D morphable model using linear shape and texture error functions. In: Proc. ECCV, vol. 4, pp. 3–19], and the shape alignment and interpolation method correction (SAIMC) [Jiang, D., Hu, Y., Yan, S., Zhang, L., Zhang, H., Gao, W., 2005. Efficient 3D reconstruction for face recognition. Pattern Recogn. 38 (6), 787–798]. The first three, SNO, ICIA + 3DMM, and LiST, can be classified as analysis-by-synthesis techniques, while SAIMC can be separately classified as a 3D supported 2D model. In this paper, we introduce, discuss and analyze the difference between these two frameworks. We begin by presenting the 3D morphable model (3DMM; Blanz and Vetter, 1999), which forms the foundation of all four of the reconstruction techniques described here. This is followed by a review of the basic analysis-by-synthesis framework and a comparison of the three methods that employ this approach. We next review the 3D supported 2D model framework and introduce the SAIMC method, comparing it to the other three. The characteristics of all four methods are summarized in a table that should facilitate further research on this topic.

© 2009 Elsevier B.V. All rights reserved.
1. Introduction
3D facial reconstruction systems attempt to reconstruct 3D facial models of individuals from their 2D photographic images or video sequences. Research in the areas of computer graphics and machine vision has given rise to a number of concepts that aid in facial reconstruction. Certain of these, for instance Structure from Motion (SFM) (Bregler et al., 2000; Pollefeys, 1999; Zhao and Chellappa, 2000), Shape from Contours (Brady and Yuille, 1983; Himaanshu et al., 2004) and Shape from Silhouette (Matusik et al., 2000; Moghaddam et al., 2003), have found application in commercial facial reconstruction systems. There exist several excellent survey papers addressing these and other theories, implementations, and applications of currently available facial reconstruction methods (Bowyer et al., 2004; Zhou and Chellappa, 2005). All of these methods require several 2D images or video frames per person to reconstruct the 3D version of the person's face. However, it is not always practically feasible to obtain several 2D images per person. In this paper, we focus exclusively on the existing state-of-the-art of 3D facial reconstruction using only a single frontal 2D image as the training input for each person. In particular, we address only those 3D approaches that have been applied to face recognition and which have been tested on face databases.
Pattern Recognition Letters 30 (2009) 908–913. doi:10.1016/j.patrec.2009.03.011

* Corresponding author. Tel.: +1 514 398 7348; fax: +1 514 398 7115. E-mail address: m.levine@ieee.org (M.D. Levine).

Currently available face recognition systems, which are largely based on 2D facial images, suffer from low reliability due to their sensitivity to lighting conditions, facial expressions and changes
in head pose. The inadequate performance of these systems should come as no surprise, as these 2D based systems ignore the fact that the human face is three-dimensional and therefore needs to be described by a 3D model. In recent years, several advancements in the fields of computer vision and computer graphics have helped researchers acquire highly accurate 3D scanning data (generally a 2.5D range image) and thus better capture the 3D geometry of the human face. Accordingly, face recognition research has shifted moderately from the legacy 2D domain to the more sophisticated 2D to 3D or 2D/3D hybrid domain. Although 3D face recognition is theoretically superior in performance to 2D face recognition, not only with regard to varying head pose but also to illumination changes, the high cost and limited applicability of 3D sensing devices (range sensors) have restricted progress. Furthermore, 3D face recognition technologies tend to be constrained by the usage of presently available legacy databases, which are primarily of a 2D nature. Thus, 3D facial reconstruction could play an important role in bridging the gap between existing 2D face databases and state-of-the-art 3D face recognition systems. The benefits are shared: the construction of 3D facial data from 2D images may help 3D face recognition systems, while the synthesized 2D images generated from existing 2D images using a 3D model may enhance the capabilities of existing 2D face recognition systems.
To the best of our knowledge, there exist at present four methods for facial reconstruction. These are:

1. The Stochastic Newton Optimization method (SNO), proposed by Blanz and Vetter (1999, 2003) and Blanz (2001).
2. Inverse compositional image alignment (ICIA) applied to the 3D morphable model (3DMM) (Romdhani and Vetter, 2003). The ICIA algorithm was initially introduced by Baker and Matthews (2001) and was later adapted by Romdhani and Vetter (2003) to the 3DMM of Blanz and Vetter (1999).
3. Linear shape and texture fitting (LiST), introduced by Romdhani et al. (2002).
4. The shape alignment and interpolation method correction (SAIMC), introduced by Jiang et al. (2005) to retrieve a 3D facial model from a single frontal 2D image.¹
The first three methods, SNO, ICIA + 3DMM, and LiST, were classified as analysis-by-synthesis techniques by their authors. However, in the recent survey paper on 3D face recognition methods by Scheenstra et al. (2005), the first three algorithms are renamed "template matching" approaches, and SAIMC is separately classified as a "3D supported 2D model" approach. In this paper, we use the nomenclature "analysis-by-synthesis" to refer to SNO, ICIA + 3DMM, and LiST, and "3D supported 2D model" to refer to the approach of Jiang et al. We discuss the differences between these two frameworks later in this paper. Although analysis-by-synthesis and 3D supported 2D model are significantly different from each other, they are both based on the 3D graphics model called the 3D morphable model (3DMM).
In this paper, we begin in Sections 2–4 by introducing the 3DMM (Blanz and Vetter, 1999), which forms the foundation of all four of the reconstruction techniques described here. Sections 3 and 4 deal specifically with shape and texture correspondence, respectively, which form the backbone of this approach. Following this, in Section 5, we review the basic framework of analysis-by-synthesis (Blanz and Vetter, 1999, 2003; Blanz, 2001; Romdhani and Vetter, 2003; Romdhani et al., 2002) and the three methods that employ this technique, and compare them in Section 6. Then, in Section 7, we review the 3D supported 2D model framework and introduce the SAIMC method. Generally, due to the lack of literature available concerning the two frameworks, there tends to be some confusion regarding the implementation of each. Finally, in Section 8, all four methods are summarized and conclusions are drawn.
2. Face modeling: 3D morphable model (3DMM)
A 3D human facial model is represented by its shape and texture: the shape is captured as vertices in three dimensions, and the texture of the face describes the color information. The 3D morphable model (3DMM) infrastructure, developed by Blanz and Vetter (1999) and used in state-of-the-art facial reconstruction systems, decomposes any 3D human facial model into a linearly convex combination of shape and texture vectors spanning a set of exemplars which describe a realistic human face. The linearly convex combination is fully controlled by shape and texture parameters, called α and β, which perform as weights. We begin by reviewing several important concepts of the 3DMM.

In order to obtain their morphable model, Blanz and Vetter scanned the heads of 100 young adults with a Cyberware 3D laser scanner, thereby providing a set of exemplar faces. Each laser-scanned head was saved in a 75,972-vertex, 150,958-facet format.² A basic assumption of the work of Blanz and Vetter is that any 3D facial surface can be realistically modeled using a convex combination of the shape and texture vectors of the set of exemplar faces, where the shape and texture vectors are given as follows:
S_i = (x_{i,1}, y_{i,1}, z_{i,1}, ..., x_{i,75972}, y_{i,75972}, z_{i,75972})^T ∈ R^{227,916×1},
T_i = (R_{i,1}, G_{i,1}, B_{i,1}, ..., R_{i,75972}, G_{i,75972}, B_{i,75972})^T ∈ R^{227,916×1},

where S_i and T_i have dimension 227,916 (= 3 × 75,972). Blanz and Vetter performed principal component analysis on the 100 laser-scanned heads and processed the shape and texture information separately. Information from each of these exemplar heads was saved as shape and texture vectors (S_i, T_i), where i = 1, ..., 100.
Before continuing, we define several important quantities based on the observation that exactly the same vertices and facets define all heads. Let

S_0 = (1/100) Σ_{i=1}^{100} S_i,    T_0 = (1/100) Σ_{i=1}^{100} T_i,
S = [S_1, S_2, ..., S_{100}] ∈ R^{3·75,972×100},
T = [T_1, T_2, ..., T_{100}] ∈ R^{3·75,972×100},        (2.1)

where (S_0, T_0) represents the mean face of the 100 exemplar faces and is referred to as the Generic Mean Face (GMF). The covariance matrices of shape and texture are defined as
C_S = Σ_{i=1}^{100} (S_i − S_0)(S_i − S_0)^T,    C_T = Σ_{i=1}^{100} (T_i − T_0)(T_i − T_0)^T.        (2.2)
The eigenvectors and eigenvalues of the covariance matrices are written as

E_S = (e^s_1, e^s_2, ..., e^s_{100}),    λ_S = (λ^s_1, ..., λ^s_{100}),        (2.3)
E_T = (e^t_1, e^t_2, ..., e^t_{100}),    λ_T = (λ^t_1, ..., λ^t_{100}).        (2.4)
¹ Note that this material also appears in Hu et al. (2004), which was published by the same research group but using different notation.
² Generally, 3D laser-scanned faces are obtained using a varied number of vertices and type of triangular connectivity depending on the characteristic features of the individual face. However, in order to facilitate their particular scheme, Blanz and Vetter chose to use the Optical Flow Method to force every face to have exactly 75,972 vertices sharing exactly 150,958 identical facets.
The eigenvectors (eigenfaces) e^s_i and e^t_i and the eigenvalues λ^s_i and λ^t_i of the covariance matrices C_S and C_T are such that

C_S e^s_i = λ^s_i e^s_i,    C_T e^t_i = λ^t_i e^t_i.        (2.5)
The most important observation and conclusion that can be drawn from the 3DMM after PCA modeling is that every one of the 100 laser-scanned exemplar heads can be decomposed into the form

S_j = S_0 + Σ_{i=1}^{100} α_i λ_{s,i} e^s_i,    T_j = T_0 + Σ_{i=1}^{100} β_i λ_{t,i} e^t_i,        (2.6)
where e^s_i and e^t_i represent the i-th eigenvectors of the covariance matrices C_S and C_T, respectively. Moreover, following Blanz and Vetter's work, any arbitrary human facial 3D model can be constructed by varying the parameters (α_k, β_k), k = 1, ..., 100, such that α and β are modeled by the normal probability density functions
p_S(α) ∼ exp(−(1/2) Σ_{k=1}^{100} (α_k/λ^s_k)²)    and    p_T(β) ∼ exp(−(1/2) Σ_{k=1}^{100} (β_k/λ^t_k)²).        (2.7)
The authors conclude that α and β can perform as model descriptors, which can sufficiently represent a specific 3D human facial model. Accordingly, the objective of the 3D facial reconstruction problem is now redefined from retrieving 3D shape and texture to searching for the best possible fitted parameters, α and β, such that the rendered morphable model optimally matches the single 2D input image presented to the reconstruction system.³
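The decomposition in Eqs. (2.1)–(2.6) can be sketched numerically. The following toy example uses hypothetical dimensions (5 vertices and 4 exemplars instead of 75,972 and 100) and random data; it builds the mean shape and the shape covariance, extracts its eigenvectors, and checks that an exemplar is recovered exactly from its projection coefficients:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for the 3DMM quantities: n_v vertices (75,972 in the paper)
# and n_e exemplar heads (100 in the paper).
n_v, n_e = 5, 4
S = rng.normal(size=(3 * n_v, n_e))   # columns are exemplar shape vectors S_i

S0 = S.mean(axis=1, keepdims=True)    # Generic Mean Face (shape part)
C = (S - S0) @ (S - S0).T             # covariance C_S as in Eq. (2.2)

# Eigen-decomposition yields the shape eigenvectors e_i and eigenvalues.
lam, E = np.linalg.eigh(C)
order = np.argsort(lam)[::-1]         # largest eigenvalue first
lam, E = lam[order], E[:, order]

# Any face in the span of the exemplars decomposes as in Eq. (2.6):
# S_j = S_0 + sum_i (coefficient_i) * e_i. Recover the coefficients for
# one exemplar by projection, then rebuild it.
j = 2
coeffs = E.T @ (S[:, [j]] - S0)       # one coefficient per eigenvector
S_rec = S0 + E @ coeffs
assert np.allclose(S_rec, S[:, [j]])  # exact reconstruction
```

In the real 3DMM the same projection is done for the texture vectors T_i with C_T, and the coefficients play the role of the α (or β) parameters weighted by the eigenvalues.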
3. Shape correspondence
At this point we emphasize the significance of the indices assigned to the shape and texture vectors. These run from 1 to 75,972 and are saved as an ordered set, with each index specifically corresponding to one morphological point on the 3D face. The ordered indices are assigned (linked) to a specific anthropomorphic marker on the face and not to the 3D coordinates of the feature on the 3D model. This concept, which will be referred to as "shape correspondence" throughout this paper, forms an integral part of the 3DMM. Blanz and Vetter (1999) obtained this 3D-to-3D correspondence for the 75,972 vertices by using optical flow and the cylindrical projection method. Fig. 1 illustrates this concept.
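A minimal sketch of the idea, with two hypothetical toy heads of four vertices each (the paper's example index k = 37,814 becomes index 2 here):

```python
import numpy as np

# Two laser-scanned heads stored with shape correspondence: vertex k always
# denotes the same anthropomorphic marker. Here index 2 is a hypothetical
# nose-tip index; all coordinates are made up for illustration.
NOSE_TIP = 2

head_a = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0],
                   [0.5, 0.5, 1.2], [0.0, 1.0, 0.0]])
head_b = np.array([[0.0, 0.0, 0.0], [1.1, 0.0, 0.0],
                   [0.5, 0.4, 1.0], [0.0, 0.9, 0.0]])

# The 3D coordinates of the nose tip differ between the two people...
tip_a, tip_b = head_a[NOSE_TIP], head_b[NOSE_TIP]
assert not np.array_equal(tip_a, tip_b)

# ...but the index is what links them, so per-vertex operations such as
# averaging (used to build the Generic Mean Face) remain meaningful.
mean_face = (head_a + head_b) / 2
```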
4. Texture correspondence
To fully appreciate the texture correspondence concept, it is first necessary to present a texture information retrieval procedure used in the area of computer graphics. Clearly, a 3D model must always contain both 3D shape and texture information. However, accurate 3D texture information can also be retained in a 2D image format by using the so-called UV map,⁴ as shown in Fig. 2. UV mapping is a process which can roughly be described by the coordinate mapping function g(u, v) = (x, y, z): R² → R³ that warps a 2D image containing texture information onto a 3D mesh. The opposite 2D flattening process f is the reverse of this mapping, with f(x, y, z) = (u, v): R³ → R², i.e., f = g⁻¹. The pixel information I(u, v) in UV space is saved in the RGB color format using chromaticity, such that I(u, v) = {r, g, b}. Hence, we can summarize the relationship between the UV mapping, the flattening process and the RGB color space in the following manner:
I(u, v) = I(f_u(x, y, z), f_v(x, y, z)) = I(g⁻¹_u(x, y, z), g⁻¹_v(x, y, z)) = {r, g, b},        (2.8)
I(x, y, z) = I(g_x(u, v), g_y(u, v), g_z(u, v)) = I(f⁻¹_x(u, v), f⁻¹_y(u, v), f⁻¹_z(u, v)) = {r, g, b},        (2.9)

where the subscripts in the above equations denote the component of the mapping in the corresponding direction.
To achieve flattening from the 3D shape to the corresponding 2D embedded space, Blanz and Vetter (1999) used cylindrical projection. The outcome of this flattening process is stored as a file called textmap.obj. In general, we would expect that flattening would result in UV maps that vary with the different geometries of the 3D face. However, when applied to the 3DMM, which we saw exhibits the property of shape correspondence, the UV map is independent of the 3D geometry. Given the k-th
vertex V_k = (x_k, y_k, z_k) of an arbitrary 3D shape S_i, the flattening process f, which is a function of the index k (and not of x, y and z), has a unique output (u_k, v_k) in UV space (ignoring the specific values of x_k, y_k and z_k). Therefore, in the case of the 3DMM, the flattening process f is simplified to a look-up table saved in textmap.obj. The key point here is that once we know the index of a vertex, the specific 3D geometric coordinates of that vertex do not contribute to the vertex's coordinates in UV space. The index is sufficient to retrieve the corresponding UV coordinates from textmap.obj. The texture (color) information of the vertex is saved at the location of its corresponding UV coordinates. An illustration of the concept of 3D-shape-to-2D-embedded-space correspondence (texture correspondence) is given in Fig. 2.

Fig. 1. The shape correspondence concept of the 3D morphable model (3DMM). This figure illustrates that the indices of the vertices in the 3DMM are assigned on the basis of anthropomorphic markers on the 3D face model and not on the basis of the 3D coordinates x, y and z that represent the position of the morphological marker, which will vary from one person's scanned head to another. In the above, for example, S_0 is the shape of the Generic Mean Face (GMF) and S_i is an arbitrary 3D head in the 3DMM database. The anthropomorphic marker representing the tip of the nose is assigned a specific index, say k = 37,814. Therefore, the tip of the nose for every scanned head will always be indexed as k = 37,814, even though the x, y and z coordinates that represent the position of the tip of the nose on the face will change.

³ See Section 5.1 and Table 1 for details.
⁴ The axes of the 2D UV map are called U and V.
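Because the flattening depends only on the vertex index, it can be sketched as a plain look-up table; the entries below are hypothetical stand-ins for the contents of textmap.obj:

```python
# Toy stand-in for textmap.obj: the flattening f reduces to a look-up table
# from vertex index to UV coordinates, because shape correspondence makes
# the 3D coordinates of the vertex irrelevant. All values are hypothetical.
uv_table = {0: (0.10, 0.20), 1: (0.35, 0.22), 2: (0.50, 0.60), 3: (0.80, 0.75)}

def flatten(k):
    """f: vertex index -> (u, v); no 3D coordinates are consulted."""
    return uv_table[k]

# The texture of one head is a set of colour samples stored at those UV
# locations; a different head reuses the same UV layout with other colours.
vertex_colours = {0: (200, 160, 140), 1: (190, 150, 130),
                  2: (210, 170, 150), 3: (180, 140, 120)}
texture = {flatten(k): rgb for k, rgb in vertex_colours.items()}

# Same index -> same UV cell for every head, which is the situation Fig. 2
# illustrates: identical UV geometry, differing pixel intensities.
assert flatten(2) == (0.50, 0.60)
```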
5. Analysis-by-synthesis framework
To date we are aware of three analysis-by-synthesis methods: Stochastic Newton Optimization (SNO) (Blanz and Vetter, 1999, 2003; Blanz, 2001), inverse compositional image alignment with 3DMM (ICIA + 3DMM) (Romdhani and Vetter, 2003) and linear shape and texture fitting (LiST) (Romdhani et al., 2002). These require as input a single arbitrary image of a face at a specific pose in order to reconstruct the corresponding synthesized 3D face using the 3DMM. The estimated α and β parameters that control the 3D shape and texture of the requisite 3D facial model are returned as the output of the reconstruction process. In a nutshell, "synthesis" refers to creating a virtual 3D face by estimating the parameters that completely describe the 3D model of the face, and "analysis" refers to solving the overall face recognition problem using the information from the synthesis step. In this section, we briefly discuss the three methods that follow the analysis-by-synthesis framework.
5.1. Stochastic Newton Optimization method (SNO)
The Stochastic Newton Optimization method (see Appendix B in Jiang et al., 2005 for details) is the first published fitting algorithm for facial reconstruction. It optimizes not only the shape and texture parameters, α and β, but also 22 other rendering parameters: pose angles (3 parameters), 3D translation (3 parameters), focal length (1 parameter), ambient light intensities (3 parameters), directed light intensities (3 parameters), the angles of the directed light (2 parameters), color contrast (1 parameter), and gains and offsets of the color channels (6 parameters). Its main drawback is low efficiency: it reportedly requires around 4.5 min to perform the parameter estimation on a 2 GHz Pentium IV (Romdhani et al., 2004).
5.2. Inverse compositional image alignment with 3DMM (ICIA + 3DMM)

The inverse compositional image alignment (ICIA) algorithm, introduced by Baker and Matthews (2001), is an efficient 2D image alignment method based on the Lucas–Kanade matching algorithm (Baker et al., 2002), one of the most widely used techniques in computer vision for such applications as optical flow, tracking, mosaic construction, medical image registration and face coding. As originally published, ICIA was capable of fitting only 2D images. However, Romdhani and Vetter (2003) extended ICIA to fit the 3D morphable model (3DMM) of Blanz and Vetter (1999) using the correspondence between the input image and the UV map of the 3DMM (called the "reference frame" in Romdhani and Vetter, 2003). It is reported that ICIA + 3DMM completes the fitting process within an average of 30 s (Romdhani et al., 2004) on the same machine as SNO. Romdhani et al. (2005) identified two drawbacks of ICIA + 3DMM. Firstly, the adaptation of ICIA to the 3DMM causes ICIA to lose its original efficiency, although the accuracy is improved; the efficiency and accuracy of this algorithm are discussed in Romdhani et al. (2004, 2005). Secondly, the algorithm is not able to handle directed light sources, due to the fact that it does not permit shading as SNO does.
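The full ICIA + 3DMM fitting is involved, but the core inverse compositional idea (precompute the template gradient and "Hessian" once, then only re-warp the input at each iteration) can be sketched for a 1D translation-only warp. This is a toy illustration under made-up data, not the algorithm of Romdhani and Vetter:

```python
import numpy as np

# Template T(x) and an input I(x) = T(x - s) shifted by an unknown amount s;
# both are synthetic Gaussian bumps chosen so the alignment converges.
x = np.arange(200, dtype=float)
template = np.exp(-(x - 100.0) ** 2 / (2 * 15.0 ** 2))
true_shift = 4.0
image = np.exp(-(x - 100.0 - true_shift) ** 2 / (2 * 15.0 ** 2))

# Inverse compositional precomputation: gradient and "Hessian" of the
# *template*, computed once outside the loop (this is the speed-up).
grad_T = np.gradient(template)
hessian = np.sum(grad_T ** 2)     # 1x1 Hessian for a 1-parameter warp

p = 0.0                           # current translation estimate
for _ in range(50):
    warped = np.interp(x + p, x, image)     # I(W(x; p))
    error = warped - template
    dp = np.sum(grad_T * error) / hessian
    p -= dp                       # compose the warp with the inverse increment

# p should end up close to the true shift of 4.0
```

The 3DMM version replaces the 1D translation by the shape, texture and rigid parameters, and the pixel grid by the UV reference frame, but the structure (fixed precomputation, cheap per-iteration update) is the same.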
5.3. Linear shape and texture fitting algorithm (LiST)

This algorithm is similar to ICIA + 3DMM and has been reported to be five times faster than the SNO algorithm.⁵ We are not aware of any studies that compare ICIA + 3DMM and LiST. The efficient and accurate estimation obtained by LiST is mainly based on the unique correspondence between the UV map (model reference frame) and the input image. The assumption made here is that the correspondence achieved by using a 2D optical flow algorithm (Bergen and Hingorani, 1990) can be modeled by a bilinear relationship between the projection of the 2D vertices and the shape (α) and rigid parameters (Romdhani et al., 2005). The texture and illumination parameters are also recovered using this correspondence. LiST performs slightly different rigid parameter modeling from SNO: it uses a weak-perspective projection while SNO uses a normal perspective projection. Romdhani et al. (2005) claim that the correspondence concept, which dominates LiST, sacrifices the accuracy of illumination and texture recovery, due to the fact that, in contrast to SNO, no shading information is used in LiST. This is one of the drawbacks of the LiST algorithm. Moreover, Gill and Levine (2005) attempted to implement the LiST algorithm but the synthesized images they obtained were unsatisfactory. They claim that one of the contributing factors may be the accuracy of the correspondence obtained by using optical flow. Essentially, LiST has the same drawbacks as ICIA + 3DMM.

Fig. 2. Three examples of a UV map belonging to three different individuals. The three individuals possess different 3D shapes according to each individual's different facial appearance. However, we can see that these shape differences do not affect the geometry observed in their UV maps. The only differences between the three individuals in the UV maps are the pixel intensities (RGB color values). When these UV maps are transformed into 3D space by combining the texture information with the shape information at each indexed vertex, the faces of the three persons will appear significantly different.

⁵ As mentioned above, it was reported that SNO requires 4.5 min on average to complete the fitting process. Based on the assumption that LiST works 5 times faster than SNO, LiST will require about 54 s (= 4.5 min × (60 s/min)/5). Accordingly, ICIA + 3DMM is the most efficient of the three analysis-by-synthesis approaches.
6. Comparison of the three analysis-by-synthesis methods
With reference to SNO, ICIA + 3DMM, and LiST, we note that these three algorithms share the following similarities:

1. There are no specific constraints on the input image, and the input face can be at an arbitrary pose angle. Five or more manual feature points are needed for algorithm initialization.
2. The idea is to fit existing 3D morphable models to 2D images by finding the optimal α (shape parameters) and β (texture parameters) of the 3D morphable model (3DMM), plus the relevant shape transformation parameters (referred to as the "rigid parameters" by Blanz and Vetter, 1999): rotation matrix, scale, focal length of the camera, translation vector and so on.
3. A gradient descent algorithm is employed to minimize the non-linear optimization objective function. Furthermore, during the iterative fitting process, the updates of the α and β parameters are highly correlated and synchronous.
4. The reconstructed 3D model is entirely determined by the estimated parameters: α (shape parameters), β (texture parameters) and the relevant shape transformation parameters. α and β are assumed to be sufficient for face recognition purposes. Assume α_i ∈ R^100 and β_i ∈ R^100 represent the reconstructed 3DMM shape and texture parameters of image i, and similarly α_j and β_j for image j. We further define c_i = (α_i, β_i) ∈ R^200 and c_j = (α_j, β_j) ∈ R^200. The recognition decision is simply based on the similarity score between c_i and c_j. In other words, since the α and β parameters completely and uniquely describe the 3D facial model, there is no need to reconstruct the face and then use the reconstructed 3D model for face recognition; the parameters can be used directly.
Blanz and Vetter referred to the above methodology as "analysis-by-synthesis": modeling using the 3DMM accomplishes the synthesis by finding the required parameters (α and β), and recognition based on these parameters takes care of the analysis task.
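The recognition rule of item 4 can be sketched directly on the parameter vectors. The cosine similarity used below is one possible choice of score, not necessarily the one used in the cited papers, and all parameter values are randomly generated stand-ins:

```python
import numpy as np

def similarity(c_i, c_j):
    """Cosine similarity between two fitted parameter vectors c = (alpha, beta)."""
    return float(c_i @ c_j / (np.linalg.norm(c_i) * np.linalg.norm(c_j)))

rng = np.random.default_rng(1)

# Hypothetical fitted parameters for one image: alpha, beta in R^100 each,
# concatenated into c in R^200 as in the text.
alpha_i, beta_i = rng.normal(size=100), rng.normal(size=100)
c_i = np.concatenate([alpha_i, beta_i])

c_same = c_i + 0.05 * rng.normal(size=200)   # a noisy refit of the same person
c_other = rng.normal(size=200)               # an unrelated person

# Recognition never rebuilds the 3D face: the parameter vectors are compared
# directly, and the same-person pair scores higher.
assert similarity(c_i, c_same) > similarity(c_i, c_other)
```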
7. 3D supported 2D models
Jiang et al. (2005) discuss SAIMC, which is also based on the 3D morphable model, but whose reconstruction approach is somewhat different from SNO, ICIA + 3DMM and LiST. The difference arises from the following aspects:

1. SAIMC restricts the input image to a single 2D frontal facial image and does not permit arbitrary facial poses. It requires 84 feature points to initialize the 3D reconstruction algorithm.
2. The shape and texture parameter updates are completely separate. SAIMC still contains a step to evaluate the α parameters, α ∈ R^100, but assumes that the reconstructed shape is not accurate enough in the xy plane and so performs an additional interpolation correction step (e.g., Kriging interpolation). The outcome of this correction step is that the estimated α parameters are modified to represent the final reconstructed shape, and the significance of the earlier estimated α parameters is lost. (This is due to the fact that the correction step modifies
the (x, y) values of the shape obtained from alpha.)

Table 1
Comparison of the four existing facial reconstruction methods based on a single facial image. SNO, ICIA + 3DMM and LiST are analysis-by-synthesis methods; SAIMC is a 3D supported 2D model.

| | SNO | ICIA + 3DMM | LiST | SAIMC |
| Input image | Arbitrary view | Arbitrary view | Arbitrary view | Frontal view |
| Initialization | 7–8 feature points | 7–8 feature points | 7–8 feature points | 87 feature points |
| Shape reconstruction (gradient descent technique) | Stochastic version of Newton's method | Levenberg–Marquardt approximation | Levenberg–Marquardt approximation | No |
| Shape alignment | Every pixel is involved | Every pixel is involved | Every pixel is involved | Only 87 feature points are involved |
| Shape correction step | No | No | No | Yes, completed by Kriging interpolation |
| Do α parameters control the properties of the whole reconstructed shape? | Yes | Yes | Yes | No |
| Texture recovery methodology | Find optimal texture β | Find optimal texture β | Find optimal texture β | Not clear since no method is provided |
| Relationship to shape reconstruction | Computed simultaneously | Computed simultaneously | Computed simultaneously | Computed separately |
| Facial texture | Synthesized | Synthesized | Synthesized | Actual |
| Efficiency | 4.5 min | 30 s | 54 s | 10 s |
| Quantitative analysis of reconstruction accuracy | No | No | No | No |
| Face recognition | Based on α, β | Based on α, β | Based on α, β | Based on synthesized 2D images |

As a result,
we cannot use α for the recognition decision as is done in SNO, ICIA + 3DMM and LiST. As for texture retrieval, no specific information is provided in the paper; Jiang et al. simply mention that the 2D image is projected orthogonally onto the 3D geometry to generate the texture.

3. In SAIMC, the estimated α parameters do not represent the final reconstructed shape and no β parameters are returned. This is why the analysis framework of SNO, ICIA + 3DMM and LiST does not hold for SAIMC.

4. For 2D face recognition, SAIMC uses the reconstructed 3D face models to synthesize 2D images by projecting the 3D model onto the 2D plane. The synthesized images are also used as training data.
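The correction idea in item 2 can be sketched as interpolation of feature-point residuals over the remaining vertices. Jiang et al. use Kriging; the inverse-distance weighting below is a deliberately simpler stand-in, and all coordinates and residuals are hypothetical:

```python
import numpy as np

def interpolate_residuals(vertices_xy, feat_xy, feat_residual, eps=1e-9):
    """Propagate the 2D residuals known at feature points to all vertices.

    Inverse-distance weighting is used here as a stand-in for the Kriging
    interpolation of Jiang et al.; only the (x, y) plane is corrected, the
    z (depth) coordinate is untouched.
    """
    corrected = vertices_xy.copy()
    for i, v in enumerate(vertices_xy):
        d = np.linalg.norm(feat_xy - v, axis=1)
        if d.min() < eps:                      # vertex coincides with a feature point
            corrected[i] += feat_residual[d.argmin()]
            continue
        w = 1.0 / d ** 2                       # inverse-distance weights
        corrected[i] += (w[:, None] * feat_residual).sum(axis=0) / w.sum()
    return corrected

# Hypothetical data: three marked feature points, their alignment residuals,
# and two model vertices to be corrected.
feat_xy = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
feat_residual = np.array([[0.1, 0.0], [0.0, 0.1], [-0.1, 0.0]])
verts = np.array([[0.0, 0.0], [0.5, 0.5]])
out = interpolate_residuals(verts, feat_xy, feat_residual)

# A vertex on a feature point receives exactly that residual; interior
# vertices receive a distance-weighted blend of all residuals.
assert np.allclose(out[0], [0.1, 0.0])
```

This also makes the limitation noted in the text concrete: since only (x, y) residuals are available from a frontal image, nothing in the correction can fix the z coordinates supplied by the earlier α estimate.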
8. Conclusions
Although both major frameworks involve shape parameter estimation, the procedures are significantly different. To begin with, the 3D supported 2D models framework requires several feature points on the input image (Jiang et al. use 87). The step that achieves shape parameter estimation, referred to as "shape alignment" in the 3D supported 2D model approach, fits the 3D morphable model (3DMM) to these feature points only, while non-feature points are not considered by the shape alignment procedure. This is appreciably different from the analysis-by-synthesis framework, which updates the shape parameters globally (all pixels in the 2D image are taken into consideration by the fitting procedure). This is why the 3D supported 2D models framework does not require a gradient descent technique, the most time-consuming step, to minimize a non-linear objective function as do SNO, ICIA + 3DMM, and LiST. However, the ensuing superior efficiency of 3D supported 2D models is more or less achieved by sacrificing the accuracy of the shape reconstruction. In SAIMC, shape alignment is used to reconstruct the shape based on the manually selected 84 feature points. Non-feature points do not contribute to the actual reconstruction, but the correction does improve the accuracy of the (x, y) coordinates of the overall estimated shape. Unfortunately, the z (depth) information, which is dominated by the α parameters obtained in the earlier shape alignment process, cannot be fixed. This is the main predicament associated with the correction. Moreover, the relevant shape transformation parameters (e.g., rotation matrix, scale, focal length of the camera, and translation vector) are tainted in the 3D supported 2D models case because only one frontal 2D image is used as the input.

As was explained above, the analysis-by-synthesis and 3D supported 2D models frameworks are almost completely different from each other except for the fact that both utilize a 3D morphable model (3DMM); the 3DMM is the only intersecting aspect of these two reconstruction frameworks. The efficiency of the algorithms is estimated as:

SNO: 4.5 min.
ICIA + 3DMM: on average 30 s.
LiST: 54 s (rough estimate).
SAIMC: 10 s, fifteen times faster than LiST.

Table 1 summarizes the qualitative differences between the four methods.

Finally, and importantly, we observe in Table 1 that no quantitative or comparative analysis of the reconstruction accuracy for any of the four methods has appeared in the literature. This remains to be done in the future.
Acknowledgements
The authors would like to acknowledge the financial support of the Natural Sciences and Engineering Research Council of Canada (NSERC).
References
Baker, S., Matthews, I., 2001. Equivalence and efficiency of image alignment algorithms. In: IEEE Comput. Soc. Conf. on Computer Vision and Pattern Recognition (CVPR'01), vol. 1. pp. 1090–1097.
Baker, S., Gross, R., Matthews, I., 2002. Lucas–Kanade 20 years on: A unifying framework: Part 1. Technical Report CMU-RI-TR-02-16. Robotics Institute, Carnegie Mellon University.
Bergen, J.R., Hingorani, R., 1990. Hierarchical motion-based frame rate conversion. Technical Report. David Sarnoff Research Center, Princeton, NJ.
Blanz, V., 2001. Automatische Rekonstruktion der dreidimensionalen Form von Gesichtern aus einem Einzelbild. Ph.D. Thesis, Universität Tübingen, Germany.
Blanz, V., Vetter, T., 1999. A morphable model for the synthesis of 3D faces. In: Proc. 26th Annu. Conf. on Computer Graphics and Interactive Techniques, SIGGRAPH. pp. 187–194.
Blanz, V., Vetter, T., 2003. Face recognition based on fitting a 3D morphable model. IEEE Trans. Pattern Anal. Machine Intell. 25 (9), 1063–1074.
Bowyer, K.W., Chang, K., Flynn, P., 2004. A survey of approaches and challenges in 3D and multi-modal 3D + 2D face recognition. Comput. Vision Image Und. 101 (1), 1–15.
Brady, M., Yuille, A.L., 1983. An extremum principle for shape from contour. IEEE Trans. Pattern Anal. Machine Intell. 6 (3), 288–301.
Bregler, C., Hertzmann, A., Biermann, H., 2000. Recovering non-rigid 3D shape from image streams. In: Proc. IEEE Comput. Soc. Conf. on Computer Vision and Pattern Recognition, vol. 2. pp. 690–696.
Gill, G.S., Levine, M.D., 2005. Searching for the holy grail: A completely automated
3D morphable model. Technical Report, March 15, 2005. Department of
Electrical & Computer Engineering & Center for Intelligent Machines, McGill
University Montreal, Canada. <http://www.cim.mcgill.ca/~levine/reports.php>.
Gupta, H., Roy-Chowdhury, A.K., Chellappa, R., 2004. Contour-based 3D face modeling from a monocular video. In: British Machine Vision Conference, BMVC'04, September 7–9. Kingston University, London.
Hu, Y., Jiang, D., Yan, S., Zhang, L., Zhang, H., 2004. Automatic 3D reconstruction for face recognition. In: Proc. 6th IEEE Int. Conf. on Automatic Face and Gesture Recognition. pp. 843–848.
Jiang, D., Hu, Y., Yan, S., Zhang, L., Zhang, H., Gao, W., 2005. Efficient 3D reconstruction for face recognition. Pattern Recogn. 38 (6), 787–798.
Matusik, W., Buehler, C., Raskar, R., Gortler, S.J., McMillan, L., 2000. Image-based visual hulls. In: Proc. Int. Conf. on Computer Graphics and Interactive Techniques, SIGGRAPH, 2000. pp. 369–374.
Moghaddam, B., Lee, J.H., Pfister, H., Machiraju, R., 2003. Model-based 3D face capture with shape-from-silhouettes. In: IEEE Int. Workshop on Analysis and Modeling of Faces and Gestures (AMFG), Nice, France. pp. 20–27.
Pollefeys, M., 1999. Metric 3D Surface Reconstruction from Uncalibrated Image
Sequences. Ph.D. Thesis, Katholieke Universiteit Leuven.
Romdhani, S., Vetter, T., 2003. Efficient, robust and accurate fitting of a 3D morphable model. In: IEEE Int. Conf. on Computer Vision, vol. 2, no. 1. pp. 59–66.
Romdhani, S., Blanz, V., Vetter, T., 2002. Face identification by fitting a 3D morphable model using linear shape and texture error functions. In: Proc. ECCV, vol. 4. pp. 3–19.
Romdhani, S., Blanz, V., Basso, C., Vetter, T., 2004. Morphable models of faces. In: Li, S.Z., Jain, A.K. (Eds.), Handbook of Face Recognition. Springer, New York, p. 395.
Romdhani, S., Pierrard, J.S., Vetter, T., 2005. 3D morphable face model, a unified approach for analysis and synthesis of images. In: Zhao, W., Chellappa, R. (Eds.), Face Processing: Advanced Modeling and Methods. Elsevier, p. 768.
Scheenstra, A., Ruifrok, A., Veltkamp, R., 2005. A survey of 3D face recognition
methods. In: Fifth Int. Conf. on Audio- and Video-Based Biometric Person
Authentication. Rye Brook, New York.
Zhao, W.Y., Chellappa, R., 2000. SFS based view synthesis for robust face recognition. In: Proc. IEEE Int. Automatic Face and Gesture Recognition. pp. 285–292.
Zhou, S., Chellappa, R., 2005. Beyond a single still image: Face recognition from multiple still images and videos. In: Face Processing: Advanced Modeling and Methods. Academic Press, New York, p. 547.
M.D. Levine, Y. (Chris) Yu / Pattern Recognition Letters 30 (2009) 908–913