
880 Brief Communication

Categorizing sex and identity from the biological motion of faces
H. Hill and A. Johnston
Head and facial movements can provide valuable cues to identity in addition to their primary roles in communicating speech and expression [1-8]. Here we report experiments in which we have used recent motion capture and animation techniques to animate an average head [9]. These techniques have allowed the isolation of motion from other cues and have enabled us to separate rigid translations and rotations of the head from nonrigid facial motion. In particular, we tested whether human observers can judge sex and identity on the basis of this information. Results show that people can discriminate both between individuals and between males and females from motion-based information alone. Rigid head movements appear particularly useful for categorization on the basis of identity, while nonrigid motion is more useful for categorization on the basis of sex. Accuracy for both sex and identity judgements is reduced when faces are presented upside down, and this finding shows that performance is not based on low-level motion cues alone and suggests that the information is represented in an object-based motion-encoding system specialized for upright faces. Playing animations backward also reduced performance for sex judgements and emphasized the importance of direction specificity in admitting access to stored representations of characteristic male and female movements.

Address: Department of Psychology, University College London, Gower Street, London WC1E 6BT, United Kingdom.

Correspondence: H. Hill
E-mail: harold.hill@ucl.ac.uk

Received: 7 February 2001
Revised: 9 March 2001
Accepted: 18 April 2001
Published: 5 June 2001

Current Biology 2001, 11:880-885

0960-9822/01/$ - see front matter
© 2001 Elsevier Science Ltd. All rights reserved.

Results and discussion
The fact that impersonators can mimic the ways in which famous people move their heads and faces demonstrates that, in addition to the primary role they have in communication, these movements can provide cues to sex and identity independently of the underlying shape and texture of the face. Similarly, in computer-animated films, a character's expressions and voice can be derived from an actor, as in Tom Hanks's performance as Woody in Pixar's Toy Story [10]. The character's face and head movements mimic those of the actor even though their underlying shapes are quite different. We report experiments in which we computer-animated an average head with movements captured from real people in order to investigate whether motion provides useful information for categorizing faces.

The animation process is illustrated and described in Figure 1. Four different movement sequences were captured for each of twelve actors (we use the term as a shorthand; the volunteers were not trained actors) and were used to animate the same three-dimensional model of an average face [9]. Each animation was of a person telling a two-line question-and-answer joke to another individual (e.g., "Why do cows have bells? Because their horns don't work!"). This activity was intended to elicit expressive and natural facial gestures, expressions, and speech from the actor.

All the stimuli produced in this manner were physically identical at the start of each animation and differed only in the way that they moved. This allowed us to investigate motion-based information independently of other cues. The technique also allowed the separation of rigid head motion, in which the head translates and rotates but does not change shape, and nonrigid motion, in which the expression changes but the head does not move. Given their different natures, these two components may be processed independently and may make different contributions to the perception of face-based biological motion.

By concentrating on motion, we do not intend to deny the importance of other sources of information for these tasks. Everyday experience and numerous experiments show that we can recognize sex and identity from static photographs that provide no motion information. However, motion is fundamental to vision, and diagnostic differences may be used by a system that makes use of any and all available discriminating information. The available evidence suggests motion may be particularly useful when spatial and other cues are degraded or changed, for example by presentation in photographic negative [5, 6]. In previous studies, researchers have used moving point-light stimuli to look at biological motion in general [11, 12] and facial movement in particular [1, 2], but the current technique provides more natural motion information

Figure 1

Examples of the animations used as stimuli (please also see the Supplementary material). The movement of the faces was captured by a pair of digital video cameras. The top row shows the original movement sequence from the right hand camera, the middle row shows an animation from the same viewpoint, and the bottom row shows the animation as the observers viewed it. The middle row was not used in any of the experiments, but we include it to facilitate comparison between the original and the animation. In each case, the leftmost image shows the neutral starting position. The following frames were taken at 1 s intervals. This sequence was 6.7 s long, excluding pre- and post-masks. The average length of sequences was 7.2 s (standard deviation, 1.6 s). The motion of the 17 markers and the pupils was automatically tracked with Famous vTracker (Famous Technologies) from the video footage taken from the two cameras placed approximately 15° either side of the direction in which the actors were facing and at a distance of 1 m. The cameras were calibrated for each recording with a calibration object of known dimensions so that 2D tracked positions could be converted into 3D positions on the basis of projective geometry. These were used for the animation of a three-dimensional head model produced as an average of 200 heads, 100 male and 100 female [9], with the number of vertices reduced to 65,525 for import into 3D Studio Max. Animation was accomplished with the commercially available Famous animator (Famous Technologies). In this system, marker positions are associated with a hotspot and an area of influence on the model that inherit from the movements of the marker. The markers on the forehead, temples and nose were used for defining rigid translations and rotations of the head. Because rigid motion can be fully characterized by the motion of a few markers in a way that nonrigid motion cannot, the rigid component of the animations was more accurate. In both cases the timing of movement is accurately captured given 25 frames per second temporal resolution, whereas spatial properties cannot be truly veridical given the differences in shape between actor and average head model as well as the limited spatial sampling. However, these limitations were constant for all animations and experimental conditions and so should not have biased the results. Animated head models were texture-mapped with an average texture [9] and rendered with 3D Studio Max (Kinetix) for the production of 640 × 480 pixel 25-frames-per-second avi format movies. Animations were compressed with Radius cinepak, with 90% compression quality and a keyframe every 15 frames. A 10 frame mid-gray mask was added to the beginning and end of each animation. Backward and inverted animations were rendered with the same parameters as for upright forward animations, but with the order or orientation of the images altered. In all experiments, observers could view the animations as many times as they wished. Presentation was controlled by Microsoft Mediaviewer, and responses were recorded manually in the rating and sorting experiments and by programs written with Macromedia Director for the 2-AFC and odd-one-out tasks.
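The Figure 1 legend notes that rigid head motion can be fully characterized by a few markers (forehead, temples, nose). As an illustration only, the sketch below shows one standard way such a separation could be computed: a least-squares rigid fit (the Kabsch algorithm) to the rigid markers, followed by removal of the fitted head pose from all markers. This is not the method of the Famous software, which the legend does not describe at this level; the function names, the marker indices, and the use of NumPy are assumptions made for the example.

```python
import numpy as np

def fit_rigid(reference, observed):
    """Kabsch fit: rotation R and translation t with observed ≈ reference @ R.T + t."""
    ref_centroid = reference.mean(axis=0)
    obs_centroid = observed.mean(axis=0)
    covariance = (reference - ref_centroid).T @ (observed - obs_centroid)
    u, _, vt = np.linalg.svd(covariance)
    # Guard against a reflection so that R is a proper rotation (det = +1).
    d = np.sign(np.linalg.det(vt.T @ u.T))
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    translation = obs_centroid - rotation @ ref_centroid
    return rotation, translation

def split_motion(neutral, frame, rigid_idx):
    """Return head pose (R, t) and per-marker nonrigid displacement in head coordinates.

    neutral, frame: (n_markers, 3) arrays of 3D marker positions.
    rigid_idx: indices of markers assumed to move only with the head (hypothetical).
    """
    rotation, translation = fit_rigid(neutral[rigid_idx], frame[rigid_idx])
    # Undo the head pose: row-vector form of R.T @ (x - t).
    stabilised = (frame - translation) @ rotation
    return rotation, translation, stabilised - neutral
```

Replaying only `(R, t)` per frame would give a rigid-only animation, and replaying only the residual displacements would give a nonrigid-only one, under the assumption that the chosen rigid markers do not move relative to the skull.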

while more fully eliminating residual spatial cues, such as the aspect ratio of the underlying face.

There are a number of ways in which motion may be useful for distinguishing between faces. It can provide indirect cues to three-dimensional shape via structure-from-motion [13]. However, mathematical analysis of this process assumes rigid motion [13], an assumption satisfied by head movements but violated by most facial movements. In previous experiments [1-8] the cues motion provides about shape may have been important, but in the present experiments any such cues would be limited because the underlying shape was always the same. However, differences in shape do result as a consequence of differences in movement, and the resulting differences in shape may provide useful information. This highlights the difficulty of completely separating motion from spatial information. Also, differences in the ways that people move their faces may result in useful and reliable differences in low-level image motion. Some people may move the whole or parts of their faces more than others. Lastly, the spatial changes occurring over time for a moving object produce a spatio-temporal signature that in itself may be useful for recognition [14].

In order to investigate whether the biological motion of faces provides cues to identity, we used two tasks. In the first, observers were presented with 16 animations, 4 different animations for each of 4 actors, and were asked to sort these into 4 equally sized groups on the basis of identity. Observers could view the animations in any order as many times as they wished, and they sorted the animations simply by moving their icons into groups on the screen. All the stimuli used in this experiment were generated with movement sequences captured from male actors. This method ensured that the ability to do this task was independent of any ability to categorize sex. Different groups of observers saw rigid head motion (N = 15), nonrigid facial motion (N = 16), or combined rigid head and nonrigid facial motion (N = 16). The observers, like the actors, were recruited from the student population of University College London. The task, like recognition,
882 Current Biology Vol 11 No 11

required that the description of the motion that observers recovered was stable enough to generalize over different examples of the same face while sensitive enough to distinguish between examples of different faces [15]. This task allowed us to investigate the motion information essential for recognition independently of memory-based or cue conflict effects that would be involved in recognition per se.

Performance in this task was scored for each animation according to how many other examples of the same person were sorted into the same group (maximum score = 48, minimum = 0, chance = 9.6 by Monte Carlo simulation). Results are summarized in Figure 2, and details of the scoring system are given in the legend. In order to avoid having to make assumptions about the scoring distribution, we used Monte Carlo and Bootstrap resampling statistical methods to analyze the data [16]. There was a significant effect of type of motion shown (p < .05), with performance significantly above chance for rigid motion and for combined rigid and nonrigid motion (p < .05). Performance was marginally above chance for nonrigid motion (p < .1). Rigid head motion appears particularly useful for categorizing people on the basis of identity. The difference between performance in the rigid head motion alone and nonrigid facial motion alone conditions was significant (p < .05), and the other two differences between conditions were marginally significant (p < .1). Nonrigid facial motion is less useful than rigid motion for characterizing individuals. It is possible that nonrigid facial movement could interfere with identity judgements in this task given that the facial speech of two people telling the same joke or making the same expression may be more similar in many ways than that of the same person telling different jokes or making different expressions. Although this factor was not fully balanced, two jokes were told by all of the actors used in this experiment, so in order to test whether subjects grouped faces on the basis of the joke told, we scored the data according to whether examples of different actors telling these jokes were grouped together. In this case, sorting scores were no different than what would have been due to chance. This result shows that observers were not sorting on the basis of the joke told and validates the sorting task and scoring method.

We also used an odd-one-out task to test whether motion provides cues to identity. Twelve naive observers were presented with seventy-two trials each consisting of three animations: two different examples of one person and one example of a different person of the same sex. The observer's task was to identify the animation derived from the unique individual, the odd one out. Observers initiated presentation of the stimuli and responded by using an application written in Macromedia Director. In order to provide clues as to the critical properties of the motion information used for this task, we compared inverted and backward play to normal (forward and upright) presentation for stimulus triplets. Inverted presentation leaves low-level motion cues the same but is well known to adversely affect many aspects of face processing [17]. Playing an animation backward uses the same static frames as the same animation played forward but changes the overall pattern of movement. Both inverted and backward play test the extent to which performance can be achieved on the basis of perceptual matching or whether stored knowledge is needed, as they leave the perceptual similarities available for matching the same but might be expected to affect access to stored knowledge about how faces normally appear. All the stimuli used in this experiment contained both rigid head and nonrigid facial motion.

As can be seen from the results summarized in Figure 3, the accuracy with which the odd one out was identified depended upon how the stimuli were presented. A one-way repeated-measures ANOVA on the proportion of correct responses showed a main effect of the presentation

Figure 2

The results of the identity sorting experiment, in which four different animations of four actors were sorted into equally sized groups on the basis of identity. Different groups of observers saw nonrigid facial motion alone (N = 16), rigid head motion alone (N = 15), or both types of motion combined (N = 16). Performance was scored according to how many other examples of the same actor were put in the same group for each animation. This gives a maximum score of 48 (4 actors × 4 examples × 3 other examples in the group). The minimum score, when each group contains only one example of each actor, is 0. Chance, calculated by the Monte Carlo method of generating and averaging the score for 10,000 of the 16! possible ways to sort 16 animations, was 9.6. We also tested whether average scores for our groups of observers were significantly different from chance by calculating the proportion of times that the observed scores were exceeded by random samples of the same size. Combined rigid and nonrigid and rigid alone stimuli were both sorted significantly better than chance would have allowed (p < .05), and nonrigid stimuli alone were sorted marginally better (p < .1). Error bars show standard errors.
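The scoring rule and chance level given in the Figure 2 legend can be reproduced with a short simulation. The sketch below is a reconstruction from the legend's description, not the authors' code: it scores a sort by counting, for each animation, the other members of its group that show the same actor, averages that score over 10,000 random sorts, and estimates a p value as the proportion of random observer groups whose mean score reaches an observed mean. All function names, the seeds, and the choice of `>=` in the comparison are assumptions. As a check, the expected chance score can also be worked out directly: each of the 16 animations has 3 group-mates, each of which shows the same actor with probability 3/15, giving 16 × 3 × 3/15 = 9.6.

```python
import random

N_ACTORS = 4
N_EXAMPLES = 4  # animations per actor, and also the size of each sorted group

def sorting_score(groups):
    """Sum, over animations, of how many other members of its group show the same actor."""
    score = 0
    for group in groups:
        for i, actor in enumerate(group):
            score += sum(1 for j, other in enumerate(group) if j != i and other == actor)
    return score

def random_sort(rng):
    """One random partition of the 16 animations into 4 groups of 4 (actor labels only)."""
    labels = [actor for actor in range(N_ACTORS) for _ in range(N_EXAMPLES)]
    rng.shuffle(labels)
    return [labels[k:k + N_EXAMPLES] for k in range(0, len(labels), N_EXAMPLES)]

def chance_score(n_samples=10_000, seed=0):
    """Monte Carlo estimate of the mean score of a random sort."""
    rng = random.Random(seed)
    return sum(sorting_score(random_sort(rng)) for _ in range(n_samples)) / n_samples

def p_value(observed_mean, n_observers, n_samples=2_000, seed=1):
    """Proportion of random observer groups whose mean score reaches observed_mean."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        total = sum(sorting_score(random_sort(rng)) for _ in range(n_observers))
        hits += (total / n_observers) >= observed_mean
    return hits / n_samples
```

A perfect sort scores 48 and a sort with one example of each actor per group scores 0, matching the limits stated in the legend. Calling `p_value` requires a group's actual mean score, which the text reports only graphically, so no observed value is assumed here.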

condition [F(2,22) = 8.7, p < .05]. Post hoc paired t tests showed that playing inverted stimuli produced significantly worse performance than normal [t(11) = 4.4, p < .05] or backward [t(11) = 4.4, p < .05] presentation. Normal and backward presentation did not differ from each other (p > .1). One-sample t tests showed that performance was significantly above chance (33%) in all conditions [normal: t(11) = 8.9, p < .05; backward: t(11) = 7.7, p < .05; and inverted: t(11) = 5.7, p < .05]. The detrimental effect of inversion shows that performance is not based upon low-level properties of motion alone (for example, gross amount of head motion), as this information would be recoverable as easily from inverted stimuli. Instead, it appears that identity-specific motion information is processed by a system tuned to upright faces. Backward play did not affect performance, and this result shows that even when played backward, motion contains information that allows us to discriminate between individuals. Either discriminating cues are static, direction independent, and/or temporally symmetric, or playing animations backward generates new but equally discriminable patterns of movement. Previous evidence showing that we are less good at recognizing faces that have been learned normally from videos played backward [7] favors the latter explanation.

Figure 3

The results of the odd-one-out experiment in which observers (N = 12) had to choose which animation corresponded to the unique individual from a choice of three. The other two animations were two different examples of another individual. Performance was above chance in all conditions but significantly worse when animations were shown inverted. Error bars show standard errors.

To extend the evidence obtained from the identity-based tasks used so far, we also tested whether observers could recover information about the sex of the actors from these animations. No training was given, so any ability to do this depended both on there being differences between the ways males and females move their faces and on observers being able to extract these cues from the animations and relate them to their knowledge of sex differences. Sex judgements cannot be achieved on the basis of perceptual matching alone, as they require access to stored knowledge.

In the first sex judgement experiment, 48 observers rated the sex of all 48 animations on a scale of 1 to 6, with 1 indicating definitely male and 6 definitely female (or vice versa for half the observers). Observers controlled presentation of the stimuli by using Microsoft Mediaviewer and responded by hand on a prepared ratings form. Three groups, each with a different set of 16 observers, saw nonrigid facial motion, rigid head motion, or combined rigid head and nonrigid facial motion (the observers were the same as those who subsequently sorted the stimuli according to identity, as reported above).

Results are summarized in Figure 4. Observers rated male and female faces differently, and their ability to do this depended on the type of motion information available. Analysis of variance confirmed this pattern of results by showing a significant interaction between the type of motion and the sex of the face [F(2,45) = 3.4, p < .05]. There were simple main effects of sex for combined [F(1,45) = 40.0, p < .05], facial alone [F(1,45) = 26.9, p < .05] and head alone [F(1,45) = 7.5, p < .05] conditions, and these results show that all types of motion contained useful cues to sex. One-sample t tests comparing ratings to the theoretically neutral rating value of 3.5 showed that stimuli with combined motion were rated significantly

Figure 4

The results of an experiment in which observers rated the sex of animations on a 6 point scale with 1 indicating definitely male and 6 indicating definitely female. The same observers took part in the same conditions as described for Figure 2. Ratings for male and female items were significantly different, but the amount of difference depended on the type of movement shown. Rigid head movements alone appear least useful for discriminating sex. Error bars show standard errors.
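The rating analysis involves two small computational steps: putting all ratings on a common scale (half the observers used the reversed scale) and comparing mean ratings with the neutral value of 3.5. The sketch below illustrates both steps under stated assumptions. The function names are hypothetical, the example ratings are invented purely to exercise the code and are not data from the experiment, and the t statistic is the textbook one-sample formula, since the text does not spell out the exact computation used.

```python
from math import sqrt
from statistics import mean, stdev

SCALE_MIN, SCALE_MAX = 1, 6
NEUTRAL = (SCALE_MIN + SCALE_MAX) / 2  # midpoint of the 6-point scale, 3.5

def recode(rating, reversed_scale):
    """Map a rating onto the convention 1 = definitely male, 6 = definitely female."""
    return SCALE_MIN + SCALE_MAX - rating if reversed_scale else rating

def one_sample_t(values, mu=NEUTRAL):
    """One-sample t statistic (df = n - 1) for the mean of values against mu."""
    n = len(values)
    return (mean(values) - mu) / (stdev(values) / sqrt(n))

# Invented per-observer mean ratings of male items, for illustration only.
example_male_ratings = [2.5, 3.0, 2.0, 3.0]
t_male = one_sample_t(example_male_ratings)  # negative: rated toward the male end
```

With 16 observers per group, the resulting statistic would be referred to a t distribution with 15 degrees of freedom, which matches the t(15) values reported for this experiment.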

differently from neutral for both male [t(15) = 5.7, p < .05] and female [t(15) = 3.4, p < .05] items, as were stimuli containing only nonrigid facial motion [male: t(15) = 5.9, p < .05; female: t(15) = 3.4, p < .05]. With rigid head motion alone, male items were rated as marginally different from neutral [t(15) = 1.9, p = .07], but female items were not rated as significantly different from neutral (p > .1). For this task, nonrigid motion produced better performance than rigid motion, and this pattern is opposite to that found for the identity-sorting task. This suggests that the differences reported between the two types of information are not simply a function of any limitation in our animation of nonrigid motion. The results contrast with previous evidence from experiments in which point-light stimuli were used. In these experiments, nonrigid motions were found to be more useful for sex judgements, with no difference for identity judgements [3]. However, in this previous work rigid motions were posed nods, shakes, and rocks of the head, not naturally occurring movements, and they would have provided clues to underlying differences in shape not available here. Performance in the previous study was also above chance and at a similar level to that reported here, 61.9% [3].

We further investigated sex judgements by using a two-alternative forced-choice task (2-AFC), a task that avoids any bias associated with the shape and texture of the average head. Fourteen observers who had not taken part in any of the previous experiments were presented with pairs of stimuli, one male and one female, and had to decide which was which. Stimuli were presented in two blocks, with upright and inverted stimulus pairs randomly interleaved in one block and forward and backward stimulus pairs randomly interleaved in the other. The order of the blocks was balanced, and both stimuli in a pair were always shown in the same condition. All stimuli contained both rigid head and nonrigid facial movements.

Results are summarized in Figure 5, with the percentage of correct categorizations for normal stimuli collapsed across both blocks. Paired t tests showed no differences between presentation conditions. However, one-sample t tests showed that only for stimulus pairs presented normally was performance significantly above chance (50%); t(13) = 5.6, p < .05 and t(13) = 3.2, p < .05 in the blocks with inverted and backward stimuli, respectively. Performance for inverted stimuli was marginally above chance, with t(13) = 1.9 and p < .1. Levels of performance were not high with these stimuli because most of the normal spatial cues to sex, including color and shape [18], are kept constant. Static or low-level motion cues alone cannot explain the pattern of performance observed, as these remained the same between presentation conditions. Instead, there appear to be direction-specific patterns of movement for upright faces that differentiate between male and female.

Figure 5

Results of the 2-AFC sex judgement task. Observers (N = 14) saw pairs of animations, one male and one female, and had to indicate which was which. Performance was significantly better than chance would have allowed only when stimuli were played normally (upright and forward), although inverted faces were also categorized marginally better than chance would have allowed. The detrimental effect of playing animations backward highlights the importance of the pattern of movement for sex judgements as opposed to the low-level motion or static cues that remain the same when animations are played backward. Error bars show standard errors.

Conclusions
The results show that both rigid head and nonrigid facial movements provide useful information for categorizing both sex and identity. There are differences in the ways that people move their heads and their faces, and we can recover and use these identity cues. Rigid head movements appear to be particularly useful for distinguishing between individuals, and nonrigid motion appears to be useful for categorizing on the basis of sex. This may be because rigid head movements can be idiosyncratic, while most nonrigid facial motion is functionally related to specific aspects of speech and expression, which in any case have to be processed independently from identity [19]. Effects of inversion show that low-level motion cues are not sufficient to explain performance and suggest that dynamic information is encoded by a model-based system specialized for upright faces. Backward movement, although discriminable, disrupts sex judgements, and this result shows that the direction of patterns of movement can be critical.

Supplementary material
Examples of animations used as stimuli are available on the internet at http://images.cellpress.com/supmat/supmatin.htm.

Acknowledgements
This work was supported by the Engineering and Physical Sciences Research Council (MS2715). Thanks go to Peter McOwan, Szonya Durant, Colin Clifford, and Branka Spehar for comments on drafts of the paper and to Glyn Cowe for technical assistance.

References
1. Bassili J: Facial motion in the perception of faces and emotional
expression. J Exp Psychol 1978, 4:373-379.
2. Bassili J: Emotion recognition: The role of facial movement and
the relative importance of upper and lower areas of the
face. J Pers Soc Psychol 1979, 37:2049-2058.
3. Bruce V, Valentine T: When a nod's as good as a wink: The role
of dynamic information in facial recognition. In: Practical
aspects of memory: Current research and issues, vol. 1. Edited by
Gruneberg MM, Morris PE, Sykes RN. Chichester, UK: Wiley;
1988:169-174.
4. Christie F, Bruce V: The role of dynamic information in the
recognition of unfamiliar faces. Memory and Cognition 1998,
26:780-790.
5. Knight B, Johnston A: The role of movement in face recognition.
Visual Cognition 1997, 4:265-273.
6. Lander K, Christie F, Bruce V: The role of movement in the
recognition of famous faces. Memory and Cognition 1999,
27:974-985.
7. Lander K, Bruce V: Recognizing famous faces: exploring
the benefits of facial motion. Ecological Psychology 2000,
12:259-272.
8. Pike GE, Kemp RI, Towell NA, Phillips KC: Recognizing moving
faces: the relative contribution of motion and perspective
view information. Visual Cognition 1997, 4:409-437.
9. Vetter T, Troje N: Separation of texture and shape in images
of faces for image coding and synthesis. J Opt Soc Amer
1997, 14:2152-2161.
10. Smith AR: Digital humans wait in the wings. Scientific American
2000, 283:72-78.
11. Johansson G: Visual perception of biological motion and a
model for its analysis. Percept Psychophys 1973, 14:201-211.
12. Johansson G: Studies of perception of locomotion. Perception
1977, 6:365-376.
13. Ullman S: The interpretation of visual motion. Cambridge,
Massachusetts: MIT Press; 1979.
14. Stone JV: Object recognition using spatiotemporal signatures.
Vision Res 1998, 38:947-951.
15. Marr D, Nishihara HK: Representation and recognition of three-
dimensional shapes. Proc R Soc Lond B Biol Sci 1978,
200:269-294.
16. Efron B: Computer-intensive methods in statistics. Scientific
American 1983, 248:116-130.
17. Valentine T: Upside-down faces: a review of the effect of
inversion upon face recognition. Br J Psychol 1988, 79:471-
491.
18. Hill H, Bruce V, Akamatsu S: Perceiving the sex and race of
faces: the role of shape and colour. Proc R Soc Lond B Biol
Sci 1995, 261:367-373.
19. Bruce V, Young A: Understanding face recognition. Br J Psychol
1986, 77:305-327.