
Department of Computer Science & Engineering

SOUTHEAST UNIVERSITY

CSE4000: Research Methodology


A study on tracking faces and applying
2D masks on live camera feed

A dissertation submitted to Southeast University in partial fulfillment of the requirements for the degree of B.Sc. in Computer Science & Engineering

Submitted by
Jahid Hasan (ID: 2013200000018)
Rajib Roy (ID: 2013200000036)
Tasnif Taussuk (ID: 2013200000038)

Supervised by
Monirul Hasan
Senior Lecturer and Coordinator,
Department of CSE
Southeast University

Copyright © 2018
January, 2018
Letter of Transmittal
January 10, 2018

The Chairman,
Department of Computer Science & Engineering,
Southeast University,
Banani, Dhaka.

Through: Supervisor, Monirul Hasan

Subject: Submission of CSE4000

Dear Sir,

We, Jahid Hasan, Rajib Roy, and Tasnif Taussuk, have worked under the supervision of Monirul Hasan, Senior Lecturer & Coordinator, Department of CSE, Southeast University, to conduct research on tracking faces on a live camera feed and applying 2D masks on the tracked faces. We have also implemented the idea and developed an Android app based on it.
We are asking for your approval to consider our research as a partial fulfillment of the graduation requirements for the B.Sc. in CSE.

Thank you.

Sincerely yours,

Jahid Hasan
ID: 2013200000018
Batch: 35, Program: CSE

Rajib Roy
ID: 2013200000036
Batch: 35, Program: CSE

Tasnif Taussuk
ID: 2013200000038
Batch: 35, Program: CSE

Supervisor:

Monirul Hasan
Senior Lecturer and Coordinator
Department of CSE
Southeast University
Certificate
This is to certify that the research titled “A study on tracking faces and applying 2D masks on live camera feed” has been submitted to the respected members of the board of examiners of the Faculty of Science and Engineering in partial fulfillment of the requirements for the degree of Bachelor of Science in CSE by the following students, and has been accepted. This work has been carried out under my guidance.

Authors:

Jahid Hasan
ID: 2013200000018
Batch: 35, Program: CSE

Rajib Roy
ID: 2013200000036
Batch: 35, Program: CSE

Tasnif Taussuk
ID: 2013200000038
Batch: 35, Program: CSE

Supervisor:

Monirul Hasan
Senior Lecturer and Coordinator
Department of CSE
Southeast University
Abstract

Face detection is a computing technology, used in a variety of applications, that identifies human faces in an image. Face-detection algorithms concentrate on detecting frontal human faces. The process is analogous to image matching, in which the picture of a person is matched piece by piece: each image region is compared against a cascade, which decides whether it is a face or not. In this study, we review how face detection works and how we used this machine-learning capability, exposed as an API, to detect faces in our application. We also briefly describe how we placed a 2D mask on each detected face.

Acknowledgements

First of all, all praise to the Almighty for enabling us to finish this research.

We thank our University for enabling us to finish our research.

We put on record our sincere thanks to our supervisor, Monirul Hasan sir, Coordinator of the Department. His dedication and keen interest, above all his overwhelming willingness to help his students, have been mainly responsible for the completion of our work. His timely advice, meticulous scrutiny, scholarly guidance, and practical Android-based approach have helped us to a very great extent in accomplishing this research. Without his significant direction and support, it would have been exceptionally difficult for us to complete it.

We would also like to thank the Chairman of the CSE Department, Shahriar Manzoor sir, for guiding and supporting us.

We profusely thank all of the Department's faculty members for their kind assistance, support, cooperation, and encouragement throughout our research work.

Lastly, we express our gratitude to one and all who directly or indirectly helped us to continue our journey.

Contents

Abstract
Acknowledgements
List of Figures

1 Introduction
  1.1 Motivation
  1.2 Goal
  1.3 Overview

2 Literature Review
  2.1 Model Based Face Tracking
    2.1.1 Edge-Orientation Matching
    2.1.2 Hausdorff Distance
  2.2 Weak classifier cascades
  2.3 Deep Learning for Face Recognition

3 Problem and Limitations
  3.1 Face API Limitation
  3.2 Processing

4 Implementation
  4.1 Face Detection
  4.2 Face Tracker
  4.3 Masking
    4.3.1 On Still Image
    4.3.2 On Live Feed

5 Result and Evaluation
  5.1 Face Tracking
  5.2 Masking

6 Conclusion

Bibliography

List of Figures

2.1 Cascade
4.1 Detected Face
4.2 Face Tracker Sample
4.3 Euler's Formula
4.4 Euler Angle Preview
5.1 Detected Face
5.2 Masking on Face

Chapter 1

Introduction

The Mobile Vision application program interface (API) provides a framework for finding objects in photos and video [1]. At present, the Mobile Vision API includes face, barcode, and text detectors, which can be applied separately or together. This research aims to create an Android application that uses face detection algorithms to analyze live video or images and detect human faces. Once we have a list of faces detected in an image, we can gather information about each face, such as its orientation, smiling probability, whether the eyes are open or shut, and particular landmark points, for example the eyes, nose, mouth, and cheeks. The Face API provides only face detection and the ability to find landmarks on a detected face; it does not do face recognition. Classification determines whether a specific facial characteristic is present; the Android Face API currently supports two classifications: eyes open and smiling.


1.1 Motivation

With more than 80 percent market share on mobile devices, Android is the dominant mobile operating system today. People who use smartphones demand better applications and want existing ones updated, which has created a huge scope for Android application development worldwide. Android is a Linux-based operating system, mainly developed by Google for smartphones and various other devices. It is open source, which makes it easy for developers and device manufacturers to alter the software according to their needs. Companies like Google, Kairos, Microsoft, and many others are creating powerful APIs and tools that are really fun to work with. Besides, face detection is becoming increasingly important as the world becomes more interconnected and our identities digitize themselves away from passwords and PINs. These developments encouraged us to jump into this field: we wanted to apply our knowledge of image processing, learn Android development, and work with various Android APIs. Building an Android face detection application is therefore a good way to make this technology accessible to everyone.


1.2 Goal

First of all, we wanted to add an overlay on live camera footage. This also shows how we can build a custom UI for our camera and overlay pictures over the camera preview. Our objective was to exploit more APIs that fit into our project. Another unique objective was to give users the option of a customizable mask: users can choose their own image for masking.

1.3 Overview

We will discuss the necessary algorithms, from the basics up to the API level, that are required to detect a face on a live camera feed, and how we implemented the Google Face API in our application. We will also briefly cover how we applied a mask matching the size and rotation of the detected face on the live camera feed. We faced some problems and limitations while working on this project, which will also be covered. Last but not least, we discuss how to improve this application with some future work.

Chapter 2

Literature Review

This chapter gives an overview of the significant face recognition techniques that apply mostly to frontal faces; the advantages and drawbacks of each technique are also given. The techniques considered are edge-based methods using geometrical models, Edge-Orientation Matching, eigenfaces (eigenfeatures), and neural networks. The approaches are analyzed in terms of the facial representations they use.

2.1 Model Based Face Tracking

Model-based face tracking builds on edge-based methods and geometrical models. A few of the efficient methods were published in the early 2000s [2].

2.1.1 Edge-Orientation Matching

With the Edge-Orientation Matching method, a face pattern is located in the edge orientation image rather than in the original image. Fröba and Küblbeck extract the edge orientation image of a face model and use it to match against the edge orientation map of the input image. A pyramid of edge orientation fields is needed to locate faces of sizes different from the face model. False detections often occur when image texture or edge frequency is high, because this method uses only the edge orientation information [3].


A follow-up algorithm was proposed that uses skin color information together with the orientation map to make face detection faster. A color image is converted into a skin probability image using a Gaussian skin color model, from which an orientation map is extracted. When the orientation map is matched against a pre-generated model, the skin color information can be used to suppress the background. As a result, false detections in areas of high edge frequency are reduced.

2.1.2 Hausdorff Distance

The Hausdorff distance is a measure of the distance between two point sets. Here it is restricted to two dimensions, since the goal is object detection in digital images.
Let A = \{a_1, \dots, a_m\} and B = \{b_1, \dots, b_n\} denote two finite point sets. Then the Hausdorff distance is defined as

H(A, B) = \max\big(h(A, B),\, h(B, A)\big),   (2.1)

where

h(A, B) = \max_{a \in A} \min_{b \in B} \lVert a - b \rVert   (2.2)

Here, h(A, B) is called the directed Hausdorff distance from set A to B, with some underlying norm \lVert \cdot \rVert on the points of A and B.
A modified Hausdorff distance was proposed by Dubuisson. It is defined as

h_{mod}(A, B) = \frac{1}{|A|} \sum_{a \in A} \min_{b \in B} \lVert a - b \rVert   (2.3)

By taking the average of the single-point distances, this version decreases the impact of outliers, making it more suitable for pattern recognition purposes [4][5].
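As a concrete check of equations (2.1)-(2.3), here is a minimal pure-Python sketch; the Euclidean norm and the example point sets are illustrative choices:

```python
import math

def d(a, b):
    # Euclidean norm ||a - b|| for 2D points
    return math.hypot(a[0] - b[0], a[1] - b[1])

def h(A, B):
    # Directed Hausdorff distance, eq. (2.2)
    return max(min(d(a, b) for b in B) for a in A)

def H(A, B):
    # Symmetric Hausdorff distance, eq. (2.1)
    return max(h(A, B), h(B, A))

def h_mod(A, B):
    # Modified (averaged) directed distance, eq. (2.3)
    return sum(min(d(a, b) for b in B) for a in A) / len(A)

A = [(0, 0), (1, 0)]
B = [(0, 0), (0, 2)]
```

For these sets, h(A, B) = 1 but h(B, A) = 2, which shows why the symmetric maximum in (2.1) is taken.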


2.2 Weak classifier cascades

Weak-classifier is a strong and most commonly used method for Object Detection.
In 2001 Paul Viola and Michael Jones proposed an algorithm which uses Haar
features on weak-classifiers to detect faces or other objects efficiently [6]. It was
a machine learning based approach and vastly used to detect a face from an
image on lower performance devices. In this weak classifiers structure, Cascading
Classifiers are prepared with a few hundred “positive” example images of a specific
object and discretionary “negative” images of a similar size. After the classifier is
prepared it can be connected to an area of a picture and recognize the face being
referred to. To search for the face in the whole edge, the search window can be
moved over the image and check each area for the classifier. This process is most
commonly used in image processing for object detection and tracking, primarily
facial detection and recognition [7] .

Figure 2.1: Cascade
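The early-rejection idea behind the cascade can be sketched in a few lines of Python. The two stage tests below are hypothetical stand-ins for trained Haar-feature stages, not the real Viola-Jones features; the point is only that cheap stages reject non-face windows before expensive ones run:

```python
def run_cascade(stages, window):
    """Apply classifier stages in order; reject the window
    as soon as any stage fails (early rejection)."""
    for stage in stages:
        if not stage(window):
            return False   # non-face: rejected cheaply
    return True            # survived every stage: face candidate

# Hypothetical stand-ins for Haar-feature stage tests: each looks at
# a simple brightness statistic of a 2x2 "window" of pixel values.
stages = [
    lambda w: sum(w) / len(w) > 50,        # stage 1: overall brightness
    lambda w: w[0] + w[1] > w[2] + w[3],   # stage 2: top half brighter than bottom
]

face_like = [200, 190, 60, 50]
flat_dark = [10, 10, 10, 10]
```

Because most windows fail an early stage, very little work is spent on the vast majority of non-face regions.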


2.3 Deep Learning for Face Recognition

After detecting a face with weak classifiers, researchers use a deep convolutional neural network to recognize it. In this process, over 100 measurements are generated for each face to train the network; the measurements for a face are called its “embedding”. The training process works by looking at 3 face images at a time:

1. Load a training face image of a known person

2. Load another picture of the same known person

3. Load a picture of a totally different person

The face recognition algorithm then looks at the measurements and tweaks the neural network slightly so that it can recognize faces efficiently. These steps are repeated millions of times over images of different people so that the network learns reliable embeddings. This approach was invented by researchers at Google in 2015 [8]. Even on state-of-the-art computers it takes hours of training to reach good accuracy, but once the network has been trained it can generate measurements for any face, even ones it has never seen before [9].
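The three-image comparison is usually trained with a triplet loss. The sketch below is an illustrative pure-Python version of that objective (the margin value and toy embeddings are our assumptions, not FaceNet's actual training code):

```python
def sq_dist(u, v):
    # squared Euclidean distance between two embedding vectors
    return sum((a - b) ** 2 for a, b in zip(u, v))

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Triplet objective: pull the same person's embeddings together
    and push a different person's embedding at least `margin` away.
    The loss is zero once the negative is sufficiently far."""
    return max(0.0, sq_dist(anchor, positive) - sq_dist(anchor, negative) + margin)
```

Minimizing this loss over millions of triplets is what makes embeddings of the same person cluster together.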

Chapter 3

Problem and Limitations

While working on this project we faced a few problems. Some of them were due to limitations of the Face API, and others to the cost of processing each image.

3.1 Face API Limitation

The limitations are given below:

• Face API can only detect faces; it cannot recognize them.

• Face API cannot detect a face if it is rotated about the X-axis.

• Face API can only detect faces whose Y-axis rotation is within ±36°.

• Face API can only classify eyes open and smiling when the Y-axis rotation is within ±18°.

• A face needs to appear in at least two sequential frames to be detected as a face.


3.2 Processing

Processing an image as a mask on a face requires a decent amount of pre-processing. The app needs to scale, rotate, and position the mask. On low-end devices it becomes hard to render the mask on the current frame in time. As different devices have different processing units, processing time varies, so it is hard to make a universal tool that works flawlessly on every device.

Chapter 4

Implementation

There are two parts to our project. First, we detect faces and get the required attributes. Then we apply a 2D mask on each detected face.

4.1 Face Detection

To develop this application, we first had to find or develop a suitable algorithm or API (Application Programming Interface) for face detection. We found an optimized Android library called “Mobile Vision” that provides the Face API. It gives us attributes like:

• Landmark position

• Face orientation which contains face rotation, height, and width

• Smiling probability

• Eyes open probability

First, we tried detecting the face in a still image. In the Face API, the faceDetector builder method creates a face detector instance. Running the detector returns a set of Face instances, and we then loop through all the instances to get the positions of each particular face.


Figure 4.1: Detected Face

As shown in figure 4.1, there are landmarks on the detected face, along with the smiling probability and face orientation angles. In this API, getLandmarkPosition() gives us the position of the desired landmark. For the smiling probability we used the getSmilingProbability() function, which returns a probability value against which we can set a threshold for accepting the face as smiling.
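Turning the reported probability into a yes/no decision is a simple threshold test. The helper below is a hypothetical sketch of that logic: the 0.6 threshold and the treatment of an undetermined reading (reported as -1.0) are our assumptions, not values from the API documentation:

```python
def is_smiling(smiling_probability, threshold=0.6):
    """Decide 'smiling' from the probability the detector reports.
    An undetermined probability of -1.0 falls below any positive
    threshold and is therefore treated as not smiling."""
    return smiling_probability >= threshold
```

The same pattern applies to the eyes-open probabilities.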


4.2 Face Tracker

Face tracking extends face detection to the live video feed. A face has to appear in at least 2 sequential frames to be detected; we use 30 frames per second. The face tracker also provides the same attributes as we get from a still image. We created a custom camera UI in which we implemented the face tracker.
As shown in figure 4.2, multiple faces have been detected.

Figure 4.2: Face Tracker Sample
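The two-sequential-frame rule described above can be sketched as follows. This is an illustrative pure-Python model of the idea (face IDs per frame), not the tracker's actual implementation:

```python
def confirm_faces(frames, min_frames=2):
    """Given per-frame sets of face IDs, return the IDs that have
    appeared in at least `min_frames` consecutive frames."""
    streak, confirmed = {}, set()
    for ids in frames:
        # rebuild the streak table: IDs absent from this frame reset to zero
        streak = {i: streak.get(i, 0) + 1 for i in ids}
        confirmed |= {i for i, n in streak.items() if n >= min_frames}
    return confirmed
```

Requiring two consecutive frames filters out one-frame false positives at a cost of one frame of latency (about 33 ms at 30 fps).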


4.3 Masking

4.3.1 On Still Image

We take a 2D image that will remain on top of the face, using a custom view for this. The custom view is a child class of ViewGroup, an invisible container that holds different layouts and lets us add an overlay on a particular image. We take the detected face image as the background and a transparent image as the foreground; this foreground image is the mask that is placed on the face. We set the detected face on an ImageView and the mask on the custom view, and we positioned the custom view on the ImageView relative to the landmark positions we get.

4.3.2 On Live Feed

Scaling

We need a canvas to draw a mask over the face. We converted the mask into a bitmap, since a bitmap is easier to modify and draw on a canvas. We can get the face's size with getHeight() and getWidth(), and we scaled the image according to the face detected in the current frame. Even if the face stays still, the API reports a slightly different face size on every frame, so we used a threshold of 5% to evade the unstable measurements from the API. We then scale the mask image to the face's height and width. Since the reported size is not exactly the size we need, we determined an appropriate face size by manually testing it against our mask, and we applied a manual offset value to adjust it.
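The 5% jitter threshold can be sketched as a small helper: the stored size is only replaced when the newly reported size differs by more than the threshold. This is a hypothetical model of the idea, not our exact Android code:

```python
def smooth_size(prev, current, threshold=0.05):
    """Keep the previously used face size unless the newly reported
    size differs by more than `threshold` (5%), to evade the
    per-frame jitter in the API's measurements."""
    if prev is None or abs(current - prev) / prev > threshold:
        return current   # real size change (or first frame): accept it
    return prev          # within jitter range: keep the stable value
```

The same filter is applied independently to the face's width and height.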


Orientation

The API gives us the top-left position of the face. To make it simpler to place a mask on the face, we translated the given position to the center of the face by adding half of the face's width and height to its origin position. Since the face won't always be held at a fixed angle, we also need the Euler angles of the face.

Figure 4.3: Euler's Formula

The image illustrated in Figure 4.3 reflects Euler's formula, which establishes a fundamental relation between the trigonometric functions and the complex exponential [10]. For any real number x, Euler's formula states that

e^{ix} = \cos x + i \sin x   (4.1)
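Equation (4.1) can be verified numerically with Python's complex-math library; the value of x below is an arbitrary example:

```python
import cmath
import math

x = 0.7  # any real number
lhs = cmath.exp(1j * x)                    # e^{ix}
rhs = complex(math.cos(x), math.sin(x))    # cos x + i sin x
```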


We can get these Euler angles of the face through getEulerZ() and getEulerY(), as shown in Figure 4.4.

Figure 4.4: Euler Angle Preview

Since we cannot rotate the bitmap directly by an arbitrary angle, we need a matrix. So, at first, we rotated the matrix by the face's orientation angle and applied it to the bitmap; we then converted the result back into a bitmap, since a matrix cannot be drawn on a canvas. At this point we have the center position of the face and an appropriately transformed mask for the current frame. To set the mask at the desired position on the face, we measured the offset between the mask's center point and its top-left corner, then applied that offset from the center of the face to find where the mask should be drawn. This process runs on every frame until the app is closed.
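The center translation and rotated-offset placement described above can be sketched as plain 2D geometry. This is a hypothetical model of the computation (function names are ours, not the API's; the rotation is the standard 2D rotation matrix):

```python
import math

def face_center(left, top, width, height):
    # translate the API's top-left face position to the face center
    return (left + width / 2.0, top + height / 2.0)

def mask_top_left(center, mask_w, mask_h, angle_deg):
    """Place the mask so its center coincides with the face center:
    rotate the center-to-top-left offset by the face's Z Euler angle,
    then apply the rotated offset from the face center."""
    ox, oy = -mask_w / 2.0, -mask_h / 2.0      # offset from center to top-left
    a = math.radians(angle_deg)
    rx = ox * math.cos(a) - oy * math.sin(a)   # standard 2D rotation
    ry = ox * math.sin(a) + oy * math.cos(a)
    return (center[0] + rx, center[1] + ry)
```

With a zero rotation angle this reduces to simply subtracting half the mask size from the face center, which is a useful sanity check.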

Chapter 5

Result and Evaluation

Our face detection application's performance is based on the Google Face API, which gives us the actual landmark positions of the faces. The finished application detects both single and multiple faces with proper landmarks. For a good result, the limitations of the Face API need to be kept in mind; we discussed them in detail in the “Problems and Limitations” chapter.

5.1 Face Tracking

This section is a walk-through of how the application uses the back-facing camera to show a view of the detected faces. The app tracks several faces at the same time and draws a rectangle around each, showing the rough position, size, and face ID of each face in a single image or a live video. An example is shown in Figure 5.1.


Figure 5.1: Detected Face

5.2 Masking

For masking, we used a ViewGroup to add an overlay on a particular image; we scale the mask, apply it over a background image of the same size, and then combine background and foreground. We have also done face tracking in the live feed. For masking the live image, we need to put the mask on every frame, so we scale the mask about the center of the face and place it at the desired position on the face. Some examples are shown in Figure 5.2.


Figure 5.2: Masking on Face

Chapter 6

Conclusion

During the last decade, the Face API presented by Google has become one of the most helpful libraries for real-time face detection. The API supports two quite interesting classifications: eyes open and smiling. The main advantage of this approach is that it easily detects multiple faces and finds their landmark positions; the main disadvantage is that the Face API does not support face recognition. At present, the Mobile Vision API incorporates face, bar-code, and text detectors. Android developers also have a splendid and prosperous future ahead of them in the growing software market, because the platform is open source, applications are easy to develop, it is user-friendly and popular, and demand keeps increasing day by day. So anybody can begin with machine learning and computer vision on mobile, regardless of prior knowledge. Google continues expanding its learning materials for developers.

Bibliography

[1] “Get started with the mobile vision API — mobile vision — google developers,” Dec 2017. [Online]. Available: https://developers.google.com/vision/android/getting-started

[2] “Face detection algorithms and techniques,” Dec 2017. [Online]. Available: https://facedetection.com/algorithms/

[3] J. Bigun and F. Smeraldi, Audio- and video-based biometric person authentication: third international conference, AVBPA 2001, Halmstad, Sweden, June 6-8, 2001: proceedings. Springer, 2001.

[4] J. Blanc-Talon, Advanced concepts for intelligent vision systems: 8th international conference, ACIVS 2006, Antwerp, Belgium, September 18-21, 2006: proceedings. Springer-Verlag, 2006.

[5] S. Z. Li, Advances in biometric person authentication: international workshop on biometric recognition systems, IWBRS, Beijing, China, October 22-23, 2005: proceedings. Springer, 2005.

[6] P. Viola and M. J. Jones, Robust Real-Time Face Detection. Kluwer Academic Publishers, 2004.

[7] “Cascading classifiers,” Dec 2017. [Online]. Available: https://en.wikipedia.org/wiki/Cascading_classifiers

[8] F. Schroff, D. Kalenichenko, and J. Philbin, FaceNet: A Unified Embedding for Face Recognition and Clustering: proceedings. Google, 2015.

[9] A. Geitgey, “Machine learning is fun! part 4: Modern face recognition with deep learning,” Jul 2016. [Online]. Available: https://goo.gl/eUKUXQ

[10] “Euler's formula,” Dec 2017. [Online]. Available: https://en.wikipedia.org/wiki/Euler%27s_formula

