
Open Source

Facial Recognition
and Mass Surveillance
Ian O'Neill – ianoneill591@gmail.com
Proprietary Implementations

Private/Business Security

Can Identify and Log Faces

Benefits:
– Easy to Implement

Issues:
– Requires proprietary hardware
– Closed-source front/backend
– Storage issues and data portability
Proprietary Implementations

Government-Level Installations

Most visible usage is in China
– Facial recognition being used to
identify and automatically fine
jaywalkers
– Used as a means of payment, by attaching payment
account information to a user’s face
– Identifying and arresting individuals wanted by police
Implementations in the US

Homeland Advanced Recognition
Technology (HART) database will reportedly
include at least seven biometric identifiers,
including face and voice data, tattoos, DNA,
scars, “physical descriptors,” and “Non-Obvious
Relationships” on as many as 500 million people

“it will offer a broader range of services to federal
government agencies, state and local law enforcement, the
intelligence community, and international partners”

Amazon sells Rekognition, their facial recognition platform,
to local police departments throughout the country

While not necessarily as visible as in other places, facial
recognition is already in widespread use in the US
How Facial Recognition
Systems Work
How Facial Recognition Works

Facial recognition begins with recognizing and
isolating individual faces
Early Face Detection Systems

Viola–Jones object detection framework, 2001

Based on levels of contrast between rectangular regions,
computed from a summed-area map (integral image) of the image

Looks for known patterns, such as defined shapes

Fast, but inaccurate without consistent inputs
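
The contrast test above can be sketched in a few lines. This is a minimal illustration, not the full Viola–Jones cascade: it builds a summed-area table and evaluates a single two-rectangle contrast feature. The function names and the 4x4 sample image are made up for the example.

```python
def integral_image(pixels):
    """Return a summed-area table padded with a leading row and column of zeros."""
    height, width = len(pixels), len(pixels[0])
    table = [[0] * (width + 1) for _ in range(height + 1)]
    for y in range(height):
        row_sum = 0
        for x in range(width):
            row_sum += pixels[y][x]
            table[y + 1][x + 1] = table[y][x + 1] + row_sum
    return table


def rect_sum(table, top, left, height, width):
    """Sum of the pixels in a rectangle, using four table lookups."""
    bottom, right = top + height, left + width
    return (table[bottom][right] - table[top][right]
            - table[bottom][left] + table[top][left])


def two_rect_feature(table, top, left, height, width):
    """Contrast feature: sum of the lower half minus sum of the upper half."""
    half = height // 2
    upper = rect_sum(table, top, left, half, width)
    lower = rect_sum(table, top + half, left, half, width)
    return lower - upper


# Hypothetical 4x4 grayscale patch: a dark band above a bright band.
patch = [[10, 10, 10, 10],
         [10, 10, 10, 10],
         [200, 200, 200, 200],
         [200, 200, 200, 200]]
table = integral_image(patch)
feature = two_rect_feature(table, 0, 0, 4, 4)
```

Because every rectangle sum costs four lookups regardless of its size, many such features can be evaluated quickly, which is where the speed of this approach comes from.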
Histogram of Oriented Gradients

Takes a desaturated (grayscale) image as input

Creates a histogram which emphasizes changes in
brightness across the image, regardless of the
overall exposure level

Image: Adam Geitgey, “Machine Learning is Fun!”
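
A rough sketch of the idea behind the histogram, assuming a single small cell and nine orientation bins (both choices are illustrative). Full HOG implementations also normalize over blocks of cells, which this sketch leaves out. The function name and sample data are hypothetical.

```python
import math


def orientation_histogram(gray, bins=9):
    """Histogram of unsigned gradient directions, weighted by gradient magnitude."""
    height, width = len(gray), len(gray[0])
    hist = [0.0] * bins
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            # Brightness differences between neighbouring pixels.
            gx = gray[y][x + 1] - gray[y][x - 1]
            gy = gray[y + 1][x] - gray[y - 1][x]
            magnitude = math.hypot(gx, gy)
            if magnitude == 0:
                continue
            # Fold the direction into 0..180 degrees (unsigned gradient).
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            hist[int(angle / (180.0 / bins)) % bins] += magnitude
    return hist


# Hypothetical cell containing a vertical edge, and an "overexposed" copy.
cell = [[10, 10, 200, 200] for _ in range(4)]
brighter = [[value + 50 for value in row] for row in cell]

hist = orientation_histogram(cell)
hist_brighter = orientation_histogram(brighter)
```

Since only differences between neighbouring pixels are used, adding the same amount of brightness to every pixel leaves the histogram unchanged.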


Face Detection

Faces in the input image are identified through
numerical comparison to a known face-like pattern

Image: Adam Geitgey, “Machine Learning is Fun!”


Face Landmark Estimation

Model defines 68 individual
points, or landmarks, that exist on
all faces

Machine learning algorithm
trained to be able to find those 68
points on new face inputs

Image: Brandon Amos, OpenFace Developer


Preparing Landmark Results

After face landmarks are detected, they are
aligned to make a less angle-specific mapping,
usable as a reference for comparisons

Image: Adam Geitgey, “Machine Learning is Fun!”
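
A simplified sketch of the alignment step, assuming only the two eye positions are used as references. It rotates, scales, and shifts landmark points so the eyes end up level and a fixed distance apart. Real pipelines typically warp the image itself and may use more reference points; the function name and coordinates here are hypothetical.

```python
import math


def align_landmarks(points, left_eye, right_eye):
    """Map points so left_eye lands on (0, 0) and right_eye on (1, 0)."""
    dx = right_eye[0] - left_eye[0]
    dy = right_eye[1] - left_eye[1]
    angle = math.atan2(dy, dx)          # tilt of the line between the eyes
    scale = 1.0 / math.hypot(dx, dy)    # normalize the eye distance to 1
    cos_a, sin_a = math.cos(-angle), math.sin(-angle)
    aligned = []
    for x, y in points:
        tx, ty = x - left_eye[0], y - left_eye[1]
        aligned.append((scale * (tx * cos_a - ty * sin_a),
                        scale * (tx * sin_a + ty * cos_a)))
    return aligned


# Hypothetical landmarks from a tilted face: two eyes and a nose tip.
left_eye, right_eye, nose = (10.0, 10.0), (13.0, 14.0), (9.5, 14.5)
aligned = align_landmarks([left_eye, right_eye, nose], left_eye, right_eye)
```

After this transform, the same face photographed at different tilts and sizes produces landmark positions that can be compared directly.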


Training the Neural Network

Train a Deep Convolutional Neural Network to generate 128
measurements for each face

The Neural Network develops accuracy by comparing 3 face
images at a time
– Face image of a known person
– Another picture of the same known person
– Picture of a totally different person

Algorithm compares the measurements generated for each of
those three images

Adjusts neural network slightly to make the two matching faces’
measurements closer, and the different person’s further apart

Repeat millions of times with millions of images of thousands
of different people
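
The three-image comparison above is commonly called a triplet loss. A minimal sketch of the quantity being minimized, assuming squared Euclidean distance and an illustrative margin of 0.2 (the margin value and the 3-number vectors are assumptions for the example; real encodings have 128 numbers):

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Zero once the matching face is closer than the other face by the margin."""
    return max(0.0, squared_distance(anchor, positive)
               - squared_distance(anchor, negative) + margin)


# Hypothetical measurements for the three images.
known_person = [0.0, 0.0, 0.0]
same_person = [0.0, 1.0, 0.0]
other_person = [3.0, 4.0, 0.0]

good = triplet_loss(known_person, same_person, other_person)
bad = triplet_loss(known_person, other_person, same_person)
```

Training nudges the network weights to push this value toward zero for every triplet, which is what moves matching faces together and different faces apart.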
The Training Process

Takes a very long time on conventional
hardware

Upwards of a day, even on an
expensive NVIDIA Tesla video card.

Fortunately, a trained model, such as that provided
by OpenFace, can be applied to new data sets with
consistent accuracy

Once the training process is complete, images of
faces can be effectively identified and compared
Open-Source Facial Recognition
Frameworks

Python wrapper options like OpenFace
and face_recognition
– face_recognition provides a Python interface to the Dlib
C++ facial recognition library
(https://github.com/ageitgey/face_recognition/)

In my use case, I implemented the face_recognition
library for ease of connection to SQL database
management options
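
A small sketch of how encodings might be pulled out of an image with face_recognition. `load_image_file` and `face_encodings` are functions from that library; `encode_faces` and its optional loader/encoder parameters are hypothetical helpers added here so the function can also be exercised without the library installed.

```python
def encode_faces(image_path, load_image=None, encode=None):
    """Return one list of 128 floats per face found in the image."""
    if load_image is None or encode is None:
        # Imported lazily so the sketch can be tried with stand-in functions.
        import face_recognition
        load_image = face_recognition.load_image_file
        encode = face_recognition.face_encodings
    image = load_image(image_path)
    # Convert each encoding to plain floats, ready for a database driver.
    return [[float(value) for value in encoding] for encoding in encode(image)]


# Stand-in loader and encoder, used only to show the shape of the result.
fake_load = lambda path: "pixels"
fake_encode = lambda image: [[0.1] * 128, [0.2] * 128]
encodings = encode_faces("group_photo.jpg", fake_load, fake_encode)
```

Calling `encode_faces("photo.jpg")` with no extra arguments would use the real library, assuming it is installed.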
Creating a Facial
Recognition Database
Implementing an open-source
biometric database

Problems
– Gathering Techniques

Logically sourcing new faces to recognize
– Data Storage Format

Formatting facial recognition data and metadata
– Determining Means of Sorting Entries

Quickly compare faces in real time
– Database Management

Centralizing data to remote servers

Size/database scaling concerns
Data Gathering

Companies and governments may have the option of
using ID card photo databases as direct input to a
facial recognition system
– Metadata such as name, age, and other personal
information can be directly connected to a face

Security systems may attempt to identify the
previous occasions a given face was seen

Specifically taking photographs of individuals to use
for a database is not the only option
Data Crawling

Social media can be crawled to add
theoretically millions of people and
matching metadata

findface.ru exploited a lack of rate-limiting
on the popular Russian social media service VK to allow
users to identify the profiles of millions of people with
just a photo of them

Social media and dating websites are information-rich,
often even including information regarding identity and
opinions which could be used to target specific groups

These same sites provide APIs to streamline this process
Camera Hijacking

Shodan, and tools like masscan, make it
possible to use vulnerable or public
cameras to create a pirate
surveillance system
Data Farming

In exchange for the protection offered by
identifying faces flagged as “threats,”
businesses and public spaces may
willingly provide camera data

This access could then allow for the generation of
specific and rich metadata, such as a person’s
transit patterns and frequent behaviors

This data can have value for marketing, or for
private investigation
Data Application

Once you have, or have begun growing, a
database of known faces, you need a way to
manage it

While most frameworks provide a way to save
images locally, this is inefficient in terms of both
computing speed and data storage

Instead, common DBMS solutions can be used
Data Formatting

Rather than comparing images one by one to
determine if an image is a match, we can store a
precomputed version

In this case, a face is
represented as a set
of unique coordinates

This set is generated
by the neural network
Data Sorting

Organizing facial recognition data is
unlike standard table data

Face coordinates cannot be indexed by
traditional criteria, such as numerical
position, length, average, or sum

Instead, they can be treated as a shape
existing in multidimensional space, with
128 dimensions

What makes two face encodings comparable is their overall
similarity across all 128 measurements, computed using the
formula for Euclidean distance
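
For two encodings a and b, the Euclidean distance is the square root of the sum of (a_i - b_i)^2 over every dimension i. A minimal sketch, using 3-number vectors in place of 128-number encodings; the names and sample values are hypothetical.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance between two points of the same dimension."""
    if len(a) != len(b):
        raise ValueError("encodings must have the same number of dimensions")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def closest_match(query, known):
    """Name of the known encoding nearest to the query encoding."""
    return min(known, key=lambda name: euclidean_distance(query, known[name]))


# Hypothetical stored encodings and a new face to look up.
known = {"alice": [0.0, 0.0, 0.0], "bob": [1.0, 1.0, 1.0]}
match = closest_match([0.9, 1.0, 1.1], known)
```

A smaller distance means the two faces are more alike, so the best candidate is simply the stored encoding with the smallest distance to the query.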
Database Management

Conventional DBMS solutions can be
used, so long as they provide a means
to compute Euclidean distance

PostgreSQL allows this using the
“cube” extension, but it is limited to
100 dimensions by default

Resolved by editing /contrib/cube/cubedata.h
– Change “#define CUBE_MAX_DIM (100)” to “#define
CUBE_MAX_DIM (128)”
– Recompile PostgreSQL
Database Management

Once 128-dimensional cubes are allowed in
PostgreSQL, data can be added and managed like
any other table

SQL data can be ordered by similarity using the cube distance operator (<->):
ORDER BY face_encoding <-> cube(array["+face_encoding+"])
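
One way the ordering clause might be issued from Python, sketched with a parameterized query instead of string concatenation. The table name `faces`, the column `name`, and the helper `build_similarity_query` are assumptions for illustration; the placeholders follow the `%s` style used by psycopg2, which adapts a Python list to a PostgreSQL array.

```python
def build_similarity_query(face_encoding, limit=5):
    """Return (sql, params) for a nearest-faces lookup on a cube column."""
    if len(face_encoding) != 128:
        raise ValueError("expected a 128-dimensional face encoding")
    sql = (
        "SELECT name FROM faces "                        # hypothetical table/column
        "ORDER BY face_encoding <-> cube(%s::float8[]) "  # Euclidean distance
        "LIMIT %s"
    )
    params = ([float(value) for value in face_encoding], limit)
    return sql, params


sql, params = build_similarity_query([0.0] * 128, limit=3)
# With an open psycopg2 cursor this would be run as:
#     cursor.execute(sql, params)
#     rows = cursor.fetchall()
```

Passing the encoding as a parameter keeps the query text constant and leaves quoting to the database driver.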
Database Scaling

Each entry is around 4KB, meaning every 250,000 entries
represents roughly 1GB

At this size, the entire world population could be stored in
less than 30TB

Tested to around 50,000 entries, PostgreSQL retrieves
results in consistently less than a second

Specific databases and contexts could check certain
factors first, speeding this process by limiting the
number of entries compared during the calculation of
Euclidean distance

Parallel databases can be constructed to manage metadata
like location, name, and other personal information
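
The storage estimates above can be checked with simple arithmetic. This sketch assumes decimal units (1 KB = 1,000 bytes) and a world population of 7.5 billion; both are assumptions, not figures from the slides, and the function names are hypothetical.

```python
ENTRY_BYTES = 4_000                  # "around 4KB" per entry
ASSUMED_POPULATION = 7_500_000_000   # assumption for illustration


def entries_per_gigabyte(entry_bytes=ENTRY_BYTES):
    """How many entries fit in one decimal gigabyte."""
    return 1_000_000_000 // entry_bytes


def storage_terabytes(entries, entry_bytes=ENTRY_BYTES):
    """Total storage for the given number of entries, in decimal terabytes."""
    return entries * entry_bytes / 1_000_000_000_000


per_gb = entries_per_gigabyte()
world_tb = storage_terabytes(ASSUMED_POPULATION)
```

Under these assumptions the total lands at roughly the 30 TB scale quoted above; the exact figure depends on the population assumed and on whether decimal or binary units are used.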
Deploying Open-
Source Facial
Recognition Systems
Code Implementation

Python Libraries
– face_recognition to handle facial recognition
– psycopg2 to manage database requests
– icrawler for web crawling

PostgreSQL for DBMS

Despite the daunting complexity of the technology,
current tools allow a functional facial recognition
database with automatic expansion to be created
with only a few hundred lines of code

“Machine Learning is Fun! Part 4: Modern Face
Recognition with Deep Learning”
– https://medium.com/@ageitgey/machine-learning-is-fun-part-4-modern-face-recognition-with-deep-learning-c3cffc121d78


“Euclidean Distance In 'n'-Dimensional Space”
– https://hlab.stanford.edu/brian/euclidean_distance_in.html

“On the Surprising Behavior of Distance Metrics in
High Dimensional Space”
– https://bib.dbvis.de/uploadedFiles/155.pdf

Contact: Ian O'Neill – ianoneill591@gmail.com


https://creator.wonderhowto.com/takhion/
