
A Low-Cost Embedded System for Driver Drowsiness Detection

Humberto Aboud

December 2014
Abstract

This report presents a low-cost approach to a driver drowsiness detection embedded system. The project relies on inexpensive open source hardware and software. The system tracks the user's face and eyes to detect patterns and infer drowsiness. The hardware architecture is composed of a Single-Board Computer (SBC) and a USB camera, and the embedded software relies on the open source library OpenCV. The project involves heavy image processing and depends on aggressive optimization on both the hardware and software sides. The SBC chosen for this project was powerful enough to produce a working system, but a more powerful one would provide better results.

1 Introduction

Brazil's national fleet comprises almost 50 million cars, according to the National Traffic Department [1]. These vehicles provide an efficient and convenient mode of transportation; however, traffic accidents are the second major cause of death in Brazil, as reported by the National Observatory of Traffic Safety [2]. Between 1980 and 2011 alone, 980,838 lives were lost on the national roads [3]. Behind these numbers, statistics from the Center of Multidisciplinary Studies in Somnolence and Traffic Accidents (CEMSA) [4] of the Federal University of São Paulo show that between 26% and 32% of the country's traffic accidents are caused by drivers who fall asleep while driving.
To address this problem, the automobile industry develops a wide range of safety solutions. Ford, for example, produced its own system, the Driver Alert [6]. This system comprises a small forward-facing camera connected to an on-board computer. The camera is mounted on the back of the rear-view mirror and is trained to identify lane markings on both sides of the vehicle. When the vehicle is on the move, the computer looks at the road ahead and predicts where the car should be positioned relative to the lane markings. It then measures where the vehicle actually is and, if the difference is significant, issues a warning.
Unfortunately, the current technology able to detect and prevent this kind of accident is still very expensive. Ford's system, for example, can cost from $2,000 to $12,000 for a single vehicle [7], making it inaccessible to the average household.

This project uses open source hardware and software to develop an inexpensive drowsiness detection system that can be easily reproduced and installed in any car, with a total cost of less than US$ 80.

2 Objective

This project's objective is to compile and express the knowledge obtained by the student during his Computer Engineering undergraduate degree at the University of Campinas.
The undertaking chosen was to produce a low-cost driver drowsiness detection embedded system that can be easily reproduced and installed in any car. To achieve that, the project had five goals:
- Develop an image recognition software to infer the drowsiness state
- Set up an embedded system hardware environment
- Embed the developed software on the hardware
- Optimize the system's performance
- Measure the system's performance and accuracy

3 Materials and Methods

3.1 Approach

There are a few known approaches to driver drowsiness detection; three common ones are eye tracking, heartbeat monitoring, and driving pattern monitoring. This paper tackles the eye tracking approach, due to its potential to provide consistency with low-cost equipment.
Research from the National Association of Sleep [5] relates drowsiness while driving to an increase in the driver's eye blinking frequency. This relation can be explored to develop a system that captures images of the driver's eyes and analyzes them to determine the driver's degree of sleepiness.
To do that, a computer vision software was developed. The software reads a video stream from a camera pointed at a person's face and tracks the face and eyes to determine the person's sleepiness condition. At first, the software was developed and tested in a desktop computer environment so it could be free of performance constraints. After the software was proved to work correctly, the hardware part of the embedded system was set up. With both software and hardware tested and working correctly separately, the software was ported to the hardware, creating the embedded system. The embedded system at first presented weak performance, leading to poor results. To overcome that, performance improvement techniques on both the software and hardware sides were used. The system operation was then tested and the results compiled.

3.2 Face and Eye Recognition Software

The software developed for this system was written in Python [8] and is shown in the Appendix at the end of this report. Its objective is to detect the open/closed state of the user's eyes and use this information to infer the drowsiness condition.
The software uses OpenCV [11] to perform face and eye recognition and defines a threshold to filter the obtained information. Cascade Classification is used to train the software to recognize a certain object using classifiers, files with samples. The following text quoted from OpenCV's website explains how it works:
First, a classifier (namely a cascade of boosted classifiers working
with haar-like features) is trained with a few hundred sample views
of a particular object (i.e., a face or a car), called positive examples,
that are scaled to the same size (say, 20x20), and negative examples
- arbitrary images of the same size.
After a classifier is trained, it can be applied to a region of interest
(of the same size as used during the training) in an input image.
The classifier outputs a 1 if the region is likely to show the object
(i.e., face/car), and 0 otherwise. To search for the object in the
whole image one can move the search window across the image and
check every location using the classifier. The classifier is designed so
that it can be easily resized in order to be able to find the objects
of interest at different sizes, which is more efficient than resizing the
image itself. So, to find an object of an unknown size in the image
the scan procedure should be done several times at different scales.
The developed software uses two classification files openly distributed by OpenCV's developer community:
- haarcascade_frontalface_default.xml for face recognition
- haarcascade_eye_tree_eyeglasses.xml for eye recognition
The software operates as an infinite loop. First, it captures a frame from the camera and analyzes it looking for the user's face. If it does not recognize a face, it captures and analyzes another frame until it finds one. Then the software analyzes the face area of the frame looking for the user's eyes. If it recognizes two (or more) eyes on the face, it increments a threshold variable up to a maximum of ten; otherwise, it decrements it down to a minimum of zero. Then, if the threshold variable is above five, the software outputs that the user's eyes are open and the user is awake; otherwise, it outputs that they are closed, inferring drowsiness.
A flowchart illustrating the algorithm is presented below:

Figure 1: Algorithm for drowsiness condition detection
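The loop and threshold logic described above can also be sketched in plain Python. This is a simplified sketch: the hypothetical helper update_state stands in for one iteration of the loop, and the eyes_detected flag replaces the OpenCV detection step.

```python
def update_state(count, eyes_detected):
    """Update the threshold counter for one frame.

    count: current counter value (0..10)
    eyes_detected: True if two or more eyes were found in the face region
    Returns the new counter value and the inferred eye state.
    """
    if eyes_detected:
        count = min(count + 1, 10)   # saturate at the maximum of ten
    else:
        count = max(count - 1, 0)    # saturate at the minimum of zero
    state = "open" if count > 5 else "closed"
    return count, state

# Simulate a driver whose eyes stay open for 8 frames, then close for 8 frames
count = 0
states = []
for eyes in [True] * 8 + [False] * 8:
    count, state = update_state(count, eyes)
    states.append(state)

print(states[7], states[15])  # open closed
```

Because the counter saturates, a single missed detection does not immediately flip the output; the state only changes after several consecutive frames agree.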


To analyze the frame and perform face and eye recognition, the software uses OpenCV's detectMultiScale() function. Given an image, this function uses the classification file and two parameters to detect a certain object in it. The first parameter is a scale factor related to the object's expected size. The second parameter is a minimum number of neighbors, candidates for the expected object, required to characterize a recognition. The following images explain this concept in more detail.
For the first image, the minimum number of neighbors chosen was 1, and the software recognized many faces in the picture. For the second image, the minimum number of neighbors was 3, making the software consider only objects with three or more neighbors, leading to fewer faces recognized and a more precise result. For the built software, the best value for the minNeighbors parameter was 3, obtained through trial and error.

Figure 2: A detectMultiScale() call with minNeighbors parameter 1

Figure 3: A detectMultiScale() call with minNeighbors parameter 3
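The effect of the minNeighbors parameter can be imitated in pure Python: treat each detection candidate as a rectangle, count how many other candidates overlap it, and keep only those supported by at least minNeighbors others. This is a simplified sketch of the idea, not OpenCV's actual grouping code.

```python
def overlaps(a, b):
    """True if rectangles a and b, given as (x, y, w, h), intersect."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah

def filter_candidates(cands, min_neighbors):
    """Keep candidates that overlap at least min_neighbors other candidates."""
    kept = []
    for i, c in enumerate(cands):
        neighbors = sum(1 for j, d in enumerate(cands) if j != i and overlaps(c, d))
        if neighbors >= min_neighbors:
            kept.append(c)
    return kept

# Three overlapping candidates around a real face, plus one isolated false positive
cands = [(10, 10, 20, 20), (12, 11, 20, 20), (9, 12, 20, 20), (100, 100, 20, 20)]
print(len(filter_candidates(cands, 1)))  # 3: the clustered candidates survive
print(len(filter_candidates(cands, 3)))  # 0: no candidate has three neighbors
```

As in the figures, a low minNeighbors keeps loosely supported candidates, while a higher value keeps only detections confirmed by several overlapping hits.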

3.3 Hardware Setup

In order to make this system a fully embeddable device that can be easily installed in any car, the system uses a Single-Board Computer (SBC) as its core processing unit. Single-Board Computers are small, light, low-power-consumption computers with considerable computational capability. They can run operating systems and control and communicate with other hardware devices.
There are several SBCs available on the market with different processing power, size, power consumption, and price. Some of the most common SBCs on the market are displayed in Table 1.
The Raspberry Pi is a popular SBC with a good cost-benefit ratio, but its processing power leaves something to be desired. The PandaBoard has very good processing power, but its hardware is not open source, making it very expensive and the system's overall price impractical. Freescale's RIoTboard has reasonable processing power and price, but has no support for vector processing, which the image processing of this system requires. The BeagleBone Black, from Texas Instruments, is a low-power, open source hardware SBC that has a modest price, reasonable processing power, and vector processing support, fitting the needs of this project.

SBC               | CPU Architecture | Cores | Frequency | Average Price
Raspberry Pi      | ARM11            | 1     | 700 MHz   | US$ 35.00
BeagleBone Black  | ARM Cortex-A8    | 1     | 1 GHz     | US$ 58.00
PandaBoard ES     | ARM Cortex-A9    | 2     | 1.2 GHz   | US$ 180.00
RIoTboard         | Freescale i.MX6  | 1     | 1 GHz     | US$ 79.00

Table 1: Comparison of Single-Board Computers^1 [10]

The BeagleBone Black has strong support for vector processing; its greatest advantage is the integrated Neon SIMD^2 processor block [9]. The SIMD unit provides parallel operations, meaning that during the execution of one instruction the same operation can occur on up to 16 data sets in parallel. This is a great advantage for the image processing required by the computer vision software. The BeagleBone Black was therefore chosen as the core processing unit for the embedded system.

Figure 4: SBC BeagleBone Black
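The benefit of this kind of data parallelism can be illustrated with NumPy, which the detection software already imports: one vectorized expression applies the same operation to many elements at once. This is only an analogy for the concept, not code that runs on the Neon block itself.

```python
import numpy as np

# A simulated row of 16 grayscale pixel values (one Neon-style batch)
pixels = np.array([10, 20, 30, 40, 50, 60, 70, 80,
                   90, 100, 110, 120, 130, 140, 150, 160], dtype=np.uint8)

# Scalar style: one element processed per step
halved_loop = np.array([p // 2 for p in pixels], dtype=np.uint8)

# Vectorized style: the same operation expressed over all 16 elements at once
halved_vec = pixels // 2

print(bool((halved_loop == halved_vec).all()))  # True
```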


To capture images in order to track the driver's eyes, a simple USB camera was used; almost any available USB camera with VGA quality or better would work. For this project, we chose Nipponic's NIP-VC76P webcam, because it is a cheap device and it was available for this project. The average market price for this camera is US$ 20.00. The camera has VGA image quality and USB 2.0 communication.

Figure 5: USB camera used to capture streaming video


The hardware was then assembled. The camera was plugged into the SBC's USB input and the SBC was connected to the power supply, fed with 5 volts and 1 ampere of DC current. The final system is shown below.

1. Prices quoted from the distributors' websites on the publication date of this paper.
2. Single Instruction, Multiple Data.

Figure 6: Final system assembled

3.4 SBC Environment

The open source Linux distribution Debian 7 was installed on the SBC. Debian was chosen because it has a strong support community that maintains and fixes many issues, leading to a very reliable and stable operating system, a required property for this project given the low fault tolerance of the addressed problem.
A remote connection over the SSH protocol was then established between the SBC and the desktop computer. The desktop computer had full access to the SBC's superuser, making it possible to install the necessary software, such as drivers, compilers, and interpreters.
The USB camera did not have a driver supporting the ARM Cortex-A8 architecture of the SBC's processor, making communication with the SBC a hard task that was solved by setting up low-level system configurations.

3.5 Optimization

Even though SBCs have much more powerful processing capability than microcontrollers, they normally still lag behind desktop computers, GPUs^3, or FPGAs^4. Therefore, in order to execute the designed software, which requires heavy processing, it was necessary to perform optimizations on the system. The optimization took place on two fronts: the software and the hardware.
3. Graphics Processing Unit.
4. Field-Programmable Gate Array.

On the software front, the frame size read as input from the camera was reduced, and the code used to interact with the SBC, such as that for monitoring the processed image, was removed. The smallest frame size that still gave consistent results, obtained by trial and error, was 160x120. That led to faster performance due to the smaller number of pixels that had to be analyzed by the software. Removing the code used to interact with the software, such as the code that drew rectangles around the recognized face and eye areas, also made the software faster, since it executed fewer instructions. At the beginning, there was a hypothesis that converting the image to shades of gray would make the analysis faster due to fewer pixel color values, but it was later shown to be false.
On the hardware front, the SBC's clock frequency was modified. The BeagleBone's default clock frequency is 300 MHz, which was clearly below the project's needs. To change that, the Linux application cpufrequtils [12] was used. This tool provides an interface to control the CPU frequency. The SBC's clock was then set to run at 1 GHz, its maximum safe clock speed according to Texas Instruments, the manufacturer. With that, the hardware's processing power was substantially increased.
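A typical cpufrequtils session looks like the following sketch; the exact governors and frequency steps available depend on the board's kernel configuration.

```shell
# Inspect the current frequency limits, available steps, and governor
cpufreq-info

# Switch to the userspace governor so a fixed frequency can be requested
sudo cpufreq-set -g userspace

# Pin the CPU to its 1 GHz operating point
sudo cpufreq-set -f 1000MHz
```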

3.6 Measurement

To measure the system's performance, the recognition time was measured while varying the input frame size. Furthermore, the software's performance was compared when running on the SBC and on a desktop computer. The embedded system's processor was the BeagleBone's ARM Cortex-A8, and the desktop computer's was an Intel Core i5.
The measuring procedure was to calculate the time required to process the eye detection, regardless of the output, after the face was detected. The time was measured twenty times for each case and the standard deviation was taken. To do that, the Python library function time.process_time() was used. This function is useful for measuring performance because it takes into account both system and user CPU time, as stated in the function's description on Python's website:
time.process_time()
Return the value (in fractional seconds) of the sum of the system
and user CPU time of the current process. It does not include time
elapsed during sleep. It is process-wide by definition. The reference
point of the returned value is undefined, so that only the difference
between the results of consecutive calls is valid.
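The measurement procedure described above can be sketched as follows. This is a simplified sketch: detect_eyes is a hypothetical placeholder for the OpenCV eye detection call, and the reported numbers are illustrative only.

```python
import time
import statistics

def detect_eyes(frame):
    # Placeholder for eye_cascade.detectMultiScale(frame, ...);
    # it just burns a little CPU so there is something to time.
    return sum(range(10000))

def measure(frame, runs=20):
    """Time the detection step `runs` times; return mean and standard deviation."""
    samples = []
    for _ in range(runs):
        start = time.process_time()            # system + user CPU time
        detect_eyes(frame)
        samples.append(time.process_time() - start)
    return statistics.mean(samples), statistics.stdev(samples)

mean, dev = measure(frame=None)
print("mean: %.6fs, stdev: %.6fs" % (mean, dev))
```

Because process_time() counts CPU time of the current process only, time the process spends sleeping or waiting on the camera does not inflate the measurement.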

3.7 Testing

To measure the system's functionality, four mutually exclusive and collectively exhaustive states were defined:
- True Open: the user's eyes are open and the system detects them as open
- True Closed: the user's eyes are closed and the system detects them as closed
- False Open: the user's eyes are closed and the system detects them as open
- False Closed: the user's eyes are open and the system detects them as closed
These four states cover all the system's possible states.

Figure 7: True Open state

Figure 8: True Closed state


Figure 9: False Open state

Figure 10: False Closed state


To measure the system's accuracy in determining the user's drowsiness condition, an efficiency ratio was defined:

Efficiency = (TrueOpen + TrueClosed) / (TrueOpen + TrueClosed + FalseOpen + FalseClosed)
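The ratio can be computed directly from the four counts; a minimal sketch of the calculation, using the first embedded-system test from Table 3 as an example:

```python
def efficiency(true_open, true_closed, false_open, false_closed):
    """Fraction of outputs where the detected eye state matched reality."""
    correct = true_open + true_closed
    total = correct + false_open + false_closed
    return correct / total

# Test 1 from Table 3: 86 + 69 correct out of 192 outputs
ratio = efficiency(86, 69, 22, 15)
print(round(ratio * 100))  # 81
```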

To classify each of the system's outputs into one of the four states, the testing procedure used recorded videos instead of a live stream and recorded the frame time associated with each output, so it could be validated by a human observer later. A one-minute video was recorded of a person alternating her eye state between open and closed. Then the software was executed and its outputs recorded. Each output was associated with a time frame in the video; after the software finished its execution, a human observer checked each output and classified it as one of the four states defined earlier.
This measurement procedure adds a performance constraint to the software, because it adds execution steps that make the software slower and change its accuracy. However, it is still useful for comparing the performance between the SBC and the desktop, since it introduces the same constraints on both platforms. The image below shows the software's output associated with a time frame.

Figure 11: Software's output associated with the video's time frame
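Associating each output with a timestamp in the recorded video can be sketched as below. This is a simplified sketch: a real implementation would read frames with cv2.VideoCapture and use the video's actual frame rate, and the outputs list here is simulated.

```python
def frame_timestamp(frame_index, fps=30.0):
    """Time in seconds of a given frame within the recorded video."""
    return frame_index / fps

# Simulated per-frame outputs from the detector
outputs = ["open", "open", "closed", "closed", "open"]

# Pair each output with its timestamp so a human observer can validate it later
log = [(round(frame_timestamp(i), 3), state) for i, state in enumerate(outputs)]
for t, state in log:
    print("%.3fs: %s" % (t, state))
```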

4 Results

4.1 Performance

The system's performance was measured while varying the input video stream's frame size on both the SBC and the desktop computer, with a total of 4 different frame sizes: 160x120, 240x180, 320x240, and 640x480. The measured results are presented in Table 2 and illustrated in Figure 12 below.


Frame Size | Intel Core i5       | ARM Cortex-A8
160x120    | 0.012 s ± 0.009 s   | 0.18 s ± 0.04 s
240x180    | 0.016 s ± 0.001 s   | 0.33 s ± 0.05 s
320x240    | 0.04 s ± 0.01 s     | 0.83 s ± 0.04 s
640x480    | 0.11 s ± 0.02 s     | 2.18 s ± 0.07 s

Table 2: Execution performance comparison between the SBC and the desktop computer

Figure 12: Performance analysis (in seconds)

4.2 Accuracy

The system's accuracy was measured as described in Section 3.7. The procedure was performed with four different people. For comparison, the measurement was also executed on the desktop computer, with a shorter period of 20 seconds, due to the high number of outputs that the desktop computer was able to emit. The results are presented in the tables below.
Test | True Open | True Closed | False Open | False Closed | Total | Efficiency
1    | 86        | 69          | 22         | 15           | 192   | 81%
2    | 71        | 62          | 23         | 20           | 176   | 76%
3    | 61        | 55          | 38         | 31           | 185   | 63%
4    | 69        | 66          | 31         | 24           | 190   | 71%

Table 3: Accuracy test for the embedded system (60-second period)

Test | True Open | True Closed | False Open | False Closed | Total | Efficiency
1    | 102       | 92          | 9          | 11           | 214   | 91%
2    | 101       | 72          | 14         | 8            | 195   | 89%
3    | 103       | 74          | 11         | 15           | 203   | 87%
4    | 92        | 95          | 21         | 13           | 221   | 85%

Table 4: Accuracy test for the detection software running on the desktop computer (20-second period)

5 Discussion

On the performance side, even though the instructions used for measurement added a performance constraint, the output was still consistent enough to provide meaningful data. Increasing the frame size affected the system's performance quadratically. That happens because the frame is a two-dimensional pixel matrix: when its side lengths vary, the number of pixels varies quadratically, and so does the necessary processing.
Another clear behavior that can be observed is the performance difference between the SBC (ARM Cortex-A8) and the desktop computer (Intel Core i5). On average, the software runs around 20 times faster on the desktop. That happens due to the difference in the nature of the architectures: the desktop has a much more powerful processor and hardware such as GPUs to perform graphics processing, while the SBC has less processing capability and graphics processing support.
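The quadratic relation can be checked against the measurements in Table 2: the pixel count grows with the frame area, and the measured times track it roughly. A sketch of the arithmetic, using the mean ARM Cortex-A8 times:

```python
# Frame sizes and mean ARM Cortex-A8 times from Table 2
sizes = [(160, 120), (240, 180), (320, 240), (640, 480)]
times = [0.18, 0.33, 0.83, 2.18]

base_pixels = sizes[0][0] * sizes[0][1]  # 19200 pixels at 160x120
for (w, h), t in zip(sizes, times):
    pixel_factor = (w * h) / base_pixels
    time_factor = t / times[0]
    print("%dx%d: pixels x%.2f, time x%.2f" % (w, h, pixel_factor, time_factor))
```

Going from 160x120 to 640x480 multiplies the pixel count by 16 while the measured time grows by about 12, so the scaling is roughly, though not exactly, quadratic in the frame side length.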
The embedded system would present better performance if the hardware used were a more powerful or specialized device, such as an FPGA or GPU. This type of hardware is normally more expensive than an SBC, but it is still possible to find a simple, reasonably priced GPU or FPGA that performs better without making the system's cost impractical.
A hardware approach that could potentially improve the system's performance considerably would be to overclock the SBC above the 1 GHz used. At this frequency the board was not even warm, suggesting that with the right cooling techniques it could reach a much higher clock frequency. Unfortunately, given the project's low budget, it was not possible to risk losing the SBC by overclocking it.
A software approach that could potentially improve the performance would be to write the embedded code in a lower-level language, such as assembly. This approach would make the instructions execute faster, due to the efficiency difference between high- and low-level languages. It was not taken due to the project's time constraints.
On the accuracy side, we can observe from the two tables the link between the system's performance and its accuracy efficiency. This happens because when the image processing gets too slow, the system skips frames and uses less information to determine the output, leading to more mistakes. Therefore, better performance on the embedded system, on both the hardware and software sides, would provide more accurate results.


With the results obtained, the system had an average accuracy efficiency of 72.75%. This is still too low for this application, since it deals with people's lives. Furthermore, the system was developed without taking into consideration conditions such as changes in the driver's head position, drivers of different heights, and the varying light conditions faced by a car.
A big potential for performance improvement was identified. Even though the system presents humble accuracy results, it shows great potential, since it was built with a budget of less than US$ 80.00, in contrast with current market systems priced in the range of $2,000 to $12,000.

6 Conclusion

This project summarizes the knowledge obtained by the student during his Computer Engineering undergraduate degree at the University of Campinas. Knowledge acquired from many fields of study in computer engineering was used. The fields that mainly provided the necessary skills to produce this project were:
- Computer Vision
- Image Processing
- Embedded Systems
- Computer Architecture
- Computer Networks
Computer Vision knowledge was used to understand how to develop software that detects the drowsiness condition by analyzing the image of the user's face. Image Processing knowledge was used to optimize and improve the detection software. Embedded Systems knowledge was used to build and set up the embedded system's hardware, to develop the embedded software taking its constraints into consideration, and to combine both into an integrated system. Computer Architecture knowledge was used to define the best-fit hardware for the application and to set it up and optimize its performance. Computer Networks knowledge was used to understand how to develop network communication on an SBC, using protocols to set up its environment and communication.
The project demonstrated that the product is not yet ready for real-life application. However, it presented promising results showing strong potential. Some conditions still need to be taken into account in its operation, and more testing techniques must be applied to prove its real capability of providing a working system, but the large room for improvement identified showed the prospect of producing one.
This project synthesizes the student's computer engineering undergraduate degree by combining both software and hardware skills to propose a solution to a problem that affects society directly. Furthermore, the project proposes a low-cost technological solution, providing wider household access to it.


References

[1] DENATRAN - Departamento Nacional de Trânsito. http://www.denatran.gov.br
[2] ONSV - Observatório Nacional de Segurança Viária. http://www.onsv.org.br
[3] CEBELA - Centro Brasileiro de Estudos Latino-Americanos. http://www.cebela.org.br
[4] CEMSA - Centro de Estudo Multidisciplinar em Sonolência e Acidentes. http://www.cemsa.com.br
[5] FUNDASONO - Fundação Nacional do Sono. http://www.fundasono.org.br
[6] Ford's Driver Alert. http://www.euroncap.com/rewards/ford_driver_alert.aspx
[7] Sleep Review: Drowsy Driving Technologies. http://www.sleepreviewmag.com/2013/11/alert-alive
[8] Python. https://www.python.org/
[9] Cortex-A8. http://processors.wiki.ti.com/index.php/Cortex-A8
[10] SBC Comparison. http://en.wikipedia.org/wiki/Comparison_of_single-board_computers
[11] OpenCV Cascade Classification. http://docs.opencv.org/modules/objdetect/doc/cascade_classification.html
[12] CPU Frequency Scaling. https://wiki.archlinux.org/index.php/CPU_frequency_scaling


Appendix: Drowsiness Detection Software

import numpy as np
import cv2

face_cascade = cv2.CascadeClassifier('haarcascade_frontalface_default.xml')
eye_cascade = cv2.CascadeClassifier('haarcascade_eye.xml')
count = 0

cap = cv2.VideoCapture(0)
cap.set(cv2.cv.CV_CAP_PROP_FRAME_WIDTH, 160)
cap.set(cv2.cv.CV_CAP_PROP_FRAME_HEIGHT, 120)

while True:
    retVal, img = cap.read()
    faces = face_cascade.detectMultiScale(img, 1.3, 3)
    for (x, y, w, h) in faces:
        # Restrict the eye search to the detected face region
        roi_color = img[y:y + h, x:x + w]
        eyes = eye_cascade.detectMultiScale(roi_color, 1.1, 7)
        if len(eyes) >= 2:
            if count < 10:
                count += 1
        else:
            if count > 0:
                count -= 1
        if count > 5:
            print 'open'
        else:
            print 'closed'
