
Adaptive Security System using Biometric Technology

Abdalla I. Abrwais, Jalal Sarar


Dept. of Electrical and Electronic Engineering,
Misurata University,
Misurata, Libya

MATLAB is presented in [6]. In [7] the author presents a new
design method that can be used to identify the speaker's voice
by extracting the MFCCs from the wavelet transform of the
voice.

Abstract: Speech recognition systems are made to process the
words of the system's admin speaker to execute specific tasks such
as remote computer control. This procedure makes it possible to
use the speaker's voice to verify his/her identity. Many papers
present different ways to implement speech recognition systems;
in this paper, however, we propose a simple method to enhance
and implement an adaptive biometric recognition system. The
system uses three algorithms, namely the Mel Frequency Cepstral
Coefficient (MFCC), the fast Fourier transform (FFT) and the
discrete wavelet transform (DWT), to extract features from the
speaker's voice. These features are stored as data history and
used to modify the extracted features of the same user the next
time he/she uses the system. Results show that this new method
enhances the recognition system compared with other methods such
as the MFCC algorithm.

Keywords: Arduino, MFCC, biometrics, FFT.

[Fig. 1 block diagram with blocks: Input Voice, Features Extraction,
Speaker Model, Ref. Model, Decision (Accept or Reject)]

Fig. 1 Speaker Verification [4]

I. INTRODUCTION

In this paper, a different approach is proposed to enhance the
voice recognition system (VRS). The approach makes use of data
history to update the extracted features of the speaker in a
manner related to the speaker's time schedule. For example, if
the speaker uses the system in the morning between 8 and 9 AM
every day, the system will take this time, and how many times the
speaker logs in, into consideration as one of the features.
Moreover, the proposed algorithm has been implemented in practice
by means of a microcontroller named ATimeg128. This new technique
can be used in high security systems such as banks and
governmental agencies.

The human body contains a set of biometric identities. The
structure of the vocal tract is unique for every person.
Therefore, voice information can be used to design security
systems. However, voice recognition still has many technical
problems [1].
Voice recognition is a broad subject, and problems such as the
impact of noise have generally kept its commercial and personal
use rare [2]. Voice recognition is the process by which a system
identifies the voice of a person and the spoken password words.
Voice recognition can be split into two broad types [3]:

The rest of the paper is organized as follows. In Section II,
the methodology of the proposed VRS is presented. This is
then followed by a description of the wavelet algorithm in
Section III. The feature extraction using the proposed VRS is
analyzed, with a description of some simulation results, in
Section IV. In Section V, the circuit implementation using the
"ATmeg228" microcontroller is presented. Results obtained
from computer simulations and from the microcontroller under
some specific conditions are presented in Section VI. Finally,
Section VII concludes the paper.

1. Text dependent: recognition relies on specific keywords or
phrases.
2. Text independent: recognition is not tied to the text being
said and is more flexible.
Usually, any recognition system is composed of the main blocks
shown in Fig. 1. All voice recognition systems (VRS) depend on
the features that are extracted from the voice. These features
are then compared with a list of saved reference features. If
the result is within an acceptable range of the reference
template, the voice is accepted; otherwise it is rejected [4].

II. METHODOLOGY
The research methodology adopted in this work is described
graphically in the block diagram shown in Fig. 2. As shown in
Fig. 2, the first step is the preprocessing step, which detects
the silence period of the input voice. To do so, a multiple
level DWT is used to calculate the different threshold levels.
This step is very important in any front end VRS and it will be
explained in more detail in Section IV.

In general, a VRS has to serve two different phases. The first
is referred to as the training phase, while the second is the
testing phase [5].
Many works that focus on the implementation and enhancement of
VRS have been introduced in the literature. In [5], the author
implements a VRS using the Mel Frequency Cepstral Coefficients
(MFCC) and Discrete Wavelet Transform (DWT) algorithms. An
algorithm is studied and verified by means of

[Fig. 2 block diagram with blocks: Input voice, Preprocessing,
Silence detection, Features Extraction, Windowing, FFT,
Mel-frequency warping, Mallat's DWT, Matching, Ref. Model,
Data History, Threshold Modified, output]

Fig. 2 Process of adaptive voice recognition system

[Fig. 4 block diagram; recoverable labels: Mallat's DWT, DWT, output]

Fig. 4 Block diagram of the algorithm

The second step, which comes after the preprocessing step, is a
feature extraction step in which the FFT, Mallat's DWT and the
MFCC are used to extract the voice features. Finally, a
comparison between the extracted features and the reference
model is made to decide whether the voice is accepted or
rejected. This decision is also stored in memory, so that the
next time the system will take into account how many times the
user has tried to log in and at what times. This step makes the
decision more accurate than before.

The system uses the DWT to calculate the noise power of four
signals recorded by one person; the maximum power is then used
as the threshold that marks the starting point for silence
detection. For example, let the word "YES" be the input to the
system shown in Fig. 4. By comparing the time domain plot of the
word before the silence detection stage, as shown in Fig. 5,
with the plot after it, as shown in Fig. 6, we notice that the
signal after the silence detection stage is shifted to the left.
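
The thresholding idea described above can be illustrated with a minimal Python sketch. It assumes a one-level Haar wavelet for the DWT, that the first `noise_len` samples of each training recording contain only background noise, and a fixed frame length. None of these choices is stated in the paper, and all function names are hypothetical.

```python
import math

def haar_details(x):
    """One-level Haar DWT detail (high-pass) coefficients, downsampled by two."""
    return [(x[i] - x[i + 1]) / math.sqrt(2) for i in range(0, len(x) - 1, 2)]

def power(x):
    """Mean squared value of a sequence (0.0 for an empty one)."""
    return sum(v * v for v in x) / len(x) if x else 0.0

def noise_threshold(recordings, noise_len=64):
    """Largest detail-band power found in the leading part of the recordings.

    Assumption: the first `noise_len` samples of every recording are silence.
    """
    return max(power(haar_details(r[:noise_len])) for r in recordings)

def trim_leading_silence(signal, threshold, frame_len=64):
    """Drop leading frames whose detail-band power does not exceed the threshold."""
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        frame = signal[start:start + frame_len]
        if power(haar_details(frame)) > threshold:
            return signal[start:]
    return []  # no frame rose above the noise level

def synthetic(noise_amp, n_noise=128, n_word=256):
    """Synthetic test signal: weak alternating noise followed by a strong 'word'."""
    noise = [noise_amp * (-1) ** n for n in range(n_noise)]
    word = [0.5 * (-1) ** n for n in range(n_word)]
    return noise + word

# Four training recordings of one person, as in the training procedure.
training = [synthetic(a) for a in (0.010, 0.020, 0.015, 0.005)]
threshold = noise_threshold(training)
trimmed = trim_leading_silence(synthetic(0.010), threshold)
```

Taking the maximum over the four recordings gives a conservative threshold, so a frame is treated as speech only if it is louder than the noisiest training example. Removing the leading silence is what shifts the word to the left in the time domain plot.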
III. WAVELET THEORY

The fundamental idea behind wavelets is to analyze a signal
according to scale. Wavelets have gained a lot of interest in
the areas of signal processing, numerical analysis and
mathematics in recent years [7,8]. The wavelet transform is an
advanced technique of signal analysis. It was developed as an
alternative to the short time Fourier transform [9] to overcome
problems related to its frequency and time resolution
properties. The standard DWT multi-resolution filter bank is
illustrated in Fig. 3.

[Fig. 5 plot: normalized amplitude versus samples]

Fig. 5 The word "YES" signal.

The filter bank consists of recursively applied high pass
filters for generating details and low pass filters for
generating approximations of the input signal. In the DWT, the
output of each filter is downsampled by a factor of two before
being passed to the next level in the multi-resolution analysis.
This process produces the DWT coefficients [9].
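
The recursive filter bank can be sketched in a few lines of Python. The sketch assumes the Haar wavelet, whose low pass and high pass filters are a scaled pairwise sum and difference; the paper does not say which wavelet family is used, and the function names are hypothetical.

```python
import math

def haar_dwt_level(x):
    """One filter bank level: low pass and high pass outputs, each downsampled by two."""
    s = math.sqrt(2)
    approx = [(x[i] + x[i + 1]) / s for i in range(0, len(x) - 1, 2)]
    detail = [(x[i] - x[i + 1]) / s for i in range(0, len(x) - 1, 2)]
    return approx, detail

def haar_dwt(x, levels):
    """Multi-level decomposition: only the approximation is passed to the next level."""
    approx = list(x)
    details = []
    for _ in range(levels):
        approx, detail = haar_dwt_level(approx)
        details.append(detail)
    return approx, details

signal = [4, 6, 10, 12, 8, 6, 5, 5]
approx, details = haar_dwt(signal, levels=2)

# The orthonormal Haar transform preserves the signal energy.
energy_in = sum(v * v for v in signal)
energy_out = sum(v * v for v in approx) + sum(v * v for d in details for v in d)
```

Each level halves the number of samples, so after two levels the eight input samples are represented by two approximation coefficients plus two and four detail coefficients.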

[Fig. 6 plot, titled "yes word": normalized amplitude versus samples]

Fig. 6 The word "YES" after silence detection.

Fig. 3 DWT transform

Second, a Hamming window is applied to each individual frame so
as to minimize the signal discontinuities at both the beginning
and the end of the frame. The concept here is to minimize the
spectral distortion by using the window to taper the signal
towards zero at the beginning and the end of the frame. Fig. 7
shows the result of the windowing process when the word "YES"
is used.
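
As an illustration of the windowing step, the following Python sketch splits a signal into frames and multiplies each frame by a Hamming window. The frame length and hop size are assumptions chosen for the example, since the paper does not give them, and the function names are hypothetical.

```python
import math

def hamming(n_points):
    """Symmetric Hamming window: w[n] = 0.54 - 0.46 * cos(2*pi*n / (N - 1))."""
    if n_points == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (n_points - 1))
            for n in range(n_points)]

def frame_signal(signal, frame_len, hop):
    """Split the signal into (possibly overlapping) frames of equal length."""
    return [signal[s:s + frame_len]
            for s in range(0, len(signal) - frame_len + 1, hop)]

def window_frames(frames):
    """Multiply every frame sample by the matching window sample."""
    w = hamming(len(frames[0]))
    return [[x * wn for x, wn in zip(frame, w)] for frame in frames]

signal = [1.0] * 10
frames = frame_signal(signal, frame_len=4, hop=2)   # assumed sizes
windowed = window_frames(frames)
w5 = hamming(5)
```

Note that the Hamming window tapers the frame edges to 0.08 of their original amplitude rather than exactly to zero, which is still enough to reduce the discontinuities between frames.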

IV. FEATURE EXTRACTION
A block diagram of the structure of the processing algorithm is
shown in Fig. 4.
First, the input voice signal is applied to the silence detection
stage, which is very important in any front end speaker
recognition system. This stage basically detects where the
speaker starts uttering the word.

V. CIRCUIT IMPLEMENTATION

To test the algorithm presented in Fig. 4, a microcontroller
named "ATmeg228" is used with the aid of a MATLAB program. The
whole circuit board used in this experiment is shown in Fig. 9.
The microcontroller automatically reads the output data
according to the condition reported by the MATLAB software. If
the voice input is accepted as an admin user, the
microcontroller activates the LCD display to show "WELCOME",
turns on the green LED indicator, and opens the magnet door
lock. If, on the other hand, the voice input is rejected and the
system identifies the user as an impostor, the microcontroller
turns on the buzzer and the red LED indicator, the LCD display
shows "SORRY", and the magnet door lock remains locked. A
microphone is used to record the user's voice.
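
The output behaviour described above amounts to a small lookup from the accept/reject decision to the state of each peripheral. The Python sketch below only models that mapping; it is not the microcontroller firmware, and the class and field names are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Outputs:
    """State of the peripherals driven by the microcontroller."""
    lcd_text: str
    green_led: bool
    red_led: bool
    buzzer: bool
    door_unlocked: bool

def outputs_for(accepted):
    """Map the recognition decision to the peripheral states described in the text."""
    if accepted:
        return Outputs(lcd_text="WELCOME", green_led=True, red_led=False,
                       buzzer=False, door_unlocked=True)
    return Outputs(lcd_text="SORRY", green_led=False, red_led=True,
                   buzzer=True, door_unlocked=False)

admin = outputs_for(True)
impostor = outputs_for(False)
```

Keeping the mapping in one place makes it easy to check that the door can only be unlocked on the accepted path.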

[Fig. 7 plot: normalized amplitude versus samples]

Fig. 7 The word "YES" after windowing using a Hamming window.

Third, after the windowing process, the signal is converted to
the frequency domain, X(ω), by means of the Fast Fourier
Transform (FFT). The result of this stage is shown in Fig. 8
for the word "YES".
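
A minimal Python sketch of this step is given below. It uses a textbook recursive radix-2 FFT, which requires the frame length to be a power of two; this is an illustrative stand-in, as the paper relies on MATLAB and does not describe its FFT implementation. The test tone and frame length are assumptions.

```python
import cmath
import math

def fft(x):
    """Recursive radix-2 decimation-in-time FFT (length must be a power of two)."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out

def magnitude_spectrum(frame):
    """Magnitude of the FFT for the non-negative frequency bins only."""
    spectrum = fft(frame)
    return [abs(v) for v in spectrum[:len(frame) // 2 + 1]]

# Assumed example: a cosine with exactly 4 cycles in a 32-sample frame.
frame = [math.cos(2 * math.pi * 4 * n / 32) for n in range(32)]
mags = magnitude_spectrum(frame)
peak_bin = max(range(len(mags)), key=lambda k: mags[k])
```

For a real input, the upper half of the spectrum mirrors the lower half, which is why only the first N/2 + 1 bins are kept.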
[Fig. 8 plot: normalized amplitude versus samples]

Fig. 9 The circuit implementation

In this paper, we designed a GUI for the program. Fig. 10 shows
the training window. In this window we can select the ID number
and record the password word four times to calculate the
threshold value needed in the silence detection stage. The
program then calculates the feature extraction parameters and
saves them in a database. Fig. 11 shows the testing window. In
this window, each user enters his/her ID number, and the system
then asks to hear the password. After that, the acceptance or
rejection is shown on the screen.

Fig. 8 The word "YES" after the Fourier transform.

Psychophysical studies have shown that human perception of the
frequency content of sounds for speech signals does not follow
a linear scale. Thus, for each tone with an actual frequency, f,
measured in Hz, a subjective pitch is measured on a scale called
the mel scale. The mel frequency scale has a linear frequency
spacing below 1000 Hz and a logarithmic spacing above 1000 Hz.
As a reference point, the pitch of a 1 kHz tone, 40 dB above the
perceptual hearing threshold, is defined as 1000 mels. Therefore,
we can use the following approximate formula to compute the mels
for a given frequency f in Hz [8]:

Mel(f) = 2595 * log10(1 + f / 700)        (1)
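
Equation (1) translates directly into code. The Python sketch below implements it with a base-10 logarithm, together with its algebraic inverse; the inverse is not given in the paper and is included only because it is needed when placing filters on the mel scale. The function names are hypothetical.

```python
import math

def hz_to_mel(f_hz):
    """Equation (1): mel value for a frequency in Hz."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(mel):
    """Algebraic inverse of equation (1)."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

reference_mel = hz_to_mel(1000.0)           # close to the 1000 mel reference point
round_trip_hz = mel_to_hz(hz_to_mel(440.0))
```

Evaluating the formula at 1000 Hz gives a value very close to 1000 mels, which matches the reference point quoted above.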

The energy within each triangular window on the mel scale is
obtained, and the Discrete Cosine Transform (DCT) is then
applied to convert the result back to the time domain and to
achieve better compaction within a small number of coefficients.
The resulting coefficients are called the Mel Frequency Cepstral
Coefficients (MFCC). Finally, we use the DWT to filter out the
noise and save only the low pass component in a reference file.
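
The filter bank and DCT steps can be sketched as follows. This is a minimal Python illustration, not the authors' MATLAB code: the number of filters, the number of retained coefficients and the 8 kHz sample rate are assumptions, and the logarithm taken before the DCT is the conventional MFCC step, which the text above does not mention explicitly. All function names are hypothetical.

```python
import math

def hz_to_mel(f_hz):
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def triangular_filterbank(n_filters, n_bins, sample_rate):
    """Triangular windows spaced evenly on the mel scale from 0 Hz to sample_rate / 2."""
    nyquist = sample_rate / 2.0
    max_mel = hz_to_mel(nyquist)
    edges_hz = [mel_to_hz(max_mel * i / (n_filters + 1)) for i in range(n_filters + 2)]
    edges = [e / nyquist * (n_bins - 1) for e in edges_hz]  # fractional bin positions
    bank = []
    for m in range(1, n_filters + 1):
        lo, mid, hi = edges[m - 1], edges[m], edges[m + 1]
        row = []
        for k in range(n_bins):
            if lo < k <= mid:
                row.append((k - lo) / (mid - lo))    # rising edge
            elif mid < k < hi:
                row.append((hi - k) / (hi - mid))    # falling edge
            else:
                row.append(0.0)
        bank.append(row)
    return bank

def dct_ii(x):
    """Unnormalized DCT-II."""
    n = len(x)
    return [sum(x[j] * math.cos(math.pi * k * (j + 0.5) / n) for j in range(n))
            for k in range(n)]

def mfcc_from_power_spectrum(power_spectrum, sample_rate, n_filters=12, n_coeffs=8):
    """Filter bank energies, then log (assumed), then DCT, keeping the first coefficients."""
    bank = triangular_filterbank(n_filters, len(power_spectrum), sample_rate)
    energies = [sum(w * p for w, p in zip(row, power_spectrum)) for row in bank]
    log_energies = [math.log(e + 1e-10) for e in energies]  # small offset avoids log(0)
    return dct_ii(log_energies)[:n_coeffs]

flat_spectrum = [1.0] * 129                   # assumed 129-bin power spectrum
coeffs = mfcc_from_power_spectrum(flat_spectrum, sample_rate=8000)
dct_of_constant = dct_ii([1.0, 1.0, 1.0, 1.0])
```

The DCT concentrates most of the information in the first few coefficients, which is why only a small number of them needs to be stored per frame.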

Fig. 10 Training window

VI. SIMULATION AND IMPLEMENTATION RESULTS


The following test results were obtained by simulating the
recognition algorithm in MATLAB and from the microcontroller
"ATmeg228". Once the correct voice is recognized, a window
similar to that in Fig. 11 appears with the correct admin
information, i.e., name and ID number. For each entry, the VRS
reports the Mean Square Error (MSE) obtained for the admin who
tested the system. A threshold MSE of 1 was set for this
experiment. The experiment was repeated 100 times to recognize
the admin's voice. Table 1 shows some of the results of this
experiment. The system failed to recognize the administrator's
voice five times. Thus, this particular voice recognition system
achieved an accuracy of 95%.
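
The MSE test and the accuracy figure can be reproduced with a short Python sketch. It assumes that the MSE is taken between the extracted feature vector and the stored reference vector, and that an MSE equal to the threshold still counts as accepted; the paper states only the threshold value of 1. The function names are hypothetical.

```python
def mean_square_error(features, reference):
    """Average squared difference between two equal-length feature vectors."""
    if len(features) != len(reference):
        raise ValueError("feature vectors must have the same length")
    return sum((a - b) ** 2 for a, b in zip(features, reference)) / len(features)

def decide(mse, threshold=1.0):
    """Plain threshold test (without the history-based adjustment)."""
    return "Welcome" if mse <= threshold else "Reject"

def accuracy(results):
    """Fraction of genuine attempts that were accepted."""
    return sum(r == "Welcome" for r in results) / len(results)

example_mse = mean_square_error([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
outcome_97 = decide(0.9753)   # MSE values taken from Table 1
outcome_98 = decide(1.5148)
overall = accuracy(["Welcome"] * 95 + ["Reject"] * 5)   # 95 of 100 attempts
```

With five rejections in 100 genuine attempts, the accepted fraction is 0.95, matching the reported accuracy.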

Table 1 (continued)

ID     Admin's MSE   Result
97     0.9753        Welcome
98     1.5148        Reject
99     0.1110        Welcome
100    0.5141        Welcome

From Tables 1 and 2, the system has proved successful, reaching
an accuracy of 95%; the remaining error of around 5% depends on
the data stored each time. In Table 1, the MSE for user 8 was
1.024, which is above the threshold, but the system granted
acceptance based on historical data showing that this person
usually uses the system 3 times a day.
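
One way such a history-based adjustment could work is sketched below in Python. The paper does not specify the rule, so the sketch is purely hypothetical: it assumes that login hours are counted per user, and that a small tolerance (`margin`) above the MSE threshold is granted when the user has logged in at that hour at least `min_visits` times before. Both parameters and all names are assumptions.

```python
from collections import Counter

def hourly_usage(login_hours):
    """Count how often the user has logged in during each hour of the day."""
    return Counter(login_hours)

def adaptive_decide(mse, hour, usage, threshold=1.0, margin=0.05, min_visits=3):
    """Threshold test relaxed by `margin` at hours the user habitually logs in."""
    if mse <= threshold:
        return "Welcome"
    habitual = usage.get(hour, 0) >= min_visits
    if habitual and mse <= threshold + margin:
        return "Welcome"
    return "Reject"

usage = hourly_usage([8, 8, 8, 13])   # hypothetical history: three logins at 8 AM
at_usual_hour = adaptive_decide(1.024, hour=8, usage=usage)
at_unusual_hour = adaptive_decide(1.024, hour=13, usage=usage)
far_above = adaptive_decide(1.5148, hour=8, usage=usage)
```

Under these assumed parameters, an MSE of 1.024 is accepted at the habitual hour and rejected otherwise, while a clearly larger MSE is rejected regardless of the history.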
Table 2. Results of 5 different admins data base used to vary the system security

Voice recorder   Age (years)   Number of total tries   Number of accepted tries
imposter A       25            100                     95
imposter B       20
imposter C       16            20                      19
imposter D       34            15                      13
imposter E       45            17                      16
imposter F       24            50

VII. CONCLUSION

In this paper, a new VRS scheme is proposed, tested and verified
by means of a microcontroller. The voice recognition algorithm
is developed using the DWT method to perform silence detection
and to extract the features of the voice signal. The information
used in the proposed VRS consists of the voice features as well
as the timing table of each user. The reference information is
stored in the training phase and compared with the captured
information in the testing phase to match both the features and
the entry times. If the system successfully recognizes the
authentic user's information, he/she is accepted; otherwise
he/she is rejected. Therefore, the output results are divided
into two categories: accepted and rejected. If the output is
accepted, the microcontroller activates the magnet door to
unlock. If the output is rejected, the microcontroller keeps the
magnet door locked and the buzzer sounds for less than 1 second.

Fig. 11 Testing window

The same experiment has been repeated for the different
imposters shown in Table 2, which lists the age of each
imposter, the total number of tries and the number of times
that the user succeeded in entering the system.
Table 1. Results obtained from testing accuracy of the proposed VRS

ID     Admin's MSE   Result
1      0.8166        Welcome
2      0.8575        Welcome
3      0.9105        Welcome
4      0.7185        Welcome
5      0.3445        Welcome
6      0.6574        Welcome
7      0.3493        Welcome
8      1.0240        Welcome
9      0.4610        Welcome
10     0.4566        Welcome
11     0.7713        Welcome
12     0.2649        Welcome
.....

REFERENCES
[1] Wei Han, Cheong-Fat Chan, Chiu-Sing Choy and Kong-Pang Pun, "An
Efficient MFCC Extraction Method in Speech Recognition", IEEE
ISCAS, 2006.
[2] De Krom G., "Consistency and reliability of voice quality ratings for
different types of speech fragments", J Speech Lang Hear Res., Oct.
1994.
[3] Campbell JP., "Speaker recognition: a tutorial", Proceedings of the IEEE,
85(9):1437-62, Sep., 1997.
[4] Shrawankar U, Thakare VM., "Techniques for feature extraction in speech
recognition system: a comparative study", International Journal of
Computer Applications in Engineering, Technology and Sciences
(IJCAETS), 412-18, 2013.
[5] J. Xie and S. Jiang, "A simple and fast algorithm for global k means
clustering", in the 2nd international workshop of Education Technology
and Computer Science (ETCS), vol. 2, pp. 36-40, 2010.
[6] A. Katsamanis, G. Papandreou, and P. Maragos, "Face Active Appearance
Modeling and Speech Acoustic Information to Recover Articulation",
IEEE Transactions on Audio, Speech, and Language Processing, Vol. 17,
No. 3, pp. 411-422, 2009.
[7] Bhupinder Singh, Rupinder Kaur, Nidhi Devgun, Ramandeep Kaur, "The
process of Feature Extraction in Automatic Speech Recognition System
for Computer Machine Interaction with Humans: A Review", IJARCSSE,
vol. 2, Issue 2, Feb. 2012.
[8] Muda L, Begam KM, Elamvazuthi I., "Voice recognition algorithms using
Mel Frequency Cepstral Coefficient (MFCC) and Dynamic Time
Warping (DTW) techniques", Journal of Computing, vol. 2(3):138-143,
2010.
[9] Nearey TM., "Speech perception as pattern recognition", J Acoust. Soc.
Am., 101(6):3241-54, 1997.