Journal of Experimental & Theoretical Artificial Intelligence

Perspective Neural Network Algorithms for Dynamic Biometric Pattern Recognition in the Space of Interdependent Features

Journal: Journal of Experimental & Theoretical Artificial Intelligence
Manuscript ID: TETA-2018-0078
Manuscript Type: Original Article
Keywords: statistical functionals, handwriting signature features, voice parameters, "wide hybrid" neural networks, small training sample, informativeness of biometric features

Note: The following files were submitted by the author for peer review but cannot be converted to PDF. These files (e.g. movies) must be viewed online.

Figures.zip

URL: http://mc.manuscriptcentral.com/teta
Perspective Neural Network Algorithms for Dynamic Biometric Pattern Recognition in the Space of Interdependent Features

A.E. Sulavko^a, S.S. Zhumazhanova^a*

^a Federal State Educational Institution of Higher Education "Omsk State Technical University", Radio Engineering Faculty, Omsk, Russia

Correspondence: samal_shumashanova@mail.ru

These authors contributed equally to this work.


Abstract: A model of neurons for biometric authentication is proposed that is capable of efficiently processing highly dependent features, based on goodness-of-fit criteria (Gini, Cramer-von-Mises, Kolmogorov-Smirnov, and the maximum of the intersection areas of probability densities). An experiment compared the efficiency of neurons based on the proposed model with neurons based on difference and hyperbolic Bayesian functionals capable of processing highly dependent biometric data. Variants of constructing hybrid neural networks that can be trained on a small number of examples of a biometric pattern (about 20) are suggested. An experiment on collecting dynamic biometric patterns was conducted, in which 90 people entered handwritten and voice patterns over the course of a month. Intermediate results on subject recognition with hybrid neural networks were obtained: the number of errors in verifying a signature (handwritten password) was less than 2%, and in verifying a speaker by a fixed passphrase, less than 6%. Testing was carried out on biometric samples obtained some time after the formation of the training sample.

Keywords: statistical functionals, handwriting signature features, voice parameters, "wide hybrid" neural networks, random variable distribution law, small training sample, informativeness of biometric features.
Introduction

In recent years, there has been a struggle to improve the reliability of biometric personality recognition systems. Interest in increasing the reliability of biometric authentication is driven by the high financial losses caused by cybercrime around the world; available estimates of these losses reach 375-575 billion USD per year [1]. The market for biometric protection tools is growing very rapidly. Research in this area is also advancing, with ever new modalities for person recognition: gait and gestures [2], electroencephalogram parameters [3], the pressing force of the fingers on each key during passphrase typing [4]. The search for new approaches is driven by the drawbacks of traditional authentication technologies. Many researchers are working to combine the benefits of biometric systems and password protection.
The password-based protection method is vulnerable because the password is alienable from its owner. In a sense, secret dynamic biometric patterns of voice, keystroke dynamics, and handwriting can be considered analogues of a password: they can contain a secret that is pronounced, typed, or written. This property potentially multiplies their protective value and motivates their further study. Secret patterns are more difficult to falsify than static patterns (fingerprints, etc.): static patterns cannot be hidden, but they can be copied. From the security point of view, an open biometric pattern can only be used for identification purposes; the authentication process requires the recognition of a secret pattern. At the same time, the practical use of secret biometric patterns is constrained by the insufficient reliability of the decisions made. The probability of correct subject recognition needs to be raised to the level reached for fingerprints.
The complexity of this scientific task stems from the fact that most dynamic biometric parameters are correlated and low-informative. A dynamic biometric pattern contains significantly less information about its owner than a static one. A pattern recognition method is therefore required that can work with low-informative, correlated biometric features and can be trained on a small number of examples. A very promising mathematical apparatus in this case is hybrid neural networks with a small number of layers, called "wide" neural networks.

The present study is devoted to the development of promising neurons that can be used to build rapidly learning neural network machines for the verification of dynamic biometric patterns.
1. The main directions of research in world science

Practice and standards in the field of information security (the GOST R 52633 set, ISO/IEC 24745:2011, ISO/IEC 24761:2009, ISO/IEC 19792:2009) require that biometric templates be protected from compromise while they are stored and transmitted through communication channels. There are two approaches to human recognition that allow the biometric pattern to be protected from recovery: "fuzzy extractors" and artificial neural networks (ANNs). Regardless of the approach, the reliability of the pattern verification procedure is characterized by the probabilities (or percentages) of errors of the first and second kind, the False Rejection Rate (FRR) and the False Acceptance Rate (FAR), respectively (when FAR = FRR, one speaks of the Equal Error Rate, EER). Recognition can be performed in verification mode (one-to-one comparison) or in identification mode (one-to-many comparison). The verification mode is used when implementing a biometric authentication procedure.
"Fuzzy extractors" are based on applying error-correcting (noise-resistant) coding to biometric data to compensate for the errors that arise from the inability to reproduce a biometric pattern exactly [1, 5, 6]. The approach has many fundamental drawbacks [7]; as a result, its modifications are ineffective when verifying signature patterns (FRR = 4.5%, FAR = 1.5% [1]), voice (FRR = FAR = 20% [5]) and keystroke dynamics (FRR = 10.4%, FAR = 2.1% [6]).


The approach based on ANNs is more promising. Despite the great variety of artificial neural network constructions, their models share some common characteristics:

(1) The ability to learn, leading to improvement in the quality of the problem solution.

(2) The presence of interconnected computational elements.

Particular attention is currently being paid to the "deep learning market", whose key players are NVIDIA Corporation, Intel Corporation, Google, Inc. and Microsoft Corporation. These companies are investing in deep learning capabilities to bring the technology into their products. According to Grand View Research, the global deep learning market was valued at 272.0 million USD in 2016. In November 2016, GE Healthcare announced a partnership with the University of California, San Francisco to develop a library of deep learning algorithms to help clinicians diagnose and treat patients more accurately and effectively. In many problems, the idea of deep learning has indeed been brought to effective practical solutions (for example, speech recognition).
Many researchers of biometric information protection methods also follow the development of deep learning networks (convolutional ANNs) [8]. The term "deep learning" was first used in 1986 after the appearance of R. Dechter's work [9], but multilayer neural networks and an algorithm using backpropagation for their training were proposed by Galushkin as early as 1974 [10]. In 2002, Geoffrey Hinton improved the algorithm, applying Boltzmann machines to train the lower layers of neurons [11]; this approach formed the basis of modern "deep" ANNs. To date, "deep" learning is usually understood as the iterative tuning of multilayer feedforward neural networks in which the backpropagation algorithm is used in one form or another. In the most general case, it has two implementations: batch or stochastic gradient descent (in the second case, simulated annealing, genetic, and other algorithms are used to optimize learning) [12]. So far, there is no truly competitive product for biometric authentication based on "deep" networks. In real working conditions such a product is extremely difficult to train, for the following reasons:
(1) Iterative learning algorithms lose stability when the ANN structure becomes more complicated. In particular, it is not possible to configure neurons with a large number of inputs: the more inputs a neuron has, the less influence each input has on its output, and so-called "false" local quality maxima arise. In [13] this effect is called the "blindness" of the training machine.

(2) Training an ANN with several layers using classical algorithms requires a large training sample (hundreds or thousands of examples) [7]. This is unacceptable in biometric applications, because the system must be guaranteed to train on only a few examples (preferably about 10-20).

(3) To speed up and improve the quality of multilayer ANN training, low-informative and correlated features have to be discarded. However, many dynamic biometric patterns consist predominantly of such features and contain almost no highly informative uncorrelated ones. The need to exclude most of the features makes the recognition machine ineffective. All of this is a manifestation of a more general problem known as the "curse of dimensionality" [7, 13, 14].

(4) Backpropagation has exponential computational complexity, which prevents it from being implemented on a weak processor without a remote connection to a server [7]. A secret biometric pattern must not be sent over the network, as this would violate the requirement of GOST R 52633.0 not to compromise the biometric pattern.
Any change in a "deep" ANN (and, in general, in any multilayer ANN trained by an iterative algorithm) leads to a loss of training stability, and an increase in the number of layers or neurons increases the required volume of the training sample. To date, there is no ANN optimization method that allows an ANN to be trained by backpropagation with 20-30 examples of dynamic biometric patterns. Therefore, a more suitable approach in this task is the use of "wide" ANNs, which have several advantages:
(1) High speed, allowing these algorithms to be implemented on a low-performance computing device.

(2) The possibility of using absolutely stable training procedures on a small number of examples, regardless of the complexity of the ANN (training is performed layer by layer, and each neuron is trained independently of the other neurons of the network).

(3) High potential for increasing the reliability of decision-making. When building "wide" networks, the backpropagation method is abandoned in any form. As a result, such a network can consist of neurons based not only on different activation functions but also on arbitrary functionals. For comparison, the weighted summation functional (1) is always used in the perceptron:

y = Σ_{j=1}^{q} μ_j · a_j,   (1)

where a_j is the value of the j-th feature, q is the number of features processed by the functional (the dimensionality of the functional), and μ_j is the neuron input weight associated with the j-th feature (the j-th input). Many functionals process data much more efficiently than the perceptron adder and are able to work with highly correlated data.
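Functional (1) is straightforward to express in code; the following minimal Python sketch (function and parameter names are ours, not the paper's) serves as a baseline for the more capable functionals discussed below.

```python
def perceptron_functional(weights, features):
    """Weighted summation functional (1): y = sum over j of mu_j * a_j."""
    assert len(weights) == len(features)
    return sum(mu * a for mu, a in zip(weights, features))
```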
A fast non-iterative algorithm for training and testing perceptrons with a small number of layers ("wide" ANNs) was first proposed in Russia several years ago to solve biometric authentication problems. The algorithm gives rise to a whole class of potentially more effective methods for training hybrid ANNs consisting of a large number of neurons and a small number of layers ("wide" hybrid ANNs). These networks share features with radial basis function networks but are more flexible: such a network can have cross-links and consist of neurons based on different functionals and activation functions.

Using "wide" single-layer ANNs based on a single functional, it is possible to achieve much more acceptable results in the verification of biometric patterns (keystroke dynamics: FRR = 5.5%, FAR = 6.9%; handwriting dynamics: FRR = 1.2%, FAR = 0.8%) [15]. The result can be significantly improved by using the full potential of these networks.
2. Particularities of hybrid "wide" ANN construction

Studies show that a "wide" network should be built from neurons based on different functionals F, selected according to the correlation between features. A feature is a concrete attribute characterizing patterns (classes) and having a certain physical meaning; for biometric applications, a feature is a specific biometric parameter. To obtain the vector of feature values ā = {a1, a2, ..., aN}, where N is the total number of features of the pattern, the sample of biometric data entered by a subject is analyzed by methods that depend on the type of biometric pattern. The features are then fed to the inputs of the first-layer neurons of the "wide" network; the synapses are created not randomly but according to the correlation between the features and to their informativeness.

A hybrid "wide" ANN surpasses in reliability a "wide" network built from neurons of a single type, including classical neurons. Increasing the number of layers makes sense if different kinds of neurons are used consistently to enrich the input data about the pattern. Fig. 1 schematically represents one version of the neural network construction.
As the number of neurons in a "wide" ANN grows, the complexity of training does not increase, which is extremely important. Any layer of a wide artificial neural network is built on the idea that neurons may make errors, but it is desirable that they make different errors. A neuron quantizes the value at the output of its functional, converting it into 0 or 1 (or into a binary code if it has several outputs; such cases are not considered in this research), depending on whether a certain hypothesis is accepted or rejected. The final decision on whether an input pattern belongs to the reference pattern (the user's template) is made from the Hamming distance (or its weighted version) between the binary code generated by the artificial neural network and a certain correct code value (containing all zeros, for example, or equal to a personal access key or a personal digital signature). Thus, growing the number of neurons in a layer cannot increase the number of errors; it can only increase the decision-making time linearly. However, if the neuron outputs are strongly correlated, it is not worth growing the number of neurons.
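The layer-output decision described above can be sketched as follows; this is an illustrative Python fragment (the function name and the tolerated-distance parameter are our assumptions, not the paper's).

```python
import numpy as np

def hamming_decision(neuron_outputs, correct_code, max_distance):
    """Final decision of a "wide" ANN: compare the binary code produced by
    the neuron layer with the correct code using the Hamming distance."""
    outputs = np.asarray(neuron_outputs, dtype=int)
    code = np.asarray(correct_code, dtype=int)
    distance = int(np.count_nonzero(outputs != code))
    return distance <= max_distance  # True -> input accepted as "Own"
```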
3. On the informativeness and correlation of dynamic biometric features

Dynamic biometric features include features of a handwritten password or signature, a voice password, gait, the typing of a phrase on a keyboard, etc. In the present work, we consider in detail only voice and handwritten patterns.
A handwritten pattern is a set of time functions: the pen tip coordinates x(t), y(t), the pen tip pressure p(t) and the pen tip velocity vxy(t). Before the training and recognition procedures, each pattern is transformed into a vector of the following feature values: energy-normalized amplitudes of the harmonics of p(t) and vxy(t) (corresponding to the frequencies of the signer's hand movements, about 0.1-10 Hz); correlation coefficients between x(t), y(t) and p(t); distances between certain points in three-dimensional space (the third dimension being p(t)); characteristics of the pattern's appearance; Daubechies wavelet transform coefficients in the D6 basis for the functions p(t) and vxy(t) (analyzed frequency range 0.1-10 Hz); and others. The method of their calculation is described in detail in [1]. The total number of features was 335.
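Two of the feature groups listed above (correlation coefficients and energy-normalized harmonic amplitudes in the 0.1-10 Hz band) can be sketched as follows; this is an illustrative Python fragment under our own simplifying assumptions, not the authors' exact procedure, which is described in [1].

```python
import numpy as np

def handwriting_features(x, y, p, fs=100.0):
    """Illustrative sketch of two handwriting feature groups:
    correlation coefficients between x(t), y(t), p(t), and
    energy-normalized amplitudes of p(t) harmonics in 0.1-10 Hz."""
    feats = [float(np.corrcoef(u, w)[0, 1]) for u, w in ((x, y), (x, p), (y, p))]
    # amplitude spectrum of the (mean-removed) pressure function
    spectrum = np.abs(np.fft.rfft(p - np.mean(p)))
    freqs = np.fft.rfftfreq(len(p), d=1.0 / fs)
    band = spectrum[(freqs >= 0.1) & (freqs <= 10.0)]
    energy = np.sqrt(np.sum(band ** 2))
    feats.extend((band / energy).tolist() if energy > 0 else band.tolist())
    return feats
```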
A voice password is a sampled speech signal; in this work one sample occupied 16 bits (at lower quality there was a loss of feature informativeness), and the sampling rate was 8000 Hz. The most informative component is the fundamental frequency (80 to 210 Hz for men, 150 to 320 Hz for women); information about the speaker is also present in the overtones (up to 4000 Hz). By Kotelnikov's (sampling) theorem, a continuous signal with a bounded maximum frequency can be coded without loss when it is sampled at twice that frequency, i.e. at 8000 Hz here. Two methods were used to calculate the voice features in the present work: one based on the direct Fourier transform, the other on a windowed transform with subsequent integration of the values of each harmonic over all the windows. In total, 612 features were calculated; more details are reported in [16].
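The second method (a windowed transform with integration of each harmonic over all windows) might be sketched as follows; the window length and harmonic count here are illustrative assumptions of ours, with the paper's exact settings given in [16].

```python
import numpy as np

def voice_features(signal, window=256, n_harmonics=64):
    """Illustrative sketch: windowed Fourier transform with subsequent
    integration (averaging) of each harmonic's magnitude over all windows."""
    signal = np.asarray(signal, dtype=float)
    n_windows = max(len(signal) // window, 1)
    acc = np.zeros(n_harmonics)
    for k in range(n_windows):
        frame = signal[k * window:(k + 1) * window]
        frame = frame * np.hanning(len(frame))  # reduce spectral leakage
        acc += np.abs(np.fft.rfft(frame, n=window))[:n_harmonics]
    return acc / n_windows
```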
If we abstract from any particular application of pattern recognition theory, all possible features differ only in informativeness and in their mutual correlation. The informativeness of a feature is understood as how well it characterizes the recognized pattern: the higher the informativeness of a feature, the smaller the error of separating the pattern classes using that feature. The sum of the probabilities of errors of the first and second kind when recognizing two subjects by a certain feature tends to the area of intersection of the probability density functions of the feature values characterizing the corresponding pattern classes. In the present study, the informativeness of a group of N features was estimated by the formula:
I(N) = ( Σ_{j=1}^{N} I_j ) / N,   I_j = ( Σ_{i=1}^{n} Sos_i(j) ) / n,
where Sos_i(j) is the area of intersection of the two probability density functions of the j-th feature's values characterizing the pattern "Own" and the pattern "Stranger", respectively, for pattern class number i, and n is the number of pattern classes available for the study (for example, the number of subjects for which the informativeness of a certain biometric feature is assessed). The lower the value of I, the more informative the feature(s) (Fig. 2). If necessary, this estimate can be reversed (I = 1 - I) or translated into bits of information: Ibit = -log2(I) [17]. It is desirable that the informativeness of the features processed by one neuron be comparable (Ij ± 0.1). A scale of feature informativeness was proposed by the authors of this article (Table 1).
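The estimate above can be sketched in Python under the assumption of Gaussian feature densities (the function names and the grid settings are ours):

```python
import numpy as np

def intersection_area(m1, s1, m2, s2, n_points=20001):
    """Intersection area Sos of two Gaussian probability density functions,
    the per-feature error estimate used in the text ("Own" vs "Stranger")."""
    lo = min(m1 - 5 * s1, m2 - 5 * s2)
    hi = max(m1 + 5 * s1, m2 + 5 * s2)
    grid = np.linspace(lo, hi, n_points)
    dx = grid[1] - grid[0]
    pdf1 = np.exp(-(grid - m1) ** 2 / (2 * s1 ** 2)) / (s1 * np.sqrt(2 * np.pi))
    pdf2 = np.exp(-(grid - m2) ** 2 / (2 * s2 ** 2)) / (s2 * np.sqrt(2 * np.pi))
    return float(np.sum(np.minimum(pdf1, pdf2)) * dx)  # numeric integration

def informativeness(own, stranger):
    """I(N): for each feature j, average Sos_i(j) over the n classes i,
    then average over the N features. own[i][j], stranger[i][j] = (m, sigma)."""
    n, N = len(own), len(own[0])
    I_j = [sum(intersection_area(*own[i][j], *stranger[i][j]) for i in range(n)) / n
           for j in range(N)]
    return sum(I_j) / N
```

Fully overlapping densities give I close to 1 (a useless feature); well-separated densities give I close to 0 (a highly informative one), matching the "lower I is better" convention of the text.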
For handwriting [1] and voice [16], the following proportions of feature informativeness are typical: 1% and 3% of features are highly informative, 45% and 9.5% are of medium informativeness, 50% and 87% are low-informative, and 4% and 0.5% are very low-informative, respectively.
Real biometric data are correlated; generalized estimates of this dependence for dynamic biometric patterns (according to the data of 90 subjects) are shown in Table 2. For different subjects, the nature of this statistical relationship may differ significantly. Therefore, functionals are required that work well in the space of correlated features.
4. Construction of Bayes-Hamming neurons in "wide" ANNs

At the moment, the most common point of view is that correlation between features adversely affects the result of pattern recognition. In accordance with this view, researchers developing biometric authentication methods try to exclude (or combine) highly dependent biometric features, believing that such features duplicate the information they contain. In fact, this is not quite the right approach. The view is valid within the "fuzzy extractor" concept [1, 5, 6] and for artificial neural networks [8] consisting of "classical" neurons based on the weighted sum function (1). It is also true for many proximity measures (functionals), in particular for quadratic forms and their analogues (for example, the Pearson measure, the Euclidean measure, the chi-module) [4, 15]. Networks of quadratic forms also lose power if the vector of input biometric parameters contains dependent values.
The results of numerous recent studies show that there are functionals (proximity measures) able to make fewer erroneous decisions and to learn from fewer instances than a neuron based on the weighted summation function (1). Certain proximity measures give fewer classification errors when dealing with attributes that have a high cross-correlation dependence (the difference (2) [7, 17, 18], correlation [14] and hyperbolic (3) [19] multidimensional Bayesian functionals):
d_t = Σ_{j=1}^{q} | |m_t − a_t| / σ_t − |m_j − a_j| / σ_j |,   j ≠ t,   (2)

g_t = Σ_{j=1}^{q} ( (m_t − a_t)² / σ_t² − (m_j − a_j)² / σ_j² ),   g = Σ_{j=1}^{q} ( (m_t − a_t)² (m_j − a_j)² / (σ_t² σ_j²) ),   j ≠ t,   (3)
where m_j and σ_j are the mean (mathematical expectation) and the standard deviation of the j-th feature, calculated from the training sample. It was experimentally confirmed [7] that the higher the dimension q of the Bayesian functional (2) (the number of features fed to its input), and the higher the coefficient of equal correlation of these features, the fewer errors the functional makes. Thus, the presence and strength of the relationship between the features is perceived as additional information. This is quite logical, considering that the mutual correlation dependence between certain biometric features can differ significantly from subject to subject. Similar properties were also proved for the hyperbolic Bayesian functionals (3) [19].
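A minimal sketch of the difference functional (2), as reconstructed above (the hyperbolic functionals (3) can be implemented analogously; the function name is ours):

```python
import numpy as np

def difference_bayes(a, m, s, t):
    """Difference Bayesian functional (2) for reference feature t: accumulates
    | |m_t - a_t|/s_t - |m_j - a_j|/s_j | over all j != t."""
    a, m, s = (np.asarray(v, dtype=float) for v in (a, m, s))
    z = np.abs(m - a) / s  # normalized deviation of each feature
    return float(np.sum(np.abs(z[t] - np.delete(z, t))))
```

A vector that matches the template exactly yields 0; for correlated features, the normalized deviations move together, so genuine samples keep the pairwise differences small even when the absolute deviations are large.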
The simplest variant of neuron construction involves one functional F that processes the input vector ā_h (h is the neuron number), whose elements are a subset of the vector ā. At the training stage, a template E of the recognized pattern is formed, consisting of the parameters the neuron's functionals operate on (for example, E = {m1, σ1, m2, σ2, ..., mN, σN}). The purpose of training is to calculate these parameters deterministically from the data of the training sample. At the recognition stage, features are fed to the neuron input and the value of the functional f = F(ā_h, E_h) is calculated, which is then corrected by the activation function A(f). In the case of the multidimensional Bayesian functionals (2) and (3), good results are obtained with a threshold activation function that produces "0" or "1". For each Bayes-Hamming neuron, the response threshold is determined automatically at the training stage from the responses to the "Own" training sample data (as described in [7, 17]). Thus, on the basis of one of the functionals (2) or (3) and the threshold activation function, a Bayes-Hamming neuron is constructed with the structure shown in Fig. 3.
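Putting functional (2) together with a threshold activation gives a sketch of such a neuron; the threshold rule used here (the maximum response on the "Own" training data) is our simplifying assumption, with the paper's exact rule described in [7, 17].

```python
import numpy as np

class BayesHammingNeuron:
    """Sketch of the neuron of Fig. 3: difference functional (2) plus a
    threshold activation producing 0 ("Own") or 1 ("Stranger")."""

    def __init__(self, t=0):
        self.t = t  # index of the reference feature in functional (2)

    def fit(self, own_samples):
        X = np.asarray(own_samples, dtype=float)  # rows: training examples
        self.m = X.mean(axis=0)
        self.s = X.std(axis=0, ddof=1)  # must be nonzero for every feature
        # assumed threshold rule: worst response observed on "Own" data
        self.threshold = max(self._functional(x) for x in X)

    def _functional(self, a):
        z = np.abs(self.m - a) / self.s
        return float(np.sum(np.abs(z[self.t] - np.delete(z, self.t))))

    def __call__(self, a):
        f = self._functional(np.asarray(a, dtype=float))
        return 0 if f <= self.threshold else 1  # threshold activation
```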


5. Construction of neurons in "wide" ANNs on the basis of other statistical functionals

The GOST R 52633.5-2011 algorithm prescribes the deterministic training of one- or two-layer perceptrons, which consists in calculating the weights μj from the values of mj and σj characterizing the patterns "Own" and "Stranger". According to GOST R 52633.5-2011, for the construction of the "Own" template Eo it is recommended to use no fewer than 11 realizations (samples) of the biometric pattern of the person whose recognition is set up in the system. To create a "Stranger" template Es, it is recommended to take at least 64 samples from different people. After that, the algorithm begins to distinguish the person's pattern from patterns that do not belong to him (a separate ANN is created for each person).
Neurons based on other functionals are proposed to be built on a similar principle. If a neuron estimates the proximity of ā_h to Eo_h and the proximity of ā_h to Es_h with the same kind of functional F, we call such a neuron symmetric. In the general case (Fig. 4), different functionals (F1 and F2) can be used to evaluate the proximity of ā_h to the templates Eo_h and Es_h, and they may even use different subsets of the features in the vector ā_h (for example, if some pattern has specific features that are not present in other patterns). We call such neurons asymmetric. Below we consider only symmetric neurons. The calculated values of the functionals f1 = F(ā_h, Eo_h) and f2 = F(ā_h, Es_h) are compared with each other, and the decision is made by their minimum, i.e. by applying a binary threshold activation function of the form (4):
A(f1, f2) = { 0, if f1 < f2;  1, if f1 ≥ f2 },   (4)
The activation function (4) quantizes the data at the output of F, as the threshold function does, but it does not require determining an optimal threshold. In fact, such a neuron (Fig. 4) performs identification of the pattern under two hypotheses, "Own" and "Stranger". Nevertheless, this mode of neuron operation can be called "pattern verification", since the neuron assesses whether ā_h belongs to a particular subject or not.
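A sketch of a symmetric neuron with activation (4); the proximity functional used here is a simple illustrative choice of ours, not one of the paper's statistical functionals.

```python
import numpy as np

def chi_like(a, m, s):
    """Illustrative proximity functional: mean squared normalized deviation."""
    a, m, s = (np.asarray(v, dtype=float) for v in (a, m, s))
    return float(np.mean(((a - m) / s) ** 2))

def symmetric_neuron(a, own_template, stranger_template, functional=chi_like):
    """Symmetric neuron of Fig. 4: the same functional F measures proximity to
    the "Own" and "Stranger" templates; activation (4) decides by the minimum."""
    f1 = functional(a, *own_template)       # proximity to "Own"
    f2 = functional(a, *stranger_template)  # proximity to "Stranger"
    return 0 if f1 < f2 else 1              # activation function (4)
```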
For the construction of neurons according to Fig. 4, functionals based on criteria for testing hypotheses about the distribution law of a random variable can be used [17]: Pearson's chi-square, Cramer-von-Mises, Smirnov-Cramer-von-Mises, Gini, Anderson-Darling, Watson, Frosini and other criteria. The operating principle of such statistical functionals is the following. The parameters of the value distribution of each feature are estimated from a training set for the pattern being verified (a subject, a human); that is, the features are treated as a set of random values. Next, the initial feature values are normalized by formula (5), after which they can be regarded as values of a normally distributed random variable with mathematical expectation equal to zero and unit standard deviation (mo ≈ 0 and σo ≈ 1). At the decision-making stage, feature values are normalized in the same way (using the parameters of the template of the pattern being verified) and an empirical distribution is formed. The parameters of this distribution will, however, differ, so deviations will be registered when the empirical and reference distributions are compared using any of the criteria mentioned above.
â = (a_j − m_j(own)) / σ_j(own),   (5)
44 where mj(own) and σj(own) are the mathematical expectation and the standard deviation of
ly

45
46
47
the j-th feature values for the pattern "Own". On the basis of values of ȃ, parameters of
48
49 reference distribution "Stranger" (ms0 and σs1) can also be calculated. Here and
50
51 further the case, where each j-th feature has a normal distribution of values (or close to
52
53
54
it), is considered.
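The training/verification scheme just described can be sketched with synthetic data (a minimal illustration; the function name `normalize` and the template arrays are ours, not from the paper):

```python
import numpy as np

def normalize(a, m_own, s_own):
    """Formula (5): centre and scale raw feature values by the 'Own' template."""
    return (a - m_own) / s_own

rng = np.random.default_rng(0)

# Training: estimate m_j(own) and sigma_j(own) per feature from 20 'Own' samples.
train_own = rng.normal(loc=5.0, scale=2.0, size=(20, 8))   # 20 samples, 8 features
m_own = train_own.mean(axis=0)
s_own = train_own.std(axis=0, ddof=1)

# Decision stage: a presented sample is normalized with the template parameters.
# A genuine sample then behaves roughly like N(0, 1); a stranger's data yield an
# empirical distribution with shifted parameters, which criteria (6)-(11) detect.
probe = normalize(rng.normal(5.0, 2.0, size=8), m_own, s_own)
```

The same `normalize` call is reused at both stages, only the data change.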
Statistical functionals based on the agreement criteria are implemented in three versions: integral (operating with probabilities), differential (operating with probability densities) and integral-differential (operating with both probabilities and probability densities of feature values). Studies show that in the space of independent features some integral functionals work more efficiently when a neuron is constructed according to the architecture of Fig. 3, while the differential ones, on the contrary, work better with the architecture of Fig. 4. Neurons based on these functionals have their own characteristics. First of all, the size of the training set K and the number of features N are interrelated quantities (the number of features is, to a certain extent, an equivalent of the training set size). The more features (the larger the dimension of the functional), the more normalized values ȃ can be obtained and the higher the quality of training (the probability density curves or bar charts of relative frequencies are more precise). Thus, an increase in the number of features compensates for a small training set. This effect is exactly opposite to the behaviour of multilayer networks, where an increase in the number of features, depending on the number of layers of the artificial neural network, leads to an exponential growth of the required training set size.
It has been established that on small "Own" training samples (10-20 samples), in the space of independent features with informativeness values 0.7 ≥ I ≥ 0.3, good pattern recognition reliability is shown by functionals based on the criteria of Gini (6)-(7) and Cramér-von Mises (8)-(9), the Kolmogorov-Smirnov criterion (10) and the maximum of the intersection area of the compared probability density functions (11):

DgI = ∫ |P(ȃ) − Ṗ(ȃ)| dȃ,    (6)

DgD = ∫ |p(ȃ) − ṗ(ȃ)| dȃ,    (7)

KfMI = ∫ (P(ȃ) − Ṗ(ȃ))² dȃ,    (8)

KfMD = ∫ (p(ȃ) − ṗ(ȃ))² dȃ,    (9)

KS = sup−∞<ȃ<∞ (p(ȃ) − ṗ(ȃ)),    (10)

MaxSq = 1 − ∫ min(p(ȃ), ṗ(ȃ)) dȃ,    (11)
where P(ȃ) is the empirical probability function, Ṗ(ȃ) is its reference description, p(ȃ) is the empirical probability density function and ṗ(ȃ) is its reference description; the integrals and the supremum are taken over the whole axis of normalized values ȃ. The last criterion was proposed by the authors of this work for pattern recognition problems. The solutions of the integral Gini (6) and Cramér-von Mises (8) functionals are strongly correlated, as are those of their differential analogues (7) and (9).

Nevertheless, these functionals lose power as the correlation between features increases. As will be shown below, many statistical functionals can be adapted to handle highly dependent features; for this, however, the neuron must be constructed according to the variant shown in Fig. 4.
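For reference, the six functionals can be evaluated numerically on a uniform discretization grid; the sketch below (our naming; plain rectangle sums for brevity) follows (6)-(11) directly:

```python
import numpy as np

def agreement_functionals(p_emp, p_ref, grid):
    """Numerical versions of functionals (6)-(11) on a uniform grid of
    normalized feature values; p_emp and p_ref are probability densities."""
    da = grid[1] - grid[0]
    P_emp = np.cumsum(p_emp) * da               # empirical probability function
    P_ref = np.cumsum(p_ref) * da               # its reference description
    return {
        "DgI":   np.sum(np.abs(P_emp - P_ref)) * da,           # (6)
        "DgD":   np.sum(np.abs(p_emp - p_ref)) * da,           # (7)
        "KfMI":  np.sum((P_emp - P_ref) ** 2) * da,            # (8)
        "KfMD":  np.sum((p_emp - p_ref) ** 2) * da,            # (9)
        "KS":    float(np.max(p_emp - p_ref)),                 # (10)
        "KfMD_note": None,
    } | {
        "MaxSq": 1.0 - np.sum(np.minimum(p_emp, p_ref)) * da,  # (11)
    }

# Two identical densities deviate by (almost) zero under every criterion,
# while a shifted density is flagged by all of them.
grid = np.linspace(-6.0, 6.0, 1201)
p = np.exp(-grid ** 2 / 2) / np.sqrt(2 * np.pi)
```

In practice the empirical density would come from the normalized values ȃ of a presented sample and the reference one from the template.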
6. "Whitening transformation" for neurons based on the agreement criteria functionals
The authors of this paper investigated the regularities connecting the parameters of functionals based on agreement criteria, the informativeness of the features and their correlation dependence. A close relationship between the average feature informativeness I(N) and σs (the standard deviation of the normalized feature values of the reference distribution "Stranger") was revealed (Fig. 5). This relationship is the closer to a functional one, the closer the informativeness values Ij of the N features are to each other, and it becomes functional if the features are equally informative. Almost the same relation holds between I(N) and the analogous parameter σe(ās) of the empirical distribution generated by a "Stranger" pattern. The relation changes significantly if I(N) > 0.7 (Fig. 5).
Correlation between features does not affect the parameters of the reference distributions mo(N), σo(N), ms(N), σs(N), but it does affect the parameters of the empirical distribution me(ā) and σe(ā). The influence of correlation between features on the parameters of the empirical distribution is most noticeable if ā belongs to the pattern "Own" (āo). The stronger the relationship between the features aj, the "thinner" and "higher" the probability density function p(ȃ) of the pattern "Own" becomes (Fig. 6). If the features are dependent, a decrease/increase of the value a1 causes a similar change in the values a2, a3, etc. Accordingly, the normalized values ȃ are localized in a narrow interval, as a result of which me ≠ 0 and σe < 1. The higher the dependence of the features, the more me differs from zero and the smaller σe becomes. At the training stage several vectors ā of correlated feature values are introduced, so the reference parameters mo and σo remain almost the same as in the case of feature independence, i.e. σe(āo) < σo, which is the cause of Type I errors. The informativeness of the features does not affect me(ā) and σe(ā).
The described regularities hold for both generated and real biometric data (keystroke dynamics [6], voice [16], handwriting [1], etc.), provided that the features are approximately equally informative and the distribution law of their values is close to normal. The physical nature of the features is not important.
The above data could be dismissed as useless empiricism if no constructive conclusions followed from them. If the āh data are correlated, this can be corrected by proportionally increasing the values of σe and σs for the h-th neuron while keeping the value of σo unchanged. This technique makes it possible to eliminate the correlation dependence between the elements of the vector ā and simultaneously move to a space of more informative features. All input empirical distributions and the reference distribution "Stranger" should be normalized according to formula (12):

σ′s = σs · Δσ;  σ′e = σe · Δσ,  Δσ = σo · Ko / Σ(k=1..Ko) σe(āk),    (12)
where k is the index of a sample in the "Own" training set and Ko is the size of the "Own" training set. Note that the parameter Δσ can be used, with appropriate justification, to estimate multidimensional correlation (which is not the purpose of this article). The found regularities can be exploited if I(N) ≤ 0.7. The greater the I(N) value above 0.7, the more tangible the effect of correlation on the parameters of the reference distribution "Stranger" becomes. At I(N) = 0.9, the values of σs and σs(N, ās) for dependent features differ on average by a factor of 3 (Fig. 5).
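Our reading of the typeset formula (12) — both deviations rescaled by the common factor Δσ = σo·Ko / Σk σe(āk) — can be written as a small helper (hypothetical naming; a sketch, not the authors' implementation):

```python
import numpy as np

def delta_sigma(sigma_o, sigma_e_samples):
    """Common rescaling factor of formula (12): the 'Own' reference deviation
    divided by the mean empirical deviation over the Ko training samples."""
    sigma_e_samples = np.asarray(sigma_e_samples, dtype=float)
    Ko = sigma_e_samples.size
    return sigma_o * Ko / sigma_e_samples.sum()

def whiten(sigma_s, sigma_e, d_sigma):
    """Apply (12): stretch the 'Stranger' reference deviation and the empirical
    deviation. For correlated 'Own' data sigma_e < sigma_o, so d_sigma > 1 and
    the rescaling compensates the narrowing described in the text."""
    return sigma_s * d_sigma, sigma_e * d_sigma
```

With σo = 1 and four training samples whose empirical deviations average 0.5, the factor is 2, i.e. the narrowed distributions are stretched back toward unit scale.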
7. Testing of neurons in problems of speaker and signer recognition
An experiment was performed to evaluate the effectiveness of Bayes-Hamming neurons (Fig. 3) and neurons based on the agreement criteria (Fig. 4) in verifying handwritten and voice patterns of a subject. 90 subjects participated in the experiment; over the course of a month each of them entered a certain handwritten pattern (a signature or a password) using a Wacom graphics tablet, as well as the voice pattern "permit access" using a Sony F-V120 microphone. Once a week each subject entered more than 20 voice and handwritten patterns. As a result, more than 100 voice and 100 handwritten patterns were received from each subject (the total number of samples of each type exceeded 10^4).

The testing of neurons was carried out separately for verification of the handwritten and voice patterns of a person. Unique neurons were constructed for each subject and trained to recognize that subject's biometric pattern. Regardless of the type of pattern (voice or handwriting dynamics), the first 20 biometric samples of the subject were used to create his template "Own". To create the template "Stranger" corresponding to the same subject, 64 samples obtained from other subjects (one sample from each) were used. Thus, for each of the 90 subjects, templates "Own" and "Stranger" of the handwriting dynamics and of the voice pattern were created (180 templates "Own" and 180 templates "Stranger"). The remaining samples were used to assess the reliability of subject recognition.
The experiment took into account the variability of a subject's dynamic biometric patterns over time: training was performed on biometric data obtained on the first day of the experiment, while the neurons and their networks were tested on the data of the following days. This approach is the closest to the real conditions in which biometric authentication methods are applied. Pattern variability over time was not considered in previous experiments [1, 6, 7, 15, 16].
As can be seen from Table 2, the voice pattern contains on average more than 25% dependent features (|r| > 0.5), the handwritten pattern more than 15%. On average, 16 of the 335 handwriting-dynamics features are highly dependent (this number differs from subject to subject). On the basis of any Bayesian functional (2) or (3) applied to any of the 16 features, it is possible to form a set of neuron variants of dimension q from 2 to 16, the total number of which equals the sum of the numbers of combinations C(N, q), N = 16, 2 ≤ q ≤ 16. The number of possible neuron configurations for processing voice features is even greater. Each of these neurons would be unique. However, this amount is highly excessive (the solutions of most of the neurons would be strongly correlated). Therefore, within the experiment, the neuron construction technique of [7] was used, by which the number of inputs of each neuron was determined depending on Δr (the interval of equal correlation between features; in the experiment Δr = 0.2). Features with numbers j and i enter the neuron associated with feature t if the difference between the pair correlation coefficients of features j and t and of features i and t does not exceed Δr. The construction of Bayes-Hamming networks is described in detail in [7]; the construction of Bayes-Hamming neurons for voice pattern recognition was similar. Bayes-Hamming networks are better built on highly dependent features (|r| > 0.5): including features with moderate dependence (0.5 > |r| > 0.3) gives no advantage but increases the time required for processing biometric data (Fig. 7).
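The input-selection rule just described can be sketched greedily (our interpretation of the rule; the paper's exact procedure is given in [7], and the result below is order-dependent):

```python
import numpy as np

def neuron_inputs(corr, t, delta_r=0.2):
    """Select inputs for the neuron associated with feature t: a feature j is
    admitted only if its pair correlation with t differs by at most delta_r
    from that of every feature already admitted."""
    r_t = corr[t]                       # pair correlations of all features with t
    members = [t]
    for j in range(len(r_t)):
        if j == t:
            continue
        if all(abs(r_t[j] - r_t[i]) <= delta_r for i in members if i != t):
            members.append(j)
    return sorted(members)
```

For a feature strongly tied to t, candidates whose correlation with t deviates by more than Δr = 0.2 are rejected, so each neuron ends up with a group of comparably correlated inputs.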
The concept of constructing networks of neurons based on functionals (6)-(11) is different. The dimensionality of one neuron should not be too low, otherwise the empirical distribution cannot be constructed. According to Chebyshev's theorem the dimension of the functionals should be increased, but in practice it is more advantageous to increase the number of unique neurons than to construct a small number of neurons based on high-dimensional functionals. Independent features have to be processed by neurons that do not use transformation (12). Studies show that when processing conditionally independent features (|r| < 0.3), the number of inputs q of a neuron should be set according to the informativeness of the processed features. In this case the features need to be grouped by informativeness and distributed among the neurons (features entering one neuron must not differ in informativeness by more than ΔI). It is recommended that the number of identical features in two different neurons be no more than q/4: an increase in the number of identical features at the inputs of different neurons leads to large redundancy of the ANN and correlation of its outputs. The less informative the features are, the more inputs a neuron should have. Networks of quadratic forms and their analogues (Pearson-Hamming networks, chi-module networks) have to be constructed on a similar basis. In the present work, independent features were grouped at ΔI = 0.1, and the dimension of the functionals was increased 1.5 times with each transition between neighbouring informativeness categories. The dimension of the functionals (6)-(11) processing features with I < 0.1 was q = 3 (for quadratic forms in the analogous case, q = 2). The effectiveness of networks of statistical functionals (6)-(11) and of networks of quadratic forms is shown in Fig. 8.
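One way to read this sizing rule for independent features is the following helper (our interpretation; the text does not state how fractional dimensions are rounded, so truncation is our choice):

```python
def functional_dimension(I, q_base=3, growth=1.5, delta_I=0.1):
    """Number of inputs q of a neuron processing conditionally independent
    features of informativeness I (smaller I = more informative, see Table 1):
    start from q_base for the most informative bin (I < delta_I) and grow the
    dimension by `growth` per bin, since less informative features need more
    inputs. Truncation to int is an assumption, not from the paper."""
    category = int(I / delta_I)        # informativeness bin index
    return int(q_base * growth ** category)
```

With the paper's constants (ΔI = 0.1, base q = 3, factor 1.5), features with I < 0.1 get 3-input neurons and the dimension grows for the less informative bins.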
When processing dependent features (|r| > 0.3) with neurons based on functionals (6)-(11), transformation (12) must be applied. Grouping dependent features by informativeness is not required: features that have a significant correlation dependence are also comparable in informativeness. It is easy to see that two linearly dependent features have equal probability density functions; the closer the relationship between features is to linear, the more similar are the probability density functions describing the patterns "Own" and "Stranger" respectively (the intersection areas of these functions for both features will in most cases lie within the range ΔI). Studies have shown that, regardless of the informativeness of the dependent features, the dimension of functionals (6)-(11) should be set within 25 > q > 15; a further increase of q does not improve the result (the neuron becomes saturated). For each subject, a set of dependent features was determined at the training stage. The results of testing functionals (6)-(11) with transformation (12), as well as ANNs based on them, are shown in Figs. 9 and 10. The 4.5% of extremely low-informative dependent features had to be abandoned, since for I > 0.7 transformation (12) fails because of the deviation of the estimates σs and σe(ās) (Fig. 5).
It is seen from Fig. 9 that the proposed transformation (12) indeed increases the efficiency of all the considered agreement-criteria functionals when working with strongly correlated features, except for the differential Cramér-von Mises criterion (whose result is unchanged). The proposed transformation (12) increases the efficiency of the functionals DgI, DgD, KfMI, KS and MaxSq by 15-25% when recognizing handwritten patterns in the space of dependent features. The efficiency of a single neuron based on one of the functionals DgI', DgD', KfMI', KS', MaxSq' is approximately 2 times lower than that of the network of Bayes-Hamming neurons (Figs. 7 and 9). Nevertheless, a network of these functionals is even more efficient than one of Bayesian difference and hyperbolic functionals (Figs. 7 and 10). It can be seen from Figs. 7, 8 and 10 that, using a relatively small part (15-25%) of the dependent features, it is possible to achieve better results than using most of the independent features. Thus, dependent features are more informative than independent ones, and the correlation between features must be perceived as a special kind of information characterizing the recognized biometric patterns. It also follows from these figures that functionals based on agreement criteria make it possible to create more efficient neural network pattern recognition algorithms than quadratic forms and multidimensional Bayesian functionals.
At the final stage of the experiment, hybrid ANNs consisting of several independent segments (subnets) were formed and tested for subject identification:

• a network of quadratic forms (and their analogues) processing independent features;
• a network based on agreement criteria processing independent features;
• a network based on modified agreement criteria (using transformation (12)) processing dependent features;
• a network of multidimensional Bayesian functionals processing highly dependent features.
Each subnet independently calculated the Hamming distance, normalized by the number of neurons contained in it. These distances were then multiplied or summed. The results are summarized in Table 3.
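The fusion step can be sketched as follows (names are ours; each input is a subnet's Hamming distance already normalized by its neuron count):

```python
import numpy as np

def fuse(distances, mode="mul"):
    """Combine normalized Hamming distances from several subnets by summation
    or multiplication; a smaller fused distance means the presented sample is
    closer to the 'Own' template."""
    d = np.asarray(distances, dtype=float)
    if mode == "sum":
        return float(d.sum())
    if mode == "mul":
        return float(d.prod())
    raise ValueError("mode must be 'sum' or 'mul'")
```

A decision threshold on the fused value can then be tuned for the "average user" or per subject, as discussed below.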
Summation of Hamming distances is not always the optimal approach. Multiplication of the normalized Hamming distances from each subnet often gives the best result when the solutions of two networks are combined, one of which processes combinations of dependent features and the other independent features (Table 3). However, combining the solutions of all networks by multiplication or summation of the normalized Hamming distances does not lead to significant improvements (Table 3). The most successful combination is a network of quadratic forms together with a network of modified agreement criteria, whose solutions are combined by multiplication. In this case the system can be configured for the following indicators:

• signer recognition: FAR < 0.001, FRR = 0.17;
• speaker recognition: FAR < 0.001, FRR = 0.34.

The presented estimates of FAR, FRR and EER were obtained with the ANN configured for the "average user". If the Hamming distance threshold (and its multiplication) is set for each subject individually, a further 10-20% reduction in the number of errors can be achieved.
Conclusion

In the presented work, two variants of neuron construction for wide neural networks were considered, which allow much more efficient processing of biometric (and other) features with high mutual correlation in pattern recognition. The first variant is expedient together with the threshold activation function and Bayesian functionals (difference, hyperbolic [7, 17, 18, 19]); the second, together with functionals built on criteria for testing hypotheses about the distribution law of a random variable (the agreement criteria).

In the subject recognition tasks on handwritten and voice patterns, several functionals based on the agreement criteria of Gini, Cramér-von Mises and Kolmogorov-Smirnov were tested. A criterion of the maximum intersection area of the compared probability density functions was proposed, along with a transformation that modifies these functionals to work with highly dependent features. A network of neurons based on these functionals showed less than 4% signature verification errors, and a network based on multidimensional Bayesian functionals less than 6%. In this case, combinations of only the "bad" (correlated) features, which numbered less than 15%, were used. This proves that low-informative biometric patterns contain unused potential. More than 85% of the almost independent signature features remain, and they can be processed with perceptrons, quadratic forms and other proximity measures. Variants of hybrid ANN construction based on Bayesian functionals, quadratic forms and agreement criteria, configured for each user, allow EER < 2% to be achieved and a biometric authentication system by signature or handwritten password to be created with the following parameters: FAR < 0.1% at FRR ≈ 10%. For the speaker recognition system on a fixed control phrase, these indicators were FAR < 0.1% at FRR ≈ 23-30% depending on the speaker (EER < 6%). With secret voice passwords, the error percentage should be significantly lower. In practice, these systems can successfully learn from 20 examples of a user's biometric patterns.

The efficiency gain of hybrid ANNs is explained by the fact that the output values of the neurons are only slightly correlated, since the functionals underlying the neurons have different operating principles. Accordingly, the neurons' solutions can be combined with a Hamming measure or other methods to obtain a synergistic effect. The question of how best to integrate the outputs of the networks of quadratic forms, Bayes-Hamming neurons and agreement criteria remains open. More variants of combining the solutions formed by several ANN segments, each built on neurons of a certain type, need to be tested. The use of several methods of combining neuron solutions makes it possible to create multilayer hybrid ANNs that can be trained on a small number of examples.

The described approach can be extended to any biometric (and not only biometric) patterns. What matters in the construction and training of the ANN is not the physical nature of the features but their informativeness and correlations. A scale of feature informativeness for pattern verification problems was also proposed in this work.
Acknowledgments

The research was supported by the Russian Science Foundation (project №17-71-10094).

References
(1) Lozhnikov, P.S., Sulavko, A.E., Eremenko, A.V., & Volkov, D.A. (2016). Methods of Generating Key Sequences based on Parameters of Handwritten Passwords and Signatures. Information, 7(4), 59.
(2) Frank, J., Mannor, S., & Precup, D. (2010, July). Activity and Gait Recognition with Time-Delay Embeddings. Proceedings of the AAAI (pp. 1581-1586). Atlanta, GA, USA.
(3) Sohankar, J., Sadeghi, K., Banerjee, A., & Gupta, S.K.S. (2015, November). E-BIAS: A Pervasive EEG-Based Identification and Authentication System. Proceedings of the 11th ACM Symposium on QoS and Security for Wireless and Mobile Networks (pp. 165-172). Cancun, Mexico.
(4) Sulavko, A.E., Fedotov, A.A., & Eremenko, A.V. (2017, November). Users' identification through keystroke dynamics based on vibration parameters and keyboard pressure. Proceedings of XI International IEEE Scientific and Technical Conference "Dynamics of Systems, Mechanisms and Machines" (Dynamics) (pp. 1-7). Omsk, Russia.
(5) Monrose, F., Reiter, M.K., Li, Q., & Wetzel, S. (2001, May). Cryptographic key generation from voice. Proceedings of the 2001 IEEE Symposium on Security and Privacy (pp. 202-213). Oakland, CA, USA.
(6) Lozhnikov, P.S., Sulavko, A.E., Eremenko, A.V., & Buraya, E.V. (2016, November). Methods of generating key sequences based on keystroke dynamics. Proceedings of X International IEEE Scientific and Technical Conference "Dynamics of Systems, Mechanisms and Machines" (Dynamics) (pp. 1-5). Omsk, Russia.
(7) Ivanov, A.I., Lozhnikov, P.S., & Sulavko, A.E. (2017). Evaluation of signature verification reliability based on artificial neural networks, Bayesian multivariate functional and quadratic forms. Computer Optics, 5, 765-774.
(8) Hafemann, L.G., et al. (2016, July). Writer-independent Feature Learning for Offline Signature Verification using Deep Convolutional Neural Networks. Proceedings of the 2016 International Joint Conference on Neural Networks (IJCNN) (pp. 2576-2583). Vancouver, BC, Canada.
(9) Dechter, R. (1986, August). Learning while searching in constraint satisfaction problems. Proceedings of the 5th National Conference on Artificial Intelligence (pp. 178-183). Philadelphia, USA.
(10) Galushkin, A.I. (1974). Sintez mnogosloynykh sistem raspoznavaniya obrazov [The Synthesis of Multi-layer Systems for Pattern Recognition] (in Russian). Moscow: Energiya.
(11) Hinton, G.E. (2002). Training Products of Experts by Minimizing Contrastive Divergence. Neural Computation, 14(8), 1771-1800.
(12) Yasuoka, Y., Shinomiya, Y., & Hoshino, Y. (2016, August). Evaluation of Optimization Methods for Neural Network. Proceedings of the 8th International Conference on Soft Computing and Intelligent Systems and 17th International Symposium on Advanced Intelligent Systems. Sapporo, Hokkaido, Japan. doi: 10.1109/SCIS-ISIS.2016.0032.
(13) Ivanov, A.I. (2012). Podsoznaniye iskusstvennogo intellekta: programmirovaniye avtomatov neyrosetevoy biometrii yazykom ikh obucheniya [Subconscious of artificial intelligence: programming of neural network biometrics automata by the language of their learning] (in Russian). Penza: PNIEI, 125 p.
(14) Ivanov, A.I., Kachajkin, E.I., & Lozhnikov, P.S. (2016, May). A Complete Statistical Model of a Handwritten Signature as an Object of Biometric Identification. Proceedings of the 2016 International Siberian Conference on Control and Communications (SIBCON 2016) (pp. 1-5). Moscow, Russia.
(15) Lozhnikov, P.S., & Sulavko, A.E. (2017, November). Usage of quadratic form networks for users' recognition by dynamic biometric images. Proceedings of XI International IEEE Scientific and Technical Conference "Dynamics of Systems, Mechanisms and Machines" (Dynamics) (pp. 1-6). Omsk, Russia.
(16) Sulavko, A.E., Yeremenko, A.V., Borisov, R.V., & Inivatov, D.P. (2017). Vliyaniye psikhofiziologicheskogo sostoyaniya diktora na parametry yego golosa i rezul'taty biometricheskoy autentifikatsii po rechevomu parolyu [Influence of the speaker's psychophysiological state on the parameters of the speaker's voice and the results of biometric authentication by voice password] (in Russian). Komp'yuternyye instrumenty v obrazovanii, 4, 29-47.
(17) Ivanov, A.I. (2016). Mnogomernaya neyrosetevaya obrabotka biometricheskikh dannykh s programmnym vosproizvedeniyem effektov kvantovoy superpozitsii [Multidimensional neural network processing of biometric data with software reproduction of quantum superposition effects] (in Russian). Penza: PNIEI, 133 p.
(18) Ivanov, A.I., Lozhnikov, P.S., & Serikova, Yu.I. (2016). Reducing the Size of a Sample Sufficient for Learning Due to the Symmetrization of Correlation Relationships Between Biometric Data. Cybernetics and Systems Analysis, 52(3), 379-385.
(19) Ivanov, A.I., Lozhnikov, P.S., & Vyatchanin, S.E. (2017). Comparable Estimation of Network Power for Chi-squared Pearson Functional Networks and Bayes Hyperbolic Functional Networks while Processing Biometric Data. Proceedings of the 2017 International Siberian Conference on Control and Communications (SIBCON 2017). Astana, Kazakhstan. doi: 10.1109/SIBCON.2017.7998435.
Table 1. Proposed scale of features' informativeness

I ≤ 0.001 | 9.966 ≤ Ibit | Top informative | 0.999 ≤ Ī
0.001 < I ≤ 0.1 | 3.322 < Ibit ≤ 9.966 | Highly informative | 0.9 ≤ Ī < 0.999
0.1 < I ≤ 0.3 | 1.737 < Ibit ≤ 3.322 | Very informative | 0.7 ≤ Ī < 0.9
0.3 < I ≤ 0.5 | 1 < Ibit ≤ 1.737 | Medium informative | 0.5 ≤ Ī < 0.7
0.5 < I ≤ 0.7 | 0.515 < Ibit ≤ 1 | Low-informative | 0.3 ≤ Ī < 0.5
0.7 < I ≤ 0.9 | 0.152 < Ibit ≤ 0.515 | Very low informative | 0.1 ≤ Ī < 0.3
0.9 < I ≤ 0.999 | 0.001 < Ibit ≤ 0.152 | Almost not informative | 0.001 ≤ Ī < 0.1
0.999 < I | Ibit < 0.001 | Non-informative | Ī < 0.001
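The boundary values of Table 1 appear consistent with Ibit = −log2(I) and Ī = 1 − I; a small helper reproducing the scale (the function name and category strings are ours):

```python
def informativeness_category(I):
    """Map an informativeness value I (smaller = more informative) to its
    Table 1 category; the bit column of the table equals -log2(I) and the
    complementary column equals 1 - I."""
    bounds = [
        (0.001, "top informative"),
        (0.1,   "highly informative"),
        (0.3,   "very informative"),
        (0.5,   "medium informative"),
        (0.7,   "low-informative"),
        (0.9,   "very low informative"),
        (0.999, "almost not informative"),
    ]
    for upper, name in bounds:
        if I <= upper:
            return name
    return "non-informative"
```

For example, a feature with I = 0.2 falls in the "very informative" band of the scale.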
Table 2. The proportion of pair feature combinations with a certain correlation dependence for different types of biometric patterns

Strength and direction of relation | Handwriting dynamics [11] | Voice [10] | Keystroke dynamics [9]
r < -0.7 | 0.05% | 0% | 0%
-0.7 ≤ r < -0.5 | 0.35% | 0.2% | 0%
-0.5 ≤ r < -0.3 | 1% | 0.8% | 0.05%
-0.3 ≤ r < 0.3 | 84.3% | 74.1% | 91.15%
0.3 ≤ r < 0.5 | 9.5% | 17% | 4.7%
0.5 ≤ r < 0.7 | 3.5% | 6.3% | 3.55%
0.7 ≤ r < 0.9 | 1% | 1.5% | 0.55%
0.9 ≤ r < 1 | 0.3% | 0.1% | 0%
Table 3. Error probabilities of subject recognition (EER) for different configurations of hybrid ANNs and types of biometric patterns

Types of neural functionals (combined subnets)     Method of combining subnet solutions   Handwritten patterns, EER   Voice patterns, EER
Quadratic forms + modified agreement criteria      summation                              0.026                       0.065
                                                   multiplication                         0.023                       0.07
Bayesian functionals + agreement criteria          summation                              0.041                       0.092
                                                   multiplication                         0.043                       0.086
Bayesian functionals + agreement criteria          summation                              0.023                       0.07
+ quadratic forms + modified agreement criteria    multiplication                         0.023                       0.075
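Table 3 compares two rules for combining subnet decisions. As a hedged illustration of the difference between them, the sketch below treats each subnet's output as a similarity score in [0, 1] and fuses by averaging ("summation") or by taking the product ("multiplication"); the authors' exact fusion formulas are not reproduced here:

```python
import numpy as np

def fuse_subnet_scores(scores, method):
    """Combine per-subnet similarity scores (each assumed in [0, 1])."""
    s = np.asarray(scores, dtype=float)
    if method == "summation":
        return float(s.mean())       # additive fusion (normalized sum)
    if method == "multiplication":
        return float(s.prod())       # multiplicative fusion
    raise ValueError(f"unknown fusion method: {method}")

# Multiplicative fusion punishes a single low subnet score more strongly.
scores = [0.9, 0.8, 0.2]
print(fuse_subnet_scores(scores, "summation"))       # 0.6333...
print(fuse_subnet_scores(scores, "multiplication"))  # 0.144
```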
Figure 1. Schematic representation of a hybrid "wide" neural network.


Figure 2. Reference "Own" descriptions for some features with different relevance.
Figure 3. Neuron with one functional and threshold activation function.
Figure 4. Neuron with two functionals and threshold activation function.


Figure 5. The change in the standard deviation of the reference distributions (top) and of empirical distributions generated by a stranger (bottom).
Figure 6. Example of the functions p(ȃ) and ṗ(ȃ): (a) aj is independent; (b) aj is dependent; (c) aj is decorrelated with (12).
Figure 7. The results of verification of biometric patterns by the Bayes-Hamming networks.
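The verification results in Figures 7-10, like those in Table 3, are reported as EER: the operating point where the false accept rate (FAR) equals the false reject rate (FRR). A minimal sketch of estimating EER from genuine and impostor score samples follows; the threshold sweep is an illustrative assumption, not the authors' procedure:

```python
import numpy as np

def estimate_eer(genuine, impostor):
    """Estimate the equal error rate from verification score samples.

    Sweeps the decision threshold over all observed scores (higher
    score = more likely a genuine user) and returns the error rate at
    the point where FAR and FRR are closest.
    """
    genuine = np.asarray(genuine, dtype=float)
    impostor = np.asarray(impostor, dtype=float)
    best_far, best_frr, best_gap = 1.0, 0.0, float("inf")
    for t in np.sort(np.concatenate([genuine, impostor])):
        far = np.mean(impostor >= t)   # impostors wrongly accepted
        frr = np.mean(genuine < t)     # genuine users wrongly rejected
        gap = abs(far - frr)
        if gap < best_gap:
            best_far, best_frr, best_gap = far, frr, gap
    return (best_far + best_frr) / 2

# Well-separated synthetic score distributions give a small EER.
rng = np.random.default_rng(1)
eer = estimate_eer(rng.normal(1.0, 0.5, 500), rng.normal(-1.0, 0.5, 500))
print(round(float(eer), 3))
```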
Figure 8. The results of verification of biometric patterns in the space of independent features using networks of quadratic forms and networks of statistical functionals (6)-(11).
Figure 9. The results of verification of signatures by neurons on the basis of statistical functionals.
Figure 10. The results of verification of biometric patterns in the space of dependent features using networks of statistical functionals (6)-(11) (with the transformation (12)).