Jair Minoro Abe 1,3, João Carlos Almeida Prado 2, and Kazumi Nakamatsu 4
1 Information Technology Dept., ICET, Paulista University, Brazil
2 Faculty of Philosophy, Letters and Human Sciences, University of São Paulo, Brazil
3 Institute for Advanced Studies, University of São Paulo, Brazil
4 School of Human Science and Environment/H.S.E., University of Hyogo, Japan
jairabe@uol.com.br, joaocarlos@autobyte.com.br, nakamatu@shse.u-hyogo.ac.jp
Abstract. In this work we sketch how the Paraconsistent Artificial Neural Network (PANN) can be useful in speech signal recognition by using phonic trace signals. The PANN is built on the Paraconsistent Annotated Evidential Logic Eτ, which allows us to manipulate uncertain, inconsistent and paracomplete information without trivialization.
1 Introduction
Many pattern recognition applications use statistical models with a large number of parameters, although the amount of available training data is often insufficient for robust parameter estimation. A common technique to reduce the effect of data sparseness is the divide-and-conquer approach, which decomposes a problem into a number of smaller subproblems, each of which can be handled by a more specialized and potentially more robust model. This principle can be applied to a variety of problems in speech and language processing: the general procedure is to adopt a feature-based representation for the objects to be modeled (such as phones or words), learn statistical models describing the features of the object rather than the object itself, and recombine these partial probability estimates. Although this enables a more efficient use of data, other interesting techniques have been employed for the task. One of the most successful is the so-called artificial neural network (ANN). ANNs are computational paradigms based on mathematical models that, unlike traditional computing, have a structure and operation resembling that of the mammalian brain. ANNs, or neural networks for short, are also called connectionist systems, parallel distributed systems or adaptive systems, because they are composed of a series of interconnected processing elements that operate in parallel. Neural networks lack centralized control in the classical sense, since all the interconnected processing elements change or adapt simultaneously with the flow of information and adaptive rules. One of the original aims of ANNs was to understand and model the functional characteristics and computational properties of the brain when it performs cognitive
B. Gabrys, R.J. Howlett, and L.C. Jain (Eds.): KES 2006, Part II, LNAI 4252, pp. 844-850, 2006. © Springer-Verlag Berlin Heidelberg 2006
processes such as sensorial perception, concept categorization, concept association and learning. Today, however, a great deal of effort is focused on the development of neural networks for applications such as pattern recognition and classification, data compression and optimization. Most known ANNs are based on classical logic or extensions of it. In this paper we are concerned with applying a particular ANN, namely the Paraconsistent Artificial Neural Network (PANN), introduced in [4] and based on the paraconsistent annotated logic Eτ [1], to speech signal recognition using phonic trace signals. The PANN is capable of manipulating concepts like uncertainty, inconsistency and paracompleteness in its interior.
2 Background
The Paraconsistent Artificial Neural Network (PANN) is a new artificial neural network introduced in [4]. Its basis leans on the paraconsistent annotated logic Eτ [1]. Let us present it briefly. The atomic formulas of the logic Eτ are of the type p(μ, λ), where (μ, λ) ∈ [0, 1]² and [0, 1] is the real unit interval (p denotes a propositional variable). p(μ, λ) can be intuitively read: "It is assumed that p's favorable evidence is μ and contrary evidence is λ." Thus:
p(1.0, 0.0) can be read as a true proposition;
p(0.0, 1.0) can be read as a false proposition;
p(1.0, 1.0) can be read as an inconsistent proposition;
p(0.0, 0.0) can be read as a paracomplete (unknown) proposition;
p(0.5, 0.5) can be read as an indefinite proposition.
We introduce the following concepts (with μ, λ ∈ [0, 1]):
Uncertainty degree: Gun(μ, λ) = μ + λ − 1;
Certainty degree: Gce(μ, λ) = μ − λ.
An order relation is defined on [0, 1]²: (μ1, λ1) ≤ (μ2, λ2) ⇔ μ1 ≤ μ2 and λ1 ≤ λ2, constituting a lattice that will be symbolized by τ. With the uncertainty and certainty degrees we can get the following 12 output states: the extreme states False, True, Inconsistent and Paracomplete, and eight non-extreme states.
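A minimal computation of the two degrees may help fix the definitions above (the function names are ours, not from the paper):

```python
def uncertainty_degree(mu: float, lam: float) -> float:
    """Gun(mu, lam) = mu + lam - 1, ranging over [-1, 1]."""
    return mu + lam - 1.0

def certainty_degree(mu: float, lam: float) -> float:
    """Gce(mu, lam) = mu - lam, ranging over [-1, 1]."""
    return mu - lam

# The four extreme annotations:
assert certainty_degree(1.0, 0.0) == 1.0     # true
assert certainty_degree(0.0, 1.0) == -1.0    # false
assert uncertainty_degree(1.0, 1.0) == 1.0   # inconsistent
assert uncertainty_degree(0.0, 0.0) == -1.0  # paracomplete
```

Note that Gce is maximal (resp. minimal) exactly at the true (resp. false) annotation, while Gun plays the symmetric role for inconsistency and paracompleteness.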
Table 1. Extreme and non-extreme states

Extreme states        Symbol
True                  V
False                 F
Inconsistent          T
Paracomplete          ⊥

Non-extreme states                         Symbol
Quasi-true tending to Inconsistent         QV→T
Quasi-true tending to Paracomplete         QV→⊥
Quasi-false tending to Inconsistent        QF→T
Quasi-false tending to Paracomplete        QF→⊥
Quasi-inconsistent tending to True         QT→V
Quasi-inconsistent tending to False        QT→F
Quasi-paracomplete tending to True         Q⊥→V
Quasi-paracomplete tending to False        Q⊥→F
846
Some additional control values are:
Vcic = maximum value of uncertainty control = Ftct;
Vcve = maximum value of certainty control = Ftce;
Vcpa = minimum value of uncertainty control = −Ftct;
Vcfa = minimum value of certainty control = −Ftce.
For the discussion in the present paper we have used Ftct = Ftce = ½. All states are represented in the lattice of the next figure.
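Using the control limits above, an annotation can be mapped to one of the four extreme states of the lattice. The sketch below assumes non-strict boundary comparisons and symbolic return values of our own choosing; the paper itself only fixes the thresholds:

```python
def classify_state(mu, lam, ftce=0.5, ftct=0.5):
    """Map an annotation (mu, lam) to an extreme state of the
    lattice, using Ftce/Ftct as certainty/uncertainty limits.
    Thresholding convention (>= vs >) is our assumption."""
    gce = mu - lam            # certainty degree Gce
    gun = mu + lam - 1.0      # uncertainty degree Gun
    if gce >= ftce:
        return "V"            # true
    if gce <= -ftce:
        return "F"            # false
    if gun >= ftct:
        return "T"            # inconsistent
    if gun <= -ftct:
        return "bot"          # paracomplete
    return "non-extreme"      # one of the eight states of Table 1
```

For example, classify_state(1.0, 0.0) yields "V", while classify_state(0.5, 0.5) falls into the non-extreme region.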
Gun = resulting uncertainty degree, Gce = resulting certainty degree, and X = constant of indefinition. Using the concepts of the basic PANC we can obtain the family of PANCs considered in this work, as described in Table 2 below:
Table 2. Paraconsistent Artificial Neural Cells

PANC | Inputs | Calculations | Output
Analytic connection (PANCac) | μ, λ, Ftct, Ftce | λc = 1 − λ; Gun; Gce; μr = (Gce + 1)/2 | If |Gce| > Ftce then S1 = μr and S2 = 0; if |Gun| > Ftct and |Gun| > |Gce| then S1 = μr and S2 = |Gun|; if not, S1 = X and S2 = 0
Maximization (PANCmax) | μ, λ | none | If μ ≥ λ, then S1 = μ, if not S1 = λ
Minimization (PANCmin) | μ, λ | none | If μ ≤ λ, then S1 = μ, if not S1 = λ
Complementation (PANCco) | μ, Ftct, Ftce | μc = 1 − μ | S1 = μc
Decision (PANCde) | μ, Ftde | none | If μr > Vlv, then S1 = 1 (V); if μr < Vlf, then S1 = 0 (F); if not, S1 is a constant to be determined by the application
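As an illustration, the rules of Table 2 for the analytic connection, maximization and minimization cells can be sketched as follows. This is our reading of the (partially garbled) table, not the authors' implementation; the default thresholds and the indefinition constant x = 0.5 are illustrative:

```python
def panc_ac(mu, lam, ftce=0.5, ftct=0.5, x=0.5):
    """Analytic connection cell (PANCac): returns (S1, S2)."""
    gce = mu - lam               # certainty degree Gce
    gun = mu + lam - 1.0         # uncertainty degree Gun
    mu_r = (gce + 1.0) / 2.0     # resulting evidence, back in [0, 1]
    if abs(gce) > ftce:
        return mu_r, 0.0         # enough certainty: pass the result on
    if abs(gun) > ftct and abs(gun) > abs(gce):
        return mu_r, abs(gun)    # dominant uncertainty: signal it on S2
    return x, 0.0                # otherwise output the indefinition constant

def panc_max(mu, lam):
    """Maximization cell (PANCmax): S1 = max(mu, lam)."""
    return max(mu, lam)

def panc_min(mu, lam):
    """Minimization cell (PANCmin): S1 = min(mu, lam)."""
    return min(mu, lam)
```

For instance, panc_ac(1.0, 0.0) gives (1.0, 0.0) (a confidently true result), whereas panc_ac(1.0, 1.0) gives (0.5, 1.0): the evidence cancels out and the uncertainty degree is forwarded on S2.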
4 Practical Results
To test the theory presented here, we developed a computer system capable of capturing a speech signal and converting it into a vector. After this, we analyze the percentage recognition results shown below. These studies point out some of the most important features of the PANN. Firstly, the PANN recognition improves with every new recognition step; this is a consequence of discarding contradicting signals and recognizing by proximity, without trivializing the results. Secondly, the performance and efficiency of the PANN are sufficient to recognize any speech signal in real time. We now show how efficient the PANN was in formant recognition. The tests were made in Portuguese, and 3 pairs of syllables were chosen, FA-VA, PA-BA, CA-GA, sharing the same articulation and differing in sonority (see Table 3). The speaker is an adult male, 42 years old, Brazilian, from the city of São Paulo. Table 3 shows the recognition capability. The recognition percentage in the first row is 100% because the PANN is empty and the syllables are just being learned. The process of recognition is made in the computer system as follows: in step 2 the speaker says, for instance, the syllable FA. Then the PANN gives an output with
the calculations (favorable/contrary evidences, Gce, Gun) and asks the speaker (operator) whether the data is acceptable or not. If the answer is Yes, the PANN keeps the parameters for the next recognition. If the answer is No, the PANN recalculates the parameters in order to criticize the next recognition, until such data belongs to the False state (fig. 1), preparing for the next step to repeat the process (in this way, the recognition improves). This is performed by the neural cell PANCde (see Table 2):
Table 3. Syllable recognition per step

Step     FA      VA      PA      BA      CA      GA
1        100%    100%    100%    100%    100%    100%
2        87%     82%     83%     85%     82%     84%
3        88%     85%     86%     82%     87%     88%
4        91%     87%     90%     87%     89%     92%
5        90%     88%     88%     90%     88%     90%
6        92%     90%     95%     89%     92%     89%
7        95%     94%     89%     92%     90%     95%
8        94%     92%     94%     95%     94%     95%
9        94%     96%     95%     95%     92%     95%
10       95%     95%     95%     97%     95%     92%
Average  91.78%  89.89%  90.56%  90.22%  89.89%  91.11%
Table 4. Recognition across syllable pairs

Step     FA-VA   PA-BA   CA-GA
1        70%     51%     62%
2        67%     59%     59%
3        72%     49%     61%
4        59%     53%     62%
5        65%     48%     58%
6        71%     52%     59%
7        64%     46%     60%
8        69%     47%     49%
9        66%     52%     63%
10       63%     48%     57%
Average  66.60%  50.50%  59.00%
These adjustments are made automatically by the PANN, except for the learning steps, which require the intervention of the operator to feed the Yes/No data. More details are to be found in [4]. Thus, from the second row on, the PANN performs recognition and learning adjustments simultaneously, adapting as it improves; this is the reason the recognition factor increases. In the example, we can see that after the sixth speech step, the PANN is able to recognize every signal efficiently, with a recognition factor higher than 88%. Every signal with a lower factor can be considered unrecognized.
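The learn-then-recognize cycle described above can be caricatured in a few lines. This is a toy sketch, not the authors' system: the prototype-vector storage, the similarity score and the update rule are all our assumptions, standing in for the PANN's internal evidence handling.

```python
def recognize(prototype, signal):
    """Recognition factor in [0, 1]: mean closeness between the
    stored prototype and the incoming signal vector (both in [0, 1])."""
    assert len(prototype) == len(signal)
    return sum(1.0 - abs(p - s) for p, s in zip(prototype, signal)) / len(prototype)

def update(prototype, signal, accepted, rate=0.25):
    """Operator feedback step: on Yes, move the prototype toward the
    sample (learning); on No, push it away so the sample tends toward
    the unrecognized (False) region on the next comparison."""
    sign = 1.0 if accepted else -1.0
    return [min(1.0, max(0.0, p + sign * rate * (s - p)))
            for p, s in zip(prototype, signal)]
```

Repeating update with accepted=True makes recognize converge toward 1.0 for the learned syllable, mirroring the rising factors in Table 3, while unlearned (rejected) patterns stay below the recognition threshold.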
Table 4 shows the recognition factor percentage when the PANN analyzes a syllable with a different speech articulation. As we can see, when the PANN has learned the FA syllable (10 times) and is asked to recognize the VA syllable, the recognition factor is never higher than 72%. For the remaining pairs of syllables, this factor was even lower.
6 Conclusions
The variations of the analyzed values are interpreted by the PANN and adjusted automatically by the system. Due to the PANN's structural construction, the network is able to identify small variations between the chosen pairs of syllables. One central reason is its capability of recognition by proximity and of discarding contradictory data without trivialization. In the examples above, we can define a syllable as recognized if the factor is higher than 88%, and as non-recognized if the factor is lower than 72%. The difference of 16% between recognition and non-recognition is enough to avoid mistakes in the interpretation of the results. Thus, the PANN shows itself to be a superior system, capable of manipulating the factors described with high accuracy in data analysis. The results presented in this paper show that the PANN can be a very efficient structure for speech analysis. Of course, new concepts are necessary for a more complete study of speech production, but this work is in progress. We hope to say more in forthcoming papers.
References
[1] J.M. Abe, Fundamentos da Lógica Anotada (Foundations of Annotated Logics), in Portuguese, Ph.D. Thesis, University of São Paulo, São Paulo, 1992.
[2] J.I. Da Silva Filho & J.M. Abe, Para-Analyzer and Inconsistencies in Control Systems, Proceedings of the IASTED International Conference on Artificial Intelligence and Soft Computing (ASC'99), August 9-12, Honolulu, Hawaii, USA, 78-85, 1999.
[3] J.I. Da Silva Filho & J.M. Abe, Paraconsistent analyzer module, International Journal of Computing Anticipatory Systems, vol. 9, ISSN 1373-5411, ISBN 2-9600262-1-7, 346-352, 2001.
[4] J.I. Da Silva Filho & J.M. Abe, Fundamentos das Redes Neurais Paraconsistentes - Destacando Aplicações em Neurocomputação, in Portuguese, Editora Arte & Ciência, ISBN 85-7473-045-9, 247 pp., 2001.
[5] A.P. Dempster, Generalization of Bayesian inference, Journal of the Royal Statistical Society, Series B-30, 205-247, 1968.
[6] R. Hecht-Nielsen, Neurocomputing, Addison-Wesley, New York, 1990.
[7] T. Kohonen, Self-Organization and Associative Memory, Springer-Verlag, 1984.
[8] B. Kosko, Neural Networks for Signal Processing, Prentice-Hall, New Jersey, USA, 1992.
[9] R. Sylvan & J.M. Abe, On general annotated logics, with an introduction to full accounting logics, Bulletin of Symbolic Logic, 2, 118-119, 1996.
[10] L. Fausett, Fundamentals of Neural Networks: Architectures, Algorithms and Applications, Prentice-Hall, Englewood Cliffs, 1994.
[11] M.J. Russell & J.A. Bilmes, Introduction to the Special Issue on New Computational Paradigms for Acoustic Modeling in Speech Recognition, Computer Speech and Language, 17, 107-112, 2003.