Speech and Speaker Recognition System using Artificial Neural Networks and
Hidden Markov Model
Abstract— Aiming towards automatic machine learning by humans, a methodology for speech recognition with speaker identification based on the Hidden Markov Model for security is a demand of science. Inspired by the same, we propose a methodology to identify the speaker and detect speech. Within our research, acquisition of the speech signal, analysis of the spectrogram, neutralization, extraction of features for recognition, and mapping of speech using Artificial Neural Networks are presented. In our investigation such a mapping is realized using the back-propagation rule of neural networks. This algorithm is especially suitable for mapping large sets of input and output speech. Additionally, recognition of the speaker using a Hidden Markov Model will also be presented in this paper.

The input voice spectrum obtained by spectrum analysis (depicted in Fig. 1) is matched against the machine voice spectrum (depicted in Fig. 3) for speech recognition.

Fig. 1. Sample Input Spectrogram
Fig. 4. Process Flow (Noise Reduction → Normalization → Recognition using ANN → Speaker Recognition → Output Generation)

III. PROPOSED METHODOLOGY

A. Strategy: The algorithm involves the acquisition of the speech signal, processing, and matching with the training data.

The spectrogram is obtained through the short-time Fourier transform of the acquired signal:

x'(t) = STFT{x(t)} ≡ X(τ, ω) = ∫_{t=0}^{∞} x(t) w(t − τ) e^{−jωt} dt    (3)

D. Reduction of Noise: Most electronic recording equipment adds noise to the recorded sound signal. When the signal is processed for speech recognition, even minimal noise can weigh down the neural network during training and processing. Hence we reduce the noise before passing the signal to the ANN. The mathematical representation is

x''(t) = ∫_{0}^{B} log₂(1 + x'(t)/n(t)) df    (4)

where B is the bandwidth of the channel. After the reduction of noise the signal is stable and ready to be normalized.
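As a rough numerical illustration of Eqs. (3) and (4), the sketch below computes one windowed STFT frame and the band-limited log₂(1 + signal/noise) measure. The 1 kHz test tone, the Hann window, the frame length, and the flat noise floor n are illustrative assumptions, not values from the paper:

```python
import numpy as np

def stft_frame(x, fs, tau, win_len):
    """One discrete frame of the STFT in Eq. (3):
    X(tau, w) = sum over t of x[t] * w[t - tau] * exp(-j*w*t)."""
    start = int(tau * fs)
    seg = x[start:start + win_len] * np.hanning(win_len)  # window w(t - tau)
    freqs = np.fft.rfftfreq(win_len, d=1.0 / fs)
    return freqs, np.fft.rfft(seg)

def noise_reduced_measure(spectrum, noise_floor, df):
    """Discrete analogue of Eq. (4): integrate
    log2(1 + |X(f)| / n(f)) over the band of width B."""
    return float(np.sum(np.log2(1.0 + np.abs(spectrum) / noise_floor)) * df)

fs = 8000.0
t = np.arange(0, 0.1, 1.0 / fs)
x = np.sin(2 * np.pi * 1000.0 * t)   # assumed 1 kHz test tone
freqs, frame = stft_frame(x, fs, tau=0.01, win_len=256)
measure = noise_reduced_measure(frame, noise_floor=1.0,
                                df=float(freqs[1] - freqs[0]))
```

The spectral peak of such a frame lands in the bin nearest 1 kHz, and the measure grows with the signal-to-noise ratio across the band, mirroring the role Eq. (4) plays before normalization.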
Expanding x'(t) in Eq. (4) with the windowed sinusoidal components of the speech signal gives

x''(t) = ∫_{0}^{B} log₂[1 + (1/n(t)) ∫_{t=0}^{∞} Σ_{i=1}^{n} A_i(t) cos(ω_i(t)·t + φ_i(t)) w(t − τ) e^{−jωt} dt] df    (5)

E. Normalization of Signal: Normalization of the sound signal is particularly effective when the signal needs spectrogram analysis before the recognition process. Normalization of a sound signal can be peak normalization or loudness normalization. As our process is focused on power spectra, we choose loudness normalization.

The input to this model is the processed speech signal and the output is the decision. The intermediate hidden layers adjust the weight of each node. During training the network is expandable; hence not only new data but also new parameters can be stored. X''' is considered the input to the neural network, where X''' is defined as

X''' = Σ_{i=1}^{n} x'''_i    (8)

Combining Eq. (7) and (8) leads, after intermediate steps, to

Y = f( Σ_{i=1}^{n} ½ [ ∫_{0}^{B} log₂(1 + (1/n(t)) ∫_{t=0}^{∞} Σ_{i=1}^{n} A_i(t) cos(ω_i(t)·t + φ_i(t)) w(t − τ) e^{−jωt} dt) df ]_i · W_I · W_J · W_K )    (17)

Combining Eq. (10), (11), (12), (13) and (17) yields the final formulation for speech recognition.

G. Recognition of Speaker using HMM: The Hidden Markov Model (HMM) is a powerful statistical tool for modeling generative sequences that can be characterized by an underlying process generating an observable sequence. HMMs have found application in many areas of signal processing, in particular speech processing, phrase chunking, and extracting target information from documents. Here we propose an HMM for recognition of the speaker [Fig 6]. The input of the Hidden Markov Model is the data rejected by the ANN.

The spectrogram analyzer fully meets these requirements with 3.6/8/26/43/46/50/67 GHz rates, a displayed average noise level of –152 dBm at 2 GHz and –148 dBm at 26 GHz (1 Hz bandwidth), and typically 77 dB ACLR for 3GPP (typically 84 dB with noise correction). For the recording we have used general-purpose hardware for storage.
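The normalization step of Sec. E and the layered mapping of Eq. (17) can be sketched as a simple forward pass. The layer sizes, the sigmoid choice for the activation f, the RMS approximation of loudness, and the random weights W_I, W_J, W_K are all illustrative assumptions, not the paper's trained network:

```python
import numpy as np

def rms_normalize(x, target_rms=0.1):
    """Crude loudness normalization (Sec. E), approximating
    loudness by the RMS level of the signal."""
    rms = np.sqrt(np.mean(x ** 2))
    return x * (target_rms / rms) if rms > 0 else x

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def ann_decision(x, w_i, w_j, w_k):
    """Forward pass in the spirit of Eq. (17): normalized inputs
    combined with the weights W_I, W_J, W_K through the activation f."""
    h1 = sigmoid(rms_normalize(x) @ w_i)  # input layer -> hidden layer 1
    h2 = sigmoid(h1 @ w_j)                # hidden layer 1 -> hidden layer 2
    return sigmoid(h2 @ w_k)              # hidden layer 2 -> decision

rng = np.random.default_rng(0)
x = rng.normal(size=8)          # e.g. eight acoustic features
w_i = rng.normal(size=(8, 6))   # assumed layer sizes
w_j = rng.normal(size=(6, 4))
w_k = rng.normal(size=(4, 1))
y = ann_decision(x, w_i, w_j, w_k)
```

In a trained system the weights would be fitted by back-propagation, as the paper proposes; here they merely show the shape of the computation.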
Fig. 6. Proposed HMM for speaker recognition (blocks: Activation, Processing, Parameters, Acceptance, Rejected)

Fig. 8. Analysis of Spectrogram (Dataset 2) (axes: Frequency vs. Time)
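To illustrate how an HMM can score a speaker, the sketch below runs the standard forward algorithm and picks the speaker whose model best explains an observation sequence of quantized spectral symbols. The two-state models, the transition matrix, and the per-speaker emission matrices are hypothetical stand-ins, not the model of Fig. 6:

```python
import numpy as np

def forward_likelihood(obs, pi, A, B):
    """Forward algorithm: probability of the observation sequence
    under an HMM (pi: initial, A: transition, B: emission probs)."""
    alpha = pi * B[:, obs[0]]
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]   # propagate, then emit
    return float(alpha.sum())

# Hypothetical 2-state model per speaker; observations are
# quantized spectral symbols {0, 1, 2}.
pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3],
              [0.4, 0.6]])
B_speaker1 = np.array([[0.5, 0.4, 0.1],
                       [0.1, 0.3, 0.6]])
B_speaker2 = np.array([[0.1, 0.1, 0.8],
                       [0.2, 0.2, 0.6]])
obs = [0, 1, 0, 2]
scores = {
    "speaker1": forward_likelihood(obs, pi, A, B_speaker1),
    "speaker2": forward_likelihood(obs, pi, A, B_speaker2),
}
best = max(scores, key=scores.get)   # -> "speaker1" for this sequence
```

A real system would train one model per enrolled speaker (e.g. by Baum-Welch) and score incoming utterances this way; the forward recursion itself is the standard one.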
12 Hz to 586 Hz with five different datasets generated from five different speakers. The input datasets consist of intensity values from spectrum analysis of the speech signal. During the spectrogram analysis, amplitude extraction of the speech signal is done, and the amplitudes are considered as the input to the neural network. Before considering the amplitudes we processed the signals for neutralization. As the effect of noise is reduced, we obtained a speech signal ranging from 0 dB to 140 dB. Still, this signal is not ready to be processed by the artificial neural network, as it is not yet normalized and contains multiple peak variations due to syllable and accent effects. Therefore, we neutralize the speech signal so that the long spectra become feasible to study and the silent spectra become remarkably separable from the long spectra, which makes the speech signal ready to be processed with artificial neural networks. The ANN recognizes the speech signals based on the predefined acoustic parameters. The eight acoustic features used include the four formants F1 through F4, the spectral slope, the harmonic difference H1−H2, and the aperiodicity and periodicity contents of the speech signal. Moreover, during training the nodes are expandable to accommodate new features. The hidden Markov model finally decides the rejection or addition of a new parameter in the hidden layers. The cut-off value is pre-decided, and in the processing phase it is compared against the input data. During processing, the dataset is judged against the cut-off value; if it is not rejected, the frequency, time, and amplitude are analyzed and new features are extracted.

In the last decade, this line of research has continued with the EARS project, which undertook recognition of Mandarin and Arabic in addition to English, and the GALE project, which focused solely on Mandarin and Arabic and required translation simultaneously with speech recognition. Still, those systems are not free from errors, as theoretical algorithms are applied most of the time. We have produced a system for testing on the word "Hello" for five different speakers [Table 1] and matched it with the system voice. In the analysis of the acceptance report [Fig. 9] we noticed that in more than eighty percent of the cases the data are accepted without much training. In the remaining fifteen percent of cases, new parameters need to be extracted for training and recognition.

VI. CONCLUSION

We presented the application of artificial neural networks and hidden Markov models for speech and speaker recognition. The work is mainly focused on acquisition of the speech signal, analysis of the spectrogram, neutralization, extraction of features for recognition, and mapping of speech using Artificial Neural Networks. Additionally, recognition of the speaker using a Hidden Markov Model is also part of this work, generating new features for recognition. This work will be generalized in the future for human-led machine learning.

ACKNOWLEDGMENT

Parts of this research are supported by Prof. P. Rammohan Rao, Mr. L. Naveen Kumar, Mr. K. Sandeep and Prof. N. Subba Reddy.