
JOURNAL OF TELECOMMUNICATIONS, VOLUME 13, ISSUE 1, MARCH 2012 19

Role of Haar and Daubechies Wavelet in Bangla Vowel Processing


S. Haque, A. U. Khan
Abstract: Wavelet Transform (WT) is applied to samples of the seven Bangla vowels /i/, /e/, /æ/, /a/, /ɔ/, /o/, /u/ for analysis and synthesis. The performance of WT in synthesizing the selected Bangla vowels is measured by calculating the Normalized Root Mean Square Error (NRMSE) between the original and synthesized signals and the Retained Energy (RE) in the reconstructed waveform. Our study shows that WT with the Haar and Daubechies wavelets at decomposition level 5 reproduces the Bangla vowel signals with a very small NRMSE and a large RE. The Haar wavelet produced an NRMSE on the order of $10^{-15}$ and 92% RE, while the Daubechies wavelet produced an NRMSE on the order of $10^{-11}$ and 98% RE. Although the Daubechies wavelet produced a larger NRMSE than the Haar wavelet, the RE in the first few coefficients obtained with the Daubechies wavelet is much larger. Therefore, the Daubechies wavelet performs better than the Haar wavelet for Bangla vowel reconstruction.

Index Terms: Wavelet Transform, Bangla vowels, Haar, Daubechies

1 INTRODUCTION

Speech analysis systems generally rely on time-frequency representations such as the Short Time Fourier Transform (STFT) or Linear Predictive Coding (LPC). In some respects these methods may not be suitable for representing speech, because they assume that the signal is stationary within a given time frame and may therefore lack the ability to analyze localized events accurately. Furthermore, the LPC approach assumes a particular linear (all-pole) model of speech production which, strictly speaking, does not hold. The main disadvantage of a Fourier expansion, however, is that it has only frequency resolution and no time resolution [1]. This means that although all the frequencies present in a signal can be determined, the times at which they occur are not known.

Analysis is the process of breaking a complex signal into smaller parts to gain a better understanding of it. The technique has been applied in the study of mathematics and logic since before Aristotle, though analysis as a formal concept is a relatively recent development [2]. In general, synthesis refers to a combination of two or more entities that together form something new [3]. Speech synthesis is the artificial production of human speech. A computer system used for this purpose is called a speech synthesizer, and it can be implemented in software or hardware. A text-to-speech system converts normal language text into speech; other systems render symbolic linguistic representations, such as phonetic transcriptions, into speech.

Fig. 1: Analysis of a wave using Fourier Transform and Wavelet transform

WT has been applied to speech in many languages for tasks such as speech analysis, pitch detection, recognition, speech synthesis, and speech segmentation [4], [5], [6]. However, few works [7] have been reported on Bangla phoneme synthesis using WT. As a first phase of our study on Bangla speech processing, we selected isolated utterances of the Bangla vowels for analysis and synthesis. The objective is to obtain better accuracy in speech processing using WT.

2 WT AND SPEECH SIGNAL PROCESSING


Fourier analysis consists of breaking up a signal into sine waves of various frequencies. Similarly, wavelet analysis is the breaking up of a signal into shifted and scaled versions of the original or mother wavelet.

S. Haque is with the Electronics and Telecommunication Engineering Department, Daffodil International University, Bangladesh. A. U. Khan is with the BanglaLion Communications Company, Bangladesh.

© 2012 JOT www.journaloftelecommunications.co.uk


A wavelet is a waveform of effectively limited duration that has an average value of zero. The result of the WT is a set of wavelet coefficients that are a function of scale and position. The main purpose of WT is to decompose arbitrary signals into localized contributions that can be labeled by a scale parameter. Compare wavelets with sine waves, which are the basis of Fourier analysis: sinusoids do not have limited duration; they extend from minus to plus infinity. And where sinusoids are smooth and predictable, wavelets tend to be irregular and asymmetric, as shown in Fig. 1. From the pictures of wavelets and sine waves, it can be observed that signals with sharp changes might be better analyzed with an irregular wavelet than with a smooth sinusoid, just as some foods are better handled with a fork than a spoon. Furthermore, because wavelet analysis affords a different view of data than those presented by traditional techniques, it can compress or de-noise a signal without appreciable degradation [8].

Wavelet Decomposition and Reconstruction

The discrete WT can be used to analyze, or decompose, signals. This process is called decomposition or analysis. The other half of the story is how those components can be assembled back into the original signal without loss of information. This process is called reconstruction, or synthesis [8]. The mathematical manipulation that effects synthesis is called the inverse discrete wavelet transform (IDWT).

In mathematics, the Haar wavelet is a certain sequence of rescaled "square-shaped" functions which together form a wavelet family or basis [8]. Wavelet analysis is similar to Fourier analysis in that it allows a target function over an interval to be represented in terms of an orthonormal function basis. The Haar sequence is now recognized as the first known wavelet basis and is extensively used as a teaching example in the theory of wavelets.
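As a minimal sketch of the "shifted and scaled versions of the mother wavelet" idea, the Python snippet below defines the Haar mother wavelet and one common dyadic convention for its scaled and shifted copies, psi_{j,k}(t) = 2^(j/2) psi(2^j t - k). The function names and the numerical check are illustrative assumptions; the paper itself does not list any code.

```python
def haar_mother(t):
    """Haar mother wavelet: +1 on [0, 0.5), -1 on [0.5, 1), 0 elsewhere."""
    if 0.0 <= t < 0.5:
        return 1.0
    if 0.5 <= t < 1.0:
        return -1.0
    return 0.0


def haar(j, k, t):
    """Scaled (j) and shifted (k) copy: 2^(j/2) * psi(2^j * t - k)."""
    return 2.0 ** (j / 2.0) * haar_mother(2.0 ** j * t - k)


def inner_product(f, g, n=4096):
    """Midpoint Riemann sum approximating the integral of f*g over [0, 1)."""
    dt = 1.0 / n
    return sum(f((i + 0.5) * dt) * g((i + 0.5) * dt) for i in range(n)) * dt


# Average value of a wavelet (should be close to zero).
mean_value = inner_product(lambda t: haar(1, 0, t), lambda t: 1.0)
# Squared norm of a wavelet (should be close to one).
norm_sq = inner_product(lambda t: haar(1, 0, t), lambda t: haar(1, 0, t))
# Inner product of two different shifts at the same scale (should be zero).
cross = inner_product(lambda t: haar(1, 0, t), lambda t: haar(1, 1, t))

print(mean_value, norm_sq, cross)
```

The three printed values illustrate the zero average, unit norm, and mutual orthogonality that make the Haar family an orthonormal basis.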
Wavelet analysis involves filtering and downsampling, whereas the wavelet reconstruction process consists of upsampling and filtering. Upsampling is the process of lengthening a signal component by inserting zeros between samples. Fig. 2 shows the process of decomposing and reconstructing a signal using WT. The different types of wavelets include Haar, Daubechies, Biorthogonal, Coiflets, Morlet, and Symmlet.

Fig. 2: Decomposition and Reconstruction procedure using Wavelet transform

Fig. 3: Haar and Daubechies Wavelet

Reconstruction Filters

The filtering part of the reconstruction process also bears some discussion, because it is the choice of filters that is crucial in achieving perfect reconstruction of the original signal. The downsampling of the signal components performed during the decomposition phase introduces a distortion called aliasing. It turns out that by carefully choosing filters for the decomposition and reconstruction phases that are closely related (but not identical), we can cancel out the effects of aliasing. The low- and high-pass decomposition filters (L and H), together with their associated reconstruction filters (L' and H'), form a system of what is called quadrature mirror filters.
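The filter-bank description above can be sketched for the Haar case in plain Python. The filter coefficients follow one common sign convention for Haar, and the helper names (`convolve`, `analyze`, `upsample`, `synthesize`) are hypothetical; this is an illustration under those assumptions, not the toolbox routine used in the study.

```python
import math

S = 1.0 / math.sqrt(2.0)
DEC_LO, DEC_HI = [S, S], [-S, S]   # decomposition filters L and H
REC_LO, REC_HI = [S, S], [S, -S]   # reconstruction filters L' and H'


def convolve(x, h):
    """Full linear convolution of sequence x with filter h."""
    y = [0.0] * (len(x) + len(h) - 1)
    for n, xn in enumerate(x):
        for k, hk in enumerate(h):
            y[n + k] += xn * hk
    return y


def analyze(x):
    """Filter with L and H, then downsample by keeping odd-indexed outputs."""
    return convolve(x, DEC_LO)[1::2], convolve(x, DEC_HI)[1::2]


def upsample(c):
    """Lengthen a coefficient vector by inserting a zero after each sample."""
    u = []
    for v in c:
        u.extend([v, 0.0])
    return u


def synthesize(cA, cD):
    """Upsample both branches, filter with L' and H', and add the results."""
    lo = convolve(upsample(cA), REC_LO)
    hi = convolve(upsample(cD), REC_HI)
    return [lo[i] + hi[i] for i in range(2 * len(cA))]


x = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]   # even-length toy signal
cA, cD = analyze(x)
x_rec = synthesize(cA, cD)
print(max(abs(a - b) for a, b in zip(x, x_rec)))  # reconstruction error
```

Each coefficient vector is half the length of the input, and adding the two filtered branches cancels the aliasing introduced by downsampling, so the toy signal is recovered up to floating-point rounding.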


We have seen that it is possible to reconstruct the original signal from the coefficients of the approximations and details. It is also possible to reconstruct the approximations and details themselves from their coefficient vectors. As an example, consider how to reconstruct the first-level approximation A1 from the coefficient vector cA1. We pass cA1 through the same process used to reconstruct the original signal, but instead of combining it with the level-one detail coefficients cD1, we feed in a vector of zeros in place of the detail coefficient vector. The process yields a reconstructed approximation A1, which has the same length as the original signal S and is a real approximation of it. Similarly, we can reconstruct the first-level detail D1 using the analogous process.

The reconstructed details and approximations are true constituents of the original signal. In fact, when we combine them we find that

A1 + D1 = S

The coefficient vectors cA1 and cD1, because they were produced by downsampling and are only half the length of the original signal, cannot be combined directly to reproduce the signal. It is necessary to reconstruct the approximations and details before combining them. Extending this technique to the components of a multilevel analysis, we find that similar relationships hold for all the reconstructed signal constituents. That is, there are several ways to reassemble the original signal.

The waveforms of the Haar and Daubechies wavelets are shown in Fig. 3. In our work we used only the Haar and Daubechies wavelets for decomposition and reconstruction of the selected Bangla phonemes.
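The relation A1 + D1 = S can be checked numerically. The sketch below uses a one-level Haar transform written in plain Python; the function names and the toy signal are assumptions made for illustration only.

```python
import math

S2 = 1.0 / math.sqrt(2.0)


def haar_dwt(x):
    """One-level Haar analysis of an even-length signal."""
    cA = [S2 * (x[i] + x[i + 1]) for i in range(0, len(x), 2)]
    cD = [S2 * (x[i] - x[i + 1]) for i in range(0, len(x), 2)]
    return cA, cD


def haar_idwt(cA, cD):
    """One-level Haar synthesis from approximation and detail coefficients."""
    x = []
    for a, d in zip(cA, cD):
        x.extend([S2 * (a + d), S2 * (a - d)])
    return x


signal = [3.0, 1.0, 0.0, 4.0, 8.0, 6.0, 9.0, 9.0]
cA1, cD1 = haar_dwt(signal)
zeros = [0.0] * len(cA1)

A1 = haar_idwt(cA1, zeros)   # approximation: details replaced by zeros
D1 = haar_idwt(zeros, cD1)   # detail: approximation replaced by zeros
recombined = [a + d for a, d in zip(A1, D1)]

print(len(cA1), len(A1))     # coefficient length vs. reconstructed length
print(recombined)
```

Here cA1 and cD1 have half the length of the signal, while A1 and D1 have the full length, which is why only the reconstructed components can be added sample by sample.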

Fig. 4: Block diagram of working procedure of analysis and synthesis of our selected vowel phonemes.

3 SPEECH MATERIAL
In the speech analysis scenario, we first need to collect the speech signal. The isolated Bangla vowels /i/, /e/, /æ/, /a/, /ɔ/, /o/, /u/ were uttered three times by a native Bangla male speaker in a quiet room. We recorded the vowel signals in stereo at a sampling rate of 48 kHz and then downsampled them to 10 kHz in mono. The speech samples were then normalized and were ready to be used in our work.
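Two of the preparation steps can be sketched as follows. The text does not say how the stereo channels were combined or which normalization was applied, so channel averaging and peak normalization are assumptions here, and the function names are hypothetical. Resampling from 48 kHz to 10 kHz needs an anti-aliasing filter and is deliberately left out of this sketch.

```python
def to_mono(left, right):
    """Average the two stereo channels sample by sample (assumed method)."""
    return [(l + r) / 2.0 for l, r in zip(left, right)]


def peak_normalize(samples):
    """Scale samples so the largest magnitude is 1 (assumed normalization)."""
    peak = max(abs(s) for s in samples)
    if peak == 0.0:
        return list(samples)   # silent input: nothing to scale
    return [s / peak for s in samples]


# Toy stereo frames standing in for a recorded vowel.
left = [0.2, -0.4]
right = [0.4, -0.8]

mono = to_mono(left, right)
normalized = peak_normalize(mono)
print(normalized)
```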

4 WORKING PROCEDURE
Each Bangla vowel signal was decomposed using the Haar and Daubechies 4 wavelets at decomposition level 5, and the approximation and detail coefficients were calculated and stored for use in the reconstruction stage described in Section 2. Using the calculated wavelet coefficients, we reconstructed the vowels with the chosen wavelets. We repeated these steps for each of the seven Bangla vowels. The working process is shown in Fig. 4. After reconstructing the selected Bangla phonemes, the performance of the selected wavelets was measured using Eq. 1 and Eq. 2.

Fig. 5: Waveform of Original, Reconstructed, Approximations, Details at 5 different scales and RE for vowel /i/ using Daubechies 4 wavelet
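A level-5 decomposition and reconstruction can be sketched in plain Python for the Haar wavelet. This is not the authors' implementation: the names `wavedec` and `waverec` are borrowed for readability, the input is a synthetic two-tone signal rather than a recorded vowel, and the Daubechies 4 case would need longer filters and boundary handling that this sketch omits.

```python
import math

S2 = 1.0 / math.sqrt(2.0)


def haar_step(x):
    """One analysis level: approximation and detail coefficients."""
    cA = [S2 * (x[i] + x[i + 1]) for i in range(0, len(x), 2)]
    cD = [S2 * (x[i] - x[i + 1]) for i in range(0, len(x), 2)]
    return cA, cD


def haar_inverse_step(cA, cD):
    """One synthesis level: rebuild the longer signal from cA and cD."""
    x = []
    for a, d in zip(cA, cD):
        x.extend([S2 * (a + d), S2 * (a - d)])
    return x


def wavedec(x, level):
    """Return [cA_level, cD_level, ..., cD_1] for a Haar decomposition."""
    if len(x) % (2 ** level) != 0:
        raise ValueError("signal length must be divisible by 2**level")
    approx, details = list(x), []
    for _ in range(level):
        approx, d = haar_step(approx)
        details.append(d)
    return [approx] + details[::-1]


def waverec(coeffs):
    """Invert wavedec by applying the synthesis step level by level."""
    approx = coeffs[0]
    for d in coeffs[1:]:
        approx = haar_inverse_step(approx, d)
    return approx


fs = 10000  # sampling rate in Hz, matching the 10 kHz stated in the text
signal = [math.sin(2 * math.pi * 120 * n / fs)
          + 0.5 * math.sin(2 * math.pi * 700 * n / fs) for n in range(1024)]

coeffs = wavedec(signal, 5)
rebuilt = waverec(coeffs)
max_error = max(abs(a - b) for a, b in zip(signal, rebuilt))
print([len(c) for c in coeffs], max_error)
```

The coefficient list holds one approximation vector and five detail vectors, and the rebuilt signal matches the input up to rounding error, which is consistent with the very small NRMSE values reported for the real vowels.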


5 PERFORMANCE MEASUREMENT OF THE SYNTHESIZED BANGLA VOWEL SIGNAL


We measured the performance of the synthesized Bangla vowel signals using two parameters, NRMSE and RE, as given by Eq. 1 and Eq. 2.

1. Normalized Root Mean Square Error (NRMSE) is calculated using Eq. 1.

$$\mathrm{NRMSE} = \sqrt{\frac{\sum_{n} \left[x(n) - r(n)\right]^{2}}{\sum_{n} \left[x(n) - \bar{x}(n)\right]^{2}}} \qquad (1)$$

In Eq. 1, $x(n)$ is the speech signal, $r(n)$ is the reconstructed signal, and $\bar{x}(n)$ is the mean of the speech signal.

2. Retained Energy (RE) in the first N/2 wavelet coefficients is given by Eq. 2.

$$\mathrm{RE} = 100 \times \frac{\|r(n)\|^{2}}{\|x(n)\|^{2}} \qquad (2)$$

In Eq. 2, $\|x(n)\|$ is the norm of the original signal and $\|r(n)\|$ is the norm of the reconstructed signal. For one-dimensional orthogonal wavelets the retained energy is equal to the L2-norm recovery performance.

5 RESULT AND DISCUSSION

In this section, we discuss the performance of the wavelets in reconstructing, or synthesizing, the signal. We calculate the NRMSE and RE between the original and the reconstructed vowel at decomposition level 5. Fig. 5 shows a sample speech signal /i/ and approximations of the signal at five different scales. These approximations are reconstructed from the coarse low-frequency coefficients in the WT vector. The figure shows that the original speech data is still well represented by the level 5 approximation. The NRMSE of the reconstructed vowel waveform was calculated for all seven Bangla vowels and found to be on the order of $10^{-11}$ or less. The RE of the first few coefficients of the WT was found to be more than 92%. The reconstructed vowel waveform obtained by WT is therefore almost identical to the original waveform, and we may say that WT preserves the important speech information with few parameters. The NRMSE and RE calculated for all the vowels are plotted in Fig. 6.
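The two measures can be sketched in Python as follows. The formulas used are the common definitions that match the symbol descriptions given for Eq. 1 and Eq. 2 (error energy normalized by the signal's variation about its mean, and a percentage ratio of squared norms); they are an assumption, since the printed equations are not reproduced here. The function names and toy data are hypothetical.

```python
import math


def nrmse(x, r):
    """Normalized RMS error between original x and reconstruction r."""
    mean_x = sum(x) / len(x)
    err = sum((xi - ri) ** 2 for xi, ri in zip(x, r))
    var = sum((xi - mean_x) ** 2 for xi in x)
    return math.sqrt(err / var)


def retained_energy(x, r):
    """Percentage of the original signal's energy retained in r."""
    energy_r = sum(ri ** 2 for ri in r)
    energy_x = sum(xi ** 2 for xi in x)
    return 100.0 * energy_r / energy_x


original = [1.0, 2.0, 3.0, 4.0]
reconstructed = [1.0, 2.0, 3.0, 3.0]   # toy reconstruction with one error

print(nrmse(original, original))              # perfect reconstruction
print(nrmse(original, reconstructed))
print(retained_energy(original, reconstructed))
```

A perfect reconstruction gives an NRMSE of zero and a retained energy of 100%; for an orthogonal wavelet, computing the retained energy from the first N/2 coefficients or from the signal rebuilt from them gives the same ratio.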

6 CONCLUSION
This work deals with the study of Bangla vowel decomposition and reconstruction, which is the basis of Bangla speech processing. We presented WT techniques and details of how to use them for Bangla vowel phoneme analysis and synthesis. Analyzing a signal with the Haar and Daubechies wavelets at decomposition level 5 and then reconstructing it forms a scheme of analysis and synthesis. Among the tested wavelets, the Haar wavelet produced an NRMSE on the order of $10^{-15}$ and 92% RE, while the Daubechies wavelet produced an NRMSE on the order of $10^{-11}$ and 98% RE. Although the Daubechies wavelet produced a larger NRMSE than the Haar wavelet, the RE in the first few coefficients obtained with the Daubechies wavelet is much larger. Therefore, the Daubechies wavelet performs better than the Haar wavelet for Bangla vowel reconstruction.

Fig. 6: NRMSE and RE obtained by using Haar and Daubechies wavelet with WT for reconstructing the seven Bangla vowels


REFERENCES
[1] R. Polikar, The Wavelet Tutorial, 136, Rowan Hall, Department of Electrical and Computer Engineering, Rowan University, Glassboro, NJ 08028, June 1996.
[2] http://plato.stanford.edu/entries/analysis/
[3] http://en.wikipedia.org/wiki/Synthesis
[4] I. Agbinya, Discrete Wavelet Transform Techniques in Speech Processing, IEEE Tencon Digital Signal Processing Applications Proceedings, IEEE, New York, NY, 1996, pp. 514-519.
[5] S. Kadambe and G. F. Boudreaux-Bartels, Applications of the Wavelet Transform for Speech Detection, IEEE Trans. on Information Theory, Vol. 38, No. 2, 1992, pp. 917-954.
[6] O. Farooq and S. Datta, Phoneme recognition using wavelet based features, Journal of Information Sciences Informatics and Computer Science, Vol. 150, Issue 1-2, March 2003.
[7] S. Haque, S. A. Hossain, and M. A. Sobhan, Using Wavelet Transform for Bangla Phoneme Synthesis, Proceedings of International Conference on Computer Processing of Bangla, 19th February, 2011, Dhaka, Bangladesh.
[8] http://www.mathworks.com/help/toolbox/

S. Haque received her B.Sc. and M.Sc. degrees in Applied Physics and Electronics from Rajshahi University, Bangladesh. She joined the Bangladesh Atomic Energy Commission as a scientific officer in 1999. Since 1999, she was affiliated with the Department of Computer Science and Technology, Islamic University, Kushtia, Bangladesh. She is continuing her Ph.D. in the Graduate School of Engineering and Science at the University of the Ryukyus, Okinawa, Japan. Since 2008 she has been with Daffodil International University, Shukrabad, Dhanmondi, Dhaka, Bangladesh. Her current research interests are speech, image, and biomedical signal processing.

A. U. Khan received his B.Sc. Engineering degree in Electronics and Telecommunication Engineering from Daffodil International University, Bangladesh, in 2011. He then joined Telnet Communication Company as an engineer in 2011. He has been working as an engineer in the core network of BanglaLion Communication since February 2012.
