
International Journal of Advances in Science and Technology, Vol. 4, No. 1, 2012

Novel Speech Processing Methodology to Shrink Spectral Masking for Hearing Impaired
Jayant Chopade1, Pravin Dhulekar2, Dr. S. L. Nalbalwar3 and Dr. D. S. Chaudhari4

1,2 Department of Electronics and Telecommunication, SNJBs College of Engineering, Chandwad, Maharashtra, India
1 jchopade@yahoo.com, 2 pravindhulekar@gmail.com

3 Department of Electronics and Telecommunication, Dr. B. A. T. University, Lonere, Maharashtra, India
3 slnalbalwar@dbatu.ac.in

4 Department of Electronics and Telecommunication, Govt. College of Engg., Jalgaon, Maharashtra, India
4 ddsscc@yahoo.com

Abstract
Auditory masking occurs when the perception of one sound is affected by the presence of another sound, such as noise or an unwanted sound of the same duration as the original. Earlier studies have shown that binaural dichotic presentation, using critical-bandwidth-based spectral splitting with perceptually balanced comb filters, helps reduce the effect of auditory masking for persons with moderate bilateral sensorineural hearing impairment. In the present study, the speech signal is spectrally split using modified wavelet packets, a combination of the discrete wavelet transform for the first level of decomposition and wavelet packets for the second level, and the resulting bands are presented dichotically: odd-numbered frequency bands are given to the right ear and even-numbered bands to the left ear simultaneously. This showed a significant reduction in auditory masking compared to earlier methods. The performance of the proposed method is experimentally evaluated with speech signals of vowel-consonant-vowel syllables for fifteen English consonants.

Keywords: Auditory Masking, Binaural Dichotic Presentation, Sensorineural Hearing Impairment, Modified Wavelet Packets, Cochlea.

1. Introduction
If two sounds of two different frequencies (pitches) are played at the same time, two separate sounds can often be heard rather than a combination tone. This ability is known as frequency resolution or frequency selectivity. It is thought to arise from filtering within the cochlea, the hearing organ of the inner ear, whose filters are characterized by critical bandwidths. A complex sound is split into different frequency components, and each component causes a peak in the pattern of vibration at a specific place on the basilar membrane within the cochlea. These components are then coded independently on the auditory nerve, which transmits the sound information to the brain. This individual coding occurs only if the frequency components are sufficiently different in frequency; otherwise they are coded at the same place and are perceived as one sound instead of two [1]. Auditory masking is categorized by the timing of the masker. Non-simultaneous masking occurs when the signal and masker are not presented at the same time; it is split into forward masking, in which the masker is presented first and the signal follows it, and backward masking, in which the signal precedes the masker. Simultaneous masking is the frequency-domain counterpart of temporal masking and tends to occur between sounds of similar frequencies: a sound is made inaudible by a masker, a noise or unwanted sound of the same duration as the original sound [1]. Masking is greatest when the masker and the signal are at the same frequency, and it decreases as the signal frequency moves further away from the masker frequency. This phenomenon is called on-frequency masking and occurs because the masker and signal fall within the same auditory filter. Simultaneous masking reduces

January


ISSN 2229 5216

the frequency resolution significantly, so it is more severe than non-simultaneous masking. Auditory masking occurs because the original neural activity caused by the first signal is reduced by the neural activity of the other sound [2]. The objective of our investigation is to split the speech signal with the help of modified wavelet packets to form complementary bands that are presented dichotically (presenting two different signals to the two ears is referred to as dichotic presentation), which considerably reduces the problem of auditory masking compared to the earlier methods [3]. The discrete wavelet transform divides the signal spectrum into frequency bands that are narrow at the lower frequencies and wide at the higher frequencies. This limits how wavelet coefficients in the upper half of the signal spectrum are classified. Wavelet packets divide the signal spectrum into evenly spaced frequency bands of equal bandwidth and will be explored for use in identifying transient and quasi-steady-state speech [4]. The processing schemes were developed as spectral splitting with modified wavelet packets based on ten frequency bands, since the performance of hearing-impaired subjects saturated around eight channels, while the performance of normal-hearing subjects continued to improve up to 12-16 channels in higher background noise [5]. Three different Simulink models were developed based on modified wavelet packets with Daubechies, Symlets and Biorthogonal wavelet functions. The inverse wavelet packet transform was applied to the wavelet coefficients to synthesize the speech components from the wavelet packet representation. Table 1 shows the frequency-ordered nodes that correspond to the natural order for decomposition levels 0 to 2, whereas Table 2 and Table 3 show the frequency bands based on quasi-octaves. Table 1.
Frequency ordered terminal nodes for depths 0 to 2.

Node       7    8    9    10   11   12   13   14   15   16
Frequency  f0   f1   f2   f3   f4   f5   f6   f7   f8   f9

Table 2. Ten frequency bands for spectral splitting with compression (for left ear).

Filter for left ear
Band   Centre frequency (kHz)   Pass band frequency (kHz)
1      0.078125                 0-0.15625
3      0.390625                 0.3125-0.46875
5      0.78125                  0.625-0.9375
7      1.5625                   1.250-1.875
9      3.125                    2.500-3.75


Table 3. Ten frequency bands for spectral splitting with compression (for right ear).

Filter for right ear
Band   Centre frequency (kHz)   Pass band frequency (kHz)
2      0.234375                 0.15625-0.3125
4      0.546875                 0.46875-0.625
6      1.09375                  0.9375-1.25
8      2.1875                   1.875-2.5
10     4.375                    3.75-5
During the process of frequency transformation, the poles were shifted to achieve compression, which is useful to hearing-impaired persons with high-frequency impairment; changes in acoustic attributes such as the averaged power spectrum and formant transitions were observed [6].

2. Materials and methods


2.1 The speech material

Earlier studies have used CV, VC, CVC, and VCV syllables. It has been reported that greater masking takes place in intervocalic consonants due to the presence of vowels on both sides [7]. Since our primary objective is to study the improvement in consonantal identification due to reduction in the effect of masking, VCV syllables are used. For the evaluation of the speech processing strategies, a set of fifteen nonsense syllables in VCV context with consonants / p, b, t, d, k, g, m, n, s, z, f, v, r, l, y / and the vowel /a/ as in farmer were used. The features selected for study were voicing (voiced: / b d g m n z v r l y / and unvoiced: / p t k s f /), place (front: / p b m f v /, middle: / t d n s z r l /, and back: / k g y /), manner (oral stop: / p b t d k g l y /, fricative: / s z f v r /, and nasal: / m n /), nasality (oral: / p b t d k g s z f v r l y /, nasal: / m n /), frication (stop: / p b t d k g m n l y /, fricative: / s z f v r /), and duration (short: / p b t d k g m n f v l / and long: / s z r y /).

2.2 The speech processing strategies

For many signals, the low-frequency content is the most important part: it is what gives the signal its identity. The high-frequency content, on the other hand, imparts flavor or nuance. Consider the human voice. If you remove the high-frequency components, the voice sounds different, but you can still tell what is being said. However, if you remove enough of the low-frequency components, you hear gibberish. In the basic filtering process, the original signal passes through two complementary filters and emerges as two signals. Unfortunately, if we actually perform this operation on a real digital signal, we wind up with twice as much data as we started with. Suppose, for instance, the original signal consists of 1000 samples of data. Then each of the resulting signals will have 1000 samples, for a total of 2000. These two signals, the approximation A and the detail D, are interesting, but we get 2000 values instead of 1000.
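The two-filter split above can be sketched in a few lines. This is an illustrative stand-in in plain Python using the 2-tap Haar filter pair, not the authors' MATLAB/Simulink implementation; the signal and filter choices are assumptions made only to show the data doubling.

```python
import math

def convolve(x, h):
    """Full linear convolution of sequence x with filter taps h."""
    y = [0.0] * (len(x) + len(h) - 1)
    for n, xn in enumerate(x):
        for k, hk in enumerate(h):
            y[n + k] += xn * hk
    return y

S = 1.0 / math.sqrt(2.0)
lowpass  = [S,  S]   # Haar scaling filter  -> approximation branch
highpass = [S, -S]   # Haar wavelet filter  -> detail branch

x = [math.sin(0.05 * n) for n in range(1000)]   # 1000-sample stand-in signal

A = convolve(x, lowpass)    # approximation: smooth, low-frequency content
D = convolve(x, highpass)   # detail: high-frequency content

# Without downsampling, the two branches together carry roughly twice
# the data of the input: about 2000 values for 1000 samples.
print(len(A), len(D))
```

This is exactly why the discrete wavelet transform, described next, downsamples each branch by two.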

2.3 Multilevel decomposition by DWT

There exists a more subtle way to perform the decomposition using wavelets. By looking carefully at the computation, we may keep only one point out of two in each of the two 1000-length branch outputs and still retain the complete information. This is the notion of downsampling. Filtering followed by downsampling produces the two sequences cA and cD, the DWT coefficients [8]. The decomposition process can be iterated, with successive approximations being decomposed in turn, so that one signal is broken


down into many lower-resolution components. This is called the wavelet decomposition tree. Fig. 1 shows multilevel decomposition by DWT up to level 3.

Figure 1. Multilevel Decomposition by DWT up to Level 3.
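The iterated split with downsampling can be sketched as follows. This is a minimal Haar-wavelet illustration in plain Python (the paper itself uses MATLAB with Daubechies, Symlets and biorthogonal wavelets), showing that coefficient counts halve at each level so no data doubling occurs.

```python
import math

S = 1.0 / math.sqrt(2.0)

def haar_dwt(x):
    """One-level Haar DWT: filtering combined with downsampling by 2."""
    half = len(x) // 2
    cA = [(x[2 * k] + x[2 * k + 1]) * S for k in range(half)]  # approximation
    cD = [(x[2 * k] - x[2 * k + 1]) * S for k in range(half)]  # detail
    return cA, cD

def wavedec(x, level):
    """Iterate the split on successive approximations (the decomposition tree)."""
    details = []
    cA = list(x)
    for _ in range(level):
        cA, cD = haar_dwt(cA)
        details.append(cD)
    return cA, details

x = [math.sin(0.1 * n) for n in range(2000)]
cA3, details = wavedec(x, 3)

# Detail lengths: 1000, 500, 250, plus a 250-sample approximation;
# the total (2000) matches the input length.
print([len(d) for d in details], len(cA3))
```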

2.4 Multilevel decomposition by wavelet packets

The limitation of the DWT is that it splits only the approximations at each decomposition level. In wavelet packet analysis, the details as well as the approximations can be split, so approximation and detail coefficients are obtained for both the low- and the high-frequency halves. Fig. 2 shows multilevel decomposition by wavelet packets up to level 3.

Figure 2. Multilevel Decomposition by Wavelet Packets up to Level 3.

The wavelet packet method is a generalization of wavelet decomposition that offers a richer range of possibilities for signal analysis. In wavelet analysis, a signal is split into an approximation and a detail. The approximation is then itself split into a second-level approximation and detail, and the process is repeated; for an n-level decomposition, this gives n+1 possible ways to decompose or encode the signal. In wavelet packet analysis, splitting the details as well yields more than 2^(2^(n-1)) different ways to encode the signal [9].
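A depth-2 full packet tree can be sketched by splitting every node, details included. Again this is an illustrative Haar version in plain Python, not the paper's implementation; it shows the equal-bandwidth bands that the packet tree produces.

```python
import math

S = 1.0 / math.sqrt(2.0)

def haar_dwt(x):
    """One-level Haar DWT: filter and downsample by 2."""
    half = len(x) // 2
    cA = [(x[2 * k] + x[2 * k + 1]) * S for k in range(half)]
    cD = [(x[2 * k] - x[2 * k + 1]) * S for k in range(half)]
    return cA, cD

def packet_level(nodes):
    """Wavelet packet step: split EVERY node, approximations and details alike."""
    out = []
    for node in nodes:
        cA, cD = haar_dwt(node)
        out.append(cA)
        out.append(cD)
    return out

x = [math.sin(0.02 * n) + 0.3 * math.sin(1.1 * n) for n in range(1024)]

level1 = packet_level([x])       # 2 nodes of 512 coefficients
level2 = packet_level(level1)    # 4 nodes of 256: equal-bandwidth bands

print(len(level2), [len(b) for b in level2])
```

Unlike the plain DWT tree above, the high-frequency half is split into bands just as narrow as the low-frequency half.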

2.5 Modified wavelet packets

The discrete wavelet transform results in a logarithmic frequency resolution: high frequencies have wide bandwidth, whereas low frequencies have narrow bandwidth. Wavelet packets allow the higher frequencies to be segmented into narrower bands. Wavelet packets are efficient tools for speech analysis and involve two-band splitting of the input signal by means of filtering and downsampling at each decomposition level. Designing the wavelet packet filter bank involves choosing the decomposition tree and then selecting the filters for each decomposition level of the tree.


For each decomposition level, there is a different time-frequency resolution. Once the decomposition tree has been selected, the next step involves selecting an appropriate wavelet filter for each decomposition level of the tree. The scheme used for this investigation, referred to as modified wavelet packets, applies the discrete wavelet transform for the first level of decomposition and wavelet packets for the second level. Downsampling and upsampling are used with the wavelet-based filter banks to exploit spectral properties such as energy levels and perceptual importance. The signal is transformed so that its power spectrum tends to be concentrated into a few bands. The MatLab software used to implement the wavelet packet based algorithm labels nodes with a natural order index. Modified wavelet packets were developed with different wavelets using MatLab Simulink software: three Simulink models were built based on modified wavelet packets with Daubechies, Symlets and Biorthogonal wavelet functions. Daubechies and Symlets are orthogonal wavelets that have the highest number of vanishing moments for a given support width; Symlets are compactly supported wavelets with least asymmetry. Compactly supported biorthogonal wavelets allow symmetry and exact reconstruction with FIR filters. The wavelet filter bank offers a great deal of flexibility in the choice of the basis filter and the decomposition tree structure. The standard DWT involves a dyadic tree structure in which the low-frequency side is successively split down to a certain depth; the detail coefficients are obtained from the right-leaf node of each level and the approximation coefficients from the left-leaf node at the lowest level. During the process of transformation, compression is achieved. A wavelet packet tree for a decomposition depth of 2, generated using the natural order index labeling of MatLab, is presented in Figure 3, where the nodes represent the wavelet coefficients (at various decomposition stages) and the left and right branches represent the low- and high-pass filtering operations, respectively.

Figure 3. Decomposition tree for modified wavelet packets.
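The hybrid DWT-plus-packet tree can be made concrete with a toy version, reduced to depth 2 (four bands instead of the paper's ten) with Haar filters in plain Python; the band numbering and the odd/even ear assignment follow the paper, but the signal and filters are illustrative assumptions.

```python
import math

S = 1.0 / math.sqrt(2.0)

def haar_dwt(x):
    """One-level Haar DWT: filter and downsample by 2."""
    half = len(x) // 2
    cA = [(x[2 * k] + x[2 * k + 1]) * S for k in range(half)]
    cD = [(x[2 * k] - x[2 * k + 1]) * S for k in range(half)]
    return cA, cD

x = [math.sin(0.03 * n) for n in range(1024)]

# Level 1: ordinary DWT split of the input into approximation and detail.
cA, cD = haar_dwt(x)

# Level 2: wavelet packet step -- split the detail branch as well as the
# approximation branch, giving four frequency bands.
band1, band2 = haar_dwt(cA)
band3, band4 = haar_dwt(cD)
bands = [band1, band2, band3, band4]

# Dichotic assignment as in the paper: odd-numbered bands to the right ear,
# even-numbered bands to the left ear (the actual scheme orders its ten
# bands by frequency before assigning them).
right_ear = bands[0::2]   # bands 1 and 3
left_ear  = bands[1::2]   # bands 2 and 4

print(len(right_ear), len(left_ear), len(bands[0]))
```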

3. Experimental results
In this analysis we have decomposed fifteen nonsense syllables in VCV context with consonants / p, b, t, d, k, g, m, n, s, z, f, v, r, l, y / and the vowel /a/ as in farmer, using modified wavelet packets with different wavelets such as Daubechies, Symlets and Biorthogonal. In the following figures we show the results for the consonant /z/ among the fifteen consonants. Fig. 4 shows the odd bands (1, 3, 5, 7 and 9) obtained after decomposing the VCV context /aza/ by bior2.4, while Fig. 5 shows the even bands (2, 4, 6, 8 and 10).


Figure 4. Odd bands (1, 3, 5, 7 and 9) obtained after decomposing VCV context /aza/ by bior2.4.

Figure 5. Even bands (2, 4, 6, 8 and 10) obtained after decomposing VCV context /aza/ by bior2.4.

After reconstructing the odd bands we obtain the signal shown in Fig. 6, which is given to the right ear, while Fig. 7 shows the reconstruction of the even bands, which is given to the left ear. The signals shown in Fig. 6 and Fig. 7 are dichotically presented to the right and left ears respectively.


Figure 6. Reconstructed signal from odd bands for Right ear.

Figure 7. Reconstructed signal from even bands for Left ear.

4. Results and discussion


In speech perception, the information received from both ears is integrated. Hence, splitting the speech signal into two complementary signals, such that signal components likely to mask each other are presented to different ears, can be used to reduce the effect of increased masking [10]. This technique can improve speech reception by persons with moderate bilateral sensorineural impairment, i.e. residual hearing in both ears [11]. To reduce the effect of increased spectral (simultaneous) masking, a scheme based on spectral splitting of the speech signal using comb filters with complementary pass bands was used earlier. Splitting the speech signal and compressing the frequency bands is a possible solution to reduce the effect of increased masking. In this investigation, a modified wavelet packet processing scheme has been investigated to split the speech signal into two complementary signals for binaural dichotic presentation. Combining spectral splitting with compression for binaural dichotic presentation may help improve the perception of various consonantal features by decreasing the effects of both types of masking, i.e. simultaneous and non-simultaneous. The speech signals were decomposed at various levels, using modified wavelet packets with different types and orders of wavelets, to obtain low-frequency and high-frequency signal components. For each decomposition level, there is a different time-frequency resolution. Once the decomposition tree has been selected, the next step involves


selecting an appropriate wavelet type depending on orthogonality and symmetry, such as the Daubechies, Symlets and Biorthogonal wavelet functions. After combining the reconstructed signals of the left and right ears, we obtain the reconstructed signal shown in Fig. 8, which can be used for diotic presentation (presenting the same signal to both ears is known as diotic presentation). Here we have also obtained the smallest perfect reconstruction error compared to the discrete wavelet transform and general wavelet packets: the smallest error is 5.3640e-15, obtained using modified wavelet packets with bior2.4.

Figure 8. Reconstructed signal for diotic presentation.

Table 4 shows the perfect reconstruction error for the Discrete Wavelet Transform, Wavelet Packets and Modified Wavelet Packets with the Daubechies (DB), Biorthogonal (Bior) and Symlets (Sym) wavelets.

Table 4. Perfect reconstruction error.

Type of Wavelet   Discrete Wavelet Transform   Wavelet Packet   Modified Wavelet Packet
DB                2.8039e-11                   2.6199e-11       7.3227e-15
Bior              4.6052e-11                   4.1728e-11       5.3640e-15
Sym               7.6677e-12                   7.3185e-12       6.6128e-12

The above results show that modified wavelet packets provide the best reconstruction, with the least error, and help to reduce auditory masking and improve auditory perception better than traditional filter-bank spectral strategies.
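The perfect reconstruction error reported in Table 4 is the residual left after analysis followed by synthesis. Its measurement can be sketched as below, with Haar filters in plain Python as a stand-in for the paper's MATLAB wavelets; the exact error values depend on the wavelet and implementation, so no specific figure from Table 4 is reproduced here.

```python
import math

S = 1.0 / math.sqrt(2.0)

def haar_dwt(x):
    """Analysis: filter and downsample by 2."""
    half = len(x) // 2
    cA = [(x[2 * k] + x[2 * k + 1]) * S for k in range(half)]
    cD = [(x[2 * k] - x[2 * k + 1]) * S for k in range(half)]
    return cA, cD

def haar_idwt(cA, cD):
    """Synthesis: upsample and filter, inverting haar_dwt exactly."""
    x = []
    for a, d in zip(cA, cD):
        x.append((a + d) * S)
        x.append((a - d) * S)
    return x

x = [math.sin(0.07 * n) for n in range(2048)]
xr = haar_idwt(*haar_dwt(x))

# Perfect reconstruction error: the worst-case sample deviation, which for
# an orthogonal filter bank sits at floating-point round-off level.
err = max(abs(a - b) for a, b in zip(x, xr))
print(err)
```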


5. References
[1] B. C. J. Moore, An Introduction to the Psychology of Hearing, 4th ed. London: Academic, 1997, pp. 89-140.
[2] L. R. Rabiner and R. W. Schafer, Digital Processing of Speech Signals. Englewood Cliffs, NJ: Prentice Hall, 1978, pp. 35-53.
[3] P. A. Dhulekar, J. J. Chopade and S. L. Nalbalwar, "Wavelet packet analysis for reducing the effect of spectral masking to improve auditory perception," in Proc. 3rd Int. Conf. on Electronics Computer Technology (ICECT), Kanyakumari, India, Apr. 2011, vol. 2, pp. 409-413.
[4] P. N. Kulkarni and P. C. Pandey, "Optimizing the comb filters for spectral splitting of speech to reduce the effect of spectral masking," in Proc. Int. Conf. on Signal Processing, Communications and Networking, Chennai, India, Jan. 2008, pp. 69-73.
[5] S. Mallat, "A theory for multiresolution signal decomposition: the wavelet representation," IEEE Trans. Pattern Anal. Machine Intell., vol. 11, no. 7, pp. 674-693, 1989.
[6] D. Baskent, "Speech recognition in normal hearing and sensorineural hearing loss as a function of the number of spectral channels," J. Acoust. Soc. Am., vol. 120, no. 5, pp. 2908-2925, 2006.
[7] T. Arai, K. Yasu and N. Hodoshima, "Effective speech processing for various impaired listeners," in Proc. 18th Int. Congress on Acoustics (ICA), 2004, pp. 1389-1392.
[8] J. R. Dubno and A. B. Schaefer, "Comparison of frequency selectivity and consonant recognition among hearing-impaired and masked normal-hearing listeners," J. Acoust. Soc. Am., vol. 91, no. 4, pp. 2110-2121, 1992.
[9] I. Cheikhrouhou, R. B. Atitallah, K. Ouni, A. B. Hamida, N. Mamoudi and N. Ellouze, "Speech analysis using wavelet transforms dedicated to cochlear prosthesis stimulation strategy," in Proc. 1st Int. Symp. on Control, Communications and Signal Processing, 2004, pp. 639-642.
[10] G. Tognola, F. Grandori and P. Ravazzani, "Wavelet analysis of click-evoked otoacoustic emissions," IEEE Trans. Biomed. Eng., vol. 45, pp. 686-697, 1998.
[11] D. S. Chaudhari and P. C. Pandey, "Dichotic presentation of speech signal using critical filter bank for bilateral sensorineural hearing impaired," in Proc. 16th Int. Congress on Acoustics, Seattle, WA, 1998, vol. 1, pp. 213-214.

