Harmonic Enhancement with Noise Reduction of Speech Signal by Comb Filtering

Yu Cai1,2, Jianping Yuan1, Chaohuan Hou1, Jun Yang1,2, Bian Wu3

1. Institute of Acoustics, Chinese Academy of Sciences, Beijing, China
2. Graduate University of Chinese Academy of Sciences, Beijing, China
3. School of Electronics and Information Engineering, Sichuan University, Chengdu, China

Abstract—For single-channel speech enhancement, common methods based on short-time spectral amplitude estimation, such as spectral subtraction and Wiener filtering, are considered simple and effective for suppressing additive noise in noisy environments. However, these techniques may distort the high-frequency harmonics of speech, which are important for language understanding. To solve this problem, we propose a new approach. It applies a comb filter in the time domain after Wiener filtering. With the harmonics of voiced speech enhanced and most of the noise suppressed, another comb filter in the frequency domain is used to recalculate the suppression gain function, restoring the high-frequency harmonics that have been attenuated. The new method is theoretically derived and experimentally demonstrates an improvement in harmonic enhancement and noise reduction.

Keywords-harmonic enhancement; noise reduction; comb filtering

This work was supported by Commission of Science Technology and Industry for National Defence, China (No. A1320070067).

978-1-4244-4131-0/09/$25.00 ©2009 IEEE

I. INTRODUCTION

Age-related hearing loss often begins at higher frequencies. Speech energy is concentrated at lower frequencies, while high-frequency components are crucial to sound quality and speech intelligibility. Classical speech enhancement algorithms based on short-time spectrum estimation, such as spectral subtraction [1]-[4] and Wiener filtering [3], [4], usually ignore this. They significantly improve the signal-to-noise ratio (SNR), but they cannot enhance the high-frequency speech harmonics and may even suppress them where the SNR is small, because low-SNR speech frames are treated as containing only noise. A simple comb filter [6] is able to enhance voiced sounds. The main disadvantage of that method is the difficulty of detecting the fundamental frequency exactly at low SNRs. Additionally, it is not very effective on unvoiced speech. C. Plapous, C. Marro, and P. Scalart proposed a method called Harmonic Regeneration Noise Reduction (HRNR) [7]. Based on an estimate of the a priori SNR [5], it creates an artificial signal in which the missing harmonics are regenerated.

In this paper, we propose a new method. It first applies a classical Wiener filter. Then a comb filter in the time domain, based on accurate pitch estimation, is used to enhance the voiced speech harmonics and reduce the noise. After that, an improved harmonic regeneration method is applied in the frequency domain to recover the harmonics that have been destroyed. The Wiener filtering ensures the accuracy of pitch estimation for the comb filter, and the comb filtering further suppresses the residual noise after Wiener filtering and improves the performance of the harmonic recovery. The new method can potentially be combined with previous algorithms, but it performs better than either one separately in enhancing sound quality and speech clarity. We believe the proposed method has both theoretical and practical merits.

II. WIENER FILTERING

In the additive noise model, assume that the noisy input signal $y(t)$ is composed of the clean speech signal $x(t)$ and the uncorrelated additive noise signal $n(t)$:

$$y(t) = x(t) + n(t). \tag{1}$$

Taking the short-time Fourier transform gives

$$Y(n,k) = X(n,k) + N(n,k), \tag{2}$$

where $n$ and $k$ designate the short-time frame index and the frequency bin, respectively. Over the duration of a frame, the speech is assumed stationary. A classical Wiener filter is considered as a suppression gain $G(n,k)$ applied to each short-time spectrum $Y(n,k)$. The suppression gain can be written as [3]

$$G(n,k) = \left( \frac{E[|X(n,k)|^2]}{E[|X(n,k)|^2] + E[|N(n,k)|^2]} \right)^{1/2}, \tag{3}$$

in which $E$ denotes the expectation operator. Equation (3) can be expressed as a function of estimates of the a posteriori SNR and the a priori SNR, where the a posteriori SNR estimate is

$$\mathrm{SNR}_{post}(n,k) = \frac{|Y(n,k)|^2}{\hat{\sigma}_N^2(n,k)}, \tag{4}$$

and, according to the decision-directed approach [8], the a priori SNR estimate is

$$\begin{aligned}
\widehat{\mathrm{SNR}}_{prio}(n,k) &= \lambda \frac{|\hat{X}(n-1,k)|^2}{\hat{\sigma}_N^2(n,k)} + (1-\lambda)\,\mathrm{HWR}\!\left[\frac{|Y(n,k)|^2 - \hat{\sigma}_N^2(n,k)}{\hat{\sigma}_N^2(n,k)}\right] \\
&= \lambda \frac{|\hat{X}(n-1,k)|^2}{\hat{\sigma}_N^2(n,k)} + (1-\lambda)\,\mathrm{HWR}[\mathrm{SNR}_{post}(n,k) - 1].
\end{aligned} \tag{5}$$

Here HWR denotes half-wave rectification, $\lambda$ is a parameter close to 1, and $\hat{X}(n-1,k)$ denotes the estimated clean-speech spectrum at the previous frame. $\hat{\sigma}_N^2(n,k)$ is the smoothed estimate of the noise power spectral density, updated recursively during speech absence:

$$\hat{\sigma}_N^2(n,k) = \alpha \hat{\sigma}_N^2(n-1,k) + (1-\alpha)|Y(n,k)|^2, \tag{6}$$

where $\alpha$ is the smoothing factor ($0<\alpha<1$). The suppression gain function (3) of the Wiener filter can then be written as

$$G(n,k) = \left( \frac{\widehat{\mathrm{SNR}}_{prio}(n,k)}{1 + \widehat{\mathrm{SNR}}_{prio}(n,k)} \right)^{1/2}. \tag{7}$$

The enhanced speech frame in the frequency domain is

$$\hat{X}(n,k) = G(n,k)Y(n,k), \tag{8}$$

and correspondingly $\hat{x}_n(t)$ in the time domain.

III. COMB FILTERING IN THE TIME DOMAIN

Voiced sounds in speech are quasi-periodic. A comb filter can therefore be used to enhance the speech component and suppress the residual noise (which resembles white noise) after Wiener filtering. A simple comb filter in the time domain is given by

$$\hat{x}'_n(t) = \sum_{k=-M}^{M} C_k\, \hat{x}_n(t - kT_0). \tag{9}$$

Here $T_0$ denotes the pitch period of the speech, obtained by the approach in [9], $M$ is a small integer, and $C_k$ are the filter coefficients. The output signal $\hat{x}'_n(t)$ is a delayed and weighted sum of the input $\hat{x}_n(t)$, the signal enhanced by Wiener filtering. When the delay matches the pitch period, the periodic components are strengthened and the aperiodic ones are suppressed or removed. Fig. 1 shows a typical amplitude frequency response of the comb filter. The filter peaks at integer multiples of the fundamental frequency $F_0 = 1/T_0$. That is, with $f_s$ the sampling frequency, the amplitude frequency response $|H(\omega)|$ reaches its maximum, the sum of the $C_k$, when $\omega = 2m\pi / (f_s T_0)$ and, for coefficients such as those in Fig. 1, its minimum of zero when $\omega = (2m+1)\pi / (f_s T_0)$.

Figure 1. Amplitude frequency response of a typical comb filter (M=1, Ck=[0.3 0.6 0.3], fs=8kHz, T0=1ms).

Such a comb filter, applied after Wiener filtering, benefits from more accurate pitch estimation, which ensures the effectiveness of the filtering. Moreover, it provides an enhanced input signal for the harmonic recovery.

IV. HARMONIC RECOVERING

After comb filtering in the time domain, speech harmonics with lower power may be attenuated. In this section, an improved harmonic regeneration method is used to create a fully harmonic signal, and a new suppression gain function is computed. The restored signal is obtained by

$$x_{h-w}(t) = \mathrm{HWR}[\hat{x}'_n(t)], \tag{10}$$

where HWR denotes half-wave rectification and $\hat{x}'_n(t)$ is the signal enhanced by comb filtering in Section III. This is equivalent to multiplying $\hat{x}'_n(t)$ by the function

$$p(x) = \begin{cases} 1, & x > 0 \\ 0, & x < 0 \end{cases}. \tag{11}$$

Fig. 2 shows an enhanced voiced speech frame $\hat{x}'_n(t)$ (solid line) and the corresponding function $p(\hat{x}'_n(t))$ (dashed line). Notice that the signal $p(\hat{x}'_n(t))$ is a square wave with a quasi-period of $T_0$, the pitch period of the voiced speech.

Figure 2. An enhanced voiced speech frame $\hat{x}'_n(t)$ (solid line) and the relevant function $p(\hat{x}'_n(t))$ (dashed line).

Now we analyze the Fourier transform of $p(\hat{x}'_n(t))$ [10]:

$$F[p(\hat{x}'_n(t))] = \sum_{k=-\infty}^{\infty} P(k)\,\delta\!\left(w - k\frac{2\pi}{T_0}\right), \tag{12}$$

which amounts to sampling the Fourier transform of the fundamental element, $P(k)$, at intervals of $2\pi/T_0$, the fundamental frequency. From (10) and (12), and because multiplication in time corresponds to convolution in frequency, the Fourier transform of $x_{h-w}(t)$ is
$$F[x_{h-w}(t)] = F[\hat{x}'_n(t)] * \frac{e^{j\theta}}{T_0} \sum_{k=-\infty}^{\infty} P(k)\,\delta\!\left(w - k\frac{2\pi}{T_0}\right), \tag{13}$$

where $\theta$ is the original phase. Equation (13) can also be regarded as a comb filter in the frequency domain with the same pitch as the original signal. The distorted harmonics can then be restored using the adjacent harmonics.

The a priori SNR is recalculated as follows [7]:

$$\widehat{\mathrm{SNR}}'(n,k) = \frac{\rho\,|\hat{X}(n,k)|^2 + (1-\rho)\,|X_{H-W}(n,k)|^2}{\hat{\sigma}_N^2(n,k)}, \tag{14}$$

in which $X_{H-W}(n,k)$ represents the Fourier transform of the half-wave rectified signal $x_{h-w}(t)$ and $\rho$ ($0<\rho<1$) is the parameter that controls the mixing of $\hat{X}(n,k)$ and $X_{H-W}(n,k)$. Replacing the estimate $\widehat{\mathrm{SNR}}_{prio}(n,k)$ in (7) by (14) gives the new noise suppression gain function $G'(n,k)$. Finally, the enhanced signal frame in the frequency domain is

$$\hat{X}_{final}(n,k) = G'(n,k)\,\hat{X}'(n,k), \tag{15}$$

where $\hat{X}'(n,k)$ is the Fourier transform of the harmonic-enhanced signal in (9).

Fig. 3 shows the effect of the proposed method on a frame. As seen, compared with the Harmonic Regeneration Noise Reduction (HRNR) method, adding the time-domain comb filtering proposed in Section III enhances the fundamental frequency and harmonics of voiced speech, which makes the harmonic recovery perform better. Furthermore, most of the unvoiced sounds are preserved in the harmonic recovery.

Figure 3. Effect of harmonic enhancement. (a) Noisy speech signal. (b) HRNR method. (c) Proposed method.

In the experiments, the Wiener filter is first applied to the framed signal according to (8), with the noise suppression gain function in (7). The comb filter in (9) is then applied. Finally, the noise suppression gain function in (7) is recalculated using (14), and the harmonics are restored by (15). For time-domain signal reconstruction, the overlap-and-add method is used. The proposed method was compared with classical Wiener filtering (WF) and the Harmonic Regeneration Noise Reduction (HRNR) method mentioned in Section I.

To illustrate typical performance, we show the enhancement results for speech data corresponding to the sentence “The birch canoe slid on the smooth planks” from the IEEE sentence database [11]. It is corrupted by additive babble noise from the AURORA database [12] at an SNR of 5 dB, sampled at 8 kHz, with a length of 2.81 seconds. A 160-point Hamming window with 50% overlap is used, corresponding to 20 milliseconds per frame. For the parameters, we set λ in (5) and α in (6) to 0.98, M in (9) to 1, the vector Ck in (9) to [0.3 0.6 0.3], and ρ in (14) to 0.5.

Fig. 4 shows four spectrograms: (a) is the noisy speech contaminated by babble noise (SNR = 5 dB), (b) and (c) show the noisy speech enhanced by the WF and HRNR techniques, respectively, and (d) shows the result of the proposed method. The proposed method appears to yield more complete and clearer harmonics, with the high frequencies enhanced.

Figure 4. Spectrograms. (a) The noisy speech contaminated by babble noise at a 5 dB SNR. (b) Speech enhanced by WF method. (c) Speech enhanced by HRNR method. (d) Speech enhanced by the proposed method.

The segmental SNR improvements in various noise environments at different input SNRs are computed for WF, HRNR, and the proposed method. The segmental SNR is defined by [13]

$$\mathrm{SegSNR} = \frac{10}{M} \sum_{n \in M} \log \frac{\sum_k |\hat{X}(n,k)|^2}{\sum_k \hat{\sigma}_N^2(n,k)}, \tag{16}$$

where $M$ denotes the number of speech frames. Fig. 5 presents a comparison of the segmental SNR improvements of the three methods. As shown, in most situations the proposed method gives better performance in terms of SNR.
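As an illustration of (16), the following minimal sketch computes the segmental SNR from per-frame spectra. It assumes a base-10 logarithm (so that the result is in dB). The function name `segmental_snr` and the array layout (frames by frequency bins) are choices made here for illustration and are not taken from the experiments above.

```python
import numpy as np


def segmental_snr(X_hat, noise_psd):
    """Segmental SNR in dB, following (16).

    X_hat     : complex array (frames, bins), enhanced spectra X^(n, k)
    noise_psd : real array (frames, bins), noise PSD estimates sigma_N^2(n, k)
    Assumption: the logarithm in (16) is base 10.
    """
    X_hat = np.asarray(X_hat)
    noise_psd = np.asarray(noise_psd, dtype=float)
    speech_energy = np.sum(np.abs(X_hat) ** 2, axis=1)  # sum over k, per frame
    noise_energy = np.sum(noise_psd, axis=1)            # sum over k, per frame
    # Average the per-frame log ratios over the M frames and scale by 10.
    return 10.0 * np.mean(np.log10(speech_energy / noise_energy))
```

For example, two frames whose speech-to-noise energy ratios are 10 and 100 contribute 10 dB and 20 dB, so the function returns their mean of 15 dB.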

Figure 5. Comparison of segmental SNR improvement of the WF, HRNR, and proposed methods in different noise environments. (a) Station. (b) Car. (c) Babble. (d) Airport.

In this paper, we have proposed a new method for harmonic speech enhancement with noise reduction. Following Wiener filtering, a comb filter in the time domain is used to enhance the harmonics of voiced sounds. After that, an improved harmonic regeneration method is used to recover the harmonics that have been distorted or suppressed; that is, a new noise suppression gain function is created with comb filtering in the frequency domain. The proposed method ensures the effectiveness of the time-domain comb filter through accurate pitch detection and suppresses the residual noise after the Wiener filter. More importantly, the harmonics of the speech, especially the high-order ones, are enhanced significantly, which makes the sound clearer and more intelligible.
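The processing chain summarized above can be sketched for a single frame as follows. This is a minimal illustration under stated assumptions, not the implementation used in the experiments: the function names are hypothetical, the pitch period is assumed to be a known integer number of samples, samples outside the frame are treated as zero, and windowing, overlap-and-add, pitch estimation [9], and the noise update (6) are left to the caller.

```python
import numpy as np


def decision_directed_snr(X_prev, Y, noise_psd, lam=0.98):
    """A priori SNR estimate of (5), using the a posteriori SNR of (4)."""
    snr_post = np.abs(Y) ** 2 / noise_psd
    return (lam * np.abs(X_prev) ** 2 / noise_psd
            + (1.0 - lam) * np.maximum(snr_post - 1.0, 0.0))  # HWR[.]


def wiener_gain(snr_prio):
    """Suppression gain of (7)."""
    return np.sqrt(snr_prio / (1.0 + snr_prio))


def comb_filter(x, period, coeffs=(0.3, 0.6, 0.3)):
    """Time-domain comb filter of (9): sum of C_k * x(t - k*T0), k = -M..M.

    period is T0 in samples (assumed integer); coeffs holds C_{-M}..C_{M}.
    """
    x = np.asarray(x, dtype=float)
    M = (len(coeffs) - 1) // 2
    y = np.zeros_like(x)
    for c, k in zip(coeffs, range(-M, M + 1)):
        shift = k * period
        if abs(shift) >= len(x):
            continue  # the delayed copy falls entirely outside the frame
        if shift >= 0:
            y[shift:] += c * x[:len(x) - shift]
        else:
            y[:shift] += c * x[-shift:]
    return y


def half_wave_rectify(x):
    """HWR of (10): keep positive samples and zero the rest."""
    return np.maximum(x, 0.0)


def enhance_frame(Y, noise_psd, X_prev, period, rho=0.5):
    """One frame of the chain (8) -> (9) -> (10) -> (14) -> (15).

    Y is the one-sided spectrum (np.fft.rfft) of an even-length frame.
    """
    n = 2 * (len(Y) - 1)
    X_hat = wiener_gain(decision_directed_snr(X_prev, Y, noise_psd)) * Y  # (8)
    x_comb = comb_filter(np.fft.irfft(X_hat, n), period)                  # (9)
    X_comb = np.fft.rfft(x_comb)
    X_hw = np.fft.rfft(half_wave_rectify(x_comb))                         # (10)
    snr_new = (rho * np.abs(X_hat) ** 2
               + (1.0 - rho) * np.abs(X_hw) ** 2) / noise_psd             # (14)
    return wiener_gain(snr_new) * X_comb                                  # (15)
```

With a delay of 8 samples ($T_0$ = 1 ms at $f_s$ = 8 kHz, the setting of Fig. 1) and coefficients [0.3 0.6 0.3], the magnitude response of `comb_filter` is $|0.6 + 0.6\cos(8\omega)|$, which equals 1.2 at multiples of 1 kHz and 0 at odd multiples of 500 Hz.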
The author Yu Cai wishes to acknowledge the support of Professor Xiaochuan Ma, who provided the resources and an open environment for carrying out this work and gave some valuable advice.

REFERENCES
[1] M. Berouti, R. Schwartz, and J. Makhoul, “Enhancement of Speech Corrupted by Acoustic Noise”, Proc. IEEE Int. Conf. Acoustics, Speech, Signal Processing, 1979, pp. 208-211.
[2] Steven F. Boll, “Suppression of Acoustic Noise in Speech Using Spectral Subtraction”, IEEE Trans. on Acoustics, Speech, and Signal Processing, 1979, Vol. ASSP-27, No. 2, pp. 113-120.
[3] Jae S. Lim and Alan V. Oppenheim, “Enhancement and Bandwidth Compression of Noisy Speech”, Proc. IEEE, 1979, Vol. 67, No. 12, pp. 1586-1604.
[4] Philipos C. Loizou, Speech Enhancement: Theory and Practice, CRC Press, Boca Raton, 2007.
[5] P. Scalart and J. Filho, “Speech Enhancement Based on a Priori Signal to Noise Estimation”, Proc. IEEE Int. Conf. Acoustics, Speech, Signal Processing, 1996, Vol. 2, pp. 629-632.
[6] John R. Deller, John H. L. Hansen, and John G. Proakis, Discrete-Time Processing of Speech Signals, IEEE Press, New York, NY, 1993.
[7] C. Plapous, C. Marro, and P. Scalart, “Speech Enhancement Using Harmonic Regeneration”, Proc. IEEE Int. Conf. Acoustics, Speech, and Signal Processing, 2005, Vol. 1, pp. 157-160.
[8] Y. Ephraim and D. Malah, “Speech Enhancement Using a Minimum Mean-Square Error Short-Time Spectral Amplitude Estimator”, IEEE Trans. on Acoustics, Speech, and Signal Processing, 1984, Vol. ASSP-32, No. 6, pp. 1109-1121.
[9] P. Boersma, “Accurate Short-Term Analysis of the Fundamental Frequency and the Harmonic-to-Noise Ratio of Sampled Sound”, IFA Proc. 17, 1993, pp. 97-110.
[10] Alan V. Oppenheim, Ronald W. Schafer, and John R. Buck, Discrete-Time Signal Processing, 2nd ed., Prentice-Hall, Englewood Cliffs, NJ, 1999.
[11] IEEE Subcommittee, “IEEE Recommended Practice for Speech Quality Measurements”, IEEE Trans. Audio Electroacoustics, 1969, AU-17(3), pp. 225-246.
[12] H. Hirsch and D. Pearce, “The AURORA Experimental Framework for the Performance Evaluation of Speech Recognition Systems under Noisy Conditions”, Proc. ASR 2000, 2000, pp. 181-188.
[13] S. Quackenbush, T. Barnwell, and M. Clements, Objective Measures of Speech Quality, Prentice-Hall, Englewood Cliffs, NJ, 1988.

