
Improved Method for Epoch Estimation in Telephonic Speech Signals Using Zero
Frequency Filtering

Abstract:
Zero frequency filtering (ZFF) based epoch estimation has received growing
attention for clean or studio-quality speech signals. The method exploits the
impulse-like excitation characteristics in the zero frequency (DC) region of
speech. Because the lower frequency regions of telephonic speech are
significantly attenuated, the ZFF approach gives poor epoch estimation
performance on such signals. The objective of the authors is therefore to
propose refinements to the existing ZFF based epoch estimation algorithm for
improved epoch estimation in telephonic speech. The strength of the impulses in
the zero frequency region is enhanced by computing the Hilbert envelope (HE) of
the speech, which in turn improves the epoch estimation performance
significantly.

Introduction:
Epochs are the locations in speech that correspond to glottal closure instants
in voiced segments and to the onset of bursts or frication in unvoiced segments.
Important speech processing tasks such as prosody modification and speech
enhancement rely on knowledge of the epochs. The presence of the vocal tract
response in the speech spectrum makes accurate epoch estimation a challenging
task. Many methods are available in the literature for estimating epochs from
speech.
Among these methods, the zero frequency filtering (ZFF) method estimates
accurate epochs directly from speech using a simple algorithm with few
parameters. By contrast, epoch estimation in other methods is typically
performed on the linear prediction (LP) residual, which requires setting block
processing parameters such as frame size, frame shift and prediction order. The
ZFF method exploits the impulse-like nature of the excitation waveform. Because
the frequency domain representation of an impulse has uniform magnitude across
all frequencies, including the zero frequency of the speech spectrum, the
impulse characteristics in the zero frequency region of speech are analyzed by
filtering the speech with a zero frequency resonator (ZFR).
Even though the ZFF approach gives epochs that are accurate with respect to the
ground truth epochs obtained from electroglottograms (EGG) for clean speech, its
performance degrades for high-pass filtered speech and telephonic speech. In
telephone-quality speech, the spectral energy below the average pitch range is
significantly attenuated. As a result, spurious zero crossings are introduced in
the zero frequency filtered signal, which in turn increases the number of false
epoch estimates. Epoch estimation in high-pass filtered and telephonic speech is
improved by applying ZFF to the Hilbert envelope (HE) of the speech. Although
this gives better epoch estimation performance with reduced dependence on speech
parameters, the epoch identification accuracy is significantly compromised. In
the present work, the strength of the zero frequency filtered signal (ZFFS)
obtained from the input telephonic speech is refined by filtering each ZFFS
segment through a resonator located at the average fundamental frequency (F0)
estimated from that segment. The approximate F0 is estimated from the short-time
Fourier transform of each ZFFS segment.
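As a rough illustration of the zero frequency filtering and Hilbert envelope
steps described above, the following Python sketch uses NumPy. It is not the
authors' implementation. The function names, the realization of two cascaded
zero frequency resonators as four cumulative sums, the trend removal window of
about 1.5 pitch periods, the three trend removal passes and the synthetic
impulse train are all assumptions based on the commonly described form of ZFF.
Which zero-crossing direction marks an epoch depends on the polarity of the
signal; the sketch uses negative impulses and rising crossings.

```python
import numpy as np

def hilbert_envelope(x):
    """Magnitude of the analytic signal of x, computed with an FFT."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    # Keep DC (and Nyquist for even n), double positive bins, zero negative bins.
    weights = np.zeros(n)
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
        weights[1:n // 2] = 2.0
    else:
        weights[1:(n + 1) // 2] = 2.0
    return np.abs(np.fft.ifft(np.fft.fft(x) * weights))

def zero_frequency_filter(x, avg_pitch_period, passes=3):
    """Return a zero frequency filtered signal (ZFFS) for x.

    avg_pitch_period is in samples; the 1.5-period window and the number
    of passes are assumed values, not taken from the paper summary.
    """
    x = np.asarray(x, dtype=float)
    # Difference the signal to remove any constant offset.
    y = np.diff(x, prepend=0.0)
    # One resonator y[n] = x[n] + 2*y[n-1] - y[n-2] has a double pole at z = 1,
    # so two cascaded resonators equal four successive cumulative sums.
    for _ in range(4):
        y = np.cumsum(y)
    # Remove the slowly growing trend by subtracting a local mean.
    half = int(round(0.75 * avg_pitch_period))
    kernel = np.ones(2 * half + 1) / (2 * half + 1)
    for _ in range(passes):
        y = y - np.convolve(y, kernel, mode="same")
    return y

def rising_zero_crossings(z):
    """Index of the first non-negative sample after each upward crossing."""
    return np.nonzero((z[:-1] < 0) & (z[1:] >= 0))[0] + 1

# Synthetic check: negative impulses every 80 samples (100 Hz at 8 kHz).
fs = 8000
period = 80
x = np.zeros(fs)
true_epochs = np.arange(40, fs, period)
x[true_epochs] = -1.0

zffs = zero_frequency_filter(x, period)
epochs = rising_zero_crossings(zffs)
# Samples within a few window lengths of either end are unreliable because
# the moving average is zero padded there.
```

For telephonic speech, the same function could be called on
`hilbert_envelope(x)` instead of `x`, in line with the HE based variant
described above; the crossing direction to pick would then need to be checked
against the polarity of the envelope.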
Applications:
Prosody modification
Speech enhancement
Speech analysis
Approach:
In the present work, the strength of the ZFFS obtained from the input telephonic
speech is refined by filtering each ZFFS segment through a resonator located at
the average F0 estimated from that segment. The approximate F0 is estimated from
the short-time Fourier transform of each ZFFS segment. This refinement was
originally proposed for instantaneous fundamental frequency estimation from
distant speech signals (speech collected through microphones kept at a distance
from the speaker). Here it is applied to the ZFFS obtained from the HE of
telephonic speech to improve the epoch identification accuracy.
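The segment-wise refinement can be sketched as follows, again in Python with
NumPy and only as an illustration under stated assumptions. A single windowed
FFT frame stands in for the short-time Fourier transform of one segment, and
the 60 to 400 Hz search band, the FFT length, the pole radius of 0.98, the
function names and the synthetic test segment are hypothetical choices, not
values from the paper. The two-pole resonator also adds a phase lag at its
center frequency, which this sketch does not compensate.

```python
import numpy as np

def estimate_f0(segment, fs, fmin=60.0, fmax=400.0, nfft=4096):
    """Strongest spectral peak of one segment inside [fmin, fmax] (assumed band)."""
    segment = np.asarray(segment, dtype=float)
    nfft = max(nfft, len(segment))          # never crop the segment
    windowed = segment * np.hanning(len(segment))
    magnitude = np.abs(np.fft.rfft(windowed, nfft))
    freqs = np.fft.rfftfreq(nfft, d=1.0 / fs)
    band = (freqs >= fmin) & (freqs <= fmax)
    return freqs[band][np.argmax(magnitude[band])]

def resonator(x, f0, fs, r=0.98):
    """Two-pole resonator with poles at r * exp(+/- j * 2*pi*f0/fs)."""
    w0 = 2.0 * np.pi * f0 / fs
    a1 = 2.0 * r * np.cos(w0)
    a2 = -r * r
    y = np.zeros(len(x))
    for n in range(len(x)):
        y[n] = x[n]
        if n >= 1:
            y[n] += a1 * y[n - 1]
        if n >= 2:
            y[n] += a2 * y[n - 2]
    return y

def count_rising(z):
    """Number of negative-to-non-negative sign changes in z."""
    return int(np.count_nonzero((z[:-1] < 0) & (z[1:] >= 0)))

# Synthetic stand-in for a ZFFS segment: a 120 Hz component plus a 900 Hz
# interferer that creates spurious zero crossings.
fs = 8000
n = np.arange(1600)
segment = np.sin(2 * np.pi * 120 * n / fs) + 0.8 * np.sin(2 * np.pi * 900 * n / fs)

f0 = estimate_f0(segment, fs)
refined = resonator(segment, f0, fs)

# Skip the first 200 samples so the resonator start-up transient has decayed.
raw_count = count_rising(segment[200:])
refined_count = count_rising(refined[200:])
```

On this synthetic segment the resonator output keeps roughly one rising zero
crossing per pitch period, while the unfiltered segment shows many more, which
is the kind of spurious crossing the refinement is meant to suppress.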

Reference:
D. Govind, R. Vishnu and D. Pravena, "Improved Method for Epoch Estimation in
Telephonic Speech Signals Using Zero Frequency Filtering," IEEE International
Conference on Signal and Image Processing Applications (ICSIPA), 2015.
