Speech Enhancement Presentation - Group Meeting - 09172015

A Tour Through the Wonderful World of Speech Enhancement
Femi Odelowo
Definition
Speech enhancement is concerned with improving some perceptual aspect of speech that has been degraded by additive noise
(Speech Enhancement: Theory and Practice, P. C. Loizou)
Perceptual aspects typically are the quality and/or intelligibility of the source signal
Algorithms could be broadly grouped depending on
whether there is a single source or multiple sources
Single microphone or single channel speech enhancement
Microphone array or multichannel speech enhancement
Focus is single channel enhancement
Signal Model
The additive noise model is the most commonly considered model:
$y(n) = x(n) + d(n)$
STFT: $Y(k, l) = X(k, l) + D(k, l)$
where $y(n)$ is the noisy signal, $x(n)$ is the desired speech signal, and $d(n)$ is the additive noise
The noise is assumed to be independent of the speech signal
Processing Flow/Block Diagram

The noisy signal is broken into overlapping frames
Individual frames are processed
The enhanced speech signal is reassembled using the overlap-add method
[Block diagram: STFT -> Parameter Estimation -> Gain Calculation -> Spectral Modification -> Inverse STFT, with a separate Phase path out of the STFT]
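The processing flow above can be sketched in a few lines of NumPy. This is a minimal illustration, not the presenter's implementation: the function name `enhance_frames`, the frame length, the hop size, and the square-root Hann window are all assumptions chosen so that 50% overlap-add reconstructs the signal when the gain is 1. The noisy phase is reused unchanged, as is common for magnitude-domain methods.

```python
import numpy as np

def enhance_frames(noisy, gain_fn, frame_len=256, hop=128):
    """Frame the signal, apply a spectral gain to each frame, and overlap-add.

    gain_fn is a hypothetical callback mapping a magnitude spectrum to a
    gain per frequency bin; frame_len and hop are assumed values.
    """
    # Square-root periodic Hann window, used for analysis and synthesis.
    # The product of the two is a Hann window, which sums to 1 at 50% overlap.
    window = np.sqrt(np.hanning(frame_len + 1)[:-1])
    n_frames = 1 + (len(noisy) - frame_len) // hop
    out = np.zeros(len(noisy))
    for m in range(n_frames):
        start = m * hop
        frame = noisy[start:start + frame_len] * window
        spec = np.fft.rfft(frame)                      # STFT of one frame
        mag, phase = np.abs(spec), np.angle(spec)      # phase is kept as-is
        gain = gain_fn(mag)                            # gain calculation
        modified = gain * mag * np.exp(1j * phase)     # spectral modification
        enhanced = np.fft.irfft(modified, n=frame_len) # inverse STFT
        out[start:start + frame_len] += enhanced * window  # overlap-add
    return out
```

With a unity gain the interior of the signal is reconstructed exactly; only the first and last half-frames, which are covered by a single window, are attenuated.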
Algorithms
Spectral subtraction
Conceptually the simplest to design/implement
Based on the assumed additive nature of the noise
Statistical model-based algorithms

Based on a statistical estimation framework
Includes the Wiener filter and several minimum mean-square error (MMSE) algorithms
Subspace algorithms
Based on a linear algebra framework
Typically use eigenvalue/eigenvector decomposition or SVD
Machine learning algorithms

The big bad new kid on the block
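As a concrete instance of the first family in the list above, power spectral subtraction can be sketched as a per-bin gain. The function name and the spectral floor value are assumptions for illustration; practical variants differ in how they oversubtract and floor.

```python
import numpy as np

def spectral_subtraction_gain(noisy_power, noise_power, floor=0.01):
    """Amplitude gain from power spectral subtraction.

    noisy_power and noise_power are per-bin power spectra (|Y|^2 and the
    noise PSD estimate). floor is an assumed fraction of the noisy power
    kept as a minimum so the result never goes negative.
    """
    # Subtract the noise estimate, relying on the additive noise assumption.
    clean_power = np.maximum(noisy_power - noise_power, floor * noisy_power)
    # Convert the power ratio to an amplitude gain.
    return np.sqrt(clean_power / np.maximum(noisy_power, 1e-12))
```

Bins where the noise estimate exceeds the noisy power are clamped to the floor. Isolated bins that escape the clamp from frame to frame are one source of the musical noise discussed on the next slide.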
Problems With Classical Methods

Algorithms need a good noise and/or SNR estimate
Mathematical accuracy is not necessarily the best!
Noise estimation is worse with lower SNR
Enhanced sound is plagued with a distorted background

Referred to as musical noise
Poor performance in non-stationary noise
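To make the dependence on the noise estimate concrete, here is a minimal sketch of a VAD-gated recursive noise PSD update. It is an assumed, simplified scheme and not necessarily the "Simple VAD" used in the examples that follow; the function name, the smoothing constant, and the 3 dB threshold are hypothetical.

```python
import numpy as np

def update_noise_psd(noise_psd, frame_power, smoothing=0.9, threshold_db=3.0):
    """Update the noise PSD only when the frame looks speech-free.

    noise_psd and frame_power are per-bin arrays. smoothing and
    threshold_db are assumed values for illustration.
    """
    # Crude energy-based VAD: compare frame energy to the current noise energy.
    frame_to_noise_db = 10.0 * np.log10(
        max(frame_power.sum(), 1e-12) / max(noise_psd.sum(), 1e-12))
    if frame_to_noise_db < threshold_db:
        # Likely noise only: recursive averaging toward the new frame.
        return smoothing * noise_psd + (1.0 - smoothing) * frame_power
    # Likely speech: freeze the estimate.
    return noise_psd
```

Because the estimate is frozen during speech, a scheme like this cannot follow noise that changes while someone is talking, which is one reason performance drops in non-stationary noise.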
Examples using the Wiener Filter
The Wiener filter seeks to minimize the mean-square error $E[e^2(n)]$
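In the frequency domain, with speech and noise uncorrelated, minimizing the mean-square error gives a per-bin gain equal to the speech PSD divided by the sum of the speech and noise PSDs. The sketch below assumes both PSDs are already available; the function name is illustrative.

```python
import numpy as np

def wiener_gain(speech_psd, noise_psd):
    """Per-bin Wiener gain xi / (1 + xi), where xi = speech_psd / noise_psd."""
    xi = speech_psd / np.maximum(noise_psd, 1e-12)  # a priori SNR
    return xi / (1.0 + xi)
```

For example, an SNR of 1 (0 dB) gives a gain of 0.5, and an SNR of 9 gives 0.9. In practice neither PSD is known, so the quality of the result depends on how well they are estimated.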
Exhibition Noise, 10 dB Signal, Simple VAD
[Figure, two panels at f = 500, both plotted against Time (sec). Panel 1: True Noise PSD vs. Noise PSD Estimates, PN (dB), showing True Noise and estimates for parameter values 0.7, 0.9, 1, 2, and 5. Panel 2: Clean Signal PSD vs. Signal PSD Estimates, PS (dB), showing True Signal and estimates for parameter values 0.7, 1, and 5.]
Exhibition Noise, 10 dB Signal, IMCRA Algorithm
[Figure, two panels at f = 500, both plotted against Time (sec). Panel 1: True Noise PSD vs. Noise PSD Estimates With IMCRA Noise Estimation, PN (dB), showing True Noise and estimates for parameter values 0.7, 0.9, 1, 2, and 5. Panel 2: Clean Signal PSD vs. Signal PSD Estimates With IMCRA Noise Estimation, PS (dB), showing True Signal and estimates for parameter values 0.7, 1, and 5.]
Exhibition Noise, 10 dB Signal, Enhanced Speech
[Comparison of Simple VAD and Improved MCRA Noise Estimation. Signals: Noisy Signal; Enhanced Signal, oracle PSDs; Enhanced Signal, parameter = 0.7; Enhanced Signal, parameter = 1]
Restaurant Noise, 10 dB Signal, Enhanced Speech
[Comparison of Simple VAD and Improved MCRA Noise Estimation. Signals: Noisy Signal; Enhanced Signal, oracle PSDs]
SNR & Wiener Gain Estimation, Car Noise, 10 dB
[Figure, two panels for a representative frequency bin, both plotted against Time (sec). Panel 1: True vs. Estimated SNR, SNR (dB), showing True SNR, DD SNR Estimate, and Anderson DD Estimate. Panel 2: True vs. Estimated Wiener Gains, Wiener Gain (dB), showing True, DD Gain, and Anderson DD Gain.]
Wiener Gains, Car Noise, 10 dB Signal
[Figure: Ideal vs. Realized Wiener Gains, Ephraim-Malah DD SNR Update. Wiener gain (0 to 1) plotted against SNR (dB) from -50 to 50, with Ideal and Realized curves.]
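The decision-directed (DD) SNR update named in these figures can be sketched as follows. This is the textbook Ephraim-Malah form under assumed settings: the smoothing factor of 0.98, the -25 dB floor, the zero initial state, and the use of the Wiener gain to form the previous clean-speech estimate are illustrative choices, and the Anderson variant shown in the plots is not covered.

```python
import numpy as np

def decision_directed_snr(noisy_power, noise_psd, alpha=0.98,
                          xi_min=10 ** (-25 / 10)):
    """Decision-directed a priori SNR estimate.

    noisy_power: array (n_frames, n_bins) of |Y|^2.
    noise_psd:   array (n_bins,) noise PSD estimate, assumed fixed here.
    alpha, xi_min: assumed smoothing factor and SNR floor.
    """
    n_frames, n_bins = noisy_power.shape
    xi = np.empty((n_frames, n_bins))
    prev_clean_power = np.zeros(n_bins)  # |X_hat|^2 from the previous frame
    for l in range(n_frames):
        gamma = noisy_power[l] / noise_psd               # a posteriori SNR
        xi[l] = (alpha * prev_clean_power / noise_psd
                 + (1 - alpha) * np.maximum(gamma - 1, 0))
        xi[l] = np.maximum(xi[l], xi_min)                # floor the estimate
        gain = xi[l] / (1 + xi[l])                       # Wiener gain
        prev_clean_power = gain ** 2 * noisy_power[l]    # feed back to next frame
    return xi
```

The heavy weight on the previous frame smooths the estimate, which reduces musical noise but makes the estimate lag behind the true SNR at speech onsets.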
Important Statistical Models

Statistical models are based on a probabilistic model of
the DFT components of speech and additive noise
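Signal Model: $Y(k, l) = X(k, l) + D(k, l)$, where $Y$, $X$, and $D$ are the DFT coefficients of the noisy signal, the speech, and the noise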
Short Time Spectral Amplitude (STSA) estimator
Also called the Ephraim-Malah estimator
Obtained as
Log Spectral Amplitude (LSA) estimator

Also due to Y. Ephraim and D. Malah
Obtained as
A variant of this algorithm, the optimally-modified LSA (OMLSA) estimator by I. Cohen, is typically used as a benchmark for the classical algorithms
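For reference, the standard closed forms of the two estimators named above, as published by Ephraim and Malah under a Gaussian model for the DFT coefficients, are given below. The notation is an assumption and may differ from the original slides: $Y_k$ is the noisy spectral amplitude in bin $k$, $\xi_k$ the a priori SNR, $\gamma_k$ the a posteriori SNR, and $I_0$, $I_1$ the modified Bessel functions of order zero and one.

```latex
\begin{align}
v_k &= \frac{\xi_k}{1+\xi_k}\,\gamma_k \\
\hat{X}_k^{\mathrm{STSA}} &= \frac{\sqrt{\pi}}{2}\,\frac{\sqrt{v_k}}{\gamma_k}\,
  \exp\!\left(-\frac{v_k}{2}\right)
  \left[(1+v_k)\,I_0\!\left(\frac{v_k}{2}\right)
        + v_k\,I_1\!\left(\frac{v_k}{2}\right)\right] Y_k \\
\hat{X}_k^{\mathrm{LSA}} &= \frac{\xi_k}{1+\xi_k}\,
  \exp\!\left(\frac{1}{2}\int_{v_k}^{\infty}\frac{e^{-t}}{t}\,dt\right) Y_k
\end{align}
```

The STSA estimator is the conditional mean of the spectral amplitude given the noisy observation, while the LSA estimator minimizes the mean-square error of the log amplitude. Both reduce to a gain applied to $Y_k$ that depends only on $\xi_k$ and $\gamma_k$.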
A Machine Learning Approach

Can we learn a gain function based on the SNR
estimates that performs better than the Wiener gain?
A generalized additive model (GAM) was fitted to the
true Wiener gain using the decision-directed SNR, a
posteriori SNR, and noise estimates as covariates
A GAM is a flexible modeling framework in which a
linear predictor depends on either parametric or nonparametric functions of predictor variables
Results showed improved performance over Wiener
filtering.
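The idea of an additive model can be illustrated with a small backfitting sketch. This is not the presenter's model: a real GAM typically uses penalized splines and possibly a link function, whereas this stand-in uses bin-mean smoothers and NumPy only. The function names, bin count, and iteration count are assumptions. In the setting above, the covariates would be the decision-directed SNR, the a posteriori SNR, and the noise estimate, and the response would be the true Wiener gain.

```python
import numpy as np

def bin_smooth(x, target, n_bins=20):
    """Piecewise-constant smoother: mean of target inside quantile bins of x."""
    edges = np.quantile(x, np.linspace(0.0, 1.0, n_bins + 1))
    idx = np.clip(np.searchsorted(edges, x, side="right") - 1, 0, n_bins - 1)
    sums = np.bincount(idx, weights=target, minlength=n_bins)
    counts = np.bincount(idx, minlength=n_bins)
    return (sums / np.maximum(counts, 1))[idx]

def fit_additive_model(covariates, response, n_iter=10, n_bins=20):
    """Backfitting for response ~ intercept + sum_j f_j(covariates[:, j]).

    Returns the intercept and the fitted component values, one column
    per covariate, evaluated at the training samples.
    """
    n_samples, n_covariates = covariates.shape
    intercept = response.mean()
    components = np.zeros((n_samples, n_covariates))
    for _ in range(n_iter):
        for j in range(n_covariates):
            # Residual with every component except f_j removed.
            partial = (response - intercept
                       - components.sum(axis=1) + components[:, j])
            smooth = bin_smooth(covariates[:, j], partial, n_bins)
            components[:, j] = smooth - smooth.mean()  # keep f_j centered
    return intercept, components
```

Each component is a one-dimensional function of a single covariate, which keeps the learned gain easy to inspect compared with a fully general regression.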
Performance of the GAM Model
[Figure, four panels plotted against dB: Mean PESQ Score, Mean COMP SIG Score, Mean COMP OVL Score, and Mean COMP BAK Score. Each panel compares Learned Response and DD Wiener Filter; one panel also shows True Signal/Noise WF.]
Performance of the GAM Model (contd.)
[Figure, four panels of Mean PESQ Score by Noise Types (airport, babble, car, exhibition, restaurant, station, street, train) for 0 dB, 5 dB, 10 dB, and 15 dB signals, each comparing Learned Response and DD Wiener Filter.]
Other Machine Learning Approaches

Independent Component Analysis
Non-negative Matrix Factorization
Deep Neural Networks
Very recent and have produced the best results
Some interesting results from the publication by Yong Xu et al. are at http://home.ustc.edu.cn/~xuyong62/demo/SE_DNN_taslp.html
More research is needed on how to obtain the best
performance
Other Research Areas

Speech enhancement based on phase spectrum
modification
Phase spectrum compensation (PSC) algorithm by K. Wojcicki et al. performed as well as or slightly better than the STSA estimator
Research results suggest the analysis window used and
sidelobe attenuation levels are important
Enhancement utilizing both magnitude and phase correction
Idea is to gain the best of both worlds
Results varied when the PSC and STSA estimator were
combined
Questions/Discussion
