
Solving the cocktail party problem using deep neural networks

For many years, the cocktail party problem has been considered the holy grail of speech processing. To solve the cocktail party problem, the speech signals of all speakers recorded by a single microphone have to be retrieved. However, these speakers can talk simultaneously, which makes the source (or speaker) separation problem much harder. Furthermore, most applications require the separation algorithm to be speaker independent, which means that no prior information about the speakers is known.

If we were able to determine a speech track for every speaker present, this would be of great help in applications such as hearing aids and automatic transcription of meetings, as well as a preprocessing stage for voice command applications and natural language interfaces such as Siri, Google Now, Cortana and so on.

Recently (2016), major steps have been made in solving the cocktail party problem using Deep Neural Networks (DNNs). In general, DNNs try to retrieve high-level features from low-level (or input) features, using multiple layers of hidden units. For this task we want to know which parts (time-frequency bins) of the recorded audio spectrogram belong to which speaker. The network proposed in [1] maps each bin of the audio spectrogram to a so-called embedding space, after which a simple clustering mechanism is used to assign bins to the corresponding speaker.
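
To make this mapping concrete, the following is a minimal sketch of such a deep-clustering-style pipeline. It is not the baseline code of this project nor the exact architecture of [1]: a recurrent network turns every time-frequency bin of the mixture spectrogram into a short embedding vector, and k-means clustering of those vectors yields a binary mask per speaker. The use of TensorFlow/Keras with scikit-learn and all sizes (embedding dimension, LSTM width, spectrogram shape) are illustrative assumptions.

# Minimal sketch of a deep-clustering-style separation pipeline (illustrative only).
# Assumes TensorFlow 2.x / Keras and scikit-learn; sizes are not those of [1].
import numpy as np
import tensorflow as tf
from sklearn.cluster import KMeans

T, F, D = 100, 129, 20        # time frames, frequency bins, embedding dimension

def build_embedding_net():
    """Map every time-frequency bin of a (T, F) spectrogram to a D-dim embedding."""
    inp = tf.keras.Input(shape=(T, F))
    x = tf.keras.layers.Bidirectional(
        tf.keras.layers.LSTM(300, return_sequences=True))(inp)
    x = tf.keras.layers.Dense(F * D, activation='tanh')(x)       # one D-vector per bin
    emb = tf.keras.layers.Reshape((T, F, D))(x)
    emb = tf.keras.layers.Lambda(
        lambda t: tf.math.l2_normalize(t, axis=-1))(emb)         # unit-norm embeddings
    # In [1] the network is trained so that embeddings of bins dominated by the same
    # speaker end up close together (the deep clustering loss); training is omitted here.
    return tf.keras.Model(inp, emb)

def separate(model, mixture_spec, num_speakers=2):
    """Cluster the bin embeddings and turn the cluster labels into binary masks."""
    emb = model(mixture_spec[np.newaxis])[0].numpy()             # (T, F, D)
    labels = KMeans(n_clusters=num_speakers).fit_predict(emb.reshape(-1, D))
    masks = [(labels.reshape(T, F) == k) for k in range(num_speakers)]
    return [mixture_spec * m for m in masks]                     # masked spectrograms

model = build_embedding_net()
estimates = separate(model, np.random.rand(T, F).astype(np.float32))   # dummy input
print([e.shape for e in estimates])                              # [(100, 129), (100, 129)]

Applying the mask to the mixture spectrogram and inverting the transform then gives one estimated signal per speaker.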
Impressive results are achieved. However, the generalizability and robustness of this technique can be questioned. The multi-speaker mixtures are artificially created by mixing together two or more independent utterances. These utterances come from the Wall Street Journal (WSJ) database, which contains studio recordings of sentences from the Wall Street Journal read out loud. It is unclear how the DNN would perform in other scenarios where one or more of the following changes are made (a simulation sketch follows the list):
• Microphone: Different microphones have different transfer functions in the frequency domain. For example, the spectrogram of a mixture recorded with a high-quality microphone will differ from that of one recorded with a (cell)phone.
• Reverberation: How well does the DNN cope with reverberation? Is there a difference between outdoors and indoors? How much reverberation can be tolerated?
• Read versus spontaneous speech: The WSJ database consists of read sentences. The way we talk spontaneously differs from the way we read out loud.
• (Non-)stationary noise: In the original experiments of [1], there are no added noise sources, only speech. Does speech source separation still work in the presence of stationary noise (e.g. a fan) and non-stationary noise (refrigerator, construction site, music, …)?
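
To make these scenarios concrete, the sketch below shows one way they could be simulated on clean utterances before mixing: a crude band-pass "microphone" filter, convolution with a room impulse response, and additive noise at a chosen signal-to-noise ratio. The filter settings, the synthetic impulse response, the SNR and the placeholder signals are assumptions for illustration; real experiments would use recorded impulse responses and noise.

# Minimal sketch of simulating more realistic recording conditions (illustrative only).
import numpy as np
from scipy.signal import butter, lfilter, fftconvolve

fs = 16000                                   # sampling rate in Hz

def phone_microphone(speech):
    """Crude microphone model: band-pass filter mimicking a narrow phone response."""
    b, a = butter(4, [300 / (fs / 2), 3400 / (fs / 2)], btype='band')
    return lfilter(b, a, speech)

def add_reverberation(speech, rir):
    """Convolve dry speech with a (measured or simulated) room impulse response."""
    return fftconvolve(speech, rir)[:len(speech)]

def add_noise(speech, noise, snr_db):
    """Scale a noise recording (fan, music, ...) to the requested SNR and add it."""
    noise = np.resize(noise, len(speech))
    gain = np.sqrt(np.sum(speech ** 2) / (np.sum(noise ** 2) * 10 ** (snr_db / 10)))
    return speech + gain * noise

# Placeholder signals standing in for loaded utterances, an RIR and a noise file.
spk1, spk2 = np.random.randn(3 * fs), np.random.randn(3 * fs)
noise = np.random.randn(3 * fs)
rir = np.random.randn(fs // 2) * np.exp(-np.arange(fs // 2) / (0.05 * fs))   # decaying tail

# Degraded two-speaker mixture for a robustness experiment:
mixture = add_noise(phone_microphone(add_reverberation(spk1, rir)) +
                    phone_microphone(add_reverberation(spk2, rir)),
                    noise, snr_db=10)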

In the first phase you will analyze to what extent the DNN lacks robustness in such scenarios. Since this technique is new, little research has been done on this so far. Afterwards, you will investigate how to adapt the network to improve performance in these more realistic scenarios. Experiments will be done using TensorFlow, a toolkit for research on DNNs. Baseline code will be provided.
Promotor
Hugo Van hamme (ESAT-A 02.84)
Supervision
Jeroen Zegers (ESAT-A 02.87)
Workload
Literature and study: 20%
Analysis and problem statement: 40%
Implementation and experimenting: 40%
Number of students
1

[1] Hershey, J. R.; Chen, Z.; Le Roux, J.; Watanabe, S., "Deep Clustering: Discriminative Embeddings for Segmentation and Separation", IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2016, pp. 31-35.
