If we could determine a speech track for every speaker present, this would be a great help
in applications such as hearing aids and automatic transcription of meetings, as well as a
preprocessing stage in voice command applications and natural language interfaces such as Siri,
Google Now, Cortana and so on.
In the first phase you will analyze whether the DNN struggles with robustness. Since this
technique is new, little research has been done so far. Afterwards you will research how to adapt
the network to increase performance in these more realistic scenarios. Experiments will be done using
TensorFlow, a toolkit for research using DNNs. Baseline code will be provided.
Promotor
Hugo Van hamme (ESAT-A 02.84)
Supervision
Jeroen Zegers (ESAT-A 02.87)
Workload
Literature and study: 20%
Analysis and problem statement: 40%
Implementation and experimenting: 40%
Number of students
1
[1] Hershey, J.R.; Chen, Z.; Le Roux, J.; Watanabe, S., “Deep Clustering: Discriminative Embeddings
for Segmentation and Separation”, IEEE International Conference on Acoustics, Speech and Signal
Processing (ICASSP), 2016, pp. 31-35.