Abstract
In this paper we present our results on using RNN-based LM scores trained on different phone-gram orders and obtained with different phonetic ASR recognizers. To avoid data sparseness problems and to reduce the vocabulary of all possible n-gram combinations, a K-means clustering procedure was applied to phone-vector embeddings as a pre-processing step. Additional experiments to optimize the number of classes, the batch size, the number of hidden neurons, and the state unfolding are also presented. We have worked with the KALAKA-3 database for the plenty-closed condition [1]. Thanks to our clustering technique and the combination of high-level phone-grams, our phonotactic system performs ~13% better than the unigram-based RNNLM system. The obtained RNNLM scores are also calibrated and fused with scores from an acoustic-based i-vector system and a traditional PPRLM system. This fusion provides additional improvements, showing that these scores provide complementary information to the LID system.
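The clustering step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the embedding dimensionality, the number of classes, or the tooling used, so everything below is an assumption for illustration: the `kmeans` helper, the toy two-dimensional `embeddings`, and the phone-gram labels are hypothetical. The sketch only shows how mapping phone-grams to cluster IDs shrinks the vocabulary seen by the language model.

```python
import random


def kmeans(vectors, k, iters=20, seed=0):
    """Plain Lloyd's K-means; returns one cluster index per input vector."""
    rng = random.Random(seed)
    # Initialise centroids from k randomly chosen input vectors.
    centroids = [list(v) for v in rng.sample(vectors, k)]
    labels = [0] * len(vectors)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(v, centroids[c])),
            )
            for v in vectors
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical 2-D embeddings for a few phone-grams (illustrative values only).
embeddings = {
    "a_e": [0.9, 0.1], "a_i": [1.0, 0.0], "e_i": [0.8, 0.2],
    "p_t": [0.0, 1.0], "t_k": [0.1, 0.9], "p_k": [0.2, 0.8],
}
phone_grams = list(embeddings)
labels = kmeans([embeddings[g] for g in phone_grams], k=2)
class_of = dict(zip(phone_grams, labels))

# A phone-gram sequence becomes a sequence of class IDs for the language model.
sequence = ["a_e", "p_t", "e_i", "t_k"]
class_sequence = [class_of[g] for g in sequence]
```

Here six distinct phone-grams collapse to two class IDs, so a model trained on `class_sequence` sees a far smaller vocabulary than one trained on the raw phone-grams. In practice the number of classes would be tuned, as the abstract notes.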
Original language | English |
---|---|
Pages | 117-123 |
Number of pages | 7 |
DOI | |
Status | Published - 2016 |
Event | Speaker and Language Recognition Workshop, Odyssey 2016 - Bilbao, Spain Duration: 21 Jun 2016 → 24 Jun 2016 |
Conference
Conference | Speaker and Language Recognition Workshop, Odyssey 2016 |
---|---|
Country/Territory | Spain |
City | Bilbao |
Period | 21/06/16 → 24/06/16 |
Bibliographical note
Publisher Copyright: © Odyssey 2016: Speaker and Language Recognition Workshop. All rights reserved.