Abstract
In this paper we present our results on using scores from RNN-based language models (RNNLMs) trained on different phone-gram orders and using different phonetic ASR recognizers. To avoid data sparseness problems and to reduce the vocabulary of all possible n-gram combinations, a K-means clustering procedure was applied to phone-vector embeddings as a pre-processing step. Additional experiments to optimize the number of classes, the batch size, the number of hidden neurons, and the state unfolding are also presented. We worked with the KALAKA-3 database under the plenty-closed condition [1]. Thanks to our clustering technique and the combination of high-level phone-grams, our phonotactic system performs ~13% better than the unigram-based RNNLM system. The resulting RNNLM scores are also calibrated and fused with scores from an acoustic-based i-vector system and a traditional PPRLM system. This fusion yields additional improvements, showing that the systems provide complementary information for the LID task.
| Original language | English |
|---|---|
| Pages | 117-123 |
| Number of pages | 7 |
| State | Published - 2016 |
| Event | Speaker and Language Recognition Workshop, Odyssey 2016 - Bilbao, Spain. Duration: 21 Jun 2016 → 24 Jun 2016 |
Conference
| Conference | Speaker and Language Recognition Workshop, Odyssey 2016 |
|---|---|
| Country/Territory | Spain |
| City | Bilbao |
| Period | 21/06/16 → 24/06/16 |
Bibliographical note
Funding Information:
This work has been supported by ASLP-MULÁN (TIN2014-54288-C4-1-R), NAVEGABLE (MICINN, DPI2014-53525-C3-2-R), MA2VICMR (Comunidad Autónoma de Madrid, S2009/TIC-1542), SENESCYT, and the Universidad Politécnica Salesiana de Ecuador.
Publisher Copyright:
© Odyssey 2016: Speaker and Language Recognition Workshop. All rights reserved.
CACES Knowledge Areas
- 8217A Mechatronics
Fingerprint
Dive into the research topics of 'On the use of phone-gram units in recurrent neural networks for language identification'. Together they form a unique fingerprint.