Abstract
Recent advances in Language Identification (LID) rely on Recurrent Neural Networks (RNNs) and neural embeddings. While these techniques have been applied successfully at the word level, results at the phoneme level may not be as good, because the greater variability found in phoneme sequences reduces LID accuracy. We therefore propose phonetic units called phone-grams, which implicitly include longer-context information, and use them to train neural embeddings and RNN language models (RNNLMs). In a first, pre-processing phase, neural embeddings reduce the scattering problem caused by the high number of resulting phone-gram units; in a second phase, RNNLMs produce the score of each language in the identification task, following a Parallel Phone Recognition followed by Language Modeling (PPRLM) structure. Results in terms of Cavg on the KALAKA-3 database show that phone-grams provide up to 14.4% relative improvement over a baseline that uses only phonemes as features. In addition, our proposed strategy of reducing the number of phone-gram units with neural embeddings yields up to 22.5% relative improvement. Finally, fusing the best system with MFCC-based acoustic i-vectors and a traditional PPRLM architecture provides up to 37.76% improvement.
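As a rough illustration of the phone-gram idea, the sketch below joins consecutive phonemes into single tokens and computes a relative improvement from two Cavg values. It is not the authors' implementation: the sliding-window construction, the `_` separator, the function names `phone_grams` and `relative_improvement`, and the example phoneme sequence are all assumptions made for illustration.

```python
from typing import List


def phone_grams(phonemes: List[str], n: int = 2, sep: str = "_") -> List[str]:
    """Join each window of n consecutive phonemes into one token.

    Assumption: phone-grams are built with an overlapping sliding window;
    the abstract does not specify the exact construction.
    """
    if n < 1:
        raise ValueError("n must be >= 1")
    return [sep.join(phonemes[i:i + n]) for i in range(len(phonemes) - n + 1)]


def relative_improvement(baseline_cavg: float, system_cavg: float) -> float:
    """Relative reduction in Cavg, in percent (lower Cavg is better)."""
    return 100.0 * (baseline_cavg - system_cavg) / baseline_cavg


# Hypothetical phoneme sequence, for illustration only.
sequence = ["k", "a", "s", "a"]
print(phone_grams(sequence, n=2))  # ['k_a', 'a_s', 's_a']
print(phone_grams(sequence, n=3))  # ['k_a_s', 'a_s_a']
```

Because every distinct window becomes its own unit, the vocabulary grows quickly with `n`, which is the scattering problem the abstract says the neural embeddings are meant to reduce.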
| Original language | English |
|---|---|
| Article number | 8447373 |
| Pages (from-to) | 2033-2039 |
| Number of pages | 7 |
| Journal | IEEE Latin America Transactions |
| Volume | 16 |
| Issue number | 7 |
| State | Published - Jul 2018 |
Bibliographical note
Publisher Copyright: © 2003-2012 IEEE.
Keywords
- Language Recognition
- Neural Embedding
- Phone-grams
- Phonotactic approach
- Recurrent Neural Network
CACES Knowledge Areas
- 116A Computer Science
Title
Language recognition using neural phone embeddings and RNNLMs