TY - JOUR
T1 - Language recognition using neural phone embeddings and RNNLMs
AU - Salamea, Christian Raul
AU - D'Haro, Luis Fernando
AU - Cordoba, Ricardo
N1 - Publisher Copyright:
© 2003-2012 IEEE.
PY - 2018/7
Y1 - 2018/7
N2 - New advances in Language Identification (LID) using Recurrent Neural Networks (RNNs) and neural embeddings have been proposed recently. While these techniques have been applied successfully at the word level, results at the phoneme level may not be as good because of the greater variability found in phoneme sequences, which reduces LID accuracy. Thus, we propose to use phonetic units called phone-grams, which implicitly include longer-context information, and to use them to train neural embeddings and RNN language models (RNNLMs). Neural embeddings are used in a data pre-processing phase to reduce the scattering problem produced by the high number of resulting phone-gram units; in a second phase, the RNNLMs provide the scores for each language in the identification task following a PPRLM structure. Results in terms of Cavg on the KALAKA-3 database show that the use of phone-grams provides up to a 14.4% relative improvement over a baseline using only phonemes as features. In addition, our proposed strategy of reducing the number of phone-gram units using neural embeddings contributes up to a 22.5% relative improvement. Finally, fusing the best system with MFCC-based acoustic i-vectors and a traditional PPRLM architecture provides up to a 37.76% improvement.
AB - New advances in Language Identification (LID) using Recurrent Neural Networks (RNNs) and neural embeddings have been proposed recently. While these techniques have been applied successfully at the word level, results at the phoneme level may not be as good because of the greater variability found in phoneme sequences, which reduces LID accuracy. Thus, we propose to use phonetic units called phone-grams, which implicitly include longer-context information, and to use them to train neural embeddings and RNN language models (RNNLMs). Neural embeddings are used in a data pre-processing phase to reduce the scattering problem produced by the high number of resulting phone-gram units; in a second phase, the RNNLMs provide the scores for each language in the identification task following a PPRLM structure. Results in terms of Cavg on the KALAKA-3 database show that the use of phone-grams provides up to a 14.4% relative improvement over a baseline using only phonemes as features. In addition, our proposed strategy of reducing the number of phone-gram units using neural embeddings contributes up to a 22.5% relative improvement. Finally, fusing the best system with MFCC-based acoustic i-vectors and a traditional PPRLM architecture provides up to a 37.76% improvement.
KW - Language Recognition
KW - Neural Embedding
KW - Phone-grams
KW - Phonotactic approach
KW - Recurrent Neural Network
UR - http://www.scopus.com/inward/record.url?scp=85052758311&partnerID=8YFLogxK
U2 - 10.1109/TLA.2018.8447373
DO - 10.1109/TLA.2018.8447373
M3 - Article
AN - SCOPUS:85052758311
SN - 1548-0992
VL - 16
SP - 2033
EP - 2039
JO - IEEE Latin America Transactions
JF - IEEE Latin America Transactions
IS - 7
M1 - 8447373
ER -