TY - JOUR
T1 - Language recognition using neural phone embeddings and RNNLMs
AU - Salamea, Christian Raul
AU - D'Haro, Luis Fernando
AU - Cordoba, Ricardo
N1 - Publisher Copyright:
© 2003-2012 IEEE.
PY - 2018/7
Y1 - 2018/7
N2 - New advances in Language Identification (LID) using Recurrent Neural Networks (RNNs) and neural embeddings have been proposed recently. While these techniques have been applied successfully at the word level, results at the phoneme level may not be as good because of the greater variability found in phoneme sequences, which reduces LID accuracy. Thus, we propose to use phonetic units called phone-grams, which implicitly include longer-context information, and to use them to train neural embeddings and RNN language models (RNNLMs). Neural embeddings are used in a data pre-processing phase to reduce the scattering problem produced by the high number of resulting phone-gram units; in a second phase, the RNNLMs provide the scores for each language in the identification task following a PPRLM structure. Results in terms of Cavg on the KALAKA-3 database show that the use of phone-grams provides up to a 14.4% relative improvement over a baseline using only phonemes as features. In addition, our proposed strategy of reducing the number of phone-gram units using neural embeddings contributes up to a 22.5% relative improvement. Finally, fusing the best system with MFCC-based acoustic i-vectors and a traditional PPRLM architecture provides up to a 37.76% improvement.
AB - New advances in Language Identification (LID) using Recurrent Neural Networks (RNNs) and neural embeddings have been proposed recently. While these techniques have been applied successfully at the word level, results at the phoneme level may not be as good because of the greater variability found in phoneme sequences, which reduces LID accuracy. Thus, we propose to use phonetic units called phone-grams, which implicitly include longer-context information, and to use them to train neural embeddings and RNN language models (RNNLMs). Neural embeddings are used in a data pre-processing phase to reduce the scattering problem produced by the high number of resulting phone-gram units; in a second phase, the RNNLMs provide the scores for each language in the identification task following a PPRLM structure. Results in terms of Cavg on the KALAKA-3 database show that the use of phone-grams provides up to a 14.4% relative improvement over a baseline using only phonemes as features. In addition, our proposed strategy of reducing the number of phone-gram units using neural embeddings contributes up to a 22.5% relative improvement. Finally, fusing the best system with MFCC-based acoustic i-vectors and a traditional PPRLM architecture provides up to a 37.76% improvement.
KW - Language Recognition
KW - Neural Embedding
KW - Phone-grams
KW - Phonotactic approach
KW - Recurrent Neural Network
UR - http://www.scopus.com/inward/record.url?scp=85052758311&partnerID=8YFLogxK
U2 - 10.1109/TLA.2018.8447373
DO - 10.1109/TLA.2018.8447373
M3 - Article
AN - SCOPUS:85052758311
SN - 1548-0992
VL - 16
SP - 2033
EP - 2039
JO - IEEE Latin America Transactions
JF - IEEE Latin America Transactions
IS - 7
M1 - 8447373
ER -