On the use of Phone-based Embeddings for Language Recognition

Christian Salamea, Ricardo De Córdoba, Luis Fernando D'Haro, Rubén San Segundo, Javier Ferreiros

Research output: Contribution to conference, Paper, peer-reviewed

2 Citations (Scopus)

Abstract

Language Identification (LID) can be defined as the process of automatically identifying the language of a given spoken utterance. We have focused on a phonotactic approach in which the system input is the phoneme sequence generated by a speech recognizer (ASR), but instead of phonemes we have used phonetic units that contain context information, the so-called "phone-gram sequences". In this context, we propose the use of Neural Embeddings (NEs) as features for those phone-gram sequences, which are used as inputs to a classical i-Vector framework to train a multiclass logistic classifier. These NEs incorporate information from the neighbouring phone-grams in the sequence and implicitly model longer-context information. The NEs have been trained using both a Skip-Gram and a GloVe model. Experiments have been carried out on the KALAKA-3 database, and we have used Cavg as the metric to compare the systems. As a baseline we take the Cavg obtained using the NEs as features in the LID task, 24.7%. Our strategy of incorporating information from the neighbouring phone-grams to define the final sequences yields up to 24.3% relative improvement over the baseline using the Skip-Gram model and up to 32.4% using the GloVe model. Finally, the fusion of our best system with an MFCC-based acoustic i-Vector system provides up to 34.1% improvement over the acoustic system alone.
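To make the reported gains concrete, the short sketch below converts the relative improvements quoted in the abstract into the absolute Cavg values they would imply. It assumes that "relative improvement" means (baseline - new) / baseline and that both gains are measured against the 24.7% baseline; the abstract does not state either point explicitly, and the function name is hypothetical. The fusion result is left out because the Cavg of the acoustic system alone is not given.

```python
def apply_relative_improvement(baseline: float, relative_gain_pct: float) -> float:
    """Return the score implied by a relative reduction of `relative_gain_pct` percent.

    Assumption (not stated in the abstract): relative improvement is
    (baseline - new) / baseline, so new = baseline * (1 - gain).
    """
    return baseline * (1.0 - relative_gain_pct / 100.0)


BASELINE_CAVG = 24.7  # percent, baseline Cavg quoted in the abstract

# Relative improvements quoted in the abstract, in percent.
implied = {
    "Skip-Gram": apply_relative_improvement(BASELINE_CAVG, 24.3),
    "GloVe": apply_relative_improvement(BASELINE_CAVG, 32.4),
}

for name, cavg in implied.items():
    print(f"{name}: implied Cavg = {cavg:.2f}%")
```

Under these assumptions the implied values are roughly 18.7% for Skip-Gram and 16.7% for GloVe; they are derived figures, not numbers reported by the authors.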

Original language: English
Pages: 55-59
Number of pages: 5
DOI
Status: Published - 2018
Event: 4th International Conference on Advances in Speech and Language Technologies for Iberian Languages, IberSPEECH 2018 - Barcelona, Spain
Duration: 21 Nov 2018 to 23 Nov 2018

Conference

Conference: 4th International Conference on Advances in Speech and Language Technologies for Iberian Languages, IberSPEECH 2018
Country/Territory: Spain
City: Barcelona
Period: 21/11/18 to 23/11/18

Bibliographical note

Publisher Copyright:
© 4th International Conference, IberSPEECH 2018.

CACES Knowledge Areas

  • 116A Computing
