PHONOTACTIC LANGUAGE RECOGNITION USING A UNIVERSAL PHONEME RECOGNIZER AND A TRANSFORMER ARCHITECTURE

David Romero, Luis Fernando D'Haro, Marcos Estecha-Garitagoitia, Christian Salamea

Research output: Chapter in book/report/conference proceeding, conference contribution, peer-reviewed

2 Citations (Scopus)

Abstract

In this paper, we describe a phonotactic language recognition model that handles both long and short n-gram input sequences to learn contextual, phonotactic-based vector embeddings. Our approach uses a transformer-based encoder with sliding-window attention, aiming to find discriminative short- and long-range co-occurrences of language-dependent n-gram phonetic units. We evaluate and compare different phoneme recognizers (Brno and Allosaurus) and sub-unit tokenizers to help select the most discriminative n-grams. The proposed architecture is evaluated on the Kalaka-3 database, which contains clean and noisy audio recordings of very similar languages (Iberian languages such as Spanish, Galician, and Catalan). We report results using the Cavg and accuracy metrics used in NIST evaluations. The experimental results show that the proposed approach achieves a 21% relative improvement over the best system presented in the Albayzin LR competition.
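
The two ideas the abstract combines, phoneme n-grams and sliding-window attention, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names `phoneme_ngrams` and `sliding_window_mask`, the symmetric window definition, and the toy phoneme sequence are all assumptions made for illustration only.

```python
from typing import List, Tuple


def phoneme_ngrams(phonemes: List[str], n: int) -> List[Tuple[str, ...]]:
    """Return every contiguous n-gram of a phoneme sequence (hypothetical helper)."""
    if n <= 0:
        raise ValueError("n must be positive")
    # range() is empty when the sequence is shorter than n, so the result is [].
    return [tuple(phonemes[i:i + n]) for i in range(len(phonemes) - n + 1)]


def sliding_window_mask(seq_len: int, window: int) -> List[List[bool]]:
    """Build a boolean attention mask (assumed symmetric window).

    mask[i][j] is True when position i is allowed to attend to position j,
    that is, when the two positions are at most `window` steps apart.
    """
    return [[abs(i - j) <= window for j in range(seq_len)] for i in range(seq_len)]


if __name__ == "__main__":
    # Toy phoneme sequence; a real system would take recognizer output instead.
    sequence = ["o", "l", "a"]
    print(phoneme_ngrams(sequence, 2))
    for row in sliding_window_mask(4, 1):
        print(row)
```

With a small window each token only sees nearby n-grams, which favors short co-occurrences; a larger window (or stacking layers) lets longer-range patterns contribute as well.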

Original language: English
Title of host publication: 2022 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2022 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 6872-6876
Number of pages: 5
ISBN (electronic): 9781665405409
Status: Published - 2022
Event: 47th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2022 - Virtual, Online, Singapore
Duration: 23 May 2022 to 27 May 2022

Publication series

Name: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Volume: 2022-May
ISSN (print): 1520-6149

Conference

Conference: 47th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2022
Country/Territory: Singapore
City: Virtual, Online
Period: 23/05/22 to 27/05/22

Bibliographical note

Publisher Copyright:
© 2022 IEEE
