Abstract
In this paper, we describe a phonotactic language recognition model that effectively handles long and short n-gram input sequences to learn contextual phonotactic vector embeddings. Our approach uses a transformer-based encoder that integrates sliding-window attention to find discriminative short- and long-range co-occurrences of language-dependent n-gram phonetic units. We then evaluate and compare different phoneme recognizers (Brno and Allosaurus) and sub-unit tokenizers to help select the most discriminative n-grams. The proposed architecture is evaluated on the Kalaka-3 database, which contains clean and noisy audio recordings of very similar languages (i.e., Iberian languages such as Spanish, Galician, and Catalan). We report results using the Cavg and accuracy metrics used in NIST evaluations. The experimental results show that our proposed approach outperforms the best system presented in the Albayzin LR competition by a 21% relative improvement.
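The abstract's core mechanism, sliding-window self-attention over phoneme n-gram embeddings, can be illustrated with a minimal sketch. This is not the authors' implementation: the window size, single attention head, and use of raw embeddings as queries/keys/values are all assumptions made purely for illustration.

```python
import numpy as np

def sliding_window_mask(seq_len: int, window: int) -> np.ndarray:
    # True where position j lies within `window` tokens of position i,
    # so each n-gram token only attends to its local phonotactic context.
    idx = np.arange(seq_len)
    return np.abs(idx[:, None] - idx[None, :]) <= window

def windowed_attention(x: np.ndarray, window: int) -> np.ndarray:
    # Toy single-head self-attention restricted by the sliding-window mask.
    # x: (seq_len, d) token embeddings (e.g., phoneme n-gram embeddings).
    scores = x @ x.T / np.sqrt(x.shape[1])
    mask = sliding_window_mask(len(x), window)
    scores = np.where(mask, scores, -np.inf)  # block out-of-window positions
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)  # row-wise softmax
    return weights @ x
```

With a small window the model captures short n-gram co-occurrences; widening the window (or stacking layers) lets longer-range phonotactic patterns emerge, which is the trade-off the paper's encoder is designed to manage.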
Original language | English |
---|---|
Title of host publication | 2022 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2022 - Proceedings |
Publisher | Institute of Electrical and Electronics Engineers Inc. |
Pages | 6872-6876 |
Number of pages | 5 |
ISBN (electronic) | 9781665405409 |
DOI | |
State | Published - 2022 |
Event | 47th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2022 - Virtual, Online, Singapore. Duration: 23 May 2022 → 27 May 2022 |
Publication series
Name | ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings |
---|---|
Volume | 2022-May |
ISSN (print) | 1520-6149 |
Conference
Conference | 47th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2022 |
---|---|
Country/Territory | Singapore |
City | Virtual, Online |
Period | 23/05/22 → 27/05/22 |
Bibliographical note
Publisher Copyright: © 2022 IEEE