Abstract
In this paper, we describe a phonotactic language recognition model that effectively manages long and short n-gram input sequences to learn contextual phonotactic vector embeddings. Our approach uses a transformer-based encoder that integrates sliding-window attention to find discriminative short- and long-range co-occurrences of language-dependent n-gram phonetic units. We then evaluate and compare different phoneme recognizers (Brno and Allosaurus) and sub-unit tokenizers to help select the more discriminative n-grams. The proposed architecture is evaluated on the Kalaka-3 database, which contains clean and noisy audio recordings of closely related Iberian languages (e.g., Spanish, Galician, and Catalan). We report results using the Cavg and accuracy metrics used in NIST evaluations. The experimental results show that the proposed approach achieves a 21% relative improvement over the best system presented in the Albayzin LR competition.
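As an illustration of two ideas named in the abstract, the following minimal Python sketch extracts n-grams from a phone sequence and builds a sliding-window attention mask over the resulting tokens. It is not the authors' implementation: the function names `phone_ngrams` and `sliding_window_mask`, the example phone string, and the window size are all assumptions made for illustration, and the actual model details are given in the paper itself.

```python
from typing import List, Tuple


def phone_ngrams(phones: List[str], n: int) -> List[Tuple[str, ...]]:
    """Return all contiguous n-grams of a phone sequence (hypothetical helper)."""
    return [tuple(phones[i:i + n]) for i in range(len(phones) - n + 1)]


def sliding_window_mask(seq_len: int, window: int) -> List[List[bool]]:
    """Boolean mask where position i may attend to j only if |i - j| <= window.

    This is one common way to define sliding-window attention; the window
    shape used in the paper may differ.
    """
    return [[abs(i - j) <= window for j in range(seq_len)] for i in range(seq_len)]


# Hypothetical phone string; a real system would use phoneme recognizer output.
phones = ["o", "l", "a", "m", "u", "n", "d", "o"]

trigrams = phone_ngrams(phones, 3)                    # n-gram tokens fed to the encoder
mask = sliding_window_mask(len(trigrams), window=1)   # each token sees its neighbours only
```

In a transformer encoder, a mask of this kind would typically be applied to the attention scores before the softmax, so that each n-gram token attends only to nearby tokens rather than to the full sequence.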
| Original language | English |
|---|---|
| Title of host publication | 2022 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2022 - Proceedings |
| Publisher | Institute of Electrical and Electronics Engineers Inc. |
| Pages | 6872-6876 |
| Number of pages | 5 |
| ISBN (Electronic) | 9781665405409 |
| ISBN (Print) | 9781665405409 |
| DOIs | |
| State | Published - 2022 |
| Event | 47th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2022 - Virtual, Online, Singapore. Duration: 23 May 2022 → 27 May 2022 |
Publication series
| Name | ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings |
|---|---|
| Volume | 2022-May |
Conference
| Conference | 47th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2022 |
|---|---|
| Country/Territory | Singapore |
| City | Virtual, Online |
| Period | 23/05/22 → 27/05/22 |
Bibliographical note
Funding Information: This work was supported by the Spanish projects AMIC (MINECO, TIN2017-85854-C4-4-R) and CAVIAR (MINECO, TEC2017-84593-C2-1-R), which are partially funded by the European Union. We also gratefully acknowledge the support of the Universidad Politécnica Salesiana.
Publisher Copyright:
© 2022 IEEE
Keywords
- acoustic systems
- Language recognition
- phonotactic information
- transformers
CACES Knowledge Areas
- 316A Software and Applications Development and Analysis
Title
Phonotactic Language Recognition Using a Universal Phoneme Recognizer and a Transformer Architecture