PHONOTACTIC LANGUAGE RECOGNITION USING A UNIVERSAL PHONEME RECOGNIZER AND A TRANSFORMER ARCHITECTURE

David Romero, Luis Fernando D'Haro, Marcos Estecha-Garitagoitia, Christian Salamea

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

2 Scopus citations

Abstract

In this paper, we describe a phonotactic language recognition model that handles both long and short n-gram input sequences to learn contextual phonotactic vector embeddings. Our approach uses a transformer-based encoder with sliding window attention to find discriminative short and long co-occurrences of language-dependent n-gram phonetic units. We evaluate and compare different phoneme recognizers (Brno and Allosaurus) and sub-unit tokenizers to help select the more discriminative n-grams. The proposed architecture is evaluated on the Kalaka-3 database, which contains clean and noisy audio recordings of very similar languages (Iberian languages such as Spanish, Galician, and Catalan). We report results using the Cavg and accuracy metrics used in NIST evaluations. The experimental results show that the proposed approach achieves a 21% relative improvement over the best system presented in the Albayzin LR competition.
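As a rough illustration of two ideas named in the abstract, the sketch below builds n-gram tokens from a phoneme sequence and a sliding window attention mask over those tokens. It is not the authors' implementation: the function names, the underscore-joined token format, the symmetric window, and the example phoneme string are all assumptions made for illustration only.

```python
from typing import List


def phoneme_ngrams(phonemes: List[str], n: int) -> List[str]:
    """Join every run of n consecutive phonemes into one n-gram token."""
    return ["_".join(phonemes[i:i + n]) for i in range(len(phonemes) - n + 1)]


def sliding_window_mask(seq_len: int, window: int) -> List[List[bool]]:
    """Build a boolean attention mask.

    mask[i][j] is True when position i may attend to position j, i.e. when
    j lies within `window` positions on either side of i (assumed symmetric).
    """
    return [[abs(i - j) <= window for j in range(seq_len)] for i in range(seq_len)]


# Hypothetical phoneme string; a real one would come from a phoneme recognizer.
phones = ["o", "l", "a", "m", "u", "n", "d", "o"]
trigrams = phoneme_ngrams(phones, 3)
mask = sliding_window_mask(len(trigrams), window=1)
```

In a transformer encoder, a mask of this kind would restrict each n-gram token to attending only to its near neighbours, so local co-occurrences are captured within a layer and longer ones emerge as layers are stacked.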

Original language: English
Title of host publication: 2022 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2022 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 6872-6876
Number of pages: 5
ISBN (Electronic): 9781665405409
State: Published - 2022
Event: 47th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2022 - Virtual, Online, Singapore
Duration: 23 May 2022 → 27 May 2022

Publication series

Name: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Volume: 2022-May
ISSN (Print): 1520-6149

Conference

Conference: 47th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2022
Country/Territory: Singapore
City: Virtual, Online
Period: 23/05/22 → 27/05/22

Bibliographical note

Funding Information:
This work has been supported by the Spanish projects AMIC (MINECO, TIN2017-85854-C4-4-R) and CAVIAR (MINECO, TEC2017-84593-C2-1-R), partially funded by the European Union. We also gratefully acknowledge the support of the Universidad Politécnica Salesiana.

Publisher Copyright:
© 2022 IEEE

Keywords

  • acoustic systems
  • language recognition
  • phonotactic information
  • transformers
