A new language recognition technique based on the application of the philosophy of the Shifted Delta Coefficients (SDC) to phone log-likelihood ratio features (PLLR) is described. The new methodology allows the incorporation of long-span phonetic information at a frame-by-frame level while dealing with the temporal length of each phone unit. The proposed features are used to train an i-vector based system and tested on the Albayzin LRE 2012 dataset. The results show a relative improvement of 33.3% in Cavg in comparison with different state-of-the-art acoustic i-vector based systems. On the other hand, the integration of parallel phone ASR systems where each one is used to generate multiple PLLR coefficients which are stacked together and then projected into a reduced dimension are also presented. Finally, the paper shows how the incorporation of state information from the phone ASR contributes to provide additional improvements and how the fusion with the other acoustic and phonotactic systems provides an important improvement of 25.8% over the system presented during the competition.
|Número de páginas||5|
|Publicación||Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH|
|Estado||Publicada - 2014|
|Evento||15th Annual Conference of the International Speech Communication Association: Celebrating the Diversity of Spoken Languages, INTERSPEECH 2014 - Singapore, Singapur|
Duración: 14 sep. 2014 → 18 sep. 2014
Nota bibliográficaPublisher Copyright:
Copyright © 2014 ISCA.