Language Recognition using phonotactic-based Shifted Delta Coefficients and multiple phone recognizers

Luis Fernando D'Haro; Ricardo Cordoba; Christian Salamea; Javier Ferreiros

Grupo de Investigación en Interacción, Robótica y Automática (GIIRA)

Research output: Contribution to journal › Conference article › peer-review

4 Citations (Scopus)

Abstract

This paper describes a new language recognition technique that applies the Shifted Delta Coefficients (SDC) approach to phone log-likelihood ratio (PLLR) features. The method incorporates long-span phonetic information at the frame level while accounting for the temporal length of each phone unit. The proposed features are used to train an i-vector based system, which is tested on the Albayzin LRE 2012 dataset. The results show a relative improvement of 33.3% in Cavg over different state-of-the-art acoustic i-vector based systems. The paper also presents the integration of parallel phone ASR systems, each of which generates multiple PLLR coefficients that are stacked together and then projected into a reduced dimension. Finally, the paper shows that incorporating state information from the phone ASR provides additional improvements, and that fusion with the other acoustic and phonotactic systems yields an improvement of 25.8% over the system presented during the competition.
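
For readers unfamiliar with SDC, the conventional stacking used with frame-level features can be sketched as follows. This is a minimal illustration of the usual d-P-k formulation, in which block i for frame t is c(t + iP + d) - c(t + iP - d). It is not necessarily the variant the paper applies to PLLR features. The function name shifted_delta, the default parameter values, and the edge clamping are assumptions made for this example.

```python
def shifted_delta(frames, d=1, p=3, k=7):
    """Stack k shifted delta blocks per frame.

    frames: list of equal-length feature vectors (lists of floats).
    d: delta spread, p: shift between blocks, k: number of blocks.
    Block i for frame t is frames[t + i*p + d] - frames[t + i*p - d].
    """
    n = len(frames)

    def frame_at(idx):
        # Assumption: out-of-range indices are clamped to the nearest edge frame.
        return frames[min(max(idx, 0), n - 1)]

    out = []
    for t in range(n):
        stacked = []
        for i in range(k):
            ahead = frame_at(t + i * p + d)
            behind = frame_at(t + i * p - d)
            stacked.extend(a - b for a, b in zip(ahead, behind))
        out.append(stacked)
    return out
```

Each output vector has k times the input dimension, so every frame carries information spanning roughly k·P frames of context, which is the sense in which the abstract speaks of long-span information at the frame level.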

Original language	English
Pages (from-to)	3042-3046
Number of pages	5
Publication	Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Status	Published - 2014
Event	15th Annual Conference of the International Speech Communication Association: Celebrating the Diversity of Spoken Languages, INTERSPEECH 2014 - Singapore, Singapore. Duration: 14 Sep 2014 → 18 Sep 2014

Bibliographical note

Publisher Copyright:
Copyright © 2014 ISCA.

Cite this

@article{b7af1fbf37dc4f0eae3099f48a9be84c,
  title = "Language Recognition using phonotactic-based Shifted Delta Coefficients and multiple phone recognizers",
  keywords = "Language recognition, Parallel phone recognizers, Phone-log likelihood ratios, SDC",
  author = "D'Haro, {Luis Fernando} and Ricardo Cordoba and Christian Salamea and Javier Ferreiros",
  note = "Publisher Copyright: Copyright {\textcopyright} 2014 ISCA.; 15th Annual Conference of the International Speech Communication Association: Celebrating the Diversity of Spoken Languages, INTERSPEECH 2014 ; Conference date: 14-09-2014 Through 18-09-2014",
  year = "2014",
  language = "English",
  pages = "3042--3046",
  journal = "Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH",
  issn = "2308-457X",
}

Language Recognition using phonotactic-based Shifted Delta Coefficients and multiple phone recognizers. / D'Haro, Luis Fernando; Cordoba, Ricardo; Salamea, Christian et al.
In: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, 2014, p. 3042-3046.

TY  - JOUR
T1  - Language Recognition using phonotactic-based Shifted Delta Coefficients and multiple phone recognizers
AU  - D'Haro, Luis Fernando
AU  - Cordoba, Ricardo
AU  - Salamea, Christian
AU  - Ferreiros, Javier
PY  - 2014
Y1  - 2014
KW  - Language recognition
KW  - Parallel phone recognizers
KW  - Phone-log likelihood ratios
KW  - SDC
UR  - http://www.scopus.com/inward/record.url?scp=84910028546&partnerID=8YFLogxK
M3  - Conference article
AN  - SCOPUS:84910028546
SN  - 2308-457X
SP  - 3042
EP  - 3046
JO  - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
JF  - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
T2  - 15th Annual Conference of the International Speech Communication Association: Celebrating the Diversity of Spoken Languages, INTERSPEECH 2014
Y2  - 14 September 2014 through 18 September 2014
ER  -
