Extended phone log-likelihood ratio features and acoustic-based i-vectors for language recognition

L. F. D'Haro; R. Cordoba; C. Salamea; J. D. Echeverry

doi:10.1109/ICASSP.2014.6854623

Grupo de Investigación en Interacción, Robótica y Automática (GIIRA)

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

32 Citations (Scopus)

Abstract

This paper presents new techniques that yield relevant improvements over the primary system our group submitted to the Albayzin 2012 LRE competition, in which the use of any additional corpora for training or optimizing the models was forbidden. We incorporate an additional phonotactic subsystem based on phone log-likelihood ratio (PLLR) features extracted from different phonotactic recognizers, which improves the accuracy of the system by 21.4% in terms of C_avg (we also present results for F_act, the official metric during the evaluation). We show that using these features at the phone state level provides significant improvements when combined with dimensionality reduction techniques, especially PCA. We also experimented with applying alternative SDC-like configurations to these PLLR features, obtaining additional improvements. In addition, we describe some modifications to the MFCC-based acoustic i-vector system that contributed further improvements. The final fused system outperformed the baseline by 27.4% in C_avg.
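The three feature-processing steps named in the abstract (PLLR extraction, PCA reduction, SDC-style stacking) can be sketched as follows. This is a minimal illustration and not the authors' implementation. It assumes the commonly used PLLR definition (the logit of each frame-level phone or phone-state posterior), a plain SVD-based PCA, and the usual d/P/k shifted-delta formulation. The function names, the random stand-in posteriors, and every parameter value (40 classes, 13 components, d=1, P=3, k=7) are hypothetical choices made for the example; the abstract does not state the configurations actually used.

```python
import numpy as np


def pllr(posteriors, eps=1e-10):
    """Assumed PLLR definition: logit of each frame-level posterior.

    posteriors: array of shape (n_frames, n_classes), rows summing to 1.
    """
    p = np.clip(posteriors, eps, 1.0 - eps)  # avoid log(0)
    return np.log(p) - np.log1p(-p)          # log(p / (1 - p))


def pca_reduce(features, n_components):
    """Project mean-centered features onto their top principal components."""
    centered = features - features.mean(axis=0)
    # Rows of vt are the principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


def sdc(features, d=1, p=3, k=7):
    """Shifted delta coefficients: k delta blocks, each shifted by p frames.

    Block i at frame t is c(t + i*p + d) - c(t + i*p - d); edge frames are
    handled by repeating the first and last frame (an illustrative choice).
    """
    n_frames = features.shape[0]
    padded = np.pad(features, ((d, d + (k - 1) * p), (0, 0)), mode="edge")
    blocks = []
    for i in range(k):
        start = i * p
        plus = padded[start + 2 * d : start + 2 * d + n_frames]
        minus = padded[start : start + n_frames]
        blocks.append(plus - minus)
    return np.hstack(blocks)


# Stand-in data: random posteriors for 200 frames over 40 hypothetical classes.
rng = np.random.default_rng(0)
logits = rng.normal(size=(200, 40))
posteriors = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

pllr_feats = pllr(posteriors)            # one PLLR per class per frame
reduced = pca_reduce(pllr_feats, 13)     # hypothetical target dimension
stacked = sdc(reduced, d=1, p=3, k=7)    # hypothetical SDC configuration
print(pllr_feats.shape, reduced.shape, stacked.shape)
```

Applying PCA before the SDC stacking keeps the final dimension manageable, since stacking multiplies the per-frame dimension by k.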

Original language	English
Title of host publication	2014 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2014
Publisher	Institute of Electrical and Electronics Engineers Inc.
Pages	5342-5346
Number of pages	5
ISBN (Print)	9781479928927
DOI	https://doi.org/10.1109/ICASSP.2014.6854623
Status	Published - 2014
Event	2014 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2014 - Florence, Italy. Duration: 4 May 2014 → 9 May 2014

Publication series

Name	ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
ISSN (Print)	1520-6149

Conference

Conference	2014 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2014
Country/Territory	Italy
City	Florence
Period	4/05/14 → 9/05/14

Access to document

10.1109/ICASSP.2014.6854623

Other files and links

Link to publication in Scopus

Cite this

D'Haro, L. F., Cordoba, R., Salamea, C., & Echeverry, J. D. (2014). Extended phone log-likelihood ratio features and acoustic-based i-vectors for language recognition. In 2014 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2014 (pp. 5342-5346). Article 6854623 (ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ICASSP.2014.6854623

D'Haro, L. F. ; Cordoba, R. ; Salamea, C. et al. / Extended phone log-likelihood ratio features and acoustic-based i-vectors for language recognition. 2014 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2014. Institute of Electrical and Electronics Engineers Inc., 2014. pp. 5342-5346 (ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings).

@inproceedings{1926291b94bd4ae281cc8fc284d981c0,
title = "Extended phone log-likelihood ratio features and acoustic-based i-vectors for language recognition",
abstract = "This paper presents new techniques that yield relevant improvements over the primary system our group submitted to the Albayzin 2012 LRE competition, in which the use of any additional corpora for training or optimizing the models was forbidden. We incorporate an additional phonotactic subsystem based on phone log-likelihood ratio (PLLR) features extracted from different phonotactic recognizers, which improves the accuracy of the system by 21.4% in terms of Cavg (we also present results for Fact, the official metric during the evaluation). We show that using these features at the phone state level provides significant improvements when combined with dimensionality reduction techniques, especially PCA. We also experimented with applying alternative SDC-like configurations to these PLLR features, obtaining additional improvements. In addition, we describe some modifications to the MFCC-based acoustic i-vector system that contributed further improvements. The final fused system outperformed the baseline by 27.4% in Cavg.",
keywords = "dimensionality reduction, Phone Log-Likelihood Ratios, SDC",
author = "D'Haro, {L. F.} and R. Cordoba and C. Salamea and Echeverry, {J. D.}",
year = "2014",
doi = "10.1109/ICASSP.2014.6854623",
language = "English",
isbn = "9781479928927",
series = "ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
pages = "5342--5346",
booktitle = "2014 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2014",
address = "United States",
note = "2014 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2014 ; Conference date: 04-05-2014 Through 09-05-2014",
}

D'Haro, LF, Cordoba, R, Salamea, C & Echeverry, JD 2014, Extended phone log-likelihood ratio features and acoustic-based i-vectors for language recognition. In 2014 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2014., 6854623, ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, Institute of Electrical and Electronics Engineers Inc., pp. 5342-5346, 2014 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2014, Florence, Italy, 4/05/14. https://doi.org/10.1109/ICASSP.2014.6854623

Extended phone log-likelihood ratio features and acoustic-based i-vectors for language recognition. / D'Haro, L. F.; Cordoba, R.; Salamea, C. et al.
2014 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2014. Institute of Electrical and Electronics Engineers Inc., 2014. p. 5342-5346 6854623 (ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings).

TY  - GEN
T1  - Extended phone log-likelihood ratio features and acoustic-based i-vectors for language recognition
AU  - D'Haro, L. F.
AU  - Cordoba, R.
AU  - Salamea, C.
AU  - Echeverry, J. D.
PY  - 2014
Y1  - 2014
AB  - This paper presents new techniques that yield relevant improvements over the primary system our group submitted to the Albayzin 2012 LRE competition, in which the use of any additional corpora for training or optimizing the models was forbidden. We incorporate an additional phonotactic subsystem based on phone log-likelihood ratio (PLLR) features extracted from different phonotactic recognizers, which improves the accuracy of the system by 21.4% in terms of Cavg (we also present results for Fact, the official metric during the evaluation). We show that using these features at the phone state level provides significant improvements when combined with dimensionality reduction techniques, especially PCA. We also experimented with applying alternative SDC-like configurations to these PLLR features, obtaining additional improvements. In addition, we describe some modifications to the MFCC-based acoustic i-vector system that contributed further improvements. The final fused system outperformed the baseline by 27.4% in Cavg.
KW  - dimensionality reduction
KW  - Phone Log-Likelihood Ratios
KW  - SDC
UR  - http://www.scopus.com/inward/record.url?scp=84905222864&partnerID=8YFLogxK
U2  - 10.1109/ICASSP.2014.6854623
DO  - 10.1109/ICASSP.2014.6854623
M3  - Conference contribution
AN  - SCOPUS:84905222864
SN  - 9781479928927
T3  - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
SP  - 5342
EP  - 5346
BT  - 2014 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2014
PB  - Institute of Electrical and Electronics Engineers Inc.
T2  - 2014 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2014
Y2  - 4 May 2014 through 9 May 2014
ER  -

D'Haro LF, Cordoba R, Salamea C, Echeverry JD. Extended phone log-likelihood ratio features and acoustic-based i-vectors for language recognition. In 2014 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2014. Institute of Electrical and Electronics Engineers Inc. 2014. p. 5342-5346. 6854623. (ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings). doi: 10.1109/ICASSP.2014.6854623
