Factors that Affect i-Vectors Based Language Identification Systems

David Romero, Christian Salamea, Fernando Chica, Erick Narvaez

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-reviewed

Abstract

The performance of a language identification (LID) system that uses i-vectors as features depends on several parameters, including algorithm parameters and data parameters. This study analyzes the performance of such a system, focusing only on data parameters in the "Back End" of the system: the influence of the amount of data and of speaker variability in the training phases of the UBM and the total variability matrix T. The multiclass logistic regression (MLR) classifiers were also analyzed by balancing the classes of the database used to train the classifier for each language. The tests were carried out on the Kalaka-3 database, and the average detection cost function (Cavg) was used to evaluate performance. The experiments show that, in the training phase of the UBM, speaker variability is more important than a large amount of data. In the training phase of the total variability matrix T, better performance was obtained when a larger number of audio recordings was used. Finally, balancing the classes for each language when training the MLR classifiers improved performance only for certain languages. Together, the proposed variations improved the Cavg of a standard language identification system by 37%.
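
The abstract does not spell out how Cavg is computed, so the following is only a minimal sketch, assuming the formulation commonly used in NIST-style language recognition evaluations: per-language miss and false-alarm rates are combined with a target prior and then averaged over the target languages. The function name `average_detection_cost` and the default values (`p_target=0.5`, `c_miss=c_fa=1.0`) are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

def average_detection_cost(decisions, labels, p_target=0.5, c_miss=1.0, c_fa=1.0):
    """Average detection cost over all target languages (assumed NIST-style form).

    decisions: bool array (n_segments, n_langs); decisions[i, t] is True when
               the detector for language t accepts segment i.
    labels:    int array (n_segments,) holding the true language of each segment.
    Assumes every language has at least one segment.
    """
    decisions = np.asarray(decisions, dtype=bool)
    labels = np.asarray(labels)
    n_langs = decisions.shape[1]
    # The non-target prior is split evenly across the other languages.
    p_nontarget = (1.0 - p_target) / (n_langs - 1)

    costs = []
    for t in range(n_langs):
        # Miss rate: segments of language t rejected by detector t.
        p_miss = np.mean(~decisions[labels == t, t])
        fa_sum = 0.0
        for n in range(n_langs):
            if n == t:
                continue
            # False-alarm rate: segments of language n accepted by detector t.
            p_fa = np.mean(decisions[labels == n, t])
            fa_sum += c_fa * p_nontarget * p_fa
        costs.append(c_miss * p_target * p_miss + fa_sum)
    return float(np.mean(costs))

# Toy example with three hypothetical languages and two segments each.
labels = np.array([0, 0, 1, 1, 2, 2])
perfect = np.eye(3, dtype=bool)[labels]        # every detector is always right
reject_all = np.zeros((6, 3), dtype=bool)      # every detector always rejects
print(average_detection_cost(perfect, labels))     # 0.0
print(average_detection_cost(reject_all, labels))  # 0.5
```

Under these assumptions a lower Cavg is better, so the reported 37% improvement would correspond to a reduction in this cost.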

Original language: English
Title of host publication: Smart Technologies, Systems and Applications - 1st International Conference, SmartTech-IC 2019, Proceedings
Editors: Fabián R. Narváez, Diego F. Vallejo, Paulina A. Morillo, Julio R. Proaño
Publisher: Springer
Pages: 154-164
Number of pages: 11
ISBN (print): 9783030467845
Status: Published - 1 Jan 2020
Event: 1st International Conference on Smart Technologies, Systems and Applications, SmartTech-IC 2019 - Quito, Ecuador
Duration: 2 Dec 2019 to 4 Dec 2019

Publication series

Name: Communications in Computer and Information Science
Volume: 1154 CCIS
ISSN (print): 1865-0929
ISSN (electronic): 1865-0937

Conference

Conference: 1st International Conference on Smart Technologies, Systems and Applications, SmartTech-IC 2019
Country/Territory: Ecuador
City: Quito
Period: 2 Dec 2019 to 4 Dec 2019
