On the use of phone-gram units in recurrent neural networks for language identification

Christian Salamea; Luis Fernando D'Haro; Ricardo De Córdoba; Rubén San-Segundo

doi:10.21437/Odyssey.2016-17

Research Group on Interaction, Robotics and Automatics (GIIRA)

Research output: Contribution to conference › Paper › peer-review

7 Scopus citations

Abstract

In this paper we present our results on using RNN-based LM scores trained on different phone-gram orders and using different phonetic ASR recognizers. To avoid data sparseness problems and to reduce the vocabulary of all possible n-gram combinations, a K-means clustering procedure was applied to phone-vector embeddings as a pre-processing step. Additional experiments to optimize the number of classes, batch size, hidden neurons, and state unfolding are also presented. We worked with the KALAKA-3 database for the plenty-closed condition [1]. Thanks to our clustering technique and the combination of high level phone-grams, our phonotactic system performs ~13% better than the unigram-based RNNLM system. In addition, the resulting RNNLM scores are calibrated and fused with scores from an acoustic-based i-vector system and a traditional PPRLM system. This fusion provides additional improvements, showing that the systems provide complementary information to the LID system.
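
The clustering step mentioned in the abstract can be illustrated with a minimal sketch. This record does not give the embedding method, the number of classes, or the authors' implementation, so everything below is an assumption made for illustration: the toy 2-D vectors, the phone-bigram names, the `kmeans` helper, and the simplified initialization from the first `k` vectors. It only shows the general idea of replacing each phone-gram by the label of its K-means cluster so that the language model sees a smaller vocabulary.

```python
def kmeans(vectors, k, iters=20):
    """Plain Lloyd's K-means. Returns (centroids, cluster index per vector).

    Assumption: centroids are initialized from the first k vectors, a
    simplification that keeps this sketch deterministic.
    """
    centroids = [list(v) for v in vectors[:k]]
    assign = [0] * len(vectors)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        for i, v in enumerate(vectors):
            assign[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(v, centroids[c])),
            )
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [vectors[i] for i in range(len(vectors)) if assign[i] == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assign


# Hypothetical 2-D embeddings for six phone-bigrams (toy values, not from the paper).
embeddings = {
    "a_b": [0.0, 0.1], "s_t": [5.0, 5.1],
    "a_d": [0.1, 0.0], "s_k": [5.1, 5.0],
    "e_b": [0.05, 0.05], "z_t": [4.9, 5.0],
}
units = list(embeddings)
_, assign = kmeans([embeddings[u] for u in units], k=2)
unit_to_class = {u: f"C{c}" for u, c in zip(units, assign)}

# Replace each phone-gram in a sequence by its class label before LM training.
sequence = ["a_b", "s_t", "e_b", "z_t"]
clustered = [unit_to_class[u] for u in sequence]
print(clustered)
```

With real data the vectors would come from a trained embedding model and the number of classes would be tuned, as the abstract indicates; the toy setup here only makes the vocabulary reduction visible (six units mapped onto two classes).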

Original language	English
Pages	117-123
Number of pages	7
DOIs	https://doi.org/10.21437/Odyssey.2016-17
State	Published - 2016
Event	Speaker and Language Recognition Workshop, Odyssey 2016 - Bilbao, Spain; Duration: 21 Jun 2016 → 24 Jun 2016

Conference

Conference	Speaker and Language Recognition Workshop, Odyssey 2016
Country/Territory	Spain
City	Bilbao
Period	21/06/16 → 24/06/16

Bibliographical note

Funding Information:
This work has been supported by ASLP-MULÁN (TIN2014-54288-C4-1-R), NAVEGABLE (MICINN, DPI2014-53525-C3-2-R), MA2VICMR (Comunidad Autónoma de Madrid, S2009/TIC-1542), SENESCYT, and the Universidad Politécnica Salesiana de Ecuador.

Publisher Copyright:
© Odyssey 2016: Speaker and Language Recognition Workshop. All rights reserved.

Cite this

@conference{50fcda52a2ed43b5bcb288e511b877f3,

title = "On the use of phone-gram units in recurrent neural networks for language identification",

abstract = "In this paper we present our results on using RNN-based LM scores trained on different phone-gram orders and using different phonetic ASR recognizers. In order to avoid data sparseness problems and to reduce the vocabulary of all possible n-gram combinations, a K-means clustering procedure was performed using phone-vector embeddings as a pre-processing step. Additional experiments to optimize the amount of classes, batch-size, hidden neurons, state-unfolding, are also presented. We have worked with the KALAKA-3 database for the plenty-closed condition [1]. Thanks to our clustering technique and the combination of high level phone-grams, our phonotactic system performs ~13% better than the unigram-based RNNLM system. Also, the obtained RNNLM scores are calibrated and fused with other scores from an acoustic-based i-vector system and a traditional PPRLM system. This fusion provides additional improvements showing that they provide complementary information to the LID system.",

author = "Salamea, Christian and D'Haro, {Luis Fernando} and {De C{\'o}rdoba}, Ricardo and San-Segundo, Rub{\'e}n",

note = "Publisher Copyright: {\textcopyright} Odyssey 2016: Speaker and Language Recognition Workshop. All rights reserved.; Speaker and Language Recognition Workshop, Odyssey 2016 ; Conference date: 21-06-2016 Through 24-06-2016",

year = "2016",

doi = "10.21437/Odyssey.2016-17",

language = "English",

pages = "117--123",

}

TY - CONF

T1 - On the use of phone-gram units in recurrent neural networks for language identification

AU - Salamea, Christian

AU - D'Haro, Luis Fernando

AU - De Córdoba, Ricardo

AU - San-Segundo, Rubén

PY - 2016

Y1 - 2016

N2 - In this paper we present our results on using RNN-based LM scores trained on different phone-gram orders and using different phonetic ASR recognizers. In order to avoid data sparseness problems and to reduce the vocabulary of all possible n-gram combinations, a K-means clustering procedure was performed using phone-vector embeddings as a pre-processing step. Additional experiments to optimize the amount of classes, batch-size, hidden neurons, state-unfolding, are also presented. We have worked with the KALAKA-3 database for the plenty-closed condition [1]. Thanks to our clustering technique and the combination of high level phone-grams, our phonotactic system performs ~13% better than the unigram-based RNNLM system. Also, the obtained RNNLM scores are calibrated and fused with other scores from an acoustic-based i-vector system and a traditional PPRLM system. This fusion provides additional improvements showing that they provide complementary information to the LID system.

UR - http://www.scopus.com/inward/record.url?scp=85073259296&partnerID=8YFLogxK

U2 - 10.21437/Odyssey.2016-17

DO - 10.21437/Odyssey.2016-17

M3 - Paper

AN - SCOPUS:85073259296

SP - 117

EP - 123

T2 - Speaker and Language Recognition Workshop, Odyssey 2016

Y2 - 21 June 2016 through 24 June 2016

ER -
