This paper explores a better way to learn word vector representations for language identification (LID). We focus on a phonotactic approach in which phoneme sequences are used to build phonotactic units (phone-grams) that incorporate context information. To take the morphology of phone-grams into account, we use sub-word information (lower-order n-grams) to learn phone-gram embeddings with FastText. These embeddings are used as input to an i-Vector framework to train a multiclass logistic classifier. We compare our approach with an LID system that uses phone-gram embeddings learned with Skipgram, which does not exploit sub-word information, using Cavg as the evaluation metric. Incorporating sub-word information into the phone-gram embeddings significantly improves the results over embeddings learned while ignoring the internal structure of phone-grams. Furthermore, we show that our system provides information complementary to an acoustic system, improving the acoustic system when the two are fused.
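The phone-gram and sub-word ideas above can be illustrated with a small Python sketch. It is a minimal illustration under stated assumptions, not the authors' implementation: phone-grams are assumed to be windows of n consecutive phonemes; each phoneme is treated the way fastText treats a character; the sub-units of a phone-gram are taken to be its lower-order n-grams (with `<` and `>` boundary markers, as fastText adds around words) plus the phone-gram itself; and a phone-gram vector is the mean of its sub-unit vectors. The helper names and the pseudo-random lookup in `unit_vector` are hypothetical stand-ins for a trained embedding table.

```python
import random
import zlib
from typing import List, Sequence, Tuple

PhoneGram = Tuple[str, ...]
BOUNDARIES = {("<",), (">",)}


def phone_grams(phonemes: Sequence[str], n: int) -> List[PhoneGram]:
    """Slide a window of n consecutive phonemes over a decoded phoneme sequence."""
    return [tuple(phonemes[i:i + n]) for i in range(len(phonemes) - n + 1)]


def subunits(gram: PhoneGram, min_n: int = 1, max_n: int = 2) -> List[PhoneGram]:
    """Assumed fastText-style decomposition: the phone-gram itself plus its
    lower-order n-grams over the boundary-padded sequence."""
    padded = ("<",) + gram + (">",)
    units = {gram}
    for k in range(min_n, max_n + 1):
        for i in range(len(padded) - k + 1):
            unit = padded[i:i + k]
            # Like fastText, skip a boundary marker standing alone as a unigram.
            if unit not in BOUNDARIES:
                units.add(unit)
    return sorted(units)


def unit_vector(unit: PhoneGram, dim: int) -> List[float]:
    """Hypothetical stand-in for a trained lookup table: a deterministic
    pseudo-random vector seeded from the sub-unit's symbols."""
    rng = random.Random(zlib.crc32(" ".join(unit).encode("utf-8")))
    return [rng.uniform(-1.0, 1.0) for _ in range(dim)]


def phone_gram_vector(gram: PhoneGram, dim: int = 8,
                      min_n: int = 1, max_n: int = 2) -> List[float]:
    """Compose a phone-gram embedding as the mean of its sub-unit vectors."""
    vecs = [unit_vector(u, dim) for u in subunits(gram, min_n, max_n)]
    return [sum(col) / len(vecs) for col in zip(*vecs)]


if __name__ == "__main__":
    seq = ["s", "i", "l", "a"]  # hypothetical decoded phoneme sequence
    trigrams = phone_grams(seq, 3)
    print(trigrams)  # [('s', 'i', 'l'), ('i', 'l', 'a')]
    print(subunits(trigrams[0]))
    print(len(phone_gram_vector(trigrams[0])))  # 8
```

Because `('s', 'i', 'l')` and `('i', 'l', 'a')` share sub-units such as `('i', 'l')`, their vectors share components, and a phone-gram never seen in training still receives a vector built from its sub-units. An embedding learned without sub-word information, as with Skipgram here, has no such fallback.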
|Title of host publication||Conversational Dialogue Systems for the Next Decade, IWSDS 2020|
|Editors||Luis Fernando D’Haro, Zoraida Callejas, Satoshi Nakamura|
|Publisher||Springer Science and Business Media Deutschland GmbH|
|Number of pages||10|
|State||Published - 2021|
|Event||11th International Workshop on Spoken Dialogue Systems, IWSDS 2020 - Madrid, Spain|
|Duration||21 Sep 2020 → 23 Sep 2020|
|Series name||Lecture Notes in Electrical Engineering|
Bibliographical note: Publisher Copyright:
© 2021, The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.