Uso de técnicas basadas en one-shot learning para la identificación del locutor

Juan Chica; Christian Salamea

doi:10.26342/2020-64-12

Grupo de Investigación en Interacción, Robótica y Automática (GIIRA)

Research output: Contribution to journal › Article › peer-review

Abstract

To be effective, a speaker identification system requires a large number of audio samples from each speaker, which are not always accessible or easy to collect. In contrast, systems based on meta-learning, such as one-shot learning, use a single sample to differentiate between classes. This work evaluates the potential of applying the meta-learning approach to text-independent speaker identification tasks. In the experiments, mel spectrograms, i-vectors, and resampling (downsampling) are used both to process the audio signal and to obtain a feature vector. This feature vector is the input to a siamese neural network, which performs the identification task. The best result, an accuracy of 0.9, was obtained when differentiating between 4 speakers. The results show that one-shot learning approaches have great potential for speaker identification and, because of their versatility, could be very useful in real-world fields such as biometrics or forensics.
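The N-way one-shot decision described in the abstract can be illustrated with a minimal sketch. It assumes each utterance has already been mapped to a fixed-length embedding (for example, by one branch of a trained siamese network) and that Euclidean distance is the comparison measure; the function names, the 3-dimensional embeddings, and the distance choice are all hypothetical and are not taken from the paper.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def one_shot_identify(query, support, distance=euclidean):
    """Return the speaker whose single reference embedding is closest to the query.

    `support` maps each speaker label to exactly one embedding (one shot per class).
    """
    return min(support, key=lambda speaker: distance(query, support[speaker]))


# Hypothetical embeddings for a 4-way task (one reference sample per speaker).
support = {
    "speaker_a": [0.9, 0.1, 0.0],
    "speaker_b": [0.1, 0.8, 0.1],
    "speaker_c": [0.0, 0.2, 0.9],
    "speaker_d": [0.5, 0.5, 0.5],
}

# Hypothetical embedding of an unlabeled utterance.
query = [0.85, 0.15, 0.05]
predicted = one_shot_identify(query, support)
print(predicted)
```

In a real system the embeddings would come from the network fed with the features named in the abstract, and the distance (or a learned similarity score) would be whatever the trained model provides.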

Translated title of the contribution	Speaker identification using techniques based on one-shot learning
Original language	Spanish
Pages (from-to)	101-108
Number of pages	8
Journal	Procesamiento de Lenguaje Natural
Volume	64
DOI	https://doi.org/10.26342/2020-64-12
State	Published - Mar 2020

Bibliographical note

Publisher Copyright:
© 2020 Sociedad Española para el Procesamiento del Lenguaje Natural. All rights reserved.

Copyright:
Copyright 2020 Elsevier B.V., All rights reserved.

Keywords

Meta Learning
N-Way classification
One-Shot learning
Siamese Neural Network
Speaker Identification
Text independent
Voxceleb1

Access to document

10.26342/2020-64-12

Other files and links

Link to publication in Scopus

Cite this

@article{4d810dec49ae494dbe463441d8954ead,

title = "Uso de t{\'e}cnicas basadas en one-shot learning para la identificaci{\'o}n del locutor",

abstract = "To be effective, a speaker identification system requires a large number of audio samples from each speaker, which are not always accessible or easy to collect. In contrast, systems based on meta-learning, such as one-shot learning, use a single sample to differentiate between classes. This work evaluates the potential of applying the meta-learning approach to text-independent speaker identification tasks. In the experiments, mel spectrograms, i-vectors, and resampling (downsampling) are used both to process the audio signal and to obtain a feature vector. This feature vector is the input to a siamese neural network, which performs the identification task. The best result, an accuracy of 0.9, was obtained when differentiating between 4 speakers. The results show that one-shot learning approaches have great potential for speaker identification and, because of their versatility, could be very useful in real-world fields such as biometrics or forensics.",

keywords = "Meta Learning, N-Way classification, One-Shot learning, Siamese Neural Network, Speaker Identification, Text independent, Voxceleb1",

author = "Juan Chica and Christian Salamea",

year = "2020",

month = mar,

doi = "10.26342/2020-64-12",

language = "Spanish",

volume = "64",

pages = "101--108",

journal = "Procesamiento de Lenguaje Natural",

issn = "1135-5948",

publisher = "Sociedad Espa{\~n}ola para el Procesamiento del Lenguaje Natural",

}

TY - JOUR

T1 - Uso de técnicas basadas en one-shot learning para la identificación del locutor

AU - Chica, Juan

AU - Salamea, Christian

PY - 2020/3

Y1 - 2020/3

AB - To be effective, a speaker identification system requires a large number of audio samples from each speaker, which are not always accessible or easy to collect. In contrast, systems based on meta-learning, such as one-shot learning, use a single sample to differentiate between classes. This work evaluates the potential of applying the meta-learning approach to text-independent speaker identification tasks. In the experiments, mel spectrograms, i-vectors, and resampling (downsampling) are used both to process the audio signal and to obtain a feature vector. This feature vector is the input to a siamese neural network, which performs the identification task. The best result, an accuracy of 0.9, was obtained when differentiating between 4 speakers. The results show that one-shot learning approaches have great potential for speaker identification and, because of their versatility, could be very useful in real-world fields such as biometrics or forensics.

KW - Meta Learning

KW - N-Way classification

KW - One-Shot learning

KW - Siamese Neural Network

KW - Speaker Identification

KW - Text independent

KW - Voxceleb1

UR - http://www.scopus.com/inward/record.url?scp=85088390416&partnerID=8YFLogxK

U2 - 10.26342/2020-64-12

DO - 10.26342/2020-64-12

M3 - Article

AN - SCOPUS:85088390416

SN - 1135-5948

VL - 64

SP - 101

EP - 108

JO - Procesamiento de Lenguaje Natural

JF - Procesamiento de Lenguaje Natural

ER -
