Uso de técnicas basadas en one-shot learning para la identificación del locutor

Juan Chica, Christian Salamea

Research output: Contribution to journal › Article › peer-review

Abstract

To be effective, a speaker identification system requires a large number of audio samples from each speaker, which are not always accessible or easy to collect. In contrast, systems based on meta-learning, such as one-shot learning, use a single sample to differentiate between classes. This work evaluates the potential of applying the meta-learning approach to text-independent speaker identification tasks. In the experiments, mel spectrograms, i-vectors, and resampling (downsampling) are used both to process the audio signal and to obtain a feature vector. This feature vector is the input to a Siamese neural network that performs the identification task. The best result, an accuracy of 0.9, was obtained when differentiating between 4 speakers. The results show that one-shot learning approaches have great potential for speaker identification and, because of their versatility, could be very useful in real-world fields such as biometrics or forensics.
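The N-way one-shot decision described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the feature vectors have already been extracted, and it substitutes plain Euclidean distance for the similarity score that a trained Siamese network would learn. The names `one_shot_identify`, `n_way_accuracy`, and `make_toy_trials` are hypothetical, and the data is synthetic.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def one_shot_identify(query, support):
    """Return the speaker whose single support vector is closest to the query.

    `support` maps speaker id -> one feature vector (the "one shot").
    Assumption: a trained Siamese network would supply a learned similarity;
    Euclidean distance stands in for it here.
    """
    return min(support, key=lambda spk: euclidean(query, support[spk]))


def n_way_accuracy(trials):
    """Fraction of (query, true_speaker, support) trials identified correctly."""
    correct = sum(one_shot_identify(q, s) == spk for q, spk, s in trials)
    return correct / len(trials)


def make_toy_trials(n_way=4, dim=8, n_trials=50, noise=0.1, seed=0):
    """Build synthetic N-way trials: each speaker is a random centre plus noise."""
    rng = random.Random(seed)
    centres = {
        f"spk{i}": [rng.uniform(-1, 1) for _ in range(dim)] for i in range(n_way)
    }

    def sample(spk):
        # One noisy "utterance" feature vector for the given speaker.
        return [c + rng.gauss(0, noise) for c in centres[spk]]

    trials = []
    for _ in range(n_trials):
        support = {spk: sample(spk) for spk in centres}  # one sample per speaker
        true_spk = rng.choice(list(centres))
        trials.append((sample(true_spk), true_spk, support))
    return trials


if __name__ == "__main__":
    print("toy 4-way one-shot accuracy:", n_way_accuracy(make_toy_trials()))
```

Each trial pairs a query with exactly one reference vector per candidate speaker, so accuracy here is the fraction of queries matched to the correct speaker among N candidates. The toy accuracy says nothing about the figures reported in the paper.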

Translated title of the contribution: Speaker identification using techniques based on one-shot learning
Original language: Spanish
Pages (from-to): 101-108
Number of pages: 8
Journal: Procesamiento de Lenguaje Natural
Volume: 64
Status: Published - Mar 2020

Bibliographical note

Publisher Copyright:
© 2020 Sociedad Española para el Procesamiento del Lenguaje Natural. All rights reserved.

Copyright:
Copyright 2020 Elsevier B.V., All rights reserved.

Keywords

  • Meta Learning
  • N-Way classification
  • One-Shot learning
  • Siamese Neural Network
  • Speaker Identification
  • Text independent
  • VoxCeleb1
