Enabling the Latent Semantic Analysis of Large-Scale Information Retrieval Datasets by Means of Out-of-Core Heterogeneous Systems

Gabriel A. León-Paredes, Liliana I. Barbosa-Santillán, Antonio Pareja-Lora

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

1 Citation (Scopus)

Abstract

Latent Semantic Analysis (LSA) has been applied widely and successfully in Natural Language Processing (NLP), usually on small or medium-sized datasets and without real time constraints. Even so, LSA is a time- and space-intensive task, which complicates its integration into real-time NLP applications (for example, information retrieval or question answering) on large-scale datasets. An implementation of LSA that both enables and accelerates its execution on large-scale datasets would therefore be most useful in these data-intensive, real-time NLP scenarios. However, to the best of our knowledge, no such implementation has been achieved so far. To this end, a new out-of-core, scalable, heterogeneous LSA (hLSA) system has been built and run on the large-scale clinical decision support dataset from the Text REtrieval Conference (TREC) 2015 competition. Results show that the out-of-core hLSA system can process this large-scale dataset (631,302 documents), with a full-ranked term-document matrix of 566 GB, fairly fast and, moreover, with better precision (for at least one of the topics) than the competing TREC 2015 systems.

Original language: English
Title of host publication: Smart Technologies, Systems and Applications - 1st International Conference, SmartTech-IC 2019, Proceedings
Editors: Fabián R. Narváez, Diego F. Vallejo, Paulina A. Morillo, Julio R. Proaño
Publisher: Springer
Pages: 105-119
Number of pages: 15
ISBN (Print): 9783030467845
Status: Published - 1 Jan 2020
Event: 1st International Conference on Smart Technologies, Systems and Applications, SmartTech-IC 2019 - Quito, Ecuador
Duration: 2 Dec 2019 to 4 Dec 2019

Publication series

Name: Communications in Computer and Information Science
Volume: 1154 CCIS
ISSN (Print): 1865-0929
ISSN (Electronic): 1865-0937

Conference

Conference: 1st International Conference on Smart Technologies, Systems and Applications, SmartTech-IC 2019
Country/Territory: Ecuador
City: Quito
Period: 2/12/19 to 4/12/19

Bibliographical note

Publisher Copyright:
© Springer Nature Switzerland AG 2020.
