Abstract
Latent Semantic Analysis (LSA) has been applied widely and successfully in Natural Language Processing (NLP), usually on small or medium-sized datasets and without real time constraints. Even so, LSA is expensive in both time and memory, which complicates its integration into real-time NLP applications (for example, information retrieval or question answering) on large-scale datasets. An implementation of LSA that both makes execution on large-scale datasets feasible and accelerates it as much as possible would therefore be most useful in these data-intensive, real-time NLP scenarios. However, to the best of our knowledge, no such implementation of LSA has been achieved so far. To this end, a new out-of-core, scalable, heterogeneous LSA (hLSA) system has been built and run on the large-scale clinical decision support dataset from the Text REtrieval Conference (TREC) 2015 competition. Results show that the out-of-core hLSA system can process this large-scale dataset (631,302 documents), with a full-rank term-document matrix of 566 GB, fairly fast and with better precision (at least for one of the topics) than the competing TREC 2015 systems.
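To make the out-of-core idea concrete, the following is a minimal sketch and not the hLSA system described in the abstract. It assumes the term-document matrix arrives as column blocks (terms × documents per block) that individually fit in memory, and that the terms × terms Gram matrix also fits in memory. The function names `lsa_out_of_core` and `project_block` are hypothetical, and the Gram-matrix eigendecomposition is a swapped-in textbook technique chosen for brevity.

```python
import numpy as np


def lsa_out_of_core(blocks, k):
    """Return the top-k left singular vectors and singular values.

    `blocks` is any iterable of 2-D arrays, each holding all terms (rows)
    for a subset of documents (columns). Only one block is held at a time.
    """
    gram = None
    for block in blocks:
        b = np.asarray(block, dtype=float)
        # A @ A.T equals the sum of B @ B.T over the column blocks B of A.
        contribution = b @ b.T
        gram = contribution if gram is None else gram + contribution

    # The Gram matrix is symmetric, so eigh applies; its eigenvalues are
    # the squared singular values of the full term-document matrix.
    eigvals, eigvecs = np.linalg.eigh(gram)
    order = np.argsort(eigvals)[::-1][:k]
    sigma = np.sqrt(np.clip(eigvals[order], 0.0, None))
    return eigvecs[:, order], sigma


def project_block(u_k, block):
    """Map one block of documents into the k-dimensional latent space."""
    return u_k.T @ np.asarray(block, dtype=float)


if __name__ == "__main__":
    # Toy 3-term x 4-document matrix, streamed as two blocks of 2 documents.
    a = np.array([[1.0, 0.0, 2.0, 0.0],
                  [0.0, 1.0, 0.0, 1.0],
                  [1.0, 1.0, 0.0, 0.0]])
    u_k, sigma = lsa_out_of_core([a[:, :2], a[:, 2:]], k=2)
    print(sigma)
    print(project_block(u_k, a[:, :2]).shape)
```

In this sketch the blocks could come from disk one at a time (for example, via memory-mapped files), so peak memory is bounded by one block plus the Gram matrix. For vocabularies too large for a dense terms × terms matrix, a different strategy would be needed.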
Original language | English |
---|---|
Title of host publication | Smart Technologies, Systems and Applications - 1st International Conference, SmartTech-IC 2019, Proceedings |
Editors | Fabián R. Narváez, Diego F. Vallejo, Paulina A. Morillo, Julio R. Proaño |
Publisher | Springer |
Pages | 105-119 |
Number of pages | 15 |
ISBN (print) | 9783030467845 |
DOI | |
Status | Published - 1 Jan 2020 |
Event | 1st International Conference on Smart Technologies, Systems and Applications, SmartTech-IC 2019 - Quito, Ecuador. Duration: 2 Dec 2019 → 4 Dec 2019 |
Publication series
Name | Communications in Computer and Information Science |
---|---|
Volume | 1154 CCIS |
ISSN (print) | 1865-0929 |
ISSN (electronic) | 1865-0937 |
Conference
Conference | 1st International Conference on Smart Technologies, Systems and Applications, SmartTech-IC 2019 |
---|---|
Country/Territory | Ecuador |
City | Quito |
Period | 2 Dec 2019 → 4 Dec 2019 |
Bibliographical note
Publisher Copyright: © Springer Nature Switzerland AG 2020.