Similarity Visualizer Using Natural Language Processing in Academic Documents of the DSpace in Ecuador

Diego Vallejo-Huanga, Janneth Jaime, Carlos Andrade

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-review

1 Scopus citation

Abstract

Widespread use of the Internet gives users easy access to collections of university academic documents stored in virtual libraries, where the information is unstructured. In recent years, the production and publication of scientific documents in Ecuador have increased considerably, making the search and classification of documents a fundamental task for information retrieval systems. Intelligent search systems allow information to be found with a high degree of accuracy and similarity. For this project, academic documents from the Ecuador Network of Open Access Repositories (RRAAE) were retrieved using a glossary of terms in the area of science and technology. The documents were retrieved by web scraping, and the results were stored in a cloud database in JSON format. NLP techniques were then applied to the retrieved documents to clean and homogenize the unstructured information. Two similarity metrics were used to measure the divergence between the retrieved documents, and similarity matrices were generated from the title, keywords, and abstract, which were then unified into a weighted matrix. The results of the system are displayed in a web interface that uses graphs to show the relationships between linked documents. The similarity system was validated through functional tests, using a collection of 30 queries with indexed and non-indexed terms as input to the information retrieval system. The experiments showed that the system performs better for indexed terms.
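The matrix step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the two metrics are the Jaccard and vector cosine similarities listed in the keywords, and the tokenizer, the choice of metric per field, and the field weights are all hypothetical placeholders.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split on non-alphanumerics (a stand-in for the NLP cleaning step)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def jaccard(a, b):
    """Jaccard similarity between two token lists, treated as sets."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def cosine(a, b):
    """Cosine similarity between the term-frequency vectors of two token lists."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[t] * cb[t] for t in ca.keys() & cb.keys())
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(
        sum(v * v for v in cb.values())
    )
    return dot / norm if norm else 0.0


def similarity_matrix(docs, field, metric):
    """Pairwise similarity matrix for one field across all documents."""
    tokens = [tokenize(d[field]) for d in docs]
    n = len(tokens)
    return [[metric(tokens[i], tokens[j]) for j in range(n)] for i in range(n)]


def weighted_matrix(matrices, weights):
    """Weighted average of several same-sized matrices."""
    total = sum(weights)
    n = len(matrices[0])
    return [
        [sum(w * m[i][j] for m, w in zip(matrices, weights)) / total for j in range(n)]
        for i in range(n)
    ]


# Hypothetical configuration: field -> (metric, weight). Not taken from the paper.
FIELDS = {
    "title": (cosine, 0.3),
    "keywords": (jaccard, 0.3),
    "abstract": (cosine, 0.4),
}


def combine(docs):
    """Build one similarity matrix per field and unify them into a weighted matrix."""
    matrices = [similarity_matrix(docs, f, metric) for f, (metric, _) in FIELDS.items()]
    weights = [w for _, w in FIELDS.values()]
    return weighted_matrix(matrices, weights)


# Made-up records, shaped like scraped JSON entries.
docs = [
    {
        "title": "Web scraping of university repositories",
        "keywords": "web scraping, repositories, DSpace",
        "abstract": "We retrieve academic documents from repositories using web scraping.",
    },
    {
        "title": "Similarity of academic documents",
        "keywords": "similarity, DSpace, graphs",
        "abstract": "We measure similarity between academic documents and show graphs.",
    },
]

combined = combine(docs)
```

Each entry `combined[i][j]` could then serve as an edge weight between documents `i` and `j` in a graph view, for example by keeping only pairs above a chosen threshold.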

Original language: English
Title of host publication: Information for a Better World
Subtitle of host publication: Normality, Virtuality, Physicality, Inclusivity - 18th International Conference, iConference 2023, Proceedings
Editors: Isaac Sserwanga, Anne Goulding, Heather Moulaison-Sandy, Jia Tina Du, António Lucas Soares, Viviane Hessami, Rebecca D. Frank
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 343-359
Number of pages: 17
ISBN (Print): 9783031280313
State: Published - 2023
Event: 18th International Conference on Information for a Better World: Normality, Virtuality, Physicality, Inclusivity, iConference 2023 - Virtual, Online
Duration: 13 Mar 2023 to 17 Mar 2023

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 13972 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 18th International Conference on Information for a Better World: Normality, Virtuality, Physicality, Inclusivity, iConference 2023
City: Virtual, Online
Period: 13/03/23 to 17/03/23

Bibliographical note

Publisher Copyright:
© 2023, The Author(s), under exclusive license to Springer Nature Switzerland AG.

Keywords

  • Artificial intelligence
  • Connected papers
  • DSpace
  • Graphs
  • Jaccard
  • University repositories
  • Vector cosine similarity
  • Web scraping
