SimilaCode: Programming Source Code Similarity Detection System Based on NLP

Research output: Chapter in book/report/conference proceeding; Conference contribution; peer-reviewed

Abstract

Several tools have been developed in the scientific field to detect similarities between texts; however, some of them are not very effective at detecting plagiarism in programming source code. Cases of source code plagiarism are to be expected in computing, and tools that measure the degree of similarity exist, but they require paid licenses. This paper proposes a system that uses Natural Language Processing (NLP), vector space models, and similarity metrics to identify the degree of divergence between pairs of source codes written in Python, with the possibility of extending it to other programming languages. The proposed system is organized into several modules, each with a specific function in the back end or front end of a prototype deployed on the web. Experiments were carried out on pairs of source codes subjected to linguistic and structural modifications. The results show that the system, SimilaCode, detects 100% similarity between source code pairs that differ only in their comments. The system also identifies similarities when the names of variables and functions have been modified, reaching similarity levels above 88%. In addition, SimilaCode was compared with two other plagiarism detection tools, and the similarity scores differed by less than 1% across the tools. Overall, the experiments yielded satisfactory results, demonstrating the system's effectiveness in detecting similarities in the analyzed source codes.
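The abstract does not describe the system's internals, so the following is only a minimal sketch of the general technique it names: a vector space model combined with a similarity metric, applied to Python source. It is not the authors' implementation. The function names (`normalized_tokens`, `cosine_similarity`, `code_similarity`), the use of token trigrams, and the decision to mask every identifier as `ID` are assumptions made for illustration.

```python
import io
import keyword
import math
import tokenize
from collections import Counter


def normalized_tokens(source: str) -> list[str]:
    """Tokenize Python source, dropping comments and layout tokens.

    Assumption for this sketch: identifiers, numbers and strings are masked,
    so renaming variables or functions does not change the token stream.
    """
    skip = {tokenize.COMMENT, tokenize.NL, tokenize.NEWLINE,
            tokenize.INDENT, tokenize.DEDENT, tokenize.ENDMARKER}
    tokens = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in skip:
            continue
        if tok.type == tokenize.NAME and not keyword.iskeyword(tok.string):
            tokens.append("ID")
        elif tok.type == tokenize.NUMBER:
            tokens.append("NUM")
        elif tok.type == tokenize.STRING:
            tokens.append("STR")
        else:
            tokens.append(tok.string)  # keywords, operators, punctuation
    return tokens


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values()) * sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def code_similarity(src_a: str, src_b: str, n: int = 3) -> float:
    """Compare two sources as bags of token n-grams (vector space model)."""
    def ngrams(tokens: list[str]) -> Counter:
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    return cosine_similarity(ngrams(normalized_tokens(src_a)),
                             ngrams(normalized_tokens(src_b)))


if __name__ == "__main__":
    original = "def add(a, b):\n    # sum two values\n    return a + b\n"
    renamed = "def plus(x, y):\n    return x + y  # changed comment\n"
    print(code_similarity(original, renamed))
```

Because this sketch masks every identifier, a pair that differs only in comments and names scores as essentially identical. A system that keeps some lexical information about names would score such pairs below 100%, which is closer to the renaming results reported in the abstract.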

Original language: English
Title of host publication: Proceedings - 2023 15th International Congress on Advanced Applied Informatics Winter, IIAI-AAI-Winter 2023
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 171-178
Number of pages: 8
ISBN (electronic): 9798350383829
DOI
Status: Published - 2023
Event: 15th International Congress on Advanced Applied Informatics Winter, IIAI-AAI-Winter 2023 - Bali, Indonesia
Duration: 11 Dec 2023 to 13 Dec 2023

Publication series

Name: Proceedings - 2023 15th International Congress on Advanced Applied Informatics Winter, IIAI-AAI-Winter 2023

Conference

Conference: 15th International Congress on Advanced Applied Informatics Winter, IIAI-AAI-Winter 2023
Country/Territory: Indonesia
City: Bali
Period: 11 Dec 2023 to 13 Dec 2023

Bibliographical note

Publisher Copyright:
© 2023 IEEE.

CACES Knowledge Areas

  • 116A Computing
