Abstract
Several tools have been developed to detect similarity between texts; however, many of them are not efficient at detecting plagiarism in program source code. Plagiarism of source code is common in computing, and although tools exist that measure the degree of similarity, they require paid licenses. This article proposes a system that uses Natural Language Processing (NLP), vector space models, and similarity metrics to identify the degree of divergence between pairs of source codes written in Python, with the possibility of extending it to other programming languages. The system is structured in several modules, each with a specific function in the back end or front end of the prototype, which is deployed on the web. Experiments were carried out on pairs of source codes subjected to modifications at the linguistic and structural levels. The results show that our system, Similacode, reports 100% similarity for source code pairs that differ only in their comments. The system also identifies similarities when the names of variables and functions have been modified, reaching similarity levels higher than 88%. In addition, Similacode was compared with two other plagiarism detection tools, and the similarity scores differed by less than 1% between the tools. These experiments yielded satisfactory results, demonstrating the system's efficiency in detecting similarities in the analyzed source codes.
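
The abstract does not give implementation details, so the following is only a minimal sketch of the general idea it names: a vector space model over source tokens compared with cosine similarity. It is not Similacode's actual pipeline. The function names (`token_counts`, `cosine_similarity`, `code_similarity`), the use of Python's standard `tokenize` module, and the choice of raw token counts as the vector weights are all assumptions made for illustration.

```python
import io
import math
import tokenize
from collections import Counter

# Token types ignored in this sketch: comments and pure layout tokens.
_SKIPPED = {
    tokenize.COMMENT, tokenize.NL, tokenize.NEWLINE,
    tokenize.INDENT, tokenize.DEDENT, tokenize.ENDMARKER,
}


def token_counts(source: str) -> Counter:
    """Map Python source to a bag-of-tokens vector (hypothetical helper)."""
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    return Counter(tok.string for tok in tokens if tok.type not in _SKIPPED)


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_sq = sum(v * v for v in a.values()) * sum(v * v for v in b.values())
    return dot / math.sqrt(norm_sq) if norm_sq else 0.0


def code_similarity(src_a: str, src_b: str) -> float:
    """Similarity in [0, 1] between two Python sources (assumed metric)."""
    return cosine_similarity(token_counts(src_a), token_counts(src_b))


# Illustrative inputs, not taken from the paper's experiments.
original = "def add(a, b):\n    # sum two numbers\n    return a + b\n"
recommented = "def add(a, b):\n    # another remark\n    return a + b  # done\n"
renamed = "def plus(x, y):\n    return x + y\n"

print(code_similarity(original, recommented))  # comments differ only
print(code_similarity(original, renamed))      # identifiers renamed
```

Because comments are dropped before counting, the first pair yields identical vectors and therefore full similarity, which mirrors the comment-only case described in the abstract. Raw token counts penalize renamed identifiers heavily, so this sketch should not be expected to reproduce the reported figure of more than 88% for renamed variables and functions; a system aiming for that would likely need some form of identifier normalization or a different weighting, which the abstract does not specify.
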
Original language | English |
---|---|
Title of host publication | Proceedings - 2023 15th International Congress on Advanced Applied Informatics Winter, IIAI-AAI-Winter 2023 |
Publisher | Institute of Electrical and Electronics Engineers Inc. |
Pages | 171-178 |
Number of pages | 8 |
ISBN (Electronic) | 9798350383829 |
State | Published - 2023 |
Event | 15th International Congress on Advanced Applied Informatics Winter, IIAI-AAI-Winter 2023, Bali, Indonesia. Duration: 11 Dec 2023 → 13 Dec 2023 |
Publication series
Name | Proceedings - 2023 15th International Congress on Advanced Applied Informatics Winter, IIAI-AAI-Winter 2023 |
---|---|
Conference
Conference | 15th International Congress on Advanced Applied Informatics Winter, IIAI-AAI-Winter 2023 |
---|---|
Country/Territory | Indonesia |
City | Bali |
Period | 11 Dec 2023 → 13 Dec 2023 |
Bibliographical note
Publisher Copyright: © 2023 IEEE.
Keywords
- Code Clone
- Code Plagiarism
- Programming Languages
- Python
- Vector Cosine Model