Abstract
Detecting malware is crucial to avoid severe damage to a computer system. However, training machine learning algorithms for this task can be complicated because malware datasets are often imbalanced. One challenge in binary classification is learning to clearly distinguish between two classes when one class has far more instances than the other. To reduce bias and handle the imbalance, re-sampling techniques either increase the number of minority-class instances or reduce the number of majority-class instances. This paper analyzes the performance of three cost-sensitive classifiers, Logistic Regression (LR), Decision Tree (DT), and Random Forest (RF), trained on an imbalanced malware detection dataset and on four artificial datasets built with the Near Miss, SMOTE, SMOTEENN, and SMOTETomek re-sampling techniques. The results show that Near Miss balances the classes well enough for the algorithms to improve their overall performance, reaching balanced accuracies above 95%. The other techniques only slightly improve the classifiers' ability to identify minority-class instances. Random Forest achieved high and balanced performance. Finally, training and testing times with the oversampling and hybrid techniques are far longer than with undersampling, since the latter reduces the number of instances the models process.
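The results above are reported as balanced accuracy, which averages the recall of each class and therefore does not reward a classifier for simply predicting the majority class. The sketch below, in plain Python with no third-party dependencies, illustrates that metric together with the interpolation idea behind SMOTE-style oversampling. Both helpers (`balanced_accuracy`, `smote_like`) are hypothetical and for illustration only; they are not the authors' implementation, and `smote_like` is a simplified version of SMOTE. The imbalanced-learn library provides `NearMiss`, `SMOTE`, `SMOTEENN`, and `SMOTETomek` classes, but the record does not state which implementation the authors used.

```python
import random
from typing import List, Sequence


def balanced_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Hypothetical helper: mean per-class recall for labels 0 (benign) and 1 (malware)."""
    recalls = []
    for cls in (0, 1):
        # Indices of the samples whose true label is this class.
        idx = [i for i, y in enumerate(y_true) if y == cls]
        if not idx:
            continue  # class absent from y_true: skip it rather than divide by zero
        hits = sum(1 for i in idx if y_pred[i] == cls)
        recalls.append(hits / len(idx))
    return sum(recalls) / len(recalls)


def smote_like(minority: List[List[float]], n_new: int, k: int = 3,
               seed: int = 0) -> List[List[float]]:
    """Hypothetical, simplified SMOTE-style oversampler (needs >= 2 minority samples).

    Each synthetic point lies on the segment between a random minority sample
    and one of its k nearest minority neighbours (squared Euclidean distance).
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # k nearest neighbours of x among the other minority samples.
        neighbours = sorted(
            (m for m in minority if m is not x),
            key=lambda m: sum((a - b) ** 2 for a, b in zip(x, m)),
        )[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(x, nb)])
    return synthetic
```

For example, with 95 benign and 5 malware samples, a model that always predicts "benign" scores 95% plain accuracy but only 0.5 balanced accuracy (`balanced_accuracy([0] * 95 + [1] * 5, [0] * 100)`), which is why the metric suits imbalanced malware data.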
Original language | English |
---|---|
Host publication title | Intelligent Systems and Applications - Proceedings of the 2023 Intelligent Systems Conference IntelliSys Volume 1 |
Editors | Kohei Arai |
Publisher | Springer Science and Business Media Deutschland GmbH |
Pages | 496-507 |
Number of pages | 12 |
ISBN (print) | 9783031477201 |
DOI | |
Status | Published - 2024 |
Event | Intelligent Systems Conference, IntelliSys 2023 - Amsterdam, Netherlands. Duration: 7 Sep 2023 → 8 Sep 2023 |
Publication series
Name | Lecture Notes in Networks and Systems |
---|---|
Volume | 822 |
ISSN (print) | 2367-3370 |
ISSN (electronic) | 2367-3389 |
Conference
Conference | Intelligent Systems Conference, IntelliSys 2023 |
---|---|
Country/Territory | Netherlands |
City | Amsterdam |
Period | 7/09/23 → 8/09/23 |
Bibliographical note
Publisher Copyright: © 2024, The Author(s), under exclusive license to Springer Nature Switzerland AG.