Performance of Machine Learning Classifiers for Malware Detection Over Imbalanced Data

Paulina Morillo, Diego Bahamonde, Wilian Tapia

Research output: Chapter in book/report/conference proceedings; Conference contribution; peer-reviewed


Detecting malware is crucial to avoid severe damage to a computer system. However, training Machine Learning algorithms for this task is complicated by the fact that the data is often imbalanced. One of the challenges in binary classification is therefore learning to distinguish clearly between two classes when one class has far more instances than the other. To decrease bias and handle imbalance, resampling techniques either increase the number of minority-class cases (oversampling) or reduce the number of majority-class cases (undersampling). This paper analyzes the performance of three cost-sensitive classifiers, Logistic Regression (LR), Decision Tree (DT), and Random Forest (RF), trained on an imbalanced malware detection dataset and on four artificial datasets built with the Near Miss, SMOTE, SMOTEENN, and SMOTETomek resampling techniques. The results show that Near Miss achieves a proper balance between the classes, so that the algorithms increase their overall performance and reach balanced accuracies greater than 95%. The other techniques only slightly increase the ability of the classifiers to identify objects of the minority class. Among the classifiers, Random Forest achieved balanced and high performance. In addition, training and testing times for the oversampling and hybrid techniques are far longer than those for undersampling, since the latter reduces the number of instances processed by the models.
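The abstract reports results as balanced accuracy rather than plain accuracy. The following is a minimal sketch, not taken from the paper, of why that metric suits imbalanced data: balanced accuracy is the mean of the per-class recalls, so a model that ignores the minority class cannot score well. The label encoding (0 for benign, 1 for malware), the function names, and the toy data are all hypothetical and chosen only for illustration.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


def balanced_accuracy(y_true, y_pred, classes=(0, 1)):
    """Mean of per-class recall; each class counts equally regardless of size."""
    recalls = []
    for cls in classes:
        total = sum(1 for t in y_true if t == cls)
        if total == 0:
            raise ValueError(f"class {cls} has no true instances")
        hits = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        recalls.append(hits / total)
    return sum(recalls) / len(recalls)


# Hypothetical toy data: 95 benign samples (0) and 5 malware samples (1).
y_true = [0] * 95 + [1] * 5

# A degenerate model that labels everything as benign.
y_always_benign = [0] * 100

# A model that misses 5 benign samples and 1 malware sample.
y_better = [0] * 90 + [1] * 5 + [1] * 4 + [0]

for name, y_pred in [("always benign", y_always_benign), ("better", y_better)]:
    print(name, accuracy(y_true, y_pred), balanced_accuracy(y_true, y_pred))
```

On this toy data the degenerate model has a higher plain accuracy than the second model, yet its balanced accuracy is only 0.5 because it never detects the minority class.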

Original language: English
Title of host publication: Intelligent Systems and Applications - Proceedings of the 2023 Intelligent Systems Conference IntelliSys Volume 1
Editors: Kohei Arai
Publisher: Springer Science and Business Media Deutschland GmbH
Number of pages: 12
ISBN (print): 9783031477201
Publication status: Published - 2024
Event: Intelligent Systems Conference, IntelliSys 2023 - Amsterdam, Netherlands
Duration: 7 Sep 2023 to 8 Sep 2023

Publication series

Name: Lecture Notes in Networks and Systems
ISSN (print): 2367-3370
ISSN (electronic): 2367-3389


Conference: Intelligent Systems Conference, IntelliSys 2023
Country/Territory: Netherlands

Bibliographical note

Publisher Copyright:
© 2024, The Author(s), under exclusive license to Springer Nature Switzerland AG.

