Abstract
Software defect prediction is crucial for reducing costs and improving software quality. According to a Cutter Consortium report, software defects cause an estimated annual loss of $1.56 trillion in global productivity, and Tricentis reported that over 30% of software development projects failed because of undetected defects. Undetected defects can increase maintenance costs, delay deliveries, and compromise security, particularly in critical applications such as financial or medical systems. A significant challenge is class imbalance: defect-free modules far outnumber defective ones, which makes defects harder to detect. This study proposes a four-phase approach: (1) loading and transforming the data, (2) balancing the classes, (3) training machine learning models, and (4) explaining their predictions. SMOTE, ADASYN, and RandomUnderSampling were used to balance the data, which was then used to train Random Forest, Gradient Boosting, and SVM models. The analysis used the JM1 dataset, which contains software quality metrics and in which 80% of the modules are defect-free. Preprocessing involved imputation, encoding, and normalization. The results show that Random Forest and Gradient Boosting, combined with balancing techniques, achieved the best defect-identification performance. Future work will explore more advanced algorithms such as XGBoost and LightGBM and will optimize model parameters to further improve the results. The approach aims to improve software defect detection and could also be applied in other fields.
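The first two phases described above (transforming the data and balancing the classes) can be illustrated with the dependency-free sketch below. It is not the authors' code. It assumes numeric features, missing values encoded as `None`, and defective modules labelled `1` as the minority class; mean imputation and min-max scaling stand in for the unspecified imputation and normalization steps, and categorical encoding is omitted. The hypothetical `smote` function follows the SMOTE idea of interpolating between a minority sample and one of its k nearest minority neighbours, and `balance` also shows random undersampling; ADASYN, which concentrates synthetic samples near harder minority examples, is not shown. All function names and the toy data are illustrative assumptions. Libraries such as imbalanced-learn provide production samplers; this version uses only the Python standard library.

```python
import math
import random
from typing import List, Optional, Tuple

Row = List[float]


def impute_mean(rows: List[List[Optional[float]]]) -> List[Row]:
    """Replace missing values (None) with the mean of the observed values in that column."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        means.append(sum(observed) / len(observed) if observed else 0.0)
    return [[means[j] if r[j] is None else float(r[j]) for j in range(n_cols)] for r in rows]


def min_max_scale(rows: List[Row]) -> List[Row]:
    """Rescale every column to [0, 1]; constant columns are mapped to 0."""
    n_cols = len(rows[0])
    lo = [min(r[j] for r in rows) for j in range(n_cols)]
    hi = [max(r[j] for r in rows) for j in range(n_cols)]
    return [
        [(r[j] - lo[j]) / (hi[j] - lo[j]) if hi[j] > lo[j] else 0.0 for j in range(n_cols)]
        for r in rows
    ]


def smote(minority: List[Row], n_new: int, k: int = 5, seed: int = 0) -> List[Row]:
    """SMOTE-style oversampling: each synthetic row lies on the segment between a
    random minority row and one of its k nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Sort the other minority rows by Euclidean distance to `base`.
        others = sorted(
            (m for idx, m in enumerate(minority) if idx != i),
            key=lambda m: math.dist(base, m),
        )
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def balance(X: List[Row], y: List[int], method: str = "smote",
            seed: int = 0) -> Tuple[List[Row], List[int]]:
    """Equalize class counts, assuming label 1 (defective) is the minority class."""
    minority = [x for x, label in zip(X, y) if label == 1]
    majority = [x for x, label in zip(X, y) if label == 0]
    if method == "smote":
        # Oversample: add synthetic defective rows until both classes match.
        new = smote(minority, len(majority) - len(minority), seed=seed)
        return majority + minority + new, [0] * len(majority) + [1] * (len(minority) + len(new))
    if method == "undersample":
        # Random undersampling: keep a random subset of defect-free rows.
        kept = random.Random(seed).sample(majority, len(minority))
        return kept + minority, [0] * len(kept) + [1] * len(minority)
    raise ValueError(f"unknown method: {method}")


# Hypothetical toy data: 8 defect-free rows, 2 defective rows, one missing value.
raw = [[1.0, 10.0], [2.0, None], [3.0, 12.0], [4.0, 13.0], [5.0, 14.0],
       [6.0, 15.0], [7.0, 16.0], [8.0, 17.0], [20.0, 40.0], [22.0, 44.0]]
labels = [0] * 8 + [1] * 2
X = min_max_scale(impute_mean(raw))
Xb, yb = balance(X, labels, "smote")
print(yb.count(0), yb.count(1))  # 8 8
```

Resampling is normally applied to the training split only, so that synthetic or discarded samples do not distort the evaluation metrics; the sketch leaves splitting and model training to the caller.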
| Original language | English |
|---|---|
| Title of host publication | Proceedings of 10th International Congress on Information and Communication Technology - ICICT 2025 |
| Editors | Xin-She Yang, Simon Sherratt, Nilanjan Dey, Amit Joshi |
| Publisher | Springer Science and Business Media Deutschland GmbH |
| Pages | 409-420 |
| Number of pages | 12 |
| ISBN (Print) | 9789819664405 |
| DOIs | |
| State | Published - 2025 |
| Event | 10th International Congress on Information and Communication Technology, ICICT 2025, London, United Kingdom (18 Feb 2025 → 21 Feb 2025) |
Publication series
| Name | Lecture Notes in Networks and Systems |
|---|---|
| Volume | 1416 LNNS |
| ISSN (Print) | 2367-3370 |
| ISSN (Electronic) | 2367-3389 |
Conference
| Conference | 10th International Congress on Information and Communication Technology, ICICT 2025 |
|---|---|
| Country/Territory | United Kingdom |
| City | London |
| Period | 18/02/25 → 21/02/25 |
Bibliographical note
Publisher Copyright: © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2025.
Keywords
- Class balancing
- Evaluation metrics
- Machine learning models
- Random forest and gradient boosting
- Software defect prediction