Abstract
This paper addresses the problem of identifying risk factors associated with diabetes using machine learning techniques. The method combines rigorous data preparation with an exhaustive evaluation of multiple algorithms, with the aim of optimizing predictive accuracy and making the results easier to interpret. The study is organized in three phases. (1) Data preparation: starting from a public dataset, the data are loaded, the variables are analyzed in detail, and denoising and data transformation are carried out. These steps ensure that the information is of high quality and ready for exploratory and predictive analysis. (2) Classifier testing: a range of machine learning algorithms is evaluated, from classical approaches to advanced ensemble methods: J48, KNN, Linear Regression, Multilayer Perceptron (MLP), AdaBoost, XGBoost, CatBoost, Gradient Boosting, LightGBM and Random Forest. During this phase, exploratory and predictive analysis is performed to measure the performance of the methods on seven key metrics. (3) Selection of the best method: the results obtained identify the best-performing methods. In this case, Gradient Boosting and Random Forest proved to be the most efficient, while the MLP showed the lowest performance in both the training and testing phases. This integrated approach not only ensures efficient extraction of knowledge from the data, but also provides a detailed comparison of the methods, making it possible to identify the one most suitable for this type of problem.
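The evaluation and selection phases described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the seven metrics, so the five computed here are common choices and an assumption, and the function names `binary_metrics` and `select_best`, the model names and the toy labels are all hypothetical.

```python
from typing import Dict, List


def binary_metrics(y_true: List[int], y_pred: List[int]) -> Dict[str, float]:
    """Compute common binary-classification metrics from 0/1 labels.

    The metric set is an assumption; the abstract does not list its seven metrics.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    def ratio(num: float, den: float) -> float:
        # Return 0.0 instead of raising when a denominator is empty.
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "precision": precision,
        "recall": recall,
        "specificity": ratio(tn, tn + fp),
        "f1": ratio(2 * precision * recall, precision + recall),
    }


def select_best(
    predictions: Dict[str, List[int]], y_true: List[int], metric: str = "f1"
) -> str:
    """Return the name of the model with the highest value of `metric`."""
    scores = {
        name: binary_metrics(y_true, y_pred)[metric]
        for name, y_pred in predictions.items()
    }
    return max(scores, key=scores.get)


if __name__ == "__main__":
    # Toy labels and hypothetical models, for illustration only.
    y_true = [1, 0, 1, 1, 0, 0, 1, 0]
    predictions = {
        "model_a": [1, 0, 1, 1, 0, 0, 1, 1],
        "model_b": [1, 0, 0, 0, 0, 1, 1, 0],
    }
    for name, y_pred in predictions.items():
        print(name, binary_metrics(y_true, y_pred))
    print("best by F1:", select_best(predictions, y_true))
```

Ranking on a single metric keeps the sketch short. A comparison across several metrics, as the abstract describes, would need an explicit rule for combining them.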
| Original language | English |
|---|---|
| Title of host publication | Proceedings of 10th International Congress on Information and Communication Technology - ICICT 2025 |
| Editors | Xin-She Yang, Simon Sherratt, Nilanjan Dey, Amit Joshi |
| Publisher | Springer Science and Business Media Deutschland GmbH |
| Pages | 539-552 |
| Number of pages | 14 |
| ISBN (Print) | 9789819664405 |
| State | Published - 2025 |
| Event | 10th International Congress on Information and Communication Technology, ICICT 2025 - London, United Kingdom. Duration: 18 Feb 2025 → 21 Feb 2025 |
Publication series
| Name | Lecture Notes in Networks and Systems |
|---|---|
| Volume | 1416 LNNS |
| ISSN (Print) | 2367-3370 |
| ISSN (Electronic) | 2367-3389 |
Conference
| Conference | 10th International Congress on Information and Communication Technology, ICICT 2025 |
|---|---|
| Country/Territory | United Kingdom |
| City | London |
| Period | 18/02/25 → 21/02/25 |
Bibliographical note
Publisher Copyright: © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2025.
UN SDGs
This output contributes to the following UN Sustainable Development Goals (SDGs)
- SDG 3: Good Health and Well-being
Keywords
- Data preprocessing
- Diabetes detection
- Ensemble learning
- Exploratory data analysis (EDA)
- Machine learning
- Model performance evaluation