Developing a Robust Method for Diabetes Prediction with Machine Learning and Ensemble Models

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

Abstract

This paper addresses the problem of identifying risk factors associated with diabetes using advanced machine learning techniques. The method combines rigorous data preparation with an exhaustive evaluation of multiple algorithms, with the aim of optimizing predictive accuracy and making the results easier to interpret. The study is organized in three phases. (1) Data preparation: starting from a public dataset, the data are loaded, the variables are analyzed in detail, and denoising and data transformation are carried out. These steps ensure that the information is of high quality and ready for exploratory and predictive analysis. (2) Classifier testing: a range of machine learning algorithms is evaluated, from classical approaches to advanced methods, namely J48, KNN, Linear Regression, Multi-Layer Perceptron (MLP), AdaBoost, XGBoost, CatBoost, Gradient Boosting, LightGBM and Random Forest. During this phase, exploratory and predictive analysis is performed to measure the performance of each method on seven key metrics. (3) Selection of the best method: the results obtained identify the best-performing methods. In this case, Gradient Boosting and Random Forest proved to be the most efficient, while MLP showed the lowest performance in both the training and testing phases. This integrated approach not only ensures efficient extraction of knowledge from the data, but also provides a detailed comparison of the methods, making it possible to identify the one most suitable for this type of problem.
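The evaluate-then-select step described in the abstract can be illustrated with a minimal, dependency-free Python sketch. It is not the authors' implementation. The abstract does not name its seven metrics, so the five computed here (accuracy, precision, recall, specificity, F1) are an assumption. The helper names `binary_metrics` and `rank_models`, the model labels `model_a` and `model_b`, and the toy labels are all hypothetical and invented for illustration.

```python
from typing import Dict, List, Tuple


def binary_metrics(y_true: List[int], y_pred: List[int]) -> Dict[str, float]:
    """Compute common binary-classification metrics from 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    def ratio(num: float, den: float) -> float:
        # Return 0.0 instead of raising when a denominator is empty.
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "precision": precision,
        "recall": recall,
        "specificity": ratio(tn, tn + fp),
        "f1": ratio(2 * precision * recall, precision + recall),
    }


def rank_models(
    y_true: List[int],
    predictions: Dict[str, List[int]],
    metric: str = "f1",
) -> List[Tuple[str, Dict[str, float]]]:
    """Return (name, metrics) pairs sorted from best to worst on `metric`."""
    scored = {name: binary_metrics(y_true, y_pred)
              for name, y_pred in predictions.items()}
    return sorted(scored.items(), key=lambda item: item[1][metric], reverse=True)


# Toy labels, invented for illustration only (not data or results from the paper).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
toy_predictions = {
    "model_a": [1, 0, 1, 1, 0, 0, 0, 0],
    "model_b": [1, 1, 0, 1, 0, 1, 1, 0],
}

ranking = rank_models(y_true, toy_predictions, metric="f1")
best_name, best_metrics = ranking[0]
print(best_name, best_metrics)
```

Ranking on a single metric is a simplification. With several metrics in play, a comparison like the one the abstract describes would more plausibly weigh them together, for example by checking that the top model on F1 is not markedly worse on specificity.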

Original language: English
Title of host publication: Proceedings of 10th International Congress on Information and Communication Technology - ICICT 2025
Editors: Xin-She Yang, Simon Sherratt, Nilanjan Dey, Amit Joshi
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 539-552
Number of pages: 14
ISBN (Print): 9789819664405
DOIs
State: Published - 2025
Event: 10th International Congress on Information and Communication Technology, ICICT 2025 - London, United Kingdom
Duration: 18 Feb 2025 to 21 Feb 2025

Publication series

Name: Lecture Notes in Networks and Systems
Volume: 1416 LNNS
ISSN (Print): 2367-3370
ISSN (Electronic): 2367-3389

Conference

Conference: 10th International Congress on Information and Communication Technology, ICICT 2025
Country/Territory: United Kingdom
City: London
Period: 18/02/25 to 21/02/25

Bibliographical note

Publisher Copyright:
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2025.

UN SDGs

This output contributes to the following UN Sustainable Development Goals (SDGs)

  1. SDG 3 - Good Health and Well-being

Keywords

  • Data preprocessing
  • Diabetes detection
  • Ensemble learning
  • Exploratory data analysis (EDA)
  • Machine learning
  • Model performance evaluation
