New Approach to Support the Breast Cancer Diagnosis Process Using Frequent Pattern Growth and Stacking Based on Machine Learning Techniques

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-review

Abstract

Breast cancer is one of the most common types of cancer in women, and its early detection significantly improves the survival rate. Although mammography is one of the least invasive and most widely used methods in the diagnostic process, the complexity and subjectivity of its medical interpretation present significant challenges. In this article, we propose a new approach that supports the breast cancer diagnosis process by assisting in the classification of mammography images as malignant or benign, or according to the BI-RADS system. Our proposal consists of two phases. In the first phase, we applied the FP-Growth algorithm to patients’ clinical data, analyzing variables such as age and sex to identify frequent patterns. This allows us to explore, group, and visually characterize shared findings and trends in the clinical data, which is useful for doctors when creating risk groups or establishing a pre-diagnosis based on the patient’s profile. In this phase, we also prepared the images for training the different models. In the second phase, we combined the strengths of two models through stacking: a Random Forest (RF) model and a Convolutional Neural Network (CNN) with transfer learning, to improve image classification and diagnosis. We also explored other methods, such as a standalone CNN and a Support Vector Machine (SVM), to compare the proposed methodology against conventional techniques. The developed models were trained using public datasets: “The Chinese Mammography Database” [2] and “The INbreast database” [3]. The method is evaluated using classification metrics such as accuracy, precision, recall, and F1 score. The results show that combining base models using a stacking strategy achieves significantly superior performance compared to individual models, with ideal scores in accuracy, recall, and F1 score using k-fold cross-validation in the meta-model. These results suggest that combining multiple base models more effectively captures the underlying complexities and patterns in the data.
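The stacking step described in the abstract can be illustrated with a minimal, dependency-free sketch. Everything concrete in it is an assumption made for illustration: the data are synthetic two-feature points rather than mammography images, the two base learners (`fit_centroid`, `fit_1nn`) are simple stand-ins for the RF and CNN models, and the lookup-table meta-model (`fit_meta`) is a hypothetical choice, since the abstract does not specify the meta-model. What the sketch does show is the general pattern: base-model predictions are produced out-of-fold with k-fold cross-validation, a meta-model is trained on them, and the result is scored with accuracy, precision, recall, and F1.

```python
import random
from collections import Counter, defaultdict

def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / len(pairs), "precision": precision,
            "recall": recall, "f1": f1}

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def fit_centroid(X, y):
    """Nearest-centroid classifier: illustrative stand-in for one base model."""
    sums, counts = defaultdict(lambda: [0.0] * len(X[0])), Counter(y)
    for row, label in zip(X, y):
        for j, v in enumerate(row):
            sums[label][j] += v
    centroids = {c: [s / counts[c] for s in sums[c]] for c in counts}
    return lambda row: min(centroids, key=lambda c: sq_dist(row, centroids[c]))

def fit_1nn(X, y):
    """1-nearest-neighbour classifier: illustrative stand-in for the other base model."""
    return lambda row: y[min(range(len(X)), key=lambda i: sq_dist(row, X[i]))]

def out_of_fold_predictions(X, y, fitters, k=5):
    """For each fold, train the base models on the other folds and predict the held-out rows."""
    n = len(X)
    meta = [None] * n
    for fold in range(k):
        train = [i for i in range(n) if i % k != fold]
        models = [fit([X[i] for i in train], [y[i] for i in train]) for fit in fitters]
        for i in range(fold, n, k):  # held-out rows of this fold
            meta[i] = tuple(m(X[i]) for m in models)
    return meta

def fit_meta(meta_X, y):
    """Hypothetical meta-model: majority label per combination of base predictions."""
    votes = defaultdict(Counter)
    for feats, label in zip(meta_X, y):
        votes[feats][label] += 1
    default = Counter(y).most_common(1)[0][0]
    table = {f: c.most_common(1)[0][0] for f, c in votes.items()}
    return lambda feats: table.get(feats, default)

# Synthetic data (assumption): two overlapping Gaussian classes, alternating labels.
rng = random.Random(0)
y = [i % 2 for i in range(200)]
X = [[rng.gauss(2.0 * label, 1.0), rng.gauss(2.0 * label, 1.0)] for label in y]
X_train, y_train, X_test, y_test = X[:150], y[:150], X[150:], y[150:]

fitters = [fit_centroid, fit_1nn]
meta_train = out_of_fold_predictions(X_train, y_train, fitters, k=5)
meta_model = fit_meta(meta_train, y_train)

# At prediction time the base models are refit on the full training set.
base_models = [fit(X_train, y_train) for fit in fitters]
y_pred = [meta_model(tuple(m(row) for m in base_models)) for row in X_test]
scores = classification_metrics(y_test, y_pred)
print(scores)
```

Training the meta-model on out-of-fold predictions, rather than on predictions the base models make for rows they were fitted on, keeps the meta-model from learning on leaked labels. Any scores printed by this sketch describe the synthetic data only and say nothing about the results reported in the paper.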

Original language: English
Title of host publication: Intelligent Data Engineering and Automated Learning – IDEAL 2024 - 25th International Conference, Proceedings
Editors: Vicente Julian, David Camacho, Hujun Yin, Juan M. Alberola, Vitor Beires Nogueira, Paulo Novais, Antonio Tallón-Ballesteros
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 35-45
Number of pages: 11
ISBN (Print): 9783031777370
State: Published - 2025
Event: 25th International Conference on Intelligent Data Engineering and Automated Learning, IDEAL 2024 - Valencia, Spain
Duration: 20 Nov 2024 to 22 Nov 2024

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 15347 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 25th International Conference on Intelligent Data Engineering and Automated Learning, IDEAL 2024
Country/Territory: Spain
City: Valencia
Period: 20/11/24 to 22/11/24

Bibliographical note

Publisher Copyright:
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2025.

UN SDGs

This output contributes to the following UN Sustainable Development Goals (SDGs)

  1. SDG 3 - Good Health and Well-being

Keywords

  • Breast Cancer Diagnosis
  • Convolutional Neural Networks
  • Data Science
  • FP-Growth
  • Machine Learning
  • Medical Image Analysis
  • Random Forest
  • Support Vector Machine
  • Transfer Learning

CACES Knowledge Areas

  • 245A Statistics
  • 116A Computer Science
