
Combining Synthetic Minority Over-Sampling Technique and Multinomial Naive Bayes for Sentiment Analysis on Imbalanced Social Media Datasets

Research output: Chapter in book/report/conference proceeding › Conference contribution › Peer-reviewed

Abstract

Sentiment analysis on social media poses a significant challenge for Natural Language Processing (NLP) researchers because of the informal, ambiguous, and dynamic nature of the language users employ. This research proposes a methodology that combines the Synthetic Minority Over-sampling Technique (SMOTE) with the Multinomial Naive Bayes (MNB) classifier to improve sentiment classification performance on imbalanced datasets. The pipeline consists of text cleaning, stopword removal, and lemmatization, followed by Term Frequency–Inverse Document Frequency (TF-IDF) vectorization to represent lexical features. The Chi-squared test is then applied to select the most discriminative features, and hyperparameters are tuned with GridSearchCV using cross-validation. The method was evaluated on a cyberbullying dataset of posts labeled with positive or negative polarity, using accuracy, precision, recall, and F1-score as metrics, together with the confusion matrix. Experimental results show that the proposed approach improves model performance, reaching an accuracy of 88.99%, a precision of 89.14%, a recall of 88.99%, and an F1-score of 88.85%, which indicates that combining SMOTE with Multinomial Naive Bayes is effective at mitigating class imbalance.
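The core SMOTE + Multinomial Naive Bayes combination can be illustrated with a minimal, dependency-free Python sketch. It is not the authors' implementation: the names `smote`, `MultinomialNB` (a hand-written stand-in, not scikit-learn's class) and `weighted_scores`, the random choice of base sample, the toy three-word vocabulary, and the default `k`, `alpha` and `seed` values are all assumptions for illustration, and the cleaning, TF-IDF, Chi-squared and GridSearchCV steps are omitted. SMOTE creates each synthetic minority example by interpolating between a minority sample and one of its nearest minority neighbours; because interpolation between non-negative vectors stays non-negative, oversampled term counts or TF-IDF weights remain valid inputs for Multinomial Naive Bayes. The abstract does not state how per-class scores were averaged; the identical accuracy and recall (88.99%) are consistent with support-weighted averaging, since weighted recall always equals accuracy.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Simplified SMOTE sketch: create n_new synthetic minority vectors.

    Each synthetic vector lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority neighbours (Euclidean).
    Picking the base sample at random is an assumption of this sketch.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of `base`, excluding the sample itself
        neighbours = sorted(
            (m for m in minority if m is not base),
            key=lambda m: sum((a - b) ** 2 for a, b in zip(base, m)),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(base, other)])
    return synthetic


class MultinomialNB:
    """Hypothetical stand-in: Multinomial Naive Bayes with Laplace smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, X, y):
        self.classes = sorted(set(y))
        n_features = len(X[0])
        self.log_prior, self.log_lik = {}, {}
        for c in self.classes:
            rows = [x for x, label in zip(X, y) if label == c]
            self.log_prior[c] = math.log(len(rows) / len(X))
            totals = [sum(col) for col in zip(*rows)]  # per-feature mass in class c
            denom = sum(totals) + self.alpha * n_features
            self.log_lik[c] = [math.log((t + self.alpha) / denom) for t in totals]
        return self

    def predict(self, X):
        # argmax over classes of log P(c) + sum_j x_j * log P(feature j | c)
        return [
            max(self.classes, key=lambda c: self.log_prior[c]
                + sum(v * w for v, w in zip(x, self.log_lik[c])))
            for x in X
        ]


def weighted_scores(y_true, y_pred):
    """Support-weighted precision, recall and F1 (one common averaging choice)."""
    n = len(y_true)
    p = r = f = 0.0
    for c in sorted(set(y_true)):
        tp = sum(t == c and q == c for t, q in zip(y_true, y_pred))
        predicted = sum(q == c for q in y_pred)
        support = sum(t == c for t in y_true)
        prec = tp / predicted if predicted else 0.0
        rec = tp / support
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        w = support / n
        p, r, f = p + w * prec, r + w * rec, f + w * f1
    return p, r, f


# Toy term-count vectors over a hypothetical 3-word vocabulary; label 1 is the minority.
X = [[3, 0, 1], [2, 1, 0], [4, 0, 0], [3, 1, 1], [0, 3, 2], [1, 2, 3]]
y = [0, 0, 0, 0, 1, 1]
minority = [x for x, label in zip(X, y) if label == 1]
extra = smote(minority, n_new=y.count(0) - y.count(1), k=1)
X_bal, y_bal = X + extra, y + [1] * len(extra)
model = MultinomialNB(alpha=1.0).fit(X_bal, y_bal)
print(model.predict([[3, 0, 0], [0, 2, 3]]))  # [0, 1]

y_true = [0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 1, 0, 1, 1, 0, 1]
p, r, f = weighted_scores(y_true, y_pred)
accuracy = sum(t == q for t, q in zip(y_true, y_pred)) / len(y_true)
print(math.isclose(r, accuracy))  # True
```

In a real experiment, oversampling would normally be applied only to the training folds inside cross-validation, so that synthetic points derived from validation data do not leak into training; imbalanced-learn's `Pipeline` combined with its `SMOTE` sampler is one way to arrange this alongside scikit-learn estimators and `GridSearchCV`.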

Original language: English
Host publication title: Information and Communication Technologies - 13th Ecuadorian Conference, TICEC 2025, Proceedings
Editors: Santiago Berrezueta, Tatiana Gualotuña, Efrain R. Fonseca C., Germania Rodriguez Morales, Jorge Maldonado-Mahauad
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 3-17
Number of pages: 15
ISBN (print): 9783032083654
DOI
Status: Published - 2026
Event: 13th Ecuadorian Conference on Information and Communication Technologies, TICEC 2025 - Quito, Ecuador
Duration: 16 Oct 2025 to 17 Oct 2025

Publication series

Name: Communications in Computer and Information Science
Volume: 2707 CCIS
ISSN (print): 1865-0929
ISSN (electronic): 1865-0937

Conference

Conference: 13th Ecuadorian Conference on Information and Communication Technologies, TICEC 2025
Country/Territory: Ecuador
City: Quito
Period: 16/10/25 to 17/10/25

Bibliographical note

Publisher Copyright:
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2026.

