Abstract
Sentiment analysis on social media poses a significant challenge for researchers in the field of Natural Language Processing (NLP) due to the informal, ambiguous, and dynamic nature of the language used by users. This research proposes a methodology that combines the Synthetic Minority Over-sampling Technique (SMOTE) with the Multinomial Naive Bayes (MNB) classifier to enhance performance in sentiment classification tasks on imbalanced datasets. The methodological process includes text cleaning, stopword removal, and lemmatization, followed by vectorization using Term Frequency–Inverse Document Frequency (TF-IDF) to represent lexical features. The Chi-squared test is applied to select the most discriminative features, and hyperparameter optimization is carried out using GridSearchCV with cross-validation. The method was evaluated using a cyberbullying dataset of posts labeled with positive and negative polarity. Evaluation metrics include accuracy, precision, recall, F1-score, and the confusion matrix. Experimental results demonstrate that the proposed approach improves model performance, achieving an accuracy of 88.99%, a precision of 89.14%, a recall of 88.99%, and an F1-score of 88.85%, showing the effectiveness of the SMOTE + Naive Bayes combination in mitigating class imbalance.
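To make the oversampling step concrete, the following is a minimal sketch of the interpolation idea behind SMOTE, written with the Python standard library only. It is not the authors' implementation: the abstract does not say which library or settings were used, and the function name `smote`, the parameters `n_new`, `k`, and `seed`, and the toy vectors are all assumptions made for illustration. Each synthetic point is placed on the line segment between a minority-class sample and one of its nearest minority-class neighbours.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic points interpolated between minority samples.

    Hypothetical helper for illustration; not taken from the paper.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than samples
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )[:k]
        other = minority[rng.choice(neighbours)]
        gap = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


# Toy stand-ins for TF-IDF rows of the minority class (assumed values).
minority = [[0.0, 0.2, 0.9], [0.1, 0.0, 0.8], [0.0, 0.3, 0.7]]
new_points = smote(minority, n_new=4, k=2)
```

Because every synthetic vector is a convex combination of two non-negative TF-IDF vectors, it stays non-negative, which is the input condition Multinomial Naive Bayes expects. In a real experiment, oversampling would normally be applied to the training folds only, so that synthetic points do not leak into the evaluation data.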
| Original language | English |
|---|---|
| Title of host publication | Information and Communication Technologies - 13th Ecuadorian Conference, TICEC 2025, Proceedings |
| Editors | Santiago Berrezueta, Tatiana Gualotuña, Efrain R. Fonseca C., Germania Rodriguez Morales, Jorge Maldonado-Mahauad |
| Publisher | Springer Science and Business Media Deutschland GmbH |
| Pages | 3-17 |
| Number of pages | 15 |
| ISBN (Print) | 9783032083654 |
| DOI | |
| Status | Published - 2026 |
| Event | 13th Ecuadorian Conference on Information and Communication Technologies, TICEC 2025 - Quito, Ecuador. Duration: 16 Oct 2025 → 17 Oct 2025 |
Publication series
| Name | Communications in Computer and Information Science |
|---|---|
| Volume | 2707 CCIS |
| ISSN (Print) | 1865-0929 |
| ISSN (Electronic) | 1865-0937 |
Conference
| Conference | 13th Ecuadorian Conference on Information and Communication Technologies, TICEC 2025 |
|---|---|
| Country/Territory | Ecuador |
| City | Quito |
| Period | 16/10/25 → 17/10/25 |
Bibliographical note
Publisher Copyright: © The Author(s), under exclusive license to Springer Nature Switzerland AG 2026.
Fingerprint
Dive into the research topics of 'Combining Synthetic Minority Over-Sampling Technique and Multinomial Naive Bayes for Sentiment Analysis on Imbalanced Social Media Datasets'. Together they form a unique fingerprint.