A Content-Based System for Discrimination Detection of Ecuadorian Text

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

The increase in Internet users and the widespread use of social networks have led to higher rates of discrimination. The lack of technological tools and the inability of social networks to identify cyberbullying have caused psychological harm to minority social groups. This research compiled a glossary of segregative terms used in the Ecuadorian context, organized according to four types of discrimination defined by the United Nations. The dictionary of discriminatory terms was built from blogs, research papers, and texts containing segregative expressions. It supported the development of a content-based software prototype that uses NLP-derived techniques and similarity metrics to classify the type and degree of discrimination of texts in the Ecuadorian context. The prototype includes a web user interface that analyzes the body of a text and shows its degree of discrimination according to this taxonomy. The prototype's performance was validated through functional tests on a dataset of 40 sentences labeled as discriminatory or non-discriminatory. The pre-trained ChatGPT model served as an external validation method. The experiments showed that the cosine vector approach of the proposed prototype reaches an accuracy of 88%, compared with 50% for ChatGPT, when classifying discriminatory text in the Ecuadorian domain.
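
The abstract does not spell out how the cosine vector approach is computed. The following is only a minimal sketch of one common way to score a text against a category glossary with cosine similarity over term counts. The glossary contents (`GLOSSARY`, with neutral placeholder tokens), the function names, and the `threshold` value are all assumptions made for illustration and are not taken from the paper.

```python
import math
import re
from collections import Counter

# Hypothetical glossary: neutral placeholder tokens stand in for real terms,
# and the category names are illustrative, not the paper's taxonomy.
GLOSSARY = {
    "category_a": ["term_a1", "term_a2", "term_a3"],
    "category_b": ["term_b1", "term_b2"],
}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def cosine(u, v):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def classify(text, glossary=GLOSSARY, threshold=0.2):
    """Return (best_category_or_None, scores) for a text.

    The threshold is an assumed cut-off: below it, the text is treated
    as matching no category.
    """
    vec = Counter(tokenize(text))
    scores = {cat: cosine(vec, Counter(terms)) for cat, terms in glossary.items()}
    best = max(scores, key=scores.get)
    if scores[best] < threshold:
        return None, scores
    return best, scores


label, scores = classify("term_a1 and term_a2 here")
print(label, scores)
```

In this sketch the similarity score doubles as a rough "degree" for each category, while the highest-scoring category gives the "type". A real system would likely add stemming or lemmatization and term weighting, which are omitted here.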

Original language: English
Title of host publication: Intelligent Systems and Applications - Proceedings of the 2024 Intelligent Systems Conference IntelliSys Volume 4
Editors: Kohei Arai
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 284-299
Number of pages: 16
ISBN (Print): 9783031663352
State: Published - 2024
Event: Intelligent Systems Conference, IntelliSys 2024 - Amsterdam, Netherlands
Duration: 5 Sep 2024 to 6 Sep 2024

Publication series

Name: Lecture Notes in Networks and Systems
Volume: 1068 LNNS
ISSN (Print): 2367-3370
ISSN (Electronic): 2367-3389

Conference

Conference: Intelligent Systems Conference, IntelliSys 2024
Country/Territory: Netherlands
City: Amsterdam
Period: 5/09/24 to 6/09/24

Bibliographical note

Publisher Copyright:
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024.

Keywords

  • Artificial intelligence
  • ChatGPT
  • Cyberbullying
  • Dissimilarity metrics
  • Natural language processing

CACES Knowledge Areas

  • 116A Computer Science