Evaluating Word Embedding Models in Ecuadorian Legal Texts: A Comparison of CBOW and Skip-Gram for Semantic Analysis

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

This study evaluates how well the Continuous Bag-of-Words (CBOW) and Skip-gram models capture semantic relationships in Ecuadorian legal texts. Using a comprehensive corpus that includes the Ecuadorian Constitution, the Comprehensive Organic Criminal Code (COIP), and the General Organic Code of Processes (COGEP), among other national laws, we analyze the models’ ability to represent the complex semantics of legal language. CBOW predicts a target word from its surrounding context, whereas Skip-gram predicts the surrounding context from a given target word. This makes both models suitable for identifying intricate patterns in legal documents. The legal texts underwent a rigorous preprocessing phase, including normalization, stopword removal, and lemmatization, to provide clean input data for training. The models were then evaluated using semantic similarity (Spearman’s correlation) and topic coherence metrics. Results indicate that both models show potential for capturing semantic relationships, but CBOW performed marginally better. It achieved a Spearman correlation of 0.24 and a topic coherence score of 0.6637, compared with Skip-gram’s 0.19 and 0.6573, respectively. Nevertheless, neither model fully captured the complexities inherent in legal language, which suggests that NLP techniques for legal texts need further refinement. These findings provide a foundation for improving semantic search and information retrieval systems tailored to the legal domain, offering tools to help legal professionals analyze and understand complex legal texts.
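The three preprocessing steps named above (normalization, stopword removal, and lemmatization) can be sketched as a small pipeline. This is a minimal illustration under stated assumptions, not the authors' pipeline. The stopword list and the lemma table are hypothetical toy stand-ins for a full Spanish stopword list and a real lemmatizer (for example, a spaCy Spanish model). "Normalization" is assumed here to mean Unicode NFC normalization and lowercasing, keeping only alphabetic tokens.

```python
import re
import unicodedata

# Hypothetical, deliberately tiny Spanish stopword list (illustration only).
STOPWORDS = {"el", "la", "los", "las", "de", "del", "y", "en", "a", "que", "por", "se", "su", "al", "o"}

# Hypothetical lemma table standing in for a real Spanish lemmatizer.
LEMMAS = {"derechos": "derecho", "personas": "persona", "garantiza": "garantizar", "jueces": "juez", "penas": "pena"}

# Alphabetic tokens only, including Spanish accented letters; digits and punctuation are dropped.
TOKEN_RE = re.compile(r"[a-záéíóúüñ]+")


def normalize(text: str) -> str:
    """Assumed normalization: compose Unicode characters (NFC) and lowercase."""
    return unicodedata.normalize("NFC", text).lower()


def preprocess(text: str) -> list[str]:
    """Tokenize normalized text, drop stopwords, then map each token to its lemma."""
    tokens = TOKEN_RE.findall(normalize(text))
    return [LEMMAS.get(t, t) for t in tokens if t not in STOPWORDS]


print(preprocess("El Estado garantiza los derechos de las personas."))
# ['estado', 'garantizar', 'derecho', 'persona']
```

Stopwords are removed before lemmatization in this sketch. A dictionary-free lemmatizer could change which tokens match the stopword list, so in a real pipeline the order of these steps is a design choice worth fixing explicitly.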
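The difference between the two training objectives described in the abstract can be shown by the training examples each one derives from a sentence. CBOW turns each position into one example (context words in, centre word out). Skip-gram turns each position into one example per neighbour (centre word in, one context word out). This is a simplified sketch with a fixed window. It omits the window shrinking, subsampling, and negative sampling that word2vec implementations add. The example sentence is hypothetical. The abstract does not name a library, but in gensim, for instance, the choice between the two models is made with the `Word2Vec` parameter `sg` (0 for CBOW, 1 for Skip-gram).

```python
def cbow_pairs(tokens: list[str], window: int = 2) -> list[tuple[list[str], str]]:
    """CBOW framing: the surrounding context words are the input, the centre word is the target."""
    pairs = []
    for i, target in enumerate(tokens):
        context = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
        if context:  # a one-token sentence has no context to predict from
            pairs.append((context, target))
    return pairs


def skipgram_pairs(tokens: list[str], window: int = 2) -> list[tuple[str, str]]:
    """Skip-gram framing: the centre word is the input, each neighbour is a separate target."""
    pairs = []
    for i, centre in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                pairs.append((centre, tokens[j]))
    return pairs


sentence = ["juez", "dicta", "sentencia", "penal"]  # hypothetical preprocessed sentence
print(cbow_pairs(sentence, window=1))
# [(['dicta'], 'juez'), (['juez', 'sentencia'], 'dicta'), (['dicta', 'penal'], 'sentencia'), (['sentencia'], 'penal')]
print(skipgram_pairs(sentence, window=1))
# [('juez', 'dicta'), ('dicta', 'juez'), ('dicta', 'sentencia'), ('sentencia', 'dicta'), ('sentencia', 'penal'), ('penal', 'sentencia')]
```

On this sentence, Skip-gram yields more training examples than CBOW (six versus four). This is one commonly cited reason why Skip-gram tends to represent rare words better, while CBOW trains faster.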
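The semantic-similarity evaluation mentioned in the abstract can be illustrated with a sketch. The model's cosine similarity for each word pair is compared with a human similarity rating for the same pair using Spearman's rank correlation. Here, Spearman's correlation is computed as the Pearson correlation of tie-averaged ranks. The abstract does not state which word-pair gold standard was used, so the vectors and ratings below are hypothetical toy values, not the study's data.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation; undefined (division by zero) if either input is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman's rho as the Pearson correlation of the two rank vectors."""
    return pearson(ranks(x), ranks(y))


# Hypothetical 3-d embeddings and human ratings on a 0-10 scale (illustration only).
emb = {
    "juez": [0.9, 0.1, 0.0],
    "tribunal": [0.8, 0.2, 0.1],
    "pena": [0.1, 0.9, 0.2],
    "sanción": [0.2, 0.8, 0.3],
}
gold = [("juez", "tribunal", 8.5), ("pena", "sanción", 9.0), ("juez", "pena", 3.0), ("tribunal", "sanción", 2.5)]

model_scores = [cosine(emb[a], emb[b]) for a, b, _ in gold]
human_scores = [score for _, _, score in gold]
print(round(spearman(model_scores, human_scores), 2))  # prints 0.6
```

Only the rank order of the scores matters to Spearman's correlation. The metric therefore rewards a model that orders word pairs the way human raters do, regardless of whether the raw cosine values are on the same scale as the ratings.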

Original language: English
Title of host publication: Smart Technologies, Systems and Applications - 4th International Conference, SmartTech-IC 2024, Revised Selected Papers
Editors: Fabián R. Narváez, Micaela N. Villa, Gloria M. Díaz
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 206-216
Number of pages: 11
ISBN (Print): 9783031982866
State: Published - 2026
Event: 4th International Conference on Smart Technologies, Systems and Applications, SmartTech-IC 2024 - Quito, Ecuador
Duration: 2 Dec 2024 to 4 Dec 2024

Publication series

Name: Communications in Computer and Information Science
Volume: 2392 CCIS
ISSN (Print): 1865-0929
ISSN (Electronic): 1865-0937

Conference

Conference: 4th International Conference on Smart Technologies, Systems and Applications, SmartTech-IC 2024
Country/Territory: Ecuador
City: Quito
Period: 2/12/24 to 4/12/24

Bibliographical note

Publisher Copyright:
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2026.

UN SDGs

This output contributes to the following UN Sustainable Development Goals (SDGs):

  1. SDG 16 - Peace, Justice and Strong Institutions

Keywords

  • AI in Law
  • CBOW
  • Ecuadorian Law
  • Legal Texts
  • Natural Language Processing
  • Semantic Similarity
  • Skip-gram
  • Word Embeddings
