Use this link to cite:
https://hdl.handle.net/2183/46476
Improving the Multi-Class Classification of Non-Functional Requirements in Spanish: A Study of Dataset Balancing and Performance
Bibliographic citation
Limaylla-Lunarejo, M., Condori-Fernandez, N., Rodríguez Luaces, M. et al. Improving the Multi-Class Classification of Non-Functional Requirements in Spanish: A Study of Dataset Balancing and Performance. Empir Software Eng 31, 6 (2026). https://doi.org/10.1007/s10664-025-10736-9
Abstract
Context
In recent years, the multi-class classification of non-functional requirements has seen improvements through the use of Machine Learning algorithms. However, challenges such as data scarcity and class imbalance persist, particularly for languages other than English, such as Spanish.
Objective
This study analyzes the performance of Machine Learning algorithms for classifying non-functional requirements that were either translated into Spanish or originally written in Spanish. It evaluates the effectiveness of dataset balancing techniques and uses cross-dataset validation to assess how well the models generalize.
Method
A dataset balancing process was conducted using a combination of oversampling and undersampling techniques. Six algorithms were trained in two experiments, with hyperparameter tuning, on two different datasets: PROMISE_exp_translated and the newly created PROMISE_exp_balanced. The best-performing models were then tested on unseen data to evaluate their generalizability.
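The abstract does not name the specific oversampling and undersampling techniques used. As a rough illustration of the general idea only, the sketch below uses plain random resampling: classes above a target size are randomly reduced, and classes below it are padded with random duplicates. The function name `balance`, the `target` parameter, and the toy labels and counts are all hypothetical and are not taken from the study.

```python
import random
from collections import Counter, defaultdict


def balance(samples, target, seed=0):
    """Resample (text, label) pairs so every class ends up with `target` items.

    Hypothetical helper: random resampling stands in for whatever
    balancing techniques the study actually applied.
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for text, label in samples:
        by_label[label].append(text)

    balanced = []
    for label, texts in by_label.items():
        if len(texts) > target:
            # Undersample: keep a random subset, drawn without replacement.
            chosen = rng.sample(texts, target)
        else:
            # Oversample: keep every original, then add random duplicates.
            chosen = texts + rng.choices(texts, k=target - len(texts))
        balanced.extend((text, label) for text in chosen)

    rng.shuffle(balanced)
    return balanced


# Toy data with made-up class sizes (not the PROMISE_exp distribution).
toy = (
    [("req", "Security")] * 8
    + [("req", "Usability")] * 5
    + [("req", "Portability")] * 2
)
balanced = balance(toy, target=5)
counts = Counter(label for _, label in balanced)
print(counts)
```

Duplicating minority-class items adds no new information, which is one plausible reason a balanced dataset can help rare classes while lowering overall scores.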
Results
Logistic Regression and Naive Bayes performed best on the translated dataset, achieving F1-scores of 82% and 81%, respectively. Although overall performance decreased on the balanced dataset, specific underrepresented classes such as Portability and Fault Tolerance benefited from the balancing process.
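The abstract does not say how the F1-scores were averaged across classes. The sketch below, written under that caveat, shows how per-class F1 and an unweighted (macro) average can be computed, and why a rare class can score poorly even when the majority class scores well. The function names and the toy labels are hypothetical; the numbers are not results from the study.

```python
from collections import Counter


def per_class_f1(y_true, y_pred):
    """Return {label: F1} using F1 = 2*TP / (2*TP + FP + FN)."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted label gains a false positive
            fn[truth] += 1  # true label gains a false negative

    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


def macro_f1(scores):
    """Unweighted mean of per-class F1, so rare classes count equally."""
    return sum(scores.values()) / len(scores)


# Toy example: one of two Portability items is misread as Security.
y_true = ["Security"] * 4 + ["Portability"] * 2
y_pred = ["Security"] * 4 + ["Security", "Portability"]

scores = per_class_f1(y_true, y_pred)
for label, value in scores.items():
    print(f"{label}: {value:.3f}")
print(f"macro F1: {macro_f1(scores):.3f}")
```

In this toy case a single error costs the two-item class far more than the four-item class, which is why per-class scores are worth reporting alongside an overall figure.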
Conclusion
Shallow Machine Learning algorithms are effective for classifying Spanish non-functional requirements, particularly when addressing data imbalance. The study highlights the importance of dataset balancing in improving classification performance for specific classes and provides insights into the challenges of generalizing models across datasets.
Description
Editor version
Rights
© The Author(s) 2025
Attribution 4.0 International