Improving the Multi-Class Classification of Non-Functional Requirements in Spanish: A Study of Dataset Balancing and Performance
| UDC.coleccion | Investigación | |
| UDC.departamento | Ciencias da Computación e Tecnoloxías da Información | |
| UDC.grupoInv | Laboratorio de Bases de Datos (LBD) | |
| UDC.institutoCentro | CITIC - Centro de Investigación de Tecnoloxías da Información e da Comunicación | |
| UDC.issue | 6 | |
| UDC.journalTitle | Empirical Software Engineering | |
| UDC.volume | 31 | |
| dc.contributor.author | Limaylla-Lunarejo, María-Isabel | |
| dc.contributor.author | Condori Fernandez, Nelly | |
| dc.contributor.author | Rodríguez Luaces, Miguel | |
| dc.contributor.author | Karras, O. | |
| dc.date.accessioned | 2025-11-18T08:23:20Z | |
| dc.date.available | 2025-11-18T08:23:20Z | |
| dc.date.issued | 2025-11-02 | |
| dc.description.abstract | Context: In recent years, the multi-class classification of non-functional requirements has improved through the use of Machine Learning algorithms. However, challenges such as data scarcity and class imbalance persist, particularly for languages other than English, such as Spanish. Objective: This study analyzes the performance metrics of Machine Learning algorithms for classifying non-functional requirements that were either translated into Spanish or originally written in Spanish. It evaluates the effectiveness of dataset balancing techniques and conducts cross-dataset validation to assess the generalizability of the models. Method: A dataset balancing process was conducted using a combination of oversampling and undersampling techniques. Six algorithms were trained in two experiments using a hyperparameter tuning process, employing two different datasets: PROMISE_exp_translated and the new PROMISE_exp_balanced. The best-performing models were further tested on unseen data to evaluate their generalizability. Results: Logistic Regression and Naive Bayes demonstrated superior performance on the translated dataset, achieving F1-scores of 82% and 81%, respectively. Although overall performance decreased on the balanced dataset, specific underrepresented classes such as Portability and Fault Tolerance benefited from the balancing process. Conclusion: Shallow Machine Learning algorithms are effective for classifying Spanish non-functional requirements, particularly when addressing data imbalance. The study highlights the importance of dataset balancing in improving classification performance for specific classes and provides insights into the challenges of generalizing models across datasets. | |
| dc.description.sponsorship | Open Access funding provided thanks to the CRUE-CSIC agreement with Springer Nature. This research was partially supported by the following grants: TED2021-129245B-C21 (PLAGEMIS), partially funded by MCIN/AEI/10.13039/501100011033 and “NextGenerationEU”/PRTR; PID2022-141027NB-C21 (EarthDL), partially funded by MCIN/AEI/10.13039/501100011033 and the EU/ERDF, "A way of making Europe" (1st and 3rd authors); and ED431G2019/04 and ED431C2022/19, partially funded by the Galician Ministry of Culture, Education, Professional Training, and University (2nd author). Funding for open access charge: Universidade da Coruña/CISUG. | |
| dc.description.sponsorship | Xunta de Galicia; ED431G2019/04 | |
| dc.description.sponsorship | Xunta de Galicia; ED431C2022/19 | |
| dc.identifier.citation | Limaylla-Lunarejo, M., Condori-Fernandez, N., Rodríguez Luaces, M. et al. Improving the Multi-Class Classification of Non-Functional Requirements in Spanish: A Study of Dataset Balancing and Performance. Empir Software Eng 31, 6 (2026). https://doi.org/10.1007/s10664-025-10736-9 | |
| dc.identifier.issn | 1573-7616 | |
| dc.identifier.issn | 1382-3256 | |
| dc.identifier.uri | https://hdl.handle.net/2183/46476 | |
| dc.language.iso | eng | |
| dc.publisher | Springer Nature | |
| dc.relation.projectID | info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2021-2023/TED2021-129245B-C21/ES/PLATAFORMA PARA LA GENERACIÓN AUTOMÁTICA DE SISTEMAS DE INFORMACIÓN DE LA MOVILIDAD ENERGÉTICAMENTE EFICIENTES, BASADOS EN ESTRUCTURAS DE DATOS COMPACTAS Y GIS (PLAGEMIS) | |
| dc.relation.projectID | info:eu-repo/grantAgreement/MINECO/Plan Estatal de Investigación Científica y Técnica y de Innovación 2021-2023/PID2022-141027NB-C21/ES/MODELADO, DESCUBRIMIENTO, EXPLORACION Y ANALISIS DE DATA LAKES MEDIOAMBIENTALES [UDC] | |
| dc.relation.uri | https://doi.org/10.1007/s10664-025-10736-9 | |
| dc.rights | © The Author(s) 2025 | |
| dc.rights | Attribution 4.0 International | en |
| dc.rights.accessRights | open access | |
| dc.rights.uri | http://creativecommons.org/licenses/by/4.0/ | |
| dc.subject | Non-functional requirement | |
| dc.subject | Multi-class classification | |
| dc.subject | Dataset balancing | |
| dc.subject | ChatGPT prompt | |
| dc.subject | Spanish language | |
| dc.title | Improving the Multi-Class Classification of Non-Functional Requirements in Spanish: A Study of Dataset Balancing and Performance | |
| dc.type | journal article | |
| dc.type.hasVersion | VoR | |
| dspace.entity.type | Publication | |
| relation.isAuthorOfPublication | c54bcf67-9180-4102-a87a-5c916898925f | |
| relation.isAuthorOfPublication | fbde3bd9-d786-4ef0-89ec-6af2091fa415 | |
| relation.isAuthorOfPublication.latestForDiscovery | c54bcf67-9180-4102-a87a-5c916898925f |
Files
Original bundle
- Name: LimayllaLunarejo_M_2025_Improving_multi_class_non_funct.pdf
- Size: 2.61 MB
- Format: Adobe Portable Document Format

