Authors: Morillo-Salas, José Luis; Bolón-Canedo, Verónica; Alonso-Betanzos, Amparo
Date deposited: 2024-11-19
Date issued: 2024-03-22
Citation: Morillo-Salas, J. L., Bolón-Canedo, V., & Alonso-Betanzos, A. (2024). The imbalance problem: A comparison of sampling approaches using different parameters and feature selection methods in the context of classification. Expert Systems, 41(8), e13591. https://doi.org/10.1111/exsy.13591
ISSN: 0266-4720; 1468-0394
Handle: http://hdl.handle.net/2183/40195
Version note: This is the peer reviewed version of the following article: Morillo-Salas, J. L., Bolón-Canedo, V., & Alonso-Betanzos, A. (2024). The imbalance problem: A comparison of sampling approaches using different parameters and feature selection methods in the context of classification. Expert Systems, 41(8), e13591, which has been published in final form at https://doi.org/10.1111/exsy.13591.

Abstract: A common situation in classification tasks is dealing with unbalanced datasets, which arise when the majority class(es) contain many more samples than the minority class(es). The problem is even more significant when datasets have a large number of features but only a few samples, as is the case with microarray datasets. A traditional way to alleviate it is to apply sampling methods that yield more balanced classes, either by increasing the number of samples in the minority class (replicating existing samples or generating new synthetic ones) or by decreasing the number of samples in the majority class. In this study, we compare different balancing methods, including a novel method that applies sampling to both the minority and the majority classes. We also explore whether combining feature selection with balancing methods is worthwhile.
Based on the results, we propose a recommended combination of sampling method, feature selection method, and classifier for each type of dataset in order to improve classification results.

Language: English
Rights: This article may be used for non-commercial purposes in accordance with Wiley Terms and Conditions for Use of Self-Archived Versions (https://authorservices.wiley.com/author-resources/Journal-Authors/licensing/self-archiving.html#3). This article may not be enhanced, enriched or otherwise transformed into a derivative work, without express permission from Wiley or by statutory rights under applicable legislation. Copyright notices must not be removed, obscured or modified. The article must be linked to Wiley’s version of record on Wiley Online Library and any embedding, framing or otherwise making available the article or pages thereof by third parties from platforms, services and websites other than Wiley Online Library must be prohibited.
Copyright: © 2024 John Wiley & Sons Ltd.
Keywords: Microarray datasets; Unbalanced datasets; Oversampling; Feature selection; Classification
Title: The imbalance problem: A comparison of sampling approaches using different parameters and feature selection methods in the context of classification
Type: journal article
Access: open access
DOI: 10.1111/exsy.13591
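The abstract describes three ways of balancing classes: replicating minority-class samples, removing majority-class samples, and sampling in both classes at once. The NumPy sketch below illustrates these ideas in their simplest random form. It is a hypothetical illustration, not the authors' implementation or their novel method: the function names, the use of the mean class size as the target for the two-sided variant, and plain random selection (instead of synthetic-sample generators such as SMOTE) are all assumptions.

```python
import numpy as np


def random_oversample(X, y, rng):
    """Replicate samples (with replacement) until every class matches the largest class."""
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    idx = []
    for c, n in zip(classes, counts):
        members = np.flatnonzero(y == c)
        # Draw extra copies of existing rows; no synthetic rows are created.
        extra = rng.choice(members, size=target - n, replace=True)
        idx.append(np.concatenate([members, extra]))
    idx = np.concatenate(idx)
    return X[idx], y[idx]


def random_undersample(X, y, rng):
    """Keep a random subset of each class, shrinking every class to the smallest one."""
    classes, counts = np.unique(y, return_counts=True)
    target = counts.min()
    idx = np.concatenate(
        [rng.choice(np.flatnonzero(y == c), size=target, replace=False) for c in classes]
    )
    return X[idx], y[idx]


def resample_both(X, y, rng):
    """Hypothetical two-sided scheme: move every class toward the mean class size,
    undersampling larger classes and oversampling smaller ones (an assumption,
    not the paper's method)."""
    classes, counts = np.unique(y, return_counts=True)
    target = int(round(counts.mean()))
    idx = []
    for c, n in zip(classes, counts):
        members = np.flatnonzero(y == c)
        if n >= target:
            idx.append(rng.choice(members, size=target, replace=False))
        else:
            extra = rng.choice(members, size=target - n, replace=True)
            idx.append(np.concatenate([members, extra]))
    idx = np.concatenate(idx)
    return X[idx], y[idx]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = np.arange(24).reshape(12, 2)   # toy data: 12 samples, 2 features
    y = np.array([0] * 10 + [1] * 2)   # 10 majority vs. 2 minority samples
    for fn in (random_oversample, random_undersample, resample_both):
        _, y_new = fn(X, y, rng)
        print(fn.__name__, np.bincount(y_new))  # samples per class after resampling
```

Oversampling keeps all original information but duplicates rows, which can encourage overfitting; undersampling discards data, which is costly when, as with microarray datasets, samples are scarce. The two-sided variant trades off these effects, and in practice resampling would be applied only to the training folds so that test data stay untouched.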