Dealing with heterogeneity in the context of distributed feature selection for classification

Morillo-Salas, José Luis; Bolón-Canedo, Verónica; Alonso-Betanzos, Amparo

dc.contributor.author	Morillo-Salas, José Luis
dc.contributor.author	Bolón-Canedo, Verónica
dc.contributor.author	Alonso-Betanzos, Amparo
dc.date.accessioned	2024-04-01T17:31:36Z
dc.date.available	2024-04-01T17:31:36Z
dc.date.issued	2021
dc.identifier.citation	Morillo-Salas, J.L., Bolón-Canedo, V. & Alonso-Betanzos, A. Dealing with heterogeneity in the context of distributed feature selection for classification. Knowl Inf Syst 63, 233–276 (2021). https://doi.org/10.1007/s10115-020-01526-4	es_ES
dc.identifier.issn	0219-1377
dc.identifier.issn	0219-3116
dc.identifier.uri	http://hdl.handle.net/2183/36033
dc.description	This version of the article has been accepted for publication, after peer review and is subject to Springer Nature’s AM terms of use, but is not the Version of Record and does not reflect post-acceptance improvements, or any corrections. The Version of Record is available online at: https://doi.org/10.1007/s10115-020-01526-4.	es_ES
dc.description.abstract	[Abstract]: Advances in information technologies have greatly contributed to the advent of larger datasets. These datasets often come from distributed sites, but even so, their large size usually means they cannot be handled in a centralized manner. A possible solution to this problem is to distribute the data over several processors and then combine the partial results. We propose a methodology to distribute feature selection processes based on selecting relevant features and discarding irrelevant ones. This preprocessing step is essential for current high-dimensional datasets, since it reduces the input dimension. We pay particular attention to the problem of data imbalance, which arises either because the original dataset is unbalanced or because it becomes unbalanced after data partitioning. Most works approach unbalanced scenarios by oversampling, whereas our proposal tests both over- and undersampling strategies. Experimental results demonstrate that our distributed approach to classification achieves accuracy comparable to that of a centralized approach, while reducing computational time and dealing efficiently with data imbalance.	es_ES
dc.description.sponsorship	This research has been financially supported in part by the Spanish Ministerio de Economía y Competitividad (research projects TIN2015-65069-C2-1-R and PID2019-109238GB-C22), by European Union FEDER funds and by the Consellería de Industria of the Xunta de Galicia (research project ED431C 2018/34). Financial support from the Xunta de Galicia (Centro singular de investigación de Galicia accreditation 2016–2019) and the European Union (European Regional Development Fund, ERDF) is gratefully acknowledged (research project ED431G 2019/01).	es_ES
dc.description.sponsorship	Xunta de Galicia; ED431C 2018/34	es_ES
dc.description.sponsorship	Xunta de Galicia; ED431G 2019/01	es_ES
dc.language.iso	eng	es_ES
dc.publisher	Springer	es_ES
dc.relation	info:eu-repo/grantAgreement/MINECO/Plan Estatal de Investigación Científica y Técnica y de Innovación 2013-2016/TIN2015-65069-C2-1-R/ES/ALGORITMOS ESCALABLES DE APRENDIZAJE COMPUTACIONAL: MAS ALLA DE LA CLASIFICACION Y LA REGRESION	es_ES
dc.relation	info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2017-2020/PID2019-109238GB-C22/ES/APRENDIZAJE AUTOMATICO ESCALABLE Y EXPLICABLE	es_ES
dc.relation.uri	https://doi.org/10.1007/s10115-020-01526-4	es_ES
dc.rights	All rights reserved.	es_ES
dc.subject	Feature selection	es_ES
dc.subject	Distributed learning	es_ES
dc.subject	Unbalanced data	es_ES
dc.subject	Oversampling	es_ES
dc.title	Dealing with heterogeneity in the context of distributed feature selection for classification	es_ES
dc.type	info:eu-repo/semantics/article	es_ES
dc.rights.access	info:eu-repo/semantics/openAccess	es_ES
UDC.journalTitle	Knowledge and Information Systems	es_ES
UDC.volume	63	es_ES
UDC.startPage	233	es_ES
UDC.endPage	276	es_ES
dc.identifier.doi	10.1007/s10115-020-01526-4
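
The abstract describes distributing feature selection over several processors, combining the partial results, and correcting class imbalance (present in the original data or introduced by partitioning) with over- or undersampling. The Python sketch below illustrates that general pipeline under assumptions of our own; it is not the algorithm published in the article. Rows are split horizontally at random, each partition is rebalanced by random oversampling or random undersampling, features are ranked with a simple Fisher-style filter (a stand-in, binary 0/1 labels assumed), and the per-partition top-k selections are merged by majority vote. All function names (partition_rows, rebalance, fisher_scores, distributed_select) are hypothetical.

```python
import random
from collections import Counter

# Illustrative sketch only: horizontal partitioning, random re-sampling,
# a Fisher-style filter and majority voting are assumptions, not the
# method published in the article. Labels are assumed to be binary (0/1).

def partition_rows(rows, n_parts, seed=0):
    """Shuffle (features, label) rows and split them into n_parts disjoint subsets."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    return [shuffled[i::n_parts] for i in range(n_parts)]

def rebalance(part, strategy, seed=0):
    """Random oversampling ("over") or undersampling ("under") to equal class counts."""
    if strategy not in ("over", "under"):
        raise ValueError("strategy must be 'over' or 'under'")
    rng = random.Random(seed)
    by_class = {}
    for x, y in part:
        by_class.setdefault(y, []).append((x, y))
    sizes = [len(v) for v in by_class.values()]
    target = max(sizes) if strategy == "over" else min(sizes)
    balanced = []
    for samples in by_class.values():
        if len(samples) >= target:
            # Undersample (or keep all rows when already at the target size).
            balanced.extend(rng.sample(samples, target))
        else:
            # Oversample by duplicating randomly chosen minority rows.
            extra = [rng.choice(samples) for _ in range(target - len(samples))]
            balanced.extend(samples + extra)
    return balanced

def fisher_scores(part):
    """Score each feature by (mean1 - mean0)^2 / (var0 + var1)."""
    if {y for _, y in part} != {0, 1}:
        raise ValueError("each partition needs samples of both classes 0 and 1")
    scores = []
    for j in range(len(part[0][0])):
        stats = []
        for label in (0, 1):
            vals = [x[j] for x, y in part if y == label]
            mean = sum(vals) / len(vals)
            var = sum((v - mean) ** 2 for v in vals) / len(vals)
            stats.append((mean, var))
        (m0, v0), (m1, v1) = stats
        scores.append((m1 - m0) ** 2 / (v0 + v1 + 1e-12))
    return scores

def distributed_select(rows, n_parts, k, strategy="under", seed=0):
    """Pick the top-k features in each partition; keep those chosen by a majority."""
    votes = Counter()
    for i, part in enumerate(partition_rows(rows, n_parts, seed)):
        scores = fisher_scores(rebalance(part, strategy, seed + i))
        top = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]
        votes.update(top)
    return sorted(j for j, count in votes.items() if count > n_parts / 2)

if __name__ == "__main__":
    gen = random.Random(42)
    data = []
    for _ in range(400):
        y = 1 if gen.random() < 0.2 else 0  # roughly 20% minority class
        informative = gen.gauss(4.0 * y, 1.0)  # class-dependent mean
        noise = [gen.gauss(0.0, 1.0) for _ in range(3)]
        data.append(([informative] + noise, y))
    for strategy in ("under", "over"):
        print(strategy, distributed_select(data, n_parts=4, k=1, strategy=strategy))
```

Random undersampling discards majority-class rows, while random oversampling duplicates minority-class rows; they are the simplest members of the two strategy families the abstract compares, and either can be swapped for a more elaborate resampler without changing the partition-and-vote structure.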

Files in this item

Name: Morillo_Salas_JoseL_2020_Deali ...
Size: 2.781 MB
Format: PDF

This item appears in the following collection(s)

GI-LIDIA - Artigos [54]