
dc.contributor.author: González-Domínguez, Jorge
dc.contributor.author: Bolón-Canedo, Verónica
dc.contributor.author: Freire, Borja
dc.contributor.author: Touriño, Juan
dc.date.accessioned: 2023-11-29T19:04:58Z
dc.date.available: 2023-11-29T19:04:58Z
dc.date.issued: 2019
dc.identifier.citation: González-Domínguez, J., Bolón-Canedo, V., Freire, B., & Touriño, J. (2019). Parallel feature selection for distributed-memory clusters. Information Sciences, 496, 399–409. https://doi.org/10.1016/j.ins.2019.01.050 [es_ES]
dc.identifier.uri: http://hdl.handle.net/2183/34381
dc.description: Final accepted version of: https://doi.org/10.1016/j.ins.2019.01.050 [es_ES]
dc.description: This manuscript version is made available under the CC-BY-NC-ND 4.0 license https://creativecommons.org/licenses/by-nc-nd/4.0/. This version of the article: González-Domínguez, J. et al. (2019) ‘Parallel feature selection for distributed-memory clusters’, has been accepted for publication in Information Sciences, 496, pp. 399–409. The Version of Record is available online at: https://doi.org/10.1016/j.ins.2019.01.050 [es_ES]
dc.description.abstract: Feature selection is nowadays an extremely important data mining stage in the field of machine learning due to the appearance of problems of high dimensionality. In the literature there are numerous feature selection methods, mRMR (minimum-Redundancy-Maximum-Relevance) being one of the most widely used. However, although it achieves good results in selecting relevant features, it is impractical for datasets with thousands of features. A possible solution to this limitation is the use of the fast-mRMR method, a greedy optimization of the mRMR algorithm that improves both scalability and efficiency. In this work we present fast-mRMR-MPI, a novel hybrid parallel implementation that uses MPI and OpenMP to accelerate feature selection on distributed-memory clusters. Our performance evaluation on two different systems using five representative input datasets shows that fast-mRMR-MPI is significantly faster than fast-mRMR while providing the same results. As an example, our tool needs less than one minute to select 200 features of a dataset with more than four million features and 16,000 samples on a cluster with 32 nodes (768 cores in total), while the sequential fast-mRMR required more than eight hours. Moreover, fast-mRMR-MPI distributes data so that it is able to exploit the memory available on different nodes of a cluster and then complete analyses that fail on a single node due to memory constraints. Our tool is publicly available at https://github.com/borjaf696/Fast-mRMR. [es_ES]
dc.description.sponsorship: This research has been partially funded by projects TIN2016-75845-P and TIN-2015-65069-C2-1-R of the Ministry of Economy, Industry and Competitiveness of Spain, as well as by Xunta de Galicia projects ED431D R2016/045 and GRC2014/035, all of them partially funded by FEDER funds of the European Union. We gratefully thank CESGA for providing access to the Finis Terrae II supercomputer. [es_ES]
dc.description.sponsorship: Xunta de Galicia; ED431D R2016/045 [es_ES]
dc.description.sponsorship: Xunta de Galicia; GRC2014/035 [es_ES]
dc.language.iso: eng [es_ES]
dc.relation: info:eu-repo/grantAgreement/MINECO/Plan Estatal de Investigación Científica y Técnica y de Innovación 2013-2016/TIN2016-75845-P/ES/NUEVOS DESAFIOS EN COMPUTACION DE ALTAS PRESTACIONES: DESDE ARQUITECTURAS HASTA APLICACIONES (II)/ [es_ES]
dc.relation: info:eu-repo/grantAgreement/MINECO/Plan Estatal de Investigación Científica y Técnica y de Innovación 2013-2016/TIN2015-65069-C2-1-R/ES/ALGORITMOS ESCALABLES DE APRENDIZAJE COMPUTACIONAL: MAS ALLA DE LA CLASIFICACION Y LA REGRESION [es_ES]
dc.relation.isversionof: https://doi.org/10.1016/j.ins.2019.01.050
dc.relation.uri: https://doi.org/10.1016/j.ins.2019.01.050 [es_ES]
dc.rights: Attribution-NonCommercial-NoDerivs 3.0 Spain [es_ES]
dc.rights: CC-BY-NC-ND 4.0 license https://creativecommons.org/licenses/by-nc-nd/4.0/ [es_ES]
dc.rights.uri: http://creativecommons.org/licenses/by-nc-nd/3.0/es/ [*]
dc.subject: Machine learning [es_ES]
dc.subject: Feature selection [es_ES]
dc.subject: High performance computing [es_ES]
dc.subject: Parallel computing [es_ES]
dc.title: Parallel feature selection for distributed-memory clusters [es_ES]
dc.type: info:eu-repo/semantics/article [es_ES]
dc.rights.access: info:eu-repo/semantics/openAccess [es_ES]
dc.identifier.doi: 10.1016/j.ins.2019.01.050

