Parallel feature selection for distributed-memory clusters
dc.contributor.author | González-Domínguez, Jorge | |
dc.contributor.author | Bolón-Canedo, Verónica | |
dc.contributor.author | Freire, Borja | |
dc.contributor.author | Touriño, Juan | |
dc.date.accessioned | 2023-11-29T19:04:58Z | |
dc.date.available | 2023-11-29T19:04:58Z | |
dc.date.issued | 2019 | |
dc.identifier.citation | González-Domínguez, J., Bolón-Canedo, V., Freire, B., & Touriño, J. (2019). Parallel feature selection for distributed-memory clusters. Information Sciences, 496, 399–409. https://doi.org/10.1016/j.ins.2019.01.050 | es_ES |
dc.identifier.uri | http://hdl.handle.net/2183/34381 | |
dc.description | Final accepted version of: https://doi.org/10.1016/j.ins.2019.01.050 | es_ES |
dc.description | This manuscript version is made available under the CC-BY-NC-ND 4.0 license https://creativecommons.org/licenses/by-nc-nd/4.0/. This version of the article: González-Domínguez, J. et al. (2019) ‘Parallel feature selection for distributed-memory clusters’, has been accepted for publication in Information Sciences, 496, pp. 399–409. The Version of Record is available online at: https://doi.org/10.1016/j.ins.2019.01.050 | es_ES |
dc.description.abstract | [Abstract]: Feature selection is nowadays an extremely important data mining stage in the field of machine learning due to the appearance of problems of high dimensionality. In the literature there are numerous feature selection methods, mRMR (minimum-Redundancy-Maximum-Relevance) being one of the most widely used. However, although it achieves good results in selecting relevant features, it is impractical for datasets with thousands of features. A possible solution to this limitation is the use of the fast-mRMR method, a greedy optimization of the mRMR algorithm that improves both scalability and efficiency. In this work we present fast-mRMR-MPI, a novel hybrid parallel implementation that uses MPI and OpenMP to accelerate feature selection on distributed-memory clusters. Our performance evaluation on two different systems using five representative input datasets shows that fast-mRMR-MPI is significantly faster than fast-mRMR while providing the same results. As an example, our tool needs less than one minute to select 200 features of a dataset with more than four million features and 16,000 samples on a cluster with 32 nodes (768 cores in total), while the sequential fast-mRMR required more than eight hours. Moreover, fast-mRMR-MPI distributes data so that it is able to exploit the memory available on different nodes of a cluster and then complete analyses that fail on a single node due to memory constraints. Our tool is publicly available at https://github.com/borjaf696/Fast-mRMR. | es_ES |
dc.description.sponsorship | This research has been partially funded by projects TIN2016-75845-P and TIN2015-65069-C2-1-R of the Ministry of Economy, Industry and Competitiveness of Spain, as well as by Xunta de Galicia projects ED431D R2016/045 and GRC2014/035, all of them partially funded by FEDER funds of the European Union. We gratefully thank CESGA for providing access to the Finis Terrae II supercomputer. | es_ES |
dc.description.sponsorship | Xunta de Galicia; ED431D R2016/045 | es_ES |
dc.description.sponsorship | Xunta de Galicia; GRC2014/035 | es_ES |
dc.language.iso | eng | es_ES |
dc.relation | info:eu-repo/grantAgreement/MINECO/Plan Estatal de Investigación Científica y Técnica y de Innovación 2013-2016/TIN2016-75845-P/ES/NUEVOS DESAFIOS EN COMPUTACION DE ALTAS PRESTACIONES: DESDE ARQUITECTURAS HASTA APLICACIONES (II)/ | es_ES |
dc.relation | info:eu-repo/grantAgreement/MINECO/Plan Estatal de Investigación Científica y Técnica y de Innovación 2013-2016/TIN2015-65069-C2-1-R/ES/ALGORITMOS ESCALABLES DE APRENDIZAJE COMPUTACIONAL: MAS ALLA DE LA CLASIFICACION Y LA REGRESION | es_ES |
dc.relation.isversionof | https://doi.org/10.1016/j.ins.2019.01.050 | |
dc.relation.uri | https://doi.org/10.1016/j.ins.2019.01.050 | es_ES |
dc.rights | Attribution-NonCommercial-NoDerivs 3.0 Spain | es_ES |
dc.rights | CC-BY-NC-ND 4.0 license https://creativecommons.org/licenses/by-nc-nd/4.0/ | es_ES |
dc.rights.uri | http://creativecommons.org/licenses/by-nc-nd/3.0/es/ | * |
dc.subject | Machine learning | es_ES |
dc.subject | Feature selection | es_ES |
dc.subject | High performance computing | es_ES |
dc.subject | Parallel computing | es_ES |
dc.title | Parallel feature selection for distributed-memory clusters | es_ES |
dc.type | info:eu-repo/semantics/article | es_ES |
dc.rights.access | info:eu-repo/semantics/openAccess | es_ES |
dc.identifier.doi | 10.1016/j.ins.2019.01.050 |
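The abstract describes mRMR as a greedy selection that balances relevance to the class against redundancy with features already chosen. A minimal sketch of that idea in Python follows. It is not the authors' fast-mRMR or fast-mRMR-MPI code: the function names (`mutual_information`, `greedy_mrmr`), the difference-based score (relevance minus mean redundancy), the assumption of discrete integer-coded features, and the toy data are all illustrative choices made here.

```python
import numpy as np


def mutual_information(x, y):
    """Mutual information (in nats) between two discrete 1-D vectors."""
    _, x_idx = np.unique(np.asarray(x), return_inverse=True)
    _, y_idx = np.unique(np.asarray(y), return_inverse=True)
    # Joint probability table built from co-occurrence counts.
    joint = np.zeros((x_idx.max() + 1, y_idx.max() + 1))
    np.add.at(joint, (x_idx, y_idx), 1)
    joint /= joint.sum()
    px = joint.sum(axis=1, keepdims=True)  # marginal of x, shape (nx, 1)
    py = joint.sum(axis=0, keepdims=True)  # marginal of y, shape (1, ny)
    nz = joint > 0                         # skip empty cells (0 * log 0 = 0)
    return float(np.sum(joint[nz] * np.log(joint[nz] / (px @ py)[nz])))


def greedy_mrmr(X, y, k):
    """Pick k feature indices by greedy max-relevance, min-redundancy."""
    n_features = X.shape[1]
    # Relevance: MI between each feature and the class, computed once.
    relevance = np.array(
        [mutual_information(X[:, j], y) for j in range(n_features)]
    )
    # Running sum of MI between each candidate and the selected features.
    # Assumption: accumulating it lets each step compare candidates only
    # against the most recently selected feature.
    redundancy_sum = np.zeros(n_features)
    selected = [int(np.argmax(relevance))]
    while len(selected) < k:
        last = selected[-1]
        for j in range(n_features):
            if j not in selected:
                redundancy_sum[j] += mutual_information(X[:, j], X[:, last])
        score = relevance - redundancy_sum / len(selected)
        score[selected] = -np.inf  # never re-select a feature
        selected.append(int(np.argmax(score)))
    return selected


# Toy data (hypothetical): the class is determined by two independent bits.
a = np.array([0, 0, 0, 0, 1, 1, 1, 1])
b = np.array([0, 0, 1, 1, 0, 0, 1, 1])
c = np.array([0, 1, 0, 1, 0, 1, 0, 1])  # unrelated to the class
y = 2 * a + b
X = np.column_stack([a, a, b, c])       # column 1 duplicates column 0
selected = greedy_mrmr(X, y, k=2)
print(selected)
```

In this toy case the duplicated column is penalised for redundancy, so the second pick is the column carrying new information about the class. The inner loop over candidates is independent per feature, which is the kind of work a hybrid MPI and OpenMP implementation could split across nodes and cores; how fast-mRMR-MPI actually partitions the data is not stated in this record.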
This item appears in the following collection(s)
- GI-GAC - Artigos [182]