
Parallel feature selection for distributed-memory clusters

Use this link to cite
http://hdl.handle.net/2183/34381
Except where otherwise noted, this item's license is described as Attribution-NonCommercial-NoDerivs 3.0 Spain
Collections
  • Investigación (FIC) [1728]
Title
Parallel feature selection for distributed-memory clusters
Author(s)
González-Domínguez, Jorge
Bolón-Canedo, Verónica
Freire, Borja
Touriño, Juan
Date
2019
Citation
González-Domínguez, J., Bolón-Canedo, V., Freire, B., & Touriño, J. (2019). Parallel feature selection for distributed-memory clusters. Information Sciences, 496, 399–409. https://doi.org/10.1016/j.ins.2019.01.050
Is version of
https://doi.org/10.1016/j.ins.2019.01.050
Abstract
Feature selection is nowadays an extremely important data mining stage in the field of machine learning due to the appearance of problems of high dimensionality. In the literature there are numerous feature selection methods, mRMR (minimum-Redundancy-Maximum-Relevance) being one of the most widely used. However, although it achieves good results in selecting relevant features, it is impractical for datasets with thousands of features. A possible solution to this limitation is the use of the fast-mRMR method, a greedy optimization of the mRMR algorithm that improves both scalability and efficiency. In this work we present fast-mRMR-MPI, a novel hybrid parallel implementation that uses MPI and OpenMP to accelerate feature selection on distributed-memory clusters. Our performance evaluation on two different systems using five representative input datasets shows that fast-mRMR-MPI is significantly faster than fast-mRMR while providing the same results. As an example, our tool needs less than one minute to select 200 features of a dataset with more than four million features and 16,000 samples on a cluster with 32 nodes (768 cores in total), while the sequential fast-mRMR required more than eight hours. Moreover, fast-mRMR-MPI distributes data so that it is able to exploit the memory available on different nodes of a cluster and then complete analyses that fail on a single node due to memory constraints. Our tool is publicly available at https://github.com/borjaf696/Fast-mRMR.
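
The abstract does not spell out the selection loop itself. As a rough illustration only, the generic greedy mRMR idea (pick the feature with the highest relevance to the class minus its mean redundancy with the features already chosen) can be sketched in plain Python. The function names `mutual_information` and `greedy_mrmr`, the difference-based score, and the toy data are assumptions made for this sketch; this is not the fast-mRMR-MPI code, and it runs sequentially with no MPI or OpenMP.

```python
from collections import Counter
from math import log2


def mutual_information(x, y):
    """Mutual information (in bits) between two equal-length discrete sequences."""
    n = len(x)
    count_x = Counter(x)
    count_y = Counter(y)
    count_xy = Counter(zip(x, y))
    mi = 0.0
    for (a, b), c in count_xy.items():
        # p(a,b) * log2(p(a,b) / (p(a) * p(b))), written with raw counts
        mi += (c / n) * log2(c * n / (count_x[a] * count_y[b]))
    return mi


def greedy_mrmr(features, labels, k):
    """Return k feature indices chosen by relevance minus mean redundancy.

    Hypothetical helper for illustration; `features` is a list of columns.
    """
    relevance = [mutual_information(f, labels) for f in features]
    # Running sum of MI between each candidate and the features selected so far,
    # so each round only computes MI against the newest selected feature.
    redundancy_sum = [0.0] * len(features)
    selected = []
    candidates = set(range(len(features)))
    while candidates and len(selected) < k:
        def score(j):
            mean_red = redundancy_sum[j] / len(selected) if selected else 0.0
            return relevance[j] - mean_red
        # sorted() makes tie-breaking deterministic (lowest index wins)
        best = max(sorted(candidates), key=score)
        selected.append(best)
        candidates.remove(best)
        for j in candidates:
            redundancy_sum[j] += mutual_information(features[j], features[best])
    return selected


# Toy data (assumed): the class is determined by two independent bits a and b.
a = [0, 0, 1, 1, 0, 0, 1, 1]
b = [0, 1, 0, 1, 0, 1, 0, 1]
labels = [2 * ai + bi for ai, bi in zip(a, b)]
print(greedy_mrmr([a, list(a), b], labels, 2))  # [0, 2]
```

In the toy run the duplicate of `a` (index 1) is skipped because it is fully redundant with the first pick, while `b` adds new information. The per-candidate loops are independent of each other, which is the kind of work a parallel implementation could divide among processes and threads; how fast-mRMR-MPI actually partitions the data is described in the paper, not here.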
Keywords
Machine learning
Feature selection
High performance computing
Parallel computing
 
Description
Final accepted version of: https://doi.org/10.1016/j.ins.2019.01.050
 
This manuscript version is made available under the CC-BY-NC-ND 4.0 license https://creativecommons.org/licenses/by-nc-nd/4.0/. This version of the article: González-Domínguez, J. et al. (2019) ‘Parallel feature selection for distributed-memory clusters’, has been accepted for publication in Information Sciences, 496, pp. 399–409. The Version of Record is available online at: https://doi.org/10.1016/j.ins.2019.01.050
 
Editor version
https://doi.org/10.1016/j.ins.2019.01.050
Rights
Attribution-NonCommercial-NoDerivs 3.0 Spain
 
CC-BY-NC-ND 4.0 license https://creativecommons.org/licenses/by-nc-nd/4.0/
 
