Insights into distributed feature ranking

Bolón-Canedo, V., Sechidis, K., Sánchez-Maroño, N., Alonso-Betanzos, A., & Brown, G. (2019). Insights into distributed feature ranking. Information Sciences, 496, 378–398. https://doi.org/10.1016/j.ins.2018.09.045

Abstract

In an era in which the volume and complexity of datasets are continuously growing, feature selection techniques have become indispensable for extracting useful information from huge amounts of data. However, existing algorithms may not scale well to huge datasets, and one possible solution is to distribute the data across several nodes. In this work we explore the different ways of distributing the data (by features and by samples) and evaluate to what extent it is possible to obtain results similar to those obtained with the whole dataset. To address the challenge of distributing the feature ranking process, we performed experiments with different aggregation methods and feature rankers, and also evaluated the effect of distributing the feature ranking process on the subsequent classification performance.
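The abstract distinguishes two ways of distributing the data: by samples (each node holds a subset of rows) and by features (each node holds a subset of columns). The sketch below contrasts them. It assumes a simple univariate ranker (absolute Pearson correlation with the class label) and plain mean-rank aggregation. Both are illustrative choices and are not claimed to be among the rankers or aggregation methods evaluated in the paper. All function names are hypothetical.

```python
import numpy as np


def score_features(X, y):
    """Hypothetical univariate ranker: |Pearson correlation| of each column of X with y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    denom[denom == 0] = np.inf  # constant columns (or a constant y) get score 0
    return np.abs(Xc.T @ yc) / denom


def ranks_from_scores(scores):
    """Convert scores to ranks, where rank 0 is the most relevant feature."""
    order = np.argsort(-scores, kind="stable")
    ranks = np.empty_like(order)
    ranks[order] = np.arange(len(scores))
    return ranks


def distributed_ranking_by_samples(X, y, n_nodes, seed=0):
    """Split by samples: each node ranks all features on its own subset of rows,
    then the local rankings are combined by mean rank (an illustrative aggregator)."""
    rng = np.random.default_rng(seed)
    parts = np.array_split(rng.permutation(len(y)), n_nodes)
    node_ranks = np.stack([ranks_from_scores(score_features(X[p], y[p])) for p in parts])
    return np.argsort(node_ranks.mean(axis=0), kind="stable")  # best feature first


def distributed_ranking_by_features(X, y, n_nodes):
    """Split by features: each node scores only its own block of columns."""
    scores = np.empty(X.shape[1])
    for block in np.array_split(np.arange(X.shape[1]), n_nodes):
        scores[block] = score_features(X[:, block], y)
    return np.argsort(-scores, kind="stable")  # best feature first


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    X = rng.normal(size=(400, 10))
    # Synthetic labels driven mostly by feature 0 and, more weakly, by feature 3.
    y = (X[:, 0] + 0.5 * X[:, 3] + 0.1 * rng.normal(size=400) > 0).astype(float)
    central = np.argsort(-score_features(X, y), kind="stable")
    print("centralized:      ", central)
    print("split by samples: ", distributed_ranking_by_samples(X, y, n_nodes=4))
    print("split by features:", distributed_ranking_by_features(X, y, n_nodes=4))
```

With a univariate scorer, splitting by features reproduces the centralized ranking exactly, because each feature's score depends only on that feature and the labels. Multivariate rankers lose this property. Splitting by samples only approximates the centralized ranking, since each node estimates relevance from fewer rows before the local rankings are aggregated.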

Keywords

Feature selection
Feature ranking
Distributed learning

Description

This version of the article, Bolón-Canedo, V., Sechidis, K., Sánchez-Maroño, N., Alonso-Betanzos, A., & Brown, G. (2019). ‘Insights into distributed feature ranking’, has been accepted for publication in Information Sciences, 496, 378–398. The Version of Record is available online at https://doi.org/10.1016/j.ins.2018.09.045.

Rights

Attribution-NonCommercial-NoDerivatives 4.0 International

ISSN

0020-0255