Multithreaded and Spark parallelization of feature selection filters

Eiras-Franco, Carlos; Bolón-Canedo, Verónica; Ramos Garea, Sabela; González-Domínguez, Jorge; Alonso-Betanzos, Amparo; Touriño, Juan

dc.contributor.author	Eiras-Franco, Carlos
dc.contributor.author	Bolón-Canedo, Verónica
dc.contributor.author	Ramos Garea, Sabela
dc.contributor.author	González-Domínguez, Jorge
dc.contributor.author	Alonso-Betanzos, Amparo
dc.contributor.author	Touriño, Juan
dc.date.accessioned	2023-12-21T11:33:01Z
dc.date.available	2023-12-21T11:33:01Z
dc.date.issued	2016
dc.identifier.citation	C. Eiras-Franco, V. Bolón-Canedo, S. Ramos, J. González-Domínguez, A. Alonso-Betanzos, and J. Touriño, "Multithreaded and Spark parallelization of feature selection filters", Journal of Computational Science, Vol. 17, Part 3, Nov. 2016, Pp. 609-619, https://doi.org/10.1016/j.jocs.2016.07.002	es_ES
dc.identifier.uri	http://hdl.handle.net/2183/34589
dc.description	©2016 Elsevier B.V. All rights reserved. This manuscript version is made available under the CC-BY-NC-ND 4.0 license https://creativecommons.org/licenses/by-nc-nd/4.0/. This version of the article has been accepted for publication in Journal of Computational Science. The Version of Record is available online at https://doi.org/10.1016/j.jocs.2016.07.002	es_ES
dc.description	Final accepted version of: C. Eiras-Franco, V. Bolón-Canedo, S. Ramos, J. González-Domínguez, A. Alonso-Betanzos, and J. Touriño, "Multithreaded and Spark parallelization of feature selection filters", Journal of Computational Science, Vol. 17, Part 3, Nov. 2016, Pp. 609-619	es_ES
dc.description.abstract	[Abstract]: Vast amounts of data are generated every day, constituting a volume that is challenging to analyze. Techniques such as feature selection are advisable when tackling large datasets. Among the tools that provide this functionality, Weka is one of the most popular ones, although the implementations it provides struggle when processing large datasets, requiring excessive times to be practical. Parallel processing can help alleviate this problem, effectively allowing users to work with Big Data. The computational power of multicore machines can be harnessed by using multithreading and distributed programming, effectively helping to tackle larger problems. Both these techniques can dramatically speed up the feature selection process allowing users to work with larger datasets. The reimplementation of four popular feature selection algorithms included in Weka is the focus of this work. Multithreaded implementations previously not included in Weka as well as parallel Spark implementations were developed for each algorithm. Experimental results obtained from tests on real-world datasets show that the new versions offer significant reductions in processing times.	es_ES
dc.description.sponsorship	This work has been financed in part by Xunta de Galicia under Research Network R2014/041 and project GRC2014/035, and by Spanish Ministerio de Economía y Competitividad under projects TIN2012-37954 and TIN-2015-65069-C2-1-R, partially funded by FEDER funds of the European Union. V. Bolón-Canedo acknowledges support of the Xunta de Galicia under postdoctoral Grant code ED481B 2014/164-0. Additionally, the collaboration of Jorge Veiga on setting up and using the MREv tool for Spark execution was essential for this work.	es_ES
dc.description.sponsorship	Xunta de Galicia; R2014/041	es_ES
dc.description.sponsorship	Xunta de Galicia; GRC2014/035	es_ES
dc.description.sponsorship	Xunta de Galicia; ED481B 2014/164-0	es_ES
dc.language.iso	eng	es_ES
dc.relation	info:eu-repo/grantAgreement/MINECO/Plan Estatal de Investigación Científica y Técnica y de Innovación 2013-2016/TIN2012-37954/ES/ALGORITMOS DE APRENDIZAJE COMPUTACIONAL EN ENTORNOS DISTRIBUIDOS	es_ES
dc.relation	info:eu-repo/grantAgreement/MINECO/Plan Estatal de Investigación Científica y Técnica y de Innovación 2013-2016/TIN-2015-65069-C2-1-R/ALGORITMOS ESCALABLES DE APRENDIZAJE COMPUTACIONAL: MAS ALLA DE LA CLASIFICACION Y LA REGRESION	es_ES
dc.relation.isversionof	https://doi.org/10.1016/j.jocs.2016.07.002
dc.relation.uri	https://doi.org/10.1016/j.jocs.2016.07.002	es_ES
dc.rights	Attribution-NonCommercial-NoDerivs 3.0 Spain	es_ES
dc.rights.uri	http://creativecommons.org/licenses/by-nc-nd/3.0/es/	*
dc.subject	Multithreading	es_ES
dc.subject	Spark	es_ES
dc.subject	Feature selection	es_ES
dc.subject	Machine learning	es_ES
dc.title	Multithreaded and Spark parallelization of feature selection filters	es_ES
dc.type	info:eu-repo/semantics/article	es_ES
dc.rights.access	info:eu-repo/semantics/openAccess	es_ES
UDC.journalTitle	Journal of Computational Science	es_ES
UDC.volume	17	es_ES
UDC.issue	3	es_ES
UDC.startPage	609	es_ES
UDC.endPage	619	es_ES
dc.identifier.doi	10.1016/j.jocs.2016.07.002

Files in this item

Name: license_rdf
Size: 1.203Kb
Format: application/rdf+xml

Name: EirasFranco_Carlos_2016_Multit ...
Size: 285.2Kb
Format: PDF
Description: Accepted version

This item appears in the following collection(s)

GI-GAC - Artigos [181]