
dc.contributor.author: Expósito, Roberto R.
dc.contributor.author: González-Domínguez, Jorge
dc.date.accessioned: 2024-04-19T07:17:26Z
dc.date.available: 2024-04-19T07:17:26Z
dc.date.issued: 2024-05
dc.identifier.citation: R. R. Expósito, J. González-Domínguez, "BigDEC: A multi-algorithm Big Data tool based on the k-mer spectrum method for scalable short-read error correction", Future Generation Computer Systems, Vol. 154, May 2024, pp. 314-329, doi: 10.1016/j.future.2024.01.011 [es_ES]
dc.identifier.uri: http://hdl.handle.net/2183/36250
dc.description: Funded for open access publication: Universidade da Coruña/CISUG [es_ES]
dc.description.abstract: [Abstract]: Despite the significant improvements in both throughput and cost provided by modern Next-Generation Sequencing (NGS) platforms, sequencing errors in NGS datasets can still degrade the quality of downstream analysis. Although state-of-the-art correction tools can provide high accuracy to improve such analysis, they are limited to applying a single correction algorithm while also requiring long runtimes when processing large NGS datasets. Furthermore, current parallel correctors generally only provide efficient support for shared-memory systems, lacking the ability to scale out across a cluster of multicore nodes, or they require the availability of specific hardware devices or features. In this paper we present a Big Data Error Correction (BigDEC) tool that overcomes all those limitations by: (1) implementing three different error correction algorithms based on the widely used k-mer spectrum method; (2) providing scalable performance for large datasets by efficiently exploiting the capabilities of Big Data technologies on multicore clusters based on commodity hardware; (3) supporting two different Big Data processing frameworks (Spark and Flink) to provide greater flexibility to end users; (4) including an efficient, stream-based merge operation to ease downstream processing of the corrected datasets; and (5) significantly outperforming existing parallel tools, being up to 79% faster on a 16-node multicore cluster when using the same underlying correction algorithm. BigDEC is publicly available to download at https://github.com/UDC-GAC/BigDEC. [es_ES]
dc.description.sponsorship: This work was supported by grants PID2019-104184RB-I00 and PID2022-136435NB-I00, funded by the Ministry of Science and Innovation of Spain, MCIN/AEI/10.13039/501100011033 (PID2022 also funded by “ERDF A way of making Europe”, EU). It was also funded by Xunta de Galicia [Consolidation Program of Competitive Reference Groups, grant ED431C 2021/30]. Funding for open access charge: Universidade da Coruña/CISUG. [es_ES]
dc.description.sponsorship: Xunta de Galicia; ED431C 2021/30 [es_ES]
dc.language.iso: eng [es_ES]
dc.publisher: Elsevier [es_ES]
dc.relation: info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2017-2020/PID2019-104184RB-I00/ES/DESAFÍOS ACTUALES EN HPC: ARQUITECTURAS, SOFTWARE Y APLICACIONES [es_ES]
dc.relation: info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2021-2023/PID2022-136435NB-I00/ES/ARQUITECTURAS, FRAMEWORKS Y APLICACIONES DE LA COMPUTACION DE ALTAS PRESTACIONES [es_ES]
dc.relation.uri: https://doi.org/10.1016/j.future.2024.01.011 [es_ES]
dc.rights: Attribution-NonCommercial-NoDerivs 3.0 Spain [es_ES]
dc.rights.uri: http://creativecommons.org/licenses/by-nc-nd/3.0/es/ [*]
dc.subject: Apache Flink [es_ES]
dc.subject: Apache Spark [es_ES]
dc.subject: Big Data processing [es_ES]
dc.subject: Error correction [es_ES]
dc.subject: Next-Generation Sequencing (NGS) [es_ES]
dc.title: BigDEC: A multi-algorithm Big Data tool based on the k-mer spectrum method for scalable short-read error correction [es_ES]
dc.type: info:eu-repo/semantics/article [es_ES]
dc.rights.access: info:eu-repo/semantics/openAccess [es_ES]
UDC.journalTitle: Future Generation Computer Systems [es_ES]
UDC.volume: 154 [es_ES]
UDC.startPage: 314 [es_ES]
UDC.endPage: 329 [es_ES]
dc.identifier.doi: 10.1016/j.future.2024.01.011

