BigDEC: A multi-algorithm Big Data tool based on the k-mer spectrum method for scalable short-read error correction
dc.contributor.author | Expósito, Roberto R. | |
dc.contributor.author | González-Domínguez, Jorge | |
dc.date.accessioned | 2024-04-19T07:17:26Z | |
dc.date.available | 2024-04-19T07:17:26Z | |
dc.date.issued | 2024-05 | |
dc.identifier.citation | R. R. Expósito, J. González-Domínguez, "BigDEC: A multi-algorithm Big Data tool based on the k-mer spectrum method for scalable short-read error correction", Future Generation Computer Systems, Vol. 154, May 2024, pp. 314 - 329, doi: 10.1016/j.future.2024.01.011 | es_ES |
dc.identifier.uri | http://hdl.handle.net/2183/36250 | |
dc.description | Funded for open access publication: Universidade da Coruña/CISUG | es_ES |
dc.description.abstract | [Abstract]: Despite the significant improvements in both throughput and cost provided by modern Next-Generation Sequencing (NGS) platforms, sequencing errors in NGS datasets can still degrade the quality of downstream analysis. Although state-of-the-art correction tools can provide high accuracy to improve such analysis, they are limited to applying a single correction algorithm and also require long runtimes when processing large NGS datasets. Furthermore, current parallel correctors generally only provide efficient support for shared-memory systems, lacking the ability to scale out across a cluster of multicore nodes, or they require the availability of specific hardware devices or features. In this paper we present a Big Data Error Correction (BigDEC) tool that overcomes all those limitations by: (1) implementing three different error correction algorithms based on the widely used k-mer spectrum method; (2) providing scalable performance for large datasets by efficiently exploiting the capabilities of Big Data technologies on multicore clusters based on commodity hardware; (3) supporting two different Big Data processing frameworks (Spark and Flink) to provide greater flexibility to end users; (4) including an efficient, stream-based merge operation to ease downstream processing of the corrected datasets; and (5) significantly outperforming existing parallel tools, being up to 79% faster on a 16-node multicore cluster when using the same underlying correction algorithm. BigDEC is publicly available to download at https://github.com/UDC-GAC/BigDEC. | es_ES |
dc.description.sponsorship | This work was supported by grants PID2019-104184RB-I00 and PID2022-136435NB-I00, funded by the Ministry of Science and Innovation of Spain, MCIN/AEI/10.13039/501100011033 (PID2022 also funded by “ERDF A way of making Europe”, EU). It was also funded by Xunta de Galicia [Consolidation Program of Competitive Reference Groups, grant ED431C 2021/30]. Funding for open access charge: Universidade da Coruña/CISUG. | es_ES |
dc.description.sponsorship | Xunta de Galicia; ED431C 2021/30 | es_ES |
dc.language.iso | eng | es_ES |
dc.publisher | Elsevier | es_ES |
dc.relation | info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2017-2020/PID2019-104184RB-I00/ES/DESAFÍOS ACTUALES EN HPC: ARQUITECTURAS, SOFTWARE Y APLICACIONES | es_ES |
dc.relation | info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2021-2023/PID2022-136435NB-I00/ES/ARQUITECTURAS, FRAMEWORKS Y APLICACIONES DE LA COMPUTACION DE ALTAS PRESTACIONES | es_ES |
dc.relation.uri | https://doi.org/10.1016/j.future.2024.01.011 | es_ES |
dc.rights | Attribution-NonCommercial-NoDerivs 3.0 Spain | es_ES |
dc.rights.uri | http://creativecommons.org/licenses/by-nc-nd/3.0/es/ | * |
dc.subject | Apache Flink | es_ES |
dc.subject | Apache Spark | es_ES |
dc.subject | Big Data processing | es_ES |
dc.subject | Error correction | es_ES |
dc.subject | Next-Generation Sequencing (NGS) | es_ES |
dc.title | BigDEC: A multi-algorithm Big Data tool based on the k-mer spectrum method for scalable short-read error correction | es_ES |
dc.type | info:eu-repo/semantics/article | es_ES |
dc.rights.access | info:eu-repo/semantics/openAccess | es_ES |
UDC.journalTitle | Future Generation Computer Systems | es_ES |
UDC.volume | 154 | es_ES |
UDC.startPage | 314 | es_ES |
UDC.endPage | 329 | es_ES |
dc.identifier.doi | 10.1016/j.future.2024.01.011 |
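The abstract names the k-mer spectrum method as the basis of BigDEC's three correction algorithms but does not describe it. As a rough, single-machine illustration of the general idea only (this is not BigDEC's code, not any of its three algorithms, and it ignores quality scores, reverse complements, and distributed execution on Spark or Flink), the sketch below counts every k-mer across all reads, treats k-mers seen at least `threshold` times as "solid", and tries single-base substitutions at read positions covered by weak k-mers. The function names `count_kmers`, `covering_kmers_solid`, and `correct_read` and the `threshold` parameter are hypothetical.

```python
from collections import Counter


def count_kmers(reads, k):
    """Build the k-mer spectrum: how often each k-mer occurs across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def covering_kmers_solid(read, pos, k, counts, threshold):
    """True if every k-mer overlapping position `pos` occurs >= threshold times."""
    start = max(0, pos - k + 1)
    end = min(pos, len(read) - k)
    return all(counts.get(read[i:i + k], 0) >= threshold
               for i in range(start, end + 1))


def correct_read(read, k, counts, threshold):
    """Hypothetical greedy corrector: fix a base if a substitution makes
    all k-mers covering that position solid; otherwise keep the original."""
    bases = list(read)
    for pos in range(len(bases)):
        if covering_kmers_solid("".join(bases), pos, k, counts, threshold):
            continue  # position is already supported by the spectrum
        original = bases[pos]
        for candidate in "ACGT":
            if candidate == original:
                continue
            bases[pos] = candidate
            if covering_kmers_solid("".join(bases), pos, k, counts, threshold):
                break  # keep the first substitution that makes all covering k-mers solid
        else:
            bases[pos] = original  # no substitution helped; restore the base
    return "".join(bases)


if __name__ == "__main__":
    reference = "ACGTTGCATG"
    # Five error-free copies plus one read with a G->C error at position 5.
    reads = [reference] * 5 + ["ACGTTCCATG"]
    spectrum = count_kmers(reads, k=4)
    print(correct_read("ACGTTCCATG", 4, spectrum, threshold=2))  # ACGTTGCATG
```

Real correctors, including parallel ones such as the tools compared in the article, must additionally handle ambiguous corrections and memory-efficient k-mer counting at scale, which this sketch deliberately omits.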
This item appears in the following collection(s):
- GI-GAC - Artigos [189]