
BigDEC: A multi-algorithm Big Data tool based on the k-mer spectrum method for scalable short-read error correction

Full text: Exposito_RobertoR_2024_BigDEC_A_multi_algorithm_Big_Data_tool.pdf (PDF, 7.529 MB)
Use this link to cite
http://hdl.handle.net/2183/36250
Unless otherwise indicated, this item is licensed under Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Spain (CC BY-NC-ND 3.0 ES).
Collections
  • Investigación (FIC) [1678]
Metadata
Author(s)
Expósito, Roberto R.
González-Domínguez, Jorge
Date
2024-05
Citation
R. R. Expósito, J. González-Domínguez, "BigDEC: A multi-algorithm Big Data tool based on the k-mer spectrum method for scalable short-read error correction", Future Generation Computer Systems, Vol. 154, May 2024, pp. 314–329, doi: 10.1016/j.future.2024.01.011
Abstract
Despite the significant improvements in both throughput and cost provided by modern Next-Generation Sequencing (NGS) platforms, sequencing errors in NGS datasets can still degrade the quality of downstream analysis. Although state-of-the-art correction tools can provide high accuracy to improve such analysis, they are limited to applying a single correction algorithm and require long runtimes when processing large NGS datasets. Furthermore, current parallel correctors generally provide efficient support only for shared-memory systems, lacking the ability to scale out across a cluster of multicore nodes, or they require specific hardware devices or features. In this paper we present Big Data Error Correction (BigDEC), a tool that overcomes all of these limitations by: (1) implementing three different error correction algorithms based on the widely used k-mer spectrum method; (2) providing scalable performance for large datasets by efficiently exploiting Big Data technologies on multicore clusters built from commodity hardware; (3) supporting two different Big Data processing frameworks (Spark and Flink) to give end users greater flexibility; (4) including an efficient, stream-based merge operation to ease downstream processing of the corrected datasets; and (5) significantly outperforming existing parallel tools, being up to 79% faster on a 16-node multicore cluster when using the same underlying correction algorithm. BigDEC is publicly available at https://github.com/UDC-GAC/BigDEC.
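To make the k-mer spectrum idea concrete, here is a minimal single-machine sketch in Python. It is not BigDEC's code and does not reproduce any of its three algorithms; the function names (kmer_spectrum, covering_kmers, all_solid, correct_read), the fixed solidity threshold and the greedy single-substitution rule are illustrative assumptions. The sketch counts every k-mer across all reads, treats k-mers seen at least `threshold` times as "solid" (likely error-free) and rarer ones as "weak", and replaces a base only when exactly one alternative base makes every k-mer covering that position solid.

```python
from collections import Counter

BASES = "ACGT"


def kmer_spectrum(reads: list[str], k: int) -> Counter:
    """Count how many times every k-mer occurs across all reads (the k-mer spectrum)."""
    spectrum: Counter = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            spectrum[read[i:i + k]] += 1
    return spectrum


def covering_kmers(seq: str, pos: int, k: int) -> list[str]:
    """Return every k-mer of seq that contains position pos."""
    first = max(0, pos - k + 1)
    last = min(pos, len(seq) - k)
    return [seq[i:i + k] for i in range(first, last + 1)]


def all_solid(kmers: list[str], spectrum: Counter, threshold: int) -> bool:
    """A k-mer is 'solid' if it occurs at least `threshold` times; unseen k-mers count as 0."""
    return all(spectrum[km] >= threshold for km in kmers)


def correct_read(read: str, spectrum: Counter, k: int, threshold: int) -> str:
    """Greedy single-substitution correction (hypothetical rule, for illustration only).

    Scan the read left to right; wherever some covering k-mer is weak, try the
    other three bases and accept a substitution only if exactly one of them
    makes all covering k-mers solid.
    """
    seq = read
    for pos in range(len(seq)):
        if all_solid(covering_kmers(seq, pos, k), spectrum, threshold):
            continue
        candidates = []
        for base in BASES:
            if base == seq[pos]:
                continue
            trial = seq[:pos] + base + seq[pos + 1:]
            if all_solid(covering_kmers(trial, pos, k), spectrum, threshold):
                candidates.append(trial)
        if len(candidates) == 1:  # no fix or an ambiguous fix: leave the base untouched
            seq = candidates[0]
    return seq


if __name__ == "__main__":
    # Five error-free copies of a read plus one copy with a T->A error at index 3.
    reads = ["ACGTTGCA"] * 5 + ["ACGATGCA"]
    spectrum = kmer_spectrum(reads, k=4)
    print([correct_read(r, spectrum, k=4, threshold=2) for r in reads])
```

Running the demo prints six copies of "ACGTTGCA": the four 4-mers introduced by the error occur only once, so they fall below the threshold, and only the original base T restores solid k-mers at that position. The counting loop is the part that maps naturally onto a Big Data framework such as Spark or Flink (emit k-mers per read, then reduce counts per key across the cluster); this sketch simply keeps everything in memory.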
Keywords
Apache Flink
Apache Spark
Big data processing
Error correction
Next generation sequencing (NGS)
 
Description
Open-access publication funded by: Universidade da Coruña/CISUG
Publisher's version
https://doi.org/10.1016/j.future.2024.01.011
