
BigDEC: A multi-algorithm Big Data tool based on the k-mer spectrum method for scalable short-read error correction

Use this link to cite
http://hdl.handle.net/2183/36250
Except where otherwise noted, this item's license is described as Attribution-NonCommercial-NoDerivs 3.0 Spain
Collections
  • Research (FIC) [1685]
Metadata
Author(s)
Expósito, Roberto R.
González-Domínguez, Jorge
Date
2024-05
Bibliographic citation
R. R. Expósito, J. González-Domínguez, "BigDEC: A multi-algorithm Big Data tool based on the k-mer spectrum method for scalable short-read error correction", Future Generation Computer Systems, Vol. 154, May 2024, pp. 314-329, doi: 10.1016/j.future.2024.01.011
Abstract
Despite the significant improvements in both throughput and cost provided by modern Next-Generation Sequencing (NGS) platforms, sequencing errors in NGS datasets can still degrade the quality of downstream analysis. Although state-of-the-art correction tools can provide high accuracy to improve such analysis, they are limited to applying a single correction algorithm and require long runtimes when processing large NGS datasets. Furthermore, current parallel correctors generally provide efficient support only for shared-memory systems, lacking the ability to scale out across a cluster of multicore nodes, or they require specific hardware devices or features. In this paper, we present a Big Data Error Correction (BigDEC) tool that overcomes all these limitations by: (1) implementing three different error correction algorithms based on the widely used k-mer spectrum method; (2) providing scalable performance for large datasets by efficiently exploiting the capabilities of Big Data technologies on multicore clusters based on commodity hardware; (3) supporting two different Big Data processing frameworks (Spark and Flink) to provide greater flexibility to end users; (4) including an efficient, stream-based merge operation to ease downstream processing of the corrected datasets; and (5) significantly outperforming existing parallel tools, being up to 79% faster on a 16-node multicore cluster when using the same underlying correction algorithm. BigDEC is publicly available for download at https://github.com/UDC-GAC/BigDEC.
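The k-mer spectrum method named in the abstract can be illustrated with a small sketch. The following is a minimal single-machine illustration in Python, not BigDEC's implementation: the function names `kmer_spectrum` and `correct_read`, the fixed solidity threshold, and the greedy single-base substitution strategy are all assumptions made for this example. The general idea is that k-mers seen often across the reads are treated as "solid" (likely correct), k-mers seen rarely are treated as "weak" (likely to contain an error), and a weak k-mer is repaired by the substitution that turns it into a solid one.

```python
from collections import Counter


def kmer_spectrum(reads, k):
    """Count every k-mer (substring of length k) across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def correct_read(read, counts, k, threshold):
    """Greedy illustrative correction (hypothetical strategy).

    A k-mer is 'solid' if its count is >= threshold, otherwise 'weak'.
    For each weak k-mer, try every single-base substitution and keep
    the one that yields the most frequent solid k-mer, if any exists.
    """
    bases = "ACGT"
    chars = list(read)
    for i in range(len(chars) - k + 1):
        kmer = "".join(chars[i:i + k])
        if counts[kmer] >= threshold:
            continue  # solid k-mer, nothing to fix here
        best = None  # (count, offset within k-mer, replacement base)
        for j in range(k):
            for base in bases:
                if base == kmer[j]:
                    continue
                candidate = kmer[:j] + base + kmer[j + 1:]
                count = counts[candidate]
                if count >= threshold and (best is None or count > best[0]):
                    best = (count, j, base)
        if best is not None:
            chars[i + best[1]] = best[2]
    return "".join(chars)


if __name__ == "__main__":
    # Five error-free copies of a read plus one copy with a single error.
    reads = ["ACGTACGT"] * 5 + ["ACGTACCT"]
    spectrum = kmer_spectrum(reads, k=4)
    print(correct_read("ACGTACCT", spectrum, k=4, threshold=2))
```

In this toy input the erroneous read differs from the others at one base, so its two k-mers covering that base occur only once and fall below the threshold. Real correctors choose `k` and the threshold from the coverage of the dataset and handle ambiguous bases and quality scores, which this sketch ignores.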
Keywords
Apache Flink
Apache Spark
Big data processing
Error correction
Next-generation sequencing (NGS)
Description
Funded for open access publication: Universidade da Coruña/CISUG
Publisher's version
https://doi.org/10.1016/j.future.2024.01.011
Rights
Attribution-NonCommercial-NoDerivs 3.0 Spain
