SeQual: Big Data Tool to Perform Quality Control and Data Preprocessing of Large NGS Datasets
Use this link to cite
http://hdl.handle.net/2183/26270
Unless otherwise indicated, the item's license is described as Attribution 4.0 International (CC BY 4.0)
Collections
- GI-GAC - Articles [192]
Metadata
Title
SeQual: Big Data Tool to Perform Quality Control and Data Preprocessing of Large NGS Datasets
Date
2020-08-07
Bibliographic citation
R. R. Expósito, R. Galego-Torreiro and J. González-Domínguez, "SeQual: Big Data Tool to Perform Quality Control and Data Preprocessing of Large NGS Datasets," in IEEE Access, vol. 8, pp. 146075-146084, 2020, doi: 10.1109/ACCESS.2020.3015016.
Abstract
This paper presents SeQual, a scalable tool to efficiently perform quality control of large genomic datasets. Our tool currently supports more than 30 different operations (e.g., filtering, trimming, formatting) that can be applied to DNA/RNA reads in FASTQ/FASTA formats to improve subsequent downstream analyses, while providing a simple and user-friendly graphical interface for non-expert users. Furthermore, SeQual takes full advantage of Big Data technologies to process massive datasets on distributed-memory systems such as clusters by relying on the open-source Apache Spark cluster computing framework. Our scalable Spark-based implementation reduces the runtime from more than three hours to less than 20 minutes when processing a paired-end dataset with 251 million reads per input file on an 8-node multi-core cluster.
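To make the three operation families named in the abstract (filtering, trimming, formatting) concrete, the following is a minimal single-machine sketch in plain Python. It is not SeQual's implementation, which the abstract describes as built on Apache Spark. All names here (`Read`, `trim_tail`, `passes_filter`, `to_fasta`, `quality_control`) and the thresholds are hypothetical, and Phred+33 quality encoding is assumed.

```python
from typing import Iterator, List, NamedTuple


class Read(NamedTuple):
    name: str
    seq: str
    qual: str


def parse_fastq(lines: List[str]) -> Iterator[Read]:
    """Yield one Read per 4-line FASTQ record (header, sequence, '+', qualities)."""
    for i in range(0, len(lines) - 3, 4):
        yield Read(lines[i][1:], lines[i + 1], lines[i + 3])


def phred_scores(qual: str) -> List[int]:
    # Assumption: Phred+33 quality encoding (ASCII code minus 33).
    return [ord(c) - 33 for c in qual]


def trim_tail(read: Read, min_q: int) -> Read:
    """Trimming: drop trailing bases whose quality is below min_q."""
    scores = phred_scores(read.qual)
    end = len(scores)
    while end > 0 and scores[end - 1] < min_q:
        end -= 1
    return Read(read.name, read.seq[:end], read.qual[:end])


def passes_filter(read: Read, min_mean_q: float, min_len: int) -> bool:
    """Filtering: keep non-empty reads that are long enough and have adequate mean quality."""
    if not read.seq or len(read.seq) < min_len:
        return False
    scores = phred_scores(read.qual)
    return sum(scores) / len(scores) >= min_mean_q


def to_fasta(read: Read) -> str:
    """Formatting: convert a FASTQ read to a FASTA record (qualities are dropped)."""
    return f">{read.name}\n{read.seq}"


def quality_control(lines: List[str], min_q: int = 20,
                    min_mean_q: float = 30.0, min_len: int = 3) -> List[str]:
    """Hypothetical pipeline: trim each read, filter, then emit FASTA records."""
    trimmed = (trim_tail(r, min_q) for r in parse_fastq(lines))
    return [to_fasta(r) for r in trimmed if passes_filter(r, min_mean_q, min_len)]


if __name__ == "__main__":
    example = [
        "@r1", "ACGTAC", "+", "IIII##",  # good read with a low-quality tail
        "@r2", "ACGT", "+", "####",      # uniformly low-quality read
    ]
    for record in quality_control(example):
        print(record)
```

Each step acts on one read at a time with no shared state, which is the property that lets per-read operations of this kind be distributed across the nodes of a cluster.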
Keywords
Big data
Next-generation sequencing (NGS)
Bioinformatics
Quality control
Apache Spark
Publisher's version
Rights
Attribution 4.0 International (CC BY 4.0)
ISSN
2169-3536