SeQual: Big Data Tool to Perform Quality Control and Data Preprocessing of Large NGS Datasets
dc.contributor.author | Expósito, Roberto R. | |
dc.contributor.author | Galego Torreiro, Roi | |
dc.contributor.author | González-Domínguez, Jorge | |
dc.date.accessioned | 2020-09-30T14:41:22Z | |
dc.date.available | 2020-09-30T14:41:22Z | |
dc.date.issued | 2020-08-07 | |
dc.identifier.citation | R. R. Expósito, R. Galego-Torreiro and J. González-Domínguez, "SeQual: Big Data Tool to Perform Quality Control and Data Preprocessing of Large NGS Datasets," in IEEE Access, vol. 8, pp. 146075-146084, 2020, doi: 10.1109/ACCESS.2020.3015016. | es_ES |
dc.identifier.issn | 2169-3536 | |
dc.identifier.uri | http://hdl.handle.net/2183/26270 | |
dc.description.abstract | [Abstract] This paper presents SeQual, a scalable tool to efficiently perform quality control of large genomic datasets. Our tool currently supports more than 30 different operations (e.g., filtering, trimming, formatting) that can be applied to DNA/RNA reads in FASTQ/FASTA formats to improve subsequent downstream analyses, while providing a simple and user-friendly graphical interface for non-expert users. Furthermore, SeQual takes full advantage of Big Data technologies to process massive datasets on distributed-memory systems such as clusters by relying on the open-source Apache Spark cluster computing framework. Our scalable Spark-based implementation reduces the runtime from more than three hours to less than 20 minutes when processing a paired-end dataset with 251 million reads per input file on an 8-node multi-core cluster. | es_ES |
dc.description.sponsorship | 10.13039/501100004837-Ministry of Science and Innovation of Spain (Grant Number: TIN2016-75845-P and PID2019-104184RB-I00) 10.13039/501100004837-AEI/FEDER/EU (Grant Number: 10.13039/501100011033) 10.13039/501100010801-Xunta de Galicia and FEDER funds (Centro de Investigación de Galicia accreditation 2019–2022 and the Consolidation Program of Competitive Reference Groups) (Grant Number: ED431G 2019/01 and ED431C 2017/04) | es_ES |
dc.description.sponsorship | Xunta de Galicia; ED431G 2019/01 | es_ES |
dc.description.sponsorship | Xunta de Galicia; ED431C 2017/04 | es_ES |
dc.language.iso | eng | es_ES |
dc.publisher | Institute of Electrical and Electronics Engineers | es_ES |
dc.relation | info:eu-repo/grantAgreement/MINECO/Plan Estatal de Investigación Científica y Técnica y de Innovación 2013-2016/TIN2016-75845-P/ES/NUEVOS DESAFIOS EN COMPUTACION DE ALTAS PRESTACIONES: DESDE ARQUITECTURAS HASTA APLICACIONES (II) | |
dc.relation | info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2017-2020/PID2019-104184RB-I00/ES/DESAFIOS ACTUALES EN HPC: ARQUITECTURAS, SOFTWARE Y APLICACIONES | |
dc.relation.uri | https://doi.org/10.1109/ACCESS.2020.3015016 | es_ES |
dc.rights | Attribution 4.0 International (CC BY 4.0) | es_ES |
dc.rights.uri | https://creativecommons.org/licenses/by/4.0/ | * |
dc.subject | Big data | es_ES |
dc.subject | Next-generation sequencing (NGS) | es_ES |
dc.subject | Bioinformatics | es_ES |
dc.subject | Quality control | es_ES |
dc.subject | Apache Spark | es_ES |
dc.title | SeQual: Big Data Tool to Perform Quality Control and Data Preprocessing of Large NGS Datasets | es_ES |
dc.type | info:eu-repo/semantics/article | es_ES |
dc.rights.access | info:eu-repo/semantics/openAccess | es_ES |
UDC.journalTitle | IEEE Access | es_ES |
UDC.volume | 8 | es_ES |
UDC.startPage | 146075 | es_ES |
UDC.endPage | 146084 | es_ES |
dc.identifier.doi | 10.1109/ACCESS.2020.3015016 | |
This item appears in the following collection(s)
- GI-GAC - Artigos