dc.contributor.author: Expósito, Roberto R.
dc.contributor.author: Galego Torreiro, Roi
dc.contributor.author: González-Domínguez, Jorge
dc.date.accessioned: 2020-09-30T14:41:22Z
dc.date.available: 2020-09-30T14:41:22Z
dc.date.issued: 2020-08-07
dc.identifier.citation: R. R. Expósito, R. Galego-Torreiro and J. González-Domínguez, "SeQual: Big Data Tool to Perform Quality Control and Data Preprocessing of Large NGS Datasets," in IEEE Access, vol. 8, pp. 146075-146084, 2020, doi: 10.1109/ACCESS.2020.3015016. [es_ES]
dc.identifier.issn: 2169-3536
dc.identifier.uri: http://hdl.handle.net/2183/26270
dc.description.abstract: [Abstract] This paper presents SeQual, a scalable tool to efficiently perform quality control of large genomic datasets. Our tool currently supports more than 30 different operations (e.g., filtering, trimming, formatting) that can be applied to DNA/RNA reads in FASTQ/FASTA formats to improve subsequent downstream analyses, while providing a simple and user-friendly graphical interface for non-expert users. Furthermore, SeQual takes full advantage of Big Data technologies to process massive datasets on distributed-memory systems such as clusters by relying on the open-source Apache Spark cluster computing framework. Our scalable Spark-based implementation allows to reduce the runtime from more than three hours to less than 20 minutes when processing a paired-end dataset with 251 million reads per input file on an 8-node multi-core cluster. [es_ES]
dc.description.sponsorship: 10.13039/501100004837-Ministry of Science and Innovation of Spain (Grant Number: TIN2016-75845-P and PID2019-104184RB-I00) 10.13039/501100004837-AEI/FEDER/EU (Grant Number: 10.13039/501100011033) 10.13039/501100010801-Xunta de Galicia and FEDER funds (Centro de Investigación de Galicia accreditation 2019–2022 and the Consolidation Program of Competitive Reference Groups) (Grant Number: ED431G 2019/01 and ED431C 2017/04) [es_ES]
dc.description.sponsorship: Xunta de Galicia; ED431G 2019/01 [es_ES]
dc.description.sponsorship: Xunta de Galicia; ED431C 2017/04 [es_ES]
dc.language.iso: eng [es_ES]
dc.publisher: Institute of Electrical and Electronics Engineers [es_ES]
dc.relation: info:eu-repo/grantAgreement/MINECO/Plan Estatal de Investigación Científica y Técnica y de Innovación 2013-2016/TIN2016-75845-P/ES/NUEVOS DESAFIOS EN COMPUTACION DE ALTAS PRESTACIONES: DESDE ARQUITECTURAS HASTA APLICACIONES (II)
dc.relation: info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2017-2020/PID2019-104184RB-I00/ES/DESAFIOS ACTUALES EN HPC: ARQUITECTURAS, SOFTWARE Y APLICACIONES
dc.relation.uri: https://doi.org/10.1109/ACCESS.2020.3015016 [es_ES]
dc.rights: Atribución 4.0 Internacional (CC BY 4.0) [es_ES]
dc.rights.uri: https://creativecommons.org/licenses/by/4.0/ [*]
dc.subject: Big data [es_ES]
dc.subject: Next-generation sequencing (NGS) [es_ES]
dc.subject: Bioinformatics [es_ES]
dc.subject: Quality control [es_ES]
dc.subject: Apache spark [es_ES]
dc.title: SeQual: Big Data Tool to Perform Quality Control and Data Preprocessing of Large NGS Datasets [es_ES]
dc.type: info:eu-repo/semantics/article [es_ES]
dc.rights.access: info:eu-repo/semantics/openAccess [es_ES]
UDC.journalTitle: IEEE Access [es_ES]
UDC.volume: 8 [es_ES]
UDC.startPage: 146075 [es_ES]
UDC.endPage: 146084 [es_ES]
dc.identifier.doi: 10.1109/ACCESS.2020.3015016

