HSRA: Hadoop-based spliced read aligner for RNA sequencing data

Expósito, Roberto R.; González-Domínguez, Jorge; Touriño, Juan

dc.contributor.author	Expósito, Roberto R.
dc.contributor.author	González-Domínguez, Jorge
dc.contributor.author	Touriño, Juan
dc.date.accessioned	2019-02-15T18:23:34Z
dc.date.available	2019-02-15T18:23:34Z
dc.date.issued	2018-07-31
dc.identifier.citation	Expósito RR, González-Domínguez J, Touriño J (2018) HSRA: Hadoop-based spliced read aligner for RNA sequencing data. PLoS ONE 13(7): e0201483. https://doi.org/10.1371/journal.pone.0201483	es_ES
dc.identifier.issn	1932-6203
dc.identifier.uri	http://hdl.handle.net/2183/21813
dc.description.abstract	[Abstract] Nowadays, the analysis of transcriptome sequencing (RNA-seq) data has become the standard method for quantifying the levels of gene expression. In RNA-seq experiments, the mapping of short reads to a reference genome or transcriptome is considered a crucial step that remains one of the most time-consuming. With the steady development of Next Generation Sequencing (NGS) technologies, unprecedented amounts of genomic data introduce significant challenges in terms of storage, processing and downstream analysis. As cost and throughput continue to improve, there is a growing need for new software solutions that minimize the impact of increasing data volume on RNA read alignment. In this work we introduce HSRA, a Big Data tool that takes advantage of the MapReduce programming model to extend the multithreading capabilities of a state-of-the-art spliced read aligner for RNA-seq data (HISAT2) to distributed memory systems such as multi-core clusters or cloud platforms. HSRA has been built upon the Hadoop MapReduce framework and supports both single- and paired-end reads from FASTQ/FASTA datasets, providing output alignments in SAM format. The design of HSRA has been carefully optimized to avoid the main limitations and major causes of inefficiency found in previous Big Data mapping tools, which cannot fully exploit the raw performance of the underlying aligner. On a 16-node multi-core cluster, HSRA is on average 2.3 times faster than previous Hadoop-based tools. Source code in Java as well as a user’s guide are publicly available for download at http://hsra.dec.udc.es.	es_ES
dc.description.sponsorship	Ministerio de Economía, Industria y Competitividad; TIN2016-75845-P	es_ES
dc.description.sponsorship	Xunta de Galicia; ED431G/01	es_ES
dc.language.iso	eng	es_ES
dc.publisher	Public Library of Science	es_ES
dc.relation.uri	https://doi.org/10.1371/journal.pone.0201483	es_ES
dc.subject	Sequence alignment	es_ES
dc.subject	Data processing	es_ES
dc.subject	Genome analysis	es_ES
dc.subject	RNA	es_ES
dc.subject	Sequencing	es_ES
dc.subject	Memory	es_ES
dc.subject	RNA analysis	es_ES
dc.subject	Preprocessing	es_ES
dc.subject	RNA alignment	es_ES
dc.title	HSRA: Hadoop-based spliced read aligner for RNA sequencing data	es_ES
dc.type	info:eu-repo/semantics/article	es_ES
dc.rights.access	info:eu-repo/semantics/openAccess	es_ES
UDC.journalTitle	PLoS ONE	es_ES
UDC.volume	13	es_ES
UDC.issue	7	es_ES
dc.identifier.doi	10.1371/journal.pone.0201483

Files in this item

Name: R.R.Exósito_2018_HSRA_Hadoop-b ...
Size: 3.934Mb
Format: PDF

This item appears in the following collection(s)

GI-GAC - Artigos [192]
