Showing items 1-10 of 16

Communication avoiding and overlapping for numerical linear algebra

Georganas, Evangelos; González-Domínguez, Jorge; Solomonik, Edgar; Zheng, Yili; Touriño, Juan; Yelick, Katherine (IEEE Computer Society, 2013-02-25)

[Abstract] To efficiently scale dense linear algebra problems to future exascale systems, communication cost must be avoided or overlapped. Communication-avoiding 2.5D algorithms improve scalability by reducing inter-processor ...

Parallel Pairwise Epistasis Detection on Heterogeneous Computing Architectures

González-Domínguez, Jorge; Ramos Garea, Sabela; Touriño, Juan; Schmidt, Bertil (Institute of Electrical and Electronics Engineers, 2016-08)

[Abstract] Development of new methods to detect pairwise epistasis, such as SNP-SNP interactions, in Genome-Wide Association Studies is an important task in bioinformatics as they can help to explain genetic influences on ...

MarDRe: efficient MapReduce-based removal of duplicate DNA reads in the cloud

Expósito, Roberto R.; Veiga, Jorge; González-Domínguez, Jorge; Touriño, Juan (Oxford University Press, 2017)

[Abstract] This article presents MarDRe, a de novo cloud-ready duplicate and near-duplicate removal tool that can process single- and paired-end reads from FASTQ/FASTA datasets. MarDRe takes advantage of the widely adopted ...

Acceleration of a Feature Selection Algorithm Using High Performance Computing

Beceiro, Bieito; González-Domínguez, Jorge; Touriño, Juan (MDPI AG, 2020-09-01)

[Abstract] Feature selection is a subfield of data analysis that focuses on reducing the dimensionality of datasets, so that subsequent analyses over them can be performed in affordable execution times while keeping the same ...

A 2D algorithm with asymmetric workload for the UPC conjugate gradient method

González-Domínguez, Jorge; Marques, Osni A.; Martín, María J.; Touriño, Juan (Springer New York LLC, 2014)

[Abstract] This paper examines four different strategies, each one with its own data distribution, for implementing the parallel conjugate gradient (CG) method and how they impact communication and overall performance. ...

HSRA: Hadoop-based spliced read aligner for RNA sequencing data

Expósito, Roberto R.; González-Domínguez, Jorge; Touriño, Juan (Public Library of Science, 2018-07-31)

[Abstract] Nowadays, the analysis of transcriptome sequencing (RNA-seq) data has become the standard method for quantifying the levels of gene expression. In RNA-seq experiments, the mapping of short reads to a reference ...

MSAProbs-MPI: parallel multiple sequence aligner for distributed-memory systems

González-Domínguez, Jorge; Liu, Yongchao; Touriño, Juan; Schmidt, Bertil (Oxford University Press, 2016)

[Abstract] MSAProbs is a state-of-the-art protein multiple sequence alignment tool based on hidden Markov models. It can achieve high alignment accuracy at the expense of relatively long runtimes for large-scale input ...

SMusket: Spark-based DNA error correction on distributed-memory systems

Expósito, Roberto R.; González-Domínguez, Jorge; Touriño, Juan (Elsevier B.V., 2020)

[Abstract] Next-Generation Sequencing (NGS) technologies have revolutionized genomics research over the last decade, bringing new opportunities for scientists to perform groundbreaking biological studies. Error correction ...

Parallel feature selection for distributed-memory clusters

González-Domínguez, Jorge; Bolón-Canedo, Verónica; Freire, Borja; Touriño, Juan (2019)

[Abstract] Feature selection is nowadays an extremely important data mining stage in the field of machine learning due to the appearance of problems of high dimensionality. In the literature there are numerous feature ...

Multithreaded and Spark parallelization of feature selection filters

Eiras-Franco, Carlos; Bolón-Canedo, Verónica; Ramos Garea, Sabela; González-Domínguez, Jorge; Alonso-Betanzos, Amparo; Touriño, Juan (2016)

[Abstract] Vast amounts of data are generated every day, constituting a volume that is challenging to analyze. Techniques such as feature selection are advisable when tackling large datasets. Among the tools that provide ...