    • General‐purpose computation on GPUs for high performance cloud computing 

      Expósito, Roberto R.; Taboada, Guillermo L.; Ramos Garea, Sabela; Touriño, Juan; Doallo, Ramón (John Wiley & Sons Ltd., 2013-08)
      [Abstract] Cloud computing is offering new approaches for High Performance Computing (HPC) as it provides dynamically scalable resources as a service over the Internet. In addition, General‐Purpose computation on Graphical ...
    • HSRA: Hadoop-based spliced read aligner for RNA sequencing data 

      Expósito, Roberto R.; González-Domínguez, Jorge; Touriño, Juan (Public Library of Science, 2018-07-31)
      [Abstract] Nowadays, the analysis of transcriptome sequencing (RNA-seq) data has become the standard method for quantifying the levels of gene expression. In RNA-seq experiments, the mapping of short reads to a reference ...
    • Java in the High Performance Computing arena: Research, practice and experience 

      Taboada, Guillermo L.; Ramos Garea, Sabela; Expósito, Roberto R.; Touriño, Juan; Doallo, Ramón (Elsevier BV, 2013-05-01)
      [Abstract] The rising interest in Java for High Performance Computing (HPC) is based on the appealing features of this language for programming multi-core cluster architectures, particularly the built-in networking and ...
    • Low‐latency Java communication devices on RDMA‐enabled networks 

      Expósito, Roberto R.; Taboada, Guillermo L.; Ramos Garea, Sabela; Touriño, Juan; Doallo, Ramón (John Wiley & Sons Ltd., 2015)
      [Abstract] Providing high‐performance inter‐node communication is a key capability for running high performance computing applications efficiently on parallel architectures. In fact, current systems deployments are aggregating ...
    • MarDRe: efficient MapReduce-based removal of duplicate DNA reads in the cloud 

      Expósito, Roberto R.; Veiga, Jorge; González-Domínguez, Jorge; Touriño, Juan (Oxford University Press, 2017)
      [Abstract] This article presents MarDRe, a de novo cloud-ready duplicate and near-duplicate removal tool that can process single- and paired-end reads from FASTQ/FASTA datasets. MarDRe takes advantage of the widely adopted ...
    • MPI-dot2dot: A Parallel Tool to Find DNA Tandem Repeats on Multicore Clusters 

      González-Domínguez, Jorge; Martín Martínez, José Manuel; Expósito, Roberto R. (Springer, 2022)
      [Abstract] Tandem Repeats (TRs) are segments that occur several times in a DNA sequence, and each copy is adjacent to the other. In the last few years, TRs have gained significant attention as they are thought to be related ...
    • MREv: An Automatic MapReduce Evaluation Tool for Big Data Workloads 

      Veiga, Jorge; Expósito, Roberto R.; Taboada, Guillermo L.; Touriño, Juan (Elsevier, 2015)
      [Abstract] The popularity of Big Data computing models like MapReduce has caused the emergence of many frameworks oriented to High Performance Computing (HPC) systems. The suitability of each one to a particular use case ...
    • Nonblocking collectives for scalable Java communications 

      Ramos Garea, Sabela; Taboada, Guillermo L.; Expósito, Roberto R.; Touriño, Juan (John Wiley & Sons Ltd., 2015-04-22)
      [Abstract] This paper presents a Java implementation of the recently published MPI 3.0 nonblocking message passing collectives in order to analyze and assess the feasibility of taking advantage of these operations in shared ...
    • Optimization of Real-World MapReduce Applications With Flame-MR: Practical Use Cases 

      Veiga, Jorge; Expósito, Roberto R.; Raffin, Bruno; Touriño, Juan (Institute of Electrical and Electronics Engineers, 2018-11-12)
      [Abstract] Apache Hadoop is a widely used MapReduce framework for storing and processing large amounts of data. However, it presents some performance issues that hinder its utilization in many practical use cases. Although ...
    • ParBiBit: Parallel tool for binary biclustering on modern distributed-memory systems 

      González-Domínguez, Jorge; Expósito, Roberto R. (PLoS, 2018)
      [Abstract] Biclustering techniques are gaining attention in the analysis of large-scale datasets as they identify two-dimensional submatrices where both rows and columns are correlated. In this work we present ParBiBit, ...
    • Performance analysis of HPC applications in the cloud 

      Expósito, Roberto R.; Taboada, Guillermo L.; Ramos Garea, Sabela; Touriño, Juan; Doallo, Ramón (Elsevier BV * North-Holland, 2013-01)
      [Abstract] The scalability of High Performance Computing (HPC) applications depends heavily on the efficient support of network communications in virtualized environments. However, Infrastructure as a Service (IaaS) providers ...
    • Performance Evaluation of Big Data Frameworks for Large-Scale Data Analytics 

      Veiga, Jorge; Expósito, Roberto R.; Pardo, Xoán C.; Taboada, Guillermo L.; Touriño, Juan (IEEE Computer Society, 2017-02-06)
      [Abstract] The increasing adoption of Big Data analytics has led to a high demand for efficient technologies in order to manage and process large datasets. Popular MapReduce frameworks such as Hadoop are being replaced by ...
    • Performance Evaluation of Data-Intensive Computing Applications on a Public IaaS Cloud 

      Expósito, Roberto R.; Taboada, Guillermo L.; Ramos Garea, Sabela; Touriño, Juan; Doallo, Ramón (Oxford University Press, 2016)
      [Abstract] The advent of cloud computing technologies, which dynamically provide on-demand access to computational resources over the Internet, is offering new possibilities to many scientists and researchers. Nowadays, ...
    • Performance Optimization of a Parallel Error Correction Tool 

      Martínez-Sánchez, Marco; Expósito, Roberto R.; Touriño, Juan (MDPI, 2021)
      [Abstract] Due to the continuous development in the field of Next Generation Sequencing (NGS) technologies that have allowed researchers to take advantage of greater genetic samples in less time, it is a matter of relevance ...
    • Power Budgeting of Big Data Applications in Container-based Clusters 

      Enes, Jonatan; Fieni, Guillaume; Expósito, Roberto R.; Rouvoy, Romain; Touriño, Juan (Institute of Electrical and Electronics Engineers, 2020-11-02)
      [Abstract] Energy consumption is currently highly regarded on computing systems for many reasons, such as improving the environmental impact and reducing operational costs considering the rising price of energy. Previous ...
    • Real-time resource scaling platform for Big Data workloads on serverless environments 

      Enes, Jonatan; Expósito, Roberto R.; Touriño, Juan (2020)
      [Abstract] The serverless execution paradigm is becoming an increasingly popular option when workloads are to be deployed in an abstracted way, more specifically, without specifying any infrastructure requirements. Currently, such ...
    • RGen: Data Generator for Benchmarking Big Data Workloads 

      Pérez-Jove, Rubén; Expósito, Roberto R.; Touriño, Juan (MDPI, 2021)
      [Abstract] This paper presents RGen, a parallel data generator for benchmarking Big Data workloads, which integrates existing features and new functionalities in a standalone tool. The main functionalities developed in ...
    • Running scientific codes on Amazon EC2: a performance analysis of five high-end instances 

      Expósito, Roberto R.; Taboada, Guillermo L.; Pardo, Xoán C.; Touriño, Juan; Doallo, Ramón (Springer New York LLC, 2013)
      [Abstract] Amazon Web Services (AWS) is a well-known public Infrastructure-as-a-Service (IaaS) provider whose Elastic Compute Cloud (EC2) offering includes some instances, known as cluster instances, aimed at High-Performance ...
    • SeQual-Stream: approaching stream processing to quality control of NGS datasets 

      Castellanos Rodríguez, Óscar; Expósito, Roberto R.; Touriño, Juan (BMC, 2023-10)
      [Abstract] Background: Quality control of DNA sequences is an important data preprocessing step in many genomic analyses. However, all existing parallel tools for this purpose are based on a batch processing model, ...
    • SeQual: Big Data Tool to Perform Quality Control and Data Preprocessing of Large NGS Datasets 

      Expósito, Roberto R.; Galego Torreiro, Roi; González-Domínguez, Jorge (Institute of Electrical and Electronics Engineers, 2020-08-07)
      [Abstract] This paper presents SeQual, a scalable tool to efficiently perform quality control of large genomic datasets. Our tool currently supports more than 30 different operations (e.g., filtering, trimming, formatting) ...