• Accelerating the quality control of genetic sequences through stream processing 

      Castellanos Rodríguez, Óscar; Expósito, Roberto R.; Touriño, Juan (Association for Computing Machinery, 2023)
      [Abstract]: Quality control of DNA sequences is an important data preprocessing step in many genomic analyses. However, all existing parallel tools for this purpose are based on a batch processing model, needing to have ...
    • Analysis and evaluation of MapReduce solutions on an HPC cluster 

      Veiga, Jorge; Expósito, Roberto R.; Taboada, Guillermo L.; Touriño, Juan (Pergamon Press, 2016-02)
      [Abstract] The ever growing needs of Big Data applications are demanding challenging capabilities which cannot be handled easily by traditional systems, and thus more and more organizations are adopting High Performance ...
    • BDWatchdog: real-time monitoring and profiling of Big Data applications and frameworks 

      Enes, Jonatan; Expósito, Roberto R.; Touriño, Juan (Elsevier BV * North-Holland, 2018-10)
      [Abstract] Current Big Data applications are characterized by a heavy use of system resources (e.g., CPU, disk) generally distributed across a cluster. To effectively improve their performance there is a critical need for ...
    • Big Data-Oriented PaaS Architecture with Disk-as-a-Resource Capability and Container-Based Virtualization 

      López Cacheiro, Javier; Expósito, Roberto R.; Touriño, Juan; Enes, Jonatan (Springer Netherlands, 2018-12)
      [Abstract] With the increasing adoption of Big Data technologies as basic tools for the ongoing Digital Transformation, there is a high demand for data-intensive applications. In order to efficiently execute such applications, ...
    • Enabling Hardware Affinity in JVM-Based Applications: A Case Study for Big Data 

      Expósito, Roberto R.; Veiga, Jorge; Touriño, Juan (Springer, 2020)
      [Abstract]: Java has been the backbone of Big Data processing for more than a decade due to its interesting features such as object orientation, cross-platform portability and good programming productivity. In fact, most ...
    • Enhancing in-memory Efficiency for MapReduce-based Data Processing 

      Veiga Fachal, Jorge; Expósito, Roberto R.; Taboada, Guillermo L.; Touriño, Juan (Academic Press, 2018-10)
      [Abstract] As the memory capacity of computational systems increases, the in-memory data management of Big Data processing frameworks becomes more crucial for performance. This paper analyzes and improves the memory ...
    • Flame-MR: An event-driven architecture for MapReduce applications 

      Veiga, Jorge; Expósito, Roberto R.; Taboada, Guillermo L.; Touriño, Juan (Elsevier BV * North-Holland, 2016)
      [Abstract] Nowadays, many organizations analyze their data with the MapReduce paradigm, most of them using the popular Apache Hadoop framework. As the data size managed by MapReduce applications is steadily increasing, the ...
    • MarDRe: efficient MapReduce-based removal of duplicate DNA reads in the cloud 

      Expósito, Roberto R.; Veiga, Jorge; González-Domínguez, Jorge; Touriño, Juan (Oxford University Press, 2017)
      [Abstract] This article presents MarDRe, a de novo cloud-ready duplicate and near-duplicate removal tool that can process single- and paired-end reads from FASTQ/FASTA datasets. MarDRe takes advantage of the widely adopted ...
    • MREv: An Automatic MapReduce Evaluation Tool for Big Data Workloads 

      Veiga, Jorge; Expósito, Roberto R.; Taboada, Guillermo L.; Touriño, Juan (Elsevier, 2015)
      [Abstract]: The popularity of Big Data computing models like MapReduce has caused the emergence of many frameworks oriented to High Performance Computing (HPC) systems. The suitability of each one to a particular use case ...
    • Performance Evaluation of Big Data Frameworks for Large-Scale Data Analytics 

      Veiga, Jorge; Expósito, Roberto R.; Pardo, Xoán C.; Taboada, Guillermo L.; Touriño, Juan (IEEE Computer Society, 2017-02-06)
      [Abstract] The increasing adoption of Big Data analytics has led to a high demand for efficient technologies in order to manage and process large datasets. Popular MapReduce frameworks such as Hadoop are being replaced by ...
    • Performance Optimization of a Parallel Error Correction Tool 

      Martínez-Sánchez, Marco; Expósito, Roberto R.; Touriño, Juan (MDPI, 2021)
      [Abstract] Due to the continuous development in the field of Next Generation Sequencing (NGS) technologies that have allowed researchers to take advantage of greater genetic samples in less time, it is a matter of relevance ...
    • Power Budgeting of Big Data Applications in Container-based Clusters 

      Enes, Jonatan; Fieni, Guillaume; Expósito, Roberto R.; Rouvoy, Romain; Touriño, Juan (Institute of Electrical and Electronics Engineers, 2020-11-02)
      [Abstract] Energy consumption is currently highly regarded on computing systems for many reasons, such as improving the environmental impact and reducing operational costs considering the rising price of energy. Previous ...
    • Real-time resource scaling platform for Big Data workloads on serverless environments 

      Enes, Jonatan; Expósito, Roberto R.; Touriño, Juan (2020)
      The serverless execution paradigm is becoming an increasingly popular option when workloads are to be deployed in an abstracted way, more specifically, without specifying any infrastructure requirements. Currently, such ...
    • RGen: Data Generator for Benchmarking Big Data Workloads 

      Pérez-Jove, Rubén; Expósito, Roberto R.; Touriño, Juan (MDPI, 2021)
      [Abstract] This paper presents RGen, a parallel data generator for benchmarking Big Data workloads, which integrates existing features and new functionalities in a standalone tool. The main functionalities developed in ...
    • Serverless-like platform for container-based YARN clusters 

      Castellanos Rodríguez, Óscar; Expósito, Roberto R.; Enes, Jonatan; Taboada, Guillermo L.; Touriño, Juan (Elsevier, 2024-06)
      [Abstract]: Serverless computing is an emerging paradigm that has gained a lot of relevance in recent years, as it allows users to consume computing resources without worrying about the underlying infrastructure and pay ...
    • SMusket: Spark-based DNA error correction on distributed-memory systems 

      Expósito, Roberto R.; González-Domínguez, Jorge; Touriño, Juan (Elsevier B.V., 2020)
      [Abstract]: Next-Generation Sequencing (NGS) technologies have revolutionized genomics research over the last decade, bringing new opportunities for scientists to perform groundbreaking biological studies. Error correction ...