    • Analysis and evaluation of MapReduce solutions on an HPC cluster

      Veiga, Jorge; Expósito, Roberto R.; Taboada, Guillermo L.; Touriño, Juan (Pergamon Press, 2016-02)
      [Abstract] The ever growing needs of Big Data applications are demanding challenging capabilities which cannot be handled easily by traditional systems, and thus more and more organizations are adopting High Performance ...
    • BDWatchdog: real-time monitoring and profiling of Big Data applications and frameworks 

      Enes, Jonatan; Expósito, Roberto R.; Touriño, Juan (Elsevier BV * North-Holland, 2018-10)
      [Abstract] Current Big Data applications are characterized by a heavy use of system resources (e.g., CPU, disk) generally distributed across a cluster. To effectively improve their performance there is a critical need for ...
    • Big Data-Oriented PaaS Architecture with Disk-as-a-Resource Capability and Container-Based Virtualization 

      López Cacheiro, Javier; Expósito, Roberto R.; Touriño, Juan; Enes, Jonatan (Springer Netherlands, 2018-12)
      [Abstract] With the increasing adoption of Big Data technologies as basic tools for the ongoing Digital Transformation, there is a high demand for data-intensive applications. In order to efficiently execute such applications, ...
    • Enhancing in-memory Efficiency for MapReduce-based Data Processing 

      Veiga Fachal, Jorge; Expósito, Roberto R.; Taboada, Guillermo L.; Touriño, Juan (Academic Press, 2018-10)
      [Abstract] As the memory capacity of computational systems increases, the in-memory data management of Big Data processing frameworks becomes more crucial for performance. This paper analyzes and improves the memory ...
    • Flame-MR: An event-driven architecture for MapReduce applications 

      Veiga, Jorge; Expósito, Roberto R.; Taboada, Guillermo L.; Touriño, Juan (Elsevier BV * North-Holland, 2016)
      [Abstract] Nowadays, many organizations analyze their data with the MapReduce paradigm, most of them using the popular Apache Hadoop framework. As the data size managed by MapReduce applications is steadily increasing, the ...
    • MarDRe: efficient MapReduce-based removal of duplicate DNA reads in the cloud 

      Expósito, Roberto R.; Veiga, Jorge; González-Domínguez, Jorge; Touriño, Juan (Oxford University Press, 2017)
      [Abstract] This article presents MarDRe, a de novo cloud-ready duplicate and near-duplicate removal tool that can process single- and paired-end reads from FASTQ/FASTA datasets. MarDRe takes advantage of the widely adopted ...
    • Real-time resource scaling platform for Big Data workloads on serverless environments 

      Enes, Jonatan; Expósito, Roberto R.; Touriño, Juan (2020)
      [Abstract] The serverless execution paradigm is becoming an increasingly popular option when workloads are to be deployed in an abstracted way, more specifically, without specifying any infrastructure requirements. Currently, such ...
    • Serverless-like platform for container-based YARN clusters 

      Castellanos Rodríguez, Óscar; Expósito, Roberto R.; Enes, Jonatan; Taboada, Guillermo L.; Touriño, Juan (Elsevier, 2024-06)
      [Abstract] Serverless computing is an emerging paradigm that has gained a lot of relevance in recent years, as it allows users to consume computing resources without worrying about the underlying infrastructure and pay ...
    • SMusket: Spark-based DNA error correction on distributed-memory systems 

      Expósito, Roberto R.; González-Domínguez, Jorge; Touriño, Juan (Elsevier B.V., 2020)
      [Abstract] Next-Generation Sequencing (NGS) technologies have revolutionized genomics research over the last decade, bringing new opportunities for scientists to perform groundbreaking biological studies. Error correction ...