Showing items 1-10 of 28

Optimization of Real-World MapReduce Applications With Flame-MR: Practical Use Cases

Veiga, Jorge; Expósito, Roberto R.; Raffin, Bruno; Touriño, Juan (Institute of Electrical and Electronics Engineers, 2018-11-12)

[Abstract] Apache Hadoop is a widely used MapReduce framework for storing and processing large amounts of data. However, it presents some performance issues that hinder its utilization in many practical use cases. Although ...

Big Data-Oriented PaaS Architecture with Disk-as-a-Resource Capability and Container-Based Virtualization

López Cacheiro, Javier; Expósito, Roberto R.; Touriño, Juan; Enes, Jonatan (Springer Netherlands, 2018-12)

[Abstract] With the increasing adoption of Big Data technologies as basic tools for the ongoing Digital Transformation, there is a high demand for data-intensive applications. In order to efficiently execute such applications, ...

BDEv 3.0: energy efficiency and microarchitectural characterization of Big Data processing frameworks

Veiga, Jorge; Enes, Jonatan; Expósito, Roberto R.; Touriño, Juan (Elsevier BV * North-Holland, 2018-09)

[Abstract] As the size of Big Data workloads keeps increasing, the evaluation of distributed frameworks becomes a crucial task in order to identify potential performance bottlenecks that may delay the processing of large ...

BDWatchdog: real-time monitoring and profiling of Big Data applications and frameworks

Enes, Jonatan; Expósito, Roberto R.; Touriño, Juan (Elsevier BV * North-Holland, 2018-10)

[Abstract] Current Big Data applications are characterized by a heavy use of system resources (e.g., CPU, disk) generally distributed across a cluster. To effectively improve their performance there is a critical need for ...

MarDRe: efficient MapReduce-based removal of duplicate DNA reads in the cloud

Expósito, Roberto R.; Veiga, Jorge; González-Domínguez, Jorge; Touriño, Juan (Oxford University Press, 2017)

[Abstract] This article presents MarDRe, a de novo cloud-ready duplicate and near-duplicate removal tool that can process single- and paired-end reads from FASTQ/FASTA datasets. MarDRe takes advantage of the widely adopted ...

HSRA: Hadoop-based spliced read aligner for RNA sequencing data

Expósito, Roberto R.; González-Domínguez, Jorge; Touriño, Juan (Public Library of Science, 2018-07-31)

[Abstract] Nowadays, the analysis of transcriptome sequencing (RNA-seq) data has become the standard method for quantifying the levels of gene expression. In RNA-seq experiments, the mapping of short reads to a reference ...

A pipeline architecture for feature-based unsupervised clustering using multivariate time series from HPC jobs

Enes, Jonatan; Expósito, Roberto R.; Fuentes Rodríguez, Jose; López Cacheiro, Javier; Touriño, Juan (Elsevier B.V., 2023-05)

[Abstract] Time series are key across industrial and research areas for their ability to model behaviour across time, making them ideal for a wide range of use cases such as event monitoring, trend prediction or anomaly ...

SparkEC: speeding up alignment-based DNA error correction tools

Expósito, Roberto R.; Martínez-Sánchez, Marco; Touriño, Juan (BioMed Central (Springer), 2022)

[Abstract] In recent years, huge improvements have been made in the context of sequencing genomic data under what is called Next Generation Sequencing (NGS). However, the DNA reads generated by current NGS platforms are ...

SMusket: Spark-based DNA error correction on distributed-memory systems

Expósito, Roberto R.; González-Domínguez, Jorge; Touriño, Juan (Elsevier B.V., 2020)

[Abstract] Next-Generation Sequencing (NGS) technologies have revolutionized genomics research over the last decade, bringing new opportunities for scientists to perform groundbreaking biological studies. Error correction ...

Real-time resource scaling platform for Big Data workloads on serverless environments

Enes, Jonatan; Expósito, Roberto R.; Touriño, Juan (2020)

[Abstract] The serverless execution paradigm is becoming an increasingly popular option when workloads are to be deployed in an abstracted way, more specifically, without specifying any infrastructure requirements. Currently, such ...