Showing items 1-10 of 13
SeQual: Big Data Tool to Perform Quality Control and Data Preprocessing of Large NGS Datasets
(Institute of Electrical and Electronics Engineers, 2020-08-07)
[Abstract] This paper presents SeQual, a scalable tool to efficiently perform quality control of large genomic datasets. Our tool currently supports more than 30 different operations (e.g., filtering, trimming, formatting) ...
Power Budgeting of Big Data Applications in Container-based Clusters
(Institute of Electrical and Electronics Engineers, 2020-11-02)
[Abstract] Energy consumption is currently a major concern in computing systems for many reasons, such as reducing environmental impact and lowering operational costs given the rising price of energy. Previous ...
RGen: Data Generator for Benchmarking Big Data Workloads
(MDPI, 2021)
[Abstract] This paper presents RGen, a parallel data generator for benchmarking Big Data workloads, which integrates existing features and new functionalities in a standalone tool. The main functionalities developed in ...
Performance Optimization of a Parallel Error Correction Tool
(MDPI, 2021)
[Abstract] Due to the continuous development in the field of Next Generation Sequencing (NGS) technologies, which have allowed researchers to work with larger genetic samples in less time, it is a matter of relevance ...
A pipeline architecture for feature-based unsupervised clustering using multivariate time series from HPC jobs
(Elsevier B.V., 2023-05)
[Abstract] Time series are key across industrial and research areas for their ability to model behaviour across time, making them ideal for a wide range of use cases such as event monitoring, trend prediction or anomaly ...
MPI-dot2dot: A Parallel Tool to Find DNA Tandem Repeats on Multicore Clusters
(Springer, 2022)
[Abstract] Tandem Repeats (TRs) are segments that occur several times in a DNA sequence, with each copy adjacent to the next. In the last few years, TRs have gained significant attention as they are thought to be related ...
SparkEC: speeding up alignment-based DNA error correction tools
(BioMed Central (Springer), 2022)
[Abstract] In recent years, huge improvements have been made in the context of sequencing genomic data under what is called Next Generation Sequencing (NGS). However, the DNA reads generated by current NGS platforms are ...
CUDA-JMI: Acceleration of feature selection on heterogeneous systems
(Elsevier, 2020-01)
[Abstract] Feature selection is a crucial step in machine learning and data analytics to remove irrelevant and redundant features and thus provide fast and reliable analyses. Many research works have ...
SMusket: Spark-based DNA error correction on distributed-memory systems
(Elsevier B.V., 2020)
[Abstract] Next-Generation Sequencing (NGS) technologies have revolutionized genomics research over the last decade, bringing new opportunities for scientists to perform groundbreaking biological studies. Error correction ...
Real-time resource scaling platform for Big Data workloads on serverless environments
(2020)
[Abstract] The serverless execution paradigm is becoming an increasingly popular option when workloads are to be deployed in an abstracted way, more specifically, without specifying any infrastructure requirements. Currently, such ...