Showing items 1-10 of 10
SeQual: Big Data Tool to Perform Quality Control and Data Preprocessing of Large NGS Datasets
(Institute of Electrical and Electronics Engineers, 2020-08-07)
[Abstract]: This paper presents SeQual, a scalable tool to efficiently perform quality control of large genomic datasets. Our tool currently supports more than 30 different operations (e.g., filtering, trimming, formatting) ...
A pipeline architecture for feature-based unsupervised clustering using multivariate time series from HPC jobs
(Elsevier B.V., 2023-05)
[Abstract]: Time series are key across industrial and research areas for their ability to model behaviour across time, making them ideal for a wide range of use cases such as event monitoring, trend prediction or anomaly ...
MPI-dot2dot: A Parallel Tool to Find DNA Tandem Repeats on Multicore Clusters
(Springer, 2022)
[Abstract]: Tandem Repeats (TRs) are segments that occur several times in a DNA sequence, with each copy adjacent to the next. In the last few years, TRs have gained significant attention as they are thought to be related ...
SparkEC: speeding up alignment-based DNA error correction tools
(BioMed Central (Springer), 2022)
[Abstract]: In recent years, huge improvements have been made in the context of sequencing genomic data under what is called Next Generation Sequencing (NGS). However, the DNA reads generated by current NGS platforms are ...
CUDA-JMI: Acceleration of feature selection on heterogeneous systems
(Elsevier, 2020-01)
[Abstract]: Feature selection is a crucial step nowadays in machine learning and data analytics to remove irrelevant and redundant characteristics and thus to provide fast and reliable analyses. Many research works have ...
SMusket: Spark-based DNA error correction on distributed-memory systems
(Elsevier B.V., 2020)
[Abstract]: Next-Generation Sequencing (NGS) technologies have revolutionized genomics research over the last decade, bringing new opportunities for scientists to perform groundbreaking biological studies. Error correction ...
Real-time resource scaling platform for Big Data workloads on serverless environments
(2020)
[Abstract]: The serverless execution paradigm is becoming an increasingly popular option when workloads are to be deployed in an abstracted way, more specifically, without specifying any infrastructure requirements. Currently, such ...
SeQual-Stream: approaching stream processing to quality control of NGS datasets
(BMC, 2023-10)
[Abstract]: Background: Quality control of DNA sequences is an important data preprocessing step in many genomic analyses. However, all existing parallel tools for this purpose are based on a batch processing model, ...
BigDEC: A multi-algorithm Big Data tool based on the k-mer spectrum method for scalable short-read error correction
(Elsevier, 2024-05)
[Abstract]: Despite the significant improvements in both throughput and cost provided by modern Next-Generation Sequencing (NGS) platforms, sequencing errors in NGS datasets can still degrade the quality of downstream ...
Serverless-like platform for container-based YARN clusters
(Elsevier, 2024-06)
[Abstract]: Serverless computing is an emerging paradigm that has gained a lot of relevance in recent years, as it allows users to consume computing resources without worrying about the underlying infrastructure and pay ...