Showing items 1-10 of 11
ParBiBit: Parallel tool for binary biclustering on modern distributed-memory systems
(PLoS, 2018)
[Abstract] Biclustering techniques are gaining attention in the analysis of large-scale datasets as they identify two-dimensional submatrices where both rows and columns are correlated. In this work we present ParBiBit, ...
SeQual: Big Data Tool to Perform Quality Control and Data Preprocessing of Large NGS Datasets
(Institute of Electrical and Electronics Engineers, 2020-08-07)
[Abstract] This paper presents SeQual, a scalable tool to efficiently perform quality control of large genomic datasets. Our tool currently supports more than 30 different operations (e.g., filtering, trimming, formatting) ...
BigDEC: A multi-algorithm Big Data tool based on the k-mer spectrum method for scalable short-read error correction
(Elsevier, 2024-05)
[Abstract] Despite the significant improvements in both throughput and cost provided by modern Next-Generation Sequencing (NGS) platforms, sequencing errors in NGS datasets can still degrade the quality of downstream ...
Analysis of I/O Performance on an Amazon EC2 Cluster Compute and High I/O Platform
(Springer Netherlands, 2013-12)
[Abstract] Cloud computing is currently being explored by the scientific community to assess its suitability for High Performance Computing (HPC) environments. In this novel paradigm, compute and storage resources, as well ...
The Servet 3.0 benchmark suite: characterization of network performance degradation
(Pergamon Press, 2013-11)
[Abstract] Servet is a suite of benchmarks focused on extracting a set of parameters with high influence on the overall performance of multicore clusters. These parameters can be used to optimize the performance of parallel ...
MarDRe: efficient MapReduce-based removal of duplicate DNA reads in the cloud
(Oxford University Press, 2017)
[Abstract] This article presents MarDRe, a de novo cloud-ready duplicate and near-duplicate removal tool that can process single- and paired-end reads from FASTQ/FASTA datasets. MarDRe takes advantage of the widely adopted ...
MPI-dot2dot: A Parallel Tool to Find DNA Tandem Repeats on Multicore Clusters
(Springer, 2022)
[Abstract] Tandem Repeats (TRs) are segments that occur several times in a DNA sequence, with each copy adjacent to another. In the last few years, TRs have gained significant attention as they are thought to be related ...
HSRA: Hadoop-based spliced read aligner for RNA sequencing data
(Public Library of Science, 2018-07-31)
[Abstract] Nowadays, the analysis of transcriptome sequencing (RNA-seq) data has become the standard method for quantifying the levels of gene expression. In RNA-seq experiments, the mapping of short reads to a reference ...
CUDA-JMI: Acceleration of feature selection on heterogeneous systems
(Elsevier, 2020-01)
[Abstract] Feature selection is a crucial step nowadays in machine learning and data analytics to remove irrelevant and redundant characteristics and thus to provide fast and reliable analyses. Many research works have ...
SMusket: Spark-based DNA error correction on distributed-memory systems
(Elsevier B.V., 2020)
[Abstract] Next-Generation Sequencing (NGS) technologies have revolutionized genomics research over the last decade, bringing new opportunities for scientists to perform groundbreaking biological studies. Error correction ...