• Multithreaded and Spark parallelization of feature selection filters 

      Eiras-Franco, Carlos; Bolón-Canedo, Verónica; Ramos Garea, Sabela; González-Domínguez, Jorge; Alonso-Betanzos, Amparo; Touriño, Juan (2016)
      [Abstract]: Vast amounts of data are generated every day, constituting a volume that is challenging to analyze. Techniques such as feature selection are advisable when tackling large datasets. Among the tools that provide ...
    • Parallel and Scalable Short-Read Alignment on Multi-Core Clusters Using UPC++ 

      Liu, Yongchao; Schmidt, Bertil; González-Domínguez, Jorge (Johannes Gutenberg University Mainz, 2016)
      [Abstract]: The growth of next-generation sequencing (NGS) datasets poses a challenge to the alignment of reads to reference genomes in terms of alignment quality and execution speed. Some available aligners have been shown ...
    • Parallel definition of tear film maps on distributed-memory clusters for the support of dry eye diagnosis 

      González-Domínguez, Jorge; Remeseiro, Beatriz; Martín, María J. (Elsevier Ireland Ltd., 2017)
      [Abstract] Background and objectives: The analysis of the interference patterns on the tear film lipid layer is a useful clinical test to diagnose dry eye syndrome. This task can be automated with a high degree of accuracy ...
    • Parallel feature selection for distributed-memory clusters 

      González-Domínguez, Jorge; Bolón-Canedo, Verónica; Freire, Borja; Touriño, Juan (2019)
      [Abstract]: Feature selection is nowadays an extremely important data mining stage in the field of machine learning due to the appearance of problems of high dimensionality. In the literature there are numerous feature ...
    • Parallel Pairwise Epistasis Detection on Heterogeneous Computing Architectures 

      González-Domínguez, Jorge; Ramos Garea, Sabela; Touriño, Juan; Schmidt, Bertil (Institute of Electrical and Electronics Engineers, 2016-08)
      [Abstract] Development of new methods to detect pairwise epistasis, such as SNP-SNP interactions, in Genome-Wide Association Studies is an important task in bioinformatics as they can help to explain genetic influences on ...
    • Parallel-FST: A feature selection library for multicore clusters 

      Beceiro, Bieito; González-Domínguez, Jorge; Touriño, Juan (Elsevier, 2022-11)
      [Abstract]: Feature selection is a subfield of machine learning focused on reducing the dimensionality of datasets by performing a computationally intensive process. This work presents Parallel-FST, a publicly available ...
    • Parallelization of ARACNe, an Algorithm for the Reconstruction of Gene Regulatory Networks 

      Casal, Uxía; González-Domínguez, Jorge; Martín, María J. (MDPI AG, 2019-07-31)
      [Abstract] Gene regulatory networks are graphical representations of molecular regulators that interact with each other and with other substances in the cell to govern gene expression. There are different computational ...
    • Parallelizing Epistasis Detection in GWAS on FPGA and GPU-Accelerated Computing Systems 

      González-Domínguez, Jorge; Wienbrandt, Lars; Kässens, Jan Christian; Ellinghaus, David; Schimmler, Manfred; Schmidt, Bertil (Institute of Electrical and Electronics Engineers, 2015)
      [Abstract] High-throughput genotyping technologies (such as SNP-arrays) allow the rapid collection of up to a few million genetic markers of an individual. Detecting epistasis (based on 2-SNP interactions) in Genome-Wide ...
    • PARamrfinder: detecting allele-specific DNA methylation on multicore clusters 

      Fernández Fraga, Alejandro; González-Domínguez, Jorge; Martín, María J. (Springer, 2024-01)
      [Abstract]: The discovery of Allele-Specific Methylation (ASM) is an important research field in biology as it regulates genomic imprinting, which has been identified as the cause of some genetic diseases. Nevertheless, ...
    • ParBiBit: Parallel tool for binary biclustering on modern distributed-memory systems 

      González-Domínguez, Jorge; Expósito, Roberto R. (PLoS, 2018)
      [Abstract]: Biclustering techniques are gaining attention in the analysis of large-scale datasets as they identify two-dimensional submatrices where both rows and columns are correlated. In this work we present ParBiBit, ...
    • ParDRe: faster parallel duplicated reads removal tool for sequencing studies 

      González-Domínguez, Jorge; Schmidt, Bertil (Oxford University Press, 2016)
      [Abstract] Summary: Current next generation sequencing technologies often generate duplicated or near-duplicated reads that (depending on the application scenario) do not provide any interesting biological information but ...
    • ParRADMeth: Identification of Differentially Methylated Regions on Multicore Clusters 

      Fernández Fraga, Alejandro; González-Domínguez, Jorge; Touriño, Juan (IEEE, 2023)
      [Abstract]: The discovery of Differentially Methylated (DM) regions is an important research field in biology, as it can help to anticipate the risk of suffering from specific diseases. Nevertheless, the high computational ...
    • parSRA: A framework for the parallel execution of short read aligners on compute clusters 

      González-Domínguez, Jorge; Hundt, Christian; Schmidt, Bertil (2018)
      [Abstract]: The growth of next-generation sequencing datasets poses a challenge to the alignment of reads to reference genomes in terms of both accuracy and speed. In this work we present parSRA, a parallel framework ...
    • PATO: genome-wide prediction of lncRNA-DNA triple helices 

      Amatria Barral, Iñaki; González-Domínguez, Jorge; Touriño, Juan (Oxford University Press, 2023-03)
      [Abstract]: Motivation: Long non-coding RNA (lncRNA) plays a key role in many biological processes. For instance, lncRNA regulates chromatin using different molecular mechanisms, including direct RNA-DNA hybridization via ...
    • Performance Evaluation of Sparse Matrix Products in UPC 

      González-Domínguez, Jorge; García-López, Óscar; López Taboada, Guillermo; Martín, María J.; Touriño, Juan (Springer New York LLC, 2013-04)
      [Abstract] Unified Parallel C (UPC) is a Partitioned Global Address Space (PGAS) language whose popularity has increased in recent years owing to its high programmability and reasonable performance through an efficient ...
    • pRIblast: A highly efficient parallel application for comprehensive lncRNA–RNA interaction prediction 

      Amatria Barral, Iñaki; González-Domínguez, Jorge; Touriño, Juan (Elsevier, 2023-01)
      [Abstract]: Long non-coding RNAs (lncRNAs) play a key role in several biological processes and scientists are constantly trying to come up with new strategies to elucidate their functions. One common approach to characterize ...
    • PyToxo: a Python tool for calculating penetrance tables of high-order epistasis models 

      González-Seoane, Borja; Ponte-Fernández, Christian; González-Domínguez, Jorge; Martín, María J. (BMC, 2022)
      [Abstract] Background: Epistasis is the interaction between different genes when expressing a certain phenotype. If epistasis involves more than two loci it is called high-order epistasis. High-order epistasis is an area ...
    • Scalable PGAS collective operations in NUMA clusters 

      Mallón, Damián A.; Teijeiro Barjas, Carlos; González-Domínguez, Jorge; López Taboada, Guillermo; Gómez, Andrés (Springer New York LLC, 2014-12)
      [Abstract] The increasing number of cores per processor is making manycore-based systems pervasive. This involves dealing with multiple levels of memory in non-uniform memory access (NUMA) systems and processor cores ...
    • ScalaParBiBit: Scaling the Binary Biclustering in Distributed-Memory Systems 

      Fraguela, Basilio B.; Andrade, Diego; González-Domínguez, Jorge (SpringerLink, 2021-03-19)
      [Abstract] Biclustering is a data mining technique that allows us to find groups of rows and columns that are highly correlated in a 2D dataset. Although there exist several software applications to perform biclustering, ...
    • SeQual: Big Data Tool to Perform Quality Control and Data Preprocessing of Large NGS Datasets 

      Expósito, Roberto R.; Galego Torreiro, Roi; González-Domínguez, Jorge (Institute of Electrical and Electronics Engineers, 2020-08-07)
      [Abstract] This paper presents SeQual, a scalable tool to efficiently perform quality control of large genomic datasets. Our tool currently supports more than 30 different operations (e.g., filtering, trimming, formatting) ...