    • Guiding the Optimization of Parallel Codes on Multicores Using an Analytical Cache Model

      Andrade, Diego; Fraguela, Basilio B.; Doallo, Ramón (2018)
      [Abstract] Cache performance is particularly hard to predict in modern multicore processors as several threads can be concurrently in execution, and private cache levels are combined with shared ones. This paper presents ...
    • Parallel Sparse Modified Gram-Schmidt QR Decomposition 

      Doallo, Ramón; Fraguela, Basilio B.; Touriño, Juan; Zapata, Emilio L. (Springer, 1996)
      [Abstract] We present a parallel computational method for the QR decomposition with column pivoting of a sparse matrix by means of Modified Gram-Schmidt orthogonalization. Nonzero elements of the matrix M to be decomposed ...
    • Performance Evaluation of MPI, UPC and OpenMP on Multicore Architectures 

      Mallón, Damián A.; López Taboada, Guillermo; Teijeiro Barjas, Carlos; Touriño, Juan; Fraguela, Basilio B.; Gómez, Andrés; Doallo, Ramón; Mouriño, José C. (Springer, 2009)
      [Abstract] The current trend to multicore architectures underscores the need of parallelism. While new languages and alternatives for supporting more efficiently these systems are proposed, MPI faces this new challenge. ...
    • Performance Evaluation of Unified Parallel C Collective Communications 

      López Taboada, Guillermo; Teijeiro Barjas, Carlos; Touriño, Juan; Fraguela, Basilio B.; Doallo, Ramón; Mouriño, José C.; Mallón, Damián A.; Gómez, Andrés (IEEE Computer Society, 2009-07-17)
      [Abstract] Unified Parallel C (UPC) is an extension of ANSI C designed for parallel programming. UPC collective primitives, which are part of the UPC standard, increase programming productivity while reducing the communication ...
    • Probing the Efficacy of Hardware-Aware Weight Pruning to Optimize the SpMM routine on Ampere GPUs 

      López Castro, Roberto; Andrade, Diego; Fraguela, Basilio B. (Institute of Electrical and Electronics Engineers, 2022)
      [Abstract] The Deep Learning (DL) community found in pruning techniques a good way to reduce the models' resource and energy consumption. These techniques lead to smaller sparse models, but sparse computations in GPUs ...
    • Servet: A Benchmark Suite for Autotuning on Multicore Clusters 

      González-Domínguez, Jorge; López Taboada, Guillermo; Fraguela, Basilio B.; Martín, María J.; Touriño, Juan (Institute of Electrical and Electronics Engineers, 2010-05-24)
      [Abstract] MapReduce is a powerful tool for processing large data sets used by many applications running in distributed environments. However, despite the increasing number of computationally intensive problems that require ...
    • The New UPC++ DepSpawn High Performance Library for Data-Flow Computing with Hybrid Parallelism 

      Fraguela, Basilio B.; Andrade, Diego (Springer, 2022)
      [Abstract] Data-flow computing is a natural and convenient paradigm for expressing parallelism. This is particularly true for tools that automatically extract the data dependencies among the tasks while allowing to exploit ...
    • VENOM: A Vectorized N:M Format for Unleashing the Power of Sparse Tensor Cores 

      López Castro, Roberto; Ivanov, Andrei; Andrade, Diego; Ben-Nun, Tal; Fraguela, Basilio B.; Hoefler, Torsten (Association for Computing Machinery, 2023-11)
      [Abstract] The increasing success and scaling of Deep Learning models demands higher computational efficiency and power. Sparsification can lead to both smaller models as well as higher compute efficiency, and accelerated ...