Now showing items 1-10 of 255
Communication avoiding and overlapping for numerical linear algebra
(IEEE Computer Society, 2013-02-25)
[Abstract] To efficiently scale dense linear algebra problems to future exascale systems, communication cost must be avoided or overlapped. Communication-avoiding 2.5D algorithms improve scalability by reducing inter-processor ...
Implementation of a motion estimation algorithm for Intel FPGAs using OpenCL
(Springer, 2023)
[Abstract] Motion Estimation is one of the main tasks behind any video encoder. It is a computationally costly task; therefore, it is usually delegated to specific or reconfigurable hardware, such as FPGAs. Over the years, ...
Efficient Culling Techniques for Interactive Deformable NURBS Surfaces on GPU
(SciTePress, 2016-02)
[Abstract] NURBS (Non-uniform rational B-splines) surfaces are the standard freeform representation in Computer-Aided Design (CAD) applications. Rendering NURBS surfaces accurately while they are interactively ...
Parallel Sparse Modified Gram-Schmidt QR Decomposition
(Springer, 1996)
[Abstract] We present a parallel computational method for the QR decomposition with column pivoting of a sparse matrix by means of Modified Gram-Schmidt orthogonalization. Nonzero elements of the matrix M to be decomposed ...
Non-blocking Java Communications Support on Clusters
(Springer, 2006)
[Abstract] This paper presents communication strategies for supporting efficient non-blocking Java communication on clusters. The communication performance is critical for the overall cluster performance. It is possible ...
Evaluation of Parallel Differential Evolution Implementations on MapReduce and Spark
(Springer, 2017-09)
[Abstract] Global optimization problems arise in many areas of science and engineering, computational and systems biology and bioinformatics among them. Many research efforts have focused on developing parallel metaheuristics ...
Facilitating the development of stencil applications using the Heterogeneous Programming Library
(2017)
[Abstract] Stencil computations are very common in scientific codes. Heterogeneous systems achieve good results solving these problems, but their programming is complex because of the ghost regions required in multi-device ...
A SIMD Algorithm for the Detection of Epistatic Interactions of Any Order
(Elsevier, 2022)
[Abstract] Epistasis is a phenomenon in which a phenotype outcome is determined by the interaction of genetic variation at two or more loci and it cannot be attributed to the additive combination of effects corresponding ...
BetaGPU: Harnessing GPU power for parallelized beta distribution functions
(Elsevier, 2025-02)
[Abstract] The efficient computation of Beta distribution functions, particularly the Probability Density Function (PDF) and Cumulative Distribution Function (CDF), is critical in various scientific fields, including ...
Guiding the Optimization of Parallel Codes on Multicores Using an Analytical Cache Model
(2018)
[Abstract] Cache performance is particularly hard to predict in modern multicore processors as several threads can be concurrently in execution, and private cache levels are combined with shared ones. This paper presents ...