Automated and accurate cache behavior analysis for codes with irregular access patterns
(John Wiley & Sons Ltd., 2007-04-03)
[Abstract] The memory hierarchy plays an essential role in the performance of current computers, so good analysis tools that help in predicting and understanding its behavior are required. Analytical modeling is the ideal ...
Program Behavior Characterization Through Advanced Kernel Recognition
(Springer, 2007)
[Abstract] Understanding program behavior is at the foundation of program optimization. Techniques for automatic recognition of program constructs (from now on, computational kernels) characterize the behavior of program ...
Locality-Aware Automatic Parallelization for GPGPU with OpenHMPP Directives
(Springer New York LLC, 2016-06)
[Abstract] The use of GPUs for general purpose computation has increased dramatically in the past years due to the rising demands of computing power and their tremendous computing capacity at low cost. Hence, new programming ...
A Grid Portal for an Undergraduate Parallel Programming Course
(Institute of Electrical and Electronics Engineers, 2005-08)
[Abstract] This paper describes an experience of designing and implementing a portal to support transparent remote access to supercomputing facilities to students enrolled in an undergraduate parallel programming course. ...
A Novel Compiler Support for Automatic Parallelization on Multicore Systems
(Elsevier, 2013-09)
[Abstract] The widespread use of multicore processors is not a consequence of significant advances in parallel programming. In contrast, multicore processors arise due to the complexity of building power-efficient, ...
An Inspector-Executor Algorithm for Irregular Assignment Parallelization
(Springer, 2004)
[Abstract] A loop with irregular assignment computations contains loop-carried output data dependences that can only be detected at run-time. In this paper, a load-balanced method based on the inspector-executor model is ...
Efficient Parallel Numerical Solver for the Elastohydrodynamic Reynolds–Hertz Problem
(Elsevier BV, North-Holland, 2001-12-01)
[Abstract] This work presents a parallel version of a complex numerical algorithm for solving an elastohydrodynamic piezoviscous lubrication problem studied in tribology. The numerical algorithm combines regula falsi, fixed ...
XARK: an extensible framework for automatic recognition of computational kernels
(Association for Computing Machinery, 2008-10)
[Abstract] The recognition of program constructs that are frequently used by software developers is a powerful mechanism for optimizing and parallelizing compilers to improve the performance of the object code. The development ...
Compiler support for parallel code generation through kernel recognition
(IEEE Computer Society, 2004-06-07)
[Abstract] Summary form only given. The automatic parallelization of loops that contain complex computations is still a challenge for current parallelizing compilers. The main limitations are related to the analysis of ...