ICCS 2015 Main Track (MT) Session 3

Time and Date: 16:40 - 18:20 on 1st June 2015

Room: M101

Chair: Gabriele Maria Lozito

37	Improving OpenCL programmability with the Heterogeneous Programming Library [abstract] Abstract: The use of heterogeneous devices is becoming increasingly widespread. Their main drawback is their low programmability due to the large amount of details that must be handled. Another important problem is the reduced code portability, as most of the tools to program them are vendor or device-specfic. The exception to this observation is OpenCL, which largely suffers from the reduced programmability problem mentioned, particularly in the host side. The Heterogeneous Programming Library (HPL) is a recent proposal to improve this situation, as it couples portability with good programmability. While the HPL kernels must be written in a language embedded in C++, users may prefer to use OpenCL kernels for several reasons such as their growing availability or a faster development from existing codes. In this paper we extend HPL to support the execution of native OpenCL kernels and we evaluate the resulting solution in terms of performance and programmability, achieving very good results.	Moises Vinas, Basilio B. Fraguela, Zeki Bozkus, Diego Andrade
241	Efficient Particle-Mesh Spreading on GPUs [abstract] Abstract: The particle-mesh spreading operation maps a value at an arbitrary particle position to con- tributions at regular positions on a mesh. This operation is often used when a calculation involving irregular positions is to be performed in Fourier space. We study several approaches for particle-mesh spreading on GPUs. A central concern is the use of atomic operations. We are also concerned with the case where spreading is performed multiple times using the same particle configuration, which opens the possibility of preprocessing to accelerate the overall com- putation time. Experimental tests show which algorithms are best under which circumstances.	Xiangyu Guo, Xing Liu, Peng Xu, Zhihui Du, Edmond Chow
279	AMA: Asynchronous Management of Accelerators for Task-based Programming Models [abstract] Abstract: Computational science has benefited in the last years from emerging accelerators that increase the performance of scientific simulations, but using these devices hinders the programming task. This paper presents AMA: a set of optimization techniques to efficiently manage multi-accelerator systems. AMA maximizes the overlap of computation and communication in a blocking-free way. Then, we can use such spare time to do other work while waiting for device operations. Implemented on top of a task-based framework, the experimental evaluation of AMA on a quad-GPU node shows that we reach the performance of a hand-tuned native CUDA code, with the advantage of fully hiding the device management. In addition, we obtain up to more than 2x performance speed-up with respect to the original framework implementation.	Judit Planas, Rosa M. Badia, Eduard Ayguadé, Jesús Labarta
286	Adaptive Partitioning for Irregular Applications on Heterogeneous CPU-GPU Chips [abstract] Abstract: Commodity processors are comprised of several CPU cores and one integrated GPU. To fully exploit this type of architectures, one needs to automatically determine how to partition the workload between both devices. This is specially challenging for irregular workloads, where each iteration's work is data dependent and shows control and memory divergence. In this paper, we present a novel adaptive partitioning strategy specially designed for irregular applications running on heterogeneous CPU-GPU chips. The main novelty of this work is that the size of the workload assigned to the GPU and CPU adapts dynamically to maximize the GPU and CPU utilization while balancing the workload among the devices. Our experimental results on an Intel Haswell architecture using a set of irregular benchmarks show that our approach outperforms exhaustive static and adaptive state-of-the-art approaches in terms of performance and energy consumption.	Antonio Vilches, Rafael Asenjo, Angeles Navarro, Francisco Corbera, Ruben Gran, Maria Garzaran
304	Using high performance algorithms for the hybrid simulation of disease dynamics on CPU and GPU [abstract] Abstract: In the current work the authors present several approaches to the high performance simulation of human diseases propagation using hybrid two-component imitational models. The models under study were created by coupling compartmental and discrete-event submodels. The former is responsible for the simulation of the demographic processes in a population while the latter deals with a disease progression for a certain individual. The number and type of components used in a model may vary depending on the research aims and data availability. The introduced high performance approaches are based on batch random number generation, distribution of simulation runs and the calculations on graphical processor units. The emphasis was made on the possibility to use the approaches for various model types without considerable code refactoring for every particular model. The speedup gained was measured on simulation programs written in C++ and MATLAB for the models of HIV and tuberculosis spread and the models of tumor screening for the prevention of colorectal cancer. The benefits and drawbacks of the described approaches along with the future directions of their development are discussed.	Vasiliy Leonenko, Nikolai Pertsev, Marc Artzrouni