Advances in High-Performance Computational Earth Sciences: Applications and Frameworks (IHPCES) Session 3

Time and Date: 16:40 - 18:20 on 6th June 2016

Room: Cockatoo

Chair: Yifeng Cui

549	Inside the Pascal GPU Architecture and Benefits to Seismic Applications (Invited) Abstract: Stencil computations are one of the major computational patterns in seismic applications. In this talk I will first describe techniques for implementing stencil computations efficiently on GPUs. Then I will introduce the Pascal architecture in NVIDIA’s latest Tesla P100 GPU, focusing on new architectural features such as HBM2 and NVLink. I will highlight how those features enable significant performance improvements for seismic applications. Pascal also introduces GPU page faulting, which enables Unified Virtual Memory (UVM) on the GPU. I will illustrate how UVM simplifies GPU programming by removing the need to manage GPU data manually in the code while still delivering good performance in most cases. Bio: Peng Wang is a senior engineer in the HPC developer technology group at NVIDIA, where he works on parallelizing and optimizing scientific applications on GPUs. One of his main focuses is optimizing seismic algorithms on GPUs. He received his Ph.D. in computational astrophysics from Stanford University.	Peng Wang
433	High-productivity Framework for Large-scale GPU/CPU Stencil Applications Abstract: A high-productivity framework for multi-GPU and multi-CPU computation of stencil applications is proposed. The framework is implemented in C++ and CUDA. It automatically translates user-written stencil functions that update a grid point and generates both GPU and CPU code. Programmers write user code in C++ only and can execute the translated code on either multiple multicore CPUs or multiple GPUs with optimization. On multiple GPUs, the user code runs with an auto-tuning mechanism and an overlapping method that hides communication cost behind computation. It can also be executed on multiple CPUs with OpenMP. A compressible flow code on GPU that exploits the optimizations provided by the framework runs 2.7 times faster than the non-optimized version.	Takashi Shimokawabe, Takayuki Aoki, Naoyuki Onodera
305	GPU acceleration of a non-hydrostatic ocean model with a multigrid Poisson/Helmholtz solver Abstract: To meet the demand for fast and detailed calculations in numerical ocean simulations, we implemented a non-hydrostatic ocean model on a graphics processing unit (GPU). We improved the model’s Poisson/Helmholtz solver by optimizing memory access, exploiting instruction-level parallelism, and applying mixed-precision calculation to the solver’s preconditioning. The GPU-implemented model was 4.7 times faster than a comparable central processing unit (CPU) execution. The output errors due to this implementation will not significantly influence oceanic studies.	Takateru Yamagishi, Yoshimasa Matsumura