ICCS 2016 Main Track (MT) Session 5

Time and Date: 14:10 - 15:50 on 7th June 2016

Room: KonTiki Ballroom

Chair: Sreya Gosh

146 Optimization of Parallel Legendre Transform using Graphics Processing Unit (GPU) for a Geodynamo Code [abstract]
Abstract: Convection and the magnetic field of the Earth's outer core are expected to span a vast range of length scales. To resolve these flows at extremely high spatial resolution, high-performance computing is required for geodynamo simulations. Calypso has been shown to scale well on computer clusters up to on the order of 10⁵ cores using Message Passing Interface (MPI) and Open Multi-Processing (OpenMP) parallelization for CPUs. Calypso is designed to model magnetohydrodynamics (MHD) of a Boussinesq fluid in a rotating spherical shell, such as the outer core of the Earth. Calypso employs a pseudo-spectral method that requires both a forward and a backward spherical harmonic transform (SHT) in each iteration of the time-integration loop. The time complexity of the SHT is dominated by that of the Legendre transform, which is O(N³). We investigate the time efficiency of three different algorithms for the SHT using graphics processing units (GPUs). The first precomputes the Legendre polynomials on the CPU before executing the Legendre transform on the GPU. In the second approach, the Legendre polynomials are computed on the fly, on an as-needed basis. In the third approach, we employ the sum-reduction primitives offered by the CUDA Unbound (CUB) library. We examine the trade-offs between space and time on Maverick, a Texas Advanced Computing Center (TACC) supercomputer, using a GPU-enabled Legendre transform. The first algorithm leads to a memory bottleneck that can be alleviated by exploiting the memory hierarchy of Nvidia GPUs; coalesced reads and prefetching can reduce it further. The second algorithm is highly compute-intensive relative to its memory-bandwidth utilization as a result of repeated computations. The third algorithm is well balanced between computation and memory utilization. We conclude that the most efficient approach is a hybrid that combines a look-up table with on-the-fly computation on the GPU.
Harsha Lokavarapu and Hiroaki Matsui
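As a rough illustration of the first two strategies described in this abstract, the sketch below contrasts a precomputed Legendre-polynomial table with on-the-fly evaluation for the backward (spectral-to-grid) Legendre transform at a single azimuthal order m. It is a plain NumPy/SciPy sketch, not the authors' Calypso or CUDA code; the function names, the use of scipy.special.lpmv, and the problem sizes are illustrative assumptions.

```python
# Illustrative sketch (not the authors' Calypso/CUDA code): backward Legendre
# transform for one azimuthal order m, with a precomputed table vs. on-the-fly
# evaluation of the associated Legendre functions P_l^m.
import numpy as np
from scipy.special import lpmv  # associated Legendre function P_l^m(x)

def legendre_table(m, lmax, x):
    """Precompute P_l^m(x_j) for l = m..lmax at all latitudes x (memory-heavy)."""
    return np.array([lpmv(m, l, x) for l in range(m, lmax + 1)])  # (lmax-m+1, len(x))

def synthesis_precomputed(coeffs, table):
    """g(x_j) = sum_l a_l P_l^m(x_j) as one matrix-vector product with the stored table."""
    return table.T @ coeffs

def synthesis_on_the_fly(coeffs, m, lmax, x):
    """Same sum, but every P_l^m(x) is recomputed when needed (compute-heavy, low memory)."""
    g = np.zeros_like(x)
    for l, a_l in zip(range(m, lmax + 1), coeffs):
        g += a_l * lpmv(m, l, x)
    return g

if __name__ == "__main__":
    lmax, m = 63, 2
    x, _ = np.polynomial.legendre.leggauss(lmax + 1)   # Gauss-Legendre latitudes
    a = np.random.rand(lmax - m + 1)                   # spectral coefficients for order m
    table = legendre_table(m, lmax, x)
    assert np.allclose(synthesis_precomputed(a, table),
                       synthesis_on_the_fly(a, m, lmax, x))
```

The table variant spends memory and memory bandwidth to avoid recomputation, while the on-the-fly variant spends floating-point work to avoid storage; the hybrid recommended in the abstract mixes the two.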
49 A Case Study in Adjoint Sensitivity Analysis of Parameter Calibration Problems [abstract]
Abstract: Adjoint sensitivity computation for parameter estimation problems is a widely used technique in computational science and engineering for efficiently retrieving derivatives of a cost functional with respect to parameters. Those derivatives can be used, e.g., for sensitivity analysis, optimization, or robustness analysis. Deriving and implementing adjoint code is an error-prone, non-trivial task that can be avoided by using Algorithmic Differentiation (AD) software. Generating adjoint code with AD software has the downside of usually requiring a large amount of memory as well as a non-optimal run time. In this article, we couple two approaches to achieve an adjoint code that is both robust and efficient: symbolically derived adjoint formulations are coupled with AD. Comparisons are carried out for a real-world case study originating from the remote atmospheric sensing simulation software JURASSIC developed at the Institute of Energy and Climate Research – Stratosphere, Research Center Jülich. We show that the coupled approach outperforms the fully algorithmic AD approach in terms of run time and memory requirements, and argue that this can be achieved while still preserving the desirable feature of AD being automatic.
Johannes Lotz, Marc Schwalbach, Uwe Naumann
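The sketch below illustrates, on a toy parameter-calibration problem, the kind of hand-derived (symbolic) adjoint that such a coupled approach builds on: a forward time-stepping model, a reverse sweep that propagates the adjoint variable, and a finite-difference check of the resulting gradient. It is not the JURASSIC code or the authors' AD coupling; the model, step sizes, and function names are illustrative assumptions.

```python
# Illustrative sketch (not the JURASSIC code or the authors' AD coupling):
# a hand-derived adjoint for a toy calibration cost J(p) = 0.5*(x_N - d)^2
# with forward model x_{k+1} = x_k + dt*(p0*x_k + p1), checked against
# central finite differences.
import numpy as np

def forward(p, x0=1.0, dt=0.01, nsteps=100, data=2.0):
    """Run the forward model and return the cost and the stored trajectory."""
    xs = [x0]
    for _ in range(nsteps):
        xs.append(xs[-1] + dt * (p[0] * xs[-1] + p[1]))
    J = 0.5 * (xs[-1] - data) ** 2
    return J, np.array(xs)

def adjoint_gradient(p, xs, dt=0.01, data=2.0):
    """Reverse sweep: propagate lam_k = dJ/dx_k backwards and accumulate dJ/dp."""
    lam = xs[-1] - data               # lam_N = dJ/dx_N
    grad = np.zeros(2)
    for k in range(len(xs) - 2, -1, -1):
        grad[0] += lam * dt * xs[k]   # d x_{k+1} / d p0 = dt * x_k
        grad[1] += lam * dt           # d x_{k+1} / d p1 = dt
        lam *= (1.0 + dt * p[0])      # lam_k = lam_{k+1} * d x_{k+1} / d x_k
    return grad

if __name__ == "__main__":
    p = np.array([0.3, 0.5])
    J, xs = forward(p)
    g = adjoint_gradient(p, xs)
    eps = 1e-6
    g_fd = np.zeros(2)
    for i in range(2):                # finite-difference verification
        dp = np.zeros(2)
        dp[i] = eps
        g_fd[i] = (forward(p + dp)[0] - forward(p - dp)[0]) / (2 * eps)
    assert np.allclose(g, g_fd, rtol=1e-5)
```

In a coupled setting such as the one described in the abstract, a hand-derived reverse sweep of this kind would presumably replace the generic AD-generated adjoint only where a symbolic derivation is available, with AD handling the rest.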
80 Tuning the Computation of the Interpolation Operator in an Algebraic Multigrid Solver [abstract]
Abstract: In this paper, we discuss strategies for computing subsets of eigenvectors of matrices corresponding to subdomains of finite element meshes, achieving a compromise between two conflicting goals. The subset of eigenvectors is required in the construction of coarse spaces used in algebraic multigrid (AMG) methods as well as in certain domain decomposition (DD) methods. The quality of the coarse spaces depends on the number of eigenvectors: more eigenvectors improve the approximation properties of the coarse space and thereby the overall performance and convergence of the associated AMG or DD algorithms. However, a large number of eigenvectors degrades the sparsity of the corresponding coarse matrices, which can become fairly dense. The sparsity of the coarse matrices can be controlled to a certain extent by the size of the subdomains (unions of finite elements), referred to as agglomerates. If the agglomerates are too large, the cost of the eigensolvers increases and can eventually become unacceptable for the purpose of constructing the AMG or DD solvers. This paper investigates strategies to optimize the solution of the partial eigenproblems of interest. In particular, we examine direct and iterative eigensolvers for computing those subsets. Our experiments with synthetic meshes and with a well-known oil-reservoir simulation benchmark indicate that iterative eigensolvers can lead to significant improvements in the overall performance of an AMG solver that exploits such a spectral construction of coarse spaces.
Osni Marques, Alex Druinsky, Xiaoye Li, Andrew Barker, Delyan Kalchev, Panayot Vassilevski
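The following SciPy sketch mirrors the direct-versus-iterative comparison at the heart of this abstract: computing a handful of the smallest eigenpairs of an agglomerate-sized, stiffness-like matrix with a dense direct eigensolver and with an iterative (shift-invert Lanczos) eigensolver. The 2D Laplacian stand-in, the agglomerate size, and the number of eigenvectors are illustrative assumptions, not the authors' setup.

```python
# Illustrative sketch (not the authors' AMG code): few smallest eigenpairs of a
# local, agglomerate-sized stiffness-like matrix via a dense direct eigensolver
# versus an iterative shift-invert Lanczos eigensolver.
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla
from scipy.linalg import eigh

def laplacian_2d(n):
    """5-point Laplacian on an n x n grid, standing in for an agglomerate matrix."""
    T = sp.diags([-1, 2, -1], [-1, 0, 1], shape=(n, n))
    I = sp.identity(n)
    return (sp.kron(I, T) + sp.kron(T, I)).tocsc()

if __name__ == "__main__":
    A = laplacian_2d(30)               # 900 x 900, a plausible agglomerate size
    k = 6                              # number of coarse-space eigenvectors

    # Direct: densify and compute the full spectrum, keep the k smallest.
    w_dense, _ = eigh(A.toarray())
    w_direct = w_dense[:k]

    # Iterative: shift-invert Lanczos targets only the k eigenvalues nearest 0.
    w_iter, _ = spla.eigsh(A, k=k, sigma=0.0, which='LM')

    assert np.allclose(np.sort(w_iter), w_direct, rtol=1e-8)
```

The dense path scales cubically with the agglomerate dimension, whereas the iterative path needs only a sparse factorization and a modest number of vectors, which is broadly the effect the paper measures inside a full AMG solver.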
413 Induced Dimension Reduction method for solving linear matrix equations [abstract]
Abstract: This paper discusses the solution of large-scale linear matrix equations using the Induced Dimension Reduction method (IDR(s)). IDR(s) was originally presented for solving systems of linear equations and is based on the IDR(s) theorem. We generalize the IDR(s) theorem to solve linear problems in any finite-dimensional space. This generalization allows us to develop IDR(s) algorithms to approximate the solution of linear matrix equations. The IDR(s) method presented here has two main advantages: first, it does not require computing the inverse of any matrix; second, it allows the incorporation of preconditioners. Additionally, we present a simple preconditioner for the Sylvester equation based on a fixed-point iteration. Several numerical examples illustrate the performance of IDR(s) for solving linear matrix equations. We also present the software implementation.
Reinaldo Astudillo, Martin van Gijzen
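The Sylvester preconditioner mentioned in this abstract is based on a fixed-point iteration; a minimal sketch of one such iteration for AX + XB = C is shown below, checked against SciPy's direct Sylvester solver. The specific splitting X ← A⁻¹(C − XB), the convergence condition, and the test matrices are illustrative assumptions rather than the authors' exact formulation, and the IDR(s) solver itself is not reproduced here.

```python
# Illustrative sketch (not the authors' IDR(s) code): a fixed-point iteration
# for the Sylvester equation A X + X B = C, namely X <- A^{-1}(C - X B),
# which contracts when ||A^{-1}|| * ||B|| < 1.
import numpy as np
from scipy.linalg import lu_factor, lu_solve, solve_sylvester

def sylvester_fixed_point(A, B, C, tol=1e-10, maxit=500):
    """Iterate X <- A^{-1}(C - X B); A is factorized once and reused."""
    lu, piv = lu_factor(A)
    X = np.zeros_like(C)
    for _ in range(maxit):
        X_new = lu_solve((lu, piv), C - X @ B)
        if np.linalg.norm(X_new - X, 'fro') <= tol * np.linalg.norm(X_new, 'fro'):
            return X_new
        X = X_new
    return X

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, m = 30, 10
    A = 20.0 * np.eye(n) + rng.standard_normal((n, n))   # dominant A => contraction
    B = 0.1 * rng.standard_normal((m, m))
    C = rng.standard_normal((n, m))
    X = sylvester_fixed_point(A, B, C)
    assert np.allclose(X, solve_sylvester(A, B, C), atol=1e-8)
```

In the setting of the paper, such a cheap approximate solve would serve as a preconditioner inside IDR(s) rather than as a stand-alone solver.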
134 A Cylindrical Basis Function for Solving Partial Differential Equations on Manifolds [abstract]
Abstract: The numerical solution of partial differential equations (PDEs) on manifolds continues to generate considerable interest among scientists in the natural and applied sciences. At the same time, recent developments in 3D scanning and computer vision technologies have produced a large number of 3D surface models represented as point clouds. Herein, we develop a simple and efficient method for solving PDEs on closed surfaces represented as point clouds. By projecting the radial vector of standard radial basis function (RBF) kernels onto the local tangent plane, we obtain a representation of functions that permits the replacement of surface differential operators with their Cartesian equivalents. We demonstrate, numerically, the efficiency of the method in discretizing the Laplace-Beltrami operator.
Emmanuel Asante-Asamani, Lei Wang, Zeyun Yu
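The core geometric idea in this abstract, projecting the radial (difference) vector onto the local tangent plane before it enters an RBF kernel, can be sketched as below for a local interpolation stencil on a point-cloud sphere. This is only a schematic of the projection step; the Gaussian kernel, the shape parameter, the Fibonacci point cloud, and the nearest-neighbour stencil are illustrative assumptions, and the authors' Laplace-Beltrami discretization is not reproduced.

```python
# Illustrative sketch (not the authors' discretization): tangent-plane-projected
# ("cylindrical") distances feeding a Gaussian RBF, used here only to build a
# local interpolation stencil on a point-cloud unit sphere.
import numpy as np

def fibonacci_sphere(n):
    """Quasi-uniform point cloud on the unit sphere; the outward normal equals the point."""
    i = np.arange(n)
    phi = np.pi * (3.0 - np.sqrt(5.0)) * i
    z = 1.0 - 2.0 * (i + 0.5) / n
    r = np.sqrt(1.0 - z ** 2)
    pts = np.stack([r * np.cos(phi), r * np.sin(phi), z], axis=1)
    return pts, pts.copy()            # (points, unit normals)

def projected_distances(stencil_pts, centre, normal):
    """Lengths of the radial vectors after removing their component along the normal."""
    d = stencil_pts - centre
    d_t = d - np.outer(d @ normal, normal)
    return np.linalg.norm(d_t, axis=1)

def local_rbf_weights(stencil_pts, normal, f_vals, eps=10.0):
    """Solve the local RBF interpolation system built from projected distances."""
    n = len(stencil_pts)
    K = np.empty((n, n))
    for j in range(n):
        K[j] = np.exp(-(eps * projected_distances(stencil_pts, stencil_pts[j], normal)) ** 2)
    return np.linalg.solve(K, f_vals), K

if __name__ == "__main__":
    pts, normals = fibonacci_sphere(1000)
    f = pts[:, 2] ** 2                                    # smooth test function on the sphere
    i = 0                                                 # stencil centre
    nbrs = np.argsort(np.linalg.norm(pts - pts[i], axis=1))[:15]   # 15 nearest neighbours
    w, K = local_rbf_weights(pts[nbrs], normals[i], f[nbrs])
    assert np.allclose(K @ w, f[nbrs], atol=1e-6)         # interpolant reproduces stencil data
```

Because the projection is linear, the kernel values depend only on the 2D separations of the projected points, which is what allows surface differential operators to be replaced by their Cartesian counterparts in the tangent plane.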