Mathematical Methods and Algorithms for Extreme Scale (MATH-EX) Session 1

Time and Date: 13:25 - 15:05 on 14th June 2017

Room: HG D 3.2

Chair: Vassil Alexandrov

356 Variable-Size Batched Gauss-Huard for Block-Jacobi Preconditioning [abstract]
Abstract: In this work we present new kernels for the generation and application of block-Jacobi preconditioners that accelerate the iterative solution of sparse linear systems on graphics processing units (GPUs). Our approach departs from the conventional LU factorization and decomposes the diagonal blocks of the matrix using the Gauss-Huard method. When enhanced with column pivoting, this method is as stable as LU with partial/row pivoting. Owing to the extensive use of GPU registers and the integration of implicit pivoting, our variable-size batched Gauss-Huard implementation outperforms the batched version of LU factorization. In addition, the application kernel combines the conventional two-stage triangular solve procedure, consisting of a backward solve followed by a forward solve, into a single stage that performs both operations simultaneously.
Hartwig Anzt, Jack Dongarra, Goran Flegar, Enrique S. Quintana-Orti and Andres E. Tomas
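To make the elimination scheme named in the abstract concrete, the following is a minimal NumPy sketch of Gauss-Huard elimination on a single dense system. It is sequential and omits the column pivoting, batching, and GPU register optimizations the paper is about; the function name and test system are illustrative, not the authors' kernel. Working on the augmented matrix [A | b] delivers the solution in one sweep, at the same flop count as an LU factorization.

```python
import numpy as np

def gauss_huard_solve(A, b):
    """Solve A x = b by Gauss-Huard elimination (no pivoting).

    The sweep works on the augmented matrix [A | b]: step k first
    updates row k using the rows already processed, then scales row k
    by its pivot, and finally annihilates column k above the diagonal.
    After the last step the solution sits in the final column.
    """
    n = A.shape[0]
    M = np.hstack([A.astype(float), b.reshape(-1, 1).astype(float)])
    for k in range(n):
        # Eliminate the part of row k left of the diagonal (gemv)
        M[k, k:] -= M[k, :k] @ M[:k, k:]
        # Scale row k so the pivot becomes 1
        M[k, k + 1:] /= M[k, k]
        # Annihilate the entries of column k above the diagonal (ger)
        M[:k, k + 1:] -= np.outer(M[:k, k], M[k, k + 1:])
    return M[:, -1]

# Demo on a diagonally dominant system, which is safe without pivoting
rng = np.random.default_rng(0)
A = rng.random((6, 6)) + 6.0 * np.eye(6)
b = rng.random(6)
x = gauss_huard_solve(A, b)
```

Because factorization and solve are fused into a single pass over the augmented matrix, there is no separate triangular-solve stage, which is the property the single-stage application kernel in the abstract exploits.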
367 Parallel Modularity Clustering [abstract]
Abstract: In this paper we develop a parallel approach for computing modularity clustering, which is often used to identify and analyse communities in social networks. We show that modularity can be approximated by looking at the largest eigenpairs of the weighted graph adjacency matrix that has been perturbed by a rank-one update. Also, we generalize this formulation to identify multiple clusters at once. We develop a fast parallel implementation of it that takes advantage of the Lanczos eigenvalue solver and the k-means algorithm on the GPU. Finally, we highlight the performance and quality of our approach versus existing state-of-the-art techniques.
Alexandre Fender, Nahid Emad, Serge Petiton and Maxim Naumov
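The rank-one-perturbed matrix in the abstract is the classic modularity matrix B = A - d dᵀ/(2m). A small NumPy sketch of the two-cluster case follows, using a dense eigensolver as a stand-in for Lanczos and a sign split of the leading eigenvector instead of k-means (the paper's multi-cluster generalization would run k-means on several leading eigenvectors); names and the toy graph are illustrative.

```python
import numpy as np

def modularity_bisect(A):
    """Two-way spectral modularity split: the sign pattern of the
    leading eigenvector of B = A - d d^T / (2m), i.e. the adjacency
    matrix perturbed by a rank-one update."""
    d = A.sum(axis=1).astype(float)
    m = d.sum() / 2.0                    # total edge weight
    B = A - np.outer(d, d) / (2.0 * m)   # modularity matrix
    _, V = np.linalg.eigh(B)             # dense stand-in for Lanczos
    return (V[:, -1] >= 0).astype(int)   # eigenvector of the largest eigenvalue

# Two triangles joined by a single edge: the natural two communities
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
A = np.zeros((6, 6))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0
labels = modularity_bisect(A)
```

On this toy graph the sign split recovers the two triangles as separate communities, which is the quality criterion the full GPU pipeline optimizes at scale.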
405 Parallel Monte Carlo on Intel MIC Architecture [abstract]
Abstract: The trade-off between the cost-efficiency of powerful computational accelerators and the increasing energy needed to perform numerical tasks can be tackled by implementing algorithms on the Intel Many Integrated Core (MIC) architecture. The best performance of the algorithms requires the use of appropriate optimization and parallelization approaches throughout the entire design process. Monte Carlo and quasi-Monte Carlo methods can exploit a huge number of computational cores. In this paper we present the advances in our studies on the performance of algorithms for solving multidimensional integrals on the Intel MIC architecture, and compare their performance with that of Monte Carlo methods. The fast implementations are due to the high degree of parallelism in the operations on the many coordinates of the sequences, achieved with the Intel MIC architecture. These implementations are easy to integrate and demonstrate high performance in terms of timing and computational speed.
Emanouil Atanassov, Todor Gurov, Sofiya Ivanovska and Aneta Karaivanova
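The core computation the abstract refers to, estimating a multidimensional integral by averaging the integrand over sampled points, can be sketched as follows. This is plain pseudo-random Monte Carlo with NumPy vectorization standing in for the MIC's wide SIMD units and many cores; the function name and the test integrand (exp(x₁+...+x₅) over the unit cube, whose exact integral is (e-1)⁵) are illustrative, not from the paper.

```python
import numpy as np

def mc_integral(f, dim, n, rng):
    """Plain Monte Carlo estimate of the integral of f over [0,1]^dim.

    All n points are generated and evaluated in one vectorized batch;
    this data-parallel pattern over the point coordinates is what a
    MIC (or any SIMD/many-core) implementation parallelizes.
    """
    x = rng.random((n, dim))   # n pseudo-random points in the unit cube
    return f(x).mean()         # sample mean approximates the integral

rng = np.random.default_rng(0)
exact = (np.e - 1.0) ** 5      # integral of exp(x1+...+x5) over [0,1]^5
est = mc_integral(lambda x: np.exp(x.sum(axis=1)), 5, 200_000, rng)
```

A quasi-Monte Carlo variant would replace the pseudo-random points with a low-discrepancy sequence, trading the O(n^-1/2) Monte Carlo error for faster convergence; the per-coordinate operations stay the same, which is why both families map well onto the architecture.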