ICCS 2019 Main Track (MT) Session 1

Time and Date: 10:35 - 12:15 on 12th June 2019

Room: 1.5

Chair: Howard Stamato

67 Efficient Computation of Sparse Higher Derivative Tensors
Abstract: The computation of higher derivative tensors is expensive even for adjoint algorithmic differentiation methods. In this work we introduce methods to exploit the symmetry and the sparsity structure of higher derivatives to considerably improve the efficiency of their computation. The proposed methods apply coloring algorithms to two-dimensional compressed slices of the derivative tensors. The presented work is a step towards the feasibility of higher-order methods, which might benefit numerical simulations in numerous applications of computational science and engineering.
Jens Deussen and Uwe Naumann
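
The abstract above mentions applying coloring algorithms to two-dimensional slices of a sparse derivative tensor. As a minimal sketch of that general idea only (not the authors' method), the following greedy column coloring groups columns of a sparsity pattern that share no nonzero row, so that each group could be evaluated with a single seed direction. The function name and the pattern representation (one set of column indices per row) are assumptions made for illustration.

```python
def greedy_column_coloring(pattern, n_cols):
    """Assign a color to each column so that columns sharing a row differ.

    pattern: list of sets; pattern[i] holds the column indices that are
             structurally nonzero in row i (hypothetical representation).
    """
    # Two columns conflict when some row has a nonzero in both of them.
    conflicts = [set() for _ in range(n_cols)]
    for row in pattern:
        for c in row:
            conflicts[c].update(row - {c})

    colors = [None] * n_cols
    for c in range(n_cols):
        # Colors already taken by conflicting columns that were visited earlier.
        used = {colors[d] for d in conflicts[c] if colors[d] is not None}
        color = 0
        while color in used:
            color += 1
        colors[c] = color
    return colors


# Tridiagonal 5x5 pattern: each row touches its neighbouring columns.
tridiagonal = [{0, 1}, {0, 1, 2}, {1, 2, 3}, {2, 3, 4}, {3, 4}]
coloring = greedy_column_coloring(tridiagonal, 5)
```

For the tridiagonal pattern, three colors suffice instead of five separate columns, which is the kind of saving that compression by coloring aims for. Greedy coloring is a heuristic and does not guarantee the minimum number of colors in general.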
120 Being Rational about Approximating Scientific Data
Abstract: Scientific datasets are becoming increasingly challenging to transfer, analyze, and store. There is a need for methods to transform these datasets into compact representations that facilitate their downstream management and analysis, and ideally model the underlying scientific phenomena with defined numerical fidelity. To address this need, we propose nonuniform rational B-splines (NURBS) for modeling discrete scientific datasets; not only to compress input data points, but also to enable further analysis directly on the continuous fitted model, without the need for decompression. First, we evaluate three different methods for NURBS fitting, and compare their performance relative to unweighted least squares approximation (B-splines). We then extend current state-of-the-art B-spline adaptive approximation to NURBS; that is, adaptively determining optimal rational basis functions and weighted control point locations that approximate given input data points to prespecified accuracy. Additionally, we present a novel local adaptive algorithm to iteratively approximate large data input domains. This method takes advantage of NURBS local support to refine regions of the approximated model, acting locally on both input and model subdomains, without affecting other regions of the global approximation. We evaluate our methods in terms of approximated model compactness, achieved accuracy, and computational cost on both synthetic smooth functions and real-world scientific data.
Youssef Nashed, Tom Peterka, Vijay Mahadevan and Iulian Grindeanu
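
To illustrate why the weights of a rational basis matter, here is a small sketch under stated assumptions. It evaluates a rational quadratic Bézier curve, which is a NURBS curve with a single knot span, and is not the fitting procedure from the paper. With the control points and weights chosen below (illustrative values), the rational curve traces a quarter of the unit circle exactly, while the same control points with unit weights (the plain B-spline case) do not.

```python
import math


def rational_quadratic_bezier(points, weights, t):
    """Evaluate a rational quadratic Bezier curve at parameter t in [0, 1]."""
    # Quadratic Bernstein basis functions.
    basis = ((1 - t) ** 2, 2 * t * (1 - t), t ** 2)
    # Rational curves divide the weighted sum by the sum of weighted bases.
    denom = sum(w * b for w, b in zip(weights, basis))
    x = sum(w * b * p[0] for w, b, p in zip(weights, basis, points)) / denom
    y = sum(w * b * p[1] for w, b, p in zip(weights, basis, points)) / denom
    return x, y


# Illustrative data: control polygon of a quarter circle in the first quadrant.
CONTROL_POINTS = [(1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
ARC_WEIGHTS = [1.0, math.sqrt(2) / 2, 1.0]  # weights that yield an exact arc
UNIT_WEIGHTS = [1.0, 1.0, 1.0]              # non-rational (polynomial) case

rational_mid = rational_quadratic_bezier(CONTROL_POINTS, ARC_WEIGHTS, 0.5)
polynomial_mid = rational_quadratic_bezier(CONTROL_POINTS, UNIT_WEIGHTS, 0.5)
```

At the midpoint, the rational curve lies on the unit circle, whereas the unit-weight curve passes through (0.75, 0.75), which is outside it.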
336 Design of a High-Performance Tensor-Vector Multiplication with BLAS
Abstract: Tensor contraction is an important mathematical operation for many scientific computing applications that use tensors to store massive multidimensional data. Based on the Loops-over-GEMMs (LOG) approach, this paper discusses the design of high-performance algorithms for the mode-q tensor-vector multiplication using efficient implementations of the matrix-vector multiplication (GEMV). Given dense tensors with any non-hierarchical storage format, tensor order and dimensions, the proposed algorithms either directly call GEMV with tensors or recursively apply GEMV on higher-order tensor slices multiple times. We analyze strategies for loop-fusion and parallel execution of slice-vector multiplications with higher-order tensor slices. Using OpenBLAS, our implementations attain up to 113% of the GEMV's peak performance. Our parallel version of the tensor-vector multiplication achieves speedups of up to 12.6x over other state-of-the-art approaches.
Cem Bassoy
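
The mode-q tensor-vector multiplication described in the abstract above can be sketched as a loop over matrix-vector products. This is a minimal pure-Python illustration, assuming a dense tensor stored flat in column-major (first-order) layout; `gemv_colmajor` is a hypothetical stand-in for a BLAS GEMV call, and neither function is taken from the paper's implementation.

```python
from math import prod


def gemv_colmajor(a, offset, rows, cols, x):
    """Stand-in for GEMV: y = M x, with M a rows-by-cols column-major
    matrix that starts at a[offset] inside the flat buffer a."""
    y = [0.0] * rows
    for j in range(cols):
        xj = x[j]
        base = offset + j * rows
        for i in range(rows):
            y[i] += a[base + i] * xj
    return y


def ttv(a, shape, q, b):
    """Mode-q tensor-vector product (q is 1-based).

    a:     flat list holding the tensor in column-major order
    shape: tuple of mode sizes (n_1, ..., n_p)
    b:     vector of length n_q
    Returns the result as a flat column-major list with mode q removed.
    """
    if not 1 <= q <= len(shape):
        raise ValueError("q must be between 1 and the tensor order")
    nq = shape[q - 1]
    if len(b) != nq:
        raise ValueError("vector length must equal the size of mode q")
    left = prod(shape[:q - 1])   # combined size of the modes before q
    right = prod(shape[q:])      # combined size of the modes after q
    out = []
    # Each slice is a (left x nq) column-major matrix; multiply it by b.
    for r in range(right):
        out.extend(gemv_colmajor(a, r * left * nq, left, nq, b))
    return out


# 2 x 3 x 2 tensor with entries 0..11 in column-major order.
tensor = [float(v) for v in range(12)]
result = ttv(tensor, (2, 3, 2), 2, [1.0, 10.0, 100.0])
```

When q is the last mode, the loop runs once and the whole tensor is treated as a single matrix, which corresponds to the case in which one GEMV call is enough.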
388 High Performance Partial Coherent X-ray Ptychography
Abstract: During the last century, X-ray science has enabled breakthrough discoveries in fields as diverse as medicine, materials science, and electronics, and recently, ptychography has risen as a reference imaging technique in the field. It provides resolutions of a billionth of a meter, a macroscopic field of view, and the capability to retrieve chemical or magnetic contrast, among other features. The goal of ptychography is to reconstruct a 2D visualization of a sample from a collection of diffraction patterns generated from the interaction of a light source with the sample. Reconstruction involves solving a nonlinear optimization problem employing a large amount of measured data (typically two orders of magnitude bigger than the reconstructed sample), so high performance solutions are normally required. A common problem in ptychography is that the majority of the flux from the light sources is often discarded to define the coherence of an illumination. Gradient Decomposition of the Probe (GDP) is a novel method devised to address this issue. It provides the capability to significantly improve the quality of the image when partial coherence effects take place, at the expense of a three-fold increase of the memory requirements and computation. This downside, along with the fine-grained degree of parallelism of the operations involved in GDP, makes it an ideal target for GPU acceleration. In this paper we propose the first high performance implementation of GDP for partial coherence X-ray ptychography. The proposed solution exploits an efficient data layout and multi-GPU parallelism to achieve massive acceleration and efficient scaling. The experimental results demonstrate the enhanced reconstruction quality and performance of our solution, which is able to process up to 4 million input samples per second on a single high-end workstation, and compare its performance with a reference HPC ptychography pipeline.
Pablo Enfedaque, Stefano Marchesini, Huibin Chang, Bjoern Enders and David Shapiro
452 Monte Carlo Analysis of Local Cross-Correlation ST-TBD Algorithm
Abstract: Track-Before-Detect (TBD) algorithms allow the state of an object to be estimated even if the signal is hidden in the background noise. Applying local cross-correlation in a modified Information Update formula improves this estimation for extended objects (tens of cells in the measurement space) compared to direct application of the Spatio-Temporal TBD (ST-TBD) algorithm. A Monte Carlo test with a variable standard deviation of additive Gaussian noise was applied to evaluate the algorithms. The proposed solution does not require prior knowledge of the size or measured values of the object.
Przemyslaw Mazurek and Robert Krupinski