Advances in High-Performance Computational Earth Sciences: Applications and Frameworks (IHPCES) Session 1

Time and Date: 14:10 - 15:50 on 13th June 2017

Room: HG D 3.2

Chair: Xing Cai

596 Demonstration of nonhydrostatic adaptive mesh dynamics for multiscale climate models [abstract]
Abstract: Many of the atmospheric phenomena with the greatest potential impact in future warmer climates are inherently multiscale. Such meteorological systems include hurricanes and tropical cyclones, atmospheric rivers, and other types of hydrometeorological extremes. These phenomena are challenging to simulate in conventional climate models because the uniform model resolutions are coarse relative to the native nonhydrostatic scales of the phenomenological dynamics. To enable studies of these systems with sufficient local resolution for the multiscale dynamics, yet with sufficient speed for climate-change studies, we have developed a nonhydrostatic adaptive mesh dynamical core for climate studies. In this talk, we present an adaptive, conservative finite volume approach for moist nonhydrostatic atmospheric dynamics. The approach is based on the compressible Euler equations on 3D thin spherical shells, where the radial direction is treated implicitly to eliminate time step constraints from vertical acoustic waves. The spatial discretization uses the equiangular cubed-sphere mapping, with a fourth-order accurate discretization to compute flux averages on faces. By using both space- and time-adaptive mesh refinement, the solver allocates computational effort only where greater accuracy is needed. The main focus is on demonstrating the performance of this AMR dycore on idealized problems directly pertinent to atmospheric fluid dynamics. We start with test simulations of the shallow water equations on a sphere and show that the accuracy of the AMR solutions is comparable to that of conventional, quasi-uniform mesh solutions at high resolution, yet the AMR solutions are attained with 10 to 100x fewer operations and hence much greater speed. The remainder of the talk concerns the performance of the dycore on a series of tests of increasing complexity from the Dynamical Core Model Intercomparison Project, including tests without and with idealized moist physics to emulate hydrological processes. The tests demonstrate that AMR dynamics is a viable and highly economical alternative for attaining the ultra-high resolutions required to reproduce atmospheric extreme phenomena with sufficient accuracy and fidelity.
William Collins, Hans Johansen, Christiane Jablonowski and Jared Ferguson
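
The refinement criterion is the part of the AMR approach above that decides where computational effort goes. The sketch below is a minimal illustration of one common gradient-based tagging rule applied to a shallow-water height field; the field, threshold, and function names are assumptions for illustration and are not the criterion used in this dycore.

    # Minimal sketch of gradient-based refinement tagging for block-structured AMR.
    # This is NOT the authors' dycore; field, threshold, and names are illustrative.
    import numpy as np

    def tag_cells_for_refinement(h, dx, rel_threshold=0.5):
        """Tag cells whose height-gradient magnitude exceeds a fraction of the
        domain maximum; tagged cells would be covered by a finer AMR level."""
        dh_dx, dh_dy = np.gradient(h, dx)            # centered differences
        grad_mag = np.hypot(dh_dx, dh_dy)
        return grad_mag > rel_threshold * grad_mag.max()

    # Example: a smooth bump on a 64x64 shallow-water height field.
    x = np.linspace(-1.0, 1.0, 64)
    X, Y = np.meshgrid(x, x)
    h = 1.0 + 0.1 * np.exp(-50.0 * (X**2 + Y**2))
    tags = tag_cells_for_refinement(h, dx=x[1] - x[0])
    print("cells tagged for refinement:", int(tags.sum()))
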
601 Exploring an Ensemble-Based Approach to Atmospheric Climate Modeling and Testing at Scale [abstract]
Abstract: A strict throughput requirement has placed a cap on the degree to which we can depend on the execution of single, long, fine-spatial-grid simulations to explore global atmospheric climate behavior in more detail. Running an ensemble of short simulations is economical compared to traditional long simulations for the same number of simulated years, making it useful for tests of climate reproducibility under non-bit-for-bit changes. We test the null hypothesis that the climate statistics of a full-complexity atmospheric model derived from an ensemble of independent short simulations are equivalent to those from a long simulation. The climate statistics of short simulation ensembles are statistically distinguishable from those of a long simulation in terms of the distribution of global annual means, largely due to the presence of low-frequency atmospheric intrinsic variability in the long simulation. We also find that the model climate statistics of the simulation ensemble are sensitive to the choice of compiler optimizations. While some answer-changing optimization choices do not affect the climate state in terms of mean, variability, and extremes, aggressive optimizations can result in significantly different climate states.
Salil Mahajan, Abigail Gaddis, Katherine Evans and Matthew Norman
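
A minimal sketch of the kind of distributional comparison described above, using synthetic stand-ins for global annual means and a two-sample Kolmogorov-Smirnov test; the actual statistical testing framework of the paper may differ.

    # Compare the distribution of global annual means from a single long run
    # against an ensemble of short runs. Data are synthetic placeholders; the
    # KS test is one common choice, not necessarily the paper's method.
    import numpy as np
    from scipy.stats import ks_2samp

    rng = np.random.default_rng(0)
    long_run_means = rng.normal(287.5, 0.15, size=100)   # 100 years of one long run
    ensemble_means = rng.normal(287.5, 0.10, size=100)   # 100 one-year ensemble members

    stat, p_value = ks_2samp(long_run_means, ensemble_means)
    print(f"KS statistic = {stat:.3f}, p-value = {p_value:.3f}")
    if p_value < 0.05:
        print("Distributions differ: null hypothesis of equivalence is rejected.")
    else:
        print("No significant difference detected at the 5% level.")
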
404 Study of Algorithms for Fast Computation of Crack Expansion Problem [abstract]
Abstract: The problem of quasi-static growth of an arbitrarily shaped crack along an interface requires many iterations, not only to find the spatial distribution of the discontinuity but also to determine the crack tip. This becomes crucial when refining the model resolution and when the phenomenon progresses quickly from one step to the next. We propose a mathematical reformulation of the problem as a nonlinear equation and adopt different numerical methods to solve it efficiently. Compared to previous work by the authors, the resulting code shows a large improvement in performance. This gain is important for further applications to aseismic slip processes along the fault interface, in the context of plate convergence as well as the reactivation of fault systems in reservoirs.
Farid Smai and Hideo Aochi
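
The abstract does not give the reformulated equation itself, so the sketch below only illustrates the generic ingredient such an approach relies on: a Newton-type solve of a nonlinear system F(x) = 0, applied here to a toy residual rather than the authors' formulation.

    # Generic Newton iteration for a nonlinear system F(x) = 0; the residual
    # below is a toy example (circle intersected with a line), not the
    # crack-expansion problem.
    import numpy as np

    def newton(F, J, x0, tol=1e-10, max_iter=50):
        """Solve F(x) = 0 with Newton's method, given the Jacobian J."""
        x = np.asarray(x0, dtype=float)
        for _ in range(max_iter):
            r = F(x)
            if np.linalg.norm(r) < tol:
                break
            x = x - np.linalg.solve(J(x), r)
        return x

    F = lambda x: np.array([x[0]**2 + x[1]**2 - 4.0, x[0] - x[1]])
    J = lambda x: np.array([[2.0 * x[0], 2.0 * x[1]], [1.0, -1.0]])
    print(newton(F, J, x0=[1.0, 0.5]))   # expected: [sqrt(2), sqrt(2)]
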
433 TNT-NN: A Fast Active Set Method for Solving Large Non-Negative Least Squares Problems [abstract]
Abstract: In 1974 Lawson and Hanson produced a seminal active set strategy to solve least-squares problems with non-negativity constraints that remains popular today. In this paper we present TNT-NN, a new active set method for solving non-negative least squares (NNLS) problems. TNT-NN uses a different strategy not only for the construction of the active set but also for the solution of the unconstrained least squares sub-problem. This results in dramatically improved performance over traditional active set NNLS solvers, including the Lawson and Hanson NNLS algorithm and the Fast NNLS (FNNLS) algorithm, allowing for computational investigations of new types of scientific and engineering problems. For the small systems tested (5000x5000 or smaller), it is shown that TNT-NN is up to 95x faster than FNNLS. Recent studies in rock magnetism have revealed a need for fast NNLS algorithms to address large problems (on the order of 10^5 x 10^5 or larger). We apply the TNT-NN algorithm to a representative rock magnetism inversion problem where it is 60x faster than FNNLS. We also show that TNT-NN is capable of solving large (45000x45000) problems more than 150x faster than FNNLS. These large test problems were previously considered to be unsolvable, due to the excessive execution time required by traditional methods.
Joseph Myre, Erich Frahm, David Lilja and Martin Saar
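
For reference, the classical Lawson-Hanson active-set algorithm that TNT-NN is benchmarked against is available as SciPy's NNLS routine; the sketch below sets up a small toy NNLS problem with it. TNT-NN itself is not part of SciPy, and the random matrix here is only a stand-in for the rock-magnetism inversions in the paper.

    # Small non-negative least squares (NNLS) problem solved with SciPy's
    # Lawson-Hanson implementation -- the classical baseline named above.
    import numpy as np
    from scipy.optimize import nnls

    rng = np.random.default_rng(1)
    A = rng.standard_normal((200, 50))                        # design matrix
    x_true = np.clip(rng.standard_normal(50), 0.0, None)      # non-negative truth
    b = A @ x_true + 0.01 * rng.standard_normal(200)          # noisy observations

    x_hat, residual_norm = nnls(A, b)       # minimize ||Ax - b|| subject to x >= 0
    print("negative entries in solution:", int((x_hat < 0).sum()))   # always 0
    print("residual norm:", residual_norm)
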

Advances in High-Performance Computational Earth Sciences: Applications and Frameworks (IHPCES) Session 2

Time and Date: 16:20 - 18:00 on 13th June 2017

Room: HG D 3.2

Chair: Xing Cai

590 Fast Finite Element Analysis Method Using Multiple GPUs for Crustal Deformation and its Application to Stochastic Inversion Analysis with Geometry Uncertainty [abstract]
Abstract: Crustal deformation computation using 3-D high-fidelity models has been in heavy demand due to the accumulation of observational data. This approach is computationally expensive, and more than 100,000 repeated computations are required for various applications, including Monte Carlo simulation, stochastic inverse analysis, and optimization. To handle the massive computational cost, we develop a fast Finite Element (FE) analysis method using multiple GPUs for crustal deformation. We use algorithms appropriate for GPUs and accelerate key kernels such as the sparse matrix-vector product. By reducing the computation time, we are able to conduct multiple crustal deformation computations in a feasible timeframe. As an application example, we conduct stochastic inverse analysis considering uncertainties in geometry and estimate the coseismic slip distribution of the 2011 Tohoku Earthquake, performing 360,000 crustal deformation computations on different FE models with 80,000,000 degrees of freedom using the proposed method.
Takuma Yamaguchi, Kohei Fujita, Tsuyoshi Ichimura, Takane Hori, Muneo Hori and Lalith Wijerathne
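
A minimal sketch of the sparse matrix-vector product mentioned above, written with an explicit row loop to expose the structure that GPU implementations parallelize (one or more threads per row). This is a CPU illustration with an arbitrary random matrix, not the authors' multi-GPU kernel.

    # SpMV in CSR format: y = A @ x, with the row loop made explicit.
    import numpy as np
    from scipy.sparse import random as sparse_random

    def csr_spmv(indptr, indices, data, x):
        """y = A @ x for a CSR matrix given by (indptr, indices, data)."""
        n_rows = len(indptr) - 1
        y = np.zeros(n_rows)
        for row in range(n_rows):                    # parallel over rows on a GPU
            start, end = indptr[row], indptr[row + 1]
            y[row] = np.dot(data[start:end], x[indices[start:end]])
        return y

    A = sparse_random(1000, 1000, density=0.01, format="csr", random_state=0)
    x = np.ones(1000)
    y = csr_spmv(A.indptr, A.indices, A.data, x)
    print(np.allclose(y, A @ x))                     # True
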
599 Optimizing domain decomposition in an ocean model: the case of NEMO [abstract]
Abstract: Earth System Models are critical tools for the study of our climate and its future trends. These models are in constant evolution, and their growing complexity entails an increasing demand for computational resources. Since the cost of running these state-of-the-art models is huge, the factors that affect their computational performance must be examined closely. In the case of the state-of-the-art ocean model NEMO (Nucleus for European Modelling of the Ocean), used in many projects around the world, not enough attention has been given to the domain decomposition. In this work we show the impact that the selection of a particular domain decomposition can have on computational performance, and how the proposed methodology substantially improves it.
Oriol Tintó Prims, Mario Acosta, Miguel Castrillo, Ana Cortés, Alícia Sanchez, Kim Serradell and Francisco J. Doblas-Reyes
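
One concrete way the choice of decomposition affects performance is the halo size implied by the subdomain shape. The sketch below scans the 2-D factorizations of a fixed process count and picks the one with the smallest per-subdomain halo; the grid dimensions and process count are arbitrary, and NEMO's own decomposition logic involves additional considerations such as eliminating land-only subdomains.

    # Pick the (px, py) factorization of nprocs that minimizes the per-subdomain
    # halo perimeter for a nx-by-ny grid. Illustrative only.
    import math

    def best_2d_decomposition(nx, ny, nprocs):
        """Return (px, py, halo) with the smallest per-subdomain halo size."""
        best = None
        for px in range(1, nprocs + 1):
            if nprocs % px:
                continue
            py = nprocs // px
            sub_nx = math.ceil(nx / px)
            sub_ny = math.ceil(ny / py)
            halo = 2 * (sub_nx + sub_ny)        # halo points per subdomain
            if best is None or halo < best[2]:
                best = (px, py, halo)
        return best

    print(best_2d_decomposition(nx=1442, ny=1021, nprocs=512))
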
154 Data Management and Volcano Plume Simulation with Parallel SPH Method and Dynamic Halo Domains [abstract]
Abstract: This paper presents data management and computational strategies for implementing the smoothed particle hydrodynamics (SPH) method to simulate volcano plumes. These simulations require a careful definition of the domain of interest and of the multi-phase material involved in the flow, both of which change over time and involve transport over vast distances in a short time. Computational strategies are developed to overcome these challenges by building mechanisms for efficient creation and deletion of particles, parallel processing using the Message Passing Interface (MPI), and a dynamically defined halo domain (a domain that "optimally" captures all the material involved in the flow). A background grid is adopted to reduce neighbor-search costs and to decompose the domain. A Space Filling Curve (SFC) based ordering is used to assign unique identifiers to background grid entities and particles. Time-dependent SFC-based indices are assigned to particles to guarantee uniqueness of the identifiers. Both particles and background grids are managed by hash tables, which ensure quick and flexible access. An SFC-based three-dimensional (3D) domain decomposition and a dynamic load balancing strategy are implemented to ensure good load balance. Several strategies are developed to improve performance: dynamic halo domains, calibrated particle weights, and optimized workload check intervals. Numerical tests show that our code has good scalability and performance. The strategies described in this paper can be further applied to many other implementations of mesh-free methods, especially those that require flexibility in adding and deleting particles.
Zhixuan Cao, Abani Patra and Matthew Jones
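
A minimal sketch of SFC-based identifiers for 3D background-grid cells, using Morton (Z-order) encoding as one common choice of curve; the abstract does not state which curve is used, so this is illustrative only. Sorting cells by their keys groups spatially nearby cells together, the property exploited for domain decomposition and hash-table lookup.

    # Morton (Z-order) key for 3D integer cell indices: interleave the bits of
    # (i, j, k) into a single identifier. Illustrative choice of SFC.
    def morton_key_3d(i, j, k, bits=10):
        """Interleave the low `bits` bits of i, j, k into one key."""
        key = 0
        for b in range(bits):
            key |= ((i >> b) & 1) << (3 * b)
            key |= ((j >> b) & 1) << (3 * b + 1)
            key |= ((k >> b) & 1) << (3 * b + 2)
        return key

    # Example: order a handful of cells along the curve.
    cells = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (1, 1, 1), (2, 0, 0)]
    for cell in sorted(cells, key=lambda c: morton_key_3d(*c)):
        print(cell, "->", morton_key_3d(*cell))
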