Numerical and computational developments to advance multi-scale Earth System Models (MSESM) Session 1

Time and Date: 10:15 - 11:55 on 2nd June 2015

Room: M208

Chair: K.J. Evans

141	Progress in Fast, Accurate Multi-scale Climate Simulations [abstract] Abstract: We present a survey of physical and computational techniques that have the potential to contribute to the next generation of high-fidelity, multi-scale climate simulations. Examples of the climate science problems that can be investigated with more depth include the capture of remote forcings of localized hydrological extreme events, an accurate representation of cloud features over a range of spatial and temporal scales, and parallel, large ensembles of simulations to more effectively explore model sensitivities and uncertainties. Numerical techniques, such as adaptive mesh refinement, implicit time integration, and separate treatment of fast physical time scales are enabling improved accuracy and fidelity in simulation of dynamics and allow more complete representations of climate features at the global scale. At the same time, partnerships with computer science teams have focused on taking advantage of evolving computer architectures, such as many-core processors and GPUs, so that these approaches that were previously prohibitively costly have become both more efficient and scalable. In combination, progress in these three critical areas is poised to transform climate modeling in the coming decades.	William Collins, Katherine Evans, Hans Johansen, Carol Woodward, Peter Caldwell
107	Parallel Performance Optimizations on Unstructured Mesh-Based Simulations [abstract] Abstract: This paper addresses two key parallelization challenges in the unstructured mesh-based ocean modeling code, MPAS-Ocean, which uses a mesh based on Voronoi tessellations: (1) load imbalance across processes, and (2) unstructured data access patterns that inhibit intra- and inter-node performance. Our work analyzes the load imbalance due to naive partitioning of the mesh, and develops methods to generate mesh partitioning with better load balance and reduced communication. Furthermore, we present methods that minimize both inter- and intra-node data movement and maximize data reuse. The techniques include predictive ordering of data elements for higher cache efficiency, as well as communication reduction approaches. We present detailed performance data when running on thousands of cores using the Cray XC30 supercomputer and show that our optimization strategies can exceed the original performance by over 2x. Additionally, many of these solutions can be broadly applied to a wide variety of unstructured grid-based computations.	Abhinav Sarje, Sukhyun Song, Douglas Jacobsen, Kevin Huck, Jeffrey Hollingsworth, Allen Malony, Samuel Williams, Leonid Oliker
565	A Higher-Order Finite Volume Nonhydrostatic Dynamical Core with Space-Time Refinement [abstract] Abstract: We present an adaptive non-hydrostatic dynamical core based on a higher-order finite volume discretization on the cubed sphere. Adaptivity is both in space, using nested horizontal refinement, and in time, using subcycling in refined regions. The algorithm is able to maintain scalar conservation with careful flux construction at refinement boundaries, as well as conservative coarse-fine interpolation. We show results for simple tests as well as more challenging ones that highlight the benefits of refinement.	Hans Johansen

Numerical and computational developments to advance multi-scale Earth System Models (MSESM) Session 2

Time and Date: 14:10 - 15:50 on 2nd June 2015

Room: M208

Chair: K.J. Evans

97	On the scalability of the Albany/FELIX first-order Stokes approximation ice sheet solver for large-scale simulations of the Greenland and Antarctic ice sheets [abstract] Abstract: We examine the scalability of the recently developed Albany/FELIX finite-element based code for the first-order Stokes momentum balance equations for ice flow [1]. We focus our analysis on the performance of two possible preconditioners for the iterative solution of the sparse linear systems, which arise from the discretization of the governing equations: (1) a preconditioner based on the incomplete LU (ILU) factorization, and (2) a recently-developed algebraic multi-level (ML) preconditioner, constructed using the idea of semi-coarsening. A strong scalability study on a realistic, high resolution Greenland ice sheet problem reveals that, for a given number of processor cores, the ML preconditioner results in faster linear solve times but the ILU preconditioner exhibits better scalability. A weak scalability study is performed on a realistic, moderate resolution Antarctic ice sheet problem, a substantial fraction of which contains floating ice shelves, making it fundamentally different from the Greenland ice sheet problem. Here, we show that as the problem size increases, the performance of the ILU preconditioner deteriorates whereas the ML preconditioner maintains scalability. This is because the linear systems are extremely ill-conditioned in the presence of floating ice shelves, and the ill-conditioning has a greater negative effect on the ILU preconditioner than on the ML preconditioner. [1] I. Kalashnikova, M. Perego, A. Salinger, R. Tuminaro, and S. Price. Albany/FELIX: A parallel, scalable and robust finite element higher-order stokes ice sheet solver built for advance analysis. Geosci. Model Develop. Discuss., 7:8079-8149, 2014.	Irina Kalashnikova, Raymond Tuminaro, Mauro Perego, Andrew Salinger, Stephen Price
145	On the Use of Finite Difference Matrix-Vector Products in Newton-Krylov Solvers for Implicit Climate Dynamics with Spectral Elements [abstract] Abstract: Efficient solutions of global climate models require effectively handling disparate length and time scales. Implicit solution approaches allow time integration of the physical system with a step size governed by accuracy of the processes of interest rather than by stability of the fastest time scales present. Implicit approaches, however, require the solution of nonlinear systems within each time step. Usually, Newton's method is applied to solve these systems. Each iteration of Newton's method, in turn, requires the solution of a linear model of the nonlinear system. This model employs the Jacobian of the problem-defining nonlinear residual, but this Jacobian can be costly to form. If a Krylov linear solver is used for the solution of the linear system, the action of the Jacobian matrix on a given vector is required. In the case of spectral element methods, the Jacobian is not calculated but only implemented through matrix-vector products. The matrix-vector multiply can also be approximated by a finite difference approximation, which may introduce inaccuracy in the overall nonlinear solver. In this paper, we review the advantages and disadvantages of finite difference approximations of these matrix-vector products for climate dynamics within the spectral element shallow water dynamical core of the Community Atmosphere Model (CAM).	Carol Woodward, David Gardner, Katherine Evans
503	Accelerating Time Integration for Climate Modeling Using GPUs [abstract] Abstract: The push towards larger and larger computational platforms has made it possible for climate simulations to resolve climate dynamics across multiple spatial and temporal scales. This direction in climate simulation has created a strong need to develop scalable time stepping methods capable of accelerating throughput on high performance computing. This work details the recent advances in the implementation of implicit time stepping of the spectral element dynamical core within the United States Department of Energy (DOE) Accelerated Climate Model for Energy (ACME) on graphics processing unit (GPU) based machines. We demonstrate how solvers in the Trilinos project are interfaced with ACME and GPU kernels to increase computational speed of the residual calculations in the implicit time stepping method for the atmosphere dynamics. We show the optimization gains and data structure reorganization that facilitate the performance improvements.	Rick Archibald, Katherine Evans, Andrew Salinger
543	A Time-Split Discontinuous Galerkin Transport Scheme for Global Atmospheric Model [abstract] Abstract: A time-split transport scheme has been developed for the high-order multiscale atmospheric model (HOMAM). The spatial discretization of HOMAM is based on the discontinuous Galerkin method, combining the 2D horizontal elements on the cubed-sphere surface and 1D vertical elements in a terrain-following height-based coordinate. The accuracy of the time-splitting scheme is tested with a set of new benchmark 3D advection problems. The split time-integrators are based on the Strang-type operator-split method. The convergence of standard error norms shows a second-order accuracy with the smooth scalar field, irrespective of a particular time-integrator. The results with the split scheme are comparable with those of the established models.	Ram Nair, Lei Bao, Michael Toy
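
Illustration for abstract 543 (Strang-type operator splitting): the second-order convergence reported there is a general property of Strang splitting on smooth problems. The sketch below is not the HOMAM transport scheme. It swaps in a scalar toy problem, y' = -y + y^2, split into two pieces whose flows are known exactly, and measures the observed order by halving the step. The function names and the toy equation are assumptions chosen for illustration.

```python
import math


def flow_decay(y, h):
    """Exact flow of the first sub-problem, y' = -y, over a time h."""
    return y * math.exp(-h)


def flow_growth(y, h):
    """Exact flow of the second sub-problem, y' = y**2 (valid while h*y < 1)."""
    return y / (1.0 - h * y)


def strang_step(y, h):
    """One Strang step: half step of A, full step of B, half step of A."""
    y = flow_decay(y, 0.5 * h)
    y = flow_growth(y, h)
    return flow_decay(y, 0.5 * h)


def integrate(y0, t_end, n_steps):
    """Advance y0 to t_end with n_steps Strang steps of equal size."""
    h = t_end / n_steps
    y = y0
    for _ in range(n_steps):
        y = strang_step(y, h)
    return y


def exact_solution(y0, t):
    """Closed-form solution of the unsplit problem y' = -y + y**2."""
    return y0 / (y0 + (1.0 - y0) * math.exp(t))


def observed_order(y0=0.5, t_end=1.0, n_steps=20):
    """Estimate the convergence order from the errors at h and h/2."""
    reference = exact_solution(y0, t_end)
    err_coarse = abs(integrate(y0, t_end, n_steps) - reference)
    err_fine = abs(integrate(y0, t_end, 2 * n_steps) - reference)
    return math.log(err_coarse / err_fine, 2)


if __name__ == "__main__":
    print("observed order:", observed_order())
```

The symmetric half/full/half arrangement is what lifts the splitting from first to second order; replacing `strang_step` with a plain A-then-B step would be expected to drop the observed order to about one.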

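Illustration for abstract 145 (finite difference matrix-vector products): the Jacobian action that a Krylov solver needs can be approximated as J(u)v ≈ (F(u + εv) - F(u)) / ε, so the Jacobian is never formed. The sketch below is a minimal pure-Python example, not the CAM spectral element implementation. The two-unknown `residual`, its hand-coded reference `jacobian_times`, and the step-size heuristic inside `fd_matvec` are all assumptions made for illustration.

```python
import math
import sys


def residual(u):
    """Hypothetical nonlinear residual F(u) with two unknowns (illustration only)."""
    x, y = u
    return [x * x + y * y - 4.0, math.exp(x) + y - 1.0]


def jacobian_times(u, v):
    """Exact product J(u) v for the residual above, used only as a reference."""
    x, y = u
    return [2.0 * x * v[0] + 2.0 * y * v[1], math.exp(x) * v[0] + v[1]]


def norm2(w):
    """Euclidean norm of a list of floats."""
    return math.sqrt(sum(wi * wi for wi in w))


def fd_matvec(F, u, v):
    """Approximate J(u) v with one extra residual evaluation.

    The perturbation size is an assumed heuristic: the square root of
    machine epsilon, scaled by the sizes of u and v.
    """
    nv = norm2(v)
    if nv == 0.0:
        return [0.0] * len(v)
    eps = math.sqrt(sys.float_info.epsilon) * (1.0 + norm2(u)) / nv
    base = F(u)
    perturbed = F([ui + eps * vi for ui, vi in zip(u, v)])
    return [(p - b) / eps for p, b in zip(perturbed, base)]


u = [1.0, -1.5]
v = [0.3, -0.7]
approx = fd_matvec(residual, u, v)
reference = jacobian_times(u, v)
max_abs_error = max(abs(a - r) for a, r in zip(approx, reference))

if __name__ == "__main__":
    print("finite difference J v:", approx)
    print("exact J v:            ", reference)
    print("max abs error:        ", max_abs_error)
```

The choice of ε is the trade-off the abstract alludes to: too large and truncation error dominates, too small and round-off in the residual difference dominates. Either way the error feeds into the Krylov iteration and hence into the Newton update.
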
Numerical and computational developments to advance multi-scale Earth System Models (MSESM) Session 3

Time and Date: 16:20 - 18:00 on 2nd June 2015

Room: M208

Chair: K.J. Evans

321	Analysis of ocean-atmosphere coupling algorithms: consistency and stability [abstract] Abstract: This paper is focused on the numerical and computational issues associated with ocean-atmosphere coupling. It is shown that usual coupling methods do not provide the solution to the correct problem but to an approximation of it, since they are equivalent to performing one single iteration of an iterative coupling method. The stability analysis of these ad-hoc methods is presented, and we motivate and propose the adaptation of a Schwarz domain decomposition method to ocean-atmosphere coupling to obtain a stable and consistent coupling method.	Florian Lemarie, Eric Blayo, Laurent Debreu
658	Exploring the Effects of a High-Order Vertical Coordinate in a Non-Hydrostatic Global Model [abstract] Abstract: As atmospheric models are pushed towards non-hydrostatic resolutions, there is a growing need for new numerical discretizations that are accurate, robust and effective at these scales. In this paper we describe a new arbitrary-order staggered nodal finite-element method (SNFEM) vertical discretization motivated by the flux reconstruction formulation. The SNFEM formulation generalizes traditional second-order vertical discretizations, including Lorenz and Charney-Phillips discretizations, to arbitrary order-of-accuracy while preserving desirable properties such as energy conservation. Preliminary results from application of this method to an idealized baroclinic instability are given, demonstrating the effect of improvements in order of accuracy on the structure of the instability.	Paul Ullrich, Jorge Guerra
494	High-Order / Low-Order Methods for Ocean Modeling [abstract] Abstract: We examine a High Order / Low Order (HOLO) approach for a z-level ocean model and show that the traditional semi-implicit and split-explicit methods, as well as a recent preconditioning strategy, can easily be cast in the framework of HOLO methods. The HOLO formulation admits an implicit-explicit method that is algorithmically scalable and second-order accurate, allowing timesteps much larger than the barotropic time scale. We show how HOLO approaches, in particular the implicit-explicit method, can provide a solid route for ocean simulation to heterogeneous computing and exascale environments.	Chris Newman, Geoff Womeldorff, Luis Chacon, Dana Knoll
134	Aeras: A Next Generation Global Atmosphere Model [abstract] Abstract: Sandia National Laboratories is developing a new global atmosphere model named Aeras that is performance portable and supports the quantification of uncertainties. These next-generation capabilities are enabled by building Aeras on top of Albany, a code base that supports the rapid development of scientific application codes while leveraging Sandia's foundational mathematics and computer science packages in Trilinos and Dakota. Embedded uncertainty quantification is an original design capability of Albany, and performance portability is a recent upgrade for Albany. Other required features, such as shell-type elements, spectral elements, efficient explicit and semi-implicit time-stepping, transient sensitivity analysis, and concurrent ensembles, were not components of Albany as the project began, and have been (or are being) added by the Aeras team. We present early sensitivity analysis and performance portability results for the shallow water equations.	William Spotz, Thomas Smith, Irina Demeshko, Jeffrey Fike
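
Illustration for abstract 321 (one coupling pass versus an iterated coupling): the point that a single exchange of interface data solves only an approximation of the coupled problem can be seen in a zero-dimensional toy. The sketch below is not the Schwarz method analysed in that paper. It assumes two well-mixed boxes (an "atmosphere" and an "ocean" temperature) exchanging heat over one implicit Euler coupling window, with hypothetical nondimensional exchange coefficients `alpha` and `beta`, and iterates the exchange in the spirit of an alternating Schwarz loop.

```python
def atmosphere_step(ta_old, to_guess, alpha):
    """Implicit Euler update of the atmosphere box for a given ocean temperature.

    Solves ta_new = ta_old - alpha * (ta_new - to_guess) for ta_new.
    """
    return (ta_old + alpha * to_guess) / (1.0 + alpha)


def ocean_step(to_old, ta_new, beta):
    """Implicit Euler update of the ocean box for a given atmosphere temperature.

    Solves to_new = to_old + beta * (ta_new - to_new) for to_new.
    """
    return (to_old + beta * ta_new) / (1.0 + beta)


def coupled_window(ta_old, to_old, alpha, beta, n_iter):
    """Run n_iter >= 1 atmosphere/ocean exchanges over one coupling window.

    The first pass uses the lagged ocean state, which mimics a
    single-pass coupling; later passes reuse the updated ocean state.
    """
    if n_iter < 1:
        raise ValueError("n_iter must be at least 1")
    to_guess = to_old
    for _ in range(n_iter):
        ta_new = atmosphere_step(ta_old, to_guess, alpha)
        to_new = ocean_step(to_old, ta_new, beta)
        to_guess = to_new
    return ta_new, to_new


def monolithic(ta_old, to_old, alpha, beta):
    """Direct solution of the fully coupled 2x2 implicit system (reference)."""
    det = 1.0 + alpha + beta
    ta_new = ((1.0 + beta) * ta_old + alpha * to_old) / det
    to_new = (beta * ta_old + (1.0 + alpha) * to_old) / det
    return ta_new, to_new


single_pass = coupled_window(20.0, 10.0, 0.5, 0.1, n_iter=1)
iterated = coupled_window(20.0, 10.0, 0.5, 0.1, n_iter=50)
reference = monolithic(20.0, 10.0, 0.5, 0.1)

if __name__ == "__main__":
    print("single pass:", single_pass)
    print("iterated:   ", iterated)
    print("monolithic: ", reference)
```

In this toy the iteration contracts by a factor alpha*beta / ((1 + alpha)*(1 + beta)) per pass, so it always converges to the monolithic solution; for real ocean-atmosphere models convergence and cost depend on the interface conditions, which is the subject of the abstract.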