Tools for Program Development and Analysis in Computational Science (TOOLS) Session 1

Time and Date: 10:15 - 11:55 on 2nd June 2015

Room: M209

Chair: Jie Tao

602 Cube v.4: From Performance Report Explorer to Performance Analysis Tool
Abstract: Cube v.3 has been a powerful tool for examining Scalasca performance reports, but was basically unable to perform analyses on its own. With Cube v.4, we addressed several shortcomings of Cube v.3. We generalized the Cube data model, extended the list of supported data types, and allowed operations with nontrivial algebras, e.g. for performance models or statistical data. Additionally, we introduced two major new features that greatly enhance the performance analysis capabilities of Cube: derived metrics and GUI plugins. Derived metrics can be used to create and manipulate metrics directly within the GUI, using a powerful domain-specific language called CubePL. Cube GUI plugins allow the development of novel performance analysis techniques based on Cube data without changing the source code of the Cube GUI.
Michael Knobloch, Bernd Mohr, Anke Visser, Pavel Saviankou
51 Visual MPI Performance Analysis using Event Flow Graphs
Abstract: Event flow graphs used in the context of performance monitoring combine the scalability and low overhead of profiling methods with lossless information recording of tracing tools. In other words, they capture statistics on the performance behavior of parallel applications while preserving the temporal ordering of events. Event flow graphs require significantly less storage than regular event traces and can still be used to recover the full ordered sequence of events performed by the application. In this paper we explore the usage of event flow graphs in the context of visual performance analysis. We show that graphs can be used to quickly spot performance problems, helping to better understand the behavior of an application. We demonstrate our performance analysis approach with MiniFE, a mini-application that mimics the key performance aspects of finite-element applications in High Performance Computing (HPC).
Xavier Aguilar, Karl Fürlinger, Erwin Laure
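As a rough illustration of the idea in the abstract above, the following Python sketch collapses an ordered event sequence into a graph whose nodes are events and whose edges count observed transitions. The function name `build_event_flow_graph` and the sample trace are hypothetical and not taken from the paper. This simplified form keeps only transition counts, so unlike the graphs described by the authors it is not guaranteed to allow recovery of the full event sequence.

```python
from collections import Counter

def build_event_flow_graph(events):
    """Collapse an ordered list of event names into node and edge counts.

    Hypothetical helper for illustration only: nodes count how often each
    event occurred, edges count how often one event directly followed another.
    """
    nodes = Counter(events)
    edges = Counter(zip(events, events[1:]))
    return nodes, edges

# Made-up sequence of MPI call names recorded for a single process.
trace = ["MPI_Init", "MPI_Send", "MPI_Recv",
         "MPI_Send", "MPI_Recv", "MPI_Finalize"]

nodes, edges = build_event_flow_graph(trace)

for (src, dst), count in sorted(edges.items()):
    print(f"{src} -> {dst}: {count}")
```

The graph has one node per distinct event rather than one record per occurrence, which is why such a representation can stay compact for long, repetitive runs.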
75 Glprof: A Gprof inspired, Callgraph-oriented Per-Object Disseminating Memory Access Multi-Cache Profiler
Abstract: Application analysis is facilitated through a number of program profiling tools. The tools vary in their complexity, ease of deployment, design, and profiling detail. In particular, understanding, analyzing, and optimizing are especially important for scientific applications, where minor changes in code paths and data-structure layout can have profound effects. Understanding how intricate data-structures are accessed and how a given memory system responds is a complex task. In this paper we describe a trace profiling tool, Glprof, specifically aimed at lessening the programmer's burden of pinpointing heavily involved data-structures during an application's run-time and of understanding data-structure run-time usage. Moreover, we showcase the tool's modularity using additional cache simulation components. We elaborate on the tool's design and features. Finally, we demonstrate the application of our tool in the context of SPEC benchmarks using the Glprof profiler and two concurrently running cache simulators, PPC440 and AMD Interlagos.
Tomislav Janjusic, Christos Kartsaklis
326 Graphical high level analysis of communication in distributed virtual reality applications
Abstract: Analysing distributed virtual reality applications that communicate through message passing is challenging. Their development is complex, and knowing whether something is wrong depends on the state of each process. Defects (bugs) cause software crashes, hangs, and incorrect results. To address this daunting problem, we specify functional behavior models (for example, using synchronization barriers and shared variables) for these applications that ensure correctness. We also developed the GTracer tool, which compares the functional behavior models with the messages transmitted among processes. GTracer checks for violations of these models automatically and displays the message traffic graphically. It is a tool made for libGlass, a message library for distributed computing. We were able to find several non-trivial defects while testing this tool.
Marcelo Guimarães, Bruno Gnecco, Diego Dias, José Brega, Luis Trevelin

Tools for Program Development and Analysis in Computational Science (TOOLS) Session 2

Time and Date: 14:10 - 15:50 on 2nd June 2015

Room: M209

Chair: Jie Tao

368 Providing Parallel Debugging for DASH Distributed Data Structures with GDB
Abstract: The C++ DASH template library provides distributed data containers for Partitioned Global Address Space (PGAS)-like programming. Because DASH is new and under development, no debugger is capable of handling the parallel processes or of accessing and modifying container elements in a convenient way. This paper describes how the DASH library has to be extended to interrupt the start-up process, so that a debugger can be connected to all started processes, and to enable the debugger to access and modify DASH container elements. Furthermore, a GDB extension that outputs well-formatted DASH container information is presented.
Denis Hünich, Andreas Knüpfer, José Gracia
156 Sequential Performance: Raising Awareness of the Gory Details
Abstract: The advent of multicore and manycore processors, including GPUs, in the customer market encouraged developers to focus on extraction of parallelism. While it is true that parallelism can deliver performance boosts, parallelization is also a very complex and error-prone task. Many applications are still sequential, or dominated by sequential sections. Modern micro-architectures have become extremely complex, and they usually do a very good job of executing a given sequence of instructions quickly. When they occasionally fail, however, the penalty may be severe. Pathological behaviors often have their roots in very low-level implementation details of the micro-architecture, hardly available to the programmer. We argue that the impact of these low-level features on performance has been overlooked, often relegated to experts. We show that a few metrics can be easily defined to help assess the overall performance of an application and quickly diagnose a problem. Finally, we illustrate our claim with a simple prototype, along with several use cases.
Erven Rohou, David Guyon
544 Evolving Fortran types with inferred units-of-measure
Abstract: Dimensional analysis is a well-known technique for checking the consistency of equations involving physical quantities, constituting a kind of type system. Various type systems for dimensional analysis, and its refinement to units-of-measure, have been proposed. In this paper, we detail the design and implementation of a units-of-measure system for Fortran, implemented as a pre-processor. Our system is designed to aid adding units to an existing code base: units may be polymorphic and can be inferred. Furthermore, we introduce a technique for reporting to the user a set of critical variables, which should be explicitly annotated with units to get the maximum amount of unit information with the minimal number of explicit declarations. This aids adoption of our type system in existing code bases, of which there are many in computational science projects.
Dominic Orchard, Andrew Rice, Oleg Oshmyan
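
To make the notion of dimensional analysis as a kind of type system concrete, here is a minimal Python sketch of a run-time unit check. It is not the Fortran pre-processor described in the abstract, and it does no inference or polymorphism. The representation (a dictionary from base-unit name to exponent) and the helper names `mul`, `div`, and `add` are assumptions made purely for illustration.

```python
from typing import Dict

# Assumed representation: base-unit name -> integer exponent,
# e.g. metres per second is {"m": 1, "s": -1}.
Unit = Dict[str, int]

def normalize(u: Unit) -> Unit:
    """Drop base units whose exponent is zero."""
    return {k: v for k, v in u.items() if v != 0}

def mul(a: Unit, b: Unit) -> Unit:
    """Multiplying quantities adds the exponents of their units."""
    out = dict(a)
    for k, v in b.items():
        out[k] = out.get(k, 0) + v
    return normalize(out)

def div(a: Unit, b: Unit) -> Unit:
    """Dividing quantities subtracts the exponents of their units."""
    return mul(a, {k: -v for k, v in b.items()})

def add(a: Unit, b: Unit) -> Unit:
    """Addition is only consistent when both operands share the same unit."""
    if normalize(a) != normalize(b):
        raise TypeError(f"unit mismatch: {a} vs {b}")
    return normalize(a)

METRE: Unit = {"m": 1}
SECOND: Unit = {"s": 1}

velocity = div(METRE, SECOND)         # m / s
acceleration = div(velocity, SECOND)  # m / s^2

# acceleration * time has the unit of a velocity, so this addition is accepted.
total = add(velocity, mul(acceleration, SECOND))
```

Calling `add(METRE, SECOND)` raises `TypeError`, which is the kind of inconsistency a units-of-measure checker is meant to report; a static system would flag it before the program runs rather than at run time.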