Tools for Program Development and Analysis in Computational Science (TOOLS) Session 2

Time and Date: 16:20 - 18:00 on 7th June 2016

Room: Macaw

Chair: Jie Tao

447 Online MPI Trace Compression using Event Flow Graphs and Wavelets
Abstract: Performance analysis of scientific parallel applications is essential for using High Performance Computing (HPC) infrastructures efficiently. Nevertheless, collecting detailed data from large-scale parallel programs and long-running applications is infeasible due to the huge amount of performance information generated. Even though there are no technological constraints on storing terabytes of performance data, the constant flushing of such data to disk introduces a massive overhead into the application that makes the performance measurements worthless. This paper explores the use of event flow graphs together with wavelet analysis and EZW encoding to provide MPI event traces that are orders of magnitude smaller while preserving accurate information on timestamped events. Our mechanism compresses the performance data online while the application runs, thus reducing the pressure that buffer flushing puts on the I/O system. As a result, we achieve lower application perturbation, reduced performance data output, and the possibility to monitor longer application runs.
Xavier Aguilar, Karl Fuerlinger, Erwin Laure
194 WOWMON: A Machine Learning-based Profiler for Self-adaptive Instrumentation of Scientific Workflows
Abstract: Performance debugging using program profiling and tracing for scientific workflows can be extremely difficult for two reasons. 1) Existing performance tools lack the ability to automatically produce global performance data based on local information from coupled scientific applications, particularly at runtime. 2) Profiling/tracing with static instrumentation may incur high overhead and significantly slow down science-critical tasks. To gain more insight into workflows, we introduce a lightweight workflow monitoring infrastructure, WOWMON (WOrkfloW MONitor), which gives users access not only to cross-application performance data, such as end-to-end latency and execution time of individual workflow components at runtime, but also to customized performance events. To reduce profiling overhead, WOWMON uses adaptive selection of performance metrics based on machine learning algorithms to guide profilers to collect only the metrics that have the most impact on workflow performance. Through the study of real scientific workflows (e.g., LAMMPS) with the help of WOWMON, we found that workflow performance can be significantly affected by both software and hardware factors, such as the process-mapping policy and the hardware configuration of clusters. Moreover, we experimentally show that WOWMON can reduce data movement for profiling by up to 54% without missing key metrics for performance debugging.
Xuechen Zhang, Hasan Abbasi, Kevin Huck, Allen Malony
334 A DSL based toolchain for design space exploration in structured parallel programming
Abstract: We introduce a DSL-based toolchain supporting the design of parallel applications where parallelism is structured after parallel design pattern compositions. The DSL makes it possible to write high-level parallel design pattern expressions representing the structure of parallel applications, to refactor the pattern expressions, to evaluate their non-functional properties (e.g., ideal performance and total parallelism degree), and finally to generate parallel code ready to be compiled and run on different target architectures. We discuss a proof-of-concept prototype implementation of the proposed toolchain that generates FastFlow code, and we show some preliminary results achieved using the prototype.
Marco Danelutto, Massimo Torquati, Peter Kilpatrick