ICCS 2019 Main Track (MT) Session 3

Time and Date: 16:50 - 18:30 on 12th June 2019

Room: 1.5

Chair: Youssef Nashed

12 Forecasting Model for Network Throughput of Remote Data Access in Computing Grids
Abstract: Computing grids are one of the key enablers of computational science. Researchers from many fields (High Energy Physics, Bioinformatics, Climatology, etc.) employ grids for the execution of distributed computational jobs. Such computing workloads are typically data-intensive. The current state-of-the-art approach for data access in grids is data placement: a job is scheduled to run at a specific data center, and its execution commences only when the complete input data has been transferred there. An alternative approach is remote data access: a job may stream the input data directly from storage elements. Remote data access brings two key benefits: (1) jobs can be executed asynchronously with respect to the data transfer; (2) when combined with data placement on the policy level, it may help optimize network load grid-wide, since the two data access methodologies have partially non-overlapping bottlenecks. However, to employ this technique systematically, the properties of its network throughput need to be studied carefully. This paper presents the results of an experimental identification of the parameters influencing the throughput of remote data access, a statistically tested formalization of these parameters, and a derived throughput forecasting model. The model is applicable to large computing workloads, robust to arbitrary dynamic changes in the grid infrastructure, and exhibits a long-term forecasting horizon. Its purpose is to assist various stakeholders of the grid in decision-making related to data access patterns. This work is based on measurements taken on the Worldwide LHC Computing Grid at CERN.
Volodimir Begy, Martin Barisits, Mario Lassnig and Erich Schikuta
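
As an illustration of the kind of throughput forecasting discussed in this abstract, the sketch below fits a simple linear model to hypothetical historical transfer measurements and uses it to forecast remote-read throughput for a planned workload. The feature set (concurrent streams, file size, round-trip time), the linear functional form, and all numbers are assumptions for illustration only, not the statistically derived model from the paper.

```python
# Illustrative sketch only: a linear throughput model fitted to hypothetical
# historical transfer measurements. The features and the linear form are
# assumptions; the paper derives its own statistically tested formalization.
import numpy as np

# Hypothetical history: [concurrent_streams, file_size_GB, rtt_ms] -> MB/s
X_hist = np.array([
    [ 4, 1.0,  20], [ 8, 2.0,  20], [16, 2.0,  80],
    [32, 4.0,  80], [ 8, 1.0, 150], [16, 4.0, 150],
], dtype=float)
y_hist = np.array([310.0, 520.0, 410.0, 450.0, 180.0, 260.0])  # observed MB/s

# Fit coefficients (with an intercept column) by ordinary least squares.
A = np.hstack([X_hist, np.ones((X_hist.shape[0], 1))])
coef, *_ = np.linalg.lstsq(A, y_hist, rcond=None)

def forecast_throughput(streams, file_size_gb, rtt_ms):
    """Forecast remote-read throughput (MB/s) for a planned workload."""
    x = np.array([streams, file_size_gb, rtt_ms, 1.0])
    return float(x @ coef)

print(forecast_throughput(streams=16, file_size_gb=2.0, rtt_ms=40))
```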
408 Collaborative Simulation Development Accelerated by Cloud Based Computing and Software as a Service Model
Abstract: Simulations are increasingly used in pharmaceutical development to deliver medicines to patients more quickly, more efficiently, and with better designs, safety, and effect. These simulations require high-performance computing resources as well as a variety of software to model the processes and effects on the pharmaceutical product at various scales of scrutiny, from the atomic scale to the entire production process. The demand curve for these resources has many peaks and can shift on a time scale much faster than a typical procurement process. Both on-demand cloud-based computing capability and software-as-a-service models have been growing in use. This presentation describes the efforts of the Enabling Technology Consortium to apply these information technology models to pharmaceutical simulations, which have special documentation and security needs. Further benefits are expected because the cloud can be configured for collaborative work among companies in the non-competitive space, and all the work can be made available for use by contract service vendors or health authorities. The expected benefits of this computing environment include economies of scale for both providers and consumers, increased resources, and information available to consumers to accelerate and improve the delivery of pharmaceutical products.
Howard Stamato
487 Accurately Simulating Energy Consumption of I/O-intensive Scientific Workflows
Abstract: While distributed computing infrastructures can provide infrastructure-level techniques for managing energy consumption, application-level energy consumption models have also been developed to support energy-efficient scheduling and resource provisioning algorithms. In this work, we analyze the accuracy of application-level models that have been developed and used in the context of scientific workflow executions. To this end, we profile two production scientific workflows on a distributed platform instrumented with power meters. We then conduct an analysis of power and energy consumption measurements. This analysis shows that power consumption is not linearly related to CPU utilization and that I/O operations significantly impact power, and thus energy, consumption. We then propose a power consumption model that accounts for I/O operations, including the impact of waiting for these operations to complete, and for concurrent task executions on multi-socket, multi-core compute nodes. We implement our proposed model as part of a simulator that allows us to draw direct comparisons between real-world and modeled power and energy consumption. We find that our model has high accuracy when compared to real-world executions. Furthermore, our model improves accuracy by about two orders of magnitude when compared to the traditional models used in the energy-efficient workflow scheduling literature.
Rafael Ferreira Da Silva, Anne-Cécile Orgerie, Henri Casanova, Ryan Tanaka, Ewa Deelman and Frédéric Suter
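
As a rough illustration of a power model that is non-linear in CPU utilization and includes explicit terms for active and waiting I/O, the sketch below combines an idle baseline, a sub-linear dynamic CPU term, and two I/O terms. The functional form and every constant are assumptions for illustration; they are not the calibrated model from the paper or its simulator.

```python
# Illustrative sketch only: a node power model in the spirit described in the
# abstract (non-linear in CPU utilization, with explicit I/O terms).
# All constants and the functional form are assumed for illustration.

P_IDLE = 95.0        # W, idle node power (assumed)
P_CPU_MAX = 140.0    # W, dynamic CPU power at full utilization (assumed)
P_IO_ACTIVE = 25.0   # W, added power while I/O operations are in flight (assumed)
P_IO_WAIT = 10.0     # W, added power while cores wait on I/O (assumed)
ALPHA = 0.6          # exponent capturing the non-linear CPU/power relation (assumed)

def node_power(cpu_util, io_active_frac, io_wait_frac):
    """Estimate instantaneous node power (W).

    cpu_util        -- CPU utilization in [0, 1] across the node
    io_active_frac  -- fraction of time I/O operations are being served
    io_wait_frac    -- fraction of time tasks are blocked waiting for I/O
    """
    dynamic_cpu = P_CPU_MAX * (cpu_util ** ALPHA)   # non-linear, not proportional
    io = P_IO_ACTIVE * io_active_frac + P_IO_WAIT * io_wait_frac
    return P_IDLE + dynamic_cpu + io

def task_energy(duration_s, cpu_util, io_active_frac, io_wait_frac):
    """Energy (J) over an interval, assuming constant utilization within it."""
    return node_power(cpu_util, io_active_frac, io_wait_frac) * duration_s

# e.g. a 10-minute I/O-heavy phase on a half-utilized node
print(task_energy(600, cpu_util=0.5, io_active_frac=0.4, io_wait_frac=0.3))
```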
62 Exploratory Visual Analysis of Anomalous Runtime Behavior in Streaming High Performance Computing Applications
Abstract: Online analysis of runtime behavior is essential for performance tuning in streaming scientific workflows. Integration of anomaly detection and visualization is necessary to support human-centered analysis, such as verification of candidate anomalies utilizing domain knowledge. In this work, we propose an efficient and scalable visual analytics system for online performance analysis of scientific workflows toward the exascale scenario. Our approach uses a call stack tree representation to encode the structural and temporal information of the function executions. Based on the call stack tree features (e.g., execution time of the root function or a vector representation of the tree structure), we employ online anomaly detection approaches to identify candidate anomalous function executions. We also present a set of visualization tools for verification and exploration in a level-of-detail manner. General information, such as the distribution of execution times, is provided in an overview visualization. The detailed structure (e.g., function invocation relations) and the temporal information (e.g., message communication) of the execution call stack of interest are also visualized. The usability and efficiency of our methods are verified in the NWChem use case.
Cong Xie, Wonyong Jeong, Gyorgy Matyasfalvi, Hubertus Van Dam, Klaus Mueller, Shinjae Yoo and Wei Xu
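
As a minimal illustration of online anomaly detection over a streaming call-stack-tree feature (here, just the root function's execution time), the sketch below maintains running statistics with Welford's algorithm and flags executions whose z-score exceeds a threshold. The feature choice, detector, and threshold are assumptions for illustration; the paper employs richer tree features and its own online detection approaches.

```python
# Illustrative sketch only: flag candidate anomalous function executions as
# they stream in, using a running mean/variance (Welford) and a z-score test
# on one assumed feature (root execution time).
import math

class OnlineAnomalyDetector:
    def __init__(self, threshold=3.0):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0          # sum of squared deviations (Welford)
        self.threshold = threshold

    def observe(self, exec_time):
        """Return True if exec_time looks anomalous, then update statistics."""
        anomalous = False
        if self.n >= 10:       # short warm-up before flagging anything
            std = math.sqrt(self.m2 / (self.n - 1))
            if std > 0 and abs(exec_time - self.mean) / std > self.threshold:
                anomalous = True
        self.n += 1
        delta = exec_time - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (exec_time - self.mean)
        return anomalous

detector = OnlineAnomalyDetector()
stream = [1.0, 1.1, 0.9, 1.05, 1.0, 0.95, 1.1, 1.0, 0.98, 1.02, 1.01, 5.0]
flags = [detector.observe(t) for t in stream]
print(flags)   # the final, much slower execution is flagged as a candidate anomaly
```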