ICCS 2018 Main Track (MT) Session 10

Time and Date: 15:25 - 17:05 on 12th June 2018

Room: M2

Chair: Pablo Enfedaque

216	Elastic CPU Cap Mechanism for Timely Dataflow Applications Abstract: Sudden surges in the incoming workload can degrade the run-time performance of data-flow applications. Our work addresses the problem of limiting CPU usage associated with the elastic scaling of timely data-flow (TDF) applications running in a shared computing environment, where each application can have a different quality of service (QoS) requirement. The key argument is that an unwise consolidation decision to dynamically scale up/out the computing resources in response to unexpected workload changes can degrade the performance of some (if not all) collocated applications because of their fierce competition for shared resources (such as the last-level cache). The proposed solution uses a queue-based model to predict the performance degradation of running data-flow applications together. CPU cap adjustment is formulated as an optimization problem whose aim is to reduce QoS violation incidents among applications while raising the CPU utilization level of server nodes and preventing the formation of bottlenecks caused by competition among collocated applications. The controller uses an efficient dynamic method to find a solution at each round of the controlling epoch. The performance evaluation compares the proposed controller against an enhanced QoS-aware version of the round-robin strategy, which is deployed in many commercial packages. Experimental results confirmed that the proposed solution improves QoS satisfaction by nearly 148% on average and can reduce the latency of processing data records for applications in the highest QoS classes by nearly 19% during workload surges.	M. Reza Hoseinyfarahabady, Nazanin Farhangsadr, Albert Zomaya, Zahir Tari and Samee Khan
351	Blockchain-based transaction integrity in distributed big data marketplace Abstract: Today, big data plays a crucial part both in scientific research and in the business analysis of large companies. Each company tries to find the best way to make the big data it generates valuable and profitable. However, in most cases, companies lack the opportunities and budget to solve this complex problem. On the other hand, there are companies (e.g., in insurance and banking) that can significantly improve their business organization by applying hidden knowledge extracted from such big data. This situation creates the need for a platform for the exchange, processing, and sale of collected big data. In this paper, we propose a distributed big data platform that implements a digital data market, based on the blockchain mechanism for data transaction integrity.	Denis Nasonov, Alexander Visheratin and Alexander Boukhanovsky
363	Workload Characterization and Evolutionary Analyses of Tianhe-1A Supercomputer Abstract: Currently, supercomputer systems face a variety of application challenges, including high-throughput, data-intensive, and stream-processing applications. At the same time, improving user satisfaction is a growing challenge at supercomputers such as Tianhe-1A, Tianhe-2 and TaihuLight because of their commercial service model. It is important to understand HPC workloads and their evolution to facilitate informed future research and improve user satisfaction. In this paper, we present a methodology to characterize workloads on a commercial supercomputer (one whose users pay for service), both at a particular period and as they evolve over time. We apply this method to the workloads of Tianhe-1A at the National Supercomputer Center in Tianjin. This paper presents the concept of quota-constrained waiting time for the first time, which is significant for optimizing scheduling and enhancing user satisfaction on commercial supercomputers.	Jinghua Feng, Guangming Liu, Jian Zhang, Zhiwei Zhang, Jie Yu and Zhaoning Zhang
378	The Design of Fast and Energy-Efficient Linear Solvers: On the Potential of Half Precision Arithmetic and Iterative Refinement Techniques Abstract: As parallel computers approach the exascale, power efficiency in high-performance computing (HPC) systems is of increasing concern. Exploiting both hardware features and algorithms is an effective way to achieve power efficiency and address the energy constraints in modern and future HPC systems. In this work, we present a novel design and implementation of an energy-efficient solution for dense linear systems of equations, which are at the heart of large-scale HPC applications. Energy-efficient linear system solvers are based on two main components: (1) iterative refinement techniques, and (2) reduced-precision computing features in modern accelerators and co-processors. While most energy-efficiency approaches aim to reduce consumption with a minimal performance penalty, our method improves both performance and energy efficiency. Compared to highly optimised linear system solvers, our kernels are up to 2X faster in delivering a solution of the same accuracy, and reduce the energy consumption by up to half on Intel KNL architectures. By efficiently using the tensor cores available in NVIDIA V100 PCIe GPUs, the speedups can reach 4X with more than an 80% reduction in energy consumption.	Azzam Haidar, Ahmad Abdelfattah, Mawussi Zounon, Panruo Wu, Srikara Pranesh, Stanimire Tomov and Jack Dongarra
386	Design of Parallel BEM Analyses Framework for SIMD Processors Abstract: A software framework titled BEM-BB has been developed to conduct parallel boundary element method (BEM) analyses. By implementing a fundamental solution or a Green’s function, which is the most important element of the BEM and depends on the targeted physical phenomenon, users benefit from the MPI and OpenMP hybrid parallelization with H-matrix approximation provided by the framework. However, the framework does not take into account single instruction multiple data (SIMD) vectorization, which is important for high-performance computing and is supported by the majority of existing processors. Dealing with SIMD vectorization of a user-defined function is difficult because SIMD exploits instruction-level parallelization and is closely associated with the user-defined function. This study describes the conceptual framework for enhancing the SIMD vectorization. The new framework was evaluated using two BEM problems of static electric field analysis, one with a perfect conductor and one with a dielectric, on an Intel Broadwell processor and an Intel Xeon Phi KNL. We observed that the framework provides good vectorization with limited SIMD knowledge. The numerical results illustrate the improved performance of the framework. In particular, perfect conductor analyses using H-matrix achieved performance improvements of 2.22x and 4.33x over the original BEM-BB framework on the Broadwell processor and KNL, respectively.	Tetsuya Hoshino, Akihiro Ida, Toshihiro Hanawa and Kengo Nakajima