Architecture, Languages, Compilation and Hardware support for Emerging ManYcore systems (ALCHEMY) Session 1

Time and Date: 10:35 - 12:15 on 1st June 2015

Room: M208

Chair: Stephane Louise

743	Alchemy Workshop Keynote: Programming heterogeneous, manycore machines: a runtime system's perspective [abstract] Abstract: Heterogeneous manycore parallel machines, mixing multicore CPUs with manycore accelerators provide an unprecedented amount of processing power per node. Dealing with such a large number of heterogeneous processing units -- providing a highly unbalanced computing power -- is one of the biggest challenge that developpers of HPC applications have to face. To Fully tap into the potential of these heterogeneous machines, pure offloading approaches, that consist in running an application on host cores while offloading part of the code on accelerators, are not sufficient. In this talk, I will go through the major software techniques that were specifically designed to harness heterogeneous architectures, focusing on runtime systems. I will discuss some of the most critical issues programmers have to consider to achieve portability of performance, and how programming languages may evolve to meet such as goal. Eventually, I will give some insights about the main challenges designers of programming environments will have to face in upcoming years.	Raymond Namyst
433	On the Use of a Many-core Processor for Computational Fluid Dynamics Simulations [abstract] Abstract: The increased availability of modern embedded many-core architectures supporting floating-point operations in hardware makes them interesting targets in traditional high performance computing areas as well. In this paper, the Lattice Boltzmann Method (LBM) from the domain of Computational Fluid Dynamics (CFD) is evaluated on Adapteva’s Epiphany many-core architecture. Although the LBM implementation shows very good scalability and high floating-point efficiency in the lattice computations, current Epiphany hardware does not provide adequate amounts of either local memory or external memory bandwidth to provide a good foundation for simulation of the large problems commonly encountered in real CFD applications.	Sebastian Raase, Tomas Nordström
263	A short overview of executing Γ Chemical Reactions over the ΣC and τC Dataflow Programming Models [abstract] Abstract: Many-core processors offer top computational power while keeping the energy consumption reasonable compared to complex processors. Today, they enter both high-performance computing systems, as well as embedded systems. However, these processors require dedicated programming models to efficiently benefit from their massively parallel architectures. The chemical programming paradigm has been introduced in the late eighties as an elegant way of formally describing distributed programs. Data are seen as molecules that can freely react thanks to operators to create new data. This paradigm has also been used within the context of grid computing and now seems to be relevant for many-core processors. Very few implementations of runtimes for chemical programming have been proposed, none of them giving serious elements on how it can be deployed onto a real architecture. In this paper, we propose to implement some parts of the chemical paradigm over the ΣC dataflow programming language, that is dedicated to many-core processors. We show how to represent molecules using agents and communication links, and to iteratively build the dataflow graph following the chemical reactions. A preliminary implementation of the chemical reaction mechanisms is provided using the τC dataflow compilation toolchain, a language close to ΣC, in order to demonstrate the relevance of the proposition.	Loïc Cudennec, Thierry Goubier
435	Threaded MPI Programming Model for the Epiphany RISC Array Processor [abstract] Abstract: The Adapteva Epiphany RISC array processor offers high computational energy-efficiency and parallel scalability. However, extracting performance with a standard parallel programming model remains a great challenge. We present an effective programming model for the low-power Epiphany architecture based on the Message Passing Interface (MPI) standard. Using MPI exploits the similarities between the Epiphany architecture and a networked parallel distributed cluster. Furthermore, our approach enables codes written with MPI to execute on the RISC array processor with little modification. We present experimental results for the threaded MPI implementation of matrix-matrix multiplication and highlight the importance of fast inter-core data transfers. Our high-level programming methodology achieved an on-chip performance of 9.1 GFLOPS.	David Richie, James Ross, Song Park and Dale Shires