Architecture, Languages, Compilation and Hardware support for Emerging ManYcore systems (ALCHEMY) Session 3

Time and Date: 16:40 - 18:20 on 1st June 2015

Room: M208

Chair: Stephane Louise

528 Towards an automatic co-generator for manycores’ architecture and runtime: STHORM case-study [abstract]
Abstract: The increasing design complexity of manycore architectures at the hardware and software levels imposes to have powerful tools capable of validating every functional and non-functional property of the architecture. At the design phase, the chip architect needs to explore several parameters from the design space, and iterate on different instances of the architecture, in order to meet the defined requirements. Each new architectural instance requires the configuration and the generation of a new hardware model/simulator, its runtime, and the applications that will run on the platform, which is a very long and error-prone task. In this context, the IP-XACT standard has become widely used in the semiconductor industry to package IPs and provide low level SW stack to ease their integration. In this work, we present a primer work on a methodology to automatically configuring and assembling an IP-XACT golden model and generating the corresponding manycore architecture HW model, low-level software runtime and applications. We use the STHORM manycore architecture and the HBDC application as a case study.
Charly Bechara, Karim Ben Chehida, Farhat Thabet
249 Retargeting of the Open Community Runtime to Intel Xeon Phi [abstract]
Abstract: The Open Community Runtime (OCR) is a recent effort in the search for a runtime for extreme scale parallel systems. OCR relies on the concept of a dynamically generated task graph to express the parallelism of a program. Rather than being directly used for application development, the main purpose of OCR is to become a low-level runtime for higher-level programming models and tools. Since manycore architectures like the Intel Xeon Phi are likely to play a major role in future high performance systems, we have implemented the OCR API for shared-memory machines, including the Xeon Phi. We have also implemented two benchmark applications and performed experiments to investigate the viability of the OCR as a runtime for manycores. Our experiments and a comparison with OpenMP indicate that OCR can be an efficient runtime system for current and emerging manycore systems.
Jiri Dokulil, Siegfried Benkner
14 Prefetching Challenges in Distributed Memories for CMPs [abstract]
Abstract: Prefetch engines working on distributed memory systems behave independently by analyzing the memory accesses that are addressed to the attached piece of cache. They potentially generate prefetching requests targeted at any other tile on the system that depends on the computed address. This distributed behavior involves several challenges that are not present when the cache is unified. In this paper, we identify, analyze, and quantify the effects of these challenges, thus paving the way to future research on how to implement prefetching mechanisms at all levels of this kind of system with shared distributed caches.
Marti Torrents, Raul MartĂ­nez, Carlos Molina