Main Track (MT) Session 5

Time and Date: 16:20 - 18:00 on 11th June 2014

Room: Kuranda

Chair: M. Wagner

288	OS Support for Load Scheduling in Accelerator-based Heterogeneous Systems Abstract: The involvement of accelerators is becoming widespread in the field of heterogeneous processing, performing computation tasks through a wide range of applications. With the variety of computing architectures now available, the need for a system-wide multitasking environment is increasing. Therefore, we present an OpenCL-based scheduler that is designed as a multi-user computing environment to make use of the full potential of available resources while running as a daemon. Multiple tasks can be issued by means of a C++ API that relies on the OpenCL C++ wrapper. At this point, the daemon takes over control immediately and performs load scheduling. Due to its implementation, our approach is easily applicable to a common OS. We validate our method through extensive experiments deploying a set of applications, which show that the low scheduling costs remain constant in total over a wide range of input sizes. Besides the different CPUs, a variety of modern GPU and other accelerator architectures are used in the experiments.	Ayman Tarakji, Niels Ole Salscheider, David Hebbeker
369	Efficient Global Element Indexing for Parallel Adaptive Flow Solvers Abstract: Many grid-based solvers for partial differential equations (PDE) assemble matrices explicitly for discretizing the underlying PDE operators and/or for the underlying (non-)linear systems of equations. Often, the data structures or solver packages require a consecutive global numbering of the degrees of freedom across the boundaries of different parallel subdomains. Straightforward approaches to realize this global indexing in parallel frequently result in serial parts of the assembling algorithms, which cause a considerable bottleneck, in particular in large-scale applications. We present an efficient way to set up such a global numbering scheme for large configurations via a position-based numeration on all parallel processes locally. The global number of shared nodes is determined via a tree-based communication pattern. We verified our implementation via state-of-the-art benchmark scenarios for incompressible flow simulations. A small performance study shows the parallel capability of our approach. The corresponding results can be generalized to other grid-based solvers that demand global indexing in the context of large-scale parallelization.	Michael Lieb, Tobias Neckel, Hans-Joachim Bungartz, Thomas Schöps
382	Performance Improvements for a Large-Scale Geological Simulation Abstract: Geological models have been successfully used to identify and study geothermal energy resources. Many computer simulations based on these models are data-intensive applications. Large-scale geological simulations require high performance computing (HPC) techniques to run within reasonable time constraints and performance levels. One research area that can benefit greatly from HPC techniques is the modeling of heat flow beneath the Earth’s surface. This paper describes the application of HPC techniques to increase the scale of research with a well-established geological model. Recently, a serial C++ application based on this geological model was ported to a parallel HPC application using MPI. An area of focus was to increase the performance of the MPI version to enable state or regional scale simulations using large numbers of processors. First, synchronous communication among MPI processes was replaced by overlapping communication and computation (asynchronous communication). Asynchronous communication improved performance over synchronous communication by averages of 28% using 56 cores in one environment and 46% using 56 cores in another. Second, an approach for load balancing involving repartitioning the data at the start of the program resulted in runtime performance improvements of 32% using 48 cores in the first environment and 14% using 24 cores in the second when compared to the asynchronous version. An additional feature, modeling of erosion, was also added to the MPI code base. The performance improvement techniques were less effective under erosion.	David Apostal, Kyle Foerster, Travis Desell, Will Gosnold
168	Lattice Gas Model for Budding Yeast: A New Approach for Density Effects Abstract: Yeasts in culture media grow exponentially in the early period but eventually stop growing. The saturation of population growth is due to the “density effect”. The budding yeast, Saccharomyces cerevisiae, is known to exhibit age-dependent cell division. A daughter cell, which has not yet given birth, has a longer generation time than its mother because the daughter needs a maturing period. So far, investigations of the exponential growth period have been intensively accumulated; very little is known about the stage dependence of the density effect. Here we present an "in vivo" study of the density effect, applying a lattice gas model to explore the age-structure dynamics. It is, however, hard to solve the basic equations, because they have an infinite number of variables and parameters. The basic equations are constructed from several simplified models which have few variables and parameters. These simplified models are compared with experimental data to report two findings for the stage-dependent density effect: 1) paradox of decline birthrate (PDB), and 2) mass suicide. These events occur suddenly and temporarily at an early stage of the density effect. The mother-daughter model leads to PDB. Namely, when the birthrate of the population decreases, the fraction of daughters abruptly increases. Moreover, we find that the average age of the yeast population suddenly decreases at the inflection point. This indicates mass apoptosis of aged mothers. Our results imply the existence of several types of "pheromones" that specifically inhibit population growth.	Kei-Ichi Tainaka, Takashi Ushimaru, Toshiyuki Hagiwara, Jin Yoshimura
185	Characteristics of displacement data due to time scale for the combination of Brownian motion with intermittent adsorption Abstract: Single-molecule tracking data near solid surfaces contain information on diffusion that is potentially affected by adsorption. However, molecular adsorption can occur in an intermittent manner, and the overall phenomenon is regarded as slower yet normal diffusion if the time scale of each adsorption event is sufficiently shorter than the interval of data acquisition. We compare simple numerical model systems that vary in the time scale of adsorption events while sharing the same diffusion coefficient, and show that the shape of the displacement distribution depends on the time resolution. We also evaluate the characteristics by statistical quantities related to the large deviation principle.	Itsuo Hanasaki, Satoshi Uehara, Satoyuki Kawano
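
The idea in the abstract of paper 185, that the shape of the displacement distribution depends on how the adsorption time scale compares with the acquisition interval, can be illustrated with a minimal sketch. This is not the authors' model or code. It assumes a 1D walker that alternates between free diffusion and an immobile adsorbed state with exponentially distributed waiting times, and it uses excess kurtosis as a simple stand-in for the large-deviation quantities the abstract mentions. All names and parameter values (simulate_displacements, tau_ads, free_fraction, and so on) are hypothetical.

```python
import math
import random


def simulate_displacements(tau_ads, n_traj=2000, interval=1.0, dt=0.005,
                           d_free=1.0, free_fraction=0.5, seed=0):
    """Displacements over one acquisition interval for a two-state walker.

    Assumed model (not taken from the paper): the walker is either free
    (diffusion coefficient d_free) or adsorbed (immobile). tau_ads is the
    mean adsorption duration; the mean free duration is chosen so that the
    long-time free fraction, and hence the effective diffusion coefficient
    d_free * free_fraction, is the same for every tau_ads.
    """
    rng = random.Random(seed)
    tau_free = tau_ads * free_fraction / (1.0 - free_fraction)
    p_desorb = dt / tau_ads    # per-step probability of leaving the surface
    p_adsorb = dt / tau_free   # per-step probability of sticking
    if p_desorb > 1.0 or p_adsorb > 1.0:
        raise ValueError("dt is too large for the chosen time scales")
    step_sd = math.sqrt(2.0 * d_free * dt)
    n_steps = int(round(interval / dt))

    displacements = []
    for _ in range(n_traj):
        # Start in the stationary state so both models share the same MSD.
        free = rng.random() < free_fraction
        x = 0.0
        for _ in range(n_steps):
            if free:
                x += rng.gauss(0.0, step_sd)
                if rng.random() < p_adsorb:
                    free = False
            elif rng.random() < p_desorb:
                free = True
        displacements.append(x)
    return displacements


def excess_kurtosis(xs):
    """Excess kurtosis of a sample; close to 0 for a Gaussian."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    return m4 / (m2 * m2) - 3.0


if __name__ == "__main__":
    for tau in (0.02, 5.0):  # adsorption much shorter / longer than interval
        xs = simulate_displacements(tau_ads=tau)
        msd = sum(x * x for x in xs) / len(xs)
        print(f"tau_ads={tau}: MSD={msd:.3f}, "
              f"excess kurtosis={excess_kurtosis(xs):.3f}")
```

Under these assumptions both runs share the same mean squared displacement, so they look identical if only the diffusion coefficient is fitted. The difference appears in the shape of the distribution: it stays close to Gaussian when adsorption events are short compared with the acquisition interval and becomes strongly peaked with heavy tails when they are long.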