Main Track (MT) Session 2

Time and Date: 16:30 - 18:10 on 10th June 2014

Room: Kuranda

Chair: Young Choon Lee

152 An Empirical Study of Hadoop's Energy Efficiency on a HPC Cluster [abstract]
Abstract: The MapReduce programming model is commonly used for efficient scientific computations, as it executes tasks in a parallel and distributed manner on large data volumes. HPC infrastructure can effectively increase the parallelism of MapReduce tasks, but such an execution incurs high energy and data-transmission costs. Here we empirically study how the energy efficiency of a MapReduce job varies with increasing parallelism and network bandwidth on an HPC cluster. We also investigate the effectiveness of power-aware systems in managing the energy consumption of different types of MapReduce jobs. We find that for some jobs energy efficiency degrades at high degrees of parallelism, while for others it improves at low CPU frequencies. Consequently, we suggest strategies for configuring the degree of parallelism, network bandwidth, and power-management features in an HPC cluster for energy-efficient execution of MapReduce jobs.
Nidhi Tiwari, Santonu Sarkar, Umesh Bellur, Maria Indrawan-Santiago
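The abstract does not define its energy-efficiency metric, so the following is a hypothetical Python sketch of one common choice: useful work (input bytes processed) per joule, with per-node energy estimated by integrating sampled power over the job's runtime. The node count, sampling interval, and variable names are illustrative assumptions, not values from the paper.

    # Hypothetical sketch (not from the paper): bytes of input processed per
    # joule, with per-node energy estimated from power samples taken at a
    # fixed interval over the job's runtime.
    def job_energy_joules(power_samples_w, sample_interval_s):
        # Rectangle-rule integration of sampled node power over time.
        return sum(power_samples_w) * sample_interval_s

    def energy_efficiency(bytes_processed, per_node_power_w, sample_interval_s):
        # Useful work per joule, summed over every node that ran the job.
        total_energy = sum(job_energy_joules(p, sample_interval_s)
                           for p in per_node_power_w)
        return bytes_processed / total_energy

    # Example: 8 nodes, power sampled once per second over a 600 s job.
    samples = [[210.0] * 600 for _ in range(8)]          # watts per node
    print(energy_efficiency(512 * 2**30, samples, 1.0), "bytes/J")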
167 Optimal Run Length for Discrete-Event Distributed Cluster-Based Simulations [abstract]
Abstract: In scientific simulations, the results generated usually come from a stochastic process. New solutions aimed at improving these simulations have been proposed, but it is difficult to compare them, since the results are not deterministic, and hence to guarantee that the output results are statistically trustworthy. In this work we apply a statistical approach to identify the transient and steady-state phases of a discrete-event distributed simulation, using linear regression and the batch means method to find the optimal simulation length. Our contributions are the following: we apply and adapt a simple statistical approach to determine the optimal simulation length; we approximate the output by a normal distribution instead of generating a sufficiently large number of replications; and the method can be used in other kinds of non-terminating scientific simulations whose data either follow a normal distribution or can be approximated by one.
Francisco Borges, Albert Gutierrez-Milla, Remo Suppi, Emilio Luque
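As a rough illustration of the batch means idea mentioned above (a minimal sketch, not the paper's actual procedure), one can discard a warm-up transient, split the remaining output into batches, and accept the current run length once the confidence interval of the batch means is narrow relative to their mean. The warm-up fraction, batch count, and precision target below are assumed values.

    # Minimal sketch of a batch means run-length test; all thresholds are
    # illustrative assumptions, not the paper's parameters.
    import math
    import statistics

    def run_length_ok(output, warmup_frac=0.1, n_batches=20, rel_precision=0.05):
        steady = output[int(len(output) * warmup_frac):]   # drop the transient
        batch_size = len(steady) // n_batches
        if batch_size < 2:
            return False                                   # not enough data yet
        means = [statistics.fmean(steady[i * batch_size:(i + 1) * batch_size])
                 for i in range(n_batches)]
        grand_mean = statistics.fmean(means)
        std_err = statistics.stdev(means) / math.sqrt(n_batches)
        half_width = 2.09 * std_err      # ~ t(0.975, 19 d.o.f.) for 20 batches
        return half_width <= rel_precision * abs(grand_mean)

In use, the simulation would be extended and the test repeated until it passes, the passing length being taken as the run length.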
173 A CUDA Based Solution to the Multidimensional Knapsack Problem Using the Ant Colony Optimization [abstract]
Abstract: The Multidimensional Knapsack Problem (MKP) is a generalization of the basic Knapsack Problem with two or more constraints. It is an important optimization problem with many real-life applications. It is NP-hard, and finding optimal solutions for the MKP may be intractable. In this paper we use a metaheuristic algorithm based on ant colony optimization (ACO). Since several steps of the algorithm can be carried out concurrently, we propose a parallel implementation under the GPGPU paradigm (general-purpose computing on graphics processing units) using CUDA. To use the algorithm presented in this paper, one must balance the number of ants, the number of rounds, and whether local search is used, depending on the desired solution quality; in other words, there is a trade-off between running time and solution quality. We obtained very promising experimental results and compared our implementation with those in the literature. The results show that, with the parallel approach, ant colony optimization is a viable way to solve the MKP efficiently, even for large instances.
Henrique Fingler, Edson Cáceres, Henrique Mongelli, Siang Song
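For context only, here is a hypothetical, sequential Python sketch of the solution-construction step that an ACO approach to the MKP typically uses; the paper runs such steps in parallel on the GPU with CUDA, and its exact heuristic, pheromone model, and parameters may differ.

    # Hypothetical sketch of one ACO solution construction for the MKP
    # (sequential; the paper parallelizes this kind of step with CUDA).
    # profits[j]   : profit of item j
    # weights[i][j]: consumption of resource i by item j
    # capacities[i]: capacity of resource i
    # tau[j]       : pheromone level of item j; alpha and beta are illustrative.
    import random

    def construct_solution(profits, weights, capacities, tau, alpha=1.0, beta=2.0):
        n, m = len(profits), len(capacities)
        used = [0.0] * m
        chosen = set()

        def eta(j):
            # Greedy desirability: profit per unit of normalised weight.
            w = sum(weights[i][j] / capacities[i] for i in range(m))
            return profits[j] / (w + 1e-9)

        while True:
            # Items not yet chosen that still fit in every dimension.
            feasible = [j for j in range(n) if j not in chosen
                        and all(used[i] + weights[i][j] <= capacities[i]
                                for i in range(m))]
            if not feasible:
                return chosen
            scores = [(tau[j] ** alpha) * (eta(j) ** beta) for j in feasible]
            j = random.choices(feasible, weights=scores, k=1)[0]
            chosen.add(j)
            for i in range(m):
                used[i] += weights[i][j]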
174 Comparison of High Level FPGA Hardware Design for Solving Tri-Diagonal Linear Systems [abstract]
Abstract: Reconfigurable computing devices can increase the performance of compute-intensive algorithms by implementing application-specific co-processor architectures. The power cost for this performance gain is often an order of magnitude less than that of modern CPUs and GPUs. Exploiting the potential of reconfigurable devices such as Field-Programmable Gate Arrays (FPGAs) is typically a complex and tedious hardware engineering task. Recently the major FPGA vendors (Altera and Xilinx) have released their own high-level design tools, which have great potential for rapid development of FPGA-based custom accelerators. In this paper, we evaluate Altera’s OpenCL Software Development Kit and Xilinx’s Vivado High Level Synthesis tool. The tools are compared for their performance, logic utilisation, and ease of development for the test case of a tri-diagonal linear system solver.
David Warne, Neil Kelson, Ross Hayward
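The test case above is a tri-diagonal linear system solver. As a software reference point only, the sketch below gives the standard sequential Thomas algorithm for such systems in Python; the paper's OpenCL and Vivado HLS designs may use a different formulation better suited to FPGA pipelining.

    # Reference sketch of the Thomas algorithm, the standard O(n) solver for a
    # tri-diagonal system A x = d with sub-diagonal a, diagonal b and
    # super-diagonal c. This is not taken from the paper's hardware designs.
    def thomas(a, b, c, d):
        n = len(d)
        cp, dp = [0.0] * n, [0.0] * n
        cp[0] = c[0] / b[0]
        dp[0] = d[0] / b[0]
        for i in range(1, n):                      # forward elimination
            denom = b[i] - a[i] * cp[i - 1]
            cp[i] = c[i] / denom if i < n - 1 else 0.0
            dp[i] = (d[i] - a[i] * dp[i - 1]) / denom
        x = [0.0] * n
        x[-1] = dp[-1]
        for i in range(n - 2, -1, -1):             # back substitution
            x[i] = dp[i] - cp[i] * x[i + 1]
        return x

    # Example: a[0] and c[n-1] are unused by convention.
    print(thomas([0, 1, 1], [4, 4, 4], [1, 1, 0], [5, 5, 5]))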
181 Blood Flow Arterial Network Simulation with the Implicit Parallelism Library SkelGIS [abstract]
Abstract: Implicit parallelism is an active research domain in computer science. Most implicit-parallelism solutions for solving partial differential equations and running scientific simulations are tied to specific numerical methods, where the user has to call specific functions that embed the parallelism. This paper presents the implicit-parallelism library SkelGIS, which allows the user to freely write a numerical method in a sequential programming style in C++. The library relies on four concepts, which are applied in this paper to the specific case of network simulations. SkelGIS is evaluated on a blood-flow simulation in arterial networks. Benchmarks are first performed to compare the performance and coding difficulty of two implementations of the simulation, one using SkelGIS and one using OpenMP. Finally, the scalability of the SkelGIS implementation is studied on a cluster with up to 1024 cores.
Hélène Coullon, Jose-Maria Fullana, Pierre-Yves Lagrée, Sébastien Limet, Xiaofei Wang
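To make the programming model described above concrete (a generic illustration only: it does not use the SkelGIS API, and the state variables and update rule are placeholders), the idea is that the user writes a plain sequential loop over the elements of a network, here arterial segments, and an implicit-parallelism library such as SkelGIS is what distributes that loop across cores or nodes.

    # Conceptual sketch of the sequential style such a library parallelizes;
    # the network layout and update rule are placeholders, not the paper's model.
    def step(network, state, dt):
        """One sequential time step over every segment of a 1D network."""
        new_state = {}
        for seg_id, neighbours in network.items():
            local = state[seg_id]
            # Placeholder update: relax each segment toward the mean of its
            # neighbours; a real simulation would apply its own numerical scheme.
            if neighbours:
                coupling = sum(state[n] for n in neighbours) / len(neighbours)
            else:
                coupling = local
            new_state[seg_id] = local + dt * (coupling - local)
        return new_state

    # Tiny arterial-tree-shaped example: segment 0 branches into 1 and 2.
    network = {0: [1, 2], 1: [0], 2: [0]}
    state = {0: 1.0, 1: 0.0, 2: 0.0}
    for _ in range(10):
        state = step(network, state, dt=0.1)
    print(state)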