Session 2: 16:30 - 18:10 on 10th June 2014

Main Track (MT) Session 2

Time and Date: 16:30 - 18:10 on 10th June 2014

Room: Kuranda

Chair: Young Choon Lee

152 An Empirical Study of Hadoop's Energy Efficiency on a HPC Cluster
Abstract: The Map-Reduce programming model is commonly used for efficient scientific computations, as it executes tasks in a parallel and distributed manner on large data volumes. HPC infrastructure can effectively increase the parallelism of map-reduce tasks; however, such an execution incurs high energy and data transmission costs. Here we empirically study how the energy efficiency of a map-reduce job varies with increasing parallelism and network bandwidth on an HPC cluster. We also investigate the effectiveness of power-aware systems in managing the energy consumption of different types of map-reduce jobs. We find that for some jobs the energy efficiency degrades at high degrees of parallelism, while for others it improves at low CPU frequencies. Consequently, we suggest strategies for configuring the degree of parallelism, network bandwidth and power management features in an HPC cluster for energy-efficient execution of map-reduce jobs.
Nidhi Tiwari, Santonu Sarkar, Umesh Bellur, Maria Indrawan-Santiago
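For readers who want to reproduce the kind of measurement this study performs, the sketch below shows one plausible way to compute an energy-efficiency figure for a job from sampled node power draw; the metric (megabytes processed per joule) and all names are illustrative assumptions, not the authors' instrumentation.

    # Hypothetical energy-efficiency metric for a map-reduce job: work done
    # per joule, with energy integrated from a 1 Hz cluster power trace.
    def energy_joules(power_samples_w, interval_s=1.0):
        """Trapezoidal integration of sampled power (watts) over time."""
        return sum(0.5 * (p0 + p1) * interval_s
                   for p0, p1 in zip(power_samples_w, power_samples_w[1:]))

    def efficiency_mb_per_joule(input_mb, power_samples_w):
        """Megabytes of input processed per joule consumed."""
        return input_mb / energy_joules(power_samples_w)

    # Example: a job processing 10 GB while the cluster drew ~200 W for 300 s.
    trace = [200.0] * 301
    print(efficiency_mb_per_joule(10240, trace))   # about 0.17 MB/J

Comparing this figure across runs at different degrees of parallelism and CPU frequencies is the kind of sweep the abstract describes.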
167 Optimal Run Length for Discrete-Event Distributed Cluster-Based Simulations
Abstract: In scientific simulations the generated results usually come from a stochastic process. New solutions aiming to improve these simulations have been proposed, but the problem is how to compare them, since the results are not deterministic, and how to guarantee that the output results are statistically trustworthy. In this work we apply a statistical approach to define the transient and steady state in discrete-event distributed simulation. We used linear regression and the batch means method to find the optimal simulation size. Our contributions are the following: we applied and adapted a simple statistical approach to define the optimal simulation length; we propose approximating the output to a normal distribution instead of generating sufficiently large numbers of replications; and the method can be used in other kinds of non-terminating scientific simulations where the data either have a normal distribution or can be approximated by one.
Francisco Borges, Albert Gutierrez-Milla, Remo Suppi, Emilio Luque
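The combination of transient removal and batching described above can be illustrated compactly; the sketch below is a textbook batch-means estimator under assumptions of my own (fixed warm-up length, equal-size batches), not the authors' implementation.

    # Batch means: drop the transient prefix, split the steady-state output
    # into batches, and treat the batch means as approximately i.i.d. normal.
    import statistics

    def batch_means(output, warmup, n_batches):
        steady = output[warmup:]
        size = len(steady) // n_batches
        means = [statistics.fmean(steady[i * size:(i + 1) * size])
                 for i in range(n_batches)]
        grand = statistics.fmean(means)
        se = statistics.stdev(means) / n_batches ** 0.5  # std. error of grand mean
        return grand, se

The run length is sufficient once the 95% half-width (about 1.96 * se) falls below the required precision, which is the stopping criterion this kind of method optimizes.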
173 A CUDA Based Solution to the Multidimensional Knapsack Problem Using the Ant Colony Optimization
Abstract: The Multidimensional Knapsack Problem (MKP) is a generalization of the basic Knapsack Problem with two or more constraints. It is an important optimization problem with many real-life applications, and since it is NP-hard, finding optimal solutions may be intractable. In this paper we use a metaheuristic algorithm based on ant colony optimization (ACO). Since several steps of the algorithm can be carried out concurrently, we propose a parallel implementation under the GPGPU (General-Purpose Graphics Processing Units) paradigm using CUDA. To use the algorithm presented in this paper, one must balance the number of ants, the number of rounds, and whether or not local search is used, depending on the desired solution quality; in other words, there is a trade-off between time and solution quality. We obtained very promising experimental results and compared our implementation with those in the literature. The results show that, with the parallel approach, ant colony optimization is a viable way to solve MKP efficiently, even for large instances.
Henrique Fingler, Edson Cáceres, Henrique Mongelli, Siang Song
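As background for the approach, the sketch below is a generic serial ACO for the MKP; the parameter names and pheromone update are common textbook choices rather than the paper's exact scheme, and the per-ant construction loop marked below is the step a CUDA implementation would run in parallel.

    import random

    def construct(values, weights, caps, tau, eta, alpha=1.0, beta=2.0):
        """One ant builds a feasible MKP solution item by item."""
        n, m = len(values), len(caps)
        load, chosen = [0.0] * m, set()
        while True:
            feas = [i for i in range(n) if i not in chosen and
                    all(load[k] + weights[k][i] <= caps[k] for k in range(m))]
            if not feas:
                return chosen, sum(values[i] for i in chosen)
            probs = [tau[i] ** alpha * eta[i] ** beta for i in feas]
            i = random.choices(feas, weights=probs)[0]
            chosen.add(i)
            for k in range(m):
                load[k] += weights[k][i]

    def aco_mkp(values, weights, caps, n_ants=32, rounds=100, rho=0.1):
        n = len(values)
        # Greedy desirability: value per unit of aggregate weight.
        eta = [values[i] / (1 + sum(w[i] for w in weights)) for i in range(n)]
        tau, best, best_val = [1.0] * n, set(), 0
        for _ in range(rounds):
            # This per-ant loop is the naturally GPU-parallel step.
            ants = [construct(values, weights, caps, tau, eta)
                    for _ in range(n_ants)]
            it_best, it_val = max(ants, key=lambda a: a[1])
            if it_val > best_val:
                best, best_val = it_best, it_val
            tau = [(1 - rho) * t for t in tau]     # evaporation
            for i in best:
                tau[i] += 1.0                      # reinforce best-so-far items
        return best, best_val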
174 Comparison of High Level FPGA Hardware Design for Solving Tri-Diagonal Linear Systems
Abstract: Reconfigurable computing devices can increase the performance of compute-intensive algorithms by implementing application-specific co-processor architectures. The power cost for this performance gain is often an order of magnitude less than that of modern CPUs and GPUs. Exploiting the potential of reconfigurable devices such as Field-Programmable Gate Arrays (FPGAs) is typically a complex and tedious hardware engineering task. Recently the major FPGA vendors (Altera and Xilinx) have released their own high-level design tools, which have great potential for rapid development of FPGA-based custom accelerators. In this paper, we evaluate Altera's OpenCL Software Development Kit and Xilinx's Vivado High Level Synthesis tool. These tools are compared for their performance, logic utilisation, and ease of development for the test case of a tri-diagonal linear system solver.
David Warne, Neil Kelson, Ross Hayward
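The abstract does not name the solver algorithm, but tri-diagonal systems are classically solved with the Thomas algorithm, shown below as a plain CPU reference in Python; an FPGA design would implement this (or a variant such as cyclic reduction) in hardware.

    def thomas(a, b, c, d):
        """Solve a tri-diagonal system Ax = d in O(n): a is the sub-diagonal
        (a[0] unused), b the main diagonal, c the super-diagonal (c[-1] unused)."""
        n = len(b)
        cp, dp = [0.0] * n, [0.0] * n
        cp[0], dp[0] = c[0] / b[0], d[0] / b[0]
        for i in range(1, n):                  # forward elimination
            m = b[i] - a[i] * cp[i - 1]
            cp[i] = c[i] / m if i < n - 1 else 0.0
            dp[i] = (d[i] - a[i] * dp[i - 1]) / m
        x = [0.0] * n
        x[-1] = dp[-1]
        for i in range(n - 2, -1, -1):         # back substitution
            x[i] = dp[i] - cp[i] * x[i + 1]
        return x

    print(thomas([0, 1], [2, 2], [1, 0], [3, 3]))   # [1.0, 1.0]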
181 Blood Flow Arterial Network Simulation with the Implicit Parallelism Library SkelGIS
Abstract: Implicit parallelism is an active research domain of computer science. Most implicit-parallelism solutions for solving partial differential equations and scientific simulations are tied to the specifics of numerical methods, where the user has to call specific functions which embed parallelism. This paper presents the implicit parallel library SkelGIS, which allows the user to freely write a numerical method in a sequential programming style in C++. The library relies on four concepts which are applied, in this paper, to the specific case of network simulations. SkelGIS is evaluated on a blood flow simulation in arterial networks. Benchmarks are first performed to compare the performance and the coding difficulty of two implementations of the simulation, one using SkelGIS and one using OpenMP. Finally, the scalability of the SkelGIS implementation on a cluster is studied up to 1024 cores.
Hélène Coullon, Jose-Maria Fullana, Pierre-Yves Lagrée, Sébastien Limet, Xiaofei Wang

Main Track (MT) Session 9

Time and Date: 16:30 - 18:10 on 10th June 2014

Room: Tully I

Chair: S. Chuprina

301 Study of the Network Impact on Earthquake Early Warning in the Quake-Catcher Network Project
Abstract: The Quake-Catcher Network (QCN) project uses low-cost sensors, i.e., accelerometers attached to volunteers' computers, to detect earthquakes. The master-worker topology currently used in QCN and other similar projects suffers from major weaknesses. The centralized master can fail to collect data if the volunteers' computers are not connected to the network, or it can introduce significant delays in the warning if the network is congested. We propose to solve these problems by using multiple servers in a more advanced network topology than the simple master-worker configuration. We first consider several critical scenarios in which the current master-worker configuration of QCN can hinder the early warning of an earthquake, then integrate the advanced network topology around multiple servers and emulate these critical scenarios in a simulation environment to quantify the benefits and costs of our proposed solution. We show how our solution can reduce the time to detect an earthquake from 1.8 s to 173 ms in case of network congestion, and the number of lost trickle messages from 2,013 to 391 in case of network failure.
Marcos Portnoi, Samuel Schlachter, Michela Taufer
315 The p-index: Ranking Scientists using Network Dynamics
Abstract: The indices currently used by scholarly databases, such as Google Scholar, to rank scientists do not attach weights to citations, nor is the underlying network structure of citations considered in computing these metrics. This results in scientists cited by well-recognized journals not being rewarded, and may lead to potential misuse if documents are created purely to cite others. In this paper we introduce a new ranking metric, the p-index (pagerank-index), which is computed from the underlying citation network of papers using the pagerank algorithm. The index is a percentile score, can potentially be implemented in public databases such as Google Scholar, and can be applied at many levels of abstraction. We demonstrate that the metric aids in fairer ranking of scientists compared to the h-index and its variants. We do this by simulating a realistic model of the evolution of citation and collaboration networks in a particular field, and comparing the h-index and p-index of scientists under a number of scenarios. Our results show that the p-index is immune to author behaviors that can result in artificially bloated h-index values.
Upul Senanayake, Mahendrarajah Piraveenan, Albert Zomaya
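A hedged sketch of the core idea (the authors' exact aggregation and normalization may differ): run PageRank on the directed citation graph of papers, credit each author with the PageRank mass of their papers, and convert the result to a percentile.

    import networkx as nx

    def p_index(citation_edges, authorship):
        """citation_edges: (citing_paper, cited_paper) pairs.
        authorship: dict mapping author -> list of paper ids."""
        g = nx.DiGraph(citation_edges)
        pr = nx.pagerank(g)                     # weight of each paper
        raw = {a: sum(pr.get(p, 0.0) for p in papers)
               for a, papers in authorship.items()}
        ranked = sorted(raw, key=raw.get)       # ascending by raw score
        return {a: 100.0 * (i + 1) / len(ranked) for i, a in enumerate(ranked)}

Because the score flows through the citation graph, a paper cited by highly cited papers counts for more than one cited by throwaway documents, which is what makes such a metric resistant to artificial inflation.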
191 A Clustering-based Link Prediction Method in Social Networks
Abstract: Link prediction is an important task in social network analysis, with applications in other domains such as recommender systems, molecular biology and criminal investigations. The classical methods of link prediction are based on graph topology and path features, but few consider clustering information. A cluster in a graph is a group of vertices densely connected internally and sparsely connected to other groups. The clustering results contain essential information for link prediction, and two vertices' common neighbors may play different roles depending on whether they belong to the same cluster. Based on this assumption and the characteristics of common social networks, in this paper we propose a link prediction method based on clustering and global information. Our experiments on both synthetic and real-world networks show that this method can improve link prediction accuracy as the number of clusters grows.
Fenhua Li, Jing He, Guangyan Huang, Yanchun Zhang, Yong Shi
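The role-of-common-neighbors assumption can be made concrete with a toy scoring function; the weighting below is one plausible choice, not the paper's formula.

    # Weight a common neighbor more when it shares a cluster with both endpoints.
    def score(u, v, adj, cluster, w_intra=1.0, w_inter=0.5):
        s = 0.0
        for z in adj[u] & adj[v]:                       # common neighbors
            s += w_intra if cluster[z] == cluster[u] == cluster[v] else w_inter
        return s

    adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}  # toy graph
    cluster = {1: "A", 2: "A", 3: "A", 4: "B"}
    print(score(1, 2, adj, cluster))                    # 1.0: neighbor 3 is intra-cluster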
345 A Technology for BigData Analysis Task Description using Domain-Specific Languages
Abstract: The article presents a technology for dynamic knowledge-based building of Domain-Specific Languages (DSLs) to describe data-intensive scientific discovery tasks using BigData technology. The proposed technology supports high-level abstract definition of the analytic and simulation parts of the task, as well as integration into composite scientific solutions. Automatic translation of the abstract task definition enables seamless integration of various data sources within a single solution.
Sergey Kovalchuk, Artem Zakharchuk, Jiaqi Liao, Sergey Ivanov, Alexander Boukhanovsky
66 Characteristics of Dynamical Phase Transitions for Noise Intensities
Abstract: We simulate and analyze dynamical phase transitions in a Boolean neural network with initial random connections. Treating the stochastic evolution in terms of a noise intensity, we show from our condition that there exists a critical value of the noise intensity. The nature of the phase transition is found numerically and analytically for two connection probability density functions and one random network.
Muyoung Heo, Jong-Kil Park, Kyungsik Kim

Dynamic Data Driven Application Systems (DDDAS) Session 2

Time and Date: 16:30 - 18:10 on 10th June 2014

Room: Tully II

Chair: Frederica Darema

43 Towards a Dynamic Data Driven Wildfire Behavior Prediction System at European Level
Abstract: Southern European countries are severely affected by forest fires every year, leading to very large environmental damages and great economic investments to recover the affected areas. All affected countries invest substantial resources in minimizing fire damage. Emerging technologies are used to help wildfire analysts determine fire behavior and spread, aiming at a more efficient use of resources in fire fighting. In the case of trans-boundary fires, the European Forest Fire Information System (EFFIS) works as a complementary system to national and regional systems in the countries, providing the information required for international collaboration on forest fire prevention and fighting. In this work, we describe a way of exploiting all the information available in the system to feed a dynamic data-driven wildfire behavior prediction model that can deliver results to support operational decisions. The model is able to calibrate the unknown parameters, such as wind conditions and fuel moistures, based on the real observed data, using a steering loop. Since this process is computationally intensive, we exploit multi-core platforms using a hybrid MPI-OpenMP programming paradigm.
Tomàs Artés, Andrés Cencerrado, Ana Cortes, Tomas Margalef, Darío Rodríguez, Thomas Petroliagkis, Jesus San Miguel
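The two-stage scheme described above can be sketched generically; everything below (the parameter ranges, the random search, the error measure) is a hypothetical stand-in for the paper's calibration strategy.

    import random

    def error(pred, obs):
        """Mismatch between two fire perimeters given as sets of burned cells."""
        return len(pred ^ obs)

    def steering_loop(simulate, observed, n_candidates=50):
        """Calibration stage: find inputs that best reproduce the last
        observed perimeter. Each candidate run is independent, which is
        why a hybrid MPI-OpenMP implementation pays off."""
        best, best_err = None, float("inf")
        for _ in range(n_candidates):
            params = {"wind_speed": random.uniform(0.0, 30.0),
                      "fuel_moisture": random.uniform(0.02, 0.30)}
            e = error(simulate(**params), observed)
            if e < best_err:
                best, best_err = params, e
        return best   # fed to the prediction stage for the next time window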
91 Fast Construction of Surrogates for UQ Central to DDDAS -- Application to Volcanic Ash Transport
Abstract: In this paper we present new ideas to greatly enhance the quality of uncertainty quantification in the DDDAS framework. We build on ongoing work in large-scale transport of geophysical mass of volcanic origin -- a danger to both land-based installations and airborne vehicles.
A. K. Patra, E. R. Stefanescu, R. M. Madankan, M. I Bursik, E. B. Pitman, P. Singla, T. Singh, P. Webley
306 A Dynamic Data-driven Decision Support for Aquaculture Farm Closure
Abstract: We present a dynamic data-driven decision support system for aquaculture farm closure, using machine learning techniques to predict closures of a shellfish farm. Since environmental time series are used in closure decisions, we propose two approaches combining time series analysis and machine learning for closure prediction. In the first approach, we forecast the time series and then apply expert rules to predict closure. In the second, we use time series classification. Both approaches exploit a dynamic data-driven technique in which the prediction models are updated as new data arrive. Experimental results from a case-study shellfish farm validate the applicability of the proposed method to aquaculture decision support.
Md. Sumon Shahriar, John McCulloch
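A minimal sketch of the second approach (time series classification), with synthetic data and hypothetical features; the paper's variables, features and model may differ.

    import random
    from sklearn.ensemble import RandomForestClassifier

    # Synthetic stand-ins for an environmental series and closure flags.
    salinity = [30 + random.gauss(0, 1) for _ in range(500)]
    closed = [int(s < 29) for s in salinity]

    def windows(series, labels, width=24):
        """Fixed-width sliding windows as features, next-step label as target."""
        X = [series[i:i + width] for i in range(len(series) - width)]
        y = [labels[i + width] for i in range(len(series) - width)]
        return X, y

    X, y = windows(salinity, closed)
    model = RandomForestClassifier(n_estimators=100).fit(X[:-48], y[:-48])
    print(model.predict(X[-48:]))   # refit as new data arrive (the dynamic part)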
76 An Open Framework for Dynamic Big-Data-Driven Application Systems (DBDDAS) Development
Abstract: In this paper, we outline the key features that dynamic data-driven application systems (DDDAS) have. The term Big Data (BD) has come into use in recent years and is highly applicable to most DDDAS, since most applications use networks of sensors that generate an overwhelming amount of data over the lifespan of the application runs. We describe what a dynamic big-data-driven application system (DBDDAS) toolkit must have in order to provide all of the essential building blocks that are necessary to easily create new DDDAS without re-inventing those building blocks.
Craig C. Douglas

Agent Based Simulations, Adaptive Algorithms and Solvers (ABS-AA-S) Session 2

Time and Date: 16:30 - 18:10 on 10th June 2014

Room: Tully III

Chair: Piotr Gurgul

180 Modeling phase-transitions using a high-performance, Isogeometric Analysis framework
Abstract: In this paper, we present a high-performance framework, called PetIGA, for solving partial differential equations using Isogeometric Analysis, and we show how it can be used to solve phase-field problems. We specifically chose the Cahn-Hilliard equation and the phase-field crystal equation as study problems. These two models allow us to highlight some of the main advantages of using PetIGA for scientific computing.
Philippe Vignal, Lisandro Dalcin, Nathan Collier, Victor Calo
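For reference, the Cahn-Hilliard equation in its standard mixed form (textbook formulation; the paper's exact parameters and potential may differ):

    % c: concentration, M: mobility, \varepsilon: interface-width parameter,
    % f: double-well potential.
    \frac{\partial c}{\partial t} = \nabla \cdot \bigl( M \, \nabla \mu \bigr),
    \qquad
    \mu = f'(c) - \varepsilon^{2} \Delta c,
    \qquad
    f(c) = \tfrac{1}{4}\bigl(c^{2} - 1\bigr)^{2}.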
233 Micropolar Fluids Using B-spline Divergence-Conforming Spaces
Abstract: We discretized the two-dimensional linear momentum, microrotation, energy and mass conservation equations from micropolar theory with the finite element method, using B-spline bases to create divergence-conforming spaces and obtain pointwise divergence-free solutions [8]. Weak imposition of boundary conditions was handled using Nitsche's method for tangential conditions, while normal conditions were imposed strongly. We solved the heat-driven cavity problem as a test case, including a variation of the parameters that differentiate micropolar fluids from conventional fluids under different Rayleigh numbers, for a better understanding of the system.
Adel Sarmiento, Daniel Garcia, Lisandro Dalcin, Nathan Collier, Victor Calo
24 Hypergraph grammar based adaptive linear computational cost projection solvers for two and three dimensional modeling of brain
Abstract: In this paper we present a hypergraph grammar model for transformations of two- and three-dimensional grids. The hypergraph grammar describes the process of generating uniform grids with two- or three-dimensional rectangular or hexahedral elements, followed by the process of h-refinement, which involves breaking selected elements into four or eight son elements, in two or three dimensions, respectively. We also provide graph grammar productions for two projection algorithms that we use to pre-process material data. The first one is the projection-based interpolation solver algorithm used for computing H1 or L2 projections of an MRI scan of the human head, in two and three dimensions. The second one is utilized for solving the non-stationary problem modeling three-dimensional heat transport in the human head generated by cellphone usage.
Damian Goik, Marcin Sieniek, Maciej Woźniak, Anna Paszyńska, Maciej Paszynski
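The L2 projection mentioned above has the standard variational statement (textbook form, not quoted from the paper): given scan data u, find the finite element function closest to it in the L2 norm.

    \text{Find } u_h \in V_h \ \text{such that} \quad
    \int_{\Omega} u_h \, v_h \, d\Omega = \int_{\Omega} u \, v_h \, d\Omega
    \quad \forall \, v_h \in V_h .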
160 Implementation of an adaptive BDF2 formula and comparison with the MATLAB ode15s
Abstract: After applying the Finite Element Method (FEM) to diffusion-type and wave-type Partial Differential Equations (PDEs), a first-order and a second-order Ordinary Differential Equation (ODE) system are obtained, respectively. These ODE systems usually present high stiffness, so numerical methods with good stability properties are required for their resolution. MATLAB offers a set of open source adaptive-step functions for solving ODEs. One of these functions is ode15s, which is recommended for stiff problems and based on the Backward Differentiation Formulae (BDF). We describe the error estimation and the step size control implemented in this function. The ode15s is a variable-order algorithm, and even though it has an adaptive step size implementation, the advancing formula and the local error estimation that it uses correspond to the constant step size formula. We have focused on the second-order accurate and unconditionally stable BDF (BDF2), and we have implemented a truly adaptive step size BDF2 algorithm using the same strategy as the BDF2 implemented in ode15s; the new algorithm proves more efficient than the one implemented in MATLAB.
Elisabete Alberdi Celaya, Juan José Anza Aguirrezabala, Panagiotis Chatzipantelidis
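For reference, the genuinely variable-step BDF2 advancing formula (standard form; the authors' error estimator and step-control details are in the paper), written for y' = f(t, y) with step ratio r = h_{n+1}/h_n:

    % Reduces to (3/2) y_{n+1} - 2 y_n + (1/2) y_{n-1} = h f_{n+1} when r = 1.
    \frac{1 + 2r}{1 + r} \, y_{n+1} - (1 + r) \, y_n
    + \frac{r^{2}}{1 + r} \, y_{n-1}
    = h_{n+1} \, f\!\left(t_{n+1}, y_{n+1}\right).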
63 Fast graph transformation based direct solver algorithm for regular three dimensional grids
Abstract: This paper presents a graph-transformation-based multi-frontal direct solver with an optimization technique that allows for a significant decrease of time complexity in some multi-scale simulations of Step and Flash Imprint Lithography (SFIL). The multi-scale simulation consists of a macro-scale linear elasticity model with a thermal expansion coefficient and a nano-scale molecular statics model. The algorithm is exemplified with a photopolymerization simulation that involves densification of a polymer inside a feature, followed by shrinkage of the feature after removal of the template. The solver is optimized thanks to a mechanism for reusing sub-domains with similar geometries and similar material properties. The graph transformation formalism is used to describe the algorithm; such an approach helps to automatically localize sub-domains that can be reused.
Marcin Sieniek

Architecture, Languages, Compilation and Hardware support for Emerging ManYcore systems (ALCHEMY) Session 1

Time and Date: 16:30 - 18:10 on 10th June 2014

Room: Bluewater I

Chair: Stéphane Louise

348 τC: C with Process Network Extensions for Embedded Manycores
Abstract: Current and future embedded manycore targets bring complex and heterogeneous architectures with a large number of processing cores, making both parallel programming at this scale and understanding the architecture itself a daunting task. Process Networks and other dataflow-based Models of Computation (MoC) are a good basis for presenting a universal model of the underlying manycore architectures to the programmer. If a language exposes a simple-to-grasp MoC in a consistent way across architectures, the programmer can concentrate on optimizing the expression of parallelism in the application instead of porting a given code to a given system. Such a goal would provide the C-language equivalent for manycores. In this paper, we present a process network extension to C called τC and its mapping to both a POSIX target and the P2012/STHORM platform, and show how the language offers an architecture-independent solution to this problem.
Thierry Goubier, Damien Couroussé, Selma Azaiez
96 Application-Level Performance Optimization: A Computer Vision Case Study on STHORM
Abstract: Computer vision applications constitute one of the key drivers for embedded many-core architectures. In order to exploit the full potential of such systems, a balance between computation and communication is critical, but many computer vision algorithms present highly data-dependent behavior that complicates this task. To enable application performance optimization, the development environment must provide the developer with tools for fast and precise application-level performance analysis. We describe the process of porting and optimizing a face detection application on the STHORM many-core accelerator using the STHORM OpenCL SDK. We identify the main factors that limit performance and discern the contributions arising from: the application itself, the OpenCL programming model, and the STHORM OpenCL SDK. Finally, we show how these issues can be addressed in the future to enable developers to further improve application performance.
Vítor Schwambach, Sébastien Cleyet-Merle, Alain Issard, Stéphane Mancini
387 Generating Code and Memory Buffers to Reorganize Data on Many-core Architectures
Abstract: The dataflow programming model has been shown to be a relevant approach to efficiently run massively parallel applications over many-core architectures. In this model, particular built-in agents are in charge of data reorganizations between user agents. Such agents can Split, Join and Duplicate data onto their communication ports. They are widely used in signal processing, for example. These system agents, and their associated implementations, are of major importance when it comes to performance, because they can stand on the critical path (think of Amdahl's law). Furthermore, a particular data reorganization can be expressed by the developer in several ways, some of which may lead to inefficient solutions (mostly unneeded data copies and transfers). In this paper, we propose several strategies to manage data reorganization at compile time, with a focus on indexed accesses to shared buffers to avoid data copies. These strategies are complementary: they ensure correctness for each system agent configuration, as well as performance when possible. They have been implemented within the Sigma-C industry-grade compilation toolchain and evaluated on the Kalray MPPA 256-core processor.
Loïc Cudennec, Paul Dubrulle, François Galea, Thierry Goubier, Renaud Sirdey
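A toy illustration of the copy-avoidance idea in the abstract: a round-robin Split can often be realized as strided, indexed views into the shared buffer rather than as physical copies (numpy is used here purely as a stand-in for the compiler-generated index mappings).

    import numpy as np

    buf = np.arange(12)                      # shared producer buffer
    ports = [buf[i::3] for i in range(3)]    # "Split" onto 3 consumer ports

    ports[0][0] = 99                         # consumers index into the buffer...
    print(buf[0])                            # ...99: no copy was ever made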
359 Self-Timed Periodic Scheduling For Cyclo-Static DataFlow Model
Abstract: Real-time and time-constrained applications programmed on many-core systems can suffer from unmet timing constraints even with correct-by-construction schedules. Such unexpected results are usually caused by unaccounted-for delays due to resource sharing (e.g., the communication medium). In this paper we address the three main sources of unpredictable behavior: first, we propose to use a deterministic Model of Computation (MoC), more specifically the well-formed CSDF subset of process networks; second, we propose a run-time management strategy for shared resources to avoid unpredictable timings; third, we promote the use of a new scheduling policy, the so-called Self-Timed Periodic (STP) scheduling, to improve performance and decrease synchronization costs by taking resource sharing or resource constraints into account. This is a quantitative improvement over state-of-the-art scheduling policies, which assumed fixed delays of inter-processor communication and did not correctly take into account the subtle effects of synchronization.
Dkhil Ep.Jemal Amira, Xuankhanh Do, Stephane Louise, Dubrulle Paul, Christine Rochange

Workshop on Computational Chemistry and Its Applications (CCA) Session 2

Time and Date: 16:30 - 18:10 on 10th June 2014

Room: Bluewater II

Chair: Ponnadurai Ramasami

21 A Computational Study of 2-Selenobarbituric Acid: Conformational Analysis, Enthalpy of Formation, Acidity and Basicity
Abstract: A computational study of the selenium-containing compound 2-selenobarbituric acid has been carried out. Tautomerism has been studied not only in the neutral forms but also in the protonated and deprotonated species. The most stable tautomers for the neutral and deprotonated species are equivalent to those obtained by different authors for the analogous barbituric and 2-thiobarbituric acids. However, the most stable tautomer for protonated 2-selenobarbituric acid differs from that proposed for the analogous compounds. The enthalpy of formation in the gas phase, and the gas-phase acidity and basicity of 2-selenobarbituric acid, have been calculated at the G3 and G4 levels, together with the corresponding values for barbituric and 2-thiobarbituric acids. The calculated acidity shows that 2-selenobarbituric acid is a very strong Brønsted acid in the gas phase.
Rafael Notario
139 Origin of the Extra Stability of Alloxan: A Computational Study
Abstract: Detailed DFT computations and quantum dynamics simulations have been carried out to establish the origin of the extra stability of alloxan. The effects of solvent, basis set and DFT method have been examined. Two non-covalent intermolecular dimers of alloxan, namely the H-bonded and the dipolar dimers, have been investigated to establish their relative stability. Quantum chemical topology features and NBO analysis have been performed.
Saadullah Aziz, Rifaat Hilal, Basmah Allehyani, Shabaan Elroby
303 The Impact of p-orbital on Optimization of ReH7(PMe3)2 Compound
Abstract: This study investigates the importance of the p-function used in computational modeling, with the geometric changes of the ReH7(PMe3)2 system as the model case. The 6-31G, 6-311G and 6-311++G basis sets were used for all elements except Re, for which the Christiansen et al. basis set was used. Upon removing the p-function on the metal, we noticed that the geometric changes are minimal as long as triple-zeta basis sets are used for the rest of the elements. While the relative energy profiles of a reaction would still reasonably resemble each other, a direct comparison of energies between basis sets with and without the p-function is not recommended.
Nnenna Elechi, Daniel Tran, Mykala Taylor, Odaro Adu, Huajun Fan
60 Exploring the Conical Intersection Seam in Cytosine: A DFT and CASSCF Study
Abstract: The geometry, energetics and dipole moment of the most stable conformers of cytosine in the ground state were calculated with different density functional methods, namely B3LYP, M06-2X, ωB97-D and PBEPBE, with the 6-311++G(3df,3pd) basis set. The most stable conformer, the keto-amino form, is only 1 kcal/mol more stable than the imino-enol form. The ultrafast radiationless decay mechanism has been theoretically investigated using Complete Active Space multiconfigurational SCF (CASSCF) calculations. The conical intersection seam was searched in the full-dimensional space of the vibrational degrees of freedom. A new conical intersection has been identified: a semi-planar conical intersection (SPCI) with its main deformations inside the cytosine ring and the C=O bond. The g-vector and h-vector for the semi-planar conical intersection were calculated and discussed along with their geometrical parameters. A classical trajectory dynamics simulation has been performed to characterize and identify the evolution of geometry and energy changes of the SPCI with time.
Rifaat Hilal, Saadullah Aziz, Shabaan Elrouby, Walid Hassan