Main Track (MT) Session 1

Time and Date: 11:20 - 13:00 on 10th June 2014

Room: Kuranda

Chair: J. Betts

12 SparseHC: a memory-efficient online hierarchical clustering algorithm [abstract]
Abstract: Computing a hierarchical clustering of objects from a pairwise distance matrix is an important algorithmic kernel in computational science. Since the storage of this matrix requires quadratic space in terms of the number of objects, the design of memory-efficient approaches is of high importance. In this paper, we address this problem by presenting a memory-efficient online hierarchical clustering algorithm called SparseHC. SparseHC scans a sorted (possibly sparse) distance matrix chunk-by-chunk and builds a dendrogram by merging cluster pairs as and when the distance between them is determined to be the smallest among all remaining cluster pairs. The key insight is that, to find the cluster pair with the smallest distance, it is not necessary to wait for the computation of all cluster pair distances to complete: partial information suffices to determine lower bounds on cluster pair distances, which can then be used for comparison. Our experimental results show that SparseHC achieves linear empirical memory complexity, a significant improvement over existing algorithms.
Thuy Diem Nguyen, Bertil Schmidt, Chee Keong Kwoh
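The core idea — merging as soon as partial information certifies the minimum — is easiest to see in the single-linkage special case, where streaming the entries of a sorted (possibly sparse) distance matrix and merging with a union-find structure yields the dendrogram directly. A minimal Python sketch of that special case (not the authors' implementation, which also handles complete and average linkage via lower bounds):

```python
# Single-linkage special case of streaming hierarchical clustering:
# edges arrive sorted by distance, so the first edge joining two
# clusters is provably the smallest remaining inter-cluster distance.

def stream_single_linkage(n, sorted_edges):
    """n objects; sorted_edges yields (dist, i, j) in ascending order."""
    parent = list(range(n + n - 1))  # union-find over 2n-1 dendrogram nodes
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    node = {i: i for i in range(n)}   # cluster root -> dendrogram node id
    next_id, merges = n, []
    for dist, i, j in sorted_edges:   # read chunk-by-chunk in the real algorithm
        ri, rj = find(i), find(j)
        if ri != rj:
            merges.append((node[ri], node[rj], dist))
            parent[ri] = parent[rj] = next_id
            node[next_id] = next_id
            next_id += 1
    return merges  # dendrogram as a list of merges

edges = sorted([(1.0, 0, 1), (2.5, 1, 2), (0.5, 2, 3)])
print(stream_single_linkage(4, edges))
```

Only the already-merged structure and the current chunk need to be held in memory, which is the source of the reported linear memory footprint.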
131 Tuning Basic Linear Algebra Routines for Hybrid CPU+GPU Platforms [abstract]
Abstract: The introduction of auto-tuning techniques in linear algebra routines using hybrid combinations of multiple CPU and GPU computing resources is analyzed. Basic models of the execution time and information obtained during the installation of the routines are used to optimize the execution time with a balanced assignment of the work to the computing components in the system. The study is carried out with a basic kernel (matrix-matrix multiplication) and a higher-level routine (LU factorization) using GPUs and the host multicore processor. Satisfactory results are obtained, with experimental execution times close to the lowest experimentally achievable.
Gregorio Bernabé, Javier Cuenca, Domingo Gimenez, Luis-Pedro García
143 A portable OpenCL Lattice Boltzmann code for multi- and many-core processor architectures [abstract]
Abstract: The architecture of high performance computing systems is becoming more and more heterogeneous, as accelerators play an increasingly important role alongside traditional CPUs. Programming heterogeneous systems efficiently is a complex task that often requires the use of specific programming environments. Programming frameworks supporting codes portable across different high performance architectures have recently appeared, but one must carefully assess the relative costs of portability versus computing efficiency, and find a reasonable tradeoff point. In this paper we address precisely this issue, using a Lattice Boltzmann code implemented in OpenCL as a test-bench. We analyze its performance on several different state-of-the-art processors: NVIDIA GPUs and Intel Xeon Phi many-core accelerators, as well as more traditional Ivy Bridge and Opteron multi-core commodity CPUs. We also compare with results obtained with codes specifically optimized for each of these systems. Our work shows that a properly structured OpenCL code runs on many different systems, reaching performance levels close to those obtained by architecture-tuned CUDA or C codes.
Enrico Calore, Sebastiano F. Schifano, Raffaele Tripiccione
146 Accelerating Solid-Fluid Interaction using Lattice-Boltzmann and Immersed Boundary Coupled Simulations on Heterogeneous Platforms [abstract]
Abstract: We propose a numerical approach based on the Lattice-Boltzmann (LBM) and Immersed Boundary (IB) methods to tackle the problem of the interaction of solids with an incompressible fluid flow. The proposed method uses a Cartesian uniform grid that incorporates both the fluid and the solid domain. This is a highly efficient and novel method for this problem and a growing research topic in Computational Fluid Dynamics. We explain in detail the parallelization of the whole method on both GPUs and a heterogeneous GPU-multicore platform and describe different optimizations, focusing on memory management and CPU-GPU communication. Our performance evaluation consists of a series of numerical experiments that simulate situations of industrial and research interest. Based on these tests, we show that the baseline LBM implementation achieves satisfactory results on GPUs. Unfortunately, when coupling the LBM and IB methods on GPUs, the overheads of IB degrade the overall performance. As an alternative, we explore a heterogeneous implementation that is able to hide such overheads and allows us to exploit both multicore and GPU resources in a cooperative way.
Pedro Valero-Lara, Alfredo Pinelli, Manuel Prieto-Matías
110 Spatio-temporal Sequential Pattern Mining for Tourism Sciences [abstract]
Abstract: Flickr presents an abundance of geotagged photos for data mining. We propose the concept of extracting spatio-temporal metadata from Flickr photos; combining a collection of such photos yields a spatio-temporal entity movement trail, a trajectory describing an individual's movements. Using these spatio-temporal Flickr photographer trajectories, we aim to extract valuable tourist information about where people are going, what time they are going there, and where they are likely to go next. To achieve this goal we present a novel spatio-temporal trajectory region-of-interest (RoI) mining and sequential pattern mining (SPM) framework. It differs from previous work because it forms RoIs considering both space and time simultaneously, which, we reason, produces higher-quality RoIs and, by extension, higher-quality sequential patterns. We test our framework's ability to uncover interesting patterns for the tourism sciences industry by performing experiments using a large dataset of Queensland photo taker movements for the year 2012. Experimental results validate the usefulness of our approach at finding new, information-rich spatio-temporal tourist patterns from this dataset, especially in comparison to the 2D approaches in the literature.
Luke Bermingham, Ickjai Lee

Main Track (MT) Session 2

Time and Date: 16:30 - 18:10 on 10th June 2014

Room: Kuranda

Chair: Young Choon Lee

152 An Empirical Study of Hadoop's Energy Efficiency on a HPC Cluster [abstract]
Abstract: The Map-Reduce programming model is commonly used for efficient scientific computations, as it executes tasks in a parallel and distributed manner on large data volumes. HPC infrastructure can effectively increase the parallelism of map-reduce tasks; however, such an execution incurs high energy and data transmission costs. Here we empirically study how the energy efficiency of a map-reduce job varies with increasing parallelism and network bandwidth on an HPC cluster. We also investigate the effectiveness of power-aware systems in managing the energy consumption of different types of map-reduce jobs. We find that for some jobs the energy efficiency degrades at a high degree of parallelism, while for others it improves at low CPU frequency. Consequently, we suggest strategies for configuring the degree of parallelism, network bandwidth and power management features in an HPC cluster for energy-efficient execution of map-reduce jobs.
Nidhi Tiwari, Santonu Sarkar, Umesh Bellur, Maria Indrawan-Santiago
167 Optimal Run Length for Discrete-Event Distributed Cluster-Based Simulations [abstract]
Abstract: In scientific simulations the generated results usually come from a stochastic process. New solutions aiming to improve these simulations have been proposed, but the problem is how to compare them, since the results are not deterministic, and consequently how to guarantee that the output results are statistically trustworthy. In this work we apply a statistical approach to define the transient and steady state in discrete-event distributed simulation. We use linear regression and the batch means method to find the optimal simulation run length. Our contributions are the following: we apply and adapt a simple statistical approach to define the optimal simulation length; we propose approximating the output to a normal distribution instead of generating a sufficiently large number of replications; and the method can be used in other kinds of non-terminating scientific simulations where the data either follow a normal distribution or can be approximated by one.
Francisco Borges, Albert Gutierrez-Milla, Remo Suppi, Emilio Luque
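To illustrate the batch means idea referenced above (a standard textbook technique; the paper's adaptation may differ in details), the steady-state portion of one long run is split into batches whose means are treated as approximately independent normal samples, and the run is long enough when the confidence interval's relative half-width falls below a target precision:

```python
import statistics

def run_length_ok(samples, warmup, n_batches=30, rel_precision=0.05, z=1.96):
    """Batch means check: True if the steady-state mean of `samples`
    is estimated within +/- rel_precision at ~95% confidence."""
    data = samples[warmup:]                      # drop the transient phase
    batch = len(data) // n_batches
    if batch < 1:
        return False
    means = [statistics.fmean(data[i*batch:(i+1)*batch])
             for i in range(n_batches)]
    grand = statistics.fmean(means)
    half = z * statistics.stdev(means) / n_batches**0.5
    return abs(half / grand) < rel_precision

# usage: keep extending the simulation until the check passes
import random
out = [random.gauss(10, 2) for _ in range(20000)]
print(run_length_ok(out, warmup=1000))
```

The warmup cutoff plays the role of the transient/steady-state boundary that the paper identifies via linear regression.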
173 A CUDA Based Solution to the Multidimensional Knapsack Problem Using the Ant Colony Optimization [abstract]
Abstract: The Multidimensional Knapsack Problem (MKP) is a generalization of the basic Knapsack Problem, with two or more constraints. It is an important optimization problem with many real-life applications. It is NP-hard, and finding optimal solutions for MKP may be intractable. In this paper we use a metaheuristic algorithm based on ant colony optimization (ACO). Since several steps of the algorithm can be carried out concurrently, we propose a parallel implementation under the GPGPU paradigm (General Purpose Graphics Processing Units) using CUDA. To use the algorithm presented in this paper, it is necessary to balance the number of ants, the number of rounds, and whether local search is used, depending on the desired solution quality. In other words, there is a trade-off between time and solution quality. We obtained very promising experimental results and compared our implementation with those in the literature. The results show that ant colony optimization is a viable approach for solving MKP efficiently, even for large instances, with the parallel approach.
Henrique Fingler, Edson Cáceres, Henrique Mongelli, Siang Song
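A minimal sequential sketch of ACO applied to MKP (illustrative only; the paper's CUDA version parallelizes ant construction and pheromone updates, and the parameter names here are assumptions):

```python
import random

def aco_mkp(profits, weights, capacities, n_ants=20, rounds=100,
            rho=0.1, seed=1):
    """weights[j][i]: consumption of resource j by item i."""
    rng = random.Random(seed)
    n, m = len(profits), len(capacities)
    tau = [1.0] * n                      # pheromone per item
    best_set, best_val = [], 0
    for _ in range(rounds):
        for _ in range(n_ants):
            load, chosen, cand = [0.0] * m, [], set(range(n))
            while cand:
                feas = [i for i in cand
                        if all(load[j] + weights[j][i] <= capacities[j]
                               for j in range(m))]
                if not feas:
                    break
                # pheromone x greedy desirability drives the choice
                w = [tau[i] * profits[i] for i in feas]
                i = rng.choices(feas, weights=w)[0]
                chosen.append(i)
                cand.discard(i)
                for j in range(m):
                    load[j] += weights[j][i]
            val = sum(profits[i] for i in chosen)
            if val > best_val:
                best_set, best_val = chosen, val
        tau = [(1 - rho) * t for t in tau]   # evaporation
        for i in best_set:                   # reinforce the best solution
            tau[i] += 1.0
    return best_val, sorted(best_set)

profits = [10, 7, 9, 4]
weights = [[3, 2, 4, 1], [2, 3, 1, 2]]       # two constraints
print(aco_mkp(profits, weights, capacities=[6, 5]))
```

The inner ant loop is the part that maps naturally onto independent GPU threads, which is the concurrency the paper exploits.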
174 Comparison of High Level FPGA Hardware Design for Solving Tri-Diagonal Linear Systems [abstract]
Abstract: Reconfigurable computing devices can increase the performance of compute-intensive algorithms by implementing application-specific co-processor architectures. The power cost for this performance gain is often an order of magnitude less than that of modern CPUs and GPUs. Exploiting the potential of reconfigurable devices such as Field-Programmable Gate Arrays (FPGAs) is typically a complex and tedious hardware engineering task. Recently the major FPGA vendors (Altera and Xilinx) have released their own high-level design tools, which have great potential for rapid development of FPGA-based custom accelerators. In this paper, we evaluate Altera's OpenCL Software Development Kit and Xilinx's Vivado High Level Synthesis tool. These tools are compared for their performance, logic utilisation, and ease of development for the test case of a tri-diagonal linear system solver.
David Warne, Neil Kelson, Ross Hayward
181 Blood Flow Arterial Network Simulation with the Implicit Parallelism Library SkelGIS [abstract]
Abstract: Implicit parallelism is an active research domain of computer science. Most implicit parallelism solutions for solving partial differential equations and scientific simulations are based on the specifics of numerical methods, where the user has to call specific functions which embed parallelism. This paper presents the implicit parallel library SkelGIS, which allows the user to freely write a numerical method in a sequential programming style in C++. This library relies on four concepts which are applied, in this paper, to the specific case of network simulations. SkelGIS is evaluated on a blood flow simulation in arterial networks. Benchmarks are first performed to compare the performance and the coding difficulty of two implementations of the simulation, one using SkelGIS and one using OpenMP. Finally, the scalability of the SkelGIS implementation on a cluster is studied up to 1024 cores.
Hélène Coullon, Jose-Maria Fullana, Pierre-Yves Lagrée, Sébastien Limet, Xiaofei Wang

Main Track (MT) Session 3

Time and Date: 11:00 - 12:40 on 11th June 2014

Room: Kuranda

Chair: E. Luque

186 Triplet Finder: On the Way to Triggerless Online Reconstruction with GPUs for the PANDA Experiment [abstract]
Abstract: PANDA is a state-of-the-art hadron physics experiment currently under construction at FAIR, Darmstadt. In order to select events for offline analysis, PANDA will use software-based triggerless online reconstruction, performed at a data rate of 200 GB/s. To process the raw data rate of the detector in real time, we design and implement a GPU version of the Triplet Finder, a fast and robust first-stage tracking algorithm able to reconstruct tracks with good quality, specially designed for the Straw Tube Tracker subdetector of PANDA. We reduce the algorithmic complexity of processing many hits together by splitting them into bunches, which can be processed independently. We evaluate different ways of processing bunches, GPU dynamic parallelism being one of them. We also propose an optimized technique for associating hits with reconstructed track candidates. The evaluation of our GPU implementation demonstrates that the Triplet Finder can process almost 6 Mhits/s on a single K20X GPU, making it a promising algorithm for the online event filtering scheme of PANDA.
Andrew Adinetz, Andreas Herten, Jiri Kraus, Marius Mertens, Dirk Pleiter, Tobias Stockmanns, Peter Wintz
189 A Technique for Parallel Share-Frequent Sensor Pattern Mining from Wireless Sensor Networks [abstract]
Abstract: WSNs generate huge amounts of data in the form of streams, and mining useful knowledge from these streams is a challenging task. Existing works generate sensor association rules using the occurrence frequency of patterns, with either binary frequency (absent or present) or the support of a pattern as the criterion. However, the binary frequency or support of a pattern may not be a sufficient indicator for finding meaningful patterns in WSN data, because it only reflects the number of epochs in the sensor data which contain that pattern. The share measure of sensorsets can uncover useful knowledge about the numerical values associated with sensors in a sensor database. Therefore, in this paper, we propose a new type of behavioral pattern, the share-frequent sensor pattern, which considers the non-binary frequency values of sensors in epochs. To discover share-frequent sensor patterns from sensor datasets, we propose a novel parallel and distributed framework. In this framework, we develop a novel tree structure, the parallel share-frequent sensor pattern tree (PShrFSP-tree), which is constructed at each local node independently by capturing the database contents to generate the candidate patterns using a pattern-growth technique with a single scan, and which then merges the locally generated candidate patterns at the final stage to generate global share-frequent sensor patterns. Comprehensive experimental results show that our proposed model is very efficient for mining share-frequent patterns from WSN data in terms of time and scalability.
Md Mamunur Rashid, Dr. Iqbal Gondal, Joarder Kamruzzaman
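The share measure referred to above is commonly defined as the fraction of the database's total measure value contributed by a sensorset in the epochs containing it. A small sketch following that common definition from share-frequent itemset mining (the paper's exact variant may differ):

```python
def share(db, sensorset):
    """db: list of epochs, each a dict mapping sensor -> measured value.
    Share = (sum, over epochs containing all sensors of the set, of the
    set's values) / (total measure value of the whole database)."""
    total = sum(sum(epoch.values()) for epoch in db)
    s = set(sensorset)
    contrib = sum(sum(epoch[x] for x in s)
                  for epoch in db if s <= epoch.keys())
    return contrib / total

db = [{'s1': 3, 's2': 1}, {'s1': 2, 's3': 4}, {'s2': 2, 's3': 1}]
print(share(db, {'s1', 's3'}))  # 6 / 13
```

Unlike support, this rewards sensorsets whose readings carry large numerical values, not merely ones that appear in many epochs.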
205 Performance-Aware Energy Saving Mechanism in Interconnection Networks for Parallel Systems [abstract]
Abstract: The growing processing power of parallel computing systems requires interconnection networks of higher complexity and higher performance, which consume more energy. Link components contribute a substantial proportion of the total energy consumption of these networks. Many researchers have proposed approaches that judiciously change the link speed as a function of traffic, to save energy when the traffic is light. However, link speed reduction incurs an increase in average packet latency, and thus degrades network performance. This paper addresses that issue with several proposals. The simulation results show that the extended energy-saving mechanism in our proposals outperforms the energy-saving mechanisms in the open literature.
Hai Nguyen, Daniel Franco, Emilio Luque
214 Handling Data-skew Effects in Join Operations using MapReduce [abstract]
Abstract: For over a decade, MapReduce has been a prominent programming model for handling vast amounts of raw data in large-scale systems. This model ensures scalability, reliability and availability with reasonable query processing time. However, these large-scale systems still face several challenges: data skew, task imbalance, high disk I/O and redistribution costs can have disastrous effects on performance. In this paper, we introduce the MRFA-Join algorithm: a new frequency-adaptive algorithm based on the MapReduce programming model and a randomised key redistribution approach for join processing of large-scale datasets. A cost analysis of this algorithm shows that our approach is insensitive to data skew and ensures perfect balancing properties during all stages of join computation. These properties have been confirmed by a series of experiments.
Mostafa Bamha, Frédéric Loulergue, Mohamad Al Hajj Hassan
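The general trick behind frequency-adaptive joins can be sketched as follows (hypothetical helper names; the actual MRFA-Join details are in the paper): frequent join keys are detected, and their records from the large relation are scattered over several reducers by appending a random tag, while the matching records of the smaller relation are replicated to every tag:

```python
import random

def map_side(record, side, hot_keys, fanout=4, rng=random.Random(0)):
    """Emit ((key, tag), value) pairs so that records of a skewed key
    from the large relation R are scattered over `fanout` reducers,
    while matching records of the smaller relation S are replicated."""
    key, payload = record
    if key not in hot_keys:
        yield (key, 0), (side, payload)
    elif side == 'R':                       # scatter the big side
        yield (key, rng.randrange(fanout)), (side, payload)
    else:                                   # replicate the small side
        for t in range(fanout):
            yield (key, t), (side, payload)

hot = {'k1'}  # keys whose sampled frequency exceeds a threshold
R = [('k1', 1), ('k1', 2), ('k2', 3)]
S = [('k1', 'a'), ('k2', 'b')]
buckets = {}
for side, rel in (('R', R), ('S', S)):
    for rec in rel:
        for k, v in map_side(rec, side, hot):
            buckets.setdefault(k, []).append(v)
for k in sorted(buckets):                   # each bucket joins locally
    print(k, buckets[k])
```

Because no single reducer receives all records of a hot key, reduce-side load stays balanced regardless of the key distribution.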
216 Speeding-Up a Video Summarization Approach using GPUs and Multicore-CPUs [abstract]
Abstract: The recent progress of digital media has stimulated the creation, storage and distribution of data, such as digital videos, generating a large volume of data and requiring efficient technologies to increase the usability of these data. Video summarization methods generate concise summaries of video contents and enable faster browsing, indexing and accessing of large video collections; however, these methods often perform slowly on long-duration and high-quality video data. One way to reduce this long execution time is to develop a parallel algorithm, using the advantages of recent computer architectures that allow high parallelism. This paper introduces parallelizations of a summarization method called VSUMM, targeting either Graphics Processing Units (GPUs) or multicore Central Processing Units (CPUs), and ultimately a sensible distribution of computation steps onto both kinds of hardware to maximise performance, called "hybrid". We performed experiments using 180 videos, varying frame resolution (320 x 240, 640 x 360, and 1920 x 1080) and video length (1, 3, 5, 10, 20, and 30 minutes). From the results, we observed that the hybrid version achieved the best execution times, reaching a speedup of 7 on average.
Suellen Almeida, Antonio Carlos Nazaré Jr, Arnaldo De Albuquerque Araújo, Guillermo Cámara-Chávez, David Menotti

Main Track (MT) Session 4

Time and Date: 14:10 - 15:50 on 11th June 2014

Room: Kuranda

Chair: Y. Cui

222 GPU Optimization of Pseudo Random Number Generators for Random Ordinary Differential Equations [abstract]
Abstract: Solving differential equations with stochastic terms involves a massive use of pseudo random numbers. We present an application for the simulation of wireframe buildings under stochastic earthquake excitation. The inherent potential for vectorization of the application is used to its full extent on GPU accelerator hardware. A representative set of pseudo random number generators for uniformly and normally distributed pseudo random numbers has been implemented, optimized, and benchmarked. The resulting optimized variants outperform standard library implementations on GPUs. The techniques and improvements shown in this contribution using the Kanai-Tajimi model can be generalized to other random differential equations or stochastic models as well as other accelerators.
Christoph Riesinger, Tobias Neckel, Florian Rupp, Alfredo Parra Hinojosa, Hans-Joachim Bungartz
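As a flavour of the kind of generator benchmarked in such work, here is a minimal vectorized Box-Muller transform turning uniform pseudo random numbers into normally distributed ones (NumPy stands in for the GPU here; the paper's optimized CUDA variants are considerably more involved):

```python
import numpy as np

def box_muller(u1, u2):
    """Map two arrays of uniform(0,1] PRNs to two arrays of
    independent standard normal PRNs, fully vectorized."""
    r = np.sqrt(-2.0 * np.log(u1))
    return r * np.cos(2 * np.pi * u2), r * np.sin(2 * np.pi * u2)

rng = np.random.default_rng(42)          # any uniform generator works
u1 = 1.0 - rng.random(10**6)             # shift [0,1) to (0,1] for log
u2 = rng.random(10**6)
z1, z2 = box_muller(u1, u2)
print(z1.mean(), z1.std())               # ~0 and ~1
```

Every element is independent of the others, which is exactly the property that lets such transforms saturate wide SIMD units and GPU warps.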
229 Design and Implementation of Hybrid and Native Communication Devices for Java HPC [abstract]
Abstract: MPJ Express is a messaging system that allows computational scientists to write and execute parallel Java applications on High Performance Computing (HPC) hardware. The software is capable of executing in two modes, namely cluster mode and multicore mode. In cluster mode, parallel applications execute in a typical cluster environment where multiple processing elements communicate with one another using a fast interconnect like Gigabit Ethernet or proprietary networks like Myrinet and Infiniband. In this context, the MPJ Express library provides communication devices for Ethernet and Myrinet. In multicore mode, the parallel Java application executes on a single system comprising shared memory or multicore processors. In this paper, we extend the MPJ Express software with two new communication devices, namely the native and the hybrid device. The goal of the native communication device is to interface the MPJ Express software with native MPI libraries, typically written in C. In this setting, the bulk of the messaging logic is offloaded to the underlying MPI library. This is attractive because MPJ Express can exploit the latest features of the native MPI library, such as support for new interconnects and efficient collective communication algorithms. The second device, the hybrid device, is developed to allow efficient execution of parallel Java applications on clusters of shared memory or multicore processors. In this setting, the MPJ Express runtime system runs a single multithreaded process on each node of the cluster, with the number of threads in each process equal to the number of processing elements within a node. Our performance evaluation reveals that the native device allows MPJ Express to achieve performance comparable to native MPI libraries, for latency and bandwidth of point-to-point and collective communications, which is a significant gain compared to existing communication devices. The hybrid communication device, without any modifications at the application level, also helps parallel applications achieve better speedups and scalability. We witnessed comparable performance for various benchmarks, including the NAS Parallel Benchmarks, with the hybrid device as compared to the existing Ethernet communication device on a cluster of shared memory/multicore processors.
Bibrak Qamar, Ansar Javed, Mohsan Jameel, Aamir Shafi, Bryan Carpenter
231 Deploying a Large Petascale System: the Blue Waters Experience [abstract]
Abstract: Deployment of a large parallel system is typically a very complex process, involving several steps of preparation, delivery, installation, testing and acceptance. Despite the current availability of various petascale machines, the steps and lessons of their deployment are rarely described in the literature. This paper presents the experiences observed during the deployment of Blue Waters, the largest supercomputer ever built by Cray and one of the most powerful machines currently available for open science. The presentation is focused on the final deployment steps, where the system was intensively tested and accepted by NCSA. After a brief introduction of the Blue Waters architecture, a detailed description of the set of acceptance tests employed is provided, including many of the obtained results. This is followed by the major lessons learned during the process. These experiences and lessons should be useful to guide similarly complex deployments in the future.
Celso Mendes, Brett Bode, Gregory Bauer, Jeremy Enos, Cristina Beldica, William Kramer
248 FPGA-based acceleration of detecting statistical epistasis in GWAS [abstract]
Abstract: Genotype-by-genotype interactions (epistasis) are believed to be a significant source of unexplained genetic variation causing complex chronic diseases but have been ignored in genome-wide association studies (GWAS) due to the computational burden of analysis. In this work we show how to benefit from FPGA technology for highly parallel creation of contingency tables in a systolic chain with a subsequent statistical test. We present the implementation for the FPGA-based hardware platform RIVYERA S6-LX150 containing 128 Xilinx Spartan6-LX150 FPGAs. For performance evaluation we compare against the method iLOCi. iLOCi claims to outperform other available tools in terms of accuracy. However, analysis of a dataset from the Wellcome Trust Case Control Consortium (WTCCC) with about 500,000 SNPs and 5,000 samples still takes about 19 hours on a MacPro workstation with two Intel Xeon quad-core CPUs, while our FPGA-based implementation requires only 4 minutes.
Lars Wienbrandt, Jan Christian Kässens, Jorge González-Domínguez, Bertil Schmidt, David Ellinghaus, Manfred Schimmler
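For pairwise epistasis tests, the kernel being accelerated is essentially the construction of a 3x3 genotype contingency table per SNP pair, separately for cases and controls. A simple vectorized sketch in Python (the FPGA implementation builds these tables in a systolic chain, which this does not attempt to reproduce):

```python
import numpy as np

def contingency_tables(snp_a, snp_b, phenotype):
    """snp_a, snp_b: arrays of genotypes coded 0/1/2 per sample;
    phenotype: 0 = control, 1 = case. Returns two 3x3 count tables."""
    tables = np.zeros((2, 3, 3), dtype=np.int64)
    np.add.at(tables, (phenotype, snp_a, snp_b), 1)
    return tables[0], tables[1]   # controls, cases

rng = np.random.default_rng(0)
a = rng.integers(0, 3, 5000)      # 5,000 samples, as in the WTCCC data
b = rng.integers(0, 3, 5000)
y = rng.integers(0, 2, 5000)
controls, cases = contingency_tables(a, b, y)
print(cases)  # counts of each genotype combination among cases
```

The computational burden in GWAS comes from repeating this for all ~1.25 x 10^11 SNP pairs of a 500,000-SNP dataset, which is why massively parallel hardware pays off.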

Main Track (MT) Session 5

Time and Date: 16:20 - 18:00 on 11th June 2014

Room: Kuranda

Chair: M. Wagner

288 OS Support for Load Scheduling in Accelerator-based Heterogeneous Systems [abstract]
Abstract: The involvement of accelerators is becoming widespread in the field of heterogeneous processing, performing computation tasks across a wide range of applications. With the variety of computing architectures available today, the need for a system-wide multitasking environment is increasing. Therefore, we present an OpenCL-based scheduler that is designed as a multi-user computing environment to make use of the full potential of available resources while running as a daemon. Multiple tasks can be issued by means of a C++ API that relies on the OpenCL C++ wrapper. At this point, the daemon takes over control immediately and performs load scheduling. Due to its implementation, our approach can easily be applied to a common OS. We validate our method through extensive experiments deploying a set of applications, which show that the low scheduling costs remain constant in total over a wide range of input sizes. Besides different CPUs, a variety of modern GPU and other accelerator architectures are used in the experiments.
Ayman Tarakji, Niels Ole Salscheider, David Hebbeker
369 Efficient Global Element Indexing for Parallel Adaptive Flow Solvers [abstract]
Abstract: Many grid-based solvers for partial differential equations (PDE) assemble matrices explicitly for discretizing the underlying PDE operators and/or for the underlying (non-)linear systems of equations. Often, the data structures or solver packages require a consecutive global numbering of the degrees of freedom across the boundaries of different parallel subdomains. Straightforward approaches to realize this global indexing in parallel frequently result in serial parts of the assembly algorithms, which cause a considerable bottleneck, in particular in large-scale applications. We present an efficient way to set up such a global numbering scheme for large configurations via a position-based numbering performed locally on all parallel processes. The global number of shared nodes is determined via a tree-based communication pattern. We verified our implementation via state-of-the-art benchmark scenarios for incompressible flow simulations. A small performance study shows the parallel capability of our approach. The corresponding results can be generalized to other grid-based solvers that demand global indexing in the context of large-scale parallelization.
Michael Lieb, Tobias Neckel, Hans-Joachim Bungartz, Thomas Schöps
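The classic way to make such a global numbering scale (a sketch of the standard pattern, not necessarily the exact scheme of the paper) is an exclusive prefix sum over the per-process counts of owned degrees of freedom, so each process derives its consecutive global ID range without any serial bottleneck:

```python
# Sketch with mpi4py; each rank owns n_local DoFs (shared nodes are
# counted by exactly one owner rank, as determined beforehand).
from mpi4py import MPI

comm = MPI.COMM_WORLD
n_local = 100 + comm.rank            # e.g. a locally determined count

offset = comm.exscan(n_local)        # exclusive prefix sum over ranks
if comm.rank == 0:
    offset = 0                       # exscan yields None on rank 0

global_ids = range(offset, offset + n_local)
# ghost/shared DoFs then receive their IDs from the owning rank
print(comm.rank, global_ids.start, global_ids.stop)
```

The prefix sum costs O(log p) communication steps, matching the tree-based pattern the abstract mentions for counting shared nodes.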
382 Performance Improvements for a Large-Scale Geological Simulation [abstract]
Abstract: Geological models have been successfully used to identify and study geothermal energy resources. Many computer simulations based on these models are data-intensive applications. Large-scale geological simulations require high performance computing (HPC) techniques to run within reasonable time constraints and performance levels. One research area that can benefit greatly from HPC techniques is the modeling of heat flow beneath the Earth's surface. This paper describes the application of HPC techniques to increase the scale of research with a well-established geological model. Recently, a serial C++ application based on this geological model was ported to a parallel HPC application using MPI. One area of focus was to increase the performance of the MPI version to enable state- or regional-scale simulations using large numbers of processors. First, synchronous communication among MPI processes was replaced by overlapping communication and computation (asynchronous communication). Asynchronous communication improved performance over synchronous communication by an average of 28% using 56 cores in one environment and 46% using 56 cores in another. Second, an approach to load balancing involving repartitioning the data at the start of the program resulted in runtime performance improvements of 32% using 48 cores in the first environment and 14% using 24 cores in the second, compared to the asynchronous version. An additional feature, modeling of erosion, was also added to the MPI code base. The performance improvement techniques were less effective under erosion.
David Apostal, Kyle Foerster, Travis Desell, Will Gosnold
168 Lattice Gas Model for Budding Yeast: A New Approach for Density Effects [abstract]
Abstract: Yeasts in culture media grow exponentially in the early period but eventually stop growing. The saturation of population growth is due to the "density effect". The budding yeast, Saccharomyces cerevisiae, is known to exhibit an age-dependent cell division. A daughter cell, which has not yet given birth, has a longer generation time than a mother cell, because the daughter needs a maturing period. So far, investigations of the exponential growth period have accumulated intensively; very little is known about the stage dependence of the density effect. Here we present an "in vivo" study of the density effect, applying a lattice gas model to explore the age-structure dynamics. It is, however, hard to solve the basic equations, because they have an infinite number of variables and parameters. The basic equations are therefore reduced to several simplified models which have few variables and parameters. These simplified models are compared with experimental data to report two findings for the stage-dependent density effect: 1) the paradox of declining birthrate (PDB), and 2) mass suicide. These events suddenly and temporarily occur at the early stage of the density effect. The mother-daughter model leads to PDB: when the birthrate of the population decreases, the fraction of daughters abruptly increases. Moreover, we find that the average age of the yeast population suddenly decreases at the inflection point, indicating mass apoptosis of aged mothers. Our results imply the existence of several types of "pheromones" that specifically inhibit population growth.
Kei-Ichi Tainaka, Takashi Ushimaru, Toshiyuki Hagiwara, Jin Yoshimura
185 Characteristics of displacement data due to time scale for the combination of Brownian motion with intermittent adsorption [abstract]
Abstract: Single-molecule tracking data near solid surfaces contain information on diffusion that is potentially affected by adsorption. However, molecular adsorption can occur in an intermittent manner, and the overall phenomenon is regarded as slower yet normal diffusion if the time scale of each adsorption event is sufficiently shorter than the interval of data acquisition. We compare simple numerical model systems that vary in the time scale of adsorption events while sharing the same diffusion coefficient, and show that the shape of the displacement distribution depends on the time resolution. We also evaluate the characteristics with statistical quantities related to the large deviation principle.
Itsuo Hanasaki, Satoshi Uehara, Satoyuki Kawano

Main Track (MT) Session 6

Time and Date: 11:00 - 12:40 on 12th June 2014

Room: Kuranda

Chair: Andrew Lewis

199 Mechanism of Traffic Jams at Speed Bottlenecks [abstract]
Abstract: In the past 20 years of complexity science, traffic has been studied as a complex system with a large number of interacting agents. Since traffic has become an important aspect of our lives, understanding the traffic system and how it interacts with various factors is essential. In this paper, the interactions between traffic flow and road topology are studied, particularly the relationship between a sharp bend in a road segment and traffic jams. As suggested by Sugiyama [1], when car density exceeds a critical density, the fluctuation in speed of each car triggers a greater fluctuation in speed of the car behind. This enhancement of fluctuations leads to the congestion of vehicles. Using a cellular automata model modified from the Nagel-Schreckenberg CA model [2], the simulation results suggest that the mechanism of traffic jams at bottlenecks is similar to this. Instead of directly causing the congestion, a bottleneck on the road only causes the local density of traffic to increase; the resultant congestion is still due to the enhancement of fluctuations. The results of this study open up a large number of possible analytical studies which could serve as grounds for future work.
Wei Liang Quek, Lock Yue Chew
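For reference, the Nagel-Schreckenberg update rule that the modified model builds on (a straightforward Python version of the classic CA; the paper's modification for a sharp bend is not reproduced here):

```python
import random

def nasch_step(pos, vel, L, vmax=5, p_slow=0.3, rng=random.Random(0)):
    """One update of the Nagel-Schreckenberg CA on a circular road.
    pos: sorted car positions on a ring of L cells; vel: their speeds."""
    n = len(pos)
    new_vel = []
    for i in range(n):
        gap = (pos[(i + 1) % n] - pos[i] - 1) % L   # empty cells ahead
        v = min(vel[i] + 1, vmax)                   # 1. accelerate
        v = min(v, gap)                             # 2. brake to avoid crash
        if v > 0 and rng.random() < p_slow:         # 3. random slowdown
            v -= 1                                  #    (source of jams)
        new_vel.append(v)
    new_pos = [(x + v) % L for x, v in zip(pos, new_vel)]  # 4. move
    order = sorted(range(n), key=lambda i: new_pos[i])
    return [new_pos[i] for i in order], [new_vel[i] for i in order]

pos, vel = list(range(0, 100, 4)), [0] * 25   # density 0.25
for _ in range(100):
    pos, vel = nasch_step(pos, vel, L=100)
print(sum(vel) / len(vel))   # average speed after relaxation
```

Rule 3 is the fluctuation whose amplification produces spontaneous jams above the critical density; a bend can be emulated by lowering vmax locally, which raises the local density as the abstract describes.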
234 Computing, a powerful tool in flood prediction [abstract]
Abstract: Floods have caused widespread damage throughout the world. Modelling and simulation provide solutions and tools enabling us to face this reality, to forecast floods and to take the necessary preventive measures. One problem that must be handled by physical system simulators is parameter uncertainty and its impact on output results, which causes prediction errors. In this paper, we address input parameter uncertainty by providing a methodology to tune a flood simulator and achieve a lower error between simulated and observed results. The tuning methodology, through a parametric simulation technique, implements a first stage that finds an adjusted set of critical parameters, which is then used in a second stage to validate the predictive capability of the simulator, in order to reduce the disagreement between observed data and simulated results. We concentrate our experiments on three significant monitoring stations, where the improvement over the original simulator output ranges from 33% to 60%.
Adriana Gaudiani, Emilo Luque, Pablo Garcia, Mariano Re, Marcelo Naiouf, Armando De Giusti
117 Benchmarking and Data Envelopment Analysis. An Approach Based on Metaheuristics [abstract]
Abstract: Data Envelopment Analysis (DEA) is a non-parametric technique to estimate the current level of efficiency of a set of entities. DEA also provides information on how to remove inefficiency through the determination of benchmarking information. This paper is devoted to studying DEA models based on closest efficient targets, which are related to the shortest projection to the production frontier and allow inefficient firms to find the easiest way to improve their performance. Usually, these models have been solved by means of unsatisfactory methods, since all of them are related in some sense to a combinatorial NP-hard problem. In this paper, the problem is approached by metaheuristic techniques. Due to the large number of restrictions of the problem, finding solutions to be used in the metaheuristic algorithm is itself a difficult problem. Thus, this paper analyzes and compares some heuristic algorithms for obtaining solutions of the problem. Each restriction determines the design of these heuristics; thus, the problem is considered by adding constraints one by one. In this paper, the problem is presented and studied taking into account 9 of the 14 constraints, and the solution to this new problem is an upper bound of the optimal value of the original problem.
Jose J. Lopez-Espin, Juan Aparicio, Domingo Gimenez, Jesús T. Pastor
249 Consensus reaching in swarms ruled by a hybrid metric-topological distance [abstract]
Abstract: Recent empirical observations of three-dimensional bird flocks and human crowds have challenged the long-prevailing assumption that a metric interaction distance rules swarming behaviors. In some cases, individual agents are found to be engaged in local information exchanges with a fixed number of neighbors, i.e. a topological interaction. However, complex system dynamics based on pure metric or pure topological distances both face physical inconsistencies in low and high density situations. Here, we propose a hybrid metric-topological interaction distance overcoming these issues and enabling a real-life implementation in artificial robotic swarms. We use network- and graph-theoretic approaches combined with a dynamical model of locally interacting self-propelled particles to study the consensus reaching process for a swarm ruled by this hybrid interaction distance. Specifically, we establish exactly the probability of reaching consensus in the absence of noise. In addition, simulations of swarms of self-propelled particles are carried out to assess the influence of the hybrid distance and noise.
Yilun Shang and Roland Bouffanais
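One plausible reading of such a hybrid rule (an assumption for illustration; the paper defines its own precise interaction distance) is to take the k topologically nearest neighbors, but only among agents within a metric cutoff, so the rule behaves topologically at high density and metrically at low density:

```python
import numpy as np

def hybrid_neighbors(points, i, k=7, radius=5.0):
    """Indices of agent i's interaction partners: the k nearest
    agents that also lie within `radius` (hybrid metric-topological)."""
    d = np.linalg.norm(points - points[i], axis=1)
    d[i] = np.inf                       # exclude self
    within = np.flatnonzero(d <= radius)
    order = within[np.argsort(d[within])]
    return order[:k]                    # fewer than k if density is low

pts = np.random.default_rng(3).uniform(0, 20, size=(50, 2))
print(hybrid_neighbors(pts, 0))
```

The metric cap keeps the rule physically realizable (a robot's sensors have finite range), while the topological cap bounds the information-processing load per agent, which is exactly the pair of inconsistencies the abstract says pure rules face.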
258 Simulating Element Creation in Supernovae with the Computational Infrastructure for Nuclear Astrophysics at nucastrodata.org [abstract]
Abstract: The elements that make up our bodies and the world around us are produced in violent stellar explosions. Computational simulations of the element creation processes occurring in these cataclysmic phenomena are complex calculations that track the abundances of thousands of species of atomic nuclei throughout the star. These species are created and destroyed by ~60,000 thermonuclear reactions whose rates are stored in continually updated databases. Previously, delays of up to a decade were experienced before the latest experimental reaction rates were used in astrophysical simulations. The Computational Infrastructure for Nuclear Astrophysics (CINA), freely available at the website nucastrodata.org, reduces this delay from years to minutes! With over 100 unique software tools developed over the last decade, CINA comprises a “lab-to-star” connection. It is the only cloud computing software system in this field and it is accessible via an easy-to-use, web-deliverable, cross-platform Java application. The system gives users the capability to robustly simulate, share, store, analyze and visualize explosive nucleosynthesis events such as novae, X-ray bursts and (new in 2013) core-collapse supernovae. In addition, users can upload, modify, merge, store and share the complex input data required by these simulations. Presently, we are expanding the capabilities of CINA to meet the needs of our users who currently come from 141 institutions and 32 countries. We will describe CINA’s current suite of software tools and the comprehensive list of online nuclear astrophysics datasets available at the nucastrodata.org website. This work is funded by the DOE’s Office of Nuclear Physics under the US Nuclear Data Program.
E. J. Lingerfelt, M. S. Smith, W. R. Hix and C. R. Smith

Main Track (MT) Session 7

Time and Date: 14:10 - 15:50 on 12th June 2014

Room: Kuranda

Chair: Maria Indrawan-Santiago

321 Evolving Agent-based Models using Complexification Approach [abstract]
Abstract: This paper focuses on parameter search for multi-agent-based models using evolutionary algorithms. Large numbers and variable dimensions of parameters require a search method which can efficiently handle a high-dimensional search space. We propose the use of complexification, as it emulates the natural way of evolution by starting with a small, constrained search space and expanding it as the evolution progresses. We examined the effects of this method on an EA by evolving parameters for two multi-agent-based models.
Michael Wagner, Wentong Cai, Michael Harold Lees, Heiko Aydt
356 Discrete modeling and simulation of business processes using event logs [abstract]
Abstract: An approach to business process modelling for short-term KPI prediction, based on event logs and values of environment variables, is proposed. A ready-for-simulation process model is built semi-automatically; the expert only inputs the desired environment variables, which are used as features during the learning process. The process workflow is extracted as a Petri net model using a combination of process mining algorithms. Dependencies between features and process variables are formalized using decision and regression tree techniques. Experiments were conducted to predict the KPIs of real companies.
Ivan Khodyrev, Svetlana Popova
376 Modeling and Simulation Framework for Development of Interactive Virtual Environments [abstract]
Abstract: The article presents a framework for the development of interactive virtual environments for simulation and modeling of complex systems. The framework uses a system's structural model as the core concept for composition and control of simulation-based scientific experiments, not in terms of technological processes or workflows but in terms of domain-specific objects and their interconnections within the investigated system. The proposed framework enables integration and management of resources available within a cloud computing environment in order to support automatic simulation management and to provide the user with an interactive visual domain-specific interface to the system.
Konstantin Knyazkov, Sergey Kovalchuk
34 Using interactive 3D game play to make complex medical knowledge more accessible [abstract]
Abstract: This research outlines a new approach that takes complex medical, nutritional and activity data and presents it to the diabetic patient in the form of a mobile app/game that uses interactive 3D computer graphics and game play to make this complex information more accessible. The pilot randomized control study results indicate that the Diabetes Visualizer's use of interactive 3D game play increased the participants' understanding of the condition and its day-to-day management. More importantly, the Diabetes Visualizer app stimulated participants' interest in, and desire to engage in, the task of diabetes management.
Dale Patterson

Main Track (MT) Session 8

Time and Date: 11:20 - 13:00 on 10th June 2014

Room: Tully I

Chair: Michela Taufer

115 The influence of network topology on reverse-engineering of gene-regulatory networks [abstract]
Abstract: Modeling and simulation of gene-regulatory networks (GRNs) has become an important aspect of modern computational biology investigations into gene regulation. A key challenge in this area is the automated inference (reverse-engineering) of dynamic, mechanistic GRN models from gene expression time-course data. Common mathematical formalisms used to represent such models capture both the relative weight or strength of a regulator gene and the type of the regulator (activator, repressor) with a single model parameter. The goal of this study is to quantify the role this parameter plays in terms of the computational performance of the reverse-engineering process and the predictive power of the inferred GRN models. We carried out three sets of computational experiments on a GRN system consisting of 22 genes. While more comprehensive studies of this kind are ultimately required, this computational study demonstrates that models with similar training (reverse-engineering) error that have been inferred under varying degrees of a priori known topology information exhibit considerably different predictive performance. This study was performed with a newly developed multiscale modeling and simulation tool called MultiGrain/MAPPER.
Alexandru Mizeranschi, Noel Kennedy, Paul Thompson, Huiru Zheng, Werner Dubitzky
188 Maximizing the Cumulative Influence through a Social Network when Repeat Activation Exists [abstract]
Abstract: We study the problem of employing social networks to propagate influence when repeat activation is involved. While influence maximization has been extensively studied as the fundamental solution, it neglects the reality that a user may purchase a product/service repeatedly, incurring cumulative sales of the product/service. In this paper, we explore a new problem of cumulative influence maximization that brings influence maximization a step closer to real-world viral marketing applications. In our problem setting, repeat activation exists, and we aim to find a set of initial users through which the maximal cumulative influence can be stimulated in a given time period. To describe the repeat activation behavior, we adopt the voter model to reflect the variation of activations over time. Under the voter model, we formulate the maximization problem and present an effective algorithm. We test and compare our method with heuristic algorithms on real-world data sets. Experimental results demonstrate the utility of the proposed method.
Chuan Zhou, Peng Zhang, Wenyu Zang, Li Guo
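A simple Monte Carlo sketch of cumulative influence under the voter model (illustrative; the paper formulates and optimizes the problem analytically rather than by simulation): at each step a random node adopts the state of a random neighbor, and the number of active nodes is accumulated over the time window, mimicking repeat purchases:

```python
import random

def cumulative_influence(adj, seeds, T, runs=200, rng=random.Random(7)):
    """Average cumulative number of active node-steps over T steps of
    the voter model, starting from the nodes in `seeds` being active."""
    total = 0
    nodes = list(adj)
    for _ in range(runs):
        active = {v: v in seeds for v in adj}
        for _ in range(T):
            v = rng.choice(nodes)
            active[v] = active[rng.choice(adj[v])]   # copy a neighbor
            total += sum(active.values())            # cumulative sales
    return total / runs

adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
print(cumulative_influence(adj, seeds={2}, T=50))
```

A seed-selection heuristic would then pick the seed set maximizing this estimate subject to a budget on the number of initial users.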
320 Mining Large-scale Knowledge about Events from Web Text [abstract]
Abstract: This paper addresses the problem of automatic acquisition of semantic relations between events. Since most previous research relies on annotated corpora, the main challenge is the need for more generic methods to identify related event pairs and to extract event arguments (particularly the predicate, subject and object). Motivated by this background, we develop a three-phased approach that acquires causality from the Web. First, we use explicit connective markers (such as "because") as linguistic cues to discover causally related events. Then, we extract the event arguments based on local dependency parse trees of event expressions. In the final phase, we propose a statistical model to measure the potential causal relations. The results of our empirical evaluation on a large-scale Chinese Web corpus show that (a) the use of local dependency trees extensively improves both the accuracy and recall of the event-argument extraction task; and (b) our measure, which is an improvement on PMI, gives better performance.
Yanan Cao, Peng Zhang, Jing Guo, Li Guo
200 Discovering Multiple Diffusion Source Nodes in Social Networks [abstract]
Abstract: Social networks have greatly amplified the spread of information across different communities. However, we observe that various kinds of malicious information, such as computer viruses and rumors, are also broadly spread via social networks. To restrict such malicious information, it is critical to develop effective methods to discover the diffusion source nodes in social networks. Many pioneering works have explored the source node identification problem, but they are all based on the ideal assumption that there is only a single source node, neglecting the fact that malicious information is often diffused from multiple sources to intentionally avoid network audit. In this paper, we present a multi-source locating method based on a given snapshot of partially and sparsely observed infected nodes in the network. Specifically, we first present a reverse propagation method to detect recovered and unobserved infected nodes in the network, and then we use community clustering algorithms to turn the multi-source locating problem into a set of single-source locating problems. In the last step, we identify the nodes with the largest likelihood estimates as the source nodes of the infected clusters. Experiments on three different types of complex networks show the performance of the proposed method.
Wenyu Zang, Peng Zhang, Chuan Zhou, Li Guo
293 The Origin of Control in Complex Networks [abstract]
Abstract: Recent work at the borders of network science and control theory has begun to investigate the control of complex systems by studying their underlying network representations. A majority of the work in this nascent field has looked at the number of controls required to fully control a network. In this talk I will present research that provides a ready breakdown of this number into categories that are both easy to observe in real-world networks and instructive in understanding the underlying functional reasons why the controls must exist. This breakdown sheds light on several observations made in the previous literature regarding the controllability of networks. The decomposition produces a mechanism for clustering networks into classes that are consistent with their large-scale architecture and purpose. Finally, we observe that synthetic models of network formation generate networks with control breakdowns substantially different from those observed in real-world networks.
Justin Ruths

Main Track (MT) Session 9

Time and Date: 16:30 - 18:10 on 10th June 2014

Room: Tully I

Chair: S. Chuprina

301 Study of the Network Impact on Earthquake Early Warning in the Quake-Catcher Network Project [abstract]
Abstract: The Quake-Catcher Network (QCN) project uses low-cost sensors, i.e., accelerometers attached to volunteers' computers, to detect earthquakes. The master-worker topology currently used in QCN and other similar projects suffers from major weaknesses. The centralized master can fail to collect data if the volunteers' computers are not connected to the network, or it can introduce significant delays in the warning if the network is congested. We propose to solve these problems by using multiple servers in a more advanced network topology than the simple master-worker configuration. We first consider several critical scenarios in which the current master-worker configuration of QCN can hinder the early warning of an earthquake, and then integrate the advanced topology around multiple servers and emulate these critical scenarios in a simulation environment to quantify the benefits and costs of our solution. We show how our solution can reduce the time to detect an earthquake from 1.8 s to 173 ms in case of network congestion, and the number of lost trickle messages from 2,013 to 391 in case of network failure.
Marcos Portnoi, Samuel Schlachter, Michela Taufer
315 The p-index: Ranking Scientists using Network Dynamics [abstract]
Abstract: The indices currently used by scholarly databases such as Google Scholar to rank scientists do not attach weights to citations. Neither is the underlying network structure of citations considered in computing these metrics. This results in scientists cited by well-recognized journals not being rewarded, and may lead to potential misuse if documents are created purely to cite others. In this paper we introduce a new ranking metric, the p-index (pagerank-index), which is computed from the underlying citation network of papers and uses the pagerank algorithm in its computation. The index is a percentile score, can potentially be implemented in public databases such as Google Scholar, and can be applied at many levels of abstraction. We demonstrate that the metric aids in fairer ranking of scientists compared to the h-index and its variants. We do this by simulating a realistic model of the evolution of citation and collaboration networks in a particular field, and comparing the h-index and p-index of scientists under a number of scenarios. Our results show that the p-index is immune to author behaviors that can result in artificially inflated h-index values.
Upul Senanayake, Mahendrarajah Piraveenan, Albert Zomaya
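A toy version of the idea (using the networkx PageRank implementation; the equal split of a paper's score among co-authors is an assumption made here for illustration, and the paper defines its own aggregation):

```python
import networkx as nx

def p_scores(citations, authorship):
    """citations: directed edges (citing_paper, cited_paper);
    authorship: paper -> list of authors. Returns percentile scores."""
    g = nx.DiGraph(citations)
    pr = nx.pagerank(g)                      # weight of each paper
    score = {}
    for paper, authors in authorship.items():
        for a in authors:                    # split credit equally
            score[a] = score.get(a, 0.0) + pr.get(paper, 0.0) / len(authors)
    # convert to percentiles so the index is comparable across fields
    ranked = sorted(score, key=score.get)
    return {a: 100.0 * (i + 1) / len(ranked) for i, a in enumerate(ranked)}

cites = [('p2', 'p1'), ('p3', 'p1'), ('p3', 'p2')]
papers = {'p1': ['alice'], 'p2': ['bob', 'carol'], 'p3': ['carol']}
print(p_scores(cites, papers))
```

Because PageRank weights a citation by the importance of the citing paper, self-serving citation rings add little mass, which is the robustness property claimed for the p-index.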
191 A Clustering-based Link Prediction Method in Social Networks [abstract]
Abstract: Link prediction is an important task in social network analysis, which also has applications in other domains such as recommender systems, molecular biology and criminal investigations. The classical methods of link prediction are based on graph topology structure and path features, but few consider clustering information. A cluster in a graph is a densely connected group of vertices that is sparsely connected to other groups. The clustering results contain essential information for link prediction: two vertices' common neighbors may play different roles depending on whether they belong to the same cluster. Based on this assumption and the characteristics of common social networks, in this paper we propose a link prediction method based on clustering and global information. Our experiments on both synthetic and real-world networks show that this method can improve link prediction accuracy as the number of clusters grows.
Fenhua Li, Jing He, Guangyan Huang, Yanchun Zhang, Yong Shi
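The underlying intuition can be captured by a cluster-weighted common-neighbors score (a minimal sketch; the weighting scheme and the use of global information in the paper are more elaborate):

```python
def cluster_cn_score(adj, cluster, u, v, w_same=1.0, w_diff=0.5):
    """Common-neighbors link score for (u, v), counting a shared
    neighbor more if it lies in the same cluster as both endpoints."""
    score = 0.0
    for z in adj[u] & adj[v]:
        same = cluster[z] == cluster[u] == cluster[v]
        score += w_same if same else w_diff
    return score

adj = {0: {1, 2}, 1: {0, 2, 3}, 2: {0, 1, 3}, 3: {1, 2}}
cluster = {0: 'A', 1: 'A', 2: 'A', 3: 'B'}
print(cluster_cn_score(adj, cluster, 0, 3))  # via neighbors 1 and 2
```

Candidate node pairs are then ranked by this score, with the highest-scoring non-edges predicted as future links.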
345 A Technology for BigData Analysis Task Description using Domain-Specific Languages [abstract]
Abstract: The article presents a technology for dynamic knowledge-based building of Domain-Specific Languages (DSLs) to describe data-intensive scientific discovery tasks using BigData technology. The proposed technology supports high-level abstract definition of the analytic and simulation parts of the task, as well as integration into composite scientific solutions. Automatic translation of the abstract task definition enables seamless integration of various data sources within a single solution.
Sergey Kovalchuk, Artem Zakharchuk, Jiaqi Liao, Sergey Ivanov, Alexander Boukhanovsky
66 Characteristics of Dynamical Phase Transitions for Noise Intensities [abstract]
Abstract: We simulate and analyze dynamical phase transitions in a Boolean neural network with initially random connections. Since we treat a stochastic evolution by means of a noise intensity, we show from our condition that there exists a critical value of the noise intensity. The nature of the phase transition is found numerically and analytically for two connections (of the probability density function) and one random network.
Muyoung Heo, Jong-Kil Park, Kyungsik Kim

Main Track (MT) Session 10

Time and Date: 11:00 - 12:40 on 11th June 2014

Room: Tully I

Chair: S. Smanchat

18 A Workflow Application for Parallel Processing of Big Data from an Internet Portal [abstract]
Abstract: The paper presents a workflow application for efficient parallel processing of data downloaded from an Internet portal. The workflow partitions input files into subdirectories which are further split for parallel processing by services installed on distinct computer nodes. This way, analysis of the first ready subdirectories can start quickly and is handled by services implemented as parallel multithreaded applications using multiple cores of modern CPUs. The goal is to assess achievable speed-ups and to determine which factors influence scalability and to what degree. Data processing services were implemented for assessment of the context (positive or negative) in which a given keyword appears in a document. The testbed application used these services to determine how a particular brand was perceived by either authors of articles or readers in comments on a specific Internet portal focused on new technologies. Obtained execution times as well as speed-ups are presented for data sets of various sizes, along with a discussion of how factors such as load imbalance and memory/disk bottlenecks limit performance.
Pawel Czarnul
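The partition-then-process pattern described above can be sketched as follows (with a hypothetical file layout and a stub scoring function; the paper's services run as distributed multithreaded applications on separate nodes rather than a local process pool):

```python
import multiprocessing as mp
from pathlib import Path

def keyword_context_score(path):
    """Stub service: count positive vs negative words as a crude
    proxy for the context in which a keyword appears."""
    text = Path(path).read_text(encoding='utf-8', errors='ignore').lower()
    pos = sum(text.count(w) for w in ('good', 'great', 'reliable'))
    neg = sum(text.count(w) for w in ('bad', 'poor', 'faulty'))
    return path, pos - neg

if __name__ == '__main__':
    # subdirectories produced by the partitioning step of the workflow
    chunks = list(Path('downloads').glob('part-*/*.txt'))
    with mp.Pool() as pool:          # one worker per CPU core
        for path, score in pool.imap_unordered(keyword_context_score,
                                               chunks):
            print(path, score)
```

imap_unordered returns results as soon as each chunk finishes, mirroring the paper's point that analysis of the first ready subdirectories can start before the whole download is partitioned.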
273 A comparative study of scheduling algorithms for the multiple deadline-constrained workflows in heterogeneous computing systems with time windows [abstract]
Abstract: Scheduling tasks with precedence constraints on a set of resources with different performance is a well-known NP-complete problem, and a number of effective heuristics have been proposed to solve it. If the start time and the deadline of each specific workflow are known (for example, if a workflow starts execution according to periodic data coming from sensors, and its execution should be completed before the next data acquisition), the problem of scheduling multiple deadline-constrained workflows arises. Taking into account that resource providers may give only restricted access to their computational capabilities, we consider the case when resources are only partially available for workflow execution. To address the problem described above, we study the scheduling of deadline-constrained scientific workflows in a non-dedicated heterogeneous environment. In this paper, we introduce three scheduling algorithms for mapping the tasks of multiple workflows with different deadlines onto a static set of resources with previously known free time windows. Simulation experiments show that scheduling strategies based on the proposed staged scheme give better results than a merge-based approach considering all workflows at once.
Klavdiya Bochenina
292 Fault-Tolerant Workflow Scheduling Using Spot Instances on Clouds [abstract]
Abstract: Scientific workflows are used to model applications of high-throughput computation and complex large-scale data analysis. In recent years, Cloud computing has been fast evolving as the target platform for such applications among researchers. Furthermore, new pricing models have been pioneered by Cloud providers that allow users to provision resources and use them in an efficient manner with significant cost reductions. In this paper, we propose a scheduling algorithm that schedules tasks on Cloud resources using two different pricing models (spot and on-demand instances) to reduce the cost of execution whilst meeting the workflow deadline. The proposed algorithm is fault-tolerant against the premature termination of spot instances and also robust against performance variations of Cloud resources. Experimental results demonstrate that our heuristic reduces execution cost by up to 70% compared with using only on-demand instances.
Deepak Poola, Kotagiri Ramamohanarao, Rajkumar Buyya
308 On Resource Efficiency of Workflow Schedules [abstract]
Abstract: This paper presents the Maximum Effective Reduction (MER) algorithm, which optimizes the resource efficiency of a workflow schedule generated by any particular scheduling algorithm. MER trades a minimal increase in makespan for a maximal reduction in resource usage by consolidating tasks, exploiting the resource inefficiency of the original workflow schedule. Our evaluation shows that the rate of resource usage reduction far outweighs that of the increase in makespan: the number of resources used is halved on average while incurring an increase in makespan of less than 10%.
Young Choon Lee, Albert Y. Zomaya, Hyuck Han
346 GridMD: a Lightweight Portable C++ Library for Workflow Management [abstract]
Abstract: In this contribution we present the current state of the open source GridMD workflow library (http://gridmd.sourceforge.net). The library was originally designed for programmers of distributed Molecular Dynamics (MD) simulations; nowadays, however, it serves as a universal tool for creating and managing general workflows from a compact client application. GridMD is a programming tool aimed at developers of distributed software that utilizes local or remote compute capabilities to perform loosely coupled computational tasks. Unlike other workflow systems and platforms, GridMD is not integrated with heavy infrastructure such as Grid systems, web portals, user and resource management systems or databases. It is a very lightweight tool that accesses and operates on a remote site via delegated user credentials. For starting compute jobs the library supports the Globus Grid environment, cluster queuing managers such as PBS (Torque) and SLURM, and Unix/Windows command shells. All job-starting mechanisms may be used either locally or remotely via the integrated SSH protocol. Working with different queues, starting parallel (MPI) jobs and changing job parameters are generically supported by the API. The jobs are started and monitored in a “passive” way, requiring no special task management agents to be running or even installed on the remote system. The workflow execution is monitored by an application (a task manager performing GridMD API calls) running on a client machine. Data transfer between different compute resources, and between the client machine and a compute resource, is performed by the exchange of files (over gridftp or ssh channels). The task manager is able to checkpoint and restart the workflow and to recover from different types of errors without recalculating the whole workflow. The task manager itself can easily be terminated/restarted on the client machine or transferred to another client without breaking the workflow execution. Apart from separate tasks such as command series or application launches, a GridMD workflow may also manage integrated tasks that are described by code compiled as part of the task manager. Moreover, the integrated tasks may change the workflow dynamically by adding jobs or dependencies to the existing workflow graph. This dynamic management of the workflow graph is an essential feature of GridMD, which gives the programmer of distributed scenarios considerable flexibility. GridMD also provides a set of useful workflow skeletons for standard distributed scenarios such as Pipe, Fork, Parameter Sweep and Loop (implemented as a dynamic workflow). In the talk we will discuss the architecture and special features of GridMD. We will also briefly describe recent applications of GridMD as the basis for a distributed job manager, for example in the multiscale OLED simulation platform (EU-Russia IM3OLED project).
Ilya Valuev and Igor Morozov
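
To illustrate the dynamic-graph idea central to the abstract above, here is a minimal sketch in Python of a workflow whose tasks may add new nodes while the workflow runs (as in a Parameter Sweep). GridMD itself is a C++ library; this is a generic illustration of the concept, not the GridMD API.

class Workflow:
    def __init__(self):
        self.deps = {}      # node -> set of prerequisite nodes
        self.action = {}    # node -> callable executed when prerequisites are done

    def add(self, name, func, after=()):
        self.deps[name] = set(after)
        self.action[name] = func

    def run(self):
        done = set()
        while len(done) < len(self.deps):
            ready = [n for n, d in self.deps.items()
                     if n not in done and d <= done]
            if not ready:
                raise RuntimeError("cycle or unsatisfiable dependency")
            for n in ready:
                self.action[n](self)   # an action may extend the graph
                done.add(n)

def sweep(wf):
    # A running task extends the workflow graph: one job per parameter value.
    for p in (0.1, 0.2, 0.3):
        wf.add(f"job_{p}", lambda wf, p=p: print("running with", p),
               after=("sweep",))       # p=p freezes the loop variable

wf = Workflow()
wf.add("sweep", sweep)
wf.run()
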

Main Track (MT) Session 11

Time and Date: 14:10 - 15:50 on 11th June 2014

Room: Tully I

Chair: Dieter Kranzlmuller

360 Workflow as a Service in the Cloud: Architecture and Scheduling Algorithms [abstract]
Abstract: With more and more workflow systems adopting the cloud as their execution environment, it becomes increasingly challenging to efficiently manage various workflows, virtual machines (VMs) and workflow execution on VM instances. To make the system scalable and easy to extend, we design a Workflow as a Service (WFaaS) architecture with independent services. A core part of the architecture is how to efficiently respond to continuous workflow requests from users and schedule their executions in the cloud. Based on different targets, we propose four heuristic workflow scheduling algorithms for the WFaaS architecture, and analyze their differences and best usages in terms of performance, cost and the price/performance ratio via experimental studies.
Jianwu Wang, Prakashan Korambath, Ilkay Altintas, Jim Davis, Daniel Crawl
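
As a flavour of the kind of trade-off such heuristics make (this is a generic deadline-aware greedy rule, not one of the paper's four algorithms; VM names and prices are illustrative):

VM_TYPES = [  # (name, relative speed, $/hour) -- made-up numbers
    ("small", 1.0, 0.10),
    ("medium", 2.0, 0.22),
    ("large", 4.0, 0.50),
]

def choose_vm(task_hours_on_small: float, deadline_hours: float):
    """Pick the cheapest VM type that still meets the deadline."""
    feasible = [(name, speed, price) for name, speed, price in VM_TYPES
                if task_hours_on_small / speed <= deadline_hours]
    if not feasible:
        return None  # no single VM type meets the deadline
    # Task cost = runtime on that VM * hourly price; minimize it.
    return min(feasible, key=lambda v: (task_hours_on_small / v[1]) * v[2])

print(choose_vm(task_hours_on_small=6.0, deadline_hours=2.0))  # -> ('large', 4.0, 0.5)
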
36 Large Eddy Simulation of Flow in Realistic Human Upper Airways with Obstructive Sleep Apnea [abstract]
Abstract: Obstructive sleep apnea (OSA) is a common type of sleep disorder characterized by abnormal repetitive cessation of breathing during sleep, caused by partial or complete narrowing of the pharynx in the upper airway. Upper airway surgery is commonly performed for this disorder; however, the success rate is limited because of the lack of a thorough understanding of the primary mechanism associated with OSA. A computational fluid dynamics (CFD) simulation with the Large Eddy Simulation approach is conducted to investigate the upper airway flow of a patient with severe OSA. Both pre- and post-surgical upper airway models are simulated to reveal the effect of the surgical treatment. Only inhalation is simulated, over six breathing periods (about 15 seconds) of unsteady flow. Comparison of the results before and after treatment shows that, before treatment, there exists a significant pressure and shear stress drop near the soft palate; after the treatment, the flow resistance in the upper airway is decreased and the wall shear stress is significantly reduced.
Mingzhen Lu, Yang Liu, Jingying Ye, Haiyan Luo
86 Experiments on a Parallel Nonlinear Jacobi-Davidson Algorithm [abstract]
Abstract: The Jacobi-Davidson (JD) algorithm is very well suited for the computation of a few eigenpairs of large sparse complex symmetric nonlinear eigenvalue problems. The performance of JD crucially depends on the treatment of the so-called correction equation, in particular the preconditioner, and the initial vector. Depending on the choice of the spectral shift and the accuracy of the solution, the convergence of JD can vary from linear to cubic. We investigate parallel preconditioners for the Krylov space method used to solve the correction equation. We apply our nonlinear Jacobi-Davidson (NLJD) method to quadratic eigenvalue problems that originate from the time-harmonic Maxwell equation for the modeling and simulation of resonating electromagnetic structures.
Yoichi Matsuo, Hua Guo, Peter Arbenz
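
For orientation, the correction equation of the standard (linear) Jacobi-Davidson method has the projected form below; this is the textbook formulation, not necessarily the authors' exact nonlinear variant. For a nonlinear eigenvalue problem T(\lambda)x = 0, the matrix A - \theta I is replaced by T(\theta), with a correspondingly modified projector:

\[
\left(I - u u^{*}\right)\left(A - \theta I\right)\left(I - u u^{*}\right) t = -r,
\qquad r = A u - \theta u, \qquad t \perp u .
\]
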
184 Improving Collaborative Recommendation via Location-based User-Item Subgroup [abstract]
Abstract: Collaborative filtering has been widely and successfully applied in recommender systems. It typically associates a user with a group of like-minded users based on their preferences over all the items, and recommends to the user those items enjoyed by others in the group. Previous studies have shown that there exist many user-item subgroups, each consisting of a subset of items and a group of users like-minded on these items, and that subgroup analysis can achieve better accuracy. We further find that users' geographical information has an impact on group preferences for items. Hence, in this paper, we propose a Bayesian generative model that describes the generative process of user-item subgroup preferences while taking users' geographical information into account. Experimental results show the superiority of the proposed model.
Zhi Qiao, Peng Zhang, Yanan Cao, Chuan Zhou, Li Guo
90 Optimizing Shared-Memory Hyperheuristics on top of Parameterized Metaheuristics [abstract]
Abstract: This paper studies the auto-tuning of shared-memory hyperheuristics developed on top of a unified shared-memory metaheuristic scheme. A theoretical model of the execution time of the unified scheme is empirically adapted to particular metaheuristics and hyperheuristics through experimentation. The model is used to decide, at run time, the number of threads that yields a reduced execution time. The number of threads differs between the basic functions in the scheme, and depends on the problem to be solved, the metaheuristic scheme, the implementation of the basic functions and the computational system where the problem is solved. The applicability of the proposal is shown with a problem of minimizing electricity consumption in the exploitation of wells. Experimental results show that satisfactory execution times can be achieved with auto-tuning techniques based on theoretical-empirical models of the execution time.
José Matías Cutillas Lozano, Domingo Gimenez
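
A minimal sketch of the kind of decision such an auto-tuner makes, under an assumed model t(p) = a*n/p + b*p (compute term plus per-thread overhead; the authors' model is more detailed): fit the constants from a few timed runs, then pick the thread count minimizing the modelled time.

import numpy as np

def fit_model(samples):
    """samples: list of (n, p, measured_time). Least-squares fit of a, b."""
    A = np.array([[n / p, p] for n, p, _ in samples])
    t = np.array([time for _, _, time in samples])
    (a, b), *_ = np.linalg.lstsq(A, t, rcond=None)
    return a, b

def best_threads(n, a, b, max_threads=16):
    return min(range(1, max_threads + 1), key=lambda p: a * n / p + b * p)

samples = [(10000, 1, 2.10), (10000, 2, 1.15),   # illustrative timings
           (10000, 4, 0.70), (10000, 8, 0.55)]
a, b = fit_model(samples)
print("use", best_threads(20000, a, b), "threads")
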

Main Track (MT) Session 12

Time and Date: 16:20 - 18:00 on 11th June 2014

Room: Tully I

Chair: Luiz DeRose

187 The K computer Operations: Experiences and Statistics [abstract]
Abstract: The K computer, released on September 29, 2012, is a large-scale parallel supercomputer system consisting of 82,944 compute nodes. We have been able to resolve a significant number of operational issues since its release. Some system software components have been fixed and improved to obtain higher stability and utilization. We achieved 94% service availability thanks to a low hardware failure rate, and approximately 80% node utilization through careful adjustment of operation parameters. We found that the K computer is an extremely stable and highly utilized system.
Keiji Yamamoto, Atsuya Uno, Hitoshi Murai, Toshiyuki Tsukamoto, Fumiyoshi Shoji, Shuji Matsui, Ryuichi Sekizawa, Fumichika Sueyasu, Hiroshi Uchiyama, Mitsuo Okamoto, Nobuo Ohgushi, Katsutoshi Takashina, Daisuke Wakabayashi, Yuki Taguchi, Mitsuo Yokokawa
195 Quantum mechanics study of hexane isomers through gamma-ray and graph theory combined with C1s binding energy and nuclear magnetic spectra (NMR) [abstract]
Abstract: Quantum mechanically calculated positron-electron annihilation gamma-ray spectra, C1s binding energy spectra and NMR spectra are employed to study the electronic structures of hexane and its isomers, assisted by graph theory. Our recent positron-electron annihilation gamma-ray spectral study of n-hexane in the gas phase, and core ionization (IP) spectral studies of small alkanes and their isomers, have paved the way for the present correlation study, in which quantum mechanics is combined with graph theory, C1s ionization spectroscopy and nuclear magnetic resonance (NMR) to further understand the electronic structure and topology of the hexane isomers. The low-energy plane wave positron (LEPWP) model indicated that the positrophilic electrons of a molecule are dominated by the electrons in the lowest occupied valence orbital (LOVO). The most recent results using NOMO indicated that the electronic wave functions dominate the electron-positron wave functions for molecular systems. In addition to quantum mechanics, chemical graphs are also studied and presented.
Subhojyoti Chatterjee and Feng Wang
257 Dendrogram Based Algorithm for Dominated Graph Flooding [abstract]
Abstract: In this paper, we are concerned with the problem of flooding undirected weighted graphs under ceiling constraints. We provide a new algorithm based on a hierarchical structure called a dendrogram, which offers the significant advantage that it can be used for multiple floodings with various scenarios of ceiling values. In addition, when exploring the graph through its dendrogram structure in order to calculate the flooding levels, independent sub-dendrograms are generated, thus offering a natural way for parallel processing. We provide an efficient implementation of our algorithm through suitable data structures and an optimal organisation of the computations. Experimental results show that our algorithm outperforms well-established classical algorithms, and reveal that the cost of building the dendrogram strongly dominates the total running time, thus validating both the efficiency and the hallmark of our method. Moreover, we exploit the potential parallelism exposed by the flooding procedure to design a multi-threaded implementation. As the underlying parallelism is created on the fly, we use a queue to store the list of sub-dendrograms to be explored, and then use dynamic round-robin scheduling to assign them to the participating threads. This yields a load-balanced and scalable process, as shown by additional benchmark results. Our program runs in a few seconds on an ordinary computer to flood graphs with more than 20 million nodes.
Claude Tadonki
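
A sketch of the on-the-fly work distribution described above, under assumed data structures (the node representation and termination rule here are illustrative, not the paper's code): sub-dendrograms discovered during the traversal are pushed onto a shared queue and picked up by whichever worker thread becomes free.

import threading, queue

work = queue.Queue()

def explore(node, ceiling):
    """Process one (sub-)dendrogram; subtrees below the ceiling become
    independent sub-dendrograms exposed as new work items."""
    if node["height"] <= ceiling:
        for child in node["children"]:
            work.put(child)
    # else: exploration stops on this branch

def worker(ceiling):
    while True:
        try:
            node = work.get(timeout=0.5)   # exit when the queue stays empty
        except queue.Empty:
            return
        explore(node, ceiling)
        work.task_done()

root = {"height": 5.0, "children": [
    {"height": 2.0, "children": []},
    {"height": 7.0, "children": []},
]}
work.put(root)
threads = [threading.Thread(target=worker, args=(6.0,)) for _ in range(4)]
for t in threads: t.start()
for t in threads: t.join()
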
278 HP-DAEMON: High Performance Distributed Adaptive Energy-efficient Matrix-multiplicatiON [abstract]
Abstract: The demand for improved energy efficiency in high performance scientific applications is nowadays crucial. Software-controlled hardware solutions directed by Dynamic Voltage and Frequency Scaling (DVFS) have extensively shown their effectiveness. Although DVFS is beneficial to green computing, introducing DVFS itself can incur non-negligible overhead if DVFS issues a large number of frequency switches. In this paper, we propose a strategy to achieve optimal energy savings for distributed matrix multiplication by algorithmically trading more computation and communication at a time, adaptively and within user-specified memory costs, for fewer DVFS switches; this saves 7.5% more energy on average than a classic strategy. Moreover, we leverage a high performance communication scheme that fully exploits network bandwidth via pipeline broadcast. Overall, the integrated approach achieves substantial energy savings (up to 51.4%) and performance gains (28.6% on average) compared to ScaLAPACK pdgemm() on a cluster with an Ethernet switch, and outperforms ScaLAPACK and DPLASMA pdgemm() by 33.3% and 32.7% on average, respectively, on a cluster with an Infiniband switch.
Li Tan, Longxiang Chen, Zizhong Chen, Ziliang Zong, Rong Ge, Dong Li
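
The pipeline broadcast mentioned above is a general technique worth illustrating: the root streams fixed-size chunks down a chain of ranks, so forwarding of chunk k-1 overlaps with receiving chunk k. The sketch below (mpi4py, with made-up sizes) shows the idea, not HP-DAEMON's implementation. Run with: mpiexec -n 4 python pipeline_bcast.py

from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

N, CHUNK = 1 << 20, 1 << 16                  # vector length, chunk length
data = np.arange(N, dtype=np.float64) if rank == 0 else np.empty(N)

for start in range(0, N, CHUNK):
    chunk = data[start:start + CHUNK]
    if rank > 0:                              # receive chunk from predecessor
        comm.Recv(chunk, source=rank - 1)
    if rank < size - 1:                       # forward chunk to successor
        comm.Send(chunk, dest=rank + 1)

assert data[-1] == N - 1                      # every rank now holds the vector
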
279 Evaluating the Performance of Multi-tenant Elastic Extension Tables [abstract]
Abstract: An important challenge in the design of databases that support multi-tenant applications is to provide a platform to manage large volumes of data collected from different businesses, social media networks, emails, news, online texts, documents, and other data sources. To overcome this challenge we proposed, in our previous work, a multi-tenant database schema called Elastic Extension Tables (EET) that combines multi-tenant relational tables and virtual relational tables in a single database schema. Using this approach, tenants’ tables can be extended to support the requirements of individual tenants. In this paper, we discuss the potential of using the EET multi-tenant database schema, and show how it can be used for managing physical and virtual relational data. We perform several experiments to measure the feasibility and effectiveness of EET by comparing it with a commercially available multi-tenant schema mapping technique used by SalesForce.com. We report significant performance improvements obtained using EET when compared to Universal Table Schema Mapping (UTSM), making the EET schema a good candidate for the management of multi-tenant data in Software as a Service (SaaS) and Big Data applications.
Haitham Yaish, Madhu Goyal, George Feuerlicht

Main Track (MT) Session 13

Time and Date: 11:00 - 12:40 on 12th June 2014

Room: Tully I

Chair: I. Moser

344 Finite difference method for solving acoustic wave equation using locally adjustable time-steps [abstract]
Abstract: The explicit finite difference method has been widely used for seismic modeling in heterogeneous media with strong discontinuities in physical properties. In such cases, due to stability considerations, the time step size is primarily determined by the medium with the highest wave propagation speed: the higher the speed, the smaller the time step needs to be to ensure stability throughout the whole domain. Therefore, the use of different temporal discretizations can greatly reduce the computational cost involved in solving this kind of problem. In this paper we propose an algorithm for setting local temporal discretizations, named Region Triangular Transition (RTT), which allows the local temporal discretizations to be related by any integer value, enabling each of them to operate at the stability limit of the finite difference approximations used.
Alexandre Antunes, Regina Leal-Toledo, Otton Filho, Elson Toledo
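
The stability constraint at stake is the classical CFL-type bound; the generic form is given below, where the constant C depends on the spatial order and dimension of the scheme (this is the standard textbook condition, not the paper's specific derivation). Local time stepping lets each region satisfy its own bound, with steps related by an integer ratio k:

\[
\Delta t \;\le\; C\,\frac{\Delta x}{v_{\max}},
\qquad
\Delta t_{\mathrm{region}} \;\le\; C\,\frac{\Delta x}{v_{\mathrm{region}}},
\qquad
\Delta t_{\mathrm{slow}} = k\,\Delta t_{\mathrm{fast}}, \;\; k \in \mathbb{N}.
\]
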
347 Identifying Self-Excited Vibrations with Evolutionary Computing [abstract]
Abstract: This study uses Differential Evolution to identify the coefficients of second-order differential equations of self-excited vibrations from a time signal. The motivation is found in the ample occurrence of this vibration type in engineering and physics, in particular in the real-life problem of vibrations of hydraulic structure gates. In the proposed method, an equation structure is assumed at the level of the ordinary differential equation and a population of candidate coefficient vectors undergoes evolutionary training. In this way the numerical constants of the non-linear terms of various self-excited vibration types were recovered using only the time signal and the velocity value at the initial time. Comparisons are given regarding accuracy and computing time. The presented evolutionary method shows good promise for future application in engineering systems, in particular operational early-warning systems that recognise oscillations with negative damping before they can cause damage.
Christiaan Erdbrink, Valeria Krzhizhanovskaya
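
A self-contained sketch of the identification idea, under the assumption of a Van der Pol oscillator x'' - mu*(1 - x^2)*x' + x = 0 (a classic self-excited vibration; the paper's equation structures and settings differ): recover the coefficient mu from a simulated time signal.

import numpy as np
from scipy.integrate import solve_ivp
from scipy.optimize import differential_evolution

def simulate(mu, t):
    rhs = lambda t, y: [y[1], mu * (1 - y[0]**2) * y[1] - y[0]]
    return solve_ivp(rhs, (t[0], t[-1]), [0.5, 0.0], t_eval=t).y[0]

t = np.linspace(0, 20, 400)
signal = simulate(1.5, t)                 # "measured" data, true mu = 1.5

def misfit(params):
    # Sum of squared differences between candidate and measured signals.
    return np.sum((simulate(params[0], t) - signal) ** 2)

result = differential_evolution(misfit, bounds=[(0.0, 5.0)], seed=0)
print("recovered mu =", result.x[0])      # close to 1.5
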
85 Rendering of Feature-Rich Dynamically Changing Volumetric Datasets on GPU [abstract]
Abstract: Interactive photo-realistic representation of dynamic liquid volumes is a challenging task for today's GPUs and state-of-the-art visualization algorithms. Methods of the last two decades consider either static volumetric datasets, applying several optimizations for volume casting, or dynamic volumetric datasets with rough approximations to realistic rendering. Nevertheless, accurate real-time visualization of dynamic datasets is crucial in areas of scientific visualization, as well as areas demanding accurate rendering of feature-rich datasets. An accurate and thus realistic visualization of such datasets leads to new challenges: due to restrictions imposed by computational performance, the datasets may be relatively small compared to the screen resolution, and thus each voxel has to be rendered highly oversampled. With our volumetric datasets, based on a real-time lattice Boltzmann fluid simulation creating dynamic cavities and small droplets, existing real-time implementations are not applicable for realistic surface extraction. This work presents a volume tracing algorithm capable of producing multiple refractions which is also robust to small droplets and cavities. Furthermore, we show the advantages of our volume tracing algorithm compared to other implementations.
Martin Schreiber, Atanas Atanasov, Philipp Neumann, Hans-Joachim Bungartz
136 Motor learning in physical interfaces for computational problem solving [abstract]
Abstract: Continuous Interactive Simulation (CIS) maps computational problems concerning the control of dynamical systems to physical tasks in a 3D virtual environment for users to perform. However, deciding on the best mapping for a particular problem is not straightforward. This paper considers how a motor learning perspective can assist when designing such mappings. To examine this issue an experiment was performed to compare an arbitrary mapping with one designed by considering a range of motor learning factors. The particular problem studied was a nonlinear policy setting problem from economics. The results show that choices about how a problem is presented can indeed have a large effect on the ability of users to solve the problem. As a result we recommend the development of guidelines for the application of CIS based on motor learning considerations.
Rohan McAdam
151 Change Detection and Visualization of Functional Brain Networks using EEG Data [abstract]
Abstract: Mining dynamic and non-trivial patterns of interaction in functional brain networks has gained significance due to recent advances in the field of computational neuroscience. Sophisticated data search capabilities, advanced signal processing techniques, statistical methods, and complex network and graph mining algorithms to unfold and discover hidden patterns in functional brain networks, supported by efficient visualization techniques, are essential for making potential inferences from the results obtained. Visualization of the change in activity during cognitive function is useful for discovering and gaining insight into the hidden, novel and complex neuronal patterns and trends, during both normal and cognitive-load conditions, from the graph/temporal representation of the functional brain network. This paper explores novel methods to model the dynamics and complexity of the brain. It also uses a new tool, called the Functional Brain Network Analysis and Visualization (FBNAV) tool, to visualize the outcomes of various computational analyses, enabling us to identify and study changing neuronal patterns during various states of brain activity using augmented/customised Topoplots and Headplots. These techniques may be helpful for locating and identifying patterns in certain abnormal mental states resulting from mental disorders such as stress.
R Vijayalakshmi, Naga Dasari, Nanda Nandagopal, R Subhiksha, Bernadine Cocks, Nabaraj Dahal, M Thilaga

Main Track (MT) Session 14

Time and Date: 14:10 - 15:50 on 12th June 2014

Room: Tully I

Chair: Jin Chao Jin

207 Visual Analytics of Topological Higher Order Information for Emergency Management based on Tourism Trajectory Datasets [abstract]
Abstract: Trajectory datasets have presented new opportunities for spatial computing applications and geo-informatics technologies with regard to emergency management. Existing research on trajectory analysis and data mining mainly employs algorithmic approaches and analyzes the geometric information of trajectories. This study presents an efficient analytics tool, based on visualization approaches, for analyzing large volumes of trajectory data. The approach is particularly useful for emergency management when critical decisions based on semantic information are needed. Tourism trajectory datasets are used to demonstrate the proposed approach.
Ye Wang, Kyungmi Lee, Ickjai Lee
238 Modulight : A Framework for Efficient Dynamic Interactive Scientific Visualization [abstract]
Abstract: Interactive scientific visualization applications are based on heterogeneous codes that implement the simulation or data processing, visualization and interaction parts. These different parts need to be precisely assembled to construct an efficient application running in interactive time. A component-based approach is a good paradigm to express this kind of application. The interactive scientific visualization domain is now classically extended with visual analysis applications, in which case some parts of the application need to be added or removed dynamically during execution. In this paper, we describe a component-based approach dedicated to dynamic interactive scientific visualization applications. We propose a framework called Modulight which implements our approach using the MPI2 library and the optimized socket library ØMQ. The performance of this framework is also analyzed using a real-life molecular dynamics application.
Sébastien Limet, Millian Poquet, Sophie Robert
289 Visualization of long-duration acoustic recordings of the environment [abstract]
Abstract: Acoustic recordings of the environment are an important aid to ecologists monitoring biodiversity and environmental health. However, rapid advances in recording technology, storage and computing make it possible to accumulate thousands of hours of recordings, of which ecologists can listen to only a small fraction. The big-data challenge addressed in this paper is to visualize the content of long-duration audio recordings on multiple scales, from hours and days to months and years. The visualization should facilitate navigation and yield ecologically meaningful information. Our approach is to extract, at one-minute resolution, acoustic indices which reflect content of ecological interest. An acoustic index is a statistic that summarizes some aspect of the distribution of acoustic energy in a recording. We combine indices to produce false-color images that reveal acoustic content and facilitate navigation through recordings that are months or even years in duration.
Michael Towsey, Liang Zhang, Mark Cottman-Fields, Jason Wimmer, Jinglan Zhang, Paul Roe
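
A minimal sketch of the false-color mapping idea: compute three per-minute acoustic indices and map them to the R, G and B channels of one image row per minute. The three indices here (RMS activity, normalized spectral entropy, upper-band energy fraction) are assumed for illustration and are not necessarily the authors' exact set.

import numpy as np

def minute_indices(samples, sr):
    """Return (activity, spectral entropy, high-freq energy) for one minute."""
    spec = np.abs(np.fft.rfft(samples)) ** 2
    p = spec / spec.sum()
    entropy = -(p * np.log2(p + 1e-12)).sum() / np.log2(len(p))
    activity = np.sqrt(np.mean(samples ** 2))        # RMS energy
    hi = spec[len(spec) // 2:].sum() / spec.sum()    # upper-band fraction
    return activity, entropy, hi

def false_color_rows(audio, sr):
    minute = 60 * sr
    rows = [minute_indices(audio[i:i + minute], sr)
            for i in range(0, len(audio) - minute + 1, minute)]
    img = np.array(rows)                             # shape (minutes, 3)
    img = img - img.min(axis=0)                      # normalize each index
    img = img / (img.max(axis=0) + 1e-12)            # into [0, 1] for RGB
    return img                                       # one pixel per minute

sr = 22050
audio = np.random.randn(sr * 60 * 5)                 # 5 "minutes" of noise
print(false_color_rows(audio, sr).shape)             # (5, 3)
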
362 A computational science agenda for programming language research [abstract]
Abstract: Scientific models are often expressed as large and complicated programs. These programs embody numerous assumptions made by the developer (e.g. for differential equations, the discretization strategy and resolution). The complexity and pervasiveness of these assumptions means that often the only true description of the model is the software itself. This has led various researchers to call for scientists to publish their source code along with their papers. We argue that this is unlikely to be beneficial since it is almost impossible to separate implementation assumptions from the original scientific intent. Instead we advocate higher-level abstractions in programming languages, coupled with lightweight verification techniques such as specification and type systems. In this position paper, we suggest several novel techniques and outline an evolutionary approach to applying these to existing and future models. One-dimensional heat flow is used as an example throughout.
Dominic Orchard, Andrew Rice
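
The paper's point is easily made concrete with its own running example. Even this minimal explicit solver for 1-D heat flow (a standard FTCS sketch, assumed here for illustration) hard-codes assumptions that the published equations alone never state: the explicit scheme, fixed Dirichlet boundaries, and the stability constraint r <= 0.5.

import numpy as np

alpha, L, nx = 1.0, 1.0, 51          # diffusivity, rod length, grid points
dx = L / (nx - 1)
dt = 0.4 * dx**2 / alpha             # chosen so r = alpha*dt/dx^2 = 0.4 < 0.5
r = alpha * dt / dx**2

u = np.zeros(nx)
u[nx // 2] = 1.0                     # initial hot spot in the middle

for _ in range(200):
    # FTCS update: u_i^{n+1} = u_i^n + r*(u_{i+1}^n - 2 u_i^n + u_{i-1}^n)
    u[1:-1] = u[1:-1] + r * (u[2:] - 2 * u[1:-1] + u[:-2])
    # u[0] and u[-1] stay 0: an implicit Dirichlet boundary assumption

print("peak temperature:", u.max())
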