ICCS 2017 Main Track (MT) Session 1

Time and Date: 10:35 - 12:15 on 12th June 2017

Room: HG F 30

Chair: Youssef Nashed

525 Analysis of Computational Science Papers from ICCS 2001-2016 using Topic Modeling and Graph Theory [abstract]
Abstract: This paper presents results of topic modeling and networks of topics using the ICCS corpus, which contains domain-specific (computational science) papers over sixteen years (5695 papers). We discuss the topical structure of ICCS, how these topics evolve over time in response to the topicality of various problems, technologies and methods, and how these topics relate to one another. This analysis illustrates multidisciplinary research and collaborations among scientific communities, by constructing static and dynamic networks of the topic modeling results and the authors’ keywords. The results of this study will help ICCS organizers to identify past and future trends of core discussion topics, and to organize workshops based on communities of topics, which in turn will satisfy the interests of participants by allowing them to attend the workshop that is directly related to their domain area. We used the Non-negative Matrix Factorization (NMF) topic modeling algorithm to discover topics, and labeled and grouped the results hierarchically. We used Gephi to study static networks of topics, and the R library DyA to analyze dynamic networks of topics.
Tesfamariam Abuhay, Sergey Kovalchuk, Klavdiya Bochenina, George Kampis, Valeria Krzhizhanovskaya and Michael Lees
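As a rough illustration of the NMF step described in the abstract above, the sketch below applies scikit-learn's NMF to a TF-IDF matrix of a few placeholder abstracts; the documents, parameters and topic count are assumptions for illustration only, and the hierarchical labeling and Gephi/DyA network analysis are not reproduced.

```python
# Illustrative sketch of NMF-based topic modeling on a small paper corpus
# (the ICCS corpus and the authors' pipeline are not reproduced here).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import NMF

docs = [
    "parallel simulation of fluid dynamics on gpu clusters",
    "topic modeling of scientific abstracts with matrix factorization",
    "agent based simulation of pedestrian crowd dynamics",
    "graph algorithms for community detection in citation networks",
]  # hypothetical stand-ins for ICCS abstracts

tfidf = TfidfVectorizer(stop_words="english")
X = tfidf.fit_transform(docs)                      # documents x terms matrix

nmf = NMF(n_components=2, init="nndsvd", random_state=0)
W = nmf.fit_transform(X)                           # document-topic weights
H = nmf.components_                                # topic-term weights

terms = tfidf.get_feature_names_out()
for k, topic in enumerate(H):
    top = [terms[i] for i in topic.argsort()[::-1][:4]]
    print(f"topic {k}: {top}")
```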
43 Identifying Urban Inconsistencies via Street Networks [abstract]
Abstract: Street networks, comprising both topology and geometry, can be used in problems related to ill-designed urban structures. Several works have focused on such applications. Nevertheless, they lack a clear methodology to characterize and explain the urban space through a complex network. Aided by topo-geometrical measures from georeferenced networks, we present a methodology to identify what we call urban inconsistencies, which are characterized by low-access regions containing nodes that lack efficient access from or to other regions in a city. To this end, we devised algorithms capable of preprocessing and analyzing street networks, pointing to existing mobility problems in a city. Mainly, we identify inconsistencies that pertain to a given node where a facility of interest is currently placed. Our results introduce ways to assist in the urban planning and design processes. The proposed techniques are discussed through the visualization and analysis of a real-world city. Hence, our contributions provide a basis for further advancements on street networks applied to facility location analysis.
Gabriel Spadon, Gabriel Gimenes and Jose Rodrigues-Jr
120 Impact of Neighbors on the Privacy of Individuals in Online Social Networks [abstract]
Abstract: The problem of user privacy enforcement in online social networks (OSNs) cannot be ignored and, in recent years, Facebook and other providers have improved their privacy protection tools considerably. However, in OSNs the most powerful data protection "weapons" are the users themselves. The behavior of an individual acting in an OSN highly depends on her level of privacy attitude: an aware user tends not to share her private information, or the private information of her friends, while an unaware user may not recognize some information as private and may share it carelessly with her contacts. In this paper, we experimentally study the role of the privacy attitude of an individual and her friends on information propagation in social networks. We model information diffusion by means of an extension of the Susceptible-Infectious-Recovered (SIR) epidemic model that takes into account the privacy attitude of users. We employ this diffusion model in stochastic simulations on a synthetic social network, designed to mimic the characteristics of the Facebook social graph.
Livio Bioglio and Ruggero G. Pensa
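The privacy-aware SIR extension described above could be approximated, in spirit, by letting each node's privacy attitude scale its probability of passing information on. A minimal sketch follows, assuming a Barabási-Albert graph as a stand-in for the synthetic Facebook-like network and hypothetical rate parameters; it is not the authors' model.

```python
# Minimal sketch of an SIR-style information diffusion where each user's privacy
# attitude scales the chance of (re)sharing information with neighbors.
import random
import networkx as nx

random.seed(0)
G = nx.barabasi_albert_graph(500, 3)                 # stand-in synthetic social graph
attitude = {v: random.random() for v in G}           # 0 = unaware, 1 = privacy-aware

state = {v: "S" for v in G}
state[random.choice(list(G))] = "I"                  # seed one "infected" (sharing) user

beta, gamma = 0.3, 0.2                               # hypothetical share / stop-sharing rates
for _ in range(50):
    infected = [v for v in G if state[v] == "I"]
    if not infected:
        break
    for v in infected:
        for u in G.neighbors(v):
            # aware users are less likely to propagate (private) information
            if state[u] == "S" and random.random() < beta * (1 - attitude[v]):
                state[u] = "I"
        if random.random() < gamma:
            state[v] = "R"

print("reached:", sum(s != "S" for s in state.values()), "of", G.number_of_nodes())
```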
230 Mining Host Behavior Patterns From Massive Network and Security Logs [abstract]
Abstract: Mining host behavior patterns from massive logs plays a crucial role in anomaly diagnosis and management for large-scale networks. Almost all prior work gives a macroscopic link analysis of network events, but fails to microscopically analyze the evolution of behavior patterns for each host in the network. In this paper, we propose a novel approach, namely Log Mining for Behavior Pattern (LogM4BP), to address the limitations of prior work. LogM4BP builds a statistical model that captures each host's network behavior patterns with the nonnegative matrix factorization algorithm, which improves the interpretability and comparability of behavior patterns and reduces the complexity of analysis. The work is evaluated on a public data set captured from a big marketing company. Experimental results show that it describes network behavior patterns clearly and accurately, and that significant evolution of behavior patterns can be mapped intuitively to real-world anomaly events.
Jing Ya, Tingwen Liu, Quangang Li, Jinqiao Shi, Haoliang Zhang, Pin Lv and Li Guo

ICCS 2017 Main Track (MT) Session 2

Time and Date: 15:45 - 17:25 on 12th June 2017

Room: HG F 30

Chair: Anna-Lena Lamprecht

341 Resolving Entity Morphs based on Character-Word Embedding [abstract]
Abstract: A morph is a special type of fake alternative name. Internet users use morphs to achieve certain goals such as expressing special sentiment or avoiding censorship. For example, Chinese internet users often replace “马景涛” (Ma Jingtao) with “咆哮教主” (Roar Bishop). “咆哮教主” (Roar Bishop) is a morph and “马景涛” (Ma Jingtao) is the target entity of “咆哮教主” (Roar Bishop). This paper mainly focuses on morph resolution: given a morph, figure out the entity that it really refers to. After analyzing the common characteristics of morphs and target entities from cross-source corpora, we exploit temporal and semantic constraints to collect target candidates. Next, we propose a character-word embedding framework to rank target candidates. Our method does not need any human-annotated data. Experimental results demonstrate that our approach outperforms the state-of-the-art method. The results also show that the performance is better when morphs share characters with their target entities.
Ying Sha, Zhenhui Shi, Rui Li, Qi Liang, Bin Wang and Li Guo
273 Graph Ranking Guarantees for Numerical Approximations to Katz Centrality [abstract]
Abstract: Graphs and networks are prevalent in modeling relational datasets from many fields of research. Using iterative solvers to approximate graph measures (specifically Katz Centrality) allows us to obtain a ranking vector, consisting of a number for each vertex in the graph identifying its relative importance. We use the residual to accurately estimate how much of the ranking from an approximate solution matches the ranking given by the exact solution. Using probabilistic matrix norms and applying numerical analysis to the computation of Katz Centrality, we obtain bounds on the accuracy of the approximation compared to the exact solution with respect to the highly ranked nodes. This relates the numerical accuracy of the linear solver to the data analysis accuracy of finding the correct ranking. In particular, we answer the question of which pairwise rankings are reliable given an approximate solution to the linear system. Experiments on many real-world networks up to several million vertices and several hundred million edges validate our theory and show that we are able to accurately estimate large portions of the approximation. By analyzing convergence error, we develop confidence in the ranking schemes of data mining.
Eisha Nathan, Geoffrey Sanders, James Fairbanks, Van Henson and David Bader
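A minimal sketch of the underlying idea described above: approximate Katz Centrality by solving (I - αA)x = 1 with a simple iterative scheme and monitor the residual norm as a stopping criterion. The graph, α and tolerance are placeholders, and the paper's probabilistic bounds on which pairwise rankings are reliable are not reproduced.

```python
# Iteratively approximate Katz Centrality and track the residual of the linear system.
import numpy as np
import networkx as nx

G = nx.erdos_renyi_graph(200, 0.05, seed=1)          # stand-in graph
A = nx.to_numpy_array(G)
alpha = 0.9 / np.max(np.abs(np.linalg.eigvals(A)))   # ensure convergence: alpha < 1/lambda_max

b = np.ones(len(A))
x = np.zeros(len(A))
for it in range(1000):
    x = alpha * A @ x + b                            # Richardson step for (I - alpha*A) x = b
    residual = np.linalg.norm(b - (x - alpha * A @ x))
    if residual < 1e-6:
        break

ranking = np.argsort(-x)                             # vertices ordered by approximate Katz score
print(f"stopped after {it + 1} iterations, residual {residual:.2e}")
print("top-5 vertices:", ranking[:5])
```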
447 Simulating a Search Engine Service focusing on Network Performance [abstract]
Abstract: Large-scale computer systems like search engines provide services to thousands of users, and their user demand can change suddenly. This unstable demand has a significant impact on service components (such as the network and hosts). The system should be able to address unexpected scenarios; otherwise, users would be forced to leave the service. Creating test scenarios is an alternative way to deal with this variable workload before implementing a new configuration in the system. However, the complexity and size of the system are a huge constraint on creating physical models. Simulation can help to test promising models of search engines. In this paper we propose a method to model a Search Engine Service (SES) on a small scale to analyze the impact of different configurations. We model the interaction of a typical search engine with three main components: a Front Service (FS), a Cache Service (CS) and an Index Service (IS). The FS takes a user query as input and searches a database with the support of a CS to improve the performance of the system. The proposed model processes a trace file from a real SES and, based on the dependency relations among the messages, services and queries, models the full functionality of the SES. The output is, on the one hand, a simulated trace file to compare the model with the real system and, on the other hand, statistics about performance. The simulation allows us to test configurations of the FS, CS, and IS which would be impractical in the real system.
Joe Carrión, Daniel Franco and Emilo Luque
245 Fully-Dynamic Graph Algorithms with Sublinear Time Inspired by Distributed Computing [abstract]
Abstract: We study dynamic graphs in the fully-dynamic centralized setting. In this setting the vertex set of size n of a graph G is fixed, and the edge set changes step-by-step, such that each step either adds or removes an edge. The goal in this setting is maintaining a solution to a certain problem (e.g., maximal matching, edge coloring) after each step, such that each step is executed efficiently. The running time of a step is called update-time. One can think of this setting as a dynamic network that is monitored by a central processor that is responsible for maintaining the solution. Currently, for several central problems, the best-known deterministic algorithms for general graphs are the naive ones which have update-time O(n). This is the case for maximal matching and proper O(Delta)-edge-coloring. The question of existence of sublinear in n update-time deterministic algorithms for dense graphs and general graphs remained wide open. In this paper we address this question by devising sublinear update-time deterministic algorithms for maximal matching in graphs with bounded neighborhood independence o(n/ log^2 n), and for proper O(Delta)-edge-coloring in general graphs. The family of graphs with bounded neighborhood independence is a very wide family of dense graphs. In particular, graphs with constant neighborhood independence include line-graphs, claw-free graphs, unit disk graphs, and many other graphs. Thus, these graphs represent very well various types of networks. For graphs with constant neighborhood independence, our maximal matching algorithm has ~O(\sqrt n) update-time. Our O(Delta)-edge-coloring algorithm has ~O(\sqrt Delta) update-time for general graphs. In order to obtain our results we employ a novel approach that adapts certain distributed algorithms of the LOCAL setting to the centralized fully-dynamic setting. This is achieved by optimizing the work each processor performs, and efficiently simulating a distributed algorithm in a centralized setting. The simulation is efficient thanks to a careful selection of the network parts that the algorithm is invoked on, and by deducing the solution from the additional information that is present in the centralized setting, but not in the distributed one. Our experiments on various network topologies and scenarios demonstrate that our algorithms are highly efficient in practice. We believe that our approach is of independent interest and may be applicable to additional problems.
Leonid Barenboim and Tzalik Maimon
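For context, the naive O(n)-update baseline mentioned in the abstract can be sketched as follows: keep a maximal matching explicitly and, when a matched edge is deleted, rescan the neighbors of its endpoints. This is an illustrative sketch only; the paper's sublinear, distributed-inspired algorithms are not shown.

```python
# Naive fully-dynamic maximal matching: O(deg) work per deletion of a matched edge.
class NaiveDynamicMatching:
    def __init__(self, n):
        self.adj = [set() for _ in range(n)]
        self.mate = [None] * n                       # mate[v] = matched partner or None

    def _try_match(self, v):
        # scan v's neighbors and match v to any free one
        if self.mate[v] is None:
            for u in self.adj[v]:
                if self.mate[u] is None:
                    self.mate[v], self.mate[u] = u, v
                    return

    def insert(self, u, v):
        self.adj[u].add(v); self.adj[v].add(u)
        if self.mate[u] is None and self.mate[v] is None:
            self.mate[u], self.mate[v] = v, u        # both free: add the new edge to the matching

    def delete(self, u, v):
        self.adj[u].discard(v); self.adj[v].discard(u)
        if self.mate[u] == v:                        # a matched edge disappeared
            self.mate[u] = self.mate[v] = None
            self._try_match(u)                       # rematch both endpoints if possible
            self._try_match(v)

m = NaiveDynamicMatching(5)
m.insert(0, 1); m.insert(1, 2); m.insert(2, 3); m.delete(0, 1)
print(m.mate)                                        # matching stays maximal after each update
```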

ICCS 2017 Main Track (MT) Session 3

Time and Date: 10:15 - 11:55 on 13th June 2017

Room: HG F 30

Chair: Witold Dzwinel

16 Models of pedestrian adaptive behaviour in hot outdoor public spaces [abstract]
Abstract: Current studies of outdoor thermal comfort are limited to calculating thermal indices or interviewing people. The first method does not take into account the way people use this space, whereas the second one is limited to one particular study area. Simulating people’s thermal perception along with their activities in public urban spaces will help architects and city planners to test their concepts and to design smarter and more liveable cities. In this paper, we propose an agent-based modelling approach to simulate people’s adaptive behaviour in space. Two levels of pedestrian behaviour are considered: reactive and proactive, and three types of thermal adaptive behaviour of pedestrians are modelled with single-agent scenarios: speed adaptation, thermal attraction/repulsion and vision-motivated route alternation. An "accumulated heat stress" parameter of the agent is calculated during the simulation, and pedestrian behaviour is analysed in terms of its ability to reduce the accumulated heat stress. This work is the first step towards the "human component" in urban microclimate simulation systems. We use these simulations to drive the design of real-life experiments, which will help calibrate model parameters, such as the heat-speed response, thermal sensitivity and admissible turning angles.
Valentin Melnikov, Valeria Krzhizhanovskaya and Peter Sloot
47 Agent-based Simulations of Swarm Response to Predator’s Attack [abstract]
Abstract: Animal groups provide paradigmatic examples of collective phenomena in which repeated interactions among individuals produce dynamic patterns and responses on a scale larger than individuals themselves. For instance, many swarming behaviors yield protective strategies for groups undergoing a predator’s attack. The effectiveness of these evasive maneuvers is striking given: (i) the decentralized nature of such responses, (ii) the short time scales involved, and (iii) the competitive biological and physiological advantages of predators—e.g., in terms of size, speed, sensory capabilities—as compared to fleeing agents. Here, we report on results of agent-based simulations of collective anti-predatory response. Our prime goal is to gain insight into the nontrivial effect of sociality—a measure of the amount of social interaction—on the effectiveness of the collective response. Specifically, we characterize the responsiveness of the swarm by simulating a predator attack and measuring the survival rate of agents depending on their level of sociality for different interaction rules, based on either a metric or a topological interaction distance. Furthermore, evolutionary pressure selects strategies optimal for the individual and not necessarily for the group. This possibility has been explored by running evolutionary simulations. Interestingly, the results obtained clearly show the existence of an optimal anti-predatory response for a given amount of sociality, regardless of the interaction distance considered. The results of the evolutionary dynamics highlight the fact that the evolution of the distribution of sociality caused by the selective pressure of a predator’s attack has a phenomenology that cannot be derived from the short-time predator avoidance results.
Roland Bouffanais
334 Crowd Dynamics and Control in High-Volume Metro Rail Stations [abstract]
Abstract: Overcrowding in mass rapid transit stations is a chronic issue affecting the daily commute in Metro Manila, Philippines. As a high-capacity public transportation system, the Metro Rail Transit has been operating at a level above its intended capacity of 350,000 passengers daily. Despite numerous efforts in implementing an effective crowd control scheme, it still falls short in containing the formation of crowds and long lines, thus affecting the amount of time before commuters can proceed to the platforms. A crowd dynamics model of commuters in one of the high-volume terminal stations, the Taft Ave station, was developed to help discover emergent behavior in crowd formation and assess infrastructure preparedness. The agent-based model uses static floor fields derived from the MRT3 live feed, and implements a number of social force models to optimize the path-finding of the commuter agents. Internal face validation, historical validation and parameter variability-sensitivity analysis were employed to validate the crowd dynamics model and assess different operational scenarios. It was determined that during peak hours, when the expected crowd inflow may reach up to 7,500 commuters, at least 11 ticket booths and 6 turnstiles should be open to keep commuters' turnaround times low. For non-peak hours, at least 10 ticket booths and 5 turnstiles are needed to handle a crowd inflow reaching up to 5,000 commuters. In the current set-up, the usual number of ticket booths open in the MRT Taft Station is 11, and there are usually 6 turnstiles open. It was observed that as the crowd inside the station increases to 200-250 commuters, there is a sharp increase in the growth rate of commuters' turnaround times, which signifies the point at which the service provided starts to degrade and when officials should start to intervene.
Briane Paul Samson, Crisanto Iv Aldanese, Deanne Moree Chan, Jona Joyce San Pascual and Ma. Victoria Angelica Sido
348 A Serious Video Game To Support Decision Making On Refugee Aid Deployment Policy [abstract]
Abstract: The success of refugee support operations depends on the ability of humanitarian organizations and governments to deploy aid effectively. These operations require that decisions on resource allocation are made as quickly as possible in order to respond to urgent crises and, by anticipating future developments, remain adequate as the situation evolves. Agent-based modeling and simulation has been used to understand the progression of past refugee crises, as well as a way to predict how new ones will unfold. In this work, we tackle the problem of refugee aid deployment as a variant of the Robust Facility Location Problem (RFLP). We present a serious video game that functions as an interface for an agent-based simulation run with data from past refugee crises. Having obtained good approximate solutions to the RFLP by implementing a game that frames the problem as a puzzle, we adapted its mechanics and interface to correspond to refugee situations. The game is intended to be played by both subject matter experts and the general public, as a way to crowd-source effective courses of action in these situations.
Luis Eduardo Perez Estrada, Derek Groen and Jose Emmanuel Ramirez-Marquez
510 The study of the influence of obstacles on crowd dynamics [abstract]
Abstract: This paper presents research on the influence of obstacles on crowd dynamics. We have performed experiments for four base scenarios of interaction in a crowd: unidirectional flow, bidirectional flow, merging flows and intersection. The movement of pedestrians has been studied in simply shaped areas: a straight corridor, a T-junction and an intersection. The volumes and basic directions of pedestrian flows were determined for each of the areas. Layouts of physical obstacles have been built from different combinations of columns and barriers. In order to acquire characteristics of the crowd dynamics, a set of simulations was conducted using the PULSE simulation environment. As a result, we obtained several dependences between the layout of obstacles and crowd dynamics.
Oksana Severiukhina, Daniil Voloshin, Michael Lees and Vladislav Karbovskii

ICCS 2017 Main Track (MT) Session 4

Time and Date: 14:10 - 15:50 on 13th June 2017

Room: HG F 30

Chair: Emilio Luque

2 Anomaly Detection in Clinical Data of Patients Undergoing Heart Surgery [abstract]
Abstract: We describe two approaches to detecting anomalies in time series of multi-parameter clinical data: (1) metric and model-based indicators and (2) information surprise. (1) Metric and model-based indicators are commonly used as early warning signals to detect transitions between alternate states based on individual time series. Here we explore the applicability of existing indicators to distinguish critical (anomalies) from non-critical conditions in patients undergoing cardiac surgery, based on a small anonymized clinical trial dataset. We find that a combination of time-varying autoregressive model, kurtosis, and skewness indicators correctly distinguished critical from non-critical patients in 5 out of 36 blood parameters at a window size of 0.3 (average of 37 hours) or higher. (2) Information surprise quantifies how the progression of one patient's condition differs from that of rest of the population based on the cross-section of time series. With the maximum surprise and slope features we detect all critical patients at the 0.05 significance level. Moreover we show that a naive outlier detection does not work, demonstrating the need for the more sophisticated approaches explored here. Our preliminary results suggest that future developments in early warning systems for patient condition monitoring may predict the onset of critical transition and allow medical intervention preventing patient death. Further method development is needed to avoid overfitting and spurious results, and verification on large clinical datasets.
Alva Presbitero, Rick Quax, Valeria Krzhizhanovskaya and Peter Sloot
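A toy sketch of the window-based indicators mentioned above (skewness, kurtosis, lag-1 autocorrelation) computed over a sliding window of a synthetic series; the clinical data, the time-varying autoregressive model and the information-surprise measure from the paper are not reproduced, and all numbers are placeholders.

```python
# Sliding-window early-warning indicators on a synthetic single-parameter time series.
import numpy as np
from scipy.stats import skew, kurtosis

rng = np.random.default_rng(0)
series = np.concatenate([rng.normal(0, 1, 300),
                         rng.normal(0, 1, 100) + np.linspace(0, 5, 100)])  # hypothetical drift

window = int(0.3 * len(series))                      # window size as a fraction of the record
for start in range(0, len(series) - window + 1, 50):
    w = series[start:start + window]
    ac1 = np.corrcoef(w[:-1], w[1:])[0, 1]           # lag-1 autocorrelation
    print(f"t={start:3d}  skew={skew(w):+.2f}  kurt={kurtosis(w):+.2f}  ac1={ac1:+.2f}")
```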
453 Virtual Clinical Trials: A tool for the Study of Transmission of Nosocomial Infections [abstract]
Abstract: A clinical trial is a study designed to demonstrate the efficacy and safety of a drug, procedure, medical device, or diagnostic test. Since clinical trials involve research in humans, they must be carefully designed and must comply strictly with a set of ethical conditions. Logistical disadvantages, ethical constraints, costs and high execution times could have a negative impact on the execution of the clinical trial. This article proposes the use of a simulation tool, the MRSA-T-Simulator, to design and perform "virtual clinical trials" for the purpose of studying MRSA contact transmission among hospitalized patients. The main advantage of the simulator is its flexibility when it comes to configuring the patient population, healthcare staff and the simulation environment.
Cecilia Jaramillo Jaramillo, Dolores Rexachs Del Rosario, Emilio Luque Fadón and Francisco Epelde
543 Spectral Modes of Network Dynamics Reveal Increased Informational Complexity Near Criticality [abstract]
Abstract: What does the informational complexity of dynamical networked systems tell us about intrinsic mechanisms and functions of these complex systems? Recent complexity measures such as integrated information have sought to operationalize this problem taking a whole-versus-parts perspective, wherein one explicitly computes the amount of information generated by a network as a whole over and above that generated by the sum of its parts during state transitions. While several numerical schemes for estimating network integrated information exist, it is instructive to pursue an analytic approach that computes integrated information as a function of network weights. Our formulation of integrated information uses a Kullback-Leibler divergence between the multi-variate distribution on the set of network states versus the corresponding factorized distribution over its parts. Implementing stochastic Gaussian dynamics, we perform computations for several prototypical network topologies. Our findings show increased informational complexity near criticality, which remains consistent across network topologies. Spectral decomposition of the system's dynamics reveals how informational complexity is governed by eigenmodes of both the network's covariance and adjacency matrices. We find that as the dynamics of the system approach criticality, high integrated information is exclusively driven by the eigenmode corresponding to the leading eigenvalue of the covariance matrix, while sub-leading modes get suppressed. The implication of this result is that it might be favorable for complex dynamical networked systems such as the human brain or communication systems to operate near criticality so that efficient information integration might be achieved.
Xerxes Arsiwalla, Pedro Mediano and Paul Verschure
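As a simplified stand-in for the Gaussian formulation above, the following computes the Kullback-Leibler divergence between a multivariate Gaussian and its fully factorized (independent) counterpart, i.e. the total correlation, which has a closed form in terms of the covariance matrix. The paper's transition-based integrated information and its partition handling are not reproduced; the example covariances are hypothetical.

```python
# Total correlation of a zero-mean Gaussian: KL(joint || product of marginals)
# = 0.5 * (sum_i log Sigma_ii - log det Sigma).
import numpy as np

def gaussian_total_correlation(cov):
    sign, logdet = np.linalg.slogdet(cov)
    assert sign > 0, "covariance must be positive definite"
    return 0.5 * (np.sum(np.log(np.diag(cov))) - logdet)

# Hypothetical 4-node network: stronger coupling -> larger off-diagonal covariance
for coupling in (0.0, 0.3, 0.6, 0.9):
    cov = np.eye(4) + 0.3 * coupling * (np.ones((4, 4)) - np.eye(4))
    print(f"coupling={coupling:.1f}  total correlation={gaussian_total_correlation(cov):.4f}")
```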
537 Simulation of regulatory strategies in a morphogen based model of Arabidopsis leaf growth. [abstract]
Abstract: Simulation has become an important tool for studying plant physiology. An important aspect of this is discovering the processes that influence leaf growth at a cellular level. To this end, we have extended an existing, morphogen-based model for the growth of Arabidopsis leaves. We have fitted parameters to match important leaf growth properties reported in experimental data. A sensitivity analysis was performed, which allowed us to estimate the effect of these different parameters on leaf growth, and identify viable strategies for increasing leaf size.
Elise Kuylen, Gerrit Beemster, Jan Broeckhove and Dirk De Vos

ICCS 2017 Main Track (MT) Session 5

Time and Date: 16:20 - 18:00 on 13th June 2017

Room: HG F 30

Chair: Eleni Chatzi

267 Support managing population aging stress of emergency departments in a computational way [abstract]
Abstract: Old people usually have more complex health problems and use healthcare services more frequently than young people. It is clear that the growth of the old population, in both number and proportion, will challenge emergency departments (EDs). This paper first presents a way to quantitatively predict and explain this challenge by using simulation techniques. Then, we outline the capability of simulation for decision support to overcome this challenge. Specifically, we use simulation to predict and explain the impact of population aging on an ED: a precise ED simulator, which has been validated for a public hospital ED, is used to predict the behavior of an ED under population aging over the next 15 years. Our prediction shows that the stress of population aging on EDs can no longer be ignored and that ED upgrades must be carefully planned. Based on this prediction, the costs and benefits of several upgrade proposals are evaluated.
Zhengchun Liu, Dolores Rexachs, Francisco Epelde and Emilio Luque
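A toy queueing sketch of the kind of what-if analysis described above, using the SimPy library; the arrival and service rates, bed counts and the 24-hour horizon are hypothetical and far simpler than the validated ED simulator used in the paper.

```python
# Compare mean waiting times of ED patients under different "upgrade" scenarios.
import random
import simpy

random.seed(1)
WAITS = []

def patient(env, ed):
    arrival = env.now
    with ed.request() as req:
        yield req                                      # wait for a free treatment slot
        WAITS.append(env.now - arrival)
        yield env.timeout(random.expovariate(1 / 30))  # ~30 min of care

def arrivals(env, ed, interarrival):
    while True:
        yield env.timeout(random.expovariate(1 / interarrival))
        env.process(patient(env, ed))

for beds in (8, 10, 12):                               # candidate capacity upgrades
    WAITS.clear()
    env = simpy.Environment()
    ed = simpy.Resource(env, capacity=beds)
    env.process(arrivals(env, ed, interarrival=4))     # one arrival every ~4 min on average
    env.run(until=24 * 60)
    print(f"{beds} beds: mean wait {sum(WAITS)/len(WAITS):.1f} min over {len(WAITS)} patients")
```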
146 Hemocell: a high-performance microscopic cellular library [abstract]
Abstract: We present a high-performance computational framework (Hemocell) with validated cell-material models, which provides the necessary tool to target challenging biophysical questions in relation to blood flows, e.g. the influence of transport characteristics on platelet bonding and aggregation. The dynamics of blood plasma are resolved by using the lattice Boltzmann method (LBM), while the cellular membranes are implemented using a discrete element method (DEM) coupled to the fluid as immersed boundary method (IBM) surfaces. In the current work a selected set of viable technical solutions are introduced and discussed, whose application translates to significant performance benefits. These solutions extend the applicability of our framework to up to two orders of magnitude larger, physiologically relevant settings.
Gábor Závodszky, Britt van Rooij, Victor Azizi, Saad Alowayyed and Alfons Hoekstra
275 Brownian dynamics simulations to explore experimental microsphere diffusion with optical tweezers. [abstract]
Abstract: We develop two-dimensional Brownian dynamics simulations to examine the motion of disks under thermal fluctuations and Hookean forces. Our simulations are designed to be experimental-like, since the experimental conditions define the available time-scales which characterize the solution of Langevin equations. To define the fluid model and methodology, we explain the basics of the theory of Brownian motion applicable to quasi-two-dimensional diffusion of optically-trapped microspheres. Using the data produced by the simulations, we propose an alternative methodology to calculate diffusion coefficients. We find that, using typical input parameters of video-microscopy experiments, the averaged values of the diffusion coefficient differ from the theoretical one by less than 1%.
Manuel Pancorbo, Miguel Ángel Rubio and Pablo Domínguez-García
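A minimal sketch of the simulation ingredients described above: overdamped two-dimensional Langevin dynamics in a harmonic (trap-like) potential integrated with Euler-Maruyama, followed by a short-lag estimate of the diffusion coefficient. All physical parameters are order-of-magnitude assumptions, not the paper's experiment-matched values.

```python
# 2-D Brownian dynamics of a harmonically trapped microsphere, then re-estimate D.
import numpy as np

rng = np.random.default_rng(0)
kB_T = 4.11e-21          # J, room temperature
gamma = 1.0e-8           # kg/s, drag of a ~1 um sphere in water (order of magnitude)
k_trap = 1.0e-6          # N/m, trap stiffness
D = kB_T / gamma         # Einstein relation
dt, steps = 1e-4, 100_000

pos = np.zeros((steps, 2))
for i in range(1, steps):
    drift = -(k_trap / gamma) * pos[i - 1] * dt
    noise = np.sqrt(2 * D * dt) * rng.standard_normal(2)
    pos[i] = pos[i - 1] + drift + noise              # Euler-Maruyama step

dr = np.diff(pos, axis=0)
D_est = np.mean(np.sum(dr**2, axis=1)) / (4 * dt)    # short-lag MSD estimate
print(f"input D = {D:.3e} m^2/s, estimated D = {D_est:.3e} m^2/s")
```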
377 Numerical simulation of a compound capsule in a constricted microchannel [abstract]
Abstract: Simulations of the passage of eukaryotic cells through a constricted channel aid in studying the properties of cancer cells and their transport through the bloodstream. Compound capsules, which explicitly model the outer cell membrane and nuclear lamina, have the potential to improve fidelity of computational models. However, general simulations of compound capsules through a constricted microchannel have not been conducted and the influence of the compound capsule model on computational performance is not well known. In this study, we extend a parallel hemodynamics application to simulate the fluid-structure interaction between compound capsules and fluid. With this framework, we compare the deformation of simple and compound capsules in constricted microchannels, and explore how this deformation depends on the capillary number and on the volume fraction of the inner membrane. The parallel performance of the computational framework in this setting is evaluated and lessons for future development are discussed.
John Gounley, Erik Draeger and Amanda Randles

ICCS 2017 Main Track (MT) Session 6

Time and Date: 9:00 - 10:40 on 14th June 2017

Room: HG F 30

Chair: Anna-Lena Lamprecht

106 Development of a new urban heat island modeling tool: Kent Vale case study [abstract]
Abstract: The urban heat island effect is intensified by anthropogenic activities and heat in conjunction with the built-up urban area, which absorbs more solar radiation during daytime and releases more heat during nighttime than rural areas. Air cooling systems in Singapore, as one of the anthropogenic heat sources, reject heat into the vicinity and consequently affect the urban microclimate. In this paper, a new urban heat island modeling tool is developed to simulate the stack effect of split-type air-conditioners on high-rise buildings and the solar-radiation-induced thermal environment. By coupling the Computational Fluid Dynamics (CFD) program with the solar radiation model and performing parallel computing of conjugate heat transfer, the tool ensures both accuracy and efficiency in simulating air temperature and relative humidity. The annual cycle of the sun's pathway in Singapore is well simulated, and by decreasing the absorptivity or increasing the reflectivity and thermal conductivity of buildings, the thermal environment around buildings could be improved.
Ming Xu, Marcel Bruelisauer and Matthias Berger
558 Fast Motion of Heaving Airfoils [abstract]
Abstract: Heaving airfoils can provide invaluable physical insight regarding the flapping flight of birds and insects. We examine the thrust-generation mechanism of oscillating foils, by coupling two-dimensional simulations with multi-objective optimization algorithms. We show that the majority of the thrust originates from the creation of low pressure regions near the leading edge of the airfoil. We optimize the motion of symmetric airfoils exploiting the Knoller-Betz-Katzmayr effect, to attain high speed and lower energy expenditure. The results of the optimization indicate an inverse correlation between energy-efficiency, and the heaving-frequency and amplitude for a purely-heaving airfoil.
Siddhartha Verma, Guido Novati, Flavio Noca and Petros Koumoutsakos
312 Using Temporary Explicit Meshes for Direct Flux Calculation on Implicit Surfaces [abstract]
Abstract: We focus on a surface evolution problem where the surface is represented as a narrow-band level-set and the local surface speed is defined by a relation to the direct visibility of a source plane above the surface. A level-set representation of the surface can handle complex evolutions robustly and is therefore a frequently encountered choice. Ray tracing is used to compute the visibility of the source plane for each surface point. Commonly, rays are traced directly through the level-set and the already available (hierarchical) volume data structure is used to efficiently perform intersection tests. We present an approach that performs ray tracing on a temporarily generated explicit surface mesh utilizing modern hardware-tailored single precision ray tracing frameworks. We show that the overhead of mesh extraction and acceleration structure generation is compensated by the intersection performance for practical resolutions leading to an at least three times faster visibility calculation. We reveal the applicability of single precision ray tracing by attesting a sufficient angular resolution in conjunction with an integration method based on an up to twelve times subdivided icosahedron.
Paul Manstetten, Josef Weinbub, Andreas Hössinger and Siegfried Selberherr
94 Assessing the Performance of the SRR Loop Scheduler [abstract]
Abstract: The input workload of an irregular application must be evenly distributed among its threads to enable cutting-edge performance. To address this need in OpenMP, several loop scheduling strategies were proposed. While having this ever-increasing number of strategies at disposal is helpful, it has become a non-trivial task to select the best one for a particular application. Nevertheless, this challenge becomes easier to be tackled when existing scheduling strategies are extensively evaluated. Therefore, in this paper, we present a performance and scalability evaluation of the recently-proposed loop scheduling strategy named Smart Round-Robin (SRR). To deliver a comprehensive analysis, we coupled a synthetic kernel benchmarking technique with several rigorous statistical tools, and considered OpenMP's Static and Dynamic loop schedulers as our baselines. Our results unveiled that SRR performs better on irregular applications with symmetric workloads and coarse-grained parallelization, achieving up to 1.9x and 1.5x speedup over OpenMP's Static and Dynamic schedulers, respectively.
Pedro Henrique Penna, Eduardo Camilo Inacio, Márcio Castro, Patrícia Plentz, Henrique Freitas, François Broquedis and Jean-François Méhaut
548 Molecular dynamics simulations of entangled polymers: The effect of small molecules on the glass transition temperature [abstract]
Abstract: The effect of small molecules penetrating into a polymer system is investigated via molecular dynamics simulations. It is found that small spherical particles reduce the glass transition temperature and thus introduce a softening of the material. Results are compared to experimental findings for the effect of different types of small molecules, such as water, acetone and ethanol, on the glass transition temperature of a polyurethane-based shape memory polymer. Despite the simplicity of the simulated model, the MD results are found to be in good qualitative agreement with experimental data.
Elias Mahmoudinezhad, Axel Marquardt, Gunther Eggeler and Fathollah Varnik

ICCS 2017 Main Track (MT) Session 7

Time and Date: 13:25 - 15:05 on 14th June 2017

Room: HG F 30

Chair: Ming Xu

424 Efficient Simulation of Financial Stress Testing Scenarios with Suppes-Bayes Causal Networks [abstract]
Abstract: The most recent financial upheavals have cast doubt on the adequacy of some of the conventional quantitative risk management strategies, such as VaR (Value at Risk), in many common situations. Consequently, there has been an increasing need for verisimilar financial stress testing, namely simulating and analyzing financial portfolios in extreme, albeit rare scenarios. Unlike conventional risk management, which exploits statistical correlations among financial instruments, here we focus our analysis on the notion of probabilistic causation, which is embodied by Suppes-Bayes Causal Networks (SBCNs). SBCNs are probabilistic graphical models that have many attractive features in terms of more accurate causal analysis for generating financial stress scenarios. In this paper, we present a novel approach for conducting stress testing of financial portfolios based on SBCNs in combination with classical machine learning classification tools. The resulting method is shown to be capable of correctly discovering the causal relationships among financial factors that affect the portfolios and thus simulating stress testing scenarios with higher accuracy and lower computational complexity than conventional Monte Carlo simulations.
Gelin Gao, Bud Mishra and Daniele Ramazzotti
531 Simultaneous Prediction of Wind Speed and Direction by Evolutionary Fuzzy Rule Forest [abstract]
Abstract: An accurate estimate of wind speed and direction is important for many application domains, including weather prediction, smart grids, and traffic management. These two environmental variables depend on a number of factors and are linked together. Evolutionary fuzzy rules, based on fuzzy information retrieval and genetic programming, have been used to solve a variety of real-world regression and classification tasks. They have, however, been limited to estimating only one variable per model. In this work, we introduce an extended version of this predictor that facilitates the artificial evolution of forests of fuzzy rules. In this way, multiple variables can be predicted by a single model that is able to capture complex relations between input and output variables. The usefulness of the proposed concept is demonstrated by the evolution of forests of fuzzy rules for simultaneous wind speed and direction prediction.
Pavel Kromer and Jan Platos
557 Performance Improvement of Stencil Computations for Multi-core Architectures based on Machine Learning [abstract]
Abstract: Stencil computations are the basis for solving many problems related to Partial Differential Equations (PDEs). Obtaining the best performance with such numerical kernels is a major issue, as many critical parameters (architectural features, compiler flags, memory policies, multithreading strategies) must be finely tuned. In this context, auto-tuning methods have been used extensively in the last few years to improve overall performance. However, the complexity of current architectures and the large number of optimizations to consider reduce the efficiency of this approach. This paper focuses on the use of Machine Learning to predict the performance of PDE stencil kernels on multicore architectures. Low-level hardware counters (e.g. cache misses and TLB misses) from a limited number of executions are used to build our predictive model. We have considered two different kernels (7-point Jacobi and the seismic equation) to demonstrate the effectiveness of our approach. Our results show that the performance can be predicted and that the best input configuration for stencil problems can be obtained by simulations of hardware counters and performance measurements.
Victor Martinez, Fabrice Dupros, Márcio Castro and Philippe Navaux
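A rough sketch of the approach described above: train a regressor that maps low-level hardware counters to measured kernel runtime and evaluate it on held-out configurations. The counters and the counter-to-runtime relation below are synthetic stand-ins, not the paper's measurements or model.

```python
# Predict kernel runtime from (synthetic) hardware counters with a random forest.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_percentage_error

rng = np.random.default_rng(0)
n = 400
counters = rng.uniform(size=(n, 3))                  # [cache-miss rate, TLB-miss rate, instr. count]
runtime = (1.0 + 5.0 * counters[:, 0] + 2.0 * counters[:, 1]
           + 0.5 * counters[:, 2] + rng.normal(0, 0.05, n))   # hypothetical relation + noise

X_tr, X_te, y_tr, y_te = train_test_split(counters, runtime, random_state=0)
model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_tr, y_tr)
print(f"MAPE on held-out configs: {mean_absolute_percentage_error(y_te, model.predict(X_te)):.3f}")
```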
321 Distributed training strategies for a computer vision deep learning algorithm on a distributed GPU cluster [abstract]
Abstract: Deep learning algorithms base their success on building high learning capacity models with millions of parameters that are tuned in a data-driven fashion. These models are trained by processing millions of examples, so that the development of more accurate algorithms is usually limited by the throughput of the computing devices on which they are trained. In this work, we explore how the training of a state-of-the-art neural network for computer vision can be parallelized on a distributed GPU cluster. The effect of distributing the training process is addressed from two different points of view. First, the scalability of the task and its performance in the distributed setting are analyzed. Second, the impact of distributed training methods on the final accuracy of the models is studied.
Víctor Campos, Francesc Sastre, Maurici Yagües, Míriam Bellver, Xavier Giró-I-Nieto and Jordi Torres

ICCS 2017 Main Track (MT) Session 8

Time and Date: 10:35 - 12:15 on 12th June 2017

Room: HG D 1.1

Chair: Xing Cai

370 Semi-Supervised Clustering Algorithms for Grouping Scientific Articles [abstract]
Abstract: Creating sessions in scientific conferences consists in grouping papers with common topics, taking into account the size restrictions imposed by the conference schedule. Therefore, this problem can be considered as semi-supervised clustering of documents based on their content. This paper proposes modifications of traditional clustering algorithms to incorporate size constraints in each cluster. Specifically, two new algorithms are proposed for semi-supervised clustering, based on binary integer linear programming with cannot-link constraints and on a variation of the K-Medoids algorithm, respectively. The applicability of the proposed semi-supervised clustering methods is illustrated by addressing the problem of automatic configuration of conference schedules by clustering articles by similarity. We include experiments, applying the new techniques, over real conference datasets: ICMLA-2014, AAAI-2013 and AAAI-2014. The results of these experiments show that the new methods are able to solve practical and real problems.
Diego Vallejo, Paulina Morillo and Cesar Ferri
263 Parallel Learning Portfolio-based solvers [abstract]
Abstract: Exploiting multi-core architectures is a way to tackle the CPU time consumption when solving SATisfiability (SAT) problems. Portfolio is one of the main techniques that implements this principle. It consists in making several solvers compete on the same problem; the winner is the first that answers. In this work, we improve this technique by using a learning scheme, namely Exploration-Exploitation using Exponential weights (EXP3), that allows smart resource allocation. Our contribution is adapted to situations where we have to solve a batch of SAT instances issued from one or several sequences of problems. Our experiments show that our approach achieves good results.
Tarek Menouer and Souheib Baarir
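A minimal sketch of the EXP3 bandit scheme named above, applied to allocating runs among a small solver portfolio; the solvers are replaced by a hypothetical win-probability simulator, and the learning rate and horizon are placeholders rather than the paper's settings.

```python
# EXP3: keep exponential weights per solver, sample proportionally, update the chosen arm.
import math
import random

random.seed(0)
K, gamma = 3, 0.1                                    # 3 solvers in the portfolio
weights = [1.0] * K
true_speed = [0.8, 0.5, 0.3]                         # hypothetical "wins first" probability per solver

for instance in range(500):
    total = sum(weights)
    probs = [(1 - gamma) * w / total + gamma / K for w in weights]
    chosen = random.choices(range(K), weights=probs)[0]
    reward = 1.0 if random.random() < true_speed[chosen] else 0.0   # did this solver finish first?
    est = reward / probs[chosen]                     # importance-weighted reward estimate
    weights[chosen] *= math.exp(gamma * est / K)

print("final allocation probabilities:", [round(p, 2) for p in probs])
```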
298 Learning Entity and Relation Embeddings for Knowledge Resolution [abstract]
Abstract: Knowledge resolution is the task of clustering knowledge mentions, e.g., entity and relation mentions into several disjoint groups with each group representing a unique entity or relation. Such resolution is a central step in constructing high-quality knowledge graph from unstructured text. Previous research has tackled this problem by making use of various textual and structural features from a semantic dictionary or a knowledge graph. This may lead to poor performance on knowledge mentions with poor or not well-known contexts. In addition, it is also limited by the coverage of the semantic dictionary or knowledge graph. In this work, we propose ETransR, a method which automatically learns entity and relation feature representations in continuous vector spaces, in order to measure the semantic relatedness of knowledge mentions for knowledge resolution. Experimental results on two benchmark datasets show that our proposed method delivers significant improvements compared with the state-of-the-art baselines on the task of knowledge resolution.
Hailun Lin
12 3D High-quality Textile Reconstruction with Synthesized Texture [abstract]
Abstract: 3D textile models play an important role in textile engineering. However, not much work has focused on high-quality 3D textile reconstruction. The texture is also limited by the photography methods used in 3D scanning. This paper presents a novel framework for reconstructing a high-quality 3D textile model with a synthesized texture. Firstly, a pipeline of 3D textile processing is proposed to obtain a better 3D model based on KinectFusion. Then, a convolutional neural network (CNN) is used to synthesize a new texture. To the best of our knowledge, this is the first paper combining 3D textile reconstruction and texture synthesis. Experimental results show that our method can conveniently obtain high-quality 3D textile models and realistic textures.
Pengpeng Hu, Taku Komura, Duan Li, Ge Wu and Yueqi Zhong
255 A Proactive Cloud Scaling Model Based on Fuzzy Time Series and SLA Awareness [abstract]
Abstract: Cloud computing has emerged as an optimal option for almost all computational problems today. Using cloud services, customers and providers agree on usage conditions defined in a Service Level Agreement (SLA), which specifies acceptable Quality of Service (QoS) metric levels. From the point of view of cloud-based software developers, their application-level SLA must be mapped to the provided virtual resource-level SLA. Hence, one of the important challenges in clouds today is to improve the QoS of computing resources. In this direction, there are many studies dealing with the problem by proposing consumption prediction models. However, the evaluation of SLA violations for these prediction models has received less attention. In this paper, we focus on developing a comprehensive autoscaling solution for clouds based on forecasting resource consumption in advance and validating prediction-based scaling decisions. Our prediction model takes full advantage of fuzzy approaches, genetic algorithms and neural networks to process historical monitoring time series data. After that, the scaling decisions are validated and adapted through evaluating SLA violations. Our solution is tested on real workload data generated from a Google data center. The achieved results show significant efficiency and feasibility of our model.
Dang Tran, Nhuan Tran, Giang Nguyen and Binh Minh Nguyen

ICCS 2017 Main Track (MT) Session 9

Time and Date: 15:45 - 17:25 on 12th June 2017

Room: HG D 1.1

Chair: Craig Douglas

343 An Ensemble of Kernel Ridge Regression for Multi-class Classification [abstract]
Abstract: We propose an ensemble of kernel ridge regression based classifiers in this paper. Kernel ridge regression admits a closed-form solution, making it fast to compute and suitable for ensemble methods on small and medium-sized data sets. Our method uses a random vector functional link network to generate training samples for the kernel ridge regression classifiers. Several kernel ridge regression classifiers are constructed from different training subsets in each base classifier. The partitioning of the training samples into different subsets leads to a reduction in computational complexity when calculating the matrix inverse, compared with the standard approach of using all N samples for kernel matrix inversion. The proposed method is evaluated using well-known multi-class UCI data sets. Experimental results show the proposed ensemble method outperforms the single kernel ridge regression classifier and its bagging version.
Rakesh Katuwal and Ponnuthurai Suganthan
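An illustrative sketch of the core idea, assuming scikit-learn: several kernel ridge regressors are fit on random training subsets (so each one inverts a smaller kernel matrix), regress one-hot class targets, and their predictions are combined by voting. The RVFL-based sample generation and the UCI benchmarks of the paper are not reproduced; Iris is used as a stand-in.

```python
# Ensemble of kernel ridge "classifiers": regress one-hot targets on random subsets, then vote.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.kernel_ridge import KernelRidge
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
Y_tr = np.eye(3)[y_tr]                               # one-hot targets for regression

rng = np.random.default_rng(0)
votes = np.zeros((len(X_te), 3))
for _ in range(15):                                  # 15 base learners
    idx = rng.choice(len(X_tr), size=len(X_tr) // 2, replace=False)  # smaller kernel matrix to invert
    krr = KernelRidge(kernel="rbf", alpha=1.0, gamma=0.1).fit(X_tr[idx], Y_tr[idx])
    votes[np.arange(len(X_te)), krr.predict(X_te).argmax(axis=1)] += 1

print(f"ensemble accuracy: {np.mean(votes.argmax(axis=1) == y_te):.3f}")
```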
385 Dynamic Profiles Using Sentiment Analysis for VAA’s Recommendation Design [abstract]
Abstract: In the context of elections, the Internet opens new and promising possibilities for parties and candidates looking for a better political strategy and visibility. In this way they can also organize their election campaign to gather funds, to mobilize support, and to enter into a direct dialogue with the electorate. This paper presents ongoing research on recommender systems applied to e-government; in particular, it is an extension of so-called voting advice applications (VAAs). VAAs are Web applications that support voters, providing relevant information on candidates and political parties by comparing their political interests with parties or candidates on different political issues. Traditional VAAs provide recommendations of political parties and candidates focusing on static profiles of users. The goal of this work is to develop a candidate profile based on different parameters, such as the perspective of voters, social network activities, and expert opinions, to construct a more accurate dynamic profile of candidates. Understanding the elements that compose a candidate profile will help citizens in the decision-making process when facing a lack of information related to the behavior and thinking of future public authorities. At the end of this work, a fuzzy-based visualization approach for a VAA design is given, using as a case study the National Elections of Ecuador in 2013.
Luis Terán and Jose Mancera
25 Discriminative Learning from Selective Recommendation and Its Application in AdaBoost [abstract]
Abstract: The integration of semi-supervised learning and ensemble learning has been a promising research area. It is a typical procedure that one learner recommends the pseudo-labeled instances with high predictive confidence to another, so that the training dataset is expanded. However, the new learner’s demand on recommendation and the possibility of incorrect recommendation are neglected, which inevitably jeopardizes the learning performance. To address these issues, this paper proposes the Discriminative Learning from Selective Recommendation (DLSR) method. On one hand, both the reliability and informativeness of the pseudo-labeled instances are taken into account via selective recommendation. On the other hand, the potential of both correct and incorrect recommendation is formulated in discriminative learning. Based on DLSR, we further propose selective semi-supervised AdaBoost. With both recommending and receiving learners engaged in ensemble model learning, the unlabeled instances are explored in a more effective way.
Xiao-Yu Zhang, Shupeng Wang, Chao Li, Shiming Ge, Yong Wang and Binbin Li
157 Distributed Automatic Differentiation for Ptychography [abstract]
Abstract: Synchrotron radiation light source facilities are leading the way to ultrahigh resolution X-ray imaging. High resolution imaging is essential to understanding the fundamental structure and interaction of materials at the smallest length scale possible. Diffraction based methods achieve nanoscale imaging by replacing traditional objective lenses by pixelated area detectors and computational image reconstruction. Among these methods, ptychography is quickly becoming the standard for sub-30 nanometer imaging of extended samples, but at the expense of increasingly high data rates and volumes. This paper presents a new distributed algorithm for solving the ptychographic image reconstruction problem based on automatic differentiation. Input datasets are subdivided between multiple graphics processing units (GPUs); each subset of the problem is then solved either entirely independent of other subsets (asynchronously) or through sharing gradient information with other GPUs (synchronously). The algorithm was evaluated on simulated and real data acquired at the Advanced Photon Source, scaling up to 192 GPUs. The synchronous variant of our method outperformed an existing multi-GPU implementation in terms of accuracy while running at a comparable execution time.
Youssef Nashed, Tom Peterka, Junjing Deng and Chris Jacobsen
57 Automatic Segmentation of Chinese Characters as Wire-Frame Models [abstract]
Abstract: There exist thousands of Chinese characters, used across several countries and languages. Their huge number induces various processing difficulties for computers. One challenging topic is, for example, automatic font generation for such characters. Also, as these characters are in many cases recursive compounds, pattern (i.e. sub-character) detection is an insightful topic. In this paper, aiming at addressing such issues, we describe a segmentation method for Chinese characters that produces wire-frame models, thus vector graphics, in contrast to conventional raster approaches. While raster output would enable only very limited reuse of these wire-frame models, vector output would for instance support the automatic generation of vector fonts (Adobe Type 1, Apple True Type, etc.) for such characters. Our approach also enables a significant performance increase compared to the raster approach. The proposed method is then tested on a list of several Chinese characters. Next, the method is empirically evaluated and its average time complexity is assessed.
Antoine Bossard

ICCS 2017 Main Track (MT) Session 10

Time and Date: 10:15 - 11:55 on 13th June 2017

Room: HG D 1.1

Chair: Xing Cai

59 Erosion-Inspired Simulation of Aging for Deformation-Based Head Modeling [abstract]
Abstract: Simulation of age progression of 3D head models is an open problem in the field of computer graphics. Existing methods usually require a large set of training data, which may not be available. In this paper, a method for aging simulation of models created by deformation-based modeling is proposed that requires no training data. A user defines the position of wrinkles by selecting the position of endpoints of the desired wrinkles and the wrinkles are then generated using an erosion-inspired approach. The method can be used to simulate aging of any head model, however, if used for models created by deformations of a base model, the erosion factors can be calculated only for the base model and applied to the derived models. The results show that the approach is capable of creating visually plausible aged models.
Věra Skorkovská, Martin Prantl, Petr Martínek and Ivana Kolingerová
61 Extending Perfect Spatial Hashing to Index Tuple-based Graphs Representing Super Carbon Nanotubes [abstract]
Abstract: In this paper, we demonstrate how to extend perfect spatial hashing (PSH) to the problem domain of indexing nodes in a graph that represents Super Carbon Nanotubes (SCNTs). The goal of PSH is to hash multidimensional data without collisions. Since PSH results from research in computer graphics, its principles and methods have only been tested on 2- and 3-dimensional problems. In our case, we need to hash up to 28 dimensions. In contrast to the original applications of PSH, we do not focus on GPUs as target hardware but on an efficient CPU implementation. Thus, this paper highlights the extensions to the original algorithm that make it suitable for higher dimensions and the representation of SCNTs. Comparing the compression and performance results of the new PSH-based graphs and a structure-tailored custom data structure in our parallelized SCNT simulation software, we find that PSH in some cases achieves better compression by a factor of 1.7 while only increasing the total runtime by several percent. In particular, after our extension, PSH can also be employed to index sparse multidimensional scientific data from other domains.
Michael Burger, Giang Nam Nguyen and Christian Bischof
130 Effective and Scalable Data Access Control in Onedata Large Scale Distributed Virtual File System [abstract]
Abstract: Nowadays, large amounts of data are generated from experiments, satellite imagery, or simulations, and access to these data becomes challenging for users who need to process them further, since existing data management makes it difficult to effectively access and share large data sets. In this paper we present an approach to enabling easy and secure collaborations based on state-of-the-art authentication and authorization mechanisms, an advanced group/role mechanism for flexible authorization management, and support for identity mapping between local systems, as applied in an eventually consistent distributed file system called Onedata.
Michal Wrzeszcz, Lukasz Opiola, Konrad Zemek, Bartosz Kryza, Lukasz Dutka, Renata Slota and Jacek Kitowski
201 Devising a computational model based on data mining techniques to predict concrete compressive strength [abstract]
Abstract: Predicting the compressive strength of concrete is an essential task in the construction process, since prior knowledge of this property helps enhance the speed and quality of the process. Recently, many computational methods and techniques have been developed to predict distinct properties of concrete. However, practical use of these solutions requires a high degree of engineering expertise and programming skills. Alternatively, this work advocates that software packages with off-the-shelf data mining algorithms can empower researchers and engineers on this task, while demanding less effort. In this direction, we present a detailed study on the use of Weka, evaluating different regression algorithms for predicting the compressive strength of concrete. Using the most complete dataset available at the UCI dataset repository, we demonstrate that most of the techniques available in Weka produce results close to the best ones reported in the literature. For instance, most of the evaluated prediction models yield a Mean Absolute Error (MAE) below 10, while the best reported result is 8. Moreover, by fine-tuning the parameters of the regression algorithm Bagging with REPTree, we achieved an MAE below 3.3 for the evaluated dataset. Hence, the process considered in this study is also useful as a guideline to devise new computational models based on off-the-shelf data mining algorithms.
Daniel Alencar, Dárlinton Carvalho, Eduardus Koenders, Fernando Mourão and Leonardo Rocha
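For readers who want to reproduce the spirit of the experiment above without Weka, the sketch below uses scikit-learn's bagged regression trees as a rough stand-in for Weka's Bagging meta-learner over REPTree and reports a cross-validated MAE. The file name concrete_data.csv and its column order (input features first, measured strength last) are assumptions for illustration; the study itself uses the UCI Concrete Compressive Strength dataset inside Weka.

    import pandas as pd
    from sklearn.ensemble import BaggingRegressor
    from sklearn.tree import DecisionTreeRegressor
    from sklearn.model_selection import cross_val_score

    # Hypothetical local copy of the UCI concrete data; last column assumed
    # to hold the measured compressive strength (MPa).
    data = pd.read_csv("concrete_data.csv")
    X, y = data.iloc[:, :-1], data.iloc[:, -1]

    # Bagged regression trees: a rough scikit-learn analogue of Weka's
    # Bagging meta-learner over REPTree.
    model = BaggingRegressor(DecisionTreeRegressor(), n_estimators=100, random_state=1)
    mae = -cross_val_score(model, X, y, cv=10, scoring="neg_mean_absolute_error").mean()
    print(f"10-fold cross-validated MAE: {mae:.2f}")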
513 ParaView + Alya + D8tree: Integrating High Performance Computing and High Performance Data Analytics [abstract]
Abstract: Large-scale time-dependent particle simulations can generate massive amounts of data, so that storing the results is often the slowest phase and the primary time bottleneck of the simulation. Furthermore, analysing this amount of data with traditional tools has become increasingly challenging, and it is often virtually impossible to obtain a visual representation of the full data set. We propose a novel architecture that integrates an HPC-based multi-physics simulation code, a NoSQL database, and a data analysis and visualisation application. The goals are twofold: on the one hand, we aim to speed up the simulations by taking advantage of the scalability of key-value data stores, while at the same time enabling real-time approximate data visualisation and interactive exploration. On the other hand, we want to make it efficient to explore and analyse the large database of results produced. Therefore, this work represents a clear example of integrating High Performance Computing with High Performance Data Analytics. Our prototype proves the validity of our approach and shows great performance improvements. Indeed, we reduced the time to store the simulation results by 67.5%, while real-time queries run 52 times faster than with alternative solutions.
Antoni Artigues, Cesare Cugnasco, Yolanda Becerra, Fernando Cucchietti, Guillaume Houzeaux, Mariano Vazquez, Jordi Torres, Eduard Ayguade and Jesus Labarta

ICCS 2017 Main Track (MT) Session 11

Time and Date: 14:10 - 15:50 on 13th June 2017

Room: HG D 1.1

Chair: Rick Quax

54 Facilitating the Reproducibility of Scientific Workflows with Execution Environment Specifications [abstract]
Abstract: Scientific workflows are designed to solve complex scientific problems and accelerate scientific progress. Ideally, scientific workflows should improve the reproducibility of scientific applications by making it easier to share and reuse workflows between scientists. However, scientists often find it difficult to reuse others’ workflows, which is known as workflow decay. In this paper, we explore the challenges in reproducing scientific workflows, and propose a framework for facilitating the reproducibility of scientific workflows at the task level by giving scientists complete control over the execution environments of the tasks in their workflows and integrating execution environment specifications into scientific workflow systems. Our framework allows dependencies to be archived in basic units of OS image, software and data instead of gigantic all-in-one images. We implement a prototype of our framework by integrating Umbrella, an execution environment creator, into Makeflow, a scientific workflow system. To evaluate our framework, we use it to run two bioinformatics scientific workflows, BLAST and BWA. The execution environment of the tasks in each workflow is specified as an Umbrella specification file, and sent to execution nodes where Umbrella is used to create the specified environment for running the tasks. For each workflow we evaluate the size of the Umbrella specification file, the time and space overheads of creating execution environments using Umbrella, and the heterogeneity of execution nodes contributing to each workflow. The evaluation results show that our framework improves the utilization of heterogeneous computing resources, and improves the portability and reproducibility of scientific workflows.
Haiyan Meng and Douglas Thain
539 Data Mining Approach for Feature-Based Parameter Tuning for Mixed-Integer Programming Solvers [abstract]
Abstract: Integer Programming (IP) is the most successful technique for solving hard combinatorial optimization problems. Modern IP solvers are very complex programs composed of many different procedures whose execution is embedded in the generic Branch & Bound framework. The activation of these procedures, as well as the definition of exploration strategies for the search tree, can be controlled by setting different parameters. Since the success of these procedures and strategies in improving the performance of IP solvers varies widely depending on the problem being solved, the usual approach of discovering a single good set of parameters based on average results is not ideal. In this work we propose a comprehensive approach for the automatic tuning of Integer Programming solvers in which the characteristics of instances are taken into account. Computational experiments on a diverse set of 308 benchmark instances using the open-source COIN-OR CBC solver were performed with different parameter sets, and the results were processed by data mining algorithms. The results were encouraging: when trained with a portion of the database, the algorithms were able to predict better parameters for the remaining instances in 84% of the cases. The selection of a single best parameter setting would provide an improvement in only 56% of instances, showing that great improvements can be obtained with our approach.
Matheus Vilas Boas, Haroldo Santos, Luiz Merschmann and Rafael Martins
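A minimal sketch of the feature-based tuning idea described above: learn, from past runs, a mapping from instance features to the parameter setting that performed best, and query it for new instances. The features, labels and model choice below are purely illustrative assumptions, not the paper's feature set or learning algorithm.

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    # Illustrative instance features (not the paper's exact feature set):
    # number of rows, columns, fraction of integer variables, matrix density.
    X = np.array([[1200,  3400, 0.75, 0.02],
                  [ 150,   900, 1.00, 0.10],
                  [8000, 21000, 0.40, 0.01],
                  [ 600,  1800, 0.90, 0.05]])
    # Label = index of the parameter setting (e.g. a CBC cuts/heuristics preset)
    # that gave the shortest runtime for that instance in prior experiments.
    y = np.array([2, 0, 1, 0])

    # Learn a mapping from instance features to the recommended setting.
    model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
    new_instance = np.array([[700, 2000, 0.85, 0.04]])
    print("recommended parameter set:", model.predict(new_instance)[0])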
138 A Spectral Collocation Method for Systems of Singularly Perturbed Boundary Value Problems [abstract]
Abstract: We present a spectrally accurate method for solving coupled singularly perturbed second-order two-point boundary value problems (BVPs). The method combines analytical coordinate transformations with a standard Chebyshev spectral collocation method; it is applicable to linear and to nonlinear problems. The method performs well in resolving very thin boundary layers. Compared to other methods that have been proposed for systems of BVPs, this method is competitive in terms of accuracy, allows for different perturbation parameters in each of the equations, and does not require special properties of the coefficient functions.
Nathan Sharp and Manfred Trummer
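As a baseline for the kind of problem targeted above, the sketch below applies plain Chebyshev spectral collocation (Trefethen's differentiation matrix) to a scalar singularly perturbed model problem. It deliberately omits the analytical coordinate transformation that is the paper's key ingredient for resolving the boundary layers, so it only illustrates the collocation half of the method.

    import numpy as np

    def cheb(N):
        """Chebyshev differentiation matrix and points (Trefethen, Spectral Methods in MATLAB)."""
        x = np.cos(np.pi * np.arange(N + 1) / N)
        c = np.hstack([2., np.ones(N - 1), 2.]) * (-1) ** np.arange(N + 1)
        X = np.tile(x, (N + 1, 1)).T
        dX = X - X.T
        D = np.outer(c, 1. / c) / (dX + np.eye(N + 1))
        D -= np.diag(D.sum(axis=1))
        return D, x

    # Model problem: -eps u'' + u = 1 on (-1, 1), u(-1) = u(1) = 0,
    # which develops thin boundary layers of width O(sqrt(eps)).
    eps, N = 1e-4, 128
    D, x = cheb(N)
    A = -eps * (D @ D) + np.eye(N + 1)
    f = np.ones(N + 1)
    # Impose the Dirichlet conditions by solving only on the interior nodes.
    u = np.zeros(N + 1)
    u[1:-1] = np.linalg.solve(A[1:-1, 1:-1], f[1:-1])

Without a layer-adapted transformation, the accuracy of this plain collocation degrades as eps shrinks, which is precisely the difficulty the authors' coordinate transformation addresses.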
344 Deriving Principles for Business IT Alignment through the Analysis of a non-linear Model [abstract]
Abstract: An enduring topic in the Information Systems academic and practitioners' literature is how Business and Information Technology (IT) resources can be aligned in order to generate value for companies (Gerow et al. 2014). Despite a considerable body of literature, alignment is still considered an unachieved objective in corporate practice, and the topic constantly ranks among the top priorities of companies' CIOs (Kappelman et al. 2013). The inability to explain the process of alignment, i.e. how alignment is implemented in organisations, is considered one of the main reasons for the high level of misalignment in companies (Chan and Reich 2007b). In an attempt to radically innovate alignment studies, researchers have approached Complexity Science to investigate how Information Systems evolve in organisations (Merali 2006; Merali et al. 2012; Vessey and Ward 2013; Campbell and Peppard 2007) and derived a set of principles potentially capable of improving alignment (Benbya and McKelvey 2006). However, studies have mainly adopted a qualitative and descriptive approach, and alignment principles have been drawn by analogy between Information Systems and other complex systems existing in nature and extensively studied, rather than as the result of theoretical explanation and modelling (Kallinikos 2005). In our study we developed a model that describes how alignment evolves in organisations. The model adopts the fraction of persons within an organisation who are unsatisfied with IT as a state variable to measure misalignment. The evolution of misalignment is linked to key parameters, such as the capacity of the IT department to understand business needs and transform them into innovation projects, the resistance to change of the personnel, the flexibility of the Information Systems, and the IT investment policies of the organisation. The model is based on an extensive literature review (Chan and Reich 2007a), through which several parameters influencing alignment have been selected, and on the study of 4 cases, i.e. alignment processes implemented in manufacturing companies. Through the analysis of the model we derived principles for effectively managing alignment implementation in organisations, such as the improvement of personnel flexibility, the exploitation of feedback loops, the development of monitoring systems, and the implementation of modular, weakly-coupled IT components. The applicability of the principles in corporate practice has been tested in one company undertaking a digital transformation project. The contribution to the study of alignment is twofold. The model, despite its simplicity, is capable of describing alignment dynamics, even in cases not explicable through other approaches, and contributes to the creation of a theoretical foundation for the study of alignment as a complex process. At the operational level, the derivation of principles constitutes a step towards the implementation of effective alignment strategies.
References: Alaa, G. (2009). "Derivation of factors facilitating organizational emergence based on complex adaptive systems and social autopoiesis theories," Emergence: Complexity and Organization, 11(1), 19. Benbya, H., and McKelvey, B. (2006). "Using Co-evolutionary and Complexity Theories to Improve IS Alignment: A Multi-level Approach," Journal of Information Technology, 21(4), 284-298. Campbell, B., and Peppard, J. (2007). "The co-evolution of business information systems' strategic alignment: an exploratory study." Chan, Y. E., and Reich, B. H. (2007a). "IT alignment: an annotated bibliography," Journal of Information Technology, 22(4), 316-396. Chan, Y. E., and Reich, B. H. (2007b). "IT Alignment: What have we Learned?", Journal of Information Technology, 22(4), 297-315. Chan, Y. E., Sabherwal, R., and Thatcher, J. B. (2006). "Antecedents and Outcomes of Strategic IS Alignment: An Empirical Investigation," IEEE Transactions on Engineering Management, 53(1), 27-47. Gerow, J. E., Grover, V., Thatcher, J. B., and Roth, P. L. (2014). "Looking toward the future of IT-business strategic alignment through the past: A meta-analysis," MIS Quarterly, 38(4), 1059-1085. Henderson, J. C., and Venkatraman, H. (1993). "Strategic alignment: Leveraging information technology for transforming organizations," IBM Systems Journal, 32(1), 472-484. Kallinikos, J. (2005). "The order of technology: Complexity and Control in a Connected World," Information and Organization, 15(3), 185-202. Kappelman, L. A., McLeon, E., Luftman, J., and Johnson, V. (2013). "Key Issues of IT Organizations and their Leadership: The 2013 SIM IT Trends Study," MIS Quarterly Executive, 12, 227-240. Luftman, J., Papp, R., and Brier, T. (1999). "Enablers and Inhibitors of Business-IT Alignment," Communications of the AIS, 1(3es), 1. Merali, Y. (2006). "Complexity and Information Systems: The Emergent Domain," Journal of Information Technology, 21(4), 216-228. Vessey, I., and Ward, K. (2013). "The Dynamics of Sustainable IS Alignment: The Case for IS Adaptivity," Journal of the Association for Information Systems, 14(6), 283-301. Wagner, H. T., Beimborn, D., and Weitzel, T. (2014). "How social capital among information technology and business units drives operational alignment and IT business value," Journal of Management Information Systems, 31(1), 241-272.
Fabrizio Amarilli

ICCS 2017 Main Track (MT) Session 12

Time and Date: 16:20 - 18:00 on 13th June 2017

Room: HG D 1.1

Chair: Manfred Trummer

490 Parallel Parity Games: a Multicore Attractor for the Zielonka Recursive Algorithm [abstract]
Abstract: Parity games are abstract infinite-duration two-player games, widely studied in computer science. Several solution algorithms have been proposed and implemented in the community tool of choice, called PGSolver, which has declared the Zielonka Recursive (ZR) algorithm the best performing on randomly generated games. With the aim of scaling to and solving wider classes of parity games, several improvements and optimizations have been proposed over the existing algorithms. However, no one has yet explored the benefit of using the full computational power of which even common modern multicore processors are capable. This is even more surprising considering that most of the advanced algorithms in PGSolver are sequential. In this paper we introduce and implement, on a multicore architecture, a parallel version of the Attractor algorithm, which is the main kernel of the ZR algorithm. This choice follows our observation that more than 99% of the execution time of the ZR algorithm is spent in this module. We provide tests on graphs with up to 20K nodes generated through PGSolver, and we discuss a performance analysis in terms of strong and weak scaling.
Umberto Marotta, Aniello Murano, Rossella Arcucci and Loredana Sorrentino
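For reference, the sequential kernel the paper parallelizes is the classical attractor computation sketched below: the set of nodes from which a given player can force the play into a target set. This is the textbook algorithm, not the authors' multicore implementation, which distributes the frontier expansion across cores.

    def attractor(nodes, owner, succ, pred, target, player):
        """Classical attractor: nodes from which `player` can force the play into `target`.
        nodes: iterable of node ids; owner[v] in {0, 1}; succ[v]/pred[v]: adjacency lists."""
        attr = set(target)
        # For each node, how many of its successors are not yet in the attractor.
        out = {v: len(succ[v]) for v in nodes}
        frontier = list(target)
        while frontier:
            u = frontier.pop()
            for v in pred[u]:
                if v in attr:
                    continue
                if owner[v] == player:
                    attr.add(v); frontier.append(v)   # one good move suffices
                else:
                    out[v] -= 1
                    if out[v] == 0:                   # all moves lead into the attractor
                        attr.add(v); frontier.append(v)
        return attr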
492 Replicated Synchronization for Imperative BSP Programs [abstract]
Abstract: The BSP model (Bulk Synchronous Parallel) simplifies the construction and evaluation of parallel algorithms, with its simplified synchronization structure and cost model. Nevertheless, imperative BSP programs can suffer from synchronization errors. Programs with textually aligned barriers are free from such errors, and this structure eases program comprehension. We propose a simplified formalization of barrier inference as data flow analysis, which verifies statically whether an imperative BSP program has replicated synchronization, which is a sufficient condition for textual barrier alignment.
Arvid Jakobsson, Frederic Dabrowski, Wadoud Bousdira, Frederic Loulergue and Gaetan Hains
496 IMCSim: Parameterized Performance Prediction for Implicit Monte Carlo Codes [abstract]
Abstract: We design an application model (IMCSim) of the implicit Monte Carlo particle code IMC using the Performance Prediction Toolkit (PPT), a discrete-event simulation-based modeling framework for predicting code performance on a large range of parallel platforms. We present validation results for IMCSim. We then use the fast parameter scanning that such a high-level loop-structure model of a complex code enables to predict optimal IMC parameter settings for interconnect latency hiding. We find that variations in interconnect bandwidth have a significant effect on optimal parameter values, thus suggesting the use of IMCSim as a pre-step to substantial IMC runs to quickly identify optimal parameter values for the specific hardware platform that IMC runs on.
Stephan Eidenbenz, Alex Long, Jason Liu, Olena Tkachenko and Robert Zerr
532 Efficient Implicit Parallel Patterns for GIS [abstract]
Abstract: With the growth of data, the need to parallelize computations becomes crucial in numerous domains. But for non-specialists it is still difficult to tackle parallelism technicalities such as data distribution, communications or load balancing. For the geoscience domain we propose a solution based on implicit parallel patterns. These patterns are abstract models for a class of algorithms that can be customized and automatically transformed into a parallel execution. In this paper, we describe a pattern for stencil computation and a novel pattern dealing with computations that follow a pre-defined order. They are particularly used in geosciences, and we illustrate them with the flow direction and flow accumulation computations.
Kevin Bourgeois, Sophie Robert, Sébastien Limet and Victor Essayan
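To make the stencil pattern concrete, the sketch below computes the standard D8 flow direction of a digital elevation model: for every interior cell, the index of the steepest-descent neighbour among its eight neighbours. This is a direct sequential rendering of the stencil; within the proposed framework the same user code would be distributed with halo exchanges handled implicitly, and the exact flow-direction formula used by the authors may differ.

    import numpy as np

    def d8_flow_direction(elev):
        """D8 flow direction on a DEM: for each interior cell, the index (0-7)
        of the steepest-descent neighbour, or -1 for pits."""
        offsets = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]
        ny, nx = elev.shape
        direction = np.full((ny, nx), -1, dtype=np.int8)
        for i in range(1, ny - 1):
            for j in range(1, nx - 1):
                # Elevation drop per unit distance towards each neighbour.
                drops = [(elev[i, j] - elev[i + di, j + dj]) / np.hypot(di, dj)
                         for di, dj in offsets]
                k = int(np.argmax(drops))
                if drops[k] > 0:
                    direction[i, j] = k
        return direction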
103 Taking Lessons Learned from a Proxy Application to a Full Application for SNAP and PARTISN [abstract]
Abstract: SNAP is a proxy application which simulates the computational motion of a neutral particle transport code, PARTISN. In this work, we have adapted parts of SNAP separately; we have re-implemented the iterative shell of SNAP in the task-model runtime Legion, showing an improvement to the original schedule, and we have created multiple Kokkos implementations of the computational kernel of SNAP, displaying similar performance to the native Fortran. We then translate our Kokkos experiments in SNAP to PARTISN, necessitating engineering development, regression testing, and further thought.
Geoffrey Womeldorff, Joshua Payne and Benjamin Bergen

ICCS 2017 Main Track (MT) Session 13

Time and Date: 9:00 - 10:40 on 14th June 2017

Room: HG D 1.1

Chair: Michael Kirby

194 cuHines: Solving Multiple (Batched) Hines systems on NVIDIA GPUs. Human Brain Project [abstract]
Abstract: The simulation of the behavior of the human brain is one of the most important challenges in computing today. The main problem consists of finding efficient ways to manipulate and compute the huge volume of data that this kind of simulation needs, using current technology. In this sense, this work focuses on one of the main steps of such a simulation, which consists of computing the Ca capacitance on the neurons' morphology. This is carried out using the Hines algorithm. Although this algorithm is optimal in terms of number of operations, it requires non-trivial modifications to be efficiently parallelized on NVIDIA GPUs. We propose several optimizations to accelerate this algorithm on GPU-based architectures, exploring the limitations of both method and architecture, in order to solve efficiently a high number of Hines systems (neurons). Each of the optimizations is analyzed and described in depth. To evaluate the impact of the optimizations on real inputs, we have used 6 different morphologies varying in size and number of branches. Our studies have shown that the optimizations proposed in the present work achieve high performance on computations with a high number of neurons, with our GPU implementations being about 4x and 8x faster than the OpenMP multicore implementation (16 cores) when using one and two K80 NVIDIA GPUs, respectively. It is also important to highlight that these optimizations continue to scale as the number of neurons grows.
Pedro Valero-Lara, Ivan Martínez-Pérez, Antonio J. Peña, Xavier Martorell, Raül Sirvent and Jesús Labarta
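The kernel being batched in the work above is the Hines elimination: a backward sweep towards the root of the neuron's compartment tree followed by a forward substitution, assuming compartments are numbered so that each parent precedes its children. The sequential sketch below follows the standard formulation (the array names a, b, d, rhs are conventional, not necessarily the authors'); the paper's contribution is how many such independent systems are mapped efficiently onto a GPU, which is not reproduced here.

    import numpy as np

    def hines_solve(d, a, b, rhs, parent):
        """Solve one Hines system and return the solution vector.  Nodes are
        ordered so that parent[i] < i; a[i]/b[i] couple node i to its parent,
        d is the diagonal.  For an unbranched cable (parent[i] == i - 1) this
        reduces to the Thomas algorithm."""
        n = len(d)
        d, rhs = d.copy(), rhs.copy()
        for i in range(n - 1, 0, -1):          # backward (triangularization) sweep
            p = parent[i]
            factor = a[i] / d[i]
            d[p] -= factor * b[i]
            rhs[p] -= factor * rhs[i]
        rhs[0] /= d[0]                          # forward (back-substitution) sweep
        for i in range(1, n):
            rhs[i] = (rhs[i] - b[i] * rhs[parent[i]]) / d[i]
        return rhs

Batching then amounts to one independent call to hines_solve per neuron, which is exactly the parallelism the GPU implementation exploits.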
213 Exploiting Hybrid Parallelism in the Kinematic Analysis of Multibody Systems Based on Group Equations [abstract]
Abstract: Computational kinematics is a fundamental tool for the design, simulation, control, optimization and dynamic analysis of multibody systems - mechanical systems whose bodies are connected by joints which allow relative movement. The analysis of complex multibody systems and the need for real time solutions requires the development of kinematic and dynamic formulations that reduces computational cost, the selection and efficient use of the most appropriated solvers and the exploiting of all the computer resources using parallel computing techniques. The topological approach based on group equations and natural coordinates reduces the computation time in comparison with well-known global formulations and enables the use of parallelism techniques which can be applied at different levels: simultaneous solution of equations, use of multithreading routines for each equation, or a combination of both. This paper studies and compares these topological formulation and parallel techniques to ascertain which combination performs better in two applications. The first application is the use of dedicated systems for the real time control of small multibody systems, defined by a few number of equations and small linear systems, so shared-memory parallelism in combination with linear algebra routines is analyzed in a small multicore and in Raspberry Pi. The control of a Stewart platform is used as a case study. The second application is the study of large multibody systems in which the kinematic analysis must be performed several times during the design of multibody systems. A simulator which allows us to control the formulation, the solver, the parallel techniques and size of the problem has been developed and tested in more powerful computational systems with larger multicores and GPU.
Gregorio Bernabe, Jose-Carlos Cano, Domingo Gimenez, Javier Cuenca, Antonio Flores, Mariano Saura-Sanchez and Pablo Segado-Cabezos
209 On the Use of a GPU-Accelerated Mobile Device Processor for Sound Source Localization [abstract]
Abstract: The growing interest in incorporating new features into mobile devices has increased the number of signal processing applications running on processors designed for mobile computing. A challenging signal processing field is acoustic source localization, which is attractive for applications such as automatic camera steering systems, human-machine interfaces, video gaming or audio surveillance. In this context, the emergence of systems-on-chip (SoC) that contain a small graphics accelerator (GPU) provides a notable increase in computational capacity while partially retaining the appealing low power consumption of embedded systems. This is the case, for example, of the Samsung Exynos 5422 SoC, which includes a Mali-T628 MP6 GPU. This work evaluates an OpenCL-based implementation of a method for sound source localization, namely the Steered-Response Power with Phase Transform (SRP-PHAT) algorithm, on GPUs of this type. The results show that the proposed implementation can work in real time with high-resolution spatial grids using up to 12 microphones.
Jose A. Belloch, Jose M. Badia, Francisco D. Igual, Maximo Cobos and Enrique S. Quintana-Ortí
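A brute-force reference version of SRP-PHAT is sketched below: for every candidate grid point, accumulate the GCC-PHAT cross-correlation of each microphone pair at that pair's expected time difference of arrival. The OpenCL implementation evaluated in the paper parallelizes the loop over grid points on the embedded GPU; the Python below only fixes the algorithmic structure and makes no claim about the authors' exact discretization.

    import numpy as np

    def srp_phat(signals, mic_pos, grid, fs, c=343.0):
        """Brute-force SRP-PHAT: signals (M, L), mic_pos (M, 3), grid (G, 3).
        Returns the grid point with the highest steered response power."""
        M, L = signals.shape
        S = np.fft.rfft(signals, axis=1)
        power = np.zeros(len(grid))
        for i in range(M):
            for j in range(i + 1, M):
                cross = S[i] * np.conj(S[j])
                cross /= np.abs(cross) + 1e-12        # PHAT weighting
                cc = np.fft.irfft(cross, n=L)          # GCC-PHAT, circular lags
                for g, p in enumerate(grid):
                    tdoa = (np.linalg.norm(p - mic_pos[i]) -
                            np.linalg.norm(p - mic_pos[j])) / c
                    lag = int(round(tdoa * fs)) % L    # negative lags wrap around
                    power[g] += cc[lag]
        return grid[np.argmax(power)]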
379 Fast Genome-Wide Third-order SNP Interaction Tests with Information Gain on a Low-cost Heterogeneous Parallel FPGA-GPU Computing Architecture [abstract]
Abstract: Complex diseases may result from many genetic variants interacting with each other. For this reason, genome-wide interaction studies (GWIS) are currently performed to detect pairwise SNP interactions. While the computations required here can be completed within reasonable time, it has so far been impractical to detect third-order SNP interactions for large-scale datasets due to the cubic complexity of the problem. In this paper we introduce a feasible method for third-order GWIS analysis of genotyping data on a low-cost heterogeneous computing system that combines a Virtex-7 FPGA and a GeForce GTX 780 Ti GPU, with speedups between 70 and 90 against a CPU-only approach and a speedup of approximately 5 against a GPU-only approach. To estimate effect sizes of third-order interactions we employ information gain (IG), a measure that in the literature has so far been applied on a genome-wide scale only to pairwise interactions.
Lars Wienbrandt, Jan Christian Kässens, Matthias Hübenthal and David Ellinghaus
459 Factorization and Inversion of a Million Matrices using GPUs: Challenges and Countermeasures [abstract]
Abstract: This paper presents new algorithmic approaches and optimization techniques for LU factorization and matrix inversion of millions of very small matrices using GPUs. These problems appear in many scientific applications including astrophysics and generation of block-Jacobi preconditioners. We show that, for very small problem sizes, design and optimization of GPU kernels require a mindset different from the one usually used when designing LAPACK algorithms for GPUs. Techniques for optimal memory traffic, register blocking, and tunable concurrency are incorporated in our proposed design. We also take advantage of the small matrix sizes to eliminate the intermediate row interchanges in both the factorization and inversion kernels. The proposed GPU kernels achieve performance speedups vs. CUBLAS of up to 6x for the factorization, and 14x for the inversion, using double precision arithmetic on a Pascal P100 GPU.
Ahmad Abdelfattah, Azzam Haidar, Stanimire Tomov and Jack Dongarra
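The core computation above can be pictured with the NumPy sketch below: an LU factorization without row interchanges applied to a whole batch of tiny matrices at once, with every update vectorized over the batch dimension much as a GPU thread block owns one (or a few) matrices. Skipping pivoting is reasonable only when the matrices tolerate it (e.g. the well-conditioned blocks arising in block-Jacobi preconditioners); the actual kernels, register blocking and inversion step from the paper are not reproduced here.

    import numpy as np

    def batched_lu_nopiv(A):
        """LU factorization (no pivoting) of a batch of tiny matrices.
        A has shape (batch, n, n); every operation is vectorized over the batch.
        Returns L (unit lower) and U packed together, one matrix per batch entry."""
        A = A.copy()
        n = A.shape[1]
        for k in range(n - 1):
            A[:, k+1:, k] /= A[:, k:k+1, k]                              # column of L
            A[:, k+1:, k+1:] -= A[:, k+1:, k:k+1] * A[:, k:k+1, k+1:]    # rank-1 trailing update
        return A

    # Example: factorize 100,000 diagonally dominant 8x8 matrices in one call.
    rng = np.random.default_rng(0)
    A = rng.random((100_000, 8, 8)) + 8 * np.eye(8)
    LU = batched_lu_nopiv(A)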

ICCS 2017 Main Track (MT) Session 14

Time and Date: 13:25 - 15:05 on 14th June 2017

Room: HG D 1.1

Chair: Jose A. Belloch

530 A Multithreaded Algorithm for Sparse Cholesky Factorization [abstract]
Abstract: We present a multithreaded method for supernodal sparse Cholesky factorization on a hybrid platform consisting of a multicore CPU and a GPU. Our algorithm can exploit concurrency at different levels of the elimination tree by using multiple threads on both the CPU and the GPU. By factorizing multiple matrices in a batch, our algorithm achieves better performance than previous implementations. Our experimental results on a platform consisting of an Intel multicore processor and an Nvidia GPU indicate a significant improvement in performance over the single-threaded supernodal algorithm.
Meng Tang, Mohamed Gadou and Sanjay Ranka
550 Utilizing Intel Advanced Vector Extensions for Monte Carlo Simulation based Value at Risk Computation [abstract]
Abstract: Value at Risk (VaR) is a statistical method of predicting the market risk associated with financial portfolios. Among the numerous statistical models that forecast VaR, Monte Carlo simulation is a commonly used technique with high accuracy, though it is computationally intensive. Calculating VaR in real time is becoming a need of short-term traders in current markets, and adapting the Monte Carlo method of VaR computation to real-time calculation poses a challenge due to the computational complexity of the simulation step. The simulation process consists of an independent set of tasks, so a performance bottleneck occurs when these tasks are executed sequentially. By parallelizing these tasks, the time taken to calculate the VaR of a portfolio can be reduced significantly. To address this issue, we utilize the Advanced Vector Extensions (AVX) technology to parallelize the simulation process. We compared the performance of the AVX-based solution against the sequential approach as well as against a multi-threaded solution and a GPU-based solution. The results show that the AVX approach outperforms the GPU approach for iteration counts of up to 200,000. Since such a number of iterations is generally not required to obtain a sufficiently accurate VaR measure, it makes sense both computationally and economically to utilize AVX for the Monte Carlo method of VaR computation.
Nipuna Liyanage, Pubudu Fernando, Dilini Mampitiya Arachchi, Dilip Karunathilaka and Amal Perera
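The structure of the simulation step is easy to state; the sketch below computes a one-day Monte Carlo VaR with normally distributed, correlated asset returns, expressed entirely as array operations so that the data-parallel shape an AVX (or GPU) kernel exploits is visible. The portfolio, return model and parameter values are illustrative assumptions, not the paper's experimental setup.

    import numpy as np

    def monte_carlo_var(weights, mu, cov, horizon_days=1, n_sims=100_000, alpha=0.99):
        """Monte Carlo VaR of a portfolio with correlated normal asset returns."""
        rng = np.random.default_rng(42)
        L = np.linalg.cholesky(cov * horizon_days)     # correlate the random draws
        z = rng.standard_normal((n_sims, len(weights)))
        scenarios = mu * horizon_days + z @ L.T        # simulated asset returns
        pnl = scenarios @ weights                      # portfolio P&L per scenario
        return -np.quantile(pnl, 1 - alpha)            # loss not exceeded with prob. alpha

    # Toy three-asset portfolio (illustrative numbers only).
    weights = np.array([0.5, 0.3, 0.2])
    mu = np.array([0.0004, 0.0002, 0.0003])
    cov = np.array([[4e-4, 1e-4, 5e-5],
                    [1e-4, 2.5e-4, 8e-5],
                    [5e-5, 8e-5, 3e-4]])
    print("99% 1-day VaR:", monte_carlo_var(weights, mu, cov))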
564 Sparse Local Linear Embedding [abstract]
Abstract: The Locally Linear Embedding (LLE) algorithm has proven useful for determining structure-preserving, dimension-reducing mappings of data on manifolds. We propose a modification to the LLE optimization problem that minimizes the number of neighbors required for the representation of each data point. The algorithm is shown to be robust over wide ranges of the sparsity parameter, producing an average number of nearest neighbors that is consistent with the best-performing parameter selection for LLE. Since the number of non-zero weights may be substantially reduced in comparison to LLE, Sparse LLE can be applied to larger data sets. We provide three numerical examples, including a color image, the standard swiss roll, and a gene expression data set, to illustrate the behavior of the method in comparison to LLE. The resulting algorithm produces comparatively sparse representations that preserve the neighborhood geometry of the data in the spirit of LLE.
Lori Ziegelmeier, Michael Kirby and Chris Peterson
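For context, the local weight problem that Sparse LLE modifies is sketched below in its classical form: each point is reconstructed as an affine combination of its k nearest neighbours by solving a small regularized Gram system. The paper's contribution, a sparsity penalty on these local weights that drives many of them to zero, is described above but not reproduced in this sketch.

    import numpy as np
    from scipy.spatial import cKDTree

    def lle_weights(X, k=10, reg=1e-3):
        """Reconstruction weights of classical LLE (dense n x n matrix for clarity).
        Each point is written as an affine combination of its k nearest neighbours,
        with the weights constrained to sum to one."""
        n = X.shape[0]
        tree = cKDTree(X)
        W = np.zeros((n, n))
        for i in range(n):
            idx = tree.query(X[i], k + 1)[1][1:]          # skip the point itself
            Z = X[idx] - X[i]                             # centred neighbourhood
            G = Z @ Z.T
            G += reg * np.trace(G) * np.eye(k)            # regularize near-singular Gram
            w = np.linalg.solve(G, np.ones(k))
            W[i, idx] = w / w.sum()                       # enforce sum-to-one constraint
        return W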
148 Efficient iterative methods for multi-frequency wave propagation problems: A comparison study [abstract]
Abstract: In this paper we present a comparison study for three different iterative Krylov methods that we have recently developed for the simultaneous numerical solution of wave propagation problems at multiple frequencies. The three approaches have in common that they require the application of a single shift-and-invert preconditioner at a suitable 'seed' frequency. The focus of the present work, however, lies on the performance of the respective iterative method. We conclude with numerical examples that provide guidance concerning the suitability of the three methods.
Manuel Baumann and Martin B. van Gijzen
437 Lyapunov Function computation for systems with multiple equilibria [abstract]
Abstract: Recently a method was presented to compute Lyapunov functions for nonlinear systems with multiple local attractors. This method was shown to succeed in delivering algorithmically a Lyapunov function giving qualitative information on the system's dynamics, including lower bounds on the attractors' basins of attraction. We suggest a simpler and faster algorithm to compute such a Lyapunov function if the attractors in question are exponentially stable equilibrium points. Just as in the earlier publication one can apply the algorithm and expect to obtain partial information on the system dynamics if the assumptions on the system at hand are only partially fulfilled. We give four examples of our method applied to different dynamical systems from the literature.
Sigurdur Hafstein and Johann Bjornsson

ICCS 2017 Main Track (MT) Session 15

Time and Date: 10:35 - 12:15 on 12th June 2017

Room: HG D 1.2

Chair: Jorge González-Domínguez

252 Letting researchers do research: A national structure for expert IT support [abstract]
Abstract: Information Technology (IT) impacts nearly every aspect of our lives and work, and is an essential tool for research. However, research presents new problems in IT which make its support challenging. These challenges are often unlike those experienced by typical organisational IT departments, and relate to the use of new, less stable or more cutting edge technologies and approaches. Supporting researchers in meeting these challenges has led to the creation of a network of specialist units at Swiss research organisations to provide Research IT support. These provide specialist support, letting researchers concentrate on their core tasks and speeding time to results. They are further federated through the eScience Coordination Team (eSCT, a Swiss national project) into a national network to support research. Here we discuss the difference between Core IT functions, and the research IT functions this new network seeks to support. This differentiation helps both Core IT and Research IT teams better make use of their skills and better serve their customers. Beyond this, we reflect on the organisational experiences generated through creating and operating these units and the national network they contribute to, and what lessons can be learned to assist with creation of Research IT functions elsewhere.
Owen Appleton, Alex Upton, Thomas Wüst, Bernd Rinn, Henry Luetcke, Vittoria Rezzonico, Gilles Fourestey, Dean Flanders, John White, Nabil Abdennadher, Thierry Sengstag, Eva Pujadas, Heinz Stockinger and Sergio Maffioletti
288 Asynchronous Decentralized Framework for Unit Commitment in Power Systems [abstract]
Abstract: Optimization of power networks is a rich research area that focuses mainly on efficiently generating and distributing the right amount of power to meet demand requirements across various geographically dispersed regions. The Unit Commitment (UC) problem is one of the critical problems in power network research that involves determining the amount of power that must be produced by each generator in the power network subject to numerous operational constraints. Growth of these networks coupled with increased interconnectivity and cybersecurity measures has created an encouraging platform for applying decentralized optimization paradigms. In this paper, we develop a novel asynchronous decentralized optimization framework to solve the UC problem. We demonstrate that our asynchronous approach outperforms conventional synchronous approaches, thereby promising greater gains in computational efficiency.
Paritosh Ramanan, Murat Yildirim, Edmond Chow and Nagi Gebraeel
458 An Advanced Software Tool to Simulate Service Restoration Problems: a case study on Power Distribution Systems [abstract]
Abstract: This paper presents a software tool to simulate a practical problem in smart grid systems. A feature of the smart grid is a capability for self-recovery in the event of anomalies, such as the recovery of a power distribution network after a fault. When the system has this self-recovery capacity, it is called self-healing. The intersection of areas such as computer science, telecommunications, automation and electrical engineering has allowed power systems to gain new technologies. However, because it is a multi-area domain, self-recovery simulation tools for smart grids are often highly complex and exhibit low fidelity due to the use of approximation algorithms. The main contribution of this paper is a simulator with high fidelity and low complexity in terms of programming, usability and semantics. In this simulator, a computational intelligence technique and a derivative method for calculating the power flow were encapsulated. The result is a software tool with high abstraction and easy customization, aimed at a self-healing system for the reconfiguration of an electric power distribution network.
Richardson Ribeiro
259 Disaggregated Computing. An Evaluation of Current Trends for Datacentres [abstract]
Abstract: Next-generation data centers will likely be based on the emerging paradigm of disaggregated function-blocks-as-a-unit, departing from the current state of mainboard-as-a-unit. Multiple functional blocks, or bricks, such as compute, memory and peripherals will be spread throughout the system and interconnected via one or more high-speed networks. The amount of memory available will be very large and distributed among multiple bricks. This new architecture brings various benefits that are desirable in today's data centers, such as fine-grained technology upgrade cycles, fine-grained resource allocation, and access to a larger amount of memory and accelerators. An analysis of the impact and benefits of memory disaggregation is presented in this paper. One of the biggest challenges when analyzing these architectures is that memory accesses must be modeled correctly in order to obtain accurate results. However, modeling every memory access would generate overhead so high that the simulation becomes unfeasible for real data center applications. A model to represent and analyze memory disaggregation has been designed, and a statistics-based, queuing-based full-system simulator was developed to rapidly and accurately analyze application performance in disaggregated systems. With a mean error of 10%, simulation results point out that the network layers may introduce overheads that degrade application performance by up to 66%. Initial results also suggest that low memory access bandwidth may degrade application performance by up to 20%.
Hugo Daniel Meyer, Jose Carlos Sancho, Josue Quiroga, Ferad Zyulkyarov, Damián Roca and Mario Nemirovsky
580 Using Power Demand and Residual Load Imbalance in the Load Balancing to Save Energy of Parallel Systems [abstract]
Abstract: The power consumption of High Performance Computing (HPC) systems is an increasing concern as large-scale systems grow in size and, consequently, consume more energy. In response to this challenge, we develop and evaluate new energy-aware load balancers that reduce the average power demand and save energy of parallel systems when scientific applications with imbalanced load are executed. Our load balancers combine dynamic load balancing with DVFS techniques in order to reduce the clock frequency of underloaded computing cores which experience some residual imbalance even after tasks are remapped. The results show that our load balancers achieve power reductions of 7.5% on average with the fine-grained variant that performs per-core DVFS, and of 18.75% with the coarse-grained variant that performs per-chip DVFS, on real applications.
Edson Luiz Padoin, Philippe Navaux, Jean-Francois Mehaut and Víctor Eduardo Martínez Abaunza
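The coupling of residual imbalance and DVFS described above can be sketched as follows: after tasks are remapped, each core is assigned the lowest frequency that still lets it finish no later than the most loaded core running at full speed. The frequency levels and loads below are made-up inputs, and the real load balancers also handle migration costs and per-chip constraints that this toy selection rule ignores.

    def select_frequencies(core_loads, freq_levels):
        """Pick, for each core, the lowest available frequency that keeps its
        finish time within that of the most loaded core at full speed.
        core_loads: work per core (units at nominal frequency); freq_levels: e.g. GHz."""
        f_max = max(freq_levels)
        critical_time = max(core_loads) / f_max
        chosen = []
        for load in core_loads:
            # Lowest frequency f such that load / f still fits inside critical_time.
            f = next(f for f in sorted(freq_levels) if load / f <= critical_time)
            chosen.append(f)
        return chosen

    # Example: one hot core pins the makespan; the others can be slowed down.
    print(select_frequencies([10.0, 7.5, 9.0, 4.0], [1.2, 1.6, 2.0, 2.4]))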

ICCS 2017 Main Track (MT) Session 16

Time and Date: 15:45 - 17:25 on 12th June 2017

Room: HG D 1.2

Chair: Fabrício Enembreck

135 StoreRush: An Application-Level Approach to Harvesting Idle Storage in a Best Effort Environment [abstract]
Abstract: For a production HPC system where storage devices are shared between multiple applications and managed in a best-effort manner, contention is often a major problem, leading to some storage devices being more loaded than others and causing a significant reduction in I/O throughput. In this paper, we describe our latest effort, StoreRush, to resolve this practical issue at the application level without requiring modification to the file and storage system. The proposed scheme uses a two-level messaging system to harvest idle storage by re-routing I/O requests to less congested storage locations, so that write performance is improved while limiting the impact on reads by throttling re-routing if deemed excessive. An analytical model is derived to guide the setup of the optimal throttling factor. The proposed scheme is verified against the production applications Pixie3D, XGC1 and QMCPack during production windows, which demonstrated the effectiveness (e.g., up to 1.8x improvement in write performance) and scalability of our approach (up to 131,072 cores).
Qing Liu, Norbert Podhorszki, Jong Choi, Jeremy Logan, Matt Wolf, Scott Klasky, Tahsin Kurc and Xubin He
204 Fast Parallel Construction of Correlation Similarity Matrices for Gene Co-Expression Networks on Multicore Clusters [abstract]
Abstract: Gene co-expression networks are gaining attention in the present days as useful representations of biologically interesting interactions among genes. The most computationally demanding step to generate these networks is the construction of the correlation similarity matrix, as all pairwise combinations must be analyzed and complexity increases quadratically with the number of genes. In this paper we present MPICorMat, a hybrid MPI/OpenMP parallel approach to construct similarity matrices based on Pearson’s correlation. It is based on a previous tool (RMTGeneNet) that has been used on several biological studies and proved accurate. Our tool obtains the same results as RMTGeneNet but significantly reduces runtime on multicore clusters. For instance, MPICorMat generates the correlation matrix of a dataset with 61,170 genes and 160 samples in less than one minute using 16 nodes with two Intel Xeon Sandy-Bridge processors each (256 total cores), while the original tool needed almost 4.5 hours. The tool is also compared to another available approach to construct correlation matrices on multicore clusters, showing better scalability and performance. MPICorMat is an open-source software and it is publicly available at https://sourceforge.net/projects/mpicormat/.
Jorge González-Domínguez and María J. Martín
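The kernel MPICorMat parallelizes is conceptually simple, as the sketch below shows: standardizing each gene's expression profile reduces the all-pairs Pearson correlation to a single matrix product, which the tool then splits into row blocks across MPI processes and OpenMP threads. A small random matrix stands in for the expression data here, and the MPI/OpenMP decomposition itself is not shown.

    import numpy as np

    def pearson_similarity_matrix(expr):
        """All-pairs Pearson correlation of a genes x samples expression matrix.
        Assumes no constant rows (zero standard deviation)."""
        genes, samples = expr.shape
        z = (expr - expr.mean(axis=1, keepdims=True)) / expr.std(axis=1, keepdims=True)
        return (z @ z.T) / samples        # correlation of rows i and j

    expr = np.random.default_rng(0).random((1000, 160))   # toy data: 1000 genes, 160 samples
    C = pearson_similarity_matrix(expr)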
261 The Design and Performance of Batched BLAS on Modern High-Performance Computing Systems [abstract]
Abstract: A current trend in high-performance computing is to decompose a large linear algebra problem into batches containing thousands of smaller problems that can be solved independently, before collating the results. To standardize the interface to these routines, the community is developing an extension to the BLAS standard (the batched BLAS), enabling users to perform thousands of small BLAS operations in parallel whilst making efficient use of their hardware. We discuss the benefits and drawbacks of the current batched BLAS proposals and perform a number of experiments, focusing on GEMM, to explore their effect on performance. In particular, we analyze the effect of novel data layouts which, for example, interleave the matrices in memory to aid vectorization and prefetching of data. Utilizing these modifications, our code outperforms both MKL and cuBLAS by up to 6 times on the self-hosted Intel KNL (codenamed Knights Landing) and Kepler GPU architectures, respectively, for large numbers of DGEMM operations using matrices of size 2 × 2 to 20 × 20.
Jack Dongarra, Sven Hammarling, Nick Higham, Samuel Relton, Pedro Valero-Lara and Mawussi Zounon
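The data-layout idea discussed above can be illustrated as follows: instead of storing the batch as one small matrix after another, the interleaved layout stores element (i, j) of every matrix in the batch contiguously, so each SIMD lane (or GPU thread) streams "its" matrix with unit stride. The NumPy sketch below contrasts the two layouts for a batched GEMM; it is a model of the memory layout only, not the proposed batched BLAS interface.

    import numpy as np

    batch, n = 10_000, 4
    rng = np.random.default_rng(0)

    # Conventional layout: one contiguous n x n matrix after another.
    A_p2p = rng.random((batch, n, n))
    B_p2p = rng.random((batch, n, n))

    # Interleaved layout: element (i, j) of every matrix is contiguous over the batch.
    A_int = np.ascontiguousarray(A_p2p.transpose(1, 2, 0))   # shape (n, n, batch)
    B_int = np.ascontiguousarray(B_p2p.transpose(1, 2, 0))

    # Batched GEMM on the interleaved layout: the i/j/k loops are scalar per matrix,
    # but every multiply-add below operates on all `batch` matrices at once.
    C_int = np.zeros((n, n, batch))
    for i in range(n):
        for j in range(n):
            for k in range(n):
                C_int[i, j] += A_int[i, k] * B_int[k, j]

    assert np.allclose(C_int.transpose(2, 0, 1), A_p2p @ B_p2p)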
333 OUTRIDER: Optimizing the mUtation Testing pRocess In Distributed EnviRonments [abstract]
Abstract: The adoption of commodity clusters has become widespread due to their cost-effectiveness and the evolution of networks. These systems can be used to reduce the long execution time of applications that require a vast amount of computational resources, and especially of techniques that are usually deployed in centralized environments, like testing. Currently, one of the main challenges in testing is to obtain an appropriate test suite. Mutation testing is a widely used technique aimed at generating high-quality test suites. However, executing this technique entails a high computational cost. In this work we propose OUTRIDER, an HPC-based optimization that contributes to bridging the gap between the high computational cost of mutation testing and the parallel infrastructure of HPC systems aimed at speeding up the execution of computational applications. This optimization is based on our previous work, EMINENT, an algorithm focused on parallelizing the mutation testing process using MPI. However, since EMINENT does not efficiently exploit the computational resources in HPC systems, we propose four strategies to alleviate this issue. A thorough experimental study using different applications shows a performance improvement of up to 70% with these optimizations.
Pablo C. Cañizares, Alberto Núñez and Juan de Lara
112 Topology-aware Job Allocation in 3D Torus-based HPC Systems with Hard Job Priority Constraints [abstract]
Abstract: In this paper, we address the topology-aware job allocation problem on 3D torus-based high performance computing systems, with the objective of reducing system fragmentation. Firstly, we propose a group-based job allocation strategy, which leads to a more global optimization of resource allocation. Secondly, we propose two shape allocation methods to determine the topological shape for each input job, including a zigzag allocation method for communication non-sensitive jobs, and a convex allocation method for communication sensitive jobs. Thirdly, we propose a topology-aware job mapping algorithm to reduce the system fragmentation brought in by the job mapping process, including a target bin selection method and a bi-directional job mapping method. The evaluation results validate the efficiency of our approach in reducing system fragmentation and improving system utilization.
Kangkang Li, Maciej Malawski and Jarek Nabrzyski