Session 1: 10:35 - 12:15 on 12th June 2017

ICCS 2017 Main Track (MT) Session 1

Time and Date: 10:35 - 12:15 on 12th June 2017

Room: HG F 30

Chair: Youssef Nashed

525 Analysis of Computational Science Papers from ICCS 2001-2016 using Topic Modeling and Graph Theory [abstract]
Abstract: This paper presents results of topic modeling and networks of topics using the ICCS corpus, which contains domain-specific (computational science) papers spanning sixteen years (5695 papers). We discuss the topical structure of ICCS, how these topics evolve over time in response to the topicality of various problems, technologies and methods, and how they relate to one another. This analysis illustrates multidisciplinary research and collaborations among scientific communities by constructing static and dynamic networks from the topic modeling results and the authors' keywords. The results of this study will help ICCS organizers to identify past and future trends of core topics, and to organize workshops based on communities of topics, which in turn will satisfy the interests of participants by allowing them to attend the workshop most directly related to their domain area. We used the Non-negative Matrix Factorization (NMF) topic modeling algorithm to discover topics, and labeled and grouped the results hierarchically. We used Gephi to study static networks of topics, and the R library DyA to analyze dynamic networks of topics.
Tesfamariam Abuhay, Sergey Kovalchuk, Klavdiya Bochenina, George Kampis, Valeria Krzhizhanovskaya and Michael Lees
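For readers unfamiliar with the technique, the sketch below shows the kind of NMF topic-modeling step the abstract describes, using scikit-learn on a toy stand-in corpus; the corpus, the number of topics and the vectorizer settings are illustrative assumptions, not the paper's actual configuration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import NMF

# Toy stand-in for the 5695-paper ICCS corpus used in the study
corpus = [
    "topic modeling of computational science papers",
    "graph theory and networks of topics",
    "keywords and topic evolution over time",
    "communities of topics in scientific collaboration networks",
]

tfidf = TfidfVectorizer(stop_words="english")
X = tfidf.fit_transform(corpus)

nmf = NMF(n_components=2, random_state=0)   # the study would use far more topics
W = nmf.fit_transform(X)                    # document-topic weights
H = nmf.components_                         # topic-term weights

# Top terms per topic, as a starting point for manual labeling and grouping
terms = tfidf.get_feature_names_out()
for k, row in enumerate(H):
    top = [terms[i] for i in row.argsort()[::-1][:5]]
    print(f"topic {k}: {', '.join(top)}")
```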
43 Identifying Urban Inconsistencies via Street Networks [abstract]
Abstract: Street networks, comprising topology and geometry, can be used in problems related to ill-designed urban structures. Several works have focused on such applications; nevertheless, they lack a clear methodology to characterize and explain the urban space through a complex network. Aided by topo-geometrical measures from georeferenced networks, we present a methodology to identify what we call urban inconsistencies: low-access regions containing nodes that lack efficient access from or to other regions in a city. To this end, we devised algorithms capable of preprocessing and analyzing street networks, pointing to existing mobility problems in a city. Mainly, we identify inconsistencies that pertain to a given node where a facility of interest is currently placed. Our results introduce ways to assist in the urban planning and design processes. The proposed techniques are discussed through the visualization and analysis of a real-world city. Hence, our contributions provide a basis for further advancements in street networks applied to facility-location analysis.
Gabriel Spadon, Gabriel Gimenes and Jose Rodrigues-Jr
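As a rough illustration of the idea of flagging low-access nodes on a georeferenced street graph, a minimal NetworkX sketch follows; the graph, the choice of closeness centrality as the measure, and the cutoff are assumptions for the example, not the authors' actual algorithm.

```python
import networkx as nx

# Toy street graph: intersections as nodes, segments with a 'length' in meters
G = nx.Graph()
G.add_edge("a", "b", length=100)
G.add_edge("b", "c", length=120)
G.add_edge("c", "d", length=900)   # a long, poorly connected segment

# Low closeness (computed over segment lengths) flags low-access nodes
closeness = nx.closeness_centrality(G, distance="length")
threshold = 0.5 * max(closeness.values())          # assumed cutoff
low_access = [n for n, c in closeness.items() if c < threshold]
print("candidate inconsistency locations:", low_access)
```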
120 Impact of Neighbors on the Privacy of Individuals in Online Social Networks [abstract]
Abstract: The problem of user privacy enforcement in online social networks (OSNs) cannot be ignored and, in recent years, Facebook and other providers have considerably improved their privacy protection tools. However, in OSNs the most powerful data protection "weapons" are the users themselves. The behavior of an individual acting in an OSN highly depends on her level of privacy attitude: an aware user tends not to share her private information, or the private information of her friends, while an unaware user might not recognize some information as private and might share it carelessly with her contacts. In this paper, we experimentally study the effect of the privacy attitude of an individual and her friends on information propagation in social networks. We model information diffusion by means of an extension of the Susceptible-Infectious-Recovered (SIR) epidemic model that takes into account the privacy attitude of users. We employ this diffusion model in stochastic simulations on a synthetic social network designed to mimic the characteristics of the Facebook social graph.
Livio Bioglio and Ruggero G. Pensa
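A minimal sketch of such a privacy-aware SIR step on a synthetic graph is given below; the attitude attribute, the way it scales the sharing probability, and the rates are assumptions made for the example, not the paper's exact model.

```python
import random
import networkx as nx

G = nx.barabasi_albert_graph(1000, 5, seed=1)    # synthetic Facebook-like graph
for n in G:
    G.nodes[n]["attitude"] = random.random()      # 0 = unaware, 1 = privacy-aware

state = {n: "S" for n in G}
state[0] = "I"                                    # seed the private information
beta, gamma = 0.3, 0.1

def step():
    new_state = dict(state)
    for n in G:
        if state[n] == "I":
            for m in G.neighbors(n):
                # aware users are assumed less likely to (re)share private data
                if state[m] == "S" and random.random() < beta * (1 - G.nodes[n]["attitude"]):
                    new_state[m] = "I"
            if random.random() < gamma:
                new_state[n] = "R"               # stops spreading the information
    state.update(new_state)
```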
230 Mining Host Behavior Patterns From Massive Network and Security Logs [abstract]
Abstract: Mining host behavior patterns from massive logs plays an important and crucial role in diagnosing and managing anomalies in large-scale networks. Almost all prior work gives a macroscopic link analysis of network events, but fails to microscopically analyze the evolution of behavior patterns for each host in a network. In this paper, we propose a novel approach, namely Log Mining for Behavior Pattern (LogM4BP), to address the limitations of prior work. LogM4BP builds a statistical model that captures each host's network behavior patterns with the non-negative matrix factorization algorithm, improving the interpretability and comparability of behavior patterns while reducing the complexity of analysis. The work is evaluated on a public data set captured from a large marketing company. Experimental results show that it describes network behavior patterns clearly and accurately, and that significant evolutions of behavior patterns can be intuitively mapped to real-world anomaly events.
Jing Ya, Tingwen Liu, Quangang Li, Jinqiao Shi, Haoliang Zhang, Pin Lv and Li Guo

ICCS 2017 Main Track (MT) Session 8

Time and Date: 10:35 - 12:15 on 12th June 2017

Room: HG D 1.1

Chair: Xing Cai

370 Semi-Supervised Clustering Algorithms for Grouping Scientific Articles [abstract]
Abstract: Creating sessions in scientific conferences consists of grouping papers with common topics while respecting the size restrictions imposed by the conference schedule. Therefore, this problem can be considered semi-supervised clustering of documents based on their content. This paper proposes modifications of traditional clustering algorithms to incorporate size constraints in each cluster. Specifically, two new algorithms for semi-supervised clustering are proposed, based on binary integer linear programming with cannot-link constraints and on a variation of the K-Medoids algorithm, respectively. The applicability of the proposed semi-supervised clustering methods is illustrated by addressing the problem of automatic configuration of conference schedules by clustering articles by similarity. We include experiments applying the new techniques to real conference datasets: ICMLA-2014, AAAI-2013 and AAAI-2014. The results of these experiments show that the new methods are able to solve practical, real problems.
Diego Vallejo, Paulina Morillo and Cesar Ferri
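To make the size-constraint idea concrete, here is a minimal sketch of assigning papers to session medoids under a fixed session capacity: replicating each medoid "capacity" times turns the size restriction into a balanced assignment problem. This is an illustrative variant, not the paper's ILP or K-Medoids algorithm, and the distances and capacity are made-up inputs.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

D = np.random.rand(12, 3)          # distance of 12 papers to 3 session medoids
capacity = 4                        # papers per session (assumed)

cost = np.repeat(D, capacity, axis=1)          # one column per session slot
rows, cols = linear_sum_assignment(cost)       # min-cost balanced assignment
sessions = cols // capacity                    # slot index -> session index
print(sessions)                                # each session gets exactly 4 papers
```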
263 Parallel Learning Portfolio-based solvers [abstract]
Abstract: Exploiting multi-core architectures is a way to tackle the CPU time consumed when solving SATisfiability (SAT) problems. Portfolio is one of the main techniques that implements this principle: several solvers compete on the same problem, and the winner is the first to answer. In this work, we improve this technique with a learning schema, namely Exploration-Exploitation using Exponential weights (EXP3), that allows smart resource allocation. Our contribution is suited to situations where we have to solve a batch of SAT instances issued from one or several sequences of problems. Our experiments show that our approach achieves good results.
Tarek Menouer and Souheib Baarir
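For reference, the standard EXP3 update the abstract builds on looks as follows; the reward definition (e.g. 1 when a solver answers first) and the exploration rate are assumptions for the sketch, not the paper's exact tuning.

```python
import math
import random

K = 4                               # number of solvers in the portfolio
gamma = 0.1                         # exploration rate (assumed)
w = [1.0] * K

def choose():
    """Sample a solver from the EXP3 mixture of weights and uniform exploration."""
    total = sum(w)
    p = [(1 - gamma) * wi / total + gamma / K for wi in w]
    i = random.choices(range(K), weights=p)[0]
    return i, p[i]

def update(i, p_i, reward):
    """reward in [0, 1], e.g. 1 if solver i won the race on this instance."""
    w[i] *= math.exp(gamma * (reward / p_i) / K)   # importance-weighted update
```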
298 Learning Entity and Relation Embeddings for Knowledge Resolution [abstract]
Abstract: Knowledge resolution is the task of clustering knowledge mentions, e.g., entity and relation mentions, into disjoint groups, with each group representing a unique entity or relation. Such resolution is a central step in constructing a high-quality knowledge graph from unstructured text. Previous research has tackled this problem by making use of various textual and structural features from a semantic dictionary or a knowledge graph. This may lead to poor performance on knowledge mentions with poor or little-known contexts, and it is also limited by the coverage of the semantic dictionary or knowledge graph. In this work, we propose ETransR, a method which automatically learns entity and relation feature representations in continuous vector spaces in order to measure the semantic relatedness of knowledge mentions for knowledge resolution. Experimental results on two benchmark datasets show that our proposed method delivers significant improvements over state-of-the-art baselines on the task of knowledge resolution.
Hailun Lin
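The abstract does not give ETransR's formulation, so as background only, here is the generic translation-based scoring used by the TransE family of methods, where a lower score indicates a more plausible (head, relation, tail) triple and near-duplicate mentions end up with similar scores; all names and embeddings below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 50
entity_emb = {m: rng.normal(size=dim) for m in ["Paris", "France", "paris_city"]}
relation_emb = {"capital_of": rng.normal(size=dim)}

def score(h, r, t):
    """Translation-based plausibility: ||h + r - t||; lower = more plausible."""
    return np.linalg.norm(entity_emb[h] + relation_emb[r] - entity_emb[t])

print(score("Paris", "capital_of", "France"))
```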
12 3D High-quality Textile Reconstruction with Synthesized Texture [abstract]
Abstract: 3D textile models play an important role in textile engineering, yet little work has focused on high-quality 3D textile reconstruction, and the texture is limited by the photography methods used in 3D scanning. This paper presents a novel framework for reconstructing a high-quality 3D textile model with a synthesized texture. First, a 3D textile processing pipeline is proposed to obtain a better 3D model based on KinectFusion. Then, a convolutional neural network (CNN) is used to synthesize a new texture. To the best of our knowledge, this is the first paper combining 3D textile reconstruction and texture synthesis. Experimental results show that our method can conveniently obtain high-quality 3D textile models and realistic textures.
Pengpeng Hu, Taku Komura, Duan Li, Ge Wu and Yueqi Zhong
255 A Proactive Cloud Scaling Model Based on Fuzzy Time Series and SLA Awareness [abstract]
Abstract: Cloud computing has emerged as an optimal option for almost all computational problems today. When using cloud services, customers and providers agree on usage conditions defined in a Service Level Agreement (SLA), which specifies acceptable Quality of Service (QoS) metric levels. From the view of cloud-based software developers, their application-level SLA must be mapped to the provided virtual resource-level SLA. Hence, one of the important challenges in clouds today is to improve the QoS of computing resources. In this direction, many studies deal with the problem by bringing forward consumption prediction models; however, SLA violation evaluation for these prediction models has received less attention. In this paper, we focus on developing a comprehensive autoscaling solution for clouds based on forecasting resource consumption in advance and validating prediction-based scaling decisions. Our prediction model takes advantage of a fuzzy approach, a genetic algorithm and a neural network to process historical monitoring time series data. The scaling decisions are then validated and adapted by evaluating SLA violations. Our solution is tested on real workload data generated from a Google data center. The achieved results show the significant efficiency and feasibility of our model.
Dang Tran, Nhuan Tran, Giang Nguyen and Binh Minh Nguyen
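The overall control loop the abstract describes (forecast, scale, then validate against the SLA) can be sketched as below; the moving-average forecast is only a stand-in for the paper's fuzzy/GA/NN predictor, and the capacity figures and violation rule are invented for the example.

```python
def forecast(history):
    """Placeholder predictor; the paper uses fuzzy time series + GA + NN."""
    return sum(history[-3:]) / 3.0

def scale(history, capacity_per_vm=100.0, sla_margin=0.9):
    """Pick a VM count from the forecast, then adapt it on an SLA violation."""
    predicted = forecast(history)
    vms = int(predicted / (capacity_per_vm * sla_margin)) + 1
    # validation step: would the last observed demand have violated the SLA?
    if history[-1] > vms * capacity_per_vm:
        vms += 1                               # adapt the decision
    return vms

print(scale([80.0, 120.0, 150.0, 170.0]))
```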

ICCS 2017 Main Track (MT) Session 15

Time and Date: 10:35 - 12:15 on 12th June 2017

Room: HG D 1.2

Chair: Jorge González-Domínguez

252 Letting researchers do research: A national structure for expert IT support [abstract]
Abstract: Information Technology (IT) impacts nearly every aspect of our lives and work, and is an essential tool for research. However, research presents new problems in IT which make its support challenging. These challenges are often unlike those experienced by typical organisational IT departments, and relate to the use of new, less stable or more cutting-edge technologies and approaches. Supporting researchers in meeting these challenges has led to the creation of a network of specialist units at Swiss research organisations to provide Research IT support. These provide specialist support, letting researchers concentrate on their core tasks and speeding time to results. They are further federated through the eScience Coordination Team (eSCT, a Swiss national project) into a national network to support research. Here we discuss the difference between Core IT functions and the Research IT functions this new network seeks to support. This differentiation helps both Core IT and Research IT teams better use their skills and better serve their customers. Beyond this, we reflect on the organisational experiences gained through creating and operating these units and the national network they contribute to, and what lessons can be learned to assist with the creation of Research IT functions elsewhere.
Owen Appleton, Alex Upton, Thomas Wüst, Bernd Rinn, Henry Luetcke, Vittoria Rezzonico, Gilles Fourestey, Dean Flanders, John White, Nabil Abdennadher, Thierry Sengstag, Eva Pujadas, Heinz Stockinger and Sergio Maffioletti
288 Asynchronous Decentralized Framework for Unit Commitment in Power Systems [abstract]
Abstract: Optimization of power networks is a rich research area that focuses mainly on efficiently generating and distributing the right amount of power to meet demand requirements across various geographically dispersed regions. The Unit Commitment (UC) problem is one of the critical problems in power network research that involves determining the amount of power that must be produced by each generator in the power network subject to numerous operational constraints. Growth of these networks coupled with increased interconnectivity and cybersecurity measures has created an encouraging platform for applying decentralized optimization paradigms. In this paper, we develop a novel asynchronous decentralized optimization framework to solve the UC problem. We demonstrate that our asynchronous approach outperforms conventional synchronous approaches, thereby promising greater gains in computational efficiency.
Paritosh Ramanan, Murat Yildirim, Edmond Chow and Nagi Gebraeel
458 An Advanced Software Tool to Simulate Service Restoration Problems: a case study on Power Distribution Systems [abstract]
Abstract: This paper presents a software tool to simulate a practical problem in smart grid systems. A feature of the smart grid is the system's capability to recover from anomalies, such as restoring a power distribution network after the occurrence of a fault. When a system has this capacity for self-recovery, it is called self-healing. The intersection of areas such as computer science, telecommunications, automation and electrical engineering has given power systems new technologies. However, because it is a multi-area domain, self-recovery simulation tools in smart grids are often highly complex and present low fidelity due to the use of approximation algorithms. The main contribution of this paper is a simulator with high fidelity and low complexity in terms of programming, usability and semantics. In this simulator, a computational intelligence technique and a derivative method for calculating the power flow are encapsulated. The result is a software tool with high abstraction and easy customization, aimed at a self-healing system for the reconfiguration of an electric power distribution network.
Richardson Ribeiro
259 Disaggregated Computing. An Evaluation of Current Trends for Datacentres [abstract]
Abstract: Next-generation data centers will likely be based on the emerging paradigm of disaggregated function-blocks-as-a-unit, departing from the current state of mainboard-as-a-unit. Multiple functional blocks, or bricks, such as compute, memory and peripherals will be spread through the entire system and interconnected via one or multiple high-speed networks. The amount of memory available will be very large and distributed among multiple bricks. This new architecture brings various benefits that are desirable in today's data centers, such as fine-grained technology upgrade cycles, fine-grained resource allocation, and access to a larger amount of memory and accelerators. This paper presents an analysis of the impact and benefits of memory disaggregation. One of the biggest challenges when analyzing these architectures is that memory accesses must be modeled correctly in order to obtain accurate results; however, modeling every memory access would generate an overhead that makes simulation unfeasible for real data center applications. A model to represent and analyze memory disaggregation has been designed, and a statistics-based, queuing-based full-system simulator was developed to rapidly and accurately analyze application performance in disaggregated systems. With a mean error of 10%, simulation results pointed out that the network layers may introduce overheads that degrade applications' performance by up to 66%. Initial results also suggest that low memory access bandwidth may degrade applications' performance by up to 20%.
Hugo Daniel Meyer, Jose Carlos Sancho, Josue Quiroga, Ferad Zyulkyarov, Damián Roca and Mario Nemirovsky
580 Using Power Demand and Residual Load Imbalance in the Load Balancing to Save Energy of Parallel Systems [abstract]
Abstract: The power consumption of High Performance Computing (HPC) systems is an increasing concern as large-scale systems grow in size and, consequently, consume more energy. In response to this challenge, we develop and evaluate new energy-aware load balancers that reduce the average power demand and save energy in parallel systems when scientific applications with imbalanced load are executed. Our load balancers combine dynamic load balancing with DVFS techniques in order to reduce the clock frequency of underloaded computing cores which experience some residual imbalance even after tasks are remapped. The results show that our load balancers achieve average power reductions of 7.5% with the fine-grained variant that performs per-core DVFS, and of 18.75% with the coarse-grained variant that performs per-chip DVFS, on real applications.
Edson Luiz Padoin, Philippe Navaux, Jean-Francois Mehaut and Víctor Eduardo Martínez Abaunza
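The core idea (scale the frequency of underloaded cores in proportion to their residual load) can be sketched in a few lines; the frequency range and the linear scaling rule below are assumptions for illustration, not the authors' exact policy.

```python
def target_frequencies(loads, f_max=2.4e9, f_min=1.2e9):
    """Per-core DVFS targets from post-remapping loads (positive numbers).

    Cores with residual load below the most loaded core can run slower
    without delaying the application's critical path.
    """
    peak = max(loads)
    freqs = []
    for load in loads:
        f = f_max * load / peak            # proportional slowdown (assumed rule)
        freqs.append(max(f_min, f))        # clamp to the lowest P-state
    return freqs

print(target_frequencies([1.0, 0.6, 0.9, 0.4]))
```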

Agent-based simulations, adaptive algorithms and solvers (ABS-AAS) Session 1

Time and Date: 10:35 - 12:15 on 12th June 2017

Room: HG D 7.1

Chair: Maciej Paszynski

-3 ICCS 2017 Workshop on Agent-Based Simulations, Adaptive Algorithms and Solvers [abstract]
Abstract: [No abstract available]
Aleksander Byrski, Maciej Paszynski, Robert Schaefer, Victor Calo and David Pardo
192 Quadrature blending for isogeometric analysis [abstract]
Abstract: We use blended quadrature rules to reduce the phase error of isogeometric analysis discretizations. To explain the observed behavior and quantify the approximation errors, we use the generalized Pythagorean eigenvalue error theorem to account for quadrature errors on the resulting weak forms. The proposed blended techniques improve the spectral accuracy of isogeometric analysis on uniform and non-uniform meshes for different polynomial orders and continuity of the basis functions. The convergence rate of the optimally blended schemes is increased by two orders with respect to the case when standard quadratures are applied. Our technique can be applied to arbitrary high-order isogeometric elements.
Victor Calo, Quanling Deng and Vladimir Puzyrev
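For reference, the blended rules studied in this line of work combine a Gauss rule and a Gauss-Lobatto rule through a single blending parameter; the form below is a sketch of that construction, with the optimal value of the parameter depending on the polynomial order and continuity as derived in the paper (not reproduced here).

```latex
% Blended quadrature: combination of the Gauss rule Q_G and the
% Gauss-Lobatto rule Q_L, with the blending parameter \tau chosen to
% cancel the leading term of the dispersion (eigenvalue) error.
Q_\tau(f) = \tau \, Q_G(f) + (1 - \tau) \, Q_L(f)
```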
81 Optimally refined isogeometric analysis [abstract]
Abstract: The performance of direct solvers strongly depends on the employed discretization method. In particular, it is possible to improve the performance of Isogeometric Analysis (IGA) discretizations by introducing multiple $C^0$-continuity hyperplanes that act as separators during LU factorization [7]. Here, we further explore this avenue by introducing separators of arbitrary continuity. Moreover, we develop an efficient method to obtain optimal discretizations in the sense that they minimize the time spent in the direct solution of the linear equations. The search space consists of all possible discretizations obtained by enriching a given IGA mesh. Thus, the best approximation error is always reduced with respect to its IGA counterpart, while the solution time is decreased by up to a factor of 60.
Daniel Garcia, Michael Barton and David Pardo
538 Higher-Order Finite Element Electromagnetics Code for HPC environments [abstract]
Abstract: In this communication, an electromagnetic software suite developed to work in high-performance computing (HPC) environments is presented. Details about the formulation used are provided, an exhaustive flowchart is included and analyzed, and finally, results obtained in HPC environments are shown.
Adrian Amor-Martin, Daniel Garcia-Donoro and Luis E. Garcia-Castillo
270 Coupled isogeometric Finite Element Method and Hierarchical Genetic Strategy with balanced accuracy for solving optimization inverse problem [abstract]
Abstract: The liquid fossil fuel reservoir exploitation problem (LFFEP) has not only economic significance but also a strong environmental impact. When the hydraulic fracturing technique is considered from the mathematical point of view, it can be formulated as an inverse optimization problem, where we try to find optimal locations of pumps and sinks to maximize the amount of oil extracted and to minimize the contamination of the groundwater. In this paper, we present a combined solver consisting of the Hierarchical Genetic Strategy (HGS) with variable accuracy for solving the optimization problem and an isogeometric finite element method (IGA-FEM) with different mesh sizes for modeling non-stationary flow of a non-linear fluid in heterogeneous media. The algorithm was tested and compared with a strategy using a Simple Genetic Algorithm (SGA) as the optimization algorithm and the same IGA-FEM solver for solving the direct problem. Additionally, a parallel algorithm for non-stationary simulations with isogeometric L2 projections is discussed and preliminarily assessed as a way of reducing the computational cost of solvers combining a genetic algorithm with the IGA-FEM algorithm. A theoretical asymptotic analysis which shows the correctness of the algorithm and allows estimation of the computational cost of the strategy is also presented.
Barbara Barabasz, Marcin Łoś, Maciej Woźniak, Leszek Siwik and Stephen Barrett

Simulations of Flow and Transport: Modeling, Algorithms and Computation (SOFTMAC) Session 1

Time and Date: 10:35 - 12:15 on 12th June 2017

Room: HG D 7.2

Chair: Shuyu Sun

302 Reduced Fracture Finite Element Model Analysis of an Efficient Two-Scale Hybrid Embedded Fracture Model [abstract]
Abstract: A Hybrid Embedded Fracture (HEF) model was developed to reduce various computational costs while maintaining physical accuracy (Amir and Sun, 2016). HEF splits the computations into a fine scale and a coarse scale. The fine scale solves analytically for the matrix-fracture flux exchange parameter; the coarse scale solves for the properties of the entire system. In the literature, fractures were assumed to be either vertical or horizontal for simplification (Warren and Root, 1963), and the matrix-fracture flux exchange parameter was given a few equations built on that assumption (Kazemi, 1968; Lemonnier and Bourbiaux, 2010). However, such simplified cases do not apply directly to actual random fracture shapes, directions, orientations, etc. This paper shows that the HEF fine-scale analytic solution (Amir and Sun, 2016) reproduces the flux exchange parameter found in the literature for the vertical and horizontal fracture cases. For other fracture cases, the flux exchange parameter changes according to the angle, slope, direction, etc. This conclusion arises from the analysis of both the Discrete Fracture Network (DFN) and the HEF schemes. The behavior of both schemes is analyzed under identical fracture conditions, and the results are shown and discussed. A generalization is then illustrated for any slightly compressible single-phase fluid within fractured porous media, and its results are discussed.
Sahar Amir, Huangxin Chen and Shuyu Sun
22 Numerical Simulation of Rotation of Intermeshing Rotors using Added and Eliminated Mesh Method [abstract]
Abstract: To compute flows around objects with complicated motion, such as intermeshing rotors, an unstructured moving-grid finite volume method was developed. Computational elements are added and eliminated according to the motion of the rotors, to maintain the computational domain around the counter-rotating rotors. The method also satisfies the geometric conservation law by using a unified four-dimensional space-time domain as the control volume. With this method, accurate computation is carried out without interpolation of physical quantities. By application to a flow around a sphere, the computation procedure was established with the introduction of the concept of hierarchical grid distinction. The results of application to the flow around intermeshing rotors then demonstrated the efficacy of the method, as well as its applicability to flows involving any complicated motion.
Masashi Yamakawa, Naoya Mitsunari and Shinichi Asao
239 Extension of a regularization based time-adaptive numerical method for a degenerate diffusion-reaction-biofilm growth model to systems involving quorum sensing [abstract]
Abstract: We extend a regularization-based numerical method for a highly degenerate partial differential equation that describes biofilm growth to systems of PDEs describing biofilms with several particulate substances. The example for which we develop the method is a quorum-sensing biofilm which consists of down- and up-regulated biomass fractions. We carry out computational studies to assess the effect of the regularization parameter, present a grid refinement study, and report briefly on the parallel performance of our code under OpenMP on desktop workstations.
Maryam Ghasemi and Hermann Eberl
428 A Fast Algorithm to Simulate Droplet Motions in Oil/Water Two Phase Flow [abstract]
Abstract: To improve research methods in the petroleum industry, we develop a fast algorithm to simulate droplet motions in oil and water two-phase flow, using a phase field model to describe the phase distribution in the flow process. An efficient partial differential equation solver, the Shift-Matrix method, is applied to speed up the calculation, coded in high-level languages, i.e., Matlab and R. An analytical solution of the order parameter is derived to define the initial condition of the phase distribution. The upwind scheme is applied in our algorithm to make it energy-decay stable, which results in fast calculation. To make it clearer and more understandable, we provide the specific code for forming the coefficient matrix used in the Shift-Matrix method. Our algorithm is compared with other methods at different scales, including the Front Tracking and VOSET methods at the macroscopic scale and the LBM method using the RK model at the mesoscopic scale. In addition, we compare the result of droplet motion under gravity using our algorithm with the empirical formula commonly used in industry. The results prove the high efficiency and robustness of our algorithm, and it is then used to simulate the motions of multiple droplets under gravity and cross-direction forces, which is more practical in industry and can be extended to wider applications.
Tao Zhang, Shuyu Sun and Bo Yu
175 Similarity Conversion of Centrifugal Natural Gas Compressors Based on Predictor-Corrector [abstract]
Abstract: Centrifugal compressors are among the most commonly used equipment powering long-distance natural gas pipelines. In this paper, a similarity conversion method for centrifugal natural gas compressors based on a predictor-corrector scheme is proposed. In other words, we use one similarity conversion to predict the key parameter and the other as the correction. Compared with field test data, we found that the error of the predicted outlet pressure of the compressor was controlled at about 2% and the outlet temperature fluctuated within 2 °C, which satisfies engineering application requirements.
Liyan Wang, Peng Wang, Zhizhu Cao, Bo Yu and Wang Li
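As background, the similarity (affinity) relations underlying such conversions are standard for centrifugal machines; one common form, relating volumetric flow Q, head H and power P to rotational speed N at fixed impeller diameter, is shown below. Which relation serves as predictor and which as corrector in the paper's scheme is not specified here.

```latex
\frac{Q_2}{Q_1} = \frac{N_2}{N_1}, \qquad
\frac{H_2}{H_1} = \left(\frac{N_2}{N_1}\right)^{2}, \qquad
\frac{P_2}{P_1} = \left(\frac{N_2}{N_1}\right)^{3}
```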

Multiscale Modelling and Simulation (MMS) Session 1

Time and Date: 10:35 - 12:15 on 12th June 2017

Room: HG D 3.2

Chair: Derek Groen

-1 Multiscale Modelling and Simulation, 14th International Workshop [abstract]
Abstract: [No abstract available]
Derek Groen, Valeria Krzhizhanovskaya, Alfons Hoekstra, Bartosz Bosak and Petros Koumoutsakos
297 Multiscale Computing Patterns for High Performance Computers [abstract]
Abstract: Moving into the era of exascale machines will lead to drastic changes in the way we use HPC resources, especially for multiscale applications [1, 2]. Hence, there is an increasing demand to devise generic methods to execute such multiscale applications on emerging exascale resources. For this we propose generic multiscale computing patterns, which should map the single-scale components of a multiscale application onto heterogeneous architectures in the most efficient way [2]. This research [2] is aimed at identifying and analysing generic multiscale computing patterns in multiscale models to increase the efficient, fault-tolerant, and energy-aware usage of HPC computing resources [3–6]. The vision of multiscale computing patterns is rooted in the Multiscale Modelling and Simulation Framework (MMSF) [7–11], which plays a pivotal role in designing, utilising and programming various multiscale applications. A multiscale model in the MMSF is described as a coordinated implementation of single-scale models that are coupled using scale-bridging mechanisms. The main components of the MMSF are the scale separation map, the coupling topology, the multiscale modeling language (with different flavors) and task graphs. The MMSF has shown its capability on a range of multi-science applications (e.g. fusion [10, 12], computational biology [10, 13, 14], biomedicine [10, 15–21], nanomaterial science [10, 22, 23], and hydrology [10]).

Based on the multiscale models' task graphs, we propose one generic task graph per set of multiscale models (a pattern) [2]. The main target of these graphs is to capture the behavior of multiscale applications, under the assumption that we can develop an algorithm per pattern that covers all multiscale scientific applications in the same scenario. A Multiscale Computing Pattern (MCP) can be defined as a high-level call sequence that exploits the functional decomposition of multiscale models in terms of single-scale models. Taking the generic task graph, multiscale model information and single-scale performance data, we can apply pattern services to optimise the work based on the desired approach (i.e. optimal mapping based on efficient usage of resources, less wall clock time, load balance, energy efficiency, fault tolerance or total submission-to-execution time). We distinguish three types of computing patterns, namely Extreme Scaling (ES), Heterogeneous Multiscale Computing (HMC) and Replica Computing (RC) [2]. We argue that these patterns have the capacity to ensure load balancing, energy awareness and fault tolerance, with the virtues of effective multiscale simulations on exascale HPC resources. This is achieved by ensuring the best mapping of single-scale models onto computing resources, as well as by applying effective checkpointing strategies and energy analysis. The Extreme Scaling pattern represents a situation wherein one or a few of the single-scale models necessitate exascale performance while coupled to other, less expensive models. The Heterogeneous Multiscale Computing pattern refers to a macroscale model that spawns a large number of instances of microscale models based on a set of decisions by an HMC manager. Replica Computing signifies a setting wherein numerous replicas of a single-scale model are executed with different communication forms.

Based on the communication, we define three flavors of RC, namely ensemble simulations, dynamic ensemble simulations and replica-exchange simulations (for details we refer to [2]). The findings of the study indicate the potential of these three patterns (ES, HMC and RC) for aligning multiscale applications to computing resources. The Extreme Scaling pattern, for instance, helps ensure better performance by balancing load between different submodels based on their computational power or energy consumption. The Heterogeneous Multiscale Computing pattern is primarily based on the heterogeneous multiscale method (HMM) [24–26] and represents the widely known micro-macro multiscale models, where a set of microscopic models is coupled to a macroscopic model. Utilising an HMC manager, which uses a dedicated database to restrict the number of microscale simulations needed and to enhance data reuse, will increase the usage of extensive parallel computing resources [27]. Replica Computing is a combination of numerous petascale and terascale simulations (replicas) which generate statistically robust and scientifically important outcomes. Achieving the best resource allocation for a single replica is a critical step towards the best usage of resources for multiscale applications of this kind. One example is estimating binding affinities between proteins and small compounds through the exchange of simulation data between single-scale models [28–30]. The proposed patterns can address the exascale challenges at the level of multiscale computing: the developers of multiscale models can concentrate on the efficiency of the single-scale models, while the execution environment of, e.g., the MMSF takes care of these issues with the help of patterns. This can be achieved with a sufficient description of the multiscale model (as in xMML) together with single-scale performance and power consumption measurements. Pattern software can then generate execution scenarios to enhance the current execution based on the required optimization target (i.e. optimal mapping based on efficient usage of resources, less wall clock time, load balance, energy efficiency, fault tolerance or total submission-to-execution time).
Saad Alowayyed, Derek Groen, Peter Coveney and Alfons Hoekstra
124 Dynamic load balancing for CAFE multiscale modelling methods for heterogeneous hardware infrastructure [abstract]
Abstract: Conventional load balancing algorithms, i.e. for one computing method at one scale, are very well known in the literature and have been developed since the 1970s, when solutions based on binary trees and the Parallel Virtual Machine were created. Since then, more than twenty thousand scientific papers have been published in this area. Nowadays the most important part of this branch of science relates to two aspects of balancing: between different computing nodes, and inside a single node with many CPUs and many computing devices with multicore heterogeneous architectures. The first aspect is studied for homogeneous as well as heterogeneous infrastructures. The second is mainly driven by the unpredictable behavior of sophisticated numerical algorithms on computing devices with hierarchical memory access, typical of NUMA designs, and is analyzed in terms of scheduling, load balancing and work stealing between computing devices and inside a particular device. In this paper both aspects are important. The paper presents a new approach to scheduling and Dynamic Load Balancing (DLB) of tightly and loosely coupled multiscale modelling methods executed on heterogeneous hardware infrastructure. The most popular configurations of computing nodes, composed of modern multicore CPUs, GPUs and co-processors, are used. The proposed load balancing approach takes into account the computational character of the methods applied at particular scales, which depends on the size of the input data, the operational intensity and the limitations of the hardware architecture. Such constraints are defined by the Roofline model and used in the algorithm as boundary conditions, allowing the maximum performance of an algorithm on a particular device to be determined. Upscaling multiscale approaches are analysed, represented in this paper by the Cellular Automata Finite Element (CAFE) method. Qualitative as well as quantitative results obtained after applying the proposed load balancing procedure are discussed in detail in the paper.
Lukasz Rauch
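The Roofline bound referred to above has a standard closed form: with peak floating-point rate P_max, memory bandwidth B, and operational intensity I of a kernel, the attainable performance used as a boundary condition is:

```latex
P(I) = \min\bigl(P_{\max},\; I \cdot B\bigr)
```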
623 Performance Monitoring of Multiscale Applications [abstract]
Abstract: The use of performance analysis tools has always been prevalent in HPC to understand application behaviour and ensure machine utilization. However, these profiles often take an application-centric perspective, profiling and visualising a single application at a time, or a machine-centric perspective, thus losing information about the specific applications. With multiscale computational patterns, multiple application runs are coupled together to form a workflow. As such, performance analysis tools must be able to capture the context of multiple application runs and combine the resulting data, a capability that is widely lacking in existing tools. This presentation will cover how the profiling tool Allinea MAP has been extended, using a custom metrics interface and a JSON export capability, to support the profiling and visualisation of multiscale models. The demonstration will focus on collecting domain-specific data from the MUSCLE2 communication library and its combination with existing data sources. The data is then exported for analysis and visualisation in open-source tools, such as Python and Kibana.
Oliver Perks and Keeran Brabazon

Workshop on Computational Optimization,Modelling and Simulation (COMS) Session 1

Time and Date: 10:35 - 12:15 on 12th June 2017

Room: HG D 5.2

Chair: Xin-She Yang

203 Global Convergence Analysis of the Flower Pollination Algorithm: A Discrete-Time Markov Chain Approach [abstract]
Abstract: The flower pollination algorithm is a recent metaheuristic algorithm for solving nonlinear global optimization problems. The algorithm has also been extended to multiobjective optimization with promising results. In this work, we analyze this algorithm mathematically and prove its convergence properties using Markov chain theory. By constructing the appropriate transition probability for a population of flower pollen and using the homogeneity property, it can be shown that the constructed stochastic sequences converge to the optimal set. Under two proper conditions for convergence, it is proved that the simplified flower algorithm satisfies these convergence conditions and thus the global convergence of the algorithm is guaranteed. Numerical experiments demonstrate that the flower pollination algorithm converges quickly and can thus achieve global optimality efficiently.
Xingshi He, Xin-She Yang and Mehmet Karamanoglu
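For readers new to the algorithm being analyzed, its two core update rules (global pollination via a Lévy-distributed step toward the current best solution, and local pollination via uniform mixing of two random solutions) are sketched below; the Lévy step is approximated crudely here for brevity, and the switch probability is a typical, not prescribed, value.

```python
import numpy as np

def fpa_step(X, best, p=0.8):
    """One generation over population X (n x d); returns proposed positions."""
    n, d = X.shape
    Xn = X.copy()
    for i in range(n):
        if np.random.rand() < p:                       # global pollination
            L = np.random.standard_cauchy(d) * 0.01    # crude stand-in for a Levy draw
            Xn[i] = X[i] + L * (best - X[i])
        else:                                          # local pollination
            j, k = np.random.choice(n, 2, replace=False)
            Xn[i] = X[i] + np.random.rand() * (X[j] - X[k])
    return Xn
```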
217 Memetic Simulated Annealing for Data Approximation with Local-Support Curves [abstract]
Abstract: This paper introduces a new memetic optimization algorithm called MeSA (Memetic Simulated Annealing) to address the data fitting problem with local-support free-form curves. The proposed method hybridizes simulated annealing with the COBYLA local search optimization method. This approach is further combined with the centripetal parameterization and the Bayesian information criterion to compute all free variables of the curve reconstruction problem with B-splines. The performance of our approach is evaluated by its application to four different shapes with local deformations and different degrees of noise and density of data points. The MeSA method has also been compared to the non-memetic version of SA. Our results show that MeSA is able to reconstruct the underlying shape of data even in the presence of noise and low density point clouds. It also outperforms SA for all the examples in this paper.
Carlos Loucera, Andres Iglesias Prieto and Akemi Galvez-Tomida
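A minimal sketch of the memetic hybridization described in the abstract follows: each simulated-annealing candidate is refined by the COBYLA local search before the Metropolis acceptance test. The objective, neighborhood and cooling schedule are placeholders, not the paper's B-spline fitting setup.

```python
import math
import random
import numpy as np
from scipy.optimize import minimize

def mesa(objective, x0, T0=1.0, cooling=0.95, iters=100):
    x = np.asarray(x0, dtype=float)
    fx, T = objective(x), T0
    for _ in range(iters):
        cand = x + np.random.normal(scale=0.1, size=x.shape)   # SA proposal
        res = minimize(objective, cand, method="COBYLA")       # memetic refinement
        # Metropolis acceptance on the locally refined candidate
        if res.fun < fx or random.random() < math.exp((fx - res.fun) / T):
            x, fx = res.x, res.fun
        T *= cooling
    return x, fx

# Usage: the objective could be the B-spline data-fitting error
best_x, best_f = mesa(lambda v: float(np.sum((v - 1.0) ** 2)), [0.0, 0.0])
```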
326 A Matheuristic Approach for Solving the Dynamic Facility Layout Problem [abstract]
Abstract: The Dynamic Facility Layout Problem (DFLP) is the problem of designing a facility over a multi-period planning horizon where the interdepartmental material flows change from one period to the next due to changes in product demands. The DFLP arises while designing manufacturing and logistics facilities over multiple planning periods; however, it is a very challenging nonlinear optimization problem. In this paper, a zone-based block layout is used to design manufacturing and logistics facilities over multiple planning periods. A zone-based block layout inherently includes possible aisle structures, which can easily be adapted to different material handling systems. The unequal-area DFLP is modeled and solved using a zone-based structure where the dimensions of the departments are decision variables and the departments are assigned to flexible zones with a pre-structured positioning. A matheuristic approach, which combines concepts from Tabu Search (TS) and mathematical programming, is proposed to solve the zone-based DFLP on the continuous plane with unequal-area departments. The TS determines the relative locations of departments and their assignments to zones, while their exact locations and shapes are calculated by mathematical programming. Numerical results for a set of test problems from the literature show that our proposed matheuristic approach is promising.
Sadan Kulturel-Konak
64 Job-flow Anticipation Scheduling in Grid [abstract]
Abstract: In this paper, a heuristic user job-flow scheduling approach for Grid virtual organizations with non-dedicated resources is discussed. Users' and resource providers' preferences, a virtual organization's internal policies, and the geographical distribution of resources along with local private utilization impose specific requirements for efficient scheduling according to different, usually contradictory, criteria. As the resource utilization level increases, the set of available resources and the corresponding decision space are reduced, further complicating the task of efficient scheduling. In order to improve overall scheduling efficiency, we propose a heuristic anticipation scheduling approach. It initially generates a near-optimal but infeasible scheduling solution, which is then used as a reference for efficient resource allocation.
Victor V. Toporkov, Dmitry Yemelyanov and Alexander Bobchenkov

Architecture, Languages, Compilation and Hardware support for Emerging ManYcore systems (ALCHEMY) Session 1

Time and Date: 10:35 - 12:15 on 12th June 2017

Room: HG F 33.1

Chair: Stephane Louise

-2 Architecture, Languages, Compilation and Hardware support for Emerging ManYcore systems (ALCHEMY): Preface [abstract]
Abstract: [No abstract available]
Johanna Sepulveda, Jeronimo Castrillon and Vania Marangozova-Martin
587 A multi-level optimization strategy to improve the performance of the stencil computation [abstract]
Abstract: Stencil computation represents an important numerical kernel in scientific computing. Leveraging multicore or manycore parallelism to optimize such operations represents a major challenge due both to the bandwidth demand and to the low arithmetic intensity. The situation is worsened by the complexity of current architectures and the potential impact of various mechanisms (cache memory, vectorization, compilation). In this paper, we describe a multi-level optimization strategy that combines manual vectorization, space tiling and stencil composition. A major effort of this study is the comparison of our results with the Pochoir stencil compiler framework. We evaluate our methodology with a set of three different compilers (Intel, Clang and GCC) on two recent generations of Intel multicore platforms. Our results show a good match with theoretical performance models (i.e. roofline models). We also outperform Pochoir by a factor of 2.5 in the best cases.
Gauthier Sornet, Fabrice Dupros and Sylvain Jubertie
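To make the kernel class concrete, here is a minimal, unoptimized 7-point 3D stencil in NumPy; the coefficients are illustrative, and the paper's vectorization, tiling and composition optimizations target C-level kernels of this shape rather than this Python form.

```python
import numpy as np

def stencil7(u, c0=0.5, c1=1.0 / 12.0):
    """One sweep of a 7-point stencil over the interior of a 3D array."""
    v = np.zeros_like(u)
    v[1:-1, 1:-1, 1:-1] = (
        c0 * u[1:-1, 1:-1, 1:-1]
        + c1 * (u[:-2, 1:-1, 1:-1] + u[2:, 1:-1, 1:-1]    # x neighbors
                + u[1:-1, :-2, 1:-1] + u[1:-1, 2:, 1:-1]  # y neighbors
                + u[1:-1, 1:-1, :-2] + u[1:-1, 1:-1, 2:]) # z neighbors
    )
    return v

u = np.random.rand(64, 64, 64)
u = stencil7(u)
```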
503 Towards Protected MPSoC Communication for Information Protection against a Malicious NoC [abstract]
Abstract: Multi-Processor System-on-Chip (MPSoC) design is based on the integration of several third-party Intellectual Property (IP) cores. Some of those IPs may include Trojans: extra hardware that can be triggered at operation time in order to perform an attack. The Network-on-Chip (NoC), the communication IP of MPSoCs, can include Trojans that spy on, modify or deny the sensitive communication inside the chip. Previous works address the malicious NoC threat; however, finding secure and efficient solutions is still a challenge. In this work we propose a novel secure network interface that implements a tunnel-based protocol allowing the secure exchange of sensitive data, even in the presence of a malicious NoC. We test our technique under several real-application and synthetic traffic and attack scenarios and show that it is a secure and efficient solution.
Johanna Sepulveda, Andreas Zankl, Daniel Florez and Georg Sigl
589 A Distributed Shared Memory Model and C++ Templated Meta-Programming Interface for the Epiphany RISC Array Processor [abstract]
Abstract: The Adapteva Epiphany many-core architecture comprises a scalable 2D mesh Network-on-Chip (NoC) of low-power RISC cores with minimal uncore functionality. Whereas such a processor offers high computational energy efficiency and parallel scalability, developing effective programming models that address the unique architecture features has presented many challenges. We present here a distributed shared memory (DSM) model supported in software transparently using C++ templated meta-programming techniques. The approach offers an extremely simple parallel programming model well suited for the architecture. Initial results are presented that demonstrate the approach and provide insight into the efficiency of the programming model and also the ability of the NoC to support a DSM without explicit control over data movement and localization.
David Richie, James Ross and Jamie Infantolino
595 An OpenMP backend for the Sigma-C streaming language [abstract]
Abstract: The ΣC (pronounced “Sigma-C”) language was initially designed for Kalray’s MPPA embedded many-core processor. Nonetheless, it was conceived as a target-independent language based on C and supporting the Cyclo-Static Data-Flow (CSDF) model of computation. Until now, it was only available for the first generation of the MPPA chip. In this paper, we show how we built an OpenMP back-end for the ΣC language, and we use this compiler to evaluate some of the assets of stream programming and some limitations of the current implementation by measuring performance on several benchmark programs. This new back-end could open the way to using the language to study embedded stream programming concepts or to program HPC applications.
Stephane Louise

Tools for Program Development and Analysis in Computational Science (Tools) Session 1

Time and Date: 10:35 - 12:15 on 12th June 2017

Room: HG E 33.3

Chair: Andreas Knüpfer

450 Performance Analysis of Parallel Python Applications [abstract]
Abstract: Python is progressively consolidating itself within the HPC community with its simple syntax, large standard library, and powerful third-party libraries for scientific computing that are especially attractive to domain scientists. Despite Python lowering the bar for accessing parallel computing, utilizing the capacities of HPC systems efficiently remains a challenging task. Yet, at the moment only a few supporting tools exist, and they provide merely basic information in the form of summarized profile data. In this paper, we present our efforts in developing event-based tracing support for Python within the performance monitor Extrae to provide detailed information and enable a profound performance analysis. We present concepts to record the complete communication behavior as well as to capture entry and exit of functions in Python to provide the corresponding application context. We evaluate our implementation in Extrae by analyzing the well-established electronic structure simulation package GPAW and demonstrate that the recorded traces provide information equivalent to that for traditional C or Fortran applications, thereby offering the same profound analysis capabilities for Python as well.
Michael Wagner, Germán Llort, Estanislao Mercadal, Judit Giménez and Jesús Labarta
159 Scaling Score-P to the next level [abstract]
Abstract: As part of performance measurements with Score-P, a description of the system and the execution locations is recorded into the performance measurement reports. For large-scale measurements using a million or more processes, the global system description can consume all the available memory. While the information stored process-locally during measurement is small, the memory requirement becomes a bottleneck in the process of constructing a global representation of the whole system. To address this problem we implemented a new system description in Score-P that exploits regular structures of the system, and results, on homogeneous systems, in a system description of constant size. Furthermore, we present a parallel algorithm to create a global view from the process-local information. The scalable system description comes at the price that it is no longer possible to assign individual names to each system element, but only enumerate elements of the same type. We have successfully tested the new approach on the full JUQUEEN system with up to nearly two million processes.
Daniel Lorenz and Christian Feld
528 Design Evaluation of a Performance Analysis Trace Repository [abstract]
Abstract: Parallel and high performance computing experts are obsessed with performance and scalability. Performance analysis and tuning are important and complex, but a number of software tools exist to support them. One methodology underlying such tools is the detailed recording of parallel runtime behavior in event traces and their subsequent analysis. This regularly produces very large data sets that bring their own challenges for handling and data management. This paper evaluates the utilization of the MASi research data management service as a trace repository to store, manage, and find traces in an efficient and usable way. First, we give an introduction to trace technologies in general, metadata in OTF2 traces specifically, and the MASi research data management service. Then, the trace repository is described with its potential for both performance analysts and parallel tool developers, followed by how we implemented it using existing metadata and how it can be utilized. Finally, we give an outlook on how we plan to put the repository into productive use for the benefit of researchers using traces.
Richard Grunzke, Maximilian Neumann, Thomas Ilsche, Volker Hartmann, Thomas Jejkal, Rainer Stotzka, Andreas Knüpfer and Wolfgang E. Nagel
475 Software Framework for Parallel BEM Analyses with H-matrices Using MPI and OpenMP [abstract]
Abstract: A software framework has been developed for use in parallel boundary element method (BEM) analyses. The framework program was parallelized in a hybrid parallel programming model, and both multiple processes and threads were used. Additionally, an H-matrix library for a distributed memory parallel computer was also developed to accelerate the analysis. In this paper, we describe the basic design concept for the framework and details of its implementation. The framework program, which was written with MPI functions and OpenMP directives, is mainly intended to reduce the user’s parallel programming costs. We also show the results of a sample analysis performed with approximately 60,000 unknowns. The numerical results verify the effectiveness of both the parallelization and the H-matrix method. In the test analysis, which was performed using a single core, the H-matrix version of the framework is 17-fold faster than the dense matrix version. The parallel framework program with the H-matrix attains an approximately 50-fold acceleration using 128 cores when compared with sequential computation.
Takeshi Iwashita, Akihiro Ida, Takeshi Mifune and Yasuhito Takahashi

Applications of Matrix Computational Methods in the Analysis of Modern Data (AMCMD) Session 1

Time and Date: 10:35 - 12:15 on 12th June 2017

Room: HG E 41

Chair: Raja Velu

625 The epsilon-algorithm in matrix computations [abstract]
Abstract: The epsilon-algorithm is designed to expedite iterative algorithms in matrix computation. In this talk, the algorithm is explained in connection with Krylov subspace methods and the matrix completion problem.
Walter Gander
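For context, Wynn's epsilon-algorithm accelerates a sequence (s_n) through the recurrence below, with the even-index columns providing the accelerated values; the talk's connection to Krylov subspace methods and matrix completion is not reproduced here.

```latex
\varepsilon^{(n)}_{-1} = 0, \qquad \varepsilon^{(n)}_{0} = s_n, \qquad
\varepsilon^{(n)}_{k+1} = \varepsilon^{(n+1)}_{k-1}
  + \frac{1}{\varepsilon^{(n+1)}_{k} - \varepsilon^{(n)}_{k}}
```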
39 Clustering Mixed-Attribute Data using Random Walk [abstract]
Abstract: Most clustering algorithms rely in some fundamental way on a measure of either similarity or distance, either between objects themselves, or between objects and cluster centroids. When the dataset contains mixed attributes, defining a suitable measure can be problematic. This paper presents a general graph-based method for clustering mixed-attribute datasets that does not require any explicit measure of similarity or distance. Empirical results on a range of well-known datasets using a range of evaluation measures show that the method achieves performance competitive with traditional clustering algorithms that require explicit calculation of distance or similarity, as well as with more recently proposed clustering algorithms based on matrix factorization.
Andrew Skabar
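One simple way to realize the graph-based idea without an explicit distance is sketched below: objects and attribute values become nodes of one graph, and random-walk proximity plays the role of similarity. The construction and the use of personalized PageRank as the walk score are assumptions for the example, not the paper's exact scheme.

```python
import networkx as nx

def build_graph(records):
    """records: list of dicts mixing categorical and (binned) numeric fields."""
    G = nx.Graph()
    for i, rec in enumerate(records):
        obj = ("obj", i)
        for attr, val in rec.items():
            G.add_edge(obj, (attr, val))       # object <-> attribute-value node
    return G

records = [{"color": "red", "size": "S"}, {"color": "red", "size": "M"}]
G = build_graph(records)

# Random-walk scores from object 0, usable as one row of a similarity matrix
seed = {n: (1.0 if n == ("obj", 0) else 0.0) for n in G}
scores = nx.pagerank(G, personalization=seed)
print(scores[("obj", 1)])   # proximity of object 1 to object 0
```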
274 Regularized Computation of Oscillatory Integrals with Stationary Points [abstract]
Abstract: The ability to calculate integrals of rapidly oscillating functions is crucial for solving many problems in optics, electrodynamics, quantum mechanics, nuclear physics, and many other areas. This article considers a method of computing oscillatory integrals based on transforming the problem into the numerical solution of a system of ordinary differential equations. Using Levin's collocation method, we reduce the problem to solving a system of linear algebraic equations. In the case where the phase function has stationary points (its derivative vanishes on the interval of integration), solving the corresponding system becomes an ill-posed task. The regularized algorithm presented in the article provides a stable method for integrating rapidly oscillating functions in the presence of stationary points. The performance and high accuracy of the algorithms are illustrated by various examples.
Konstantin P. Lovetskiy, Leonid A. Sevastianov and Nikolai Ed. Nikolaev
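As background, Levin's method rewrites the oscillatory integral as a boundary-term evaluation: one seeks a slowly varying function F satisfying a first-order ODE, so that no oscillatory quadrature is needed. Collocating F in a polynomial basis yields the linear system mentioned in the abstract, which becomes ill-conditioned exactly where g'(x) = 0, the stationary-point case the paper regularizes.

```latex
I = \int_a^b f(x)\, e^{i\omega g(x)}\, dx, \qquad
F'(x) + i\,\omega\, g'(x)\, F(x) = f(x)
\;\Longrightarrow\;
I = F(b)\, e^{i\omega g(b)} - F(a)\, e^{i\omega g(a)}
```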
536 Optimizing the SVD Bidiagonalization Process for a Batch of Small Matrices [abstract]
Abstract: A challenging class of problems arising in many GPU applications, called batched problems, involves linear algebra operations on many small matrices. We designed batched BLAS (Basic Linear Algebra Subroutines) routines, in particular the Level-2 BLAS GEMV and Level-3 BLAS GEMM routines, to solve them. We propose device functions and big-tile settings in our batched BLAS design, and adopt auto-tuning to optimize different instances of the GEMV routines. We illustrate our batched BLAS approach by progressively optimizing batched bidiagonalization on a K40c GPU. The optimization techniques in this paper are applicable to other two-sided factorizations as well.
Tingxing Dong, Azzam Haidar, Stanimire Tomov and Jack Dongarra
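Conceptually, a batched GEMM applies the same small matrix multiply across thousands of independent problems in one call, which is what the GPU routines described above implement; the NumPy broadcasting below only illustrates the interface idea, not the CUDA-level design.

```python
import numpy as np

batch, m, k, n = 10000, 16, 16, 16
A = np.random.rand(batch, m, k)
B = np.random.rand(batch, k, n)
C = A @ B          # one batched call instead of 10000 tiny GEMM launches
print(C.shape)     # (10000, 16, 16)
```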