### Modeling and Simulation of Large-scale Complex Urban Systems (MASCUS) Session 1

#### Chair: Heiko Aydt

707 Cellular Automata-based Anthropogenic Heat Simulation

Abstract: Cellular automata (CA) models have been employed for several years to describe urban phenomena such as the growth of human settlements, changes in land use and, more recently, the dispersion of air pollutants. We propose to adapt CA to study the dispersion of anthropogenic heat emissions on the micro scale. Three-dimensional cubic CA with a constant cell size of 0.15 m have been implemented. Simulations suggest an improvement in processing speed compared to conventional computational fluid dynamics (CFD) models, which are limited in scale and not yet capable of handling simulations at the local or larger scale. Instead of solving the Navier-Stokes equations, as in CFD, only temperature and heat differences are modeled for the CA. Radiation, convection and turbulence have been parameterized according to scale. This CA-based approach can be combined with an agent-based traffic simulation to analyse the effect of driving behavior and other microscopic factors on urban heat.

Michael Wagner, Vaisagh Viswanathan, Dominik Pelzer, Matthias Berger, Heiko Aydt

128 Measuring Variability of Mobility Patterns from Multiday Smart-card Data

Abstract: The large amount of available mobility data stimulates work on discovering patterns and understanding regularities. Comparatively less attention has been paid to the study of variability, although previous related work has argued that it is as important as regularity, since variability identifies diversity. In a transport network, variability exists from day to day, from person to person, and from place to place. In this paper, we present a set of measures of variability at individual and aggregated levels using multi-day smart-card data. Statistical analysis, correlation matrices and network-based clustering are applied, and the potential use of the measured results for urban applications is discussed. We take Singapore as a case study and use one week of smart-card data for analysis. An interesting finding is that although the number of trips and the mobility patterns vary from day to day, the overall spatial structure of urban movement remains the same throughout the whole week. We consider this paper a tentative step towards a generic framework for measuring regularity and variability, which contributes to the understanding of transit, social and urban dynamics.

Chen Zhong, Ed Manley, Michael Batty and Gerhard Schmitt

500 The Resilience of the Encounter Network of Commuters for a Metropolitan Public Bus System

Abstract: We analyse the structure and resilience of a massive encounter network generated from commuters who share the same bus ride on a single day. The network is created using smart-card data that contains detailed travel information for all the commuters who used the public bus system during a typical weekday in the whole of Singapore. We show that the network is of the random-exponential type with small-world features rather than scale-free. Within one day, 99.97% of all commuters became connected within approximately 7 steps of each other. We report on how this network structure changes when a threshold based on the encounter duration (TE) is applied. Among other results, we demonstrate a 50% reduction in the size of the giant cluster when TE = 15 min. We then assess the dynamics of infection spreading by comparing the effect of random and targeted node removal strategies. Assuming that the network characteristics are invariant from day to day, our simulation indicates that without node removal, 99% of the commuter network becomes infected within 7 days of the onset of infection. While a targeted removal strategy was shown to delay the onset of the maximum number of infected individuals, it was not able to isolate nodes that remained within the giant component.

Muhamad Azfar Ramli, Christopher Monterola

84 Facilitating model reuse and integration in an urban energy simulation platform

Abstract: The need for more sustainable, liveable and resilient cities demands improved methods for studying urban infrastructures as integrated wholes. Progress in this direction would be aided by the ability to effectively reuse and integrate existing computational models of urban systems. Building on the concept of multi-model ecologies, this paper describes ongoing efforts to facilitate model reuse and integration in the Holistic Urban Energy Simulation (HUES) platform, an extendable simulation environment for the study of urban multi-energy systems. We describe the design and development of a semantic wiki as part of the HUES platform. The purpose of this wiki is to enable the sharing and navigation of model metadata, that is, essential information about the models and datasets of the platform. Each model and dataset in the platform is represented in the wiki in a structured way to facilitate the identification of opportunities for model reuse and integration. As the platform grows, this will help to ensure that it develops coherently and makes efficient use of existing formalized knowledge. We present the core concepts of multi-model ecologies and semantic wikis, the current state of the platform and associated wiki, and a case study demonstrating their use and benefit.

Lynn Andrew Bollinger, Ralph Evins
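
As an illustration of the encounter-duration threshold described in paper 500 above, the following minimal Python sketch builds a toy encounter network and measures its largest connected component before and after short encounters are filtered out. The function name `giant_component_size`, the `(commuter, commuter, minutes)` tuple layout and the toy data are assumptions made for this example; the abstract does not describe the authors' actual pipeline.

```python
from collections import Counter


def giant_component_size(encounters, min_duration):
    """Return the size of the largest connected component when only
    encounters lasting at least `min_duration` minutes count as links.

    `encounters` is an iterable of (commuter_a, commuter_b, minutes).
    """
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, minutes in encounters:
        # Register both commuters even if the link is filtered out,
        # so isolated commuters still count as components of size 1.
        root_a, root_b = find(a), find(b)
        if minutes >= min_duration and root_a != root_b:
            parent[root_a] = root_b

    sizes = Counter(find(x) for x in list(parent))
    return max(sizes.values(), default=0)


# Hypothetical toy data: five commuters, four shared rides.
toy_encounters = [
    ("a", "b", 20),
    ("b", "c", 5),
    ("c", "d", 30),
    ("d", "e", 16),
]

print(giant_component_size(toy_encounters, 0))
print(giant_component_size(toy_encounters, 15))
```

With this toy data, raising the threshold from 0 to 15 minutes removes the short `b`-`c` encounter and splits the single five-commuter cluster into two, the larger of which has three members.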

### Modeling and Simulation of Large-scale Complex Urban Systems (MASCUS) Session 2

#### Chair: Heiko Aydt

555 Reducing Computation Time with a Rolling Horizon Approach Applied to a MILP Formulation of Multiple Urban Energy Hub System

Abstract: The energy hub model is a powerful concept that allows the interactions of many energy conversion and storage systems to be optimized. Solving for the optimal configuration and operating strategy of an energy hub combining multiple energy sources for a whole year can become computationally demanding. Indeed, the effort to solve a mixed-integer linear programming (MILP) problem grows dramatically with the number of integer variables. This paper presents a rolling horizon approach applied to the optimisation of the operating strategy of an energy hub. The focus is on the computational time saved by applying a rolling horizon methodology to solve problems over many time periods. The choice of rolling horizon parameters is addressed, and the approach is applied to a model consisting of multiple energy hubs. This work highlights the potential to reduce the computational burden of simulating detailed optimal operating strategies without using typical-period representations. Results demonstrate that the computational time required to solve energy optimisation problems can be reduced by a factor of 15 to 100 without affecting the quality of the results.

Julien F. Marquant, Ralph Evins, Jan Carmeliet

307 Economic, Climate Change, and Air Quality Analysis of Distributed Energy Resource Systems

Abstract: This paper presents an optimisation model and cost-benefit analysis framework for quantifying the economic, climate change, and air quality impacts of installing a distributed energy resource system in the area surrounding Paddington train station in London, England. A mixed-integer linear programming model, called the Distributed Energy Network Optimisation (DENO) model, is employed to design the optimal energy system for the district. DENO is then integrated into a cost-benefit analysis framework that determines the monetised climate change and air quality impacts of the optimal energy systems for different technology scenarios, in order to determine their overall economic and environmental impacts.

Akomeno Omu, Adam Rysanek, Marc Stettler, Ruchi Choudhary

616 Towards a Design Support System for Urban Walkability

Abstract: In this paper we present an urban design support tool centered on pedestrian accessibility and the walkability of places. Unlike standard decision support systems, which are developed to evaluate given, pre-defined urban projects and designs, we address the inverse problem: having the software system itself generate hypotheses of projects and designs, given some user-provided objectives and constraints. Taking a model for evaluating walkability as a starting point, we construct a variant of a multi-objective genetic algorithm (specifically NSGA-II) to produce the frontier of non-dominated design alternatives that satisfy certain predefined constraints. By way of example, we briefly present an application of the system to a real urban area.

Ivan Blecic, Arnaldo Cecchini, Giuseppe A. Trunfio
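
The "frontier of non-dominated design alternatives" mentioned in paper 616 can be illustrated with a small Python sketch. It shows only the dominance test and a brute-force filter, not NSGA-II itself (which adds ranked non-dominated sorting, crowding distance and genetic operators). The function names, the two-objective tuples and the sample designs are hypothetical, and both objectives are assumed to be minimised.

```python
def dominates(a, b):
    """True if design `a` is no worse than `b` in every objective and
    strictly better in at least one (all objectives are minimised)."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def non_dominated(points):
    """Return the points that no other point dominates (O(n^2) filter)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical designs scored on two objectives to minimise,
# e.g. (cost, walking time); the values are made up for illustration.
designs = [(1.0, 5.0), (2.0, 3.0), (3.0, 4.0), (4.0, 1.0), (2.0, 3.5)]

front = non_dominated(designs)
print(front)
```

Here `(3.0, 4.0)` and `(2.0, 3.5)` are both dominated by `(2.0, 3.0)`, so the remaining three designs form the frontier.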

### Solving Problems with Uncertainties (SPU) Session 1

#### Chair: Vassil Alexandrov

455 An individual-centric probabilistic extension for OWL: Modelling the Uncertainness

Abstract: The theoretical benefits of semantics, as well as their potential impact on IT, are well-known concepts that have been extensively discussed in the literature. As more and more systems use or refer to semantic technologies, the challenging third version of the web (Semantic Web or Web 3.0) is progressively taking shape. On the other hand, apart from the relatively limited expressiveness of current concrete semantic technologies, theoretical models and research prototypes overlook a significant number of practical issues including, among others, consolidated mechanisms to manage and maintain vocabularies, shared notation systems and support for large-scale systems (Big Data). Focusing on the OWL model as the current reference technology for specifying web semantics, in this paper we discuss the problem of approaching knowledge engineering exclusively according to a deterministic model, excluding a priori any kind of probabilistic semantics. As a result of these limitations, most knowledge ecosystems that include probabilistic information at some level are not well suited to OWL environments. Therefore, despite the great potential of OWL, a considerable number of applications still use more classic data models or unnatural hybrid environments. But OWL, even with its intrinsic limitations, is a model flexible enough to support extensions and integrations. In this work we propose a simple statistical extension of the model that can significantly extend the expressiveness and the scope of OWL.

Salvatore Flavio Pileggi

457 Relieving Uncertainty in Forest Fire Spread Prediction by Exploiting Multicore Architectures

Abstract: The most important aspect affecting the reliability of environmental simulations is the uncertainty in the parameter settings that describe the environmental conditions, which may introduce significant biases between simulation and reality. To relieve such arbitrariness, a two-stage prediction method was developed, based on the adjustment of the input parameters according to the real observed evolution. This method enhances the quality of the predictions, but it is very demanding in terms of the time and computational resources needed. In this work, we describe a methodology developed for response time assessment in the case of fire spread prediction, based on evolutionary computation. In addition, one of the most important fire spread simulators, FARSITE, was parallelized to take advantage of multicore architectures. This allows us to design proper allocation policies that significantly reduce simulation time and reach successful predictions much faster. A multi-platform performance study is reported to analyze the benefits of the methodology.

Andrés Cencerrado, Tomàs Vivancos, Ana Cortés, Tomàs Margalef

723 Populations of models, Experimental Designs and coverage of parameter space by Latin Hypercube and Orthogonal Sampling

Abstract: In this paper we have used simulations to make a conjecture about the coverage of a $t$ dimensional subspace of a $d$ dimensional parameter space of size $n$ when performing $k$ trials of Latin Hypercube sampling. This takes the form $P(k,n,d,t)=1-e^{-k/n^{t-1}}$. We suggest that this coverage formula is independent of $d$ and this allows us to make connections between building Populations of Models and Experimental Designs. We also show that Orthogonal sampling is superior to Latin Hypercube sampling in terms of allowing a more uniform coverage of the $t$ dimensional subspace at the sub-block size level.

Bevan Thompson, Kevin Burrage, Pamela Burrage, Diane Donovan

340 Analysis of Space-Time Structures Appearance for Non-Stationary CFD Problems

Abstract: The paper presents a combined approach to finding the conditions under which space-time structures appear in non-stationary flows in CFD (computational fluid dynamics) problems. We consider different types of space-time structures, such as boundary layer separation, the appearance of vortex zones, the appearance of oscillating regimes, and the transition from Mach reflection to regular reflection for shock waves. The approach combines numerical solutions of inverse problems with parametric studies. Parallel numerical solutions are implemented. The approach is intended for fast approximate estimation of the dependence of unsteady flow structures on characteristic parameters (or determining parameters) in a certain class of problems. The numerical results are presented in the form of multidimensional data volumes. To uncover hidden dependencies in these volumes, multidimensional data processing and visualization methods should be applied. The approach is organized in a pipeline fashion. For certain classes of problems, the approach yields the sought-for dependence in a quasi-analytical form. The proposed approach can be regarded as a kind of generalized numerical experiment environment. Examples of its application to a series of practical problems are given. The approach can be applied to CFD problems with ambiguities.

Alexander Bondarev, Vladimir Galaktionov
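
The conjectured Latin Hypercube coverage formula from paper 723, $P(k,n,d,t)=1-e^{-k/n^{t-1}}$, is easy to evaluate numerically. The sketch below is a plain transcription of that expression together with its algebraic inverse, $k = -n^{t-1}\ln(1-P)$. The function names and the sample values are assumptions for illustration; $d$ is omitted because the abstract suggests the coverage is independent of it.

```python
import math


def lhs_coverage(k, n, t):
    """Conjectured coverage P = 1 - exp(-k / n**(t - 1)) of a
    t-dimensional subspace after k Latin Hypercube trials of size n."""
    return 1.0 - math.exp(-k / n ** (t - 1))


def trials_for_coverage(target, n, t):
    """Smallest integer k whose conjectured coverage reaches `target`,
    obtained by inverting the formula: k = -n**(t - 1) * ln(1 - target)."""
    if not 0.0 <= target < 1.0:
        raise ValueError("target must lie in [0, 1)")
    return math.ceil(-(n ** (t - 1)) * math.log(1.0 - target))


# Illustrative values only: size n = 10, two-dimensional subspace.
print(lhs_coverage(10, 10, 2))
print(trials_for_coverage(0.95, 10, 2))
```

For $n = 10$ and $t = 2$, ten trials give a conjectured coverage of $1 - e^{-1}$, and reaching 95% coverage takes 30 trials under the same formula.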

### Solving Problems with Uncertainties (SPU) Session 2

#### Chair: Vassil Alexandrov

509 Discovering most significant news using Network Science approach

Abstract: The role of social networks as mass media has increased greatly in recent years. We investigate news publications on Twitter from the point of view of Network Science. We analyzed news data posted by the most popular media sources to reveal the most significant news over a given period of time. Significance is a qualitative property that reflects the degree of impact of news on society and public opinion. We define a threshold of significance and identify a number of news items that were significant for society in the period from July 2014 to January 2015.

Ilya Blokh, Vassil Alexandrov

713 Towards Understanding Uncertainty in Cloud Computing Resource Provisioning

Abstract: In spite of extensive research on uncertainty issues in fields ranging from computational biology to decision making in economics, the study of uncertainty in cloud computing systems is limited. Most works examine uncertainty in users' perceptions of the qualities, intentions and actions of cloud providers, privacy, security and availability. But the role of uncertainty in resource and service provisioning, programming models, etc. has not yet been adequately addressed in the scientific literature. There are numerous types of uncertainty associated with cloud computing, and one should account for them when assessing efficient service provisioning. In this paper, we tackle the research question: what is the role of uncertainty in cloud computing service and resource provisioning? We review the main sources of uncertainty and the fundamental approaches to scheduling under uncertainty, such as reactive, stochastic, fuzzy and robust scheduling. We also discuss the potential of these approaches for scheduling cloud computing activities under uncertainty, and address methods for mitigating job execution time uncertainty in resource provisioning.

Andrei Tchernykh, Uwe Schwiegelsohn, Vassil Alexandrov, El-Ghazali Talbi

507 Monte Carlo method for density reconstruction based on insufficient data

Abstract: In this work we consider the problem of reconstructing an unknown density from a given sample. We present a method for density reconstruction that combines B-spline approximation, the least squares method and the Monte Carlo method for computing integrals. An error analysis is provided. The method is compared numerically with other statistical methods for density estimation and shows very promising results.

Aneta Karaivanova, Sofiya Ivanovska, Todor Gurov

20 Total Least Squares and Chebyshev Norm

Abstract: We investigate the total least squares problem with the Chebyshev norm instead of the traditionally used Frobenius norm. The use of the Chebyshev norm is motivated by the search for robust solutions. In order to solve the problem, we establish a link with interval computation and use many of the results developed there. We show that the problem is NP-hard in general, but that it becomes polynomial in the case of a fixed number of regressors. This is the most important result for practice, since we usually work with regression models with a low number of regression parameters (compared to the number of observations). We present not only a precise algorithm for the problem, but also a computationally cheap heuristic. We illustrate the behavior of our method in a particular probabilistic setup by a simulation study.

Milan Hladik, Michal Cerny

### Mathematical Methods and Algorithms for Extreme Scale (MMAES) Session 1

#### Chair: Vassil Alexandrov

127 Efficient Algorithm for Computing the Ergodic Projector of Markov Multi-Chains

Abstract: This paper extends the Markov uni-chain series expansion theory to Markov multi-chains, i.e., to Markov chains having multiple ergodic classes and possible transient states. The introduced series expansion approximation (SEA) provides a controllable approximation for Markov multi-chain ergodic projectors, which may be a useful tool in large-scale network analysis. As we illustrate by means of numerical examples, the new algorithm is faster than the power algorithm for large networks.

Joost Berkhout, Bernd Heidergott

376 Transmathematical Basis of Infinitely Scalable Pipeline Machines

Abstract: A current Grand Challenge is to scale high-performance machines up to exascale. Here we take the theoretical approach of setting out the mathematical basis of pipeline machines that are infinitely scalable, whence any particular scale can be achieved as technology allows. We briefly discuss both hardware and software simulations of such a machine, which lead us to believe that exascale is technologically achievable now. The efficiency of von Neumann machines declines with increasing size, but our pipeline machines retain constant efficiency regardless of size. These machines have perfect parallelism in the sense that every instruction of an inline program is executed, on successive data, on every clock tick. Furthermore, programs with shared data effectively execute in less than a clock tick. We show that pipeline machines are faster than single- or multi-core von Neumann machines for sufficiently many program runs of a sufficiently time-consuming program. Our pipeline machines exploit the totality of transreal arithmetic and the known waiting time of statically compiled programs to deliver the interesting property that they need no hardware or software exception handling.

James Anderson

420 Multilevel Communication optimal Least Squares

Abstract: Using a recently proposed communication-optimal variant of TSQR, the weak scalability of the least squares (LS) solver with multiple right-hand sides is studied. The communication for the TSQR-based LS solver with multiple right-hand sides remains optimal in the sense that no additional messages are necessary compared to TSQR. However, LS has additional communication volume and flops compared to TSQR. The additional flops and words sent for LS are derived. A PGAS model, namely the global address space programming framework (GPI), is used for inter-nodal one-sided communication. Within NUMA sockets, the C++11 threading model is used. Scalability results for the proposed method on up to a few thousand cores are shown.

Pawan Kumar

406 Developing A Large Time Step, Robust, and Low Communication Multi-Moment PDE Integration Scheme for Exascale Applications

Abstract: The Boundary Averaged Multi-moment Constrained finite-Volume (BA-MCV) method is derived, explained, and evaluated for 1-D transport to assess accuracy, maximum stable time step (MSTS), oscillations for discontinuous data, and parallel communication burden. The BA-MCV scheme is altered from the original MCV scheme to compute the updates of pointwise cell boundary derivatives entirely locally. It is then altered such that boundary moments are replaced with the interface upwind value. The scheme is stable at a maximum stable CFL (MSCFL) value of one no matter how high-order the scheme is, giving significantly larger time steps than Galerkin methods, for which the MSCFL decreases nearly quadratically with increasing order. The BA-MCV method is compared against an SE method at varying order, both using the ADER-DT time discretization. The BA-MCV error for a sine wave was comparable to that of an SE method of the same order of accuracy. The resulting large time step, multi-moment, low communication scheme is well suited for exascale architectures.

Matthew Norman
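
To make the time-step argument in paper 406 concrete: for 1-D transport at speed $u$ on a grid of spacing $\Delta x$, the CFL number is $|u|\,\Delta t/\Delta x$, so a maximum stable CFL value directly bounds the time step. The Python sketch below computes that bound. The helper name `max_stable_dt`, the grid spacing and the wind speed are assumptions for this example, and the CFL limit of 0.25 used for comparison is purely illustrative, not a figure from the abstract.

```python
def max_stable_dt(cfl_max, dx, speed):
    """Largest time step with |speed| * dt / dx <= cfl_max (1-D transport)."""
    if speed == 0:
        raise ValueError("speed must be non-zero for a finite time-step bound")
    return cfl_max * dx / abs(speed)


dx = 0.1      # hypothetical grid spacing
speed = 2.0   # hypothetical transport speed

# A scheme stable up to CFL = 1, as the abstract reports for BA-MCV.
dt_cfl_one = max_stable_dt(1.0, dx, speed)

# A scheme with a tighter limit; 0.25 is an illustrative value only.
dt_cfl_quarter = max_stable_dt(0.25, dx, speed)

print(dt_cfl_one, dt_cfl_quarter)
```

Because the bound is linear in the CFL limit, a scheme whose limit stays at one can take proportionally larger steps than one whose limit shrinks as the order increases.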

### Urgent Computing - Computations for Decision Support in Critical Situations (UC) Session 1

#### Chair: Alexander Boukhanovsky

728 Computational uncertainty management for coastal flood prevention system

Abstract: Multivariate and progressive uncertainty is the main factor affecting accuracy in simulation systems. It can be a critical issue for systems that forecast and prevent extreme events and related risks. To deal with this problem, computational uncertainty management strategies should be used. This paper aims to demonstrate an adaptation of the computational uncertainty management strategy in the framework of a system for the prediction and prevention of natural disasters such as coastal floods. The main goal of the chosen strategy is to highlight the most significant paths of uncertainty propagation and to pair blocks of action with procedures for the reduction or evaluation of uncertainty in a way that captures the major part of the model error. Blocks of action involve several procedures: calibration of models, data assimilation, ensemble forecasts, and various techniques for residual uncertainty evaluation (including risk evaluation). The strategy described in this paper was tested and validated in a case study of the coastal flood prevention system in St. Petersburg.

Anna Kalyuzhnaya, Alexander Boukhanovsky

731 Computational uncertainty management for coastal flood prevention system. Part II: Diversity analysis

Abstract: Surge floods in Saint-Petersburg are related to extreme natural phenomena of rare repeatability. Much work has been devoted to the problems that arise during maintenance of the flood prevention facility complex in Saint-Petersburg. However, many research questions connected with similar extreme events in the Baltic Sea remain open. In this work, to reconstruct surge floods of rare repeatability, two approaches were combined: one based on statistical multidimensional extremum analysis and one based on synthetic surge floods. A synthetic storm model that takes multidimensional probability distributions from Reanalysis was developed, and synthetic cyclone generation was proposed for its implementation.

Anna Kalyuzhnaya, Denis Nasonov, Alexander Visheratin, Alexey Dudko and Alexander Boukhanovsky

517 SIM-CITY: an e-Science framework for urban assisted decision support

Abstract: Urban areas are characterised by high population densities and the resulting complex social dynamics. For urban planners to evaluate, analyse, and predict complex urban dynamics, many scenarios and a large parameter space must be explored. In urban disasters, complex situations must be assessed at short notice. We propose the concept of an assisted decision support system to aid in these situations. The system interactively runs a scenario exploration, which evaluates scenarios and optimizes for desired properties. We introduce the SIM-CITY architecture to run such interactive scenario explorations and highlight a use case for the architecture: an urban fire emergency response simulation in Bangalore.

Joris Borgdorff, Harsha Krishna, Michael H. Lees

297 Towards a general definition of Urgent Computing

Abstract: Numerical simulations of urgent events, e.g. tsunamis, storms and flash floods, must be completed within a stipulated deadline. The simulation results are needed by the relevant authorities to make timely, educated decisions that mitigate financial losses, manage affected areas and reduce casualties. The existing definition of urgent computing is too specific to its usage context and thus restricts the identification of urgent use cases and the general application of urgent computing. We aim to extend and refine the existing definition and provide a comprehensive general definition of urgent computing. This general definition will aid in the identification of urgent computing's unique challenges and thus demonstrate the need for innovative multi-disciplinary solutions to address these challenges.

Siew Hoon Leong, Dieter Kranzlmüller

375 Combining Data-driven Methods with Finite Element Analysis for Flood Early Warning Systems

Abstract: We developed a robust approach for real-time levee condition monitoring based on a combination of data-driven methods (one-side classification) and finite element analysis. It was implemented within a flood early warning system and validated on a series of full-scale levee failure experiments organised by the IJkdijk consortium in August-September 2012 in the Netherlands. Our approach detected anomalies and predicted levee failures several days before the actual collapse. This approach was used in the UrbanFlood decision support system for routine levee quality assessment and for critical situations of a potential levee breach and inundation. In case of emergency, the system generates an alarm, warns dike managers and city authorities, and launches advanced urgent simulations of levee stability and flood dynamics, thus helping to make informed decisions on preventive measures, to evaluate the risks and to alleviate the adverse effects of a flood.

A.L. Pyayt, D.V. Shevchenko, A.P. Kozionov, I.I. Mokhov, B. Lang, V.V. Krzhizhanovskaya, P.M.A. Sloot

### Urgent Computing - Computations for Decision Support in Critical Situations (UC) Session 2

#### Chair: Alexander Boukhanovsky

725 Evolutionary replicative data reorganization with prioritization for efficient workload processing

Abstract: The importance of data collection, processing and analysis is growing tremendously. Big Data technologies are in high demand in different areas, including bio-informatics, hydrometeorology and high energy physics. One of the most popular computation paradigms used in large data processing frameworks is the MapReduce programming model. Today, integrated optimization mechanisms that take into account only load balance and fast, simple execution are not enough for advanced computations, and more efficient, complex approaches are needed. In this paper, we suggest an improved algorithm based on categorization for data reorganization in MapReduce frameworks using replication and network aspects. Moreover, for urgent computations that require a specific approach, customizable prioritization is introduced.

Denis Nasonov, Anton Spivak, Andrew Razumovskiy, Anton Myagkov

727 Multiscale agent-based simulation in large city areas: emergency evacuation use case

Abstract: Complex phenomena are increasingly attracting the interest of researchers from various branches of computational science. So far, this interest has driven demand not only for more sophisticated autonomous models, but also for mechanisms that link them. This paper presents a multiscale agent-based modelling and simulation technique based on the incorporation of multiple modules. Two key principles are presented as guiding such an integration: a common abstract space, in which entities of different models interact, and commonly controlled agents, which are abstract actors operating in a common space that can be handled by different agent-based models. The proposed approach is evaluated through a series of experiments simulating emergency evacuation from a cinema building to the city streets, where the building and street levels are reproduced in heterogeneous models.

Vladislav Karbovskii, Daniil Voloshin, Andrey Karsakov, Alexey Bezgodov, Aleksandr Zagarskikh

550 Execution management and efficient resource provisioning for flood decision support

Abstract: We present a resource provisioning and execution management solution for a flood decision support system. The system, developed within the ISMOP project, features an urgent computing scenario in which flood threat assessment for large sections of levees is requested within a specified deadline. Unlike typical decision support systems, which use heavyweight simulations to predict the possible course of an emergency, in ISMOP we employ an alternative approach based on the 'scenario identification' method. We show that this approach is a particularly good fit for the resource provisioning model of IaaS Clouds. We describe the architecture of the ISMOP decision support system, focusing on the urgent computing scenario and its formal resource provisioning model. Preliminary results of experiments performed to calibrate and validate the model indicate that the model fits the experimental data.

Bartosz Balis, Marek Kasztelnik, Maciej Malawski, Piotr Nowakowski, Bartosz Wilk, Maciej Pawlik, Marian Bubak

726 Holistic approach to urgent computing for flood decision support

Abstract: This paper presents the concept of a holistic approach to urgent computing, which extends resource management in emergency situations from computational resources to the Data Acquisition and Preprocessing System. The layered structure of this system is presented in detail, and its rearrangement in case of emergency is proposed. This process is harmonised with large-scale computation using an Urgent Service Profile. The proposed approach was validated by practical work performed under the ISMOP project. Concrete examples of Urgent Service Profile definitions are discussed. Results of preliminary experiments related to energy management and data transmission optimization in case of emergency are presented.

Robert Brzoza-Woch, Marek Konieczny, Bartosz Kwolek, Piotr Nawrocki, Tomasz Szydło, Krzysztof Zieliński

327 3D simulation system to support the planning of rescue operations on damaged ships

Abstract: The paper describes a software system for simulating ship motions in a crisis situation. The scenario consists of a damaged ship subjected to wave excitation forces generated by a random sea based on a real wave spectrum. The simulation is displayed in an interactive Virtual Environment that allows the ship motions to be visualized. The numerical simulation of the sea surface and ship motions requires intensive computation to sustain real-time or even fast-forward simulation, which are the only modes of interest in these situations. Dedicated tools for analysing the ship's behaviour over time are also described. The system can be used to evaluate the responses of the ship to the current sea state, namely the amplitude, variations and tendencies of ship motions, and to help in the planning and coordination of rescue operations.

Jose Varela, José Miguel Rodrigues, Carlos Guedes Soares

### Numerical and computational developments to advance multi-scale Earth System Models (MSESM) Session 2

#### Chair: K.J. Evans

 97 On the scalability of the Albany/FELIX first-order Stokes approximation ice sheet solver for large-scale simulations of the Greenland and Antarctic ice sheets [abstract]Abstract: We examine the scalability of the recently developed Albany/FELIX finite-element based code for the first-order Stokes momentum balance equations for ice flow [1]. We focus our analysis on the performance of two possible preconditioners for the iterative solution of the sparse linear systems, which arise from the discretization of the governing equations: (1) a preconditioner based on the incomplete LU (ILU) factorization, and (2) a recently-developed algebraic multi-level (ML) preconditioner, constructed using the idea of semi-coarsening. A strong scalability study on a realistic, high resolution Greenland ice sheet problem reveals that, for a given number of processor cores, the ML preconditioner results in faster linear solve times but the ILU preconditioner exhibits better scalability. A weak scalability study is performed on a realistic, moderate resolution Antarctic ice sheet problem, a substantial fraction of which contains floating ice shelves, making it fundamentally different from the Greenland ice sheet problem. Here, we show that as the problem size increases, the performance of the ILU preconditioner deteriorates whereas the ML preconditioner maintains scalability. This is because the linear systems are extremely ill-conditioned in the presence of floating ice shelves, and the ill-conditioning has a greater negative effect on the ILU preconditioner than on the ML preconditioner. [1] I. Kalashnikova, M. Perego, A. Salinger, R. Tuminaro, and S. Price. Albany/FELIX: A parallel, scalable and robust finite element higher-order stokes ice sheet solver built for advance analysis. Geosci. Model Develop. Discuss., 7:8079-8149, 2014. 
Irina Kalashnikova, Raymond Tuminaro, Mauro Perego, Andrew Salinger, Stephen Price 145 On the Use of Finite Difference Matrix-Vector Products in Newton-Krylov Solvers for Implicit Climate Dynamics with Spectral Elements [abstract]Abstract: Efficient solutions of global climate models require effectively handling disparate length and time scales. Implicit solution approaches allow time integration of the physical system with a step size governed by accuracy of the processes of interest rather than by stability of the fastest time scales present. Implicit approaches, however, require the solution of nonlinear systems within each time step. Usually, Newton's method is applied to solve these systems. Each iteration of Newton's method, in turn, requires the solution of a linear model of the nonlinear system. This model employs the Jacobian of the problem-defining nonlinear residual, but this Jacobian can be costly to form. If a Krylov linear solver is used for the solution of the linear system, the action of the Jacobian matrix on a given vector is required. In the case of spectral element methods, the Jacobian is not calculated but only implemented through matrix-vector products. The matrix-vector multiply can also be approximated by a finite difference approximation, which may introduce inaccuracy in the overall nonlinear solver. In this paper, we review the advantages and disadvantages of finite difference approximations of these matrix-vector products for climate dynamics within the spectral element shallow water dynamical core of the Community Atmosphere Model (CAM). Carol Woodward, David Gardner, Katherine Evans 503 Accelerating Time Integration for Climate Modeling Using GPUs [abstract]Abstract: The push towards larger and larger computational platforms has made it possible for climate simulations to resolve climate dynamics across multiple spatial and temporal scales. 
This direction in climate simulation has created a strong need to develop scalable time stepping methods capable of accelerating throughput on high performance computing. This work details the recent advances in the implementation of implicit time stepping of the spectral element dynamical core within the United States Department of Energy (DOE) Accelerated Climate Model for Energy (ACME) on graphics processing unit (GPU) based machines. We demonstrate how solvers in the Trilinos project are interfaced with ACME and GPU kernels to increase computational speed of the residual calculations in the implicit time stepping method for the atmosphere dynamics. We show the optimization gains and data structure reorganization that facilitate the performance improvements. Rick Archibald, Katherine Evans, Andrew Salinger 543 A Time-Split Discontinuous Galerkin Transport Scheme for Global Atmospheric Model [abstract]Abstract: A time-split transport scheme has been developed for the high-order multiscale atmospheric model (HOMAM). The spatial discretization of HOMAM is based on the discontinuous Galerkin method, combining the 2D horizontal elements on the cubed-sphere surface and 1D vertical elements in a terrain-following height-based coordinate. The accuracy of the time-splitting scheme is tested with a set of new benchmark 3D advection problems. The split time-integrators are based on the Strang-type operator-split method. The convergence of standard error norms shows second-order accuracy with the smooth scalar field, irrespective of the particular time-integrator. The results with the split scheme are comparable with those of the established models. Ram Nair, Lei Bao, Michael Toy
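
The Strang-type operator splitting named in the time-split transport abstract above can be illustrated on a toy problem. The sketch below is not the HOMAM scheme: it assumes a two-variable linear system (x' = v, v' = -x) whose two sub-operators can each be advanced exactly, and all function names are hypothetical. It compares a symmetric Strang step (half step of one operator, full step of the other, half step of the first) with a plain sequential (Lie) step and estimates the observed convergence order of each.

```python
import math

def flow_a(state, dt):
    # Exact flow of the first sub-problem: x' = v, v' = 0.
    x, v = state
    return (x + dt * v, v)

def flow_b(state, dt):
    # Exact flow of the second sub-problem: x' = 0, v' = -x.
    x, v = state
    return (x, v - dt * x)

def strang_step(state, dt):
    # Symmetric splitting: half step of A, full step of B, half step of A.
    return flow_a(flow_b(flow_a(state, dt / 2), dt), dt / 2)

def lie_step(state, dt):
    # Sequential splitting: full step of A followed by full step of B.
    return flow_b(flow_a(state, dt), dt)

def integrate(step, state, t_end, n_steps):
    # Advance the state from t = 0 to t_end with n_steps equal steps.
    dt = t_end / n_steps
    for _ in range(n_steps):
        state = step(state, dt)
    return state

def error(step, n_steps, t_end=1.0):
    # Distance to the exact solution x = cos(t), v = -sin(t) for x(0)=1, v(0)=0.
    x, v = integrate(step, (1.0, 0.0), t_end, n_steps)
    return math.hypot(x - math.cos(t_end), v + math.sin(t_end))

def observed_order(step, n_steps=64):
    # Halving the step size divides the error by about 2**order.
    return math.log2(error(step, n_steps) / error(step, 2 * n_steps))

if __name__ == "__main__":
    print("Strang order:", observed_order(strang_step))
    print("Lie order:   ", observed_order(lie_step))
```

On this toy system the Strang step should report an order close to 2 and the sequential step an order close to 1, which is the kind of second-order convergence check the abstract describes for its error norms.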

### Numerical and computational developments to advance multi-scale Earth System Models (MSESM) Session 3

#### Chair: K.J. Evans

 321 Analysis of ocean-atmosphere coupling algorithms: consistency and stability [abstract]Abstract: This paper is focused on the numerical and computational issues associated with ocean-atmosphere coupling. It is shown that the usual coupling methods do not provide the solution to the correct problem, but to an approximation of it, since they are equivalent to performing one single iteration of an iterative coupling method. The stability analysis of these ad-hoc methods is presented, and we motivate and propose the adaptation of a Schwarz domain decomposition method to ocean-atmosphere coupling to obtain a stable and consistent coupling method. Florian Lemarie, Eric Blayo, Laurent Debreu 658 Exploring the Effects of a High-Order Vertical Coordinate in a Non-Hydrostatic Global Model [abstract]Abstract: As atmospheric models are pushed towards non-hydrostatic resolutions, there is a growing need for new numerical discretizations that are accurate, robust and effective at these scales. In this paper we describe a new arbitrary-order staggered nodal finite-element method (SNFEM) vertical discretization motivated by the flux reconstruction formulation. The SNFEM formulation generalizes traditional second-order vertical discretizations, including Lorenz and Charney-Phillips discretizations, to arbitrary order-of-accuracy while preserving desirable properties such as energy conservation. Preliminary results from application of this method to an idealized baroclinic instability are given, demonstrating the effect of improvements in order of accuracy on the structure of the instability. Paul Ullrich, Jorge Guerra 494 High-Order / Low-Order Methods for Ocean Modeling [abstract]Abstract: We examine a High Order / Low Order (HOLO) approach for a z-level ocean model and show that the traditional semi-implicit and split-explicit methods, as well as a recent preconditioning strategy, can easily be cast in the framework of HOLO methods. 
The HOLO formulation admits an implicit-explicit method that is algorithmically scalable and second-order accurate, allowing timesteps much larger than the barotropic time scale. We show how HOLO approaches, in particular the implicit-explicit method, can provide a solid route for ocean simulation to heterogeneous computing and exascale environments. Chris Newman, Geoff Womeldorff, Luis Chacon, Dana Knoll 134 Aeras: A Next Generation Global Atmosphere Model [abstract]Abstract: Sandia National Laboratories is developing a new global atmosphere model named Aeras that is performance portable and supports the quantification of uncertainties. These next-generation capabilities are enabled by building Aeras on top of Albany, a code base that supports the rapid development of scientific application codes while leveraging Sandia's foundational mathematics and computer science packages in Trilinos and Dakota. Embedded uncertainty quantification is an original design capability of Albany, and performance portability is a recent upgrade for Albany. Other required features, such as shell-type elements, spectral elements, efficient explicit and semi-implicit time-stepping, transient sensitivity analysis, and concurrent ensembles, were not components of Albany as the project began, and have been (or are being) added by the Aeras team. We present early sensitivity analysis and performance portability results for the shallow water equations. William Spotz, Thomas Smith, Irina Demeshko, Jeffrey Fike

### Poster Track (POSTER) Session 1

#### Room: Solin 1st Floor

 506 Numerical modelling of pollutant propagation in Lake Baikal during the spring thermal bar [abstract]Abstract: In this paper, the phenomenon of the thermal bar in Lake Baikal and the propagation of pollutants from the Selenga River are studied with a nonhydrostatic mathematical model. An unsteady flow is simulated by solving numerically a system of thermal convection equations in the Boussinesq approximation using second-order implicit difference schemes in both space and time. To calculate the velocity and pressure fields in the model, an original procedure for buoyant flows, SIMPLED, which is a modification of the well-known Patankar and Spalding's SIMPLE algorithm, has been developed. The simulation results have shown that the thermal bar plays a key role in propagation of pollution in the area of Selenga River inflow into Lake Baikal. Bair Tsydenov, Anthony Kay, Alexander Starchenko 730 I have a DRIHM: A case study in lifting computational science services up to the scientific mainstream [abstract]Abstract: While we are witnessing a transition from petascale to exascale computing, we experience, when teaching students and scientists to adopt distributed computing infrastructures for computational sciences, what Geoffrey A. Moore once coined the chasm between the visionaries in computational sciences and the early majority of scientific pragmatists. Using the EU-funded DRIHM project (Distributed Research Infrastructure for Hydro-Meteorology) as a case study, we see that innovative research infrastructures have difficulties to be accepted by the scientific pragmatists: The infrastructure services are not yet "mainstream". Excellence in workforces in computational sciences, however, can only be achieved if the tools are not only available but also used. In this paper we show for DRIHM how the chasm exhibits and how it can be crossed. 
Michael Schiffers, Nils Gentschen Felde, Dieter Kranzlmüller 523 Random Set Method Application to Flood Embankment Stability Modelling [abstract]Abstract: In this work the application of random set theory to flood embankment stability modelling is presented. The objective of this paper is to illustrate a method of uncertainty analysis in a real geotechnical problem. Anna Pięta, Krzysztof Krawiec 260 MPJ Express Meets YARN: Towards Java HPC on Hadoop Systems [abstract]Abstract: Many organizations (including academic, research, and commercial institutions) have invested heavily in setting up High Performance Computing (HPC) facilities for running computational science applications. On the other hand, the Apache Hadoop software, after emerging in 2005, has become a popular, reliable, and scalable open-source framework for processing large-scale data (Big Data). Realizing the importance and significance of Big Data, an increasing number of organizations are investing in relatively cheaper Hadoop clusters for executing their mission critical data processing applications. An issue here is that system administrators at these sites might have to maintain two parallel facilities for running HPC and Hadoop computations. This, of course, is not ideal due to redundant maintenance work and poor economics. This paper attempts to bridge this gap by allowing HPC and Hadoop jobs to co-exist on a single hardware facility. We achieve this goal by exploiting YARN (Hadoop v2.0), which decouples the computational and resource scheduling part of the Hadoop framework from HDFS. In this context, we have developed a YARN-based reference runtime system for the MPJ Express software that allows executing parallel MPI-like Java applications on Hadoop clusters. The main contribution of this paper is to provide the Big Data community with access to MPI-like programming using MPJ Express. 
As an aside, this work allows parallel Java applications to perform computations on data stored in the Hadoop Distributed File System (HDFS). Hamza Zafar, Farrukh Aftab Khan, Bryan Carpenter, Aamir Shafi, Asad Waqar Malik 393 Scalable Multilevel Support Vector Machines [abstract]Abstract: Solving different types of optimization models (including parameters fitting) for support vector machines on large-scale training data is often an expensive computational task. This paper proposes a multilevel algorithmic framework that scales efficiently to very large data sets. Instead of solving the whole training set in one optimization process, the support vectors are obtained and gradually refined at multiple levels of coarseness of the data. The proposed framework includes: (a) construction of a hierarchy of coarse representations of the large-scale data, and (b) local processing that updates the hyperplane throughout this hierarchy. Our multilevel framework substantially improves the computational time without losing the quality of classifiers. The algorithms are demonstrated for both regular and weighted support vector machines. Experimental results are presented for balanced and imbalanced classification problems. Quality improvement on several imbalanced data sets has been observed. Talayeh Razzaghi, Ilya Safro 407 Arbitrarily High-Order-Accurate, Hermite WENO Limited, Boundary-Averaged Multi-Moment Constrained Finite-Volume (BA-MCV) Schemes for 1-D Transport [abstract]Abstract: This study introduces the Boundary Averaged Multi-moment Constrained finite-Volume (BA-MCV) scheme for 1-D transport with Hermite Weighted Essentially Non-Oscillatory (HWENO) limiting using the ADER Differential Transform (ADER-DT) time discretization. The BA-MCV scheme evolves a cell average using a Finite-Volume (FV) scheme, and it adds further constraints as pointwise derivatives of the state at cell boundaries, which are evolved in strong form using PDE derivatives. 
The resulting scheme maintains a Maximum Stable CFL (MSCFL) value of one no matter how high-order the scheme is. Also, parallel communication requirements are very low and will be described. Using test cases of a function with increasing steepness, the accuracy of the BA-MCV method will be tested in a limited and non-limited context for varying levels of smoothness. Polynomial $h$-refinement convergence and exponential $p$-refinement convergence will be demonstrated. The overall ADER-DT + BA-MCV + HWENO scheme is a scalable and larger time step alternative to Galerkin methods for multi-moment fluid simulation in climate and weather applications. Matthew Norman 434 A Formal Method for Parallel Genetic Algorithms [abstract]Abstract: We present a formal model that allows the analysis of nontrivial properties of the behavior of parallel genetic algorithms implemented using multi-islands. The model is based on a probabilistic labeled transition system that represents the evolution of the population in each island, as well as the interaction among different islands. By studying the traces these systems can perform, the resulting model allows a formal comparison of the behavior of different algorithms. Natalia Lopez, Pablo Rabanal, Ismael Rodriguez, Fernando Rubio 484 Comparison of Two Diversification Methods to Solve the Quadratic Assignment Problem [abstract]Abstract: The quadratic assignment problem is one of the most studied NP-hard problems. It is known for its complexity, which makes it a good candidate for parallel design. In this paper, we propose and analyze two parallel cooperative algorithms based on hybrid iterative tabu search. The only difference between the two approaches is the diversification method. On 15 of the hardest well-known instances from the QAPLIB benchmark, our algorithms produce competitive results. 
This experimentation shows that our proposals can match or exceed several leading algorithms from the literature in almost all the hardest benchmark instances. Omar Abdelkafi, Lhassane Idoumghar, Julien Lepagnot 522 A Matlab toolbox for Kriging metamodelling [abstract]Abstract: Metamodelling offers an efficient way to imitate the behaviour of computationally expensive simulators. Kriging based metamodels are popular in approximating computation-intensive simulations of deterministic nature. Irrespective of the existence of various variants of Kriging in the literature, only a handful of Kriging implementations are publicly available and most, if not all, free libraries only provide the standard Kriging metamodel. The ooDACE toolbox offers a robust, flexible and easily extendable framework where various Kriging variants are implemented in an object-oriented fashion under a single platform. This paper presents an incremental update of the ooDACE toolbox introducing an implementation of Gradient Enhanced Kriging which has been tested and validated on several engineering problems. Selvakumar Ulaganathan, Ivo Couckuyt, Dirk Deschrijver, Eric Laermans, Tom Dhaene 607 Improving Transactional Memory Performance for Irregular Applications [abstract]Abstract: Transactional memory (TM) offers optimistic concurrency support in modern multicore architectures, helping programmers to extract parallelism in irregular applications when data dependence information is not available before runtime. In fact, recent research focuses on exploiting thread-level parallelism using TM approaches. However, the proposed techniques are of general use, valid for any type of application. This work presents ReduxSTM, a software TM system specially designed to extract maximum parallelism from irregular applications. 
Commit management and conflict detection were tailored to take advantage of both transaction ordering constraints, which assure correct results, and the existence of (partial) reduction patterns, a very frequent memory access pattern in irregular applications. Both facts are used to avoid unnecessary transaction aborts. A function in the 300.twolf package from SPEC CPU2000 was taken as a motivating irregular program. This code was parallelized using ReduxSTM and an ordered version of TinySTM, a state-of-the-art TM system. The experimental evaluation shows our proposed TM system exploits more parallelism from the sequential program and obtains better performance than the other system. Manuel Pedrero, Eladio Gutiérrez, Sergio Romero, Oscar Plata 635 Building Java Intelligent Applications Data Mining for Java Type-2 Fuzzy Inference Systems [abstract]Abstract: This paper introduces JT2FISClustering, a data mining extension for JT2FIS. JT2FIS is a Java class library for building intelligent applications. This extension is used to extract information from a data set and transform it into an Interval Type-2 Fuzzy Inference System in Java applications. Mamdani and Takagi-Sugeno Fuzzy Inference Systems can be generated using fuzzy c-means or subtractive data mining methods. We compare the outputs and performance of Matlab versus Java in order to validate the proposed extension. Manuel Castañón-Puga, Josué-Miguel Flores-Parra, Juan Ramón Castro, Carelia Gaxiola-Pacheco, Luis Enrique Palafox-Maestre 639 The Framework for Rapid Graphics Application Development: The Multi-scale Problem Visualization. [abstract]Abstract: Interactive real-time visualization plays a significant role in the simulation research domain. Multi-scale problems are in need of high performance visualization with good quality and the same could be said about other problem domains, e.g. big data analysis, physics simulation, etc. 
The state of the art shows that a universal tool for solving such problems is non-existent. Modern computer graphics requires enormous efforts to implement efficient algorithms on modern GPUs and GAPIs. In the first part of our paper we introduce a framework for rapid graphics application development and its extensions for multi-scale problem visualization. In the second part of the paper we provide a prototype of a multi-scale problem solution for the simulation and monitoring of high-precision agent movements, from behavioral patterns in an airport up to world-wide flight traffic. Finally we summarize our results and speculate about future investigations. Alexey Bezgodov, Andrey Karsakov, Aleksandr Zagarskikh, Vladislav Karbovskii 29 A multiscale model for the feto-placental circulation in the monochorionic twin pregnancies [abstract]Abstract: We developed a mathematical model of monochorionic twin pregnancies to simulate both the normal gestation and the Twin-Twin Transfusion Syndrome (TTTS), a disease in which the interplacental anastomoses create a flow imbalance, causing one of the twins to receive too much blood and liquids, becoming hypertensive and polyhydramnios (the Recipient) and the other to become hypotensive and oligohydramnios (the Donor). This syndrome, if untreated, leads almost certainly to the death of one or both twins. We propose a compartment model to simulate the flows between the placenta and the fetuses and the accumulation of the amniotic fluid in the sacs. The aim of our work is to provide a simple but realistic model of the twins-mother system and to stress it by simulating the pathological cases and the related treatments, i.e. amnioreduction (elimination of the excess liquid in the recipient sac), laser therapy (removal of all the anastomoses) and other possible innovative therapies impacting on pressure and flow parameters. 
Ilaria Stura, Pietro Gaglioti, Tullia Todros, Caterina Guiot 86 Sequential and Parallel Implementation of GRASP for the 0-1 Multidimensional Knapsack Problem [abstract]Abstract: The knapsack problem is a widely known problem in combinatorial optimization and has been the object of much research in recent decades. The problem has a great number of variants and obtaining an exact solution to any of these is not easily accomplished, which motivates the search for alternative techniques to solve the problem. Among these alternatives, metaheuristics seem to be suitable in the search for approximate solutions to the problem. In this work we propose a sequential and a parallel implementation for the multidimensional knapsack problem using the GRASP metaheuristic. The obtained results show that GRASP can lead to good quality results, even optimal in some instances, and that CUDA may be used to expand the neighborhood search and as a result may lead to improved quality results. Bianca De Almeida Dantas, Edson Cáceres 89 Telescopic hybrid fast solver for 3D elliptic problems with point singularities [abstract]Abstract: This paper describes a telescopic solver for two dimensional h adaptive grids with point singularities. The input for the telescopic solver is an h refined two dimensional computational mesh with rectangular finite elements. The candidates for point singularities are first localized over the mesh by using a greedy algorithm. Having the candidates for point singularities, we execute either a direct solver, that performs multiple refinements towards selected point singularities and executes a parallel direct solver algorithm which has logarithmic cost with respect to refinement level. The direct solvers executed over each candidate for point singularity return local Schur complement matrices that can be merged together and submitted to an iterative solver. 
In this paper we utilize a parallel logarithmic computational cost GPU solver or parallel multi-thread GALOIS solver as a direct solver. We use Incomplete LU Preconditioned Conjugate Gradients (ILUPCG) as an iterative solver. We also show that elimination of point singularities from the refined mesh significantly reduces the number of iterations to be performed by the ILUPCG iterative solver. Anna Paszynska, Konrad Jopek, Krzysztof Banaś, Maciej Paszynski, Andrew Lenerth, Donald Nguyen, Keshav Pingali, Lisandro Dalcin, Victor Calo 95 Adapting map resolution to accomplish execution time constraints in wind field calculation [abstract]Abstract: Forest fires are natural hazards that every year destroy thousands of hectares around the world. Forest fire propagation prediction is a key point to fight against such hazards. Several models and simulators have been developed to predict forest fire propagation. These models require input parameters such as digital elevation map, vegetation map, and other parameters describing the vegetation and meteorological conditions. However, some meteorological parameters, such as wind speed and direction, change from one point to another due to the effect of the topography of the terrain. Therefore, it is necessary to couple wind field models, such as WindNinja, to estimate the wind speed and direction at each point of the terrain. The output provided by the wind field simulator is used as input of the fire propagation model. Coupling wind field model and forest fire propagation model improves prediction accuracy, but significantly increases prediction time. This fact is critical since propagation prediction must be provided in advance to allow the control centers to manage firefighters in the best possible way. 
This work analyses WindNinja execution time, describes a WindNinja parallelisation based on map partitioning, determines the limitations of such methodology for large maps and presents an improvement based on adapting map resolution to accomplish execution time limitations. Gemma Sanjuan, Tomas Margalef, Ana Cortes 103 Efficient BSP/CGM algorithms for the maximum subsequence sum and related problems [abstract]Abstract: Given a sequence of n numbers, with at least one positive value, the maximum subsequence sum problem consists in finding the contiguous subsequence with the largest sum or score, among all derived subsequences of the original sequence. Several scientific applications have used algorithms that solve the maximum subsequence sum. Particularly in Computational Biology, these algorithms can help in the tasks of identification of transmembrane domains and in the search for GC-content regions, a required activity in the operation of pathogenicity island location. The sequential algorithm that solves this problem has O(n) time complexity. In this work we present BSP/CGM parallel algorithms to solve the maximum subsequence sum problem and three related problems: the maximum longest subsequence sum, the maximum shortest subsequence sum and the number of disjoint subsequences of maximum sum. To the best of our knowledge there are no parallel BSP/CGM algorithms for these related problems. Our algorithms use p processors and require O(n/p) parallel time with a constant number of communication rounds for the algorithm of the maximum subsequence sum and O(log p) communication rounds, with O(n/p) local computation per round, for the algorithms of the related problems. We implemented the algorithms on a cluster of computers using MPI and on a machine with GPU using CUDA, both with good speed-ups. Anderson C. Lima, Edson N. Cáceres, Rodrigo G. Branco, Roussian R. A. Gaioso, Samuel B. Ferraz, Siang W. Song, Wellinton S. 
Martins 225 Fire Hazard Safety Optimisation for Building Environments [abstract]Abstract: This article provides a theoretical study for fire hazard safety in building environments. The working hypothesis is that the navigation costs and hazard spread are deterministically modeled and over time. Based on the dynamic navigation costs under fire hazard, the article introduces the notion of dynamic safety in a recursive manner. Then several theoretical results are proposed to calculate the dynamic safety over time and to establish that it represents the maximum amount of time to delay safely on nodes. Based on the recursive equations, an algorithm is proposed to calculate the dynamic safety and successor matrices. Finally, some experimental results are provided to illustrate the efficiency of the algorithm and to present a real case study. Sabin Tabirca, Tatiana Tabirca, Laurence Yang 295 A Structuring Concept for Securing Modern Day Computing Systems [abstract]Abstract: Security within computing systems is ambiguous, proliferated through obscurity, a knowledgeable user, or plain luck. Presented is a novel concept for structuring computing systems to achieve a higher degree of overall system security through the compartmentalization and isolation of executed instructions for each component. Envisioned is a scalable model which focuses on lower level operations to alleviate the view of security as a binary outcome to that of a deterministic metric based on a set of independent characteristics. Orhio Creado, Phu Dung Le, Jan Newmarch, Jeff Tan 323 Federated Big Data for resource aggregation and load balancing with DIRAC [abstract]Abstract: BigDataDIRAC is a Federated Big Data solution with a Distributed Infrastructure with Remote Agent Control (DIRAC) access point. Users have the opportunity to access multiple Big Data resources scattered in different geographical areas, such as access to grid resources. 
This approach opens the possibility of offering not only grid and cloud to the users, but also Big Data resources from the same DIRAC environment. We describe a system to allow access to a federation of Big Data resources, including load balancing, using DIRAC. Proof of concept is shown and load balancing performance evaluations are presented using several use cases supported by three computing centers in two countries, and with four Hadoop clusters. Victor Fernandez, Víctor Méndez, Tomás F. Pena 324 Big Data Analytics Performance for Large Out-Of-Core Matrix Solvers on Advanced Hybrid Architectures [abstract]Abstract: This paper examines the performance of large Out-Of-Core matrices to assess the optimal Big Data system performance of advanced computer architectures, based on the performance evaluation of a large dense Lower-Upper Matrix Decomposition (LUD) employing a highly tuned, I/O managed, slab based LUD software package developed by the Lockheed Martin Corporation. We present extensive benchmark studies conducted with this package on UMBC’s Bluegrit and Bluewave clusters, and NASA-GFSC’s Discover cluster systems. Our results show speedup for a single node achieved by Phi Coprocessors relative to the host CPU SandyBridge processors is about a 1.5X improvement, which is an even smaller relative performance gain compared with the studies published by F.Masci (Masci, 2013), where he obtains a 2-2.5x performance. Surprisingly, the Westmere with the Tesla GPU scales comparably with the Sandy Bridge and the Phi Coprocessor up to 12 processes and then fails to continue to scale. The performances across 20 CPU nodes of SandyBridge obtains a uniform speedup of 0.5X over Westmere for problem sizes of 10K, 20K and 40K unknowns. With an Infiniband DDR, the performance of Nehalem processors is comparable to Westmere without the interconnect. 
Raghavendra Rao, Milton Halem, John Dorband 352 A critical survey of data grid replication strategies based on data mining techniques [abstract]Abstract: Replication is one common way to effectively address challenges for improving the data management in data grids. It has attracted a great deal of attention of many researchers. Hence, a lot of work is done and many strategies have been proposed. However, most of the existing replication strategies consider a single file-based granularity and do not take into account file access patterns or possible file correlations. However, file correlations become an increasingly important consideration for performance enhancement in data grids. In this regard, the knowledge about file correlations can be extracted from historical and operational data using the techniques of the data mining field. Data mining techniques have proved to be a powerful tool facilitating the extraction of meaningful knowledge from large data sets. As a consequence of the convergence of data mining and data grid, mining grid data is an interesting research field which aims at analyzing grid systems with data mining techniques in order to efficiently discover new meaningful knowledge to enhance data management in data grids. More precisely, in this paper, the extracted knowledge is used to enhance replica management. Gaps in the current literature and opportunities for further research are presented. In addition, we propose a new guideline to data mining application in the context of data grid replication strategies. To the best of our knowledge, this is the first survey mainly dedicated to data grid replication strategies based on data mining techniques. Tarek Hamrouni, Sarra Slimani, Faouzi Ben Charrrada 428 Reduction of Computational Load for MOPSO [abstract]Abstract: The run time for many optimisation algorithms, particularly those that explicitly consider multiple objectives, can be impractically large when applied to real world problems. 
This paper reports an investigation into the behaviour of Multi-Objective Particle Swarm Optimisation (MOPSO) that seeks to reduce the number of objective function evaluations needed, without degrading solution quality. By restricting archive size and strategically reducing the trial solution population size, it has been found that the number of function evaluations can be reduced by 66.7% without significant reduction in solution quality. In fact, careful manipulation of algorithm operating parameters can even significantly improve solution quality. Mathew Curtis, Andrew Lewis 501 The Effects of Hotspot Detection and Virtual Machine Migration Policies on Energy Consumption and Service Levels in the Cloud [abstract]Abstract: Cloud computing has received much attention among researchers lately. Managing Cloud resources efficiently necessitates effective policies that assign applications to hardware in a way that they require the least resources possible. Applications are first assigned to virtual machines which are subsequently placed on the most appropriate server host. If a server becomes overloaded, some of its virtual machines are reassigned. This process requires a hotspot detection mechanism in combination with techniques that select the virtual machine(s) to migrate. In this work we introduce two new virtual machine selection policies, Median Migration Time and Maximum Utilisation, and show that they outperform existing approaches on the criteria of minimising energy consumption, service level agreement violations and the number of migrations when combined with different hotspot detection mechanisms. We show that parametrising the hotspot detection policies correctly has a significant influence on the workload balance of the system. S Sohrabi, I. 
Moser 614 Towards a Performance-realism Compromise in the Development of the Pedestrian Navigation Model [abstract]Abstract: Despite the emergence of new approaches and increasingly powerful processing resources, there are cases in the domain of pedestrian modeling that require a compromise between computational performance and the realism of the behavior of the simulated agents. The present paper seeks to address this issue through comparative computational experiments and visual validation of the simulations using real-world data. The results show that a reasonable compromise may be reached in multi-level navigation incorporating both route planning and collision avoidance. Daniil Voloshin, Vladislav Karbovskii, Dmitriy Rybokonenko 641 A Methodology for Designing Energy-Aware Systems for Computational Science [abstract]Abstract: Energy consumption is currently one of the main issues in large distributed systems. More specifically, the efficient management of energy without losing performance has become a hot topic in the field. Thus, the design of systems solving complex problems must take into account energy efficiency. In this paper we present a formal methodology to check the correctness, from an energy-aware point of view, of large systems, such as HPC clusters and cloud environments, dedicated to computational science. Our approach uses a simulation platform to model and simulate computational science environments, and metamorphic testing to check the correctness of energy consumption in these systems. Pablo Cañizares, Alberto Núñez, Manuel Nuñez, J. Jose Pardo 528 Towards an automatic co-generator for manycores’ architecture and runtime: STHORM case-study [abstract]Abstract: The increasing design complexity of manycore architectures at the hardware and software levels calls for powerful tools capable of validating every functional and non-functional property of the architecture. 
At the design phase, the chip architect needs to explore several parameters from the design space, and iterate on different instances of the architecture, in order to meet the defined requirements. Each new architectural instance requires the configuration and the generation of a new hardware model/simulator, its runtime, and the applications that will run on the platform, which is a very long and error-prone task. In this context, the IP-XACT standard has become widely used in the semiconductor industry to package IPs and provide a low-level SW stack to ease their integration. In this work, we present preliminary work on a methodology for automatically configuring and assembling an IP-XACT golden model and generating the corresponding manycore architecture HW model, low-level software runtime and applications. We use the STHORM manycore architecture and the HBDC application as a case study. Charly Bechara, Karim Ben Chehida, Farhat Thabet 306 Enhancing ELM-based facial image classification by exploiting multiple facial views [abstract]Abstract: In this paper, we investigate the effectiveness of the Extreme Learning Machine (ELM) network in facial image classification. In order to enhance performance, we exploit knowledge related to the human face structure. To this end, we train a multi-view ELM network by employing automatically created facial regions of interest. By jointly learning the network parameters and optimized network output combination weights, each facial region appropriately contributes to the final classification result. 
Experimental results on three publicly available databases show that the proposed approach outperforms facial image classification based on a single facial representation and on other facial region combination schemes. Alexandros Iosifidis, Anastasios Tefas, Ioannis Pitas 429 Automatic Query Driven Data Modelling in Cassandra [abstract]Abstract: Non-relational databases have recently been the preferred choice when it comes to dealing with Big Data challenges, but their performance is very sensitive to the chosen data organisation. We have seen differences of over 70 times in response time for the same query on different models. This requires users to be fully aware of the queries they intend to serve in order to design their data model. The common practice, then, is to replicate data into different models designed to fit different query requirements. In this scenario, the user is in charge of the code implementation required to keep consistency between the different data replicas. Manually replicating data in such high layers of the database results in a lot of squandered storage, because the underlying system already has replication mechanisms that were originally designed for availability and reliability purposes. In this paper, we propose and design a mechanism and a prototype to provide users with transparent management, where queries are matched with a well-performing model option. Additionally, we propose to do so by transforming the replication mechanism into a heterogeneous replication one, in order to avoid squandering disk space while keeping the availability and reliability features. The result is a system where, regardless of the query or model the user specifies, response time will always be that of an affine query. 
Roger Hernandez, Yolanda Becerra, Jordi Torres, Eduard Ayguade 186 A clustering-based approach to static scheduling of multiple workflows with soft deadlines in heterogeneous distributed systems [abstract]Abstract: Typical patterns of using scientific workflow management systems (SWMS) include periodical executions of prebuilt workflows with precisely known estimates of tasks’ execution times. Combining such workflows into sets could sufficiently improve resulting schedules in terms of fairness and meeting users’ constraints. In this paper, we propose a clustering-based approach to static scheduling of multiple workflows with soft deadlines. This approach generalizes commonly used techniques of grouping and ordering of parts of different workflows. We introduce a new scheduling algorithm, MDW-C, for multiple workflows with soft deadlines and compare its effectiveness with task-based and workflow-based algorithms which we proposed earlier in [1]. Experiments with several types of synthetic and domain-specific test data sets showed the superiority of a mixed clustering scheme over task-based and workflow-based schemes. This was confirmed by an evaluation of proposed algorithms on a basis of the CLAVIRE workflow management platform. Klavdiya Bochenina, Nikolay Butakov, Alexey Dukhanov, Denis Nasonov 268 Challenges and Solutions in Executing Numerical Weather Prediction in a Cloud Infrastructure [abstract]Abstract: Cloud Computing has emerged as an option to perform large-scale scientific computing. The elasticity of the cloud and its pay-as-you-go model present an interesting opportunity for applications commonly executed in clusters or supercomputers. This paper presents the challenges of migrating and executing a numerical weather prediction (NWP) application to a cloud computing infrastructure. We compared the execution of this High-Performance Computing (HPC) application in a local cluster and in the cloud using different instances sizes. 
The experiments demonstrate that processing and networking create a limiting factor, but that storing input and output datasets in the cloud presents an interesting option to share results and ease the deployment of a test-bed for a weather research platform. Results show that cloud infrastructure can be used as a viable HPC alternative for numerical weather prediction software. Emmanuell Diaz Carreño, Eduardo Roloff, Philippe Navaux 325 Flexible Dynamic Time Warping for Time Series Classification [abstract]Abstract: Measuring the similarity or distance between two time series sequences is critical for the classification of a set of time series sequences. Given two time series sequences, X and Y, the dynamic time warping (DTW) algorithm can calculate the distance between X and Y. But the DTW algorithm may align some neighboring points in X to the corresponding points which are far apart in Y. It may get the alignment with higher score, but with less representative information. This paper proposes the flexible dynamic time warping (FDTW) method for measuring the similarity of two time series sequences. The FDTW algorithm adds an additional score as the reward for the contiguously long one-to-one fragment. As the experimental results show, the DTW, DDTW and FDTW methods each outperform the others on some test sets. Combining the FDTW, DTW and DDTW methods into a classifier ensemble with a voting scheme yields a lower average error rate than that of each individual method. Che-Jui Hsu, Kuo-Si Huang, Chang-Biau Yang, Yi-Pu Guo 511 Onedata - a Step Forward towards Globalization of Data Access for Computing Infrastructures [abstract]Abstract: To satisfy requirements of data globalization and high performance access in particular, we introduce the originally created onedata system which virtualizes storage systems provided by storage resource providers distributed globally. 
onedata introduces new data organization concepts together with providers' cooperation procedures that involve use of GlobalRegistry as a mediator. The most significant features include metadata synchronization and on-demand file transfer. Lukasz Dutka, Michał Wrzeszcz, Tomasz Lichoń, Rafał Słota, Konrad Zemek, Krzysztof Trzepla, Łukasz Opioła, Renata Slota, Jacek Kitowski 536 Ocean forecast information system for emergency interventions [abstract]Abstract: The paper describes the computation and information system required to support fast and efficient operations in emergency situation in the marine environment. The most common cases, which induced to activate emergency procedures, are identified and the main features of the Search And Rescue (SAR) intervention are described in their evolution, the inputs and detail that are required and the weakness that still exist. The improvement that can come from a more integrated information system, from the computation of the environmental condition to the adoption of dedicated graphical interface to provide all the necessary information in a clear and complete way, are also explained. Roberto Vettor, Carlos Guedes Soares 682 Optimizing Performance of ROMS on Intel Xeon Phi [abstract]Abstract: ROMS (Regional Oceanic Modeling System) is an open-source ocean modeling system that is widely used by the scientific community. It uses a coarse-grained parallelization scheme which partitions the computational domain into tiles. ROMS operates on a lot of multi-dimensional arrays, which makes it an ideal candidate to gain from architectures with wide and powerful Vector Processing Units (VPU) such as Intel Xeon Phi. In this paper we present an analysis of the BENCHMARK application of ROMS and the issues affecting its performance on Xeon Phi. 
We then present an iterative optimization strategy for this application on Xeon Phi which results in a speed-up of over 2x compared to the baseline code in the native mode and 1.5x in symmetric mode. Gopal Bhaskaran, Pratyush Gaurav 336 Fuzzy indication of reliability in metagenomics NGS data analysis [abstract]Abstract: NGS data processing in metagenomics studies has to deal with noisy data that can contain a large amount of reading errors which are difficult to detect and account for. This work introduces a fuzzy indicator of reliability technique to facilitate solutions to this problem. It includes modified Hamming and Levenshtein distance functions that are aimed to be used as drop-in replacements in NGS analysis procedures which rely on distances, such as phylogenetic tree construction. The distances utilise fuzzy sets of reliable bases or an equivalent fuzzy logic, potentially aggregating multiple sources of base reliability. Milko Krachunov, Dimitar Vassilev, Maria Nisheva, Ognyan Kulev, Valeriya Simeonova, Vladimir Dimitrov 559 Pairwise genome comparison workflow in the Cloud using Galaxy [abstract]Abstract: Workflows are becoming the new paradigm in bioinformatics. In general, bioinformatics problems are solved by interconnecting several small software pieces to perform complex analyses. This demands a minimal expertise to create, enact and monitor such tools compositions. In addition bioinformatics is immersed in the big-data territory, facing huge problems to analyse such amount of data. We have addressed these problems by integrating a tools management platform (Galaxy) and a Cloud infrastructure, which prevents moving the big datasets between different locations and allows the dynamic scaling of the computing resources depending on the user needs. 
The result is a user-friendly platform that facilitates the work of the end-users while performing their experiments, installed in a Cloud environment that includes authentication, security and big-data transfer mechanisms. To demonstrate the suitability of our approach we have integrated in the infrastructure an existing pairwise and multiple genome comparison tool which comprises the management of huge datasets and high computational demands. Óscar Torreño Tirado, Michael T. Krieger, Paul Heinzlreiter, Oswaldo Trelles 583 WebGL based visualisation and analysis of stratigraphic data for the purposes of the mining industry [abstract]Abstract: In recent years the combination of databases, data and internet technologies has greatly enhanced the functionality of many systems based on spatial data, and facilitated the dissemination of such information. In this paper, we propose a web-based data visualisation and analysis system for stratigraphic data from a Polish mine, with visualisation and analysis tools which can be accessed via the Internet. WWW technologies such as active web pages and WebGL technology provide a user-friendly interface for browsing, plotting, comparing, and downloading information of interest, without the need for dedicated mining industry software. Anna Pieta, Justyna Bała 33 Modeling and Simulation of Masticatory Muscles [abstract]Abstract: Medical simulators play an important role in helping the development of prototype prostheses, pre-surgical planning and in a better understanding of the mechanical phenomena involved in muscular activity. This article focuses in modeling and simulating the activity of the jaw muscular system. The model involves the use of three-dimensional bone models and muscle modeling based on Hill type actuators. Ligament restrictions to mandible movement were taken into account in our model. Data collected from patients were used to partially parameterize our model so that it could be used in medical applications. 
In addition, the simulation of muscles employed a new methodology based on insertion curves, with many lines of action for each group of muscles. A simulator was developed, which allowed real time visualization of individual muscle activation at each corresponding simulation time. The model derived trajectory was then compared to the assembled data, remaining mostly within the convex hull of the mandible motion curves captured. Furthermore, the model accurately described the desired border movements. Eduardo Garcia, Márcio Leal, Marta Villamil 35 Fully automatic 2D hp-adaptive Finite Element Method for Non-Stationary Heat Transfer [abstract]Abstract: In this paper we present a fully automatic hp adaptive finite element method code for non-stationary two dimensional problems. The code utilizes the -scheme for time discretization and fully automatic hp adaptive finite element method discretization for numerical solution of each time step. The code is verified on the exemplary non-stationary problem of heat transfer over the L-shape domain. Paweł Matuszyk, Marcin Sieniek, Maciej Paszyński 46 Parallelization of an Encryption Algorithm Based on a Spatiotemporal Chaotic System and a Chaotic Neural Network [abstract]Abstract: In this paper the results of parallelizing a block cipher based on a spatiotemporal chaotic system and a chaotic neural network are presented. A data dependence analysis of loops was applied in order to parallelize the algorithm. The parallelism of the algorithm is demonstrated in accordance with the OpenMP standard. The study found that the most time-consuming loops of the algorithm are suitable for parallelization. The efficiency measurements of a parallel algorithm working in ECB, CTR, CBC and CFB modes of operation are shown. Dariusz Burak 64 Cryptanalysing the shrinking generator [abstract]Abstract: Some linear cellular automata generate exactly the same PN-sequences as those generated by maximum-length LFSRs. 
Hence, cellular automata can be considered as alternative generators to the maximum-length LFSRs. Moreover, some LFSR-based keystream generators can be modelled as linear structures based on cellular automata. In this work, we analyse a family of one-dimensional, linear, regular and cyclic cellular automata based on the rule 102 that describe the behaviour of the shrinking generator, designed as a non-linear generator. This implies that the output sequence of the generator is vulnerable to a cryptanalysis that takes advantage of this linearity. Sara D. Cardell, Amparo Fúster-Sabater 74 D-Aid - An App to Map Disasters and Manage Relief Teams and Resources [abstract]Abstract: Natural or man-made disasters cause damage to life and property. Lack of appropriate emergency management increases the physical damage and loss of life. D-Aid, the smartphone App proposed by this article, intends to help volunteers and relief teams to quickly map and aid victims of a disaster. Anyone can put an occurrence after a disaster on a web map, streamlining and decentralizing the information access. Through visualization techniques like heat maps and Voronoi diagrams on a map implemented in the D-Aid app and also on a web map, everyone can easily get information about the number of victims, their necessities and imminent dangers after disasters. Luana Carine Schunke, Luiz Paulo Luna de Oliveira, Mauricio Cardoso, Marta Becker Villamil 168 My Best Current Friend in a Social Network [abstract]Abstract: Due to their popularity, social networks (SNs) have been subject to different analyses. A research field in this area is the identification of several types of users and groups. To make the identification process easier, a SN is usually represented through a graph. Usual tools to analyze a graph are the centrality measures, which identify the most important vertices within a graph; among them the PageRank (a measure originally designed to classify web pages). 
Informally, in the context of a SN, the PageRank of a user i represents the probability that another user of the SN is seeing the page of i after a considerable time of navigation in the SN. In this paper, we define a new type of user in a SN: the best current friend. Informally, the idea is to identify, among the friends of a user i, who is the friend k that would generate the highest decrease in the PageRank of i if k stops being his/her friend. This may be useful to identify the users/customers whose friendship/relationship should be a priority to keep. We provide formal definitions, algorithms and some experiments for this subject. Our experiments showed that the best current friend of a user is not necessarily among those who have the highest PageRank in the SN, or among the ones who have lots of friends. Francisco Moreno, Santiago Hernández, Edison Ospina 398 Clustering Heterogeneous Semi-Structured Social Science Datasets [abstract]Abstract: Social scientists have begun to collect large datasets that are heterogeneous and semi-structured, but the ability to analyze such data has lagged behind its collection. We design a process to map such datasets to a numerical form, apply singular value decomposition clustering, and explore the impact of individual attributes or fields by overlaying visualizations of the clusters. This provides a new path for understanding such datasets, which we illustrate with three real-world examples: the Global Terrorism Database, which records details of every terrorist attack since 1970; a Chicago police dataset, which records details of every drug-related incident over a period of approximately a month; and a dataset describing members of a Hezbollah crime/terror network within the U.S. 
David Skillicorn, Christian Leuprecht 473 CFD post-processing in Unity3D [abstract]Abstract: In architecture and urban design the urban climate on a meso/micro scale is a strong design criterion for outdoor thermal comfort and building’s energy performance. Evaluating the effect of buildings on the local climate and vice versa can be done by computational fluid dynamics (CFD) methods. The results from CFD are typically visualized through post-processing software closely related to the product family of pre-processing and simulation. The built-in functions are made for engineers and lack user-friendliness for real-time exploration of results. To bridge the gap between architect and engineer we propose visualizations based on game engine technology. This paper demonstrates the implementation of CFD to Unity3D conversion and weather data visualization. Matthias Berger, Verina Cristie 596 Helsim: a particle-in-cell simulator for highly imbalanced particle distributions [abstract]Abstract: Helsim is a 3D electro-magnetic particle-in-cell simulator used to simulate the behaviour of plasma in space. Particle-in-cell simulators track the movement of particles through space, with the particles generating and being subjected to various fields (electric, magnetic and/or gravitational). Helsim dissociates the particles data structure from the fields, allowing them to be distributed and load-balanced independently, and can simulate experiments with highly imbalanced particle distributions with ease. This paper shows weak scaling results of a highly imbalanced particle setup on up to 32 thousand cores. The results validate the basic claims for scalability for imbalanced particle distributions, but also highlight a problem with a workaround we had to implement to circumvent an OpenMPI bug we encountered. 
Roel Wuyts, Tom Haber, Giovanni Lapenta 724 Efficient visualization of urban simulation data using modern GPUs [abstract]Abstract: Visualization of simulation results in major urban areas is a difficult task. Multi-scale processes and connectivity of the urban environment may require interactive visualization of dynamic scenes with lots of objects at different scales. To visualize these scenes it is not always possible to use standard GIS systems. Wide distribution of high-performance gaming graphics cards has led to the emergence of specialized frameworks, which are able to cope with such kinds of visualization. This paper presents a framework and special algorithms that take full advantage of the GPU to render the urban simulation data over a virtual globe. Experiments on the scalability of the framework showed that it successfully handles the visualization of up to two million moving agents and up to eight million fixed points of interest on top of the virtual globe without detriment to the smoothness of the image. Aleksandr Zagarskikh, Andrey Karsakov, Alexey Bezgodov 732 Cloud Technology for Forecasting Accuracy Evaluation of Extreme Metocean Events [abstract]Abstract: The paper describes the approach for ensemble-based simulation within the tasks of extreme metocean events forecasting as an urgent computing problem. The approach is based on the developed conceptual basis of data-flow construction for the simulation-based ensemble forecasting. It was used to develop the architecture for ensemble-based data processing based on the cloud computing environment CLAVIRE with an extension for urgent computing resource provisioning and scheduling. Finally, the solution for ensemble water level forecasting in the Baltic Sea was developed as a part of the St. Petersburg flood prevention system. 
Sergey Kosukhin, Sergey Kovalchuk, Alexander Boukhanovsky 320 Co-clustering based approach for Indian monsoon prediction [abstract]Abstract: Prediction of Indian monsoon is a challenging task due to complex dynamics and variability over the years. Skills of statistical predictors that perform well in a set of years are not as good for others. In this paper, we attempt to identify a set of predictors that have high skills for a cluster of years. A co-clustering algorithm, which extracts groups of years, paired with good predictor sets for those years, is used for this purpose. Weighted ensemble of these predictors are used in final prediction. Results on past 65 years data show that the approach is competitive with state of art techniques. Moumita Saha, Pabitra Mitra 139 Agent Based Simulations for the Estimation of Sustainability Indicators [abstract]Abstract: We present a methodology to improve the estimation of several Sustainability Indicators based on the measurement of walking distance to infrastructures combining Agent Based Simulation with Volunteer Geographic Information. Joining these two forces we construct a more realistic and accurate distribution of the infrastructures based on knowledge created by citizens and their perceptions instead of official data sources. A Situated Multi-Agent System is in charge of simulating not only the functional disparity and sociodemographic characteristics of the population but also the geographic reality in a dynamic way. Namely, the system will analyze different geographic barriers for each collective bringing new possibilities to improve the assessment of the needs of the population for a more sustainable development of the city. In this article we will describe the methodology to carry on several sustainability indicator measurements and present the results of the proposed methodology applied to several municipalities. Ander Pijoan, Cruz E. 
Borges, Iraia Oribe-Garcia, Cristina Martín, Ainhoa Alonso-Vicario 276 Bray-Curtis Metrics as Measure of Liquid State Machine Separation Ability in Function of Connections Density [abstract]Abstract: Separation ability is one of the two most important properties of Liquid State Machines used in the Liquid Computing theory. To measure the so-called distance between the states that a Liquid State Machine can exist in, different norms and metrics can be applied. Until now we have used the Euclidean distance to tell the distance of states representing different stimulations of simulated cortical microcircuits. In this paper we compare our previously used methods and the approach with the Bray-Curtis measure of dissimilarity. Systematic analysis of efficiency and its comparison for a different number of simulated synapses present in the model will be discussed to some extent. Grzegorz Wójcik, Marcin Ważny 365 A First Step to Performance Prediction for Heterogeneous Processing on Manycores [abstract]Abstract: In order to maintain the continuous growth of the performance of computers while keeping their energy consumption under control, the microelectronic industry develops architectures capable of processing more and more tasks concurrently. Thus, the next generations of microprocessors may count hundreds of independent cores that may differ in their functions and features. As an extensive knowledge of their internals cannot be a prerequisite to their programming and for the sake of portability, these forthcoming computers necessitate the compilation flow to evolve and cope with heterogeneity issues. In this paper, we lay a first step toward a possible solution to this challenge by exploring the results of SPMD type of parallelism and predicting performance of the compilation results so that our tools can guide a compiler to build an optimal partition of task automatically, even on heterogeneous targets. 
Experimental results show that our tools predict real-world performance with very good accuracy. Nicolas Benoit, Stephane Louise 468 A decision support system for emergency flood embankment stability [abstract]Abstract: This article presents a decision support system for emergency flood embankment stability. The proposed methodology is based on analysis of data from both a flood embankment measurement network and data generated through numerical modeling. Decisions about the risk of embankment interruption are made on the basis of this analysis. The authors present both the general concept of the system as well as a detailed description of the system components. Magdalena Habrat, Michał Lupa, Monika Chuchro, Andrzej Leśniak 422 A Methodology for Profiling and Partitioning Stream Programs on Many-core Architectures [abstract]Abstract: Maximizing the data throughput is a very common implementation objective for several streaming applications. Such a task is particularly challenging for implementations based on many-core and multi-core target platforms because, in general, it implies tackling several NP-complete combinatorial problems. Moreover, an efficient design space exploration requires an accurate evaluation on the basis of dataflow program execution profiling. The focus of the paper is on the methodological challenges of obtaining accurate profiling measures. Experimental results validate a many-core platform built by an array of Transport Triggered Architecture processors for exploring the partitioning search space based on the execution trace analysis. Malgorzata Michalska, Jani Boutellier, Marco Mattavelli 590 Minimum-overlap clusterings and the sparsity of overcomplete decompositions of binary matrices. [abstract]Abstract: Given a set of $n$ binary data points, a widely used technique is to group its features into $k$ clusters: sets of features for which there is, in turn, a set of data points that has similar values in those features. 
In the case where $n < k$, an exact decomposition is always possible, and the question of how much the clusters overlap is of interest. In this paper we approach the question through matrix decomposition, and relate the degree of overlap with the sparsity of one of the resulting matrices. We present i) analytical results regarding bounds on this sparsity, and ii) a heuristic to estimate the minimum amount of overlap that an exact grouping of features into $k$ clusters must have. Happily, adding new data will not alter this minimum amount of overlap. An interpretation of this amount, and its change with $k$, is given for a biological example. Victor Mireles, Tim Conrad 736 Modeling of critical situations in the migration policy implementation [abstract]Abstract: This paper describes an approach for modeling of potentially critical situations in the society. Potentially critical situations are caused by a lack of compliance between current local policies and the desired goals, methods and means of implementing these policies. The modeling approach is proposed to improve the efficiency of the local government management, taking into account potentially critical situations that may arise at the level of the individual, of a social group and of society as a whole. The use of the proposed method is shown by the example of migration policies in St. Petersburg. Sergey Mityagin, Sergey Ivanov, Alexander Boukhanovsky, Iliya Gubarev, Tihonova Olga 450 Parallelization of context-free grammar parsing on GPU using CUDA [abstract]Abstract: During the last decade, increasing interest in parallel programming can be observed. It is caused by a tendency to develop microprocessors as multicore units, which can perform instructions simultaneously. A popular and widely used example of such a platform is the graphics processing unit (GPU). Its ability to perform calculations simultaneously is being investigated as a way of improving the performance of complex algorithms. 
GPUs therefore provide architectures that let programmers and software developers use their computational power in much the same way as a CPU. One of these architectures is the CUDA platform, developed by NVIDIA. The purpose of our work was to implement the parallel CYK algorithm, which is one of the most popular and effective parsing algorithms for context-free languages. The process of parsing is crucial for systems dedicated to working with natural, biological (like RNA), or artificial languages, i.e. interpreters of scripting languages, compilers, and systems for pattern or natural/biological language recognition. Parallelization of context-free grammar parsing on GPU was done using the CUDA platform. The paper presents a review of existing parallelizations of the CYK algorithm in the literature, describes the proposed algorithms, and discusses the experimental results obtained. We considered algorithms in which each cell of the CYK matrix was assigned to a respective thread (processor), each pair of cells was assigned to a thread, a version with shared memory, and finally a version with a limited number of non-terminals. The algorithms were evaluated on five artificial grammars with different numbers of terminals and non-terminals, different sizes of grammar rules, and different lengths of input sequences. Significant performance improvement (up to about 10x) compared with CPU-based computations was achieved. Olgierd Unold and Piotr Skrzypczak
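
For readers unfamiliar with CYK, the following is a minimal sequential sketch of the textbook recognizer for a grammar in Chomsky normal form. It is not the authors' CUDA implementation; the function name `cyk_accepts`, the rule dictionaries and the toy grammar are assumptions made for illustration. The point it illustrates is that every cell for a given span length depends only on cells for shorter spans, which is what makes assigning one cell to one thread possible.

```python
from itertools import product

def cyk_accepts(word, unary_rules, binary_rules, start="S"):
    """Return True if `word` is derivable from `start` (grammar in CNF).

    unary_rules:  terminal -> set of non-terminals A with rule A -> terminal
    binary_rules: (B, C)   -> set of non-terminals A with rule A -> B C
    """
    n = len(word)
    if n == 0:
        return False
    # table[length - 1][i] holds the non-terminals deriving word[i:i + length]
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, symbol in enumerate(word):
        table[0][i] = set(unary_rules.get(symbol, ()))
    for length in range(2, n + 1):
        # Cells in this row only read rows for shorter spans, so they are
        # mutually independent (the property a parallel version can exploit).
        for i in range(n - length + 1):
            cell = table[length - 1][i]
            for split in range(1, length):
                left = table[split - 1][i]
                right = table[length - split - 1][i + split]
                for pair in product(left, right):
                    cell |= binary_rules.get(pair, set())
    return start in table[n - 1][0]

# Hypothetical toy grammar for a^n b^n (n >= 1):
#   S -> A B | A T,  T -> S B,  A -> a,  B -> b
UNARY = {"a": {"A"}, "b": {"B"}}
BINARY = {("A", "B"): {"S"}, ("A", "T"): {"S"}, ("S", "B"): {"T"}}

if __name__ == "__main__":
    for w in ("ab", "aabb", "aab", "ba"):
        print(w, cyk_accepts(w, UNARY, BINARY))
```

The table is indexed by span length first so that each outer iteration corresponds to one batch of independent cells; a GPU version would presumably launch or synchronize once per such batch, though the abstract does not give that level of detail.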

### Workshop on Biomedical and Bioinformatics Challenges for Computer Science (BBC) Session 1

#### Chair: Mario Cannataro

 759 8th Workshop on Biomedical and Bioinformatics Challenges for Computer Science - BBC2015 [abstract]Abstract: This is the summary of the 8th Workshop on Biomedical and Bioinformatics Challenges for Computer Science - BBC2015 Stefano Beretta, Mario Cannataro, Riccardo Dondi 374 Robust Conclusions in Mass Spectrometry Analysis [abstract]Abstract: A central issue in biological data analysis is that uncertainty, resulting from different factors of variability, may change the effect of the events being investigated. Therefore, robustness is a fundamental step to be considered. Robustness refers to the ability of a process to cope well with uncertainties, but the different ways to model both the processes and the uncertainties lead to many alternative conclusions in the robustness analysis. In this paper we apply a framework that allows us to deal with such questions for mass spectrometry data. Specifically, we provide robust decisions when testing hypotheses over a case/control population of subject measurements (i.e. proteomic profiles). To this end, we formulate (i) a reference model for the observed data (i.e., graphs), (ii) a reference method to provide decisions (i.e., test of hypotheses over graph properties) and (iii) a reference model of variability to employ sources of uncertainties (i.e., random graphs). We apply these models to a real-case study, analyzing the mass spectrometry profiles of the most common type of Renal Cell Carcinoma, the Clear Cell variant. Italo Zoppis, Riccardo Dondi, Massimiliano Borsani, Erica Gianazza, Clizia Chinello, Fulvio Magni, Giancarlo Mauri 612 Modeling of Imaging Mass Spectrometry Data and Testing by Permutation for Biomarkers Discovery in Tissues [abstract]Abstract: Exploration of tissue sections by imaging mass spectrometry reveals abundance of different biomolecular ions in different sample spots, allowing finding region specific features. 
In this paper we present computational and statistical methods for investigation of protein biomarkers i.e. biological features related to presence of different pathological states. Proposed complete processing pipeline includes data pre-processing, detection and quantification of peaks by using Gaussian mixture modeling and identification of specific features for different tissue regions by performing permutation tests. Application of created methodology provides detection of proteins/peptides with concentration levels specific for tumor area, normal epithelium, muscle or saliva gland regions with high confidence. Michal Marczyk, Grzegorz Drazek, Monika Pietrowska, Piotr Widlak, Joanna Polanska, Andrzej Polanski 336 Fuzzy indication of reliability in metagenomics NGS data analysis [abstract]Abstract: NGS data processing in metagenomics studies has to deal with noisy data that can contain a large amount of reading errors which are difficult to detect and account for. This work introduces a fuzzy indicator of reliability technique to facilitate solutions to this problem. It includes modified Hamming and Levenshtein distance functions that are aimed to be used as drop-in replacements in NGS analysis procedures which rely on distances, such as phylogenetic tree construction. The distances utilise fuzzy sets of reliable bases or an equivalent fuzzy logic, potentially aggregating multiple sources of base reliability. Milko Krachunov, Dimitar Vassilev, Maria Nisheva, Ognyan Kulev, Valeriya Simeonova, Vladimir Dimitrov 559 Pairwise genome comparison workflow in the Cloud using Galaxy [abstract]Abstract: Workflows are becoming the new paradigm in bioinformatics. In general, bioinformatics problems are solved by interconnecting several small software pieces to perform complex analyses. This demands a minimal expertise to create, enact and monitor such tools compositions. 
In addition bioinformatics is immersed in the big-data territory, facing huge problems to analyse such amount of data. We have addressed these problems by integrating a tools management platform (Galaxy) and a Cloud infrastructure, which prevents moving the big datasets between different locations and allows the dynamic scaling of the computing resources depending on the user needs. The result is a user-friendly platform that facilitates the work of the end-users while performing their experiments, installed in a Cloud environment that includes authentication, security and big-data transfer mechanisms. To demonstrate the suitability of our approach we have integrated in the infrastructure an existing pairwise and multiple genome comparison tool which comprises the management of huge datasets and high computational demands. Óscar Torreño Tirado, Michael T. Krieger, Paul Heinzlreiter, Oswaldo Trelles 645 Iterative Reconstruction from Few-View Projections [abstract]Abstract: In the medical imaging field, iterative methods have become a hot topic of research due to their capacity to resolve the reconstruction problem from a limited number of projections. This gives a good possibility to reduce radiation exposure on patients during the data acquisition. However, due to the complexity of the data, the reconstruction process is still time consuming, especially for 3D cases, even though implemented on modern computer architecture. Time of the reconstruction and high radiation dose imposed on patients are two major drawbacks in computed tomography. With the aim to resolve them effectively, we adapted Least Square QR method with soft threshold filtering technique for few-view image reconstruction and present its numerical validation. The method is implemented using CUDA programming mode and compared to standard SART algorithm. The numerical simulations and qualitative analysis of the reconstructed images show the reliability of the presented method. Liubov A. 
Flores, Vicent Vidal, Gumersindo Verdú

### Workshop on Biomedical and Bioinformatics Challenges for Computer Science (BBC) Session 2

#### Chair: Riccardo Dondi

 319 GoD: An R-Package based on Ontologies for Prioritization of Genes with respect to Diseases. [abstract]Abstract: Omics sciences are widely used to analyze diseases at a molecular level. Usually, results of omics experiments are a large list of candidate genes, proteins or other molecules. The interpretation of results and the filtering of candidate genes or proteins selected in an experiment is a challenge in some scenarios. This problem is particularly evident in clinical scenarios in which researchers are interested in the behaviour of few molecules related to some specific disease. The filtering requires the use of domain-specific knowledge that is often encoded into ontologies. To support this interpretation, we implemented GoD (Gene ranking based On Diseases), an algorithm that ranks a given set of genes based on ontology annotations. The algorithm orders genes by the semantic similarity computed between annotation of each gene and those describing the selected disease. We tested as proof-of-principle our software using Human Phenotype Ontology (HPO), Gene Ontology (GO) and Disease Ontology (DO) using the semantic similarity measures. The dedicated website is https://sites.google.com/site/geneontologyprioritization/. Mario Cannataro, Pietro Hiram Guzzi and Marianna Milano 693 Large Scale Comparative Visualisation of Regulatory Networks with TRNDiff [abstract]Abstract: The advent of Next Generation Sequencing technologies has seen explosive growth in genomic datasets, and dense coverage of related organisms, supporting study of subtle, strain-specific variations as a determinant of function. Such data collections present fresh and complex challenges for bioinformatics, those of comparing models of complex relationships across hundreds and even thousands of sequences. Transcriptional Regulatory Network (TRN) structures document the influence of regulatory proteins called Transcription Factors (TFs) on associated Target Genes (TGs). 
TRNs are routinely inferred from model systems or iterative search, and analysis at these scales requires simultaneous displays of multiple networks well beyond those of existing network visualisation tools [1]. In this paper we describe TRNDiff, an open source tool supporting the comparative analysis and visualization of TRNs (and similarly structured data) from many genomes, allowing rapid identification of functional variations within species. The approach is demonstrated through a small scale multiple TRN analysis of the Fur iron-uptake system of Yersinia, suggesting a number of candidate virulence factors; and through a far larger study based on integration with the RegPrecise database (http://regprecise.lbl.gov) - a collection of hundreds of manually curated and predicted transcription factor regulons drawn from across the entire spectrum of prokaryotic organisms. The tool is presently available in stand-alone and integrated form. Information may be found at the dedicated site http://trndiff.org, which includes example data, a short tutorial and links to a working version of the stand-alone system. The integrated regulon browser is currently available at the demonstration site http://115.146.86.55/RegulonExplorer/index.html. Source code is freely available under a non-restrictive Apache 2.0 licence from the authors’ repository at http://bitbucket.org/biovisml. Xin-Yi Chua, Lawrence Buckingham, James Hogan 30 Epistatic Analysis of Clarkson Disease [abstract]Abstract: Genome Wide Association Studies (GWAS) have predominantly focused on the association between single SNPs and disease. It is probable, however, that complex diseases are due to combined effects of multiple genetic variations, as opposed to single variations. Multi-SNP interactions, known as epistatic interactions, can potentially provide information about causes of complex diseases, and build on previous GWAS looking at associations between single SNPs and phenotypes. 
By applying epistatic analysis methods to GWAS datasets, it is possible to identify significant epistatic interactions, and map SNPs identified to genes allowing the construction of a gene network. A large number of studies have applied graph theory techniques to analyse gene networks from microarray data sets, using graph theory metrics to identify important hub genes in these networks. In this work, we present a graph theory study of SNP and gene interaction networks constructed for a Clarkson disease GWAS, as a result of applying epistatic interaction methods to identify significant epistatic interactions. This study identifies a number of genes and SNPs with potential roles for Clarkson disease that could not be found using traditional single SNP analysis, including a number located on chromosome 5q previously identified as being of interest for capillary malformation. Alex Upton, Oswaldo Trelles, James Perkins 527 Multiple structural clustering of bromodomains of the bromo and extra terminal (BET) proteins highlights subtle differences in their structural dynamics and acetylated lysine binding pocket [abstract]Abstract: BET proteins are epigenetic readers whose deregulation results in cancer and inflammation. We show that BET proteins (BRD2, BRD3, BRD4 and BRDT) are globally similar with subtle differences in the sequences and structures of their N-terminal bromodomain. Principal component analysis and non-negative matrix factorization reveal distinct structural clusters associated with specific BET family members, experimental methods, and source organisms. Subtle variations in structural dynamics are evident in the acetylated lysine (Kac) binding pocket of BET bromodomains. Using multiple structural clustering methods, we have also identified representative structures of BET proteins, which are potentially useful for developing therapeutic agents. 
Suryani Lukman, Zeyar Aung, Kelvin Sim 633 Parallel Tools for Simulating the Depolarization Block on a Neural Model [abstract]Abstract: The prototyping and the development of computational codes for biological models, in terms of reliability, efficient and portable building blocks allow to simulate real cerebral behaviours and to validate theories and experiments. A critical issue is the tuning of a model by means of several numerical simulations with the aim to reproduce real scenarios. This requires a huge amount of computational resources to assess the impact of parameters that influence the neuronal response. In this paper, we describe how parallel tools are adopted to simulate the so-called depolarization block of a CA1 pyramidal cell of hippocampus. Here, the high performance computing techniques are adopted in order to achieve a more efficient model simulation. Finally, we analyse the performance of this neural model, investigating the scalability and benefits on multi-core and on parallel and distributed architectures. Salvatore Cuomo, Pasquale De Michele, Ardelio Galletti, Giovanni Ponti

### Agent-Based Simulations, Adaptive Algorithms and Solvers (ABS-AAS) Session 1

#### Chair: Maciej Paszynski

 754 Agent-Based Simulations, Adaptive Algorithms and Solvers [abstract]Abstract: The aim of this workshop is to integrate results of different domains of computer science, computational science and mathematics. We invite papers oriented toward simulations, either hard simulations by means of finite element or finite difference methods, or soft simulations by means of evolutionary computations, particle swarm optimization and other. The workshop is most interested in simulations performed by using agent-oriented systems or by utilizing adaptive algorithms, but simulations performed by other kind of systems are also welcome. Agent-oriented system seems to be the attractive tool useful for numerous domains of applications. Adaptive algorithms allow significant decrease of the computational cost by utilizing computational resources on most important aspect of the problem. This year following the challenges of ICCS 2015 theme "Computational Science at the Gates of Nature" we invite submissions using techniques dealing with large simulations, e.g. agents based algorithms dealing with big data, model reduction techniques for large problems, fast solvers for large three dimensional simulations, etc. To give - rather flexible - guidance in the subject, the following, more detailed, topics are suggested. These of theoretical brand, like: (a) multi-agent systems in high-performance computing, (b) efficient adaptive algorithms for big problems, (c) low computational cost adaptive solvers, (d) agent-oriented approach to adaptive algorithms, (e) model reduction techniques for large problems, (f) mathematical modeling and asymptotic analysis of large problems, (g) finite element or finite difference methods for three dimensional or non-stationary problems, (h) mathematical modeling and asymptotic analysis. 
And those with stress on application sphere: (a) agents based algorithms dealing with big data, (b) application of adaptive algorithms in large simulation, (c) simulation and large multi-agent systems, (d) application of adaptive algorithms in three dimensional finite element and finite difference simulations, (e) application of multi-agent systems in computational modeling, (f) multi-agent systems in integration of different approaches. Maciej Paszynski, Robert Schaefer, Krzysztof Cetnarowicz, David Pardo and Victor Calo 631 Coupling Navier-Stokes and Cahn-Hilliard equations in a two-dimensional annular flow configuration [abstract]Abstract: In this work, we present a novel isogeometric analysis discretization for the Navier-Stokes-Cahn-Hilliard equation, which uses divergence-conforming spaces. Basis functions generated with this method can have higher-order continuity, and allow to directly discretize the higher-order operators present in the equation. The discretization is implemented in PetIGA-MF, a high-performance framework for discrete differential forms. We present solutions in a two-dimensional annulus, and model spinodal decomposition under shear flow. Philippe Vignal, Adel Sarmiento, Adriano Côrtes, Lisandro Dalcin, Victor Calo 656 High-Accuracy Adaptive Modeling of the Energy Distribution of a Meniscus-Shaped Cell Culture in a Petri Dish [abstract]Abstract: Cylindrical Petri dishes embedded in a rectangular waveguide and exposed to a polarized electromagnetic wave are often used to grow cell cultures. To guarantee the success of these cultures, it is necessary to enforce that the specific absorption rate distribution is sufficiently high and uniform over the Petri dish. Accurate numerical simulations are needed to design such systems. 
These simulations constitute a challenge due to the strong discontinuity of electromagnetic parameters of the materials involved, the relative low value of field within the dish cultures compared with the rest of the domain, and the presence of the meniscus shape developed at the liquid/solid interface. The latter greatly increases the level of complexity of the model in terms of geometry and the intensity of the gradients/singularities of the field solution. In here, we employ a three-dimensional (3D) $hp$-adaptive finite element method using isoparametric elements to obtain highly accurate simulations. We analyse the impact of the geometrical modeling of the meniscus shape cell culture in the $hp$-adaptivity. Numerical results concerning the convergence history of the error indicate the numerical difficulties arisen due to the presence of a meniscus-shaped object. At the same time, the resulting energy distribution shows that to consider such meniscus shape is essential to guarantee the success of the cell culture from the biological point of view. Ignacio Gomez-Revuelto, Luis Emilio Garcia-Castillo and David Pardo 162 Leveraging workflows and clouds for a multi-frontal solver for finite element meshes [abstract]Abstract: Scientific workflows in clouds have been successfully used for automation of large-scale computations, but so far they were applied to the loosely-coupled problems, where most workflow tasks can be processed independently in parallel and do not require high volume of communication. The multi-frontal solver algorithm for finite element meshes can be represented as a workflow, but the fine granularity of resulting tasks and the large communication to computation ratio makes it hard to execute it efficiently in loosely-coupled environments such as the Infrastructure-as-a-Service clouds. In this paper, we hypothesize that there exists a class of meshes that can be effectively decomposed into a workflow and mapped onto a cloud infrastructure. 
To show that, we have developed a workflow-based multi-frontal solver using the HyperFlow workflow engine, which comprises workflow generation from the elimination tree, analysis of the workflow structure, task aggregation based on estimated computation costs, and distributed execution using a dedicated worker service that can be deployed in clouds or clusters. The results of our experiments using the workflows of over 10,000 tasks indicate that after task aggregation the resulting workflows of over 100 tasks can be efficiently executed and the overheads are not prohibitive. These results lead us to conclude that our approach is feasible and gives prospects for providing a generic workflow-based solution using clouds for problems typically considered as requiring HPC infrastructure. Bartosz Balis, Kamil Figiela, Maciej Malawski, Konrad Jopek 571 Multi-pheromone ant colony optimization for socio-cognitive simulation purposes [abstract]Abstract: We present an application of Ant Colony Optimisation (ACO) to simulate socio-cognitive features of a population. We incorporated perspective taking ability to generate three different proportions of ant colonies: Control Sample, High Altercentricity Sample, and Low Altercentricity Sample. We simulated their performances on the Travelling Salesman Problem and compared them with the classic ACO. Results show that all three 'cognitively enabled' ant colonies require less time than the classic ACO. Also, though the best solution is found by the classic ACO, the Control Sample finds almost as good a solution but much faster. This study is offered as an example to illustrate an easy way of defining inter-individual interactions based on stigmergic features of the environment. Mateusz Sekara, Kowalski Michal, Aleksander Byrski, Bipin Indurkhya, Marek Kisiel-Dorohinicki, Dana Samson, Tom Lenaerts

### Agent-Based Simulations, Adaptive Algorithms and Solvers (ABS-AAS) Session 3

#### Chair: Aleksander Byrski

 364 Object Oriented Programming for Partial Differential Equations [abstract]Abstract: After a short introduction to the mathematical modelling of the elastic dynamic problem, which shows the similarity between the governing Partial Differential Equations (PDEs) in different applications, common blocks for Finite Element approximation are identified, and an Object Oriented Programming (OOP) methodology for linear and non-linear, stationary and dynamic problems is presented. Advantages of this approach are discussed and some results are shown as examples of this methodology. Elisabete Alberdi Celaya, Juan José Anza Aguirrezabala 667 GPGPU for Difficult Black-box Problems [abstract]Abstract: Difficult black-box problems are required to be solved in many scientific and industrial areas. In this paper, efficient use of a hardware accelerator to implement dedicated solvers for such problems is discussed and studied based on an example of the Golomb Ruler problem. The actual solution of the problem is shown based on evolutionary and memetic algorithms accelerated on GPGPU. The presented results prove the supremacy of GPGPU over optimized multicore CPU implementation. Marcin Pietron, Aleksander Byrski, Marek Kisiel-Dorohinicki 558 Multi-variant Planning for Dynamic Problems with Agent-based Signal Modeling [abstract]Abstract: The problem of planning for groups of autonomous beings has been gaining attention over the last few years. Real life tasks, like mobile robots coordination or urban traffic management, need robust and flexible solutions. In this paper a new approach to the problem of multi-variant planning in such systems is presented. It assumes use of simple reactive controllers by the beings; however, the state observation is enriched by a dynamically updated model, which contains planning results. The approach gives promising results in the considered use case, which is the Multi Robot Task Allocation problem. 
Szymon Szomiński, Wojciech Turek, Małgorzata Żabińska, Krzysztof Cetnarowicz 637 Conditional Synchronization in Multi-Agent Graph-Based Knowledge Systems [abstract]Abstract: Graph transformations provide a well established method for the formal description of modifications of graph-based systems. On the other hand, such systems can be regarded as multi-agent ones providing a feasible means for maintaining and manipulating large scale data. This paper deals with the problem of information exchange among agents maintaining different graph-based systems. Graph formalism applied for representing a knowledge maintained by agents is used at the same time to perform graph transformations modeling a knowledge exchange. The consistency of a knowledge represented by the set of agents is ensured by execution of some graph transformations rules by two agents in a parallel way. We suggest that complex operations (sequences of graph transformations) should be introduced instead of the formalism basing on simple unconditional operations. The approach presented in this paper is accompanied by examples concerning the problem of personal data distributed over different places (and maintained by different agents) and transmitted in such an environment. (Financial support for this study was provided from resources of National Center for Research and Development, the grant number NCBiR 0021/R/ID2/2011/01.) Leszek Kotulski, Adam Sędziwy, Barbara Strug 442 Agent-based approach to WEB exploration process [abstract]Abstract: The paper contains the concept of agent-based search system and monitoring of Web pages. It is oriented at the exploration of limited problem area, covering a given sector of industry or economy. The proposal of agent-based (modular) structure of the system is due to the desire to ease the introduction of modifications or enrichment of its functionality. Commonly used search engines do not offer such a feature. 
The second part of the article presents a pilot version of the WEB mining system, representing a simplified implementation of the previously presented concept. Testing of the implemented application was executed by referring to the problem area of foundry industry. Andrzej Opaliński, Edward Nawarecki, Stanisława Kluska-Nawarecka

### Agent-Based Simulations, Adaptive Algorithms and Solvers (ABS-AAS) Session 4

#### Chair: Aleksander Byrski

 568 Agent-oriented Foraminifera Habitat Simulation [abstract]Abstract: An agent-oriented software solution for simulation of marine unicellular organisms called foraminifera is presented. Their simplified microhabitat interactions are described and implemented to run the model and verify its flexibility. This group of well fossilizable protists has been selected due to its excellent "in fossilio" record that should help to verify our future long-run evolutionary results. The introduced system is built utilizing the PyAge platform and based on easily exchangeable components that may be replaced (also in runtime). Selected experiments considering substantial and technological efficiency were conducted and the obtained results are presented and discussed. Maciej Kazirod, Wojciech Korczynski, Elias Fernandez, Aleksander Byrski, Marek Kisiel-Dorohinicki, Paweł Topa, Jaroslaw Tyszka, Maciej Komosinski 432 Comparison of the structure of equation systems and the GPU multifrontal solver for finite difference, collocation and finite element method [abstract]Abstract: The article is an in-depth comparison of the solving process of the equation systems specific for finite difference, collocation and finite element methods. The paper considers recently developed isogeometric versions of the collocation and finite element methods, employing B-splines for the computations and ensuring $C^{p-1}$ continuity on the borders of elements for the B-splines of the order $p$. For solving the systems, we use our GPU implementation of the state-of-the-art parallel multifrontal solver, which leverages modern GPU architectures and allows us to reduce the complexity. We analyze the structures of linear equation systems resulting from each of the methods and how different structures of matrix lead to different multifrontal solver elimination trees. We also consider the flows of multifrontal solver depending on the originally employed method. Pawel Lipski, Maciej Wozniak, Maciej Paszynski

### Sixth Workshop on Data Mining in Earth System Science (DMESS) Session 1

#### Chair: Jay Larson

 739 Data Mining in Earth System Science (DMESS 2015) [abstract]Abstract: Spanning many orders of magnitude in time and space scales, Earth science data are increasingly large and complex and often represent very long time series, making such data difficult to analyze, visualize, interpret, and understand. Moreover, advanced electronic data storage technologies have enabled the creation of large repositories of observational data, while modern high performance computing capacity has enabled the creation of detailed empirical and process-based models that produce copious output across all these time and space scales. The resulting “explosion” of heterogeneous, multi-disciplinary Earth science data has rendered traditional means of integration and analysis ineffective, necessitating the application of new analysis methods and the development of highly scalable software tools for synthesis, assimilation, comparison, and visualization. This workshop explores various data mining approaches to understanding Earth science processes, emphasizing the unique technological challenges associated with utilizing very large and long time series geospatial data sets. Especially encouraged are original research papers describing applications of statistical and data mining methods (including cluster analysis, empirical orthogonal functions (EOFs), genetic algorithms, neural networks, automated data assimilation, and other machine learning techniques) that support analysis and discovery in climate, water resources, geology, ecology, and environmental sciences research. Forrest M. Hoffman, Jitendra Kumar and Jay Larson 312 Pattern-Based Regionalization of Large Geospatial Datasets Using COBIA [abstract]Abstract: Pattern-based regionalization -- spatial classification of an image into sub-regions characterized by relatively stationary patterns of pixel values -- is of significant interest for conservation, planning, as well as for academic research. 
A technique called the complex object-based image analysis (COBIA) is particularly well-suited for pattern-based regionalization of very large spatial datasets. In COBIA image is subdivided into a regular grid of local blocks of pixels (complex objects) at minimal computational cost. Further analysis is performed on those blocks which represent local patterns of pixel-based variable. A variant of COBIA presented here works on pixel-classified images, uses a histogram of co-occurrence pattern features as block attribute, and utilizes the Jensen-Shannon divergence to measure a distance between any two local patterns. In this paper the COBIA concept is utilized for unsupervised regionalization of land cover dataset (pixel-classified Landsat images) into landscape types -- characteristic patterns of different land covers. This exploratory technique identifies and delineates landscape types using a combination of segmentation of a grid of local patterns with clustering of the segments. A test site with 3.5 x 10^8 pixels is regionalized in just few minutes using a standard desktop computer. Computational efficiency of presented approach allows for carrying out regionalizations of various high resolution spatial datasets on continental or global scales. Tomasz Stepinski, Jacek Niesterowicz, Jaroslaw Jasiewicz 720 Fidelity of Precipitation Extremes in High Resolution Global Climate Simulations [abstract]Abstract: Precipitation extremes have tangible societal impacts. Here, we assess if current state of the art global climate model simulations at high spatial resolutions capture the observed behavior of precipitation extremes in the past few decades over the continental US. We design a correlation-based regionalization framework to quantify precipitation extremes, where samples of extreme events for a grid box may also be drawn from neighboring grid boxes with statistically equal means and statistically significant temporal correlations. 
We model precipitation extremes with the Generalized Extreme Value (GEV) distribution fits to time series of annual maximum precipitation. Non-stationarity of extremes is captured by including a time-dependent parameter in the GEV distribution. Our analysis reveals that the high-resolution model substantially improves the simulation of stationary precipitation extreme statistics particularly over the Northwest Pacific coastal region and the Southeast US. Observational data exhibits significant non-stationary behavior of extremes only over some parts of the Western US, with declining trends in the extremes. While the high resolution simulations improve upon the low resolution model in simulating this non-stationary behavior, the trends are statistically significant only over some of those regions. Salil Mahajan, Katherine Evans, Marcia Branstetter, Valentine Anantharaj, Juliann Leifeld 729 On Parallel and Scalable Classification and Clustering Techniques for Earth Science Datasets [abstract]Abstract: One observation of earth data science is their massive increase in volume (e.g. higher quality measurements) or the emerging high number of dimensions (e.g. hyperspectral bands in satellite observations). Traditional data mining tools (R, Matlab, etc.) are partly becoming infeasible to be used with those datasets. Parallel and scalable techniques bear the potential to overcome these limits while our analysis revealed that a wide variety of new implementations are not all suited for data mining tasks in earth science. 
This contribution gives reasons by focusing on two distinct parallel and scalable data mining techniques used in High Performance Computing (HPC) environments in earth science case studies: (a) Parallel Density-based Spatial Clustering of Applications with Noise (DBSCAN) for automated outlier detection in time series data and (b) parallel classification using multi-class Support Vector Machines (SVMs) for land cover identification in multi-spectral satellite datasets. In the paper we also compare recent ‘big data stacks’ vs. traditional HPC techniques. Markus Götz, Matthias Richerzhagen, Gabriele Cavallaro, Christian Bodenstein, Philipp Glock, Morris Riedel, Jon Atli Benediktsson 322 Completion of a sparse GLIDER database using multi-iterative Self-Organizing Maps (ITCOMP SOM) [abstract]Abstract: We present a novel approach named ITCOMP SOM that uses iterative self-organizing maps (SOM) to progressively reconstruct missing data in a highly correlated multidimensional dataset. This method was applied for the completion of a complex oceanographic data-set containing glider data from the EYE of the Levantine experiment of the EGO project. ITCOMP SOM provided reconstructed temperature and salinity profiles that are consistent with the physics of the phenomenon they sampled. A cross-validation test was performed and validated the approach, providing a root mean square error of 0.042°C for the reconstruction of the temperature profiles and 0.008 PSU for the simultaneous reconstruction of the salinity profiles. Anastase - Alexander Charantonis, Pierre Testor, Laurent Mortier, Fabrizio D'Ortenzio, Sylvie Thiria 698 A Feature-first Approach to Clustering for Highlighting Regions of Interest in Scientific Data [abstract]Abstract: We present a simple clustering algorithm that classifies the points of a dataset by a combination of scalar variables' values as well as spatial locations. 
How heavily the spatial locations impact the algorithm is a tunable parameter. With no impact the algorithm bins the data by calculating a histogram and classifies each point by a bin ID. With full impact, points are bunched together with their neighbors regardless of value. This approach is unsurprisingly very sensitive to this weighting; a sampling of possible values yields a wide variety of classifications. However, we have found that when tuned just right it is indeed possible to extract meaningful features from the resulting clustering. Furthermore, the principles behind our development of this technique are also applicable in both tuning the algorithm as well as in selecting data regions. In this paper we will provide the details of design and implementation and demonstrate using the auto-tuned approach to extract interesting regions of real scientific data. Our target application is data derived from NASA’s Moderate Resolution Imaging Spectroradiometer (MODIS) sensors. Robert Sisneros
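The COBIA abstract (paper 312) in this session compares local patterns by the Jensen-Shannon divergence between their histograms. As a rough illustration only, the sketch below computes that divergence for two plain count histograms in base 2. The function names and the example histograms are hypothetical, and nothing here reproduces the authors' co-occurrence features or their software.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence in bits; bins with p[i] == 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jensen_shannon(hist_a, hist_b):
    """Jensen-Shannon divergence between two count histograms of equal length.

    Both histograms are normalized to probabilities first. With base-2
    logarithms the result lies between 0 (identical) and 1 (disjoint).
    """
    total_a, total_b = sum(hist_a), sum(hist_b)
    p = [x / total_a for x in hist_a]
    q = [x / total_b for x in hist_b]
    # Mixture distribution; m[i] > 0 wherever p[i] > 0 or q[i] > 0.
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Hypothetical histograms of land-cover classes in two blocks.
forest_block = [80, 15, 5, 0]
urban_block = [5, 10, 25, 60]
distance = jensen_shannon(forest_block, urban_block)
```

Because the divergence is symmetric and bounded, it can be used directly as a dissimilarity between any two blocks without further scaling.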

### DDDAS-Dynamic Data Driven Applications Systems and Large-Scale-Big-Data & Large-Scale-Big-Computing (DDDAS-LS) Session 2

#### Chair: Frederica Darema

 561 Spectral Validation of Measurements in a Vehicle Tracking DDDAS [abstract]Abstract: Vehicle tracking in adverse environments is a challenging problem because of the high number of factors constraining their motion and possibility of frequent occlusion. In such conditions, identification rates drop dramatically. Hyperspectral imaging is known to improve the robustness of target identification by recording extended data in many wavelengths. However, it is impossible to transmit such a high rate data in real time with a conventional full hyperspectral sensor. Thus, we present a persistent ground-based target tracking system, taking advantage of a state-of-the-art, adaptive, multi-modal sensor controlled by Dynamic Data Driven Applications Systems (DDDAS) methodology. This overcomes the data challenge of hyperspectral tracking by only using spectral data as required. Spectral features are inserted in a feature matching algorithm to identify spectrally likely matches and simplify multidimensional assignment algorithm. The sensor is tasked for spectra acquisition by the prior estimates from the Gaussian Sum Filter and foreground mask generated by the background subtraction. Prior information matching the target features is used to tackle false negatives in the background subtraction output. The proposed feature-aided tracking system is evaluated in a challenging scene with a realistic vehicular simulation. Burak Uzkent, Matthew J. Hoffman, Anthony Vodacek 567 Dynamic Data-Driven Application System (DDDAS) for Video Surveillance User Support [abstract]Abstract: Human-machine interaction mixed initiatives require a pragmatic coordination between different systems. Context understanding is established from the content, analysis, and guidance from query-based coordination between users and machines. 
Inspired by Level 5 Information Fusion ‘user refinement’, a live-video computing (LVC) structure is presented for user-based query access of a data-base management of information. Information access includes multimedia fusion of query-based text, images, and exploited tracks which can be utilized for context assessment, content-based information retrieval (CBIR), and situation awareness. In this paper, we explore new developments in dynamic data-driven application systems (DDDAS) of context analysis for user support. Using a common image processing data set, a system-level time savings is demonstrated using a query-based approach in a context, control, and semantic-aware information fusion design. Erik Blasch, Alex Aved 630 Multi-INT Query Language for DDDAS Designs [abstract]Abstract: Context understanding is established from the content, analysis, and guidance from query-based coordination between users and machines. In this manuscript, a live-video computing (LVC) approach is presented for access, comprehension and management of information for context assessment. Context assessment includes multimedia fusion of query-based text, images, and exploited tracks which can be utilized for image retrieval. In this paper, we explore the developments in database systems to enable context to be utilized in user-based queries for video tracking content extraction. Using a common image processing data set, we demonstrate activity analysis with context, privacy, and semantic-aware in a Dynamic Data-Driven Application System (DDDAS). Alex Aved, Erik Blasch 683 A DDDAS Plume Monitoring System with Reduced Kalman Filter [abstract]Abstract: A new dynamic data-driven application system (DDDAS) is proposed in this article to dynamically estimate a concentration plume and to plan optimal paths for unmanned aerial vehicles (UAVs) equipped with environmental sensors. 
The proposed DDDAS dynamically incorporates measured data from UAVs into an environmental simulation while simultaneously steering measurement processes. The main idea is to employ a few time-evolving proper orthogonal decomposition (POD) modes to simulate a coupled linear system, and to simultaneously measure plume concentration and plume source distribution via a reduced Kalman filter. In order to maximize the information gain, UAVs are dynamically driven to hot spots chosen based on the POD modes using a greedy algorithm. We demonstrate the efficacy of the data assimilation and control strategies in a numerical simulation and a field test. Liqian Peng, Matthew Silic, Kamran Mohseni 685 A Dynamic Data Driven Approach for Operation Planning of Microgrids [abstract]Abstract: Distributed generation resources (DGs) and their utilization in large-scale power systems are attracting more and more utilities as they are becoming more qualitatively reliable and economically viable. However, uncertainties in power generation from DGs and fluctuations in load demand must be considered when determining the optimal operation plan for a microgrid. In this context, a novel dynamic data driven approach is proposed for determining the real-time operation plan of an electric microgrid while considering its conflicting objectives. In particular, the proposed approach is equipped with three modules: 1) a database including the real-time microgrid topology data (i.e., power demand, market price for electricity, etc.) and the data for environmental factors (i.e., solar radiation, wind speed, temperature, etc.); 2) a simulation, in which operation of the microgrid is simulated with embedded rule-based scale identification procedures; 3) a multi-objective optimization module which finds the near-optimal operation plan in terms of minimum operating cost and minimum emission using a particle-filtering based algorithm. 
The complexity of the optimization depends on the scale of the problem identified from the simulation module. The results obtained from the optimization module are sent back to the microgrid system to enhance its operation. The experiments conducted in this study have demonstrated the power of the proposed approach in real-time assessment and control of operation in microgrids. Xiaoran Shi, Haluk Damgacioglu, Nurcin Celik
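The microgrid abstract (paper 685) looks for operation plans that trade off minimum operating cost against minimum emission. The sketch below is not the particle-filtering based algorithm from the paper. It only illustrates, under assumed data, what a set of non-dominated plans looks like when two objectives are both minimized. The plan tuples `(cost, emission)` are hypothetical values.

```python
def dominates(a, b):
    """True if plan a is no worse than plan b in every objective and
    strictly better in at least one (all objectives are minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(plans):
    """Return the plans that no other plan dominates, in input order."""
    return [p for p in plans if not any(dominates(q, p) for q in plans)]


# Hypothetical (operating cost, emission) pairs for four candidate plans.
candidate_plans = [(100.0, 50.0), (120.0, 30.0), (110.0, 60.0), (90.0, 80.0)]
front = pareto_front(candidate_plans)
```

Here the plan `(110.0, 60.0)` drops out because `(100.0, 50.0)` is better in both objectives. The remaining three plans are mutually incomparable, so a decision rule or weighting is still needed to pick one.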

### DDDAS-Dynamic Data Driven Applications Systems and Large-Scale-Big-Data & Large-Scale-Big-Computing (DDDAS-LS) Session 4

#### Chair: Frederica Darema

 470 Bayesian Computational Sensor Networks: Small-Scale Structural Health Monitoring [abstract]Abstract: The Bayesian Computational Sensor Network methodology is applied to small-scale structural health monitoring. A mobile robot equipped with vision and ultrasound sensors maps small-scale structures for damage and localizes itself and the damage in the map. The combination of vision and ultrasound reduces the uncertainty in damage localization. The data storage and analysis takes place exploiting cloud computing mechanisms, and there is also an off-line computational model calibration component which returns information to the robot concerning updated on-board models as well as proposed sampling points. The approach is validated in a set of physical experiments. Wenyi Wang, Anshul Joshi, Nishith Tirpankar, Philip Erickson, Michael Cline, Palani Thangaraj, Tom Henderson 482 Highly Parallel Algorithm for Large Data In Core and Out Core Triangulation in E2 and E3 [abstract]Abstract: A triangulation of points in E^2, or a tetrahedronization of points in E^3, is used in many applications. It is not necessary to fulfill the Delaunay criteria in all cases. For large data (more than 5·10^7 points), parallel methods are used for the purpose of decreasing run time. A new approach for fast, effective and highly parallel CPU and GPU triangulation, or tetrahedronization, of large data sets in E^2 or E^3, suitable for in core and out core memory processing, is proposed. Experimental results proved that the resulting triangulation/tetrahedralization is close to the Delaunay triangulation/tetrahedralization. They also demonstrate the applicability of the proposed method in applications. Michal Smolik, Vaclav Skala 672 Resilient and Trustworthy Dynamic Data-Driven Application Systems for Crisis Environments [abstract]Abstract: Future cyber information systems are required to determine network performance including trust, resiliency, and timeliness. 
Using the Dynamic Data-Driven Application Systems (DDDAS) concepts, we develop a method for crisis management that incorporates sensed data, performance models, theoretical analysis, and service-based software. Using constructs from security and resiliency theories, the motivating concept is Resilient-DDDAS-as-a-Cloud Service (rDaaS). Service-based approaches allow a system to react as needed to the dynamics of the situation. The Resilient Cloud Middleware supports the analysis of the data stored and retrieved in the cloud, management of processes, and coordination with the end user/application. The rDaaS concept is demonstrated with a nuclear plant example for emergency response that demonstrates the importance of the DDDAS system level performance. Youakim Badr, Salim Hariri, Erik Blasch 216 Efficient Execution of Replicated Transportation Simulations with Uncertain Vehicle Trajectories [abstract]Abstract: Many Dynamic Data-Driven Application Systems (DDDAS) use replicated simulations to project possible future system states. In many cases there are substantial similarities among these different replications. In other cases output statistics are independent of certain simulation computations. This paper explores computational methods to exploit these properties to improve efficiency. We discuss a new algorithm to speed up the execution of replicated vehicle traffic simulations, where the output statistics of interest focus on one or more attributes such as the trajectory of a certain “target” vehicle. By focusing on correctly reproducing the behavior of the target vehicle and its interaction with other modeled entities across the different replications and modifying the event handling mechanism the execution time can be reduced on both serial and parallel machines. A speculative execution method using a tagging mechanism allows this speedup to occur without loss of accuracy in the output statistics. 
Philip Pecher, Michael Hunter, Richard Fujimoto 613 Adapting Stream Processing Framework for Video Analysis [abstract]Abstract: Stream processing (SP) became relevant mainly due to inexpensive and hence ubiquitous deployment of sensors in many domains (e.g., environmental monitoring, battle field monitoring). Other continuous data generators (web clicks, traffic data, network packets, mobile devices) have also prompted processing and analysis of these streams for applications such as traffic congestion/ accidents, network intrusion detection, and personalized marketing. Image processing has been researched for several decades. Recently there is emphasis on video stream analysis for situation monitoring due to the ubiquitous deployment of video cameras and unmanned aerial vehicles for security and other applications. This paper elaborates on the research and development issues that need to be addressed for extending the traditional stream processing framework for video analysis, especially for situation awareness. This entails extensions to: data model, operators and language for expressing complex situations, QoS specifications and algorithms needed for their satisfaction. Specifically, this paper demonstrates inadequacy of current data representation (e.g., relation and arrable) and querying capabilities to infer long-term research and development issues. S Chakravarthy, A Aved, S Shirvani, M Annappa, E Blasch
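The triangulation abstract (paper 482) in this session refers to the Delaunay criterion without stating it. For points in E^2 the criterion says that no point may lie strictly inside the circumcircle of any triangle. The sketch below is a standard determinant test for that condition, written independently of the paper's algorithm. The function names are hypothetical, and plain floating-point arithmetic is assumed, so nearly cocircular inputs may be misclassified.

```python
def orientation(a, b, c):
    """Twice the signed area of triangle abc; positive if counter-clockwise."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def in_circumcircle(a, b, c, d):
    """True if point d lies strictly inside the circumcircle of triangle abc.

    Uses the 3x3 in-circle determinant with coordinates taken relative to d.
    Multiplying by the orientation makes the test independent of whether
    a, b, c are given clockwise or counter-clockwise.
    """
    rows = []
    for p in (a, b, c):
        dx, dy = p[0] - d[0], p[1] - d[1]
        rows.append((dx, dy, dx * dx + dy * dy))
    (a1, a2, a3), (b1, b2, b3), (c1, c2, c3) = rows
    det = (a1 * (b2 * c3 - b3 * c2)
           - a2 * (b1 * c3 - b3 * c1)
           + a3 * (b1 * c2 - b2 * c1))
    return det * orientation(a, b, c) > 0


# A triangle violates the Delaunay criterion if some other point is inside.
triangle = ((0.0, 0.0), (1.0, 0.0), (0.0, 1.0))
violated = in_circumcircle(*triangle, (0.25, 0.25))
```

A triangulation that is only "close to" Delaunay, as the abstract puts it, would fail this test for some fraction of its triangles.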

### Dynamic Data Driven Applications Systems (DDDAS) Session 1

 215 Ensemble Learning for Dynamic Data Assimilation [abstract]Abstract: The organization of an ensemble of initial perturbations by a nonlinear dynamical system can produce highly non-Gaussian patterns, evidence of which is clearly observed in position-amplitude-scale features of coherent fluids. The true distribution of the ensemble is unknown, in part because models are in error and imperfect. A variety of distributions have been proposed in the context of Bayesian inference, including for example, mixture and kernel models. We contend that seeking posterior modes in non-Gaussian inference is fraught with heightened sensitivity to model error and demonstrate this fact by showing that a large component of the total variance remains unaccounted for as more modes emerge. Further, we show that in the presence of bias, this unaccounted variance slows convergence and produces distributions with lower information that require extensive auxiliary clean up procedures such as resampling. These procedures are difficult in large-scale problems where ensemble members may be generated through myriad schemes. We show that by treating the estimation problem entailed as a regression machine, multiple objectives can be incorporated in inference. The relative importance of these objectives can morph over time and can be dynamically adjusted by the data. In particular, we show that both variance reduction and nonlinear modes can be targeted using a stacked cascade generalization. We demonstrate this approach by constructing a new sequential filter called the Boosted Mixture Ensemble Filter and illustrating this on a Lorenz system. Sai Ravela 504 A Method for Estimating Volcanic Hazards [abstract]Abstract: This paper presents one approach to determining the hazard threat to a locale due to a large volcanic avalanche. 
The methodology employed includes large-scale numerical simulations, field data reporting the volume and runout of flow events, and a detailed statistical analysis of uncertainties in the modeling and data. The probability of a catastrophic event impacting a locale is calculated, together with an estimate of the uncertainty in that calculation. By a careful use of simulations, a hazard map for an entire region can be determined. The calculation can be turned around quickly, and the methodology can be applied to other hazard scenarios. E Bruce Pitman and Abani Patra 55 Forecasting Volcanic Plume Hazards With Fast UQ [abstract]Abstract: This paper introduces a numerically-stable multiscale scheme to efficiently generate probabilistic hazard maps for volcanic ash transport using models of transport, dispersion and wind. The scheme relies on graph-based algorithms and low-rank approximations of the adjacency matrix of the graph. This procedure involves representing both the parameter space and physical space by a weighted graph. A combination of clustering and low rank approximation is then used to create a good approximation of the original graph. By performing a multiscale data sampling, a well-conditioned basis of a low rank Gaussian kernel matrix is identified and used for out-of-sample extensions used in generating the hazard maps. Ramona Stefanescu, Abani Patra, M. I Bursik, E Bruce Pitman, Peter Webley, Matthew D. Jones 45 Forest fire propagation prediction based on overlapping DDDAS forecasts [abstract]Abstract: The effects of forest fires cause a widespread devastation throughout the world every year. A good prediction of fire behavior can help in the coordination and management of human and material resources in the extinction of these emergencies. 
Given the high uncertainty of fire behavior and the difficulty of extracting information required to generate accurate predictions, a system able to adapt to fire dynamics considering the uncertainty of the data is necessary. In this work two different systems based on Dynamic Data Driven Application are applied and a new probabilistic method based on the combination of both approaches is presented. This new method uses the computational power provided by high performance computing systems to adapt to the changes in this kind of dynamic environment. Tomás Artés, Adrián Cardil, Ana Cortés, Tomàs Margalef, Domingo Molina, Lucas Pelegrín, Joaquín Ramírez 533 Towards an Integrated Cyberinfrastructure for Scalable Data-Driven Monitoring, Dynamic Prediction and Resilience of Wildfires [abstract]Abstract: Wildfires are critical for ecosystems in many geographical regions. However, our current urbanized existence in these environments is inducing this ecological balance to evolve into a different dynamic leading to the biggest fires in history. Wildfire wind speeds and directions change in an instant, and first responders can only be effective if they take action as quickly as the conditions change. What is lacking in disaster management today is a system integration of real-time sensor networks, satellite imagery, near-real time data management tools, wildfire simulation tools, and connectivity to emergency command centers before, during and after a wildfire. As a first time example of such an integrated system, the WIFIRE project is building an end-to-end cyberinfrastructure for real-time and data-driven simulation, prediction and visualization of wildfire behavior. 
This paper summarizes the approach and early results of the WIFIRE project to integrate networked observations, e.g., heterogeneous satellite data and real-time remote sensor data with computational techniques in signal processing, visualization, modeling and data assimilation to provide a scalable, technological, and educational solution to monitor weather patterns to predict a wildfire’s Rate of Spread. Ilkay Altintas, Jessica Block, Raymond de Callafon, Daniel Crawl, Charles Cowart, Amarnath Gupta, Mai H. Nguyen, Hans-Werner Braun, Jurgen Schulze, Michael Gollner, Arnaud Trouve, Larry Smarr
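The ensemble learning abstract (paper 215) in this session illustrates its filter on a Lorenz system, and starts from the observation that a nonlinear system reorganizes an ensemble of initial perturbations. The sketch below only propagates a small ensemble forward in time and does not implement the Boosted Mixture Ensemble Filter. It assumes the three-variable Lorenz-63 equations with the classic parameters and a fixed-step fourth-order Runge-Kutta integrator, none of which the abstract specifies.

```python
def lorenz_rhs(state, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Right-hand side of the Lorenz-63 equations (assumed model)."""
    x, y, z = state
    return (sigma * (y - x), x * (rho - z) - y, x * y - beta * z)


def rk4_step(state, dt):
    """Advance one state by a single classical Runge-Kutta step."""
    def shifted(s, k, h):
        return tuple(si + h * ki for si, ki in zip(s, k))

    k1 = lorenz_rhs(state)
    k2 = lorenz_rhs(shifted(state, k1, dt / 2))
    k3 = lorenz_rhs(shifted(state, k2, dt / 2))
    k4 = lorenz_rhs(shifted(state, k3, dt))
    return tuple(s + dt / 6 * (a + 2 * b + 2 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))


def run_ensemble(members, dt=0.01, steps=100):
    """Propagate every ensemble member independently for `steps` steps."""
    for _ in range(steps):
        members = [rk4_step(m, dt) for m in members]
    return members


# Two members that differ by a tiny initial perturbation (hypothetical values).
initial = [(1.0, 1.0, 1.0), (1.0 + 1e-6, 1.0, 1.0)]
final = run_ensemble(initial, dt=0.01, steps=2500)
separation = sum((a - b) ** 2 for a, b in zip(*final)) ** 0.5
```

The growth of `separation` relative to the initial perturbation is the sensitivity that makes Gaussian assumptions about the ensemble hard to maintain in such systems.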

 761 Matrix Completion via Fast Alternating Least Squares [abstract]Abstract: We develop a new scalable method for matrix completion via nuclear-norm regularization and alternating least squares. The algorithm has an EM flavor, which dramatically reduces the computational cost per iteration at the cost of more iterations. *joint work with Rahul Mazumder, Jason Lee and Reza Zadeh. Trevor Hastie 93 Stable Autoencoding: A Flexible Framework for Regularized Low-Rank Matrix Estimation [abstract]Abstract: Low-rank matrix estimation plays a key role in many scientific and engineering tasks, including collaborative filtering and image denoising. Low-rank procedures are often motivated by the statistical model where we observe a noisy matrix drawn from some distribution with expectation assumed to have a low-rank representation; the statistical goal is then to recover the signal from the noisy data. Given this setup, we develop a framework for low-rank matrix estimation that allows us to transform noise models into regularization schemes via a simple parametric bootstrap. Effectively, our procedure seeks an autoencoding basis for the observed matrix that is robust with respect to the specified noise model. In the simplest case, with an isotropic noise model, our procedure is equivalent to a classical singular value shrinkage estimator. For non-isotropic noise models, however, our method does not reduce to singular value shrinkage, and instead yields new estimators that perform well in experiments. Moreover, by iterating our stable autoencoding scheme, we can automatically generate low-rank estimates without specifying the target rank as a tuning parameter. Julie Josse, Stefan Wager 349 Finding Top UI/UX Design Talent on Adobe Behance [abstract]Abstract: The Behance social network allows professionals of diverse artistic disciplines to exhibit their work and connect amongst each other. We investigate the network properties of the UX/UI designer subgraph. 
Considering the subgraph is motivated by the idea that professionals in the same discipline are more likely to give a realistic assessment of a colleague's work. We therefore developed a metric to assess the influence and importance of a specific member of the community based on structural properties of the subgraph and additional measures of prestige. For that purpose, we identified appreciations as a useful measure to include in a weighted PageRank algorithm, as it adds a notion of perceived quality of the work in the artist's portfolio to the ranking, which is not contained in the structural information of the graph. With this weighted PageRank, we identified locations that have a high density of influential UX/UI designers. Susanne Halstead, Daniel Serrano, Scott Proctor 753 Graphs, Matrices, and the GraphBLAS: Seven Good Reasons [abstract]Abstract: The analysis of graphs has become increasingly important to a wide range of applications. Graph analysis presents a number of unique challenges in the areas of (1) software complexity, (2) data complexity, (3) security, (4) mathematical complexity, (5) theoretical analysis, (6) serial performance, and (7) parallel performance. Implementing graph algorithms using matrix-based approaches provides a number of promising solutions to these challenges. The GraphBLAS standard (istc-bigdata.org/GraphBlas) is being developed to bring the potential of matrix based graph algorithms to the broadest possible audience. The GraphBLAS mathematically defines a core set of matrix-based graph operations that can be used to implement a wide class of graph algorithms in a wide range of programming environments. This paper provides an introduction to the GraphBLAS and describes how the GraphBLAS can be used to address many of the challenges associated with analysis of graphs. Jeremy Kepner
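The GraphBLAS abstract above rests on the idea that graph algorithms can be written as matrix operations. A common textbook illustration is breadth-first search, where each step is a product of the frontier vector with the adjacency matrix over a Boolean (OR, AND) semiring, masked by the set of vertices already visited. The sketch below mimics that formulation with plain Python sets standing in for sparse Boolean rows and vectors. It does not use any GraphBLAS API, and the function name and example graph are hypothetical.

```python
def bfs_levels(adjacency, source):
    """Breadth-first levels via repeated Boolean vector-matrix products.

    `adjacency` maps each vertex to the set of its out-neighbours, i.e. the
    nonzero columns of that vertex's row in a sparse Boolean matrix.
    """
    levels = {source: 0}
    frontier = {source}  # sparse Boolean vector with one nonzero entry
    depth = 0
    while frontier:
        depth += 1
        # OR together the matrix rows selected by the frontier vector.
        reached = set()
        for v in frontier:
            reached |= adjacency.get(v, set())
        # Mask out vertices that already have a level assigned.
        frontier = {v for v in reached if v not in levels}
        for v in frontier:
            levels[v] = depth
    return levels


# Small hypothetical directed graph; vertex 4 is unreachable from 0.
graph = {0: {1, 2}, 1: {3}, 2: {3}, 3: set(), 4: {0}}
levels = bfs_levels(graph, 0)
```

Writing the traversal this way keeps the per-step work in one bulk operation, which is the property that matrix-based libraries can then optimize or parallelize.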

 757 Workshop on Large Scale Computational Physics - LSCP [abstract]Abstract: The LSCP workshop focuses on symbolic and numerical methods and simulations, algorithms and tools (software and hardware) for developing and running large-scale computations in physical sciences. Special attention goes to parallelism, scalability and high numerical precision. System architectures are also of interest as long as they are supporting physics related calculations, such as: massively parallel systems, GPUs, many-integrated-cores, distributed (cluster, grid/cloud) computing, and hybrid systems. Topics are chosen from areas including: theoretical physics (high energy physics, nuclear physics, astrophysics, cosmology, quantum physics, accelerator physics), plasma physics, condensed matter physics, chemical physics, molecular dynamics, bio-physical system modeling, material science/engineering, nanotechnology, fluid dynamics, complex and turbulent systems, and climate modeling. Elise de Doncker, Fukuko Yuasa 96 The Particle Accelerator Simulation Code PyORBIT [abstract]Abstract: The particle accelerator simulation code PyORBIT is presented. The structure, implementation, history, parallel and simulation capabilities, and future development of the code are discussed. The PyORBIT code is a new implementation and extension of algorithms of the original ORBIT code that was developed for the Spallation Neutron Source accelerator at the Oak Ridge National Laboratory. The PyORBIT code has a two level structure. The upper level uses the Python programming language to control the flow of intensive calculations performed by the lower level code implemented in the C++ language. The parallel capabilities are based on MPI communications. The PyORBIT is an open source code accessible to the public through the Google Open Source Projects Hosting service. 
Andrei Shishlo 115 Simulations of several finite-sized objects in plasma [abstract]Abstract: Interaction of plasma with finite-sized objects is one of the central problems in the physics of plasmas. Since object charging is often nonlinear and involved, it is advisable to address this problem with numerical simulations. First-principle simulations allow studying trajectories of charged plasma particles in self-consistent force fields. One such approach is the particle-in-cell (PIC) method, where the use of a spatial grid for the force calculation significantly reduces the computational complexity. Implementing finite-sized objects in PIC simulations is often a challenging task. In this work we present simulation results and discuss the numerical representation of objects in the DiP3D code, which enables studies of several independent objects in various plasma environments. Wojciech Miloch 196 DiamondTorre GPU implementation algorithm of the RKDG solver for fluid dynamics and its using for the numerical simulation of the bubble-shock interaction problem [abstract]Abstract: In this paper the solver based upon the RKDG method for solving three-dimensional Euler equations of gas dynamics is considered. For the numerical scheme the GPU implementation algorithm called DiamondTorre is used, which helps to improve the performance speed of calculations. The problem of the interaction of a spherical bubble with a planar shock wave is considered in the three-dimensional setting. The obtained calculations are in agreement with the known results of experiments and numerical simulations. The calculation results are obtained using a PC. Boris Korneev, Vadim Levchenko 460 Optimal Temporal Blocking for Stencil Computation [abstract]Abstract: Temporal blocking is a class of algorithms which reduces the required memory bandwidth (B/F ratio) of a given stencil computation, by “blocking” multiple time steps. 
In this paper, we prove that a lower limit exists for the reduction of the B/F attainable by temporal blocking, under certain conditions. We introduce the PiTCH tiling, an example of temporal blocking method that achieves the optimal B/F ratio. We estimate the performance of PiTCH tiling for various stencil applications on several modern CPUs. We show that PiTCH tiling achieves 1.5 ∼ 2 times better B/F reduction in three-dimensional applications, compared to other temporal blocking schemes. We also show that PiTCH tiling can remove the bandwidth bottleneck from most of the stencil applications considered. Takayuki Muranushi, Junichiro Makino
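The temporal blocking abstract (paper 460) describes advancing a block of the grid through several time steps before moving on, so that each value loaded from memory is reused more often. The sketch below shows one simple member of that class, overlapped tiling for a 1D three-point averaging stencil with fixed boundary values. It is not the PiTCH tiling from the paper, and the stencil, function names, and sizes are assumptions chosen for illustration.

```python
def stencil_step(u):
    """One time step of a 3-point average; the two end cells stay fixed."""
    new = list(u)
    for i in range(1, len(u) - 1):
        new[i] = (u[i - 1] + u[i] + u[i + 1]) / 3.0
    return new


def naive(u, steps):
    """Reference version: sweep the whole array once per time step."""
    for _ in range(steps):
        u = stencil_step(u)
    return u


def temporally_blocked(u, steps, block):
    """Advance each block by `steps` time steps using a halo of width `steps`.

    Errors from the artificial tile edges travel inward one cell per step,
    so after `steps` steps the central `block` cells are still exact.
    """
    n = len(u)
    out = list(u)
    for start in range(0, n, block):
        end = min(start + block, n)
        lo = max(0, start - steps)   # clamped at the true domain boundary
        hi = min(n, end + steps)
        tile = u[lo:hi]              # one load of the block plus its halo
        for _ in range(steps):
            tile = stencil_step(tile)
        out[start:end] = tile[start - lo:end - lo]
    return out


grid = [float(i * i % 7) for i in range(40)]
reference = naive(grid, 4)
blocked = temporally_blocked(grid, 4, 8)
```

Overlapped tiling trades redundant computation in the halos for fewer passes over memory. Schemes such as the one in the abstract aim to get the bandwidth reduction with less of that redundancy.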

 684 A Case Study of CUDA FORTRAN and OpenACC for an Atmospheric Climate Kernel [abstract]Abstract: The porting of a key kernel in the tracer advection routines of the Community Atmosphere Model - Spectral Element (CAM-SE) to use Graphics Processing Units (GPUs) using OpenACC is considered in comparison to an existing CUDA FORTRAN port. The development of the OpenACC kernel for GPUs was substantially simpler than that of the CUDA port. Also, OpenACC performance was about 1.5x slower than the optimized CUDA version. Particular focus is given to compiler maturity regarding OpenACC implementation for modern fortran, and it is found that the Cray implementation is currently more mature than the PGI implementation. Still, for the case that ran successfully on PGI, the PGI OpenACC runtime was slightly faster than Cray. The results show encouraging performance for OpenACC implementation compared to CUDA while also exposing some issues that may be necessary before the implementations are suitable for porting all of CAM-SE. Most notable are that GPU shared memory should be used by future OpenACC implementations and that derived type support should be expanded. Matthew Norman, Jeffrey Larkin, Aaron Vose and Katherine Evans 585 OpenCL vs OpenACC: lessons from development of lattice QCD simulation code [abstract]Abstract: OpenCL and OpenACC are generic frameworks for heterogeneous programming using CPU and accelerator devices such as GPUs. They have contrasting features: the former explicitly controls devices through API functions, while the latter generates such procedures along a guide of the directives inserted by a programmer. In this paper, we apply these two frameworks to a general-purpose code set for numerical simulations of lattice QCD, which is a computational physics of elementary particles based on the Monte Carlo method. 
The fermion matrix inversion, which is usually the most time-consuming part of lattice QCD simulations, is off-loaded to the accelerator devices. From the viewpoint of constructing reusable components based on object-oriented programming and of tuning the code to achieve high performance, we discuss the feasibility of these frameworks through practical implementations. Hideo Matsufuru, Sinya Aoki, Tatsumi Aoyama, Kazuyuki Kanaya, Shinji Motoki, Yusuke Namekawa, Hidekatsu Nemura, Yusuke Taniguchi, Satoru Ueda, Naoya Ukita 515 Application of GRAPE9-MPX for high precision calculation in particle physics and performance results [abstract]Abstract: There are scientific applications which require calculations with high precision, such as Feynman loop integrals and orbital integrations. These calculations also need to be accelerated. We have been developing dedicated accelerator systems which consist of processing elements (PEs) for high precision arithmetic operations and a programming interface. GRAPE9-MPX is our latest system with multiple Field Programmable Gate Array (FPGA) boards on which our developed PEs are implemented. We present the performance results for GRAPE9-MPX extended to have up to 16 FPGA boards for quadruple/hexuple/octuple-precision with some optimization. The achieved performance for a Feynman loop integral with 12 FPGA boards is 26.5 Gflops for quadruple precision. We also give an analytical consideration for the performance results. Hiroshi Daisaka, Naohito Nakasato, Tadashi Ishikawa, Fukuko Yuasa 734 Adaptive Integration for 3-loop Feynman Diagrams with Massless Propagators [abstract]Abstract: We apply multivariate adaptive integration to problems arising from self-energy Feynman loop diagrams with massless internal lines. 
Results are obtained with the ParInt integration software package, which is layered over MPI (Message Passing Interface) and incorporates advanced parallel computation techniques such as load balancing among processes that may be distributed over a network of nodes. To solve the problems numerically we introduce a parameter r in a factor of the integrand function. Some problem categories allow setting r = 0; other cases require an extrapolation as r -> 0. Furthermore we apply extrapolation with respect to the dimensional regularization parameter by setting the dimension n = 4 - 2*eps and extrapolating as eps -> 0. Timing results show near optimal parallel speedups with ParInt for the problems at hand. Elise de Doncker, Fukuko Yuasa, Omofolakunmi Olagbemi
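The limit extrapolation mentioned in the abstract above (evaluating at a sequence of parameter values and extrapolating as eps -> 0) can be illustrated with a minimal sketch. The abstract does not say which extrapolation method the authors or ParInt use; the Richardson-style scheme below is a generic stand-in. It assumes the quantity has a regular power-series expansion in eps with no pole terms, and the function name `richardson_limit` and its parameters are hypothetical.

```python
import math


def richardson_limit(f, eps0=0.5, levels=6):
    """Estimate lim_{eps->0} f(eps).

    Assumption (not from the abstract): f(eps) = a0 + a1*eps + a2*eps**2 + ...
    i.e. a regular expansion without 1/eps poles.
    """
    # Sample f on the geometric sequence eps0, eps0/2, eps0/4, ...
    table = [f(eps0 / 2**k) for k in range(levels)]
    # Pass j combines neighbouring entries to cancel the eps**j error term.
    for j in range(1, levels):
        factor = 2**j
        table = [
            (factor * table[i + 1] - table[i]) / (factor - 1)
            for i in range(len(table) - 1)
        ]
    return table[0]


if __name__ == "__main__":
    # exp(eps) tends to 1 as eps -> 0; f is never evaluated at eps = 0 itself.
    print(richardson_limit(math.exp))
```

Only evaluations at strictly positive eps are needed, which is the point of extrapolating when the integrand cannot be evaluated reliably at the limit itself.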

 480 An algebraic approach to combining classifiers [abstract]Abstract: In distributed classification, each learner observes its environment and deduces a classifier. As a learner has only a local view of its environment, classifiers can be exchanged among the learners and integrated, or merged, to improve accuracy. However, the operation of merging is not defined for most classifiers. Furthermore, the classifiers that have to be merged may be of different types in settings such as ad-hoc networks in which several generations of sensors may be creating classifiers. We introduce decision spaces as a framework for merging possibly different classifiers. We formally study the merging operation as an algebra, and prove that it satisfies a desirable set of properties. The impact of time is discussed for the two main data mining settings. Firstly, decision spaces can naturally be used with non-stationary distributions, such as the data collected by sensor networks, as the impact of a model decays over time. Secondly, we introduce an approach for stationary distributions, such as homogeneous databases partitioned over different learners, which ensures that all models have the same impact. We also present a method using storage flexibly to achieve different types of decay for non-stationary distributions. Philippe Giabbanelli, Joseph Peters 36 Power LBP: A novel texture operator for smiling and neutral facial display classification [abstract]Abstract: Texture operators are commonly used to describe image content for many purposes. Recently they have found application in the task of emotion recognition, especially using the local binary pattern (LBP) method. This paper introduces a novel texture operator called power LBP, which defines a new ordering schema based on absolute intensity differences. Its definition as well as its interpretation are given. The performance of the suggested solution is evaluated on the problem of smiling and neutral facial display recognition. 
In order to evaluate the accuracy of the power LBP operator, its discriminative capacity is compared to that of several members of the LBP family. Moreover, the influence of the applied classification approach is also considered, by presenting results for k-nearest neighbour, support vector machine, and template matching classifiers. Furthermore, results for several databases are compared. Bogdan Smolka, Karolina Nurzynska 657 Incremental Weighted One-Class Classifier for Mining Stationary Data Streams [abstract]Abstract: Data streams and big data analytics are among the most popular contemporary machine learning problems. More and more often, real-life problems can generate massive and continuous amounts of data. Standard classifiers cannot cope with a large volume of the training set and/or the changing nature of the environment. In this paper, we deal with the problem of continuously arriving objects that, with each time interval, may contribute new, useful knowledge to the pattern classification system. This is known as stationary data stream mining. One-class classification is a very useful tool for stream analysis, as it can be used for tackling outliers, noise, the appearance of new classes or imbalanced data, to name a few. We propose a novel version of the incremental One-Class Support Vector Machine that assigns weights to each object according to its level of significance. This allows more robust one-class classifiers to be trained on incremental streams. We present two schemes for estimating weights for new, incoming data and examine their usefulness on a number of benchmark datasets. We also analyze the time and memory requirements of our method. Results of experimental investigations prove that our method can achieve better one-class recognition quality than algorithms used so far. 
Bartosz Krawczyk and Michal Wozniak 659 Wagging for Combining Weighted One-Class Support Vector Machines [abstract]Abstract: Most machine learning problems assume that we have at our disposal objects originating from two or more classes. By learning from a representative training set, a classifier is able to estimate proper decision boundaries. However, in many real-life problems obtaining objects from some of the classes is difficult, or even impossible. In such cases, we are dealing with one-class classification, or learning in the absence of counterexamples. Such recognition systems must display a high robustness to new, unseen objects that may belong to an unknown class. That is why ensemble learning has become an attractive perspective in this field. In our work, we propose a novel one-class ensemble classifier, based on wagging. A weighted version of boosting is used, and the output weights for each object are used directly in the process of training Weighted One-Class Support Vector Machines. This introduces diversity into the pool of one-class classifiers and extends the competence of the formed ensemble. Experimental analysis, carried out on a number of benchmarks and backed up with statistical analysis, proves that the proposed method can outperform state-of-the-art ensembles dedicated to one-class classification. Bartosz Krawczyk, Michal Wozniak

 388 Statistical Inversion of Absolute Permeability in Single Phase Darcy Flow [abstract]Abstract: In this paper, we formulate the permeability inverse problem in the Bayesian framework using total variation (TV) and $\ell_p$ regularization prior. We use the Markov Chain Monte Carlo (MCMC) method for sampling the posterior distribution to solve the ill-posed inverse problem. We present simulations to estimate the distribution for each pixel for the image reconstruction of the absolute permeability. Thilo Strauss, Xiaolin Fan, Shuyu Sun, Taufiquar Khan 32 An enhanced velocity multipoint flux mixed finite element method for Darcy flow on non-matching hexahedral grids [abstract]Abstract: This paper proposes a new enhanced velocity method to directly construct a flux-continuous velocity approximation with multipoint flux mixed finite element method on subdomains. This gives an efficient way to perform simulations on multiblock domains with non-matching hexahedral grids. We develop a reasonable assumption on geometry, discuss implementation issues, and give several numerical results with slightly compressible single phase flow. Benjamin Ganis, Mary Wheeler, Ivan Yotov 124 A compact numerical implementation for solving Stokes equations using matrix-vector operations [abstract]Abstract: In this work, a numerical scheme is implemented to solve the Stokes equations based on cell-centered finite differences over a staggered grid. In this scheme, all the difference operations have been vectorized, thereby eliminating loops. This is particularly important when using interpreted programming languages, e.g., Matlab and Python. Using this scheme, the execution time becomes significantly smaller compared with non-vectorized operations and also becomes comparable with that of languages that require no repeated interpretation, such as FORTRAN and C. This technique has also been applied to the Navier-Stokes equations under laminar flow conditions. 
Tao Zhang, Amgad Salama, Shuyu Sun, Hua Zhong 265 Numerical Models for the Simulation of Aeroacoustic Phenomena [abstract]Abstract: In the development of a numerical model for aeroacoustic problems, two main issues arise: which level of physical approximation to adopt and which numerical scheme is the most appropriate. It is possible to consider a hierarchy of physical approximations, ranging from the wave equation, without or with convective effects, to the linearized Euler and Navier-Stokes equations, as well as a wide range of high-order numerical schemes, ranging from compact finite difference schemes to the discontinuous Galerkin method (DGM) for unstructured grids. For problems in complex geometries, significant hydrodynamic-acoustic interactions, coupling acoustic waves and vortical modes, may occur. This happens, for example, in ducts with sudden changes of area, where flow separation occurs at sharp edges with a consequent generation of vorticity due to viscous effects. To correctly model this coupling, the Navier-Stokes equations, linearized with respect to a representative mean flow, must be solved. The formulation based on Linearized Navier-Stokes (LNS) equations is suitable to deal with problems involving such hydrodynamic-acoustic interactions. The occurrence of geometrical complexities, such as sharp edges, where acoustic energy is transferred into the vortical modes by viscous effects, requires a highly accurate numerical scheme that not only has reduced dispersive properties, to accurately model the wave propagation, but also provides a very low level of numerical dissipation on unstructured grids. The DGM is the most appropriate numerical scheme satisfying these requirements. The objective of the present work is to develop an efficient numerical solution of the LNS equations, based on a DGM on unstructured grids. 
To our knowledge, there is only one work dealing with the solution of the LNS for aeroacoustics, where the equations are solved in the frequency domain. In this work we develop the method in the time domain. The non-dispersive and non-diffusive nature of acoustic waves propagating over long distances forces us to adopt highly accurate numerical methods. DGM is one of the most promising schemes due to its intrinsic stability and its capability to treat unstructured grids. Both advantages make this method well suited for problems characterized by wave propagation phenomena in complex geometries. The main disadvantage of DGM is its high computational requirements, because the discontinuous character of the method adds extra nodes on the interfaces between cells with respect to a standard continuous Galerkin Method (GM). Techniques for optimizing the DGM in the case of the Navier-Stokes equations, to reduce the computational effort, are currently the object of intense research. To our knowledge, no similar effort has been made in the context of the solution of the LNS equations. The LNS equations are derived and the DGM is presented. Preliminary results for the scattering of plane waves traveling in a duct with a sudden area expansion, and a comparison between LEE and LNS calculations of vortical modes, are presented. Renzo Arina

 56 Numerical simulation of the flow in the fuel injector in sharply inhomogeneous electric field [abstract]Abstract: The results of detailed numerical simulation of the flow in an injector, including electrohydrodynamic interaction in a sharply inhomogeneous electric field formed by an electrode system close to the “needle-plane” type, are presented. The aim of the simulation is to estimate the charge rate flow at the fuel injector outlet. The results were obtained using the open-source package OpenFOAM, to which the corresponding models of electrohydrodynamics were added. The parametric calculations were performed for an axisymmetric model using the RANS k-omega SST turbulence model. Due to the swirl device in the fuel injector, the flow is strongly swirling. To obtain parameters for the axisymmetric flow calculations, a 3D simulation was performed for a simplified injector model including the swirl device and without electrodes. Alexander Smirnovsky, Vladimir Nagorny, Dmitriy Kolodyazhny, Alexander Tchernysheff 122 An algorithm for the numerical solution of the pseudo compressible Navier-Stokes equations based on the experimenting fields approach [abstract]Abstract: In this work, the experimenting fields approach is applied to the numerical solution of the Navier-Stokes equation for incompressible viscous flow. The solution is sought for both the pressure and velocity fields at the same time. Apparently, the correct velocity and pressure fields satisfy the governing equations and the boundary conditions. In this technique a set of predefined fields is introduced to the governing equations and the residues are calculated. The flow according to these fields will not satisfy the governing equations and the boundary conditions. However, the residues are used to construct the matrix of coefficients. Although constructing the global matrix of coefficients seems trivial in this setup, in other setups it can be quite involved. 
This technique separates the solver routine from the physics routines and therefore makes coding and debugging easier. We compare with a few examples that demonstrate the capability of this technique. Amgad Salama, Shuyu Sun, Mohamed El Amin 462 Pore network modeling of drainage process in patterned porous media: a quasi-static study [abstract]Abstract: This work represents a preliminary investigation on the role of wettability conditions on the flow of a two-phase system in porous media. Since such effects have been lumped implicitly in relative permeability-saturation and capillary pressure-saturation relationships, it is quite challenging to isolate their effects explicitly in real porous media applications. However, within the framework of pore network models, it is easy to highlight the effects of wettability conditions on the transport of two-phase systems. We employ a quasi-static investigation in which the system undergoes slow movement based on a slight increment of the imposed pressure. Several numerical experiments of the drainage process are conducted to displace a wetting fluid with a non-wetting one. In all these experiments the network is assigned different scenarios of various wettability patterns. The aim is to show that the drainage process is very much affected by the imposed pattern of wettability. The wettability conditions are imposed by assigning the value of contact angle to each pore throat according to predefined patterns. Tao Zhang, Amgad Salama, Shuyu Sun and Mohamed El Amin

 123 Numerical Treatment of Two-Phase Flow in Porous Media Including Specific Interfacial Area [abstract]Abstract: In this work, we present a numerical treatment of the model of two-phase flow in porous media including specific interfacial area. For numerical discretization we use the cell-centered finite difference (CCFD) method based on the shifting-matrices method which could reduce the time-consuming operations. A new iterative implicit algorithm has been developed to solve the problem under consideration. All advection and advection-like terms that appear in saturation equation and interfacial area equation are treated using upwind schemes together with the CCFD and shifting-matrices techniques. Selected simulation results such as $p_c-S_w-a_{wn}$ surface have been introduced. The simulation results have a good agreement with those in the literature using either pore network modeling or Darcy scale modeling. Mohamed El-Amin, Redouane Meftah, Amgad Salama, Shuyu Sun 210 Chaotic states and order in the chaos of the paths of freely falling and ascending spheres [abstract]Abstract: The research extends and improves the parametric study of "Instabilities and transition of a sphere falling or ascending freely in a Newtonian fluid" of Jenny et al. (2004) with special focus on the onset of chaos and on chaotic states. The results show that the effect of density ratio responsible for two qualitatively different oblique oscillating states has a significant impact both on the onset of chaos and on the behavior of fully chaotic states. The observed difference between dense and light spheres is associated to the strength of coupling between fluid and solid degrees of freedom. While the low frequency mode of oblique oscillating state presents specific features due to a strong solid - fluid coupling, the dynamics of the high frequency mode is shown to be driven by the same vortex shedding as the wake of a fixed sphere. 
The different fluid-solid coupling also determines two different ways how chaos sets in. Two outstanding ordered regimes are evidenced and investigated in the chaotic domain. One of them, characteristic for its helical trajectories, might provide a link to the experimentally evidenced, but so far numerically unexplained, vibrating regime of ascension of light spheres. For fully chaotic states, it is shown that statistical averaging converges in a satisfactory manner. Several statistical characteristics are suggested and evaluated. Wei Zhou and Jan Dušek 288 Switching Between the NVT and NpT Ensembles Using the Reweighting and Reconstruction Scheme [abstract]Abstract: Recently, we have developed several techniques in order to accelerate Monte Carlo (MC) molecular simulations. For that purpose, two strategies were followed. In the first, new algorithms were proposed as a set of early rejection schemes performing faster than the conventional algorithm while preserving the accuracy of the method. On the other hand, a reweighting and reconstruction scheme was introduced that is capable of retrieving primary quantities and second derivative properties at several thermodynamic conditions from a single MC Markov chain. The latter scheme, was first developed to extrapolate quantities in NVT ensemble for structureless Lennard-Jones particles. However, it is evident that for most real life applications the NpT ensemble is more convenient, as pressure and temperature are usually known. Therefore, in this paper we present an extension to the reweighting and reconstruction method to solve NpT problems utilizing the same Markov chains generated by the NVT ensemble simulations. Eventually, the new approach allows elegant switching between the two ensembles for several quantities at a wide range of neighboring thermodynamic conditions. Ahmad Kadoura, Amgad Salama, Shuyu Sun 185 Coupled modelling of a shallow water flow and pollutant transport using depth averaged turbulent model. 
[abstract]Abstract: The paper presents a mathematical model of a turbulent river flow based on the unsteady shallow water equations and a depth-averaged turbulence model. The numerical model is based on an upwind finite volume method on a structured staggered grid. In order to get a stable numerical solution, a SIMPLE-based algorithm was used. Among well-developed models of river flow, the proposed approach stands out for its computational efficiency and high quality in describing processes in a river stream. For the main cases of pollution transport in river flows it is essential to know whether the model is appropriate to predict turbulent characteristics of the flow in the open channel. Two computational cases have been carried out to investigate and apply the established model. The first case shows the impact of confluents on the generation of turbulence in the river flow and shows that recirculation flows affect the process of pollutant dispersion in water basins. A driven cavity test case has been carried out to investigate the accuracy of the established method and its applicability to streams with a complex structure. Alexander V. Starchenko and Vladislava V. Churuksaeva

 602 Cube v.4 : From Performance Report Explorer to Performance Analysis Tool [abstract]Abstract: Cube v.3 has been a powerful tool to examine Scalasca performance reports, but was basically unable to perform analyses on its own. With Cube v.4, we addressed several shortcomings of Cube v.3. We generalized the Cube data model, extended the list of supported data types, and allow operations with nontrivial algebras, e.g. for performance models or statistical data. Additionally, we introduced two major new features that greatly enhance the performance analysis features of Cube: Derived metrics and GUI plugins. Derived metrics can be used to create and manipulate metrics directly within the GUI, using a powerful domain-specific language called CubePL. Cube GUI plugins allow the development of novel performance analysis techniques based on Cube data without changing the source code of the Cube GUI. Michael Knobloch, Bernd Mohr, Anke Visser, Pavel Saviankou 51 Visual MPI Performance Analysis using Event Flow Graphs [abstract]Abstract: Event flow graphs used in the context of performance monitoring combine the scalability and low overhead of profiling methods with lossless information recording of tracing tools. In other words, they capture statistics on the performance behavior of parallel applications while preserving the temporal ordering of events. Event flow graphs require significantly less storage than regular event traces and can still be used to recover the full ordered sequence of events performed by the application. In this paper we explore the usage of event flow graphs in the context of visual performance analysis. We show that graphs can be used to quickly spot performance problems, helping to better understand the behavior of an application. We demonstrate our performance analysis approach with MiniFE, a mini-application that mimics the key performance aspects of finite-element applications in High Performance Computing (HPC). 
Xavier Aguilar, Karl Fürlinger, Erwin Laure 75 Glprof: A Gprof inspired, Callgraph-oriented Per-Object Disseminating Memory Access Multi-Cache Profiler [abstract]Abstract: Application analysis is facilitated through a number of program profiling tools. The tools vary in their complexity, ease of deployment, design, and profiling detail. Specifically, understanding, analyzing, and optimizing is of particular importance for scientific applications where minor changes in code paths and data-structure layout can have profound effects. Understanding how intricate data-structures are accessed and how a given memory system responds is a complex task. In this paper we describe a trace profiling tool, Glprof, specifically aimed at lessening the burden on the programmer of pin-pointing heavily involved data-structures during an application's run-time, and of understanding data-structure run-time usage. Moreover, we showcase the tool's modularity using additional cache simulation components. We elaborate on the tool's design and features. Finally, we demonstrate the application of our tool in the context of SPEC benchmarks using the Glprof profiler and two concurrently running cache simulators, PPC440 and AMD Interlagos. Tomislav Janjusic, Christos Kartsaklis 326 Graphical high level analysis of communication in distributed virtual reality applications [abstract]Abstract: Analysing distributed virtual reality applications communicating through message-passing is challenging. Their development is complex, and knowing if something is wrong depends on the states of each process; defects (bugs) cause software crashes, hangs, and generation of incorrect results. To address this daunting problem we specify functional behavior models (for example, using synchronization barriers and shared variables) for these applications that ensure correctness. We also developed the GTracer tool, which compares the functional behavior models developed with the messages transmitted among processes. 
GTracer checks for violations of these models automatically and displays the message traffic graphically. It is a tool made for libGlass, a message library for distributed computing. We have been able to find several non-trivial defects during the tests of this tool. Marcelo Guimarães, Bruno Gnecco, Diego Dias, José Brega, Luis Trevelin

 368 Providing Parallel Debugging for DASH Distributed Data Structures with GDB [abstract]Abstract: The C++ DASH template library provides distributed data containers for Partitioned Global Address Space (PGAS)-like programming. Because DASH is new and under development, no debugger is capable of handling the parallel processes or accessing and modifying container elements in a convenient way. This paper describes how the DASH library has to be extended to interrupt the start-up process to connect a debugger with all started processes and to enable the debugger to access and modify DASH container elements. Furthermore, a GDB extension to output well-formatted DASH container information is presented. Denis Hünich, Andreas Knüpfer, José Gracia 156 Sequential Performance: Raising Awareness of the Gory Details [abstract]Abstract: The advent of multicore and manycore processors, including GPUs, in the customer market encouraged developers to focus on extraction of parallelism. While it is true that parallelism can deliver performance boosts, parallelization is also a very complex and error-prone task. Many applications are still sequential, or dominated by sequential sections. Modern micro-architectures have become extremely complex, and they usually do a very good job at executing a given sequence of instructions quickly. When they occasionally fail, however, the penalty may be severe. Pathological behaviors often have their roots in very low-level implementation details of the micro-architecture, hardly available to the programmer. We argue that the impact of these low-level features on performance has been overlooked, often relegated to experts. We show that a few metrics can be easily defined to help assess the overall performance of an application, and quickly diagnose a problem. Finally we illustrate our claim with a simple prototype, along with several use cases. 
Erven Rohou, David Guyon 544 Evolving Fortran types with inferred units-of-measure [abstract]Abstract: Dimensional analysis is a well-known technique for checking the consistency of equations involving physical quantities, constituting a kind of type system. Various type systems for dimensional analysis, and its refinement to units-of-measure, have been proposed. In this paper, we detail the design and implementation of a units-of-measure system for Fortran, implemented as a pre-processor. Our system is designed to aid adding units to an existing code base: units may be polymorphic and can be inferred. Furthermore, we introduce a technique for reporting to the user a set of critical variables which should be explicitly annotated with units to get the maximum amount of unit information with the minimal number of explicit declarations. This aids adoption of our type system in existing code bases, of which there are many in computational science projects. Dominic Orchard, Andrew Rice and Oleg Oshmyan
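The idea behind the last abstract above, that dimensional analysis acts as a kind of type system, can be sketched at run time in a few lines. This is only an illustration and not the authors' tool, which works on Fortran source as a pre-processor and infers units statically. The `Quantity` class, its `of` constructor and the `_normalise` helper are hypothetical names introduced here.

```python
from dataclasses import dataclass


def _normalise(exponents):
    # Canonical form: sorted (base_unit, exponent) pairs, zero exponents dropped.
    return tuple(sorted((u, e) for u, e in exponents.items() if e != 0))


@dataclass(frozen=True)
class Quantity:
    value: float
    units: tuple  # e.g. (("m", 1), ("s", -1)) for metres per second

    @staticmethod
    def of(value, **exponents):
        return Quantity(value, _normalise(exponents))

    def __add__(self, other):
        # Addition is only meaningful between quantities with identical units.
        if self.units != other.units:
            raise TypeError(f"unit mismatch: {self.units} vs {other.units}")
        return Quantity(self.value + other.value, self.units)

    def __mul__(self, other):
        # Multiplication adds the exponents of each base unit.
        merged = dict(self.units)
        for unit, exp in other.units:
            merged[unit] = merged.get(unit, 0) + exp
        return Quantity(self.value * other.value, _normalise(merged))

    def __truediv__(self, other):
        # Division subtracts the exponents of each base unit.
        merged = dict(self.units)
        for unit, exp in other.units:
            merged[unit] = merged.get(unit, 0) - exp
        return Quantity(self.value / other.value, _normalise(merged))


if __name__ == "__main__":
    distance = Quantity.of(100.0, m=1)
    duration = Quantity.of(20.0, s=1)
    print(distance / duration)   # a speed, with units m^1 s^-1
    try:
        distance + duration      # inconsistent: metres plus seconds
    except TypeError as err:
        print("rejected:", err)
```

A static system such as the one described in the abstract would reject the inconsistent addition before the program runs; the run-time check here only shows which consistency rule is being enforced.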

 353 Nonparallel hyperplanes support vector machine for multi-class classification [abstract]Abstract: In this paper, we propose a nonparallel hyperplanes classifier for multi-class classification, termed NHCMC. This method inherits the idea of the multiple birth support vector machine (MBSVM), that is, the "max" decision criterion instead of the "min" one, but it has significant advantages over MBSVM. First, the optimization problems in NHCMC can be solved efficiently by sequential minimization optimization (SMO) without needing to compute large inverse matrices before training, as SVMs usually do. Second, the kernel trick can be applied directly to NHCMC, which is superior to existing MBSVM. Experimental results on many data sets show the efficiency of our method in multi-class classification accuracy. Xuchan Ju, Yingjie Tian, Dalian Liu, Zhiquan Qi 415 Multilevel dimension reduction Monte-Carlo simulation for high-dimensional stochastic models in finance [abstract]Abstract: One-way coupling often occurs in multi-dimensional stochastic models in finance. In this paper, we develop a highly efficient Monte Carlo (MC) method for pricing European options under an N-dimensional one-way coupled model, where N is arbitrary. The method is based on a combination of (i) the powerful dimension and variance reduction technique, referred to as drMC, developed in Dang et al. (2014), that exploits this structure, and (ii) the highly effective multilevel MC (mlMC) approach developed by Giles (2008). By first applying Step (i), the dimension of the problem is reduced from N to 1, and as a result, Step (ii) is essentially an application of mlMC on a 1-dimensional problem. Numerical results show that, through a careful construction of the ml-dr estimator, the improved efficiency expected from Milstein timestepping with first order strong convergence can be achieved. 
Moreover, our numerical results show that the proposed ml-drMC method is significantly more efficient than the mlMC methods currently available for multi-dimensional stochastic problems. Duy-Minh Dang, Qifan Xu, Shangzhe Wu 671 Computational Visual Analysis of the Order Book Dynamics for Creating High-Frequency Foreign Exchange Trading Strategies. [abstract]Abstract: This paper presents a Hierarchical Hidden Markov Model (HHMM) used to capture the USD/COP market sentiment dynamics, choosing between uptrend and downtrend latent regimes based on observed feature vector realizations calculated from transaction prices and wavelet-transformed order book volume dynamics. The HHMM learned a natural switching buy/uptrend sell/downtrend trading strategy using a training-validation framework over one month of market data. The model was tested on the following two months, and its performance was reported and compared to results obtained from randomly classified market states and a feed-forward Neural Network. This paper also separately assessed the contribution to the model’s performance of the order book information and the wavelet transformation. Javier Sandoval, German Hernandez 636 Influence of the External Environment Behaviour on the Banking System Stability [abstract]Abstract: There is plenty of research dedicated to financial system stability, which plays a significant role in the prevention of financial crises and their consequences. However, the interaction between the banking system and its external environment, and the influence of customer behaviour on banking system stability, are poorly studied. This paper proposes an agent-based model of the banking system and its external environment. We show how customer behaviour characteristics affect banking system stability. An optimal interval for total environmental funds relative to banking system wealth is determined. Valentina Y. Guleva, Alexey Dukhanov

 117 Developing a Hands-On Course Around Building and Testing High Performance Computing Clusters [abstract]Abstract: We describe a successful approach to designing and implementing a High Performance Computing (HPC) class focused on creating competency in building, configuring, programming, troubleshooting, and benchmarking HPC clusters. By coordinating with campus services, we were able to avoid any additional costs to the students or the university. Students built three twelve-unit independently-operating clusters. Working groups were formed for each cluster and they installed the operating system, created users, connected to the campus network and wrote a variety of scripts and parallel programs while documenting the process. We describe how we solved unexpected problems encountered along the way. We illustrate through pre- and post-course surveys that students gained substantial knowledge in fundamental aspects of HPC through the hands-on approach of creating their own clusters. Karl Frinkle, Mike Morris 269 Interactively Exploring the Connection between Bidirectional Compression and Star Bicoloring [abstract]Abstract: The connection between scientific computing and graph theory is detailed for a particular problem called bidirectional compression. This scientific computing problem consists of finding a pair of seed matrices in automatic differentiation. In terms of graph theory, the problem is nothing but finding a star bicoloring of a suitably defined graph. An interactive educational module is designed and implemented to illustrate the connection between bidirectional compression and star bicoloring. The web-based module is intended to be used in the classroom to illustrate the intricate nature of this combinatorial problem. M.
Ali Rostami, Martin Buecker 651 Scientific Workflows with XMDD: A Way to Use Process Modeling in Computational Science Education [abstract]Abstract: Process models are well suited to describe in a formal but still intuitive fashion what a system should do. They can thus play a central role in problem-based computational science education with regard to qualifying students for the design and implementation of software applications for their specific needs without putting the focus on the technical part of coding. eXtreme Model Driven Design (XMDD) is a software development paradigm that explicitly focuses on the What (solving problems) rather than on the How (the technical skills of writing code). In this paper we describe how we apply an XMDD-based process modeling and execution framework for scientific workflow projects in the scope of a computer science course for students with a background in natural sciences. Anna-Lena Lamprecht, Tiziana Margaria 152 Teaching Science Using Computationally-Based Investigations [abstract]Abstract: Wofford College has initiated a computational laboratory course, Scientific Investigations Using Computation, which satisfies one of its Bachelor of Science requirements. In the course, which one professor teaches, students explore important concepts in science and, using computational tools, implement the scientific method to gain a better understanding of the natural world. Before the first class for a topic, which usually takes one week, students read a module by the authors of this abstract. Some of the topics are the carbon cycle, global warming, disease, adaptation and mimicry, fur patterns, membranes, gas laws, chemical kinetics, and enzyme kinetics. Each module includes a discussion of the topic, quick review questions, points of inquiry for further investigation, and references. In class, students take an online quiz from the quick review questions and complete an enriching activity related to the topic. 
Typically, in pairs or larger groups, students are assigned points of inquiry to investigate, develop, and present for subsequent periods in the week. A topic culminates in a three-hour laboratory, where students perform experiments at computers using the agent-based modeling tool NetLogo and the spreadsheet Excel. NetLogo, which is free to download, includes numerous computational models that have levels for Interface to run the simulation and view the results, Information about the model, and Code, which the user can view and change. Laboratory guidelines by the authors lead the students through the material in a step-by-step fashion. As well as conducting experiments computationally, the students modify the code to refine the models. Thus, the class examines scientific topics using the scientific method and various resources, gains an appreciation of the utility of computational simulations, and starts to learn to program and to think algorithmically. Angela Shiflet and George Shiflet 158 DNA and普通話(Mandarin): Bringing introductory programming to the Life Sciences and Digital Humanities [abstract]Abstract: The ability to write software (to script, to program, to code) is a vital skill for students and their future data-centric, multidisciplinary careers. We present a ten-year effort to teach introductory programming skills in domain-focused courses to students across divisions in our liberal arts college. By creatively working with colleagues in Biology, Statistics, and now English, we have designed, modified, and offered six iterations of two courses: “DNA” and “Computing for Poets”. Larger percentages of women have consistently enrolled in these two courses vs. the traditional first course in the major. We share our open source course materials and present here our use of a blended learning classroom that leverages the increasing quality of online video lectures and programming practice sites in an attempt to maximize faculty-student interactions in class. 
Mark Leblanc, Michael Drout

 755 Overview and Introduction [abstract]Abstract: TBD Derek Ruths 751 Jeff's Invited Talk [abstract]Abstract: TBD Jeff Shamma

 749 Sinan's Invited Talk [abstract]Abstract: TBD Sinan Aral 752 Bruce's Invited Talk [abstract]Abstract: TBD Bruce Desmarais

 748 A Role for Network Science in Social Norms Intervention [abstract]Abstract: Social norms theory has provided a foundation for public health interventions on critical issues such as alcohol and substance use, sexual violence, and risky sexual behavior. We assert that modern social norms interventions can be better informed with the use of network science methods. Social norms can be seen as a complex contagion on a social network, and the propagation of social norms as an information diffusion process. We observe instances where the recommendations of social norms theory match up to theoretical predictions from information diffusion models, but also places where the network science viewpoint highlights aspects of intervention design not addressed by the existing theory. Information about network structure and dynamics is often not used in existing social norms interventions; we argue that these factors may be contributing to the lack of efficacy of social norms interventions delivered via online social networks. Network models of intervention also offer opportunities for better evaluation and comparison across application domains. Clayton Davis, Julia Heiman, Filippo Menczer 750 Ali's Invited Talk [abstract]Abstract: TBD Ali Jadbabaie 756 Closing and Wrap-up [abstract]Abstract: TBD Justin Ruths

 261 Surrogate-Based Airfoil Design with Space Mapping and Adjoint Sensitivity [abstract]Abstract: This paper presents a space mapping algorithm for airfoil shape optimization enhanced with adjoint sensitivities. The surrogate-based algorithm utilizes low-cost derivative information obtained through adjoint sensitivities to improve the space mapping matching between a high-fidelity airfoil model, evaluated through expensive CFD simulations, and its fast surrogate. Here, the airfoil surrogate model is constructed through low-fidelity CFD simulations. As a result, the design process can be performed at a low computational cost in terms of the number of high-fidelity CFD simulations. The adjoint sensitivities are also exploited to speed up the surrogate optimization process. Our method is applied to a constrained drag minimization problem in two-dimensional inviscid transonic flow. The problem is solved for several low-fidelity model termination criteria. The results show that when compared with direct gradient-based optimization with adjoint sensitivities, the proposed approach requires 49-78% less computational cost while still obtaining a comparable airfoil design. Yonatan Tesfahunegn, Slawomir Koziel, Leifur Leifsson, Adrian Bekasiewicz 317 How to Speed up Optimization? Opposite-Center Learning and Its Application to Differential Evolution [abstract]Abstract: This paper introduces a new sampling technique called Opposite-Center Learning (OCL) intended for convergence speedup of meta-heuristic optimization algorithms. It comprises an extension of Opposition-Based Learning (OBL), a simple scheme that manages to boost numerous optimization methods by considering the opposite points of candidate solutions. In contrast to OBL, OCL has a theoretical foundation – the opposite center point is defined as the optimal choice in pair-wise sampling of the search space given a random starting point. A concise analytical background is provided.
Computationally the opposite center point is approximated by a lightweight Monte Carlo scheme for arbitrary dimension. Empirical results up to dimension 20 confirm that OCL outperforms OBL and random sampling: the points generated by OCL have shorter expected distances to a uniformly distributed global optimum. To further test its practical performance, OCL is applied to differential evolution (DE). This novel scheme for continuous optimization named Opposite-Center DE (OCDE) employs OCL for population initialization and generation jumping. Numerical experiments on a set of benchmark functions for dimensions 10 and 30 reveal that OCDE on average improves the convergence rates by 38% and 27% compared to the original DE and the Opposition-based DE (ODE), respectively, while remaining fully robust. Most promising are the observations that the accelerations shown by OCDE and OCL increase with problem dimensionality. H. Xu, C.D. Erdbrink, V.V. Krzhizhanovskaya 281 Visualizing and Improving the Robustness of Phase Retrieval Algorithms [abstract]Abstract: Coherent x-ray diffractive imaging is a novel imaging technique that utilizes phase retrieval and nonlinear optimization methods to image matter at nanometer scales. We explore how the convergence properties of a popular phase retrieval algorithm, Fienup’s HIO, behave by introducing a reduced dimensionality problem allowing us to visualize convergence to local minima and the globally optimal solution. We then introduce generalizations of HIO that improve upon the original algorithm’s ability to converge to the globally optimal solution. Ashish Tripathi, Sven Leyffer, Todd Munson, Stefan Wild 257 Fast Optimization of Integrated Photonic Components Using Response Correction and Local Approximation Surrogates [abstract]Abstract: A methodology for a rapid design optimization of integrated photonic couplers is presented. 
The proposed technique exploits variable-fidelity electromagnetic (EM) simulation models, additive response correction for accommodating the discrepancies between the EM models of various fidelities, and local response surface approximations for a fine tuning of the final design. A specific example of a 1,555 nm coupler is considered with an optimum design obtained at a computational cost corresponding to about 24 high-fidelity EM simulations of the structure. Adrian Bekasiewicz, Slawomir Koziel, Leifur Leifsson 197 Model Selection for Discriminative Restricted Boltzmann Machines Through Meta-heuristic Techniques [abstract]Abstract: Discriminative learning of Restricted Boltzmann Machines has been recently introduced as an alternative to provide a self-contained approach for both unsupervised feature learning and classification purposes. However, one of the main problems faced by researchers interested in such an approach concerns the proper selection of its parameters, which play an important role in its final performance. In this paper, we introduce some meta-heuristic techniques for this purpose, and we show that they can be more accurate than a random search, which is commonly used in some works. Joao Paulo Papa, Gustavo Rosa, Aparecido Marana, Walter Scheirer and David Cox
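
To make the Opposition-Based Learning (OBL) idea referenced in abstract 317 concrete, here is a minimal Python sketch of plain OBL, not of the Opposite-Center Learning scheme the paper proposes. It assumes box bounds `lower` and `upper` and a minimization objective; the function names `opposite_point` and `obl_initialize` are hypothetical and chosen only for illustration.

```python
import random
from typing import Callable, List, Sequence


def opposite_point(x: Sequence[float],
                   lower: Sequence[float],
                   upper: Sequence[float]) -> List[float]:
    """Return the OBL opposite of x: each coordinate is reflected as a + b - x."""
    return [a + b - xi for xi, a, b in zip(x, lower, upper)]


def obl_initialize(fitness: Callable[[Sequence[float]], float],
                   lower: Sequence[float],
                   upper: Sequence[float],
                   n: int,
                   rng: random.Random) -> List[List[float]]:
    """Opposition-based initialization for a minimization problem.

    Draw n uniform random points, add their opposites, and keep the
    n candidates with the lowest fitness values.
    """
    population = [[rng.uniform(a, b) for a, b in zip(lower, upper)]
                  for _ in range(n)]
    opposites = [opposite_point(x, lower, upper) for x in population]
    return sorted(population + opposites, key=fitness)[:n]
```

For example, with bounds `[0, 0]` and `[4, 10]`, the opposite of `[1, 2]` is `[3, 8]`. The initialization costs `2 * n` fitness evaluations in exchange for a starting population that is at least as good as the purely random one.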

 619 A Cooperative Coevolutionary Differential Evolution Algorithm with Adaptive Subcomponents [abstract]Abstract: The performance of cooperative coevolutionary algorithms for large-scale continuous optimization is significantly affected by the adopted decomposition of the search space. According to the literature, a typical decomposition in case of fully separable problems consists of adopting equally sized subcomponents for the whole optimization process (i.e. static decomposition). Such an approach is also often used for fully non-separable problems, together with a random-grouping strategy. More advanced methods try to determine the optimal size of subcomponents during the optimization process using reinforcement-learning techniques. However, the latter approaches are not always suitable in this case because of the non-stationary and history-dependent nature of the learning environment. This paper investigates a new Cooperative Coevolutionary algorithm, based on Differential Evolution, in which several decompositions are applied in parallel during short learning phases. The experimental results on a set of large-scale optimization problems show that the proposed method can lead to a reliable estimate of the suitability of each subcomponent size. Moreover, in some cases it outperforms the best static decomposition. Giuseppe A. Trunfio 105 Multi-Level Job Flow Cyclic Scheduling in Grid Virtual Organizations [abstract]Abstract: Distributed environments with the decoupling of users from resource providers are generally termed as utility Grids. The paper focuses on the problems of efficient job flow distribution and scheduling in virtual organizations (VOs) of utility Grids while ensuring the VO stakeholders preferences and providing dependable strategies for resources utilization. An approach based on the combination of the cyclic scheduling scheme, backfilling and several heuristic procedures is proposed and studied. 
Comparative simulation results are presented for different algorithms and heuristics depending on the resource domain composition and heterogeneity. The scheduling approaches considered provide different benefits depending on the VO scheduling objectives. The results justify the use of the proposed approaches in a broad range of the considered resource environment parameters. Victor Toporkov, Anna Toporkova, Alexey Tselishchev, Dmitry Yemelyanov, Petr Potekhin 346 The Stochastic Simplex Bisection Algorithm [abstract]Abstract: We propose the stochastic simplex bisection algorithm. It randomly selects one from a set of simplexes, bisects it, and replaces it with its two offspring. The selection probability is proportional to a score indicating how promising the simplex is to bisect. We generalize intervals to simplexes, rather than to hyperboxes, as bisection then only requires evaluating the function in one new point, which is somewhat randomized. Using a set of simplexes that partition the search space yields completeness and avoids redundancy. We provide an appropriate scale- and offset-invariant score definition and add an outer loop for handling hyperboxes. Experiments show that the algorithm is capable of exploring vast numbers of local optima, over huge ranges, yet finding the global one. The ease with which it handles quadratic functions makes it ideal for non-linear regression: it is here successfully applied to logistic regression. The algorithm does well, also when the number of function evaluations is severely restricted. Christer Samuelsson 243 Local Tuning in Nested Scheme of Global Optimization [abstract]Abstract: Numerical methods for global optimization of the multidimensional multiextremal functions in the framework of the approach oriented at dimensionality reduction by means of the nested optimization scheme are considered. This scheme reduces initial multidimensional problem to a set of univariate subproblems connected recursively.
This makes it possible to apply efficient univariate algorithms to multidimensional problems. The nested optimization scheme has served as the source of many methods for the optimization of Lipschitzian functions. However, all of them face the problem of estimating the Lipschitz constant as a parameter of the optimized function and, as a consequence, of tuning the optimization method to it. In the methods proposed earlier, as a rule, a global estimate (related to the whole search domain) is used, whereas local Lipschitz constants in some subdomains can differ significantly from the global constant. This can slow down the optimization process considerably. To overcome this drawback, finer estimates of the a priori unknown Lipschitz constants, taking into account local properties of the objective function, are considered in this article and used in the nested optimization scheme. The results of the numerical experiments presented demonstrate the advantages of methods with mixed (local and global) estimates of Lipschitz constants in comparison with the use of global ones only. Victor Gergel, Vladimir Grishagin, Ruslan Israfilov 226 Variations of Ant Colony Optimization for the solution of the structural damage identification problem [abstract]Abstract: In this work the inverse problem of identification of structural stiffness coefficients of a damped spring-mass system is tackled. The problem is solved by using different versions of the Ant Colony Optimization (ACO) metaheuristic, alone or coupled with the Hooke-Jeeves (HJ) local search algorithm. The evaluated versions of ACO are based on a discretization procedure to deal with the continuous domain design variables together with different pheromone evaporation and deposit strategies and also on the frequency of calling the local search algorithm. The damage estimation is evaluated using noiseless and noisy synthetic experimental data assuming a damage configuration throughout the structure.
The reported results show the hybrid method as the best choice when both rank-based pheromone deposit and a new heuristic information based on the search history are used. Carlos Eduardo Braun, Leonardo D. Chiwiacowsky, Arthur T. Gómez
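
The nested optimization scheme described in abstract 243 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' method: the univariate solver is a plain uniform grid search swapped in for the Lipschitz-based algorithms the abstract discusses, and the names `grid_min_1d` and `nested_minimize` are hypothetical.

```python
from typing import Callable, Sequence, Tuple

Point = Tuple[float, ...]


def grid_min_1d(g: Callable[[float], float],
                lo: float, hi: float, n: int) -> Tuple[float, float]:
    """Stand-in univariate solver: evaluate g on n uniform grid points."""
    step = (hi - lo) / (n - 1)
    best_x, best_v = lo, g(lo)
    for i in range(1, n):
        x = lo + i * step
        v = g(x)
        if v < best_v:
            best_x, best_v = x, v
    return best_x, best_v


def nested_minimize(f: Callable[[Point], float],
                    bounds: Sequence[Tuple[float, float]],
                    prefix: Point = (),
                    n: int = 41) -> Tuple[Point, float]:
    """Minimize f over a box by recursively solving univariate subproblems.

    At each level the objective is phi(t) = min over the remaining
    coordinates of f(prefix + (t, ...)), evaluated by a recursive call.
    """
    lo, hi = bounds[len(prefix)]
    if len(prefix) == len(bounds) - 1:
        # Innermost level: an ordinary one-dimensional problem.
        x, v = grid_min_1d(lambda t: f(prefix + (t,)), lo, hi, n)
        return prefix + (x,), v

    def phi(t: float) -> float:
        return nested_minimize(f, bounds, prefix + (t,), n)[1]

    t_best, _ = grid_min_1d(phi, lo, hi, n)
    # Re-solve the inner problem once to recover the full minimizer.
    return nested_minimize(f, bounds, prefix + (t_best,), n)
```

Each evaluation of the outer function `phi` triggers a complete inner search, which is why the cost grows exponentially with dimension and why the efficiency of the univariate solver (and, in the abstract, of its Lipschitz constant estimate) matters so much.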

 256 Multi-Objective Design Optimization of Planar Yagi-Uda Antenna Using Physics-Based Surrogates and Rotational Design Space Reduction [abstract]Abstract: A procedure for low-cost multi-objective design optimization of antenna structures is discussed. The major stages of the optimization process include: (i) an initial reduction of the search space aimed at identifying its relevant subset containing the Pareto-optimal design space, (ii) construction, using sampled coarse-discretization electromagnetic (EM) simulation data, of the response surface approximation surrogate, (iii) surrogate optimization using a multi-objective evolutionary algorithm, and (iv) the Pareto front refinement. Our optimization procedure is demonstrated through the design of a planar quasi Yagi-Uda antenna. The final set of designs representing the best available trade-offs between conflicting objectives is obtained at a computational cost corresponding to about 172 evaluations of the high-fidelity EM antenna model. Slawomir Koziel, Adrian Bekasiewicz, Leifur Leifsson 644 Agent-Based Simulation for Creating Robust Plans and Schedules [abstract]Abstract: The paper describes methods for constructing robust schedules using agent-based simulation. The measure of robustness represents the resistance of the schedule to random phenomena, and we present a method for calculating the robustness of a schedule. The procedure for creating a robust schedule combines standard solutions for planning and scheduling with computer simulation. It is described in detail and allows the creation of an executable robust schedule. Three different procedures for increasing the robustness (changing the order of allocation of resources, changing the plan, and increasing time reserves) are briefly explained. The presented techniques were tested using a real, detailed simulation model of an existing container terminal.
Peter Jankovič 413 Shape Optimization of Trawl-Doors Using Variable-Fidelity Models and Space Mapping [abstract]Abstract: Trawl-doors have a large influence on the fuel consumption of fishing vessels. Design and optimization of trawl-doors using computational models are key factors in minimizing the fuel consumption. This paper presents an efficient optimization algorithm for the design of trawl-door shapes using computational fluid dynamic models. The approach is iterative and uses variable-fidelity models and space mapping. The algorithm is applied to the design of a multi-element trawl-door, involving four design variables controlling the angle of attack and the slat position and orientation. The results demonstrate that a satisfactory design can be obtained at a cost of a few iterations of the algorithm. Compared with direct optimization of the high-fidelity model and local response surface surrogate models, the proposed approach requires 79% less computational time while, at the same time, improving the design significantly (over 12% increase in the lift-to-drag ratio). Ingi Jonsson, Leifur Leifsson, Slawomir Koziel, Yonatan Tesfahunegn, Adrian Bekasiewicz 347 Optimised robust treatment plans for prostate cancer focal brachytherapy [abstract]Abstract: Focal brachytherapy is a clinical procedure that can be used to treat low-risk prostate cancer with reduced side-effects compared to conventional brachytherapy. Current practice is to manually plan the placement of radioactive seeds inside the prostate to achieve a desired treatment dose. Problems with the current practice are that the manual planning is time-consuming and high doses to the urethra and rectum cause undesirable side-effects. To address this problem, we have designed an optimisation algorithm that constructs treatment plans which achieve the desired dose while minimizing dose to organs at risk. 
We also show that these seed plans are robust to post-operative movement of the seeds within the prostate. John Betts, Chris Mears, Hayley Reynolds, Guido Tack, Kevin Leo, Martin Ebert, Annette Haworth 514 Identification of Multi-inclusion Statistically Similar Representative Volume Element for Advanced High Strength Steels by Using Data Farming Approach [abstract]Abstract: A Statistically Similar Representative Volume Element (SSRVE) is used to simplify the computational domain for microstructure representation of a material in multiscale modelling. The procedure of SSRVE creation is based on an optimization loop which makes it possible to find the highest similarity between the SSRVE and the original material microstructure. The objective function in this optimization is built upon computationally intensive numerical methods, including simulations of virtual material deformation, which is very time consuming. To avoid such long-lasting calculations, we propose to use the data farming approach for the identification of SSRVE for Advanced High Strength Steels (AHSS) characterized by multiphase microstructure. The optimization method is based on a nature-inspired approach which facilitates distribution and parallelization. The concept of SSRVE creation as well as the software architecture of the proposed solution is described in the paper in detail. It is followed by examples of the results obtained for the identification of SSRVE parameters for DP steels, which are widely exploited in the modern automotive industry. Possible directions for further development and uses are described in the conclusions. Lukasz Rauch, Danuta Szeliga, Daniel Bachniak, Krzysztof Bzowski, Renata Słota, Maciej Pietrzyk, Jacek Kitowski

 411 Point Distribution Tensor Computation on Heterogeneous Systems [abstract]Abstract: Big data in observational and computational sciences impose increasing challenges on data analysis. In particular, data from light detection and ranging (LIDAR) measurements are questioning conventional methods of CPU-based algorithms due to their sheer size and complexity as needed for decent accuracy. These data describing terrains are natively given as big point clouds consisting of millions of independent coordinate locations from which meaningful geometrical information content needs to be extracted. The method of computing the point distribution tensor is a very promising approach, yielding good results to classify domains in a point cloud according to local neighborhood information. However, an existing KD-Tree parallel approach, provided by the VISH visualization framework, may very well take several days to deliver meaningful results on a real-world dataset. Here we present an optimized version based on uniform grids implemented in OpenCL that is able to deliver results of equal accuracy up to 24 times faster on the same hardware. The OpenCL version is also able to benefit from a heterogeneous environment and we analyzed and compared the performance on various CPU, GPU and accelerator hardware platforms. Finally, aware of the heterogeneous computing trend, we propose two low-complexity dynamic heuristics for the scheduling of independent dataset fragments in multi-device heterogeneous systems. Ivan Grasso, Marcel Ritter, Biagio Cosenza, Werner Benger, Günter Hofstetter, Thomas Fahringer 465 Toward a multi-level parallel framework on GPU cluster with PetSC-CUDA for PDE-based Optical Flow computation [abstract]Abstract: In this work we present a multi-level parallel framework for the Optical Flow computation on a GPU cluster, equipped with a scientific computing middleware (the PETSc library).
Starting from a flow-driven isotropic method, which models the optical flow problem through a parabolic partial differential equation (PDE), we have designed a parallel algorithm and its software implementation that is suitable for heterogeneous computing environments (multiprocessor, single GPU and cluster of GPUs). The proposed software has been tested on real SAR image sequences. Experiments highlight the performances obtained and a gain of about 95% with respect to the sequential implementation. Salvatore Cuomo, Ardelio Galletti, Giulio Giunta, Livia Marcellino 472 Performance Analysis and Optimisation of Two-Sided Factorization Algorithms for Heterogeneous Platform [abstract]Abstract: Many applications, ranging from big data analytics to nanostructure designs, require the solution of large dense singular value decomposition (SVD) or eigenvalue problems. A first step in the solution methodology for these problems is the reduction of the matrix at hand to condensed form by two-sided orthogonal transformations. This step is standardly used to significantly accelerate the solution process. We present a performance analysis of the main two-sided factorizations used in these reductions: the bidiagonalization, tridiagonalization, and the upper Hessenberg factorizations on heterogeneous systems of multicore CPUs and Xeon Phi coprocessors. We derive a performance model and use it to guide the analysis and to evaluate performance. We develop optimized implementations for these methods that get up to 80% of the optimal performance bounds. Finally, we describe the heterogeneous multicore and coprocessor development considerations and the techniques that enable us to achieve these high-performance results. The work here presents the first highly optimized implementation of these main factorizations for Xeon Phi coprocessors. Compared to the LAPACK versions optimized by Intel for Xeon Phi (in MKL), we achieve up to 50% speedup.
Khairul Kabir, Azzam Haidar, Stanimire Tomov, Jack Dongarra 483 High-Speed Exhaustive 3-locus Interaction Epistasis Analysis on FPGAs [abstract]Abstract: Epistasis, the interaction between genes, has become a major topic in molecular and quantitative genetics. It is believed that these interactions play a significant role in genetic variations causing complex diseases. Several algorithms have been employed to detect pairwise interactions in genome-wide association studies (GWAS) but revealing higher order interactions remains a computationally challenging task. State of the art tools are not able to perform exhaustive search for all three-locus interactions in reasonable time even for relatively small input datasets. In this paper we present how a hardware-assisted design can solve this problem and provide fast, efficient and exhaustive third-order epistasis analysis with up-to-date FPGA technology. Jan Christian Kässens, Lars Wienbrandt, Jorge González-Domínguez, Bertil Schmidt and Manfred Schimmler 487 Evaluating the Potential of Low Power Systems for Headphone-based Spatial Audio Applications [abstract]Abstract: Embedded architectures have been traditionally designed tailored to perform a dedicated (specialized) function, and in general feature a limited amount of processing resources as well as exhibit very low power consumption. In this line, the recent introduction of systems-on-chip (SoC) composed of low power multicore processors, combined with a small graphics accelerator (or GPU), presents a notable increment of the computational capacity while partially retaining the appealing low power consumption of embedded systems. This paper analyzes the potential of these new hardware systems to accelerate applications that integrate spatial information into an immersive audiovisual virtual environment or into video games. 
Concretely, our work discusses the implementation and performance evaluation of a headphone-based spatial audio application on the Jetson TK1 development kit, a board equipped with a SoC comprising a quad-core ARM processor and an NVIDIA "Kepler" GPU. Our implementations exploit the hardware parallelism of both types of architectures by carefully adapting the underlying numerical computations. The experimental results show that the accelerated application is able to move up to 300 sound sources simultaneously in real time on this platform. Jose A. Belloch, Alberto Gonzalez, Rafael Mayo, Antonio M. Vidal, Enrique S. Quintana-Orti

 488 Real-Time Sound Source Localization on an Embedded GPU Using a Spherical Microphone Array [abstract]Abstract: Spherical microphone arrays are becoming increasingly important in acoustic signal processing systems for their applications in sound field analysis, beamforming, spatial audio, etc. The positioning of target and interfering sound sources is a crucial step in many of the above applications. Therefore, 3D sound source localization is a highly relevant topic in the acoustic signal processing field. However, spherical microphone arrays are usually composed of many microphones and running signal processing localization methods in real time is an important issue. Some works have already shown the potential of Graphic Processing Units (GPUs) for developing high-end real-time signal processing systems. New embedded systems with integrated GPU accelerators providing low power consumption are becoming increasingly relevant. These novel systems play a very important role in the new era of smartphones and tablets, opening further possibilities to the design of high-performance compact processing systems. This paper presents a 3D source localization system using a spherical microphone array fully implemented on an embedded GPU. The real-time capabilities of these platforms are analyzed, providing also a performance analysis of the localization system under different acoustic conditions. Jose A. Belloch, Maximo Cobos, Alberto Gonzalez, Enrique S. Quintana-Orti 81 The Scaled Boundary Finite Element Method for the Analysis of 3D Crack Interaction [abstract]Abstract: The Scaled Boundary Finite Element Method (SBFEM) can be applied to solve linear elliptic boundary value problems when a so-called scaling center can be defined such that every point on the boundary is visible from it.
From a more practical point of view, this means that in linear elasticity, a separation of variables ansatz can be used for the displacements in a scaled boundary coordinate system. This approach allows an analytical treatment of the problem in the scaling direction. Only the boundary needs to be discretized with Finite Elements. Employing the separation of variables ansatz in the virtual work balance yields a second-order Cauchy-Euler differential equation system, which can be transformed into an eigenvalue problem and solved by standard eigenvalue solvers for nonsymmetric matrices. An additional linear equation system is then used to enforce the boundary conditions. If the scaling center is located directly at a singular point, elliptic boundary value problems containing singularities can be solved with high accuracy and computational efficiency. The application of the SBFEM to the linear elasticity problem of two meeting inter-fiber cracks in a composite laminate exposed to a simple homogeneous temperature decrease reveals the presence of hypersingular stresses. Sascha Hell and Wilfried Becker 85 Algorithmic Differentiation of Numerical Methods: Second-Order Tangent Solvers for Systems of Parametrized Nonlinear Equations [abstract]Abstract: Forward mode algorithmic differentiation transforms implementations of multivariate vector functions as computer programs into first directional derivative (also: first-order tangent) code. Its reapplication yields higher directional derivative (higher-order tangent) code. Second derivatives play an important role in nonlinear programming. For example, second-order (Newton-type) nonlinear optimization methods promise faster convergence in the neighborhood of the minimum by taking second derivative information into account. Part of the objective function may be given implicitly as the solution of a system of n parameterized nonlinear equations. 
If the system parameters depend on the free variables of the objective, then second derivatives of the nonlinear system’s solution with respect to those parameters are required. The local computational overhead for the computation of second-order tangents of the solution vector with respect to the parameters by Algorithmic Differentiation depends on the number of iterations performed by the nonlinear solver. This dependence can be eliminated by taking a second-order symbolic approach to differentiation of the nonlinear system. Niloofar Safiran, Johannes Lotz, Uwe Naumann
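The symbolic idea in the abstract above can be illustrated on a toy problem. The following is a minimal sketch, not the authors' solver: it assumes a hypothetical scalar system F(x, p) = x^2 - p = 0, solves it with Newton's method, and then obtains the first and second derivatives of the solution x(p) by differentiating F(x(p), p) = 0 once and twice, so the derivative cost no longer depends on the number of Newton iterations. The function names and the choice of F are illustrative assumptions.

```python
def newton_solve(p, x0=1.0, tol=1e-12, max_iter=50):
    """Solve the hypothetical system F(x, p) = x**2 - p = 0 for x."""
    x = x0
    for _ in range(max_iter):
        step = (x * x - p) / (2.0 * x)  # F / F_x
        x -= step
        if abs(step) < tol:
            break
    return x


def solution_with_tangents(p):
    """Return x(p), dx/dp and d2x/dp2 without differentiating the solver loop.

    Differentiating F(x(p), p) = 0 once gives
        F_x * x' + F_p = 0,
    and differentiating again gives
        F_xx * x'**2 + 2 * F_xp * x' + F_pp + F_x * x'' = 0.
    """
    x = newton_solve(p)
    # Partial derivatives of F(x, p) = x**2 - p at the solution.
    F_x, F_p = 2.0 * x, -1.0
    F_xx, F_xp, F_pp = 2.0, 0.0, 0.0
    dx = -F_p / F_x
    d2x = -(F_xx * dx * dx + 2.0 * F_xp * dx + F_pp) / F_x
    return x, dx, d2x


x, dx, d2x = solution_with_tangents(4.0)
print(x, dx, d2x)
```

For this toy system the closed form x(p) = sqrt(p) is available, so the two derivative values can be checked against 1/(2 sqrt(p)) and -1/(4 p^(3/2)).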

 469 Expressively Modeling the Social Golfer Problem in SAT [abstract]Abstract: Constraint Satisfaction Problems allow one to expressively model problems. On the other hand, propositional satisfiability problem (SAT) solvers can handle huge SAT instances. We thus present a technique to expressively model set constraint problems and to encode them automatically into SAT instances. Our technique is expressive and less error-prone. We apply it to the Social Golfer Problem and to symmetry breaking of the problem. Frederic Lardeux, Eric Monfroy 538 Multi-Objective Genetic Algorithm for Variable Selection in Multivariate Classification Problems: A Case Study in Verification of Biodiesel Adulteration [abstract]Abstract: This paper proposes a multi-objective genetic algorithm for the problem of variable selection in multivariate calibration. We consider the problem of classifying biodiesel samples to detect adulteration using a Linear Discriminant Analysis classifier. The goal of the multi-objective algorithm is to reduce the dimensionality of the original set of variables; thus, the classification model can be less sensitive, providing a better generalization capacity. In particular, in this paper we adopt a version of the Non-dominated Sorting Genetic Algorithm (NSGA-II) and compare it to a mono-objective Genetic Algorithm (GA) in terms of sensitivity in the presence of noise. Results show that the mono-objective GA selects 20 variables on average and presents an error rate of 14%. On the other hand, the multi-objective algorithm selects 7 variables and has an error rate of 11%. Consequently, we show that the multi-objective formulation provides classification models with lower sensitivity to instrumental noise than the mono-objective formulation. 
Lucas de Almeida Ribeiro, Anderson Da Silva Soares 653 Siting Multiple Observers for Maximum Coverage: An Accurate Approach [abstract]Abstract: The selection of the lowest number of observers that ensures the maximum visual coverage over an area represented by a digital elevation model (DEM) is an important problem of great interest in many fields, e.g., telecommunications and environmental planning, among others. However, this problem is complex and intractable when the number of points of the DEM is relatively high. This complexity is due to three issues: 1) the difficulty in determining the visibility of the territory from a point, 2) the need to know the visibility at all points of the territory and 3) the combinatorial complexity of the selection of observers. The recent progress in total-viewshed map computation not only provides an efficient solution to the first two problems, but also opens the way to new solutions that were previously unthinkable. This paper presents a new type of cartography, called the masked total viewshed map, and based on this algorithm, optimal solutions for both sequential and simultaneous observer location are provided. Antonio Manuel Rodriguez Cervilla, Siham Tabik, Luis Felipe Romero Gómez 169 Using Criteria Reconstruction of Low-Sampling Trajectories as a Tool for Analytics [abstract]Abstract: Today, many applications that incorporate the Global Positioning System (GPS) deliver huge quantities of spatio-temporal data. Trajectories followed by moving objects (MOs) can be generated from this data. However, these trajectories may have silent durations, i.e., time durations when no data are available for describing the route of an MO. As a result, the movement during silent durations must be described, and the low-sampling trajectory needs to be filled in using specialized data imputation techniques in order to study and discover new knowledge based on movement. 
Our interest is to show opportunities for analytical tasks using a criteria-based operator over reconstructed low-sampling trajectories. Also, a simple visual analysis of the reconstructed trajectories is performed to offer a simple analytic perspective on the reconstruction and on how the criterion of movement can change the analysis. To the best of our knowledge, this work is the first attempt to use different trajectory reconstruction criteria to identify opportunities for analytical tasks over reconstructed low-sampling trajectories as a whole. Francisco Moreno, Edison Ospina, Iván Amón Uribe 258 Using Genetic Algorithms for Maximizing Technical Efficiency in Data Envelopment Analysis [abstract]Abstract: Data Envelopment Analysis (DEA) is a non-parametric technique for estimating the technical efficiency of a set of Decision Making Units (DMUs) from a database consisting of inputs and outputs. This paper studies DEA models based on maximizing technical efficiency, which aim to determine the least distance from the evaluated DMU to the production frontier. Usually, these models have been solved through unsatisfactory methods used for combinatorial NP-hard problems. Here, the problem is approached with metaheuristic techniques, and the solutions are compared with those of the methodology based on the determination of all the facets of the frontier in DEA. The use of metaheuristics provides solutions close to the optimum with low execution time. Martin Gonzalez, Jose J. Lopez-Espin, Juan Aparicio, Domingo Gimenez, Jesus T. Pastor

 379 Towards a Cognitive Agent-Based Model for Air Conditioners Purchasing Prediction [abstract]Abstract: Climate change as a result of human activities is a problem of paramount importance. The global temperature on Earth is gradually increasing, and this may lead to substantially hotter summers in the temperate belt of Europe, which in turn is likely to influence air conditioning penetration in this region. The current work is an attempt to predict air conditioning penetration in different residential areas in the UK between 2030 and 2090 using an integration of calibrated building models, future weather predictions and an agent-based model. Simulation results suggest that up to 12% of homes would install an air conditioner in 75 years’ time, assuming an average purchasing ability of the households. The performed simulations provide more insight into the influence of overheating intensity, along with households’ purchasing ability and social norms, upon households’ decisions to purchase an air conditioner. Nataliya Mogles, Alfonso Ramallo-González, Elizabeth Gabe-Thomas 481 Crowd evacuations SaaS: an ABM approach [abstract]Abstract: Crowd evacuations involve thousands of persons in closed spaces. Having knowledge about where the problematic exits will be or where the disaster may occur can be crucial in emergency planning. We implemented a simulator using Agent Based Modelling that is able to model the behaviour of people in evacuation situations, and a workflow that is able to run it in the cloud. The input is just a PNG image, and the output is the statistical results of the simulation executed on the cloud. This provides the user with a system abstraction in which only a map of the scenario is needed. Many events are held in main city squares, so to test our system we chose Siena and fit about 28,000 individuals in the centre of the square. The software has special computational requirements because the results need to be statistically reliable. 
Because of these needs, we use distributed computing. In this paper we show how the simulator scales efficiently on the cloud. Albert Gutierrez-Milla, Francisco Borges, Remo Suppi, Emilio Luque 499 Differential Evolution with Sensitivity Analysis and the Powell's Method for Crowd Model Calibration [abstract]Abstract: Evolutionary algorithms (EAs) are popular and powerful approaches for model calibration. This paper proposes an enhanced EA-based model calibration method, namely the differential evolution (DE) with sensitivity analysis and the Powell's method (DESAP). In contrast to traditional EA-based model calibration methods, the proposed DESAP has three main features. First, an entropy-based sensitivity analysis operation is integrated so as to dynamically identify important parameters of the model as evolution progresses online. Second, the Powell's method is performed periodically to fine-tune the important parameters of the best individual in the population. Finally, in each generation, the DE operators are performed on a small number of better individuals rather than all individuals in the population. These new search mechanisms are integrated into the DE framework so as to reduce the computational cost and to improve the search efficiency. To validate its effectiveness, the proposed DESAP is applied to two crowd model calibration cases. The results demonstrate that the proposed DESAP outperforms several state-of-the-art model calibration methods in terms of accuracy and efficiency. Jinghui Zhong and Wentong Cai 525 Strip Partitioning for Ant Colony Parallel and Distributed Discrete-Event Simulation [abstract]Abstract: Data partitioning is one of the main problems in parallel and distributed simulation. Distribution of data over the architecture directly influences the efficiency of the simulation. The partitioning strategy becomes a complex problem because it depends on several factors. 
In an Individual-oriented Model, for example, the partitioning is related to interactions between the individual and the environment. Therefore, parallel and distributed simulation should dynamically enable the interchange of the partitioning strategy in order to choose the most appropriate partitioning strategy for a specific context. In this paper, we propose a strip partitioning strategy for a spatially dependent problem in Individual-oriented Model applications. This strategy avoids sharing resources and, as a result, decreases communication volume among the processes. In addition, we develop an objective function that calculates the best partitioning for a specific configuration and gives the computing cost of each partition, allowing for a computing balance through a mapping policy. The results obtained are supported by statistical analysis and experimentation with an Ant Colony application. As a main contribution, we developed a solution where the partitioning strategy can be chosen dynamically and always returns the lowest total execution time. Francisco Borges, Albert Gutierrez-Milla, Remo Suppi, Emilio Luque 530 Model of Collaborative UAV Swarm Toward Coordination and Control Mechanisms Study [abstract]Abstract: In recent years, thanks to the low cost of deploying and maintaining an Unmanned Aerial Vehicle (UAV) system and the possibility of operating UAVs in areas inaccessible or dangerous for human pilots, UAVs have attracted much research attention in both military and civilian applications. In order to deal with more sophisticated tasks, such as searching for survival points and multiple target monitoring and tracking, the application of UAV swarms is foreseen. This requires more complex control, communication and coordination mechanisms. However, these mechanisms are difficult to test and analyze under flight dynamic conditions. These multi-UAV scenarios are by their nature well suited to be modeled and simulated as multi-agent systems. 
The first step of modeling a multi-agent system is to construct the model of the agent, namely an accurate model to represent the behavior, constraints and uncertainties of UAVs. In this paper we introduce our approach to modeling a UAV as an agent in terms of multi-agent system principles. The model is constructed to satisfy the need for a simulation environment that researchers can use to evaluate and analyze swarm control mechanisms. Simulation results of a case study are provided to demonstrate one possible use of this approach. Xueping Zhu, Zhengchun Liu, Jun Yang

 712 Collaborative Knowledge Fusion by Ad-Hoc Information Distribution in Crowds [abstract]Abstract: We study situations (such as a city festival) where, in the case of a phone signal outage, cell phones can communicate opportunistically (for instance, using WiFi or Bluetooth), and we want to understand and control information spreading. A particular question is how to prevent false information from spreading and how to facilitate the spreading of useful (true) information. We introduce collaborative knowledge fusion as the operation by which individual, local knowledge claims are merged. Such fusion events are local, e.g. they happen upon the physical meetings of knowledge providers. We study and evaluate different methods for collaborative knowledge fusion and study the conditions for and tradeoffs of the convergence to a global true knowledge state under various conditions. George Kampis, Paul Lukowicz 220 Modeling Deflagration in Energetic Materials using the Uintah Computational Framework [abstract]Abstract: Predictive computer simulations of large-scale deflagration and detonation are dependent on the availability of robust reaction models embedded in a computational framework capable of running on massively parallel computer architectures. We have been developing such models in the Uintah Computational Framework, which is capable of scaling up to 512k cores. Our particular interest is in predicting the deflagration-to-detonation transition (DDT) for accident scenarios involving large numbers of energetic devices; the 2005 truck explosion in Spanish Fork Canyon, UT is a prototypical example. Our current reaction model adapts components from Ward, Son and Brewster to describe the effects of pressure and initial temperature on deflagration, from Berghout et al. for burning in cracks in damaged explosives, and from Souers for describing fully developed detonation. The reaction model has been subjected to extensive validation against experimental tests. 
Current efforts are focused on effects of carrying the computational grid elements on multiple aspects of deflagration and the transition to detonation. Jacqueline Beckvermit, Todd Harman, Andrew Bezdjian, Charles Wight 237 Fast Equilibration of Coarse-Grained Polymeric Liquids [abstract]Abstract: The study of macromolecular systems may require large computer simulations that are too time consuming and resource intensive to execute in full atomic detail. The integral equation coarse-graining approach by Guenza and co-workers enables the exploration of longer time and spatial scales without sacrificing thermodynamic consistency, by approximating collections of atoms using analytically-derived soft-sphere potentials. Because coarse-grained (CG) characterizations evolve polymer systems far more efficiently than the corresponding united atom (UA) descriptions, we can feasibly equilibrate a CG system to a reasonable geometry, then transform back to the UA description for a more complete equilibration. Automating the transformation between the two different representations simultaneously exploits CG efficiency and UA accuracy. By iteratively mapping back and forth between CG and UA, we can quickly guide the simulation towards a configuration that would have taken many more time steps within the UA representation alone. Accomplishing this feat requires a diligent workflow for managing input/output coordinate data between the different steps, deriving the potential at runtime, and inspecting convergence. In this paper, we present a lightweight workflow environment that accomplishes such fast equilibration without user intervention. The workflow supports automated mapping between the CG and UA descriptions in an iterative, scalable, and customizable manner. We describe this technique, examine its feasibility, and analyze its correctness. 
David Ozog, Jay McCarty, Grant Gossett, Allen Malony and Marina Guenza 392 Massively Parallel Simulations of Hemodynamics in the Human Vasculature [abstract]Abstract: We present a computational model of three-dimensional and unsteady hemodynamics within the primary large arteries in the human body on 1,572,864 cores of the IBM Blue Gene/Q. Models of large regions of the circulatory system are needed to study the impact of local factors on global hemodynamics and to inform next generation drug delivery mechanisms. The HARVEY code successfully addresses key challenges that can hinder effective solution of image-based hemodynamics on contemporary supercomputers, such as limited memory capacity and bandwidth, flexible load balancing, and scalability. This work is the first demonstration of large (> 500 cm) fluid dynamics simulations of the circulatory system modeled at resolutions as high as 10 μm. Amanda Randles, Erik W. Draeger and Peter E. Bailey 402 Parallel performance of an IB-LBM suspension simulation framework [abstract]Abstract: We present performance results from ficsion, a general purpose parallel suspension solver employing the Immersed-Boundary lattice-Boltzmann method (IB-LBM). ficsion is built on top of the open-source LBM framework Palabos, making use of its data structures and their inherent parallelism. We briefly describe the implementation and present weak and strong scaling results for simulations of dense red blood cell suspensions. Despite their complexity, the simulations demonstrate fairly good, close-to-linear scaling in both the weak and strong scaling scenarios. Lampros Mountrakis, Eric Lorenz, Orestis Malaspinas, Saad Alowayyed, Bastien Chopard and Alfons G. Hoekstra
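Weak and strong scaling results such as those mentioned in the last abstract above are commonly summarized as parallel efficiencies. The following is a minimal sketch of the standard definitions; the function names and all timings are hypothetical and are not taken from any of the papers listed here.

```python
def strong_scaling_efficiency(t_base, n_base, t_n, n):
    """Fixed total problem size: ideal run time shrinks as n_base / n."""
    return (t_base * n_base) / (t_n * n)


def weak_scaling_efficiency(t_base, t_n):
    """Fixed problem size per core: ideal run time stays constant."""
    return t_base / t_n


# Hypothetical wall-clock times in seconds, for illustration only.
print(strong_scaling_efficiency(t_base=100.0, n_base=64, t_n=27.0, n=256))
print(weak_scaling_efficiency(t_base=100.0, t_n=125.0))
```

An efficiency of 1.0 corresponds to ideal (linear) scaling in both cases; "close to linear" scaling means values slightly below 1.0 as the core count grows.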

 405 A New Stochastic Cellular Automata Model for Traffic Flow Simulation with Driver's Behavior Prediction [abstract]Abstract: In this work we introduce a novel, flexible and robust traffic flow cellular automata model. Our proposal includes two important stages that make it possible to consider different profiles of drivers' behaviors. We first consider the motion expectation of cars that are in front of each driver. Secondly, we define how a specific car decides to get around, considering the foreground traffic configuration. Our model uses stochastic rules for both situations, adjusting the probability density function of the Beta distribution for three profiles of driver behavior, with different Beta distribution parameters for each one. Marcelo Zamith, Leal-Toledo Regina, Esteban Clua, Elson Toledo and Guilherme Magalhães 557 A Model Driven Approach to Water Resource Analysis based on Formal Methods and Model Transformation [abstract]Abstract: Several frameworks have been proposed in the literature to cope with critical infrastructure modelling issues, and almost all rely on simulation techniques. However, simulation alone is not enough for critical systems, where any problem may lead to substantial loss of money and even human lives. Formal methods are widely used to enact exhaustive analyses of these systems, but their complexity grows with system dimension and heterogeneity. In addition, experts in application domains may not be familiar with formal modelling techniques. A way to manage the complexity of analysis is the use of Model Based Transformation techniques: analysts can express their models in the way they are used to, and automatic algorithms translate the original models into analysable ones, reducing analysis complexity in a completely transparent way. In this work we describe an automatic transformation algorithm generating hybrid automata for the analysis of a natural water supply system. 
We use a real system located in the South of Italy as a case study. Francesco Moscato, Flora Amato, Francesco De Paola, Crescenzo Diomaiuta, Nicola Mazzocca, Maurizio Giugni 175 An Invariant Framework for Conducting Reproducible Computational Science [abstract]Abstract: Computational reproducibility depends on being able to isolate necessary and sufficient computational artifacts and preserve them for later re-execution. Both isolation and preservation of artifacts can be challenging due to the complexity of existing software and systems and the resulting implicit dependencies, resource distribution, and shifting compatibility of systems as time progresses, all conspiring to break the reproducibility of an application. Sandboxing is a technique that has been used extensively in OS environments for isolation of computational artifacts. Several tools were proposed recently that employ sandboxing as a mechanism to ensure reproducibility. However, none of these tools preserve the sandboxed application for re-distribution to a larger scientific community, aspects that are equally crucial for ensuring reproducibility as sandboxing itself. In this paper, we describe a combined sandboxing and preservation framework, which is efficient, invariant and practical for large-scale reproducibility. We present case studies of complex high energy physics applications and show how the framework can be useful for sandboxing, preserving and distributing applications. We report on the completeness, performance, and efficiency of the framework, and suggest possible standardization approaches. Haiyan Meng, Rupa Kommineni, Quan Pham, Robert Gardner, Tanu Malik and Douglas Thain 264 Very fast interactive visualization of large sets of high-dimensional data [abstract]Abstract: The embedding of high-dimensional data into 2D (or 3D) space is the most popular way of data visualization. 
Despite recent advances in the development of very accurate dimensionality reduction algorithms, such as BH-SNE, Q-SNE and LoCH, their relatively high computational complexity remains an obstacle to interactive visualization of truly large sets of high-dimensional data. We show that a new clone of the multidimensional scaling method (MDS), nr-MDS, can be up to two orders of magnitude faster than modern dimensionality reduction algorithms. We postulate its linear O(M) computational and memory complexity. At the same time, our method preserves high separability of data in 2D and 3D target spaces, similar to that obtained by state-of-the-art dimensionality reduction algorithms. We present the effects of applying nr-MDS to the visualization of data repositories such as 20 Newsgroups (M=18000), MNIST (M=70000) and REUTERS (M=267000). Witold Dzwinel, Rafał Wcisło 315 Automated Requirements Extraction for Scientific Software [abstract]Abstract: Requirements engineering is crucial for software projects, but formal requirements engineering is often ignored in scientific software projects. Scientists do not often see the benefit of directing their time and effort towards documenting requirements. Additionally, there is a lack of requirements engineering knowledge amongst scientists who develop software. We aim to help scientists easily recover and reuse requirements without acquiring prior requirements engineering knowledge. We apply an automated approach to extract requirements for scientific software from available knowledge sources, such as user manuals and project reports. The approach employs natural language processing techniques to match defined patterns in input text. We have evaluated the approach in three different scientific domains, namely seismology, building performance and computational fluid dynamics. The evaluation results show that 78-97% of the extracted requirement candidates are correctly extracted as early requirements. 
Yang Li, Emitzá Guzmán Ortega, Konstantina Tsiamoura, Florian Schneider, Bernd Bruegge

 387 Interactive 180° Rear Projection Public Relations [abstract]Abstract: In the globalized world, good products may not be enough to reach potential clients if creative marketing strategies are not well delineated. Public relations are also important when it comes to capturing clients' attention, making the first contact between them and a company's products while being persuasive enough to convince the client that the company has the right products to fit their needs. A virtual public relations installation is proposed, combining technology and a human-like public relations agent capable of interacting with potential clients placed within 180 degrees in front of the installation, using gestures and sound. Four Microsoft Kinects were used to develop the 180-degree interaction model, which allows recognition of gestures, sound sources and words, extraction of the face and body of the user, and tracking of users' positions (including a heat map). Ricardo Alves, Aldric Négrier, Luís Sousa, J.M.F Rodrigues, Paulo Felizberto, Miguel Gomes, Paulo Bica 11 Identification of DNA Motif with Mutation [abstract]Abstract: The conventional way of identifying possible motif sequences in a DNA strand is to use a representative scalar weight matrix to search for good match substring alignments. However, this approach, solely based on match alignment information, is susceptible to a high number of ambiguous sites or false positives if the motif sequences are not well conserved. A significant amount of time is then required to verify these sites for the suggested motifs. Hence, in this paper, the use of mismatch alignment information in addition to match alignment information for DNA motif searching is proposed. The objective is to reduce the number of ambiguous false positives encountered in DNA motif searching, thereby making the process more efficient for biologists to use. 
Jian-Jun Shu 231 A software tool for the automatic quantification of the left ventricle myocardium hyper-trabeculation degree [abstract]Abstract: Isolated left ventricular non-compaction (LVNC) is a myocardial disorder characterised by prominent ventricular trabeculations and deep recesses extending from the LV cavity to the subendocardial surface of the LV. Up to now, there has been no common and stable solution in the medical community for quantifying and assessing non-compaction cardiomyopathy. A software tool for the automatic quantification of the exact hyper-trabeculation degree in the left ventricle myocardium is designed, developed and tested. This tool is based on medical experience, but the possibility of human assessment error has been eliminated. The input data for this software are cardiac images of the patients obtained by means of magnetic resonance. The output results are the percentage quantification of the trabecular zone with respect to the compacted area. This output is compared with human processing performed by medical specialists. The software proves to be a valuable tool to help diagnosis, thereby saving valuable diagnosis time. Gregorio Bernabe, Javier Cuenca, Pedro E. López de Teruel, Domingo Gimenez, Josefa González-Carrillo 453 Blending Sentence Optimization Weights of Unsupervised Approaches for Extractive Speech Summarization [abstract]Abstract: This paper evaluates the performance of two unsupervised approaches, Maximum Marginal Relevance (MMR) and a concept-based global optimization framework, for speech summarization. Automatic summarization is a very useful technique that can help users browse a large amount of data. This study focuses on automatic extractive summarization on a multi-dialogue speech corpus. We propose improved methods by blending each unsupervised approach at the sentence level. Sentence-level information is leveraged to improve the linguistic quality of selected summaries. 
First, these scores are used to filter sentences for concept extraction and concept weight computation. Second, we pre-select a subset of candidate summary sentences according to their sentence weights. Last, we extend the optimization function to a joint optimization of concept and sentence weights to cover both important concepts and sentences. Our experimental results show that these methods can improve system performance compared to the concept-based optimization baseline for both human transcripts and ASR output. The best scores are achieved by combining all three approaches, and they are significantly better than those of the baseline system. Noraini Seman, Nursuriati Jamil 513 The CardioRisk Project: Improvement of Cardiovascular Risk Assessment [abstract]Abstract: The CardioRisk project addresses coronary artery disease (CAD), namely the management of myocardial infarction (MI) patients. The main goal is the development of personalized clinical models for cardiovascular (CV) risk assessment of acute events (e.g. death and new hospitalization), in order to stratify patients according to their care needs. This paper presents an overview of the scientific and technological issues that are under research and development. Three major scientific challenges can be identified: i) the development of fusion approaches to merge CV risk assessment tools; ii) strategies for the grouping (clustering) of patients; iii) biosignal processing techniques to achieve personalized diagnosis. At the end of the project, a set of algorithms/models must properly address these three challenges. Additionally, a clinical platform was implemented, integrating the developed models and algorithms. This platform supports a clinical observational study (100 patients) that is being carried out in Leiria Hospital Centre to validate the developed approach. 
Inputs from the hospital information system (demographics, biomarkers, clinical exams) are considered, as well as an ECG signal acquired with a Holter device. A real patient dataset provided by Santa Cruz Hospital, Portugal, comprising N=460 ACS-NSTEMI patients, is also used to perform initial validations (individual algorithms). The CardioRisk team is composed of two research institutions, the University of Coimbra (Portugal) and Politecnico di Milano (Italy), and Leiria Hospital Centre (a Portuguese public hospital). Simão Paredes, Teresa Rocha, Paulo de Carvalho, Jorge Henriques, Diana Mendes, Ricardo Cabete, Ramona Cabiddu, Anna Maria Bianchi and João Morais

 59 Swarming collapse under limited information flow between individuals [abstract]Abstract: Information exchange is critical to the execution and effectiveness of natural and artificial collective behaviors: fish schooling, birds flocking, amoebae aggregating or robots swarming. In particular, the emergence of dynamic collective responses in swarms confronted with complex environments underscores the central role played by social transmission of information. Here, the different possible origins of information flow bottlenecks are identified, and the associated effects on dynamic collective behaviors revealed using a combination of network-, control- and information-theoretic elements applied to a group of interacting self-propelled particles (SPPs). Specifically, we consider a minimalistic agent-based model consisting of N topologically interacting SPPs moving at constant speed through a domain having periodic boundaries. Each individual agent is characterized by its direction of travel and a canonical swarming behavior of the consensus type is examined. To account for the finiteness of the bandwidth, we consider synchronous information exchanges occurring every T = 1/(2B), where the unit interval T is the minimum time interval between condition changes of the data transmission signal. The agents move synchronously at discrete time steps T by a fixed distance upon receiving informational signals from their neighbors as per a linear update rule involving. We find a sufficient condition on the agents’ bandwidth B that guarantees the effectiveness of swarming while also highlighting the profound connection with the topology of the underlying interaction network. We also show that when decreasing B, the swarming behavior invariably vanishes following a second-order phase transition irrespective of the intrinsic noise level. 
Roland Bouffanais 63 Multiscale simulation of organic electronics via massive nesting of density functional theory computational kernels [abstract]Abstract: Modelling is essential for development of organic electronics, such as organic light emitting diodes (OLEDs), organic field-effect transistors (OFETs) and organic photovoltaics (OPV). OLEDs currently have the most applications, as they are already used in super-thin energy-efficient displays for television sets and smartphones, and in future will be used for lighting applications exploiting a world market worth tens of billions of euros. OLEDs should be further developed to increase their performance and durability, and reduce the currently high production costs. The conventional development process is very costly and time-demanding due to the large number of possible materials which have to be synthesized for the production and characterization of prototypes. Deeper understanding of the relationship between OLED device properties and materials structure allows for high-throughput materials screening and thus a tremendous reduction of development costs. In simulations, the properties of various materials can be virtually and cost-effectively explored and compared to measurements. Based on these results, material composition, morphology and manufacturing processes can be systematically optimized. A typical OLED consists of a stack of multiple crystalline or amorphous organic layers. To compute electronic transport properties, e.g. charge mobilities, a quantum mechanical model, in particular density functional theory (DFT), is commonly employed. Recently, we performed simulations of electronic processes in OLED materials achieved by multiscale modelling, i.e. by integrating sub-models on different length scales to investigate charge transport in thin films based on the experimentally characterized semi-conducting small molecules [1]. 
Here, we present a novel scale-out computational strategy for a tightly coupled multiscale model consisting of a core region with 500 molecules (5000 pairs) of charge hopping sites and an embedding region containing about 10000 electrostatically interacting molecules. The energy levels of each site depend on the local electrostatic environment, yielding a significant contribution to the energy disorder. This effect is explicitly taken into account in the quantum mechanical sub-model in a self-consistent manner, which, however, represents a considerable computational challenge. Thus the total number of DFT calculations needed is of the order of 10^5-10^6. DFT models scale mostly as N^3, where N is the number of basis functions, which is strongly related to the number of electrons. While DFT is implemented in a number of efficiently parallelized electronic structure codes, the computational scaling of a single DFT calculation applied for amorphous organic materials is naturally limited by the molecule size. After every iteration cycle, data are exchanged between all contained molecules of the self-consistency loop to update the electrostatic environment of each site. This requires that the DFT sub-model is executed employing a second-level parallelisation with a special scheduling strategy. The realisation of this model on high performance computer (HPC) systems has several issues: i) The DFT sub-models, which are stand-alone applications (such as NWChem or TURBOMOLE), have to be spawned at run time via process forking; ii) Large amounts of input and output data have to be transferred to and from the DFT sub-models through the cluster file system. These two requirements limit the computational performance and often conflict with the usage policies of common HPC environments. In addition, sub-model scheduling and DFT data pre-/post-processing have severe impact on the overall performance. 
To this end, we designed a DFT application programming interface (API) with different language bindings, such as Python and C++, allowing linking of DFT sub-models, independent of the concrete DFT implementation, to multiscale models. In addition, we propose solutions for in-core handling of large input and output data as well as efficient scheduling algorithms. In this contribution, we will describe the architecture and outline the technical implementation of a framework for nesting DFT sub-models. We will demonstrate the use and analyse the performance of the framework for multiscale modelling of OLED materials. The framework provides an API which can be used to integrate DFT sub-models in other applications. [1] P. Friederich, F. Symalla, V. Meded, T. Neumann and W. Wenzel, “Ab Initio Treatment of Disorder Effects in Amorphous Organic Materials: Toward Parameter Free Materials Simulation”, Journal of Chemical Theory and Computation 10, 3720–3725 (2014). Angela Poschlad, Pascal Friederich, Timo Strunk, Wolfgang Wenzel and Ivan Kondov 189 Optimization and Practical Use of Composition Based Approaches Towards Identification and Collection of Genomic Islands and Their Ontology in Prokaryotes [abstract]Abstract: Motivation: Horizontally transferred genomic islands (islands, GIs) have been referred to as important factors which contribute towards the emergence of pathogens and outbreak instances. The development of tools towards the identification of such elements and retracing their distribution patterns will help to understand how such cases arise. Sequence composition has been used to identify islands, infer their phylogeny, and determine their relative times of insertion. The collection and curation of known islands will enhance insight into island ontology and flow. 
Results: This paper introduces the merger of SeqWord Genomic Islands Sniffer (SWGIS), which utilizes composition based approaches for identification of islands in bacterial genomic sequences, and the Predicted Genomic Islands (Pre_GI) database, which houses 26,744 islands found in 2,407 bacterial plasmids and chromosomes. SWGIS is a standalone program that detects genomic islands using a set of optimized parametric measures with estimates of acceptable false positive and false negative rates. Pre_GI is a novel repository that includes island ontology and flux. This study furthermore illustrates the need for parametric optimization towards the prediction of islands to minimize false negative and false positive predictions. In addition, Pre_GI emphasizes the practicality of the compounded knowledge a database affords in the detection and visualization of ontological links between islands. Availability: SWGIS is freely available on the web at http://www.bi.up.ac.za/SeqWord/sniffer/index.html. Pre_GI is freely accessible at http://pregi.bi.up.ac.za/index.php. Rian Pierneef, Oliver Bezuidt, Oleg Reva
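
As a rough illustration of what a composition based island search (abstract 189) can look like, the following Python sketch compares the k-mer usage of sliding windows against the genome-wide profile and flags strongly deviating windows. It is not the SWGIS algorithm: the function names, the Manhattan distance, and the default window, step, k and threshold values are all assumptions made for this example.

```python
from collections import Counter
from itertools import product


def kmer_freqs(seq, k=4):
    """Return normalized frequencies of all 4**k DNA words found in seq."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    words = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[w] for w in words)  # ignores words with N or other symbols
    if total == 0:
        return {w: 0.0 for w in words}
    return {w: counts[w] / total for w in words}


def composition_distance(f, g):
    """Manhattan distance between two frequency profiles (between 0 and 2)."""
    return sum(abs(f[w] - g[w]) for w in f)


def candidate_islands(genome, window=5000, step=1000, k=4, threshold=0.5):
    """Return (start, end, distance) for windows whose k-mer profile deviates
    from the genome-wide profile by more than `threshold`.

    All parameter defaults are hypothetical; a real tool would tune them
    against known false positive and false negative rates.
    """
    genome = genome.upper()
    background = kmer_freqs(genome, k)
    hits = []
    for start in range(0, max(1, len(genome) - window + 1), step):
        chunk = genome[start:start + window]
        d = composition_distance(kmer_freqs(chunk, k), background)
        if d > threshold:
            hits.append((start, start + len(chunk), d))
    return hits


if __name__ == "__main__":
    # Toy genome: a repetitive "host" with a compositionally distinct insert.
    toy = "ACGT" * 1000 + "GGCC" * 500 + "ACGT" * 1000
    for start, end, d in candidate_islands(toy, window=2000, step=2000, k=2):
        print(start, end, round(d, 2))
```

The threshold plays the role of the parametric measure that the abstract says needs optimization: lowering it flags more windows (more false positives), raising it misses weakly atypical islands (more false negatives).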

 235 Public service system design by radial formulation with dividing points [abstract]Abstract: In this paper, we introduce an approximate approach to public service system design making use of a universal IP-solver. The solved problem consists in minimization of the total discomfort of system users, which is usually proportional to the sum of demand-weighted distances between users and the nearest source of provided service. The presented approach is based on radial formulation. The disutility values are estimated by some upper and lower bounds given by so-called dividing points. Deployment of dividing points influences the solution accuracy. The process of the dividing point deployment is based on the idea that some disutility values can be considered relevant and are expected to be obtained in the optimal solution. Hereby, we study various approaches to the relevance with their impact on the accuracy and computational time. Jaroslav Janacek, Marek Kvet 439 An Improved Cellular Automata Algorithm for Wildfire Spread [abstract]Abstract: Despite being computationally more efficient than vector based approaches, the use of raster-based techniques for simulating wildfire spread has been limited by the distortions that affect the fire shapes. This work presents a Cellular Automata (CA) approach that is able to mitigate this problem with a redefinition of the spread velocity, where the equations generally used in vector-based approaches are modified by means of a number of correction factors. A numerical optimization approach is used to find the optimal values for the correction factors. The results are compared to the ones given by two well-known Cellular Automata simulators. According to this work, the proposed approach provides better results, in terms of accuracy, at a comparable computational cost. 
Tiziano Ghisu, Bachisio Arca, Grazia Pellizzaro, Pierpaolo Duce 537 I-DCOP: Train Classification Based on an Iterative Process Using Distributed Constraint Optimization [abstract]Abstract: This paper presents an Iterative process based on Distributed Constraint Optimization (I-DCOP), to solve train classification problems. The input of the I-DCOP is the train classification problem modelled as a DCOP, named Optimization Model for Train Classification (OMTC). The OMTC generates a feasible schedule for a train classification problem defined by the inbound trains, the total of outbound trains and the cars assigned to them. The expected result, named feasible schedule, leads to the correct formation of the outbound trains, based on the order criteria defined. The OMTC minimizes the schedule execution time and the total number of roll-ins (operation executed on cars, sometimes charged by the yards). I-DCOP extends the OMTC by including the constraints of a limited number of classification tracks and their capacity. However, these constraints are included iteratively by adding domain restrictions on the OMTC. Both OMTC and I-DCOP have been measured using scenarios based on real yard data. OMTC has generated optimal and feasible schedules for the scenarios, optimizing the total number of roll-ins. I-DCOP solved more complex scenarios, providing sub-optimal solutions. The experiments have shown that distributed constraint optimization problems can include additional constraints based on interactively defined domain. Denise Maria Vecino Sato, André Pinz Borges, Peter Márton, Edson E. Scalabrin 622 An Investigation of the Performance Limits of Small, Planar Antennas Using Optimisation [abstract]Abstract: This paper presents a generalised parametrisation as well as an approach to computational optimisation for small, planar antennas. A history of previous, more limited antenna optimisation techniques is discussed and a new parametrisation introduced in this context. 
Validation of this new approach against previously developed structures is provided and preliminary results of the optimisation are demonstrated and discussed. For the optimisation, a binary Multi-Objective Particle Swarm Optimisation (MOPSO) is used and several methods for generating a viable initial population are introduced and discussed in the context of practical limitations of computational simulations. Jan Hettenhausen, Andrew Lewis, David Thiel, Morteza Shahpari

 244 Big Data on Ice: The Forward Observer System for In-Flight Synthetic Aperture Radar Processing [abstract]Abstract: We introduce the Forward Observer system, which is designed to provide data assurance in field data acquisition while receiving significant amounts (several terabytes per flight) of Synthetic Aperture Radar data during flights over the polar regions, which provide unique requirements for developing data collection and processing systems. Under polar conditions in the field and given the difficulty and expense of collecting data, data retention is absolutely critical. Our system provides a storage and analysis cluster with software that connects to field instruments via standard protocols, replicates data to multiple stores automatically as soon as it is written, and provides pre-processing of data so that initial visualizations are available immediately after collection, where they can provide feedback to researchers in the aircraft during the flight. Richard Knepper, Matthew Standish, Matthew Link 690 Multi-Scale Coupling Simulation of Seismic Waves and Building Vibrations using ppOpen-HPC [abstract]Abstract: In order to simulate an earthquake shock originating from the earthquake source and the damage it causes to buildings, not only the seismic wave that propagates over a wide region of several 100 km2, but also the building vibrations that occur over a small region of several 10 m2 must be resolved concurrently. Such a multi-scale simulation is difficult because such kind of modeling and implementation by only a specific application are limited. To overcome these problems, a multi-scale weak-coupling simulation of seismic wave and building vibrations using "ppOpen-HPC" libraries is conducted. The ppOpen-HPC, wherein "pp" stands for "post-peta scale", is an open source infrastructure for development and execution of optimized and reliable simulation codes on large-scale parallel computers. 
On the basis of our evaluation, we confirm that an acceptable result can be achieved, that the overhead cost of the coupler is negligible, and that it can work on large-scale computational resources. Masaharu Matsumoto, Takashi Arakawa, Takeshi Kitayama, Futoshi Mori, Hiroshi Okuda, Takashi Furumura, Kengo Nakajima 621 A hybrid SWAN version for fast and efficient practical wave modelling [abstract]Abstract: In the Netherlands, for coastal and inland water applications, wave modelling with SWAN has become a main ingredient. However, computational times are relatively high. Therefore we investigated the parallel efficiency of the current MPI and OpenMP versions of SWAN. The MPI version is not as efficient as the OpenMP version within a single node. Therefore, in this paper we propose a hybrid version of SWAN. It combines the efficiency of the current OpenMP version on shared memory with the capability of the current MPI version to distribute memory over nodes. We describe the numerical algorithm. With initial numerical experiments we show the potential of this hybrid version. Parallel I/O, further optimization, and behavior for larger numbers of nodes will be the subject of future research. Menno Genseberger, John Donners 573 Numerical verification criteria for coseismic and postseismic crustal deformation analysis with large-scale high-fidelity model [abstract]Abstract: Numerical verification of postseismic crustal deformation analysis, computed using a large-scale finite element simulation, was carried out, by proposing new criteria that consider the characteristics of the target phenomenon. Specifically, pointwise displacement was used in the verification. In addition, the accuracy of the numerical solution was explicitly shown by considering the observation error of the data used for validation. 
The computational resource required for each analysis implies that high-performance computing techniques are necessary to obtain a verified numerical solution of crustal deformation analysis for the Japanese Islands. Such verification in crustal deformation simulations should take on greater importance in the future, since continuous improvement in the quality and quantity of crustal deformation data is expected. Ryoichiro Agata, Tsuyoshi Ichimura, Kazuro Hirahara, Mamoru Hyodo, Takane Hori, Chihiro Hashimoto, Muneo Hori

 404 Modelling Molecular Crystals by QM/MM [abstract]Abstract: Computational modelling of chemical systems is most easily carried out in the vacuum for single molecules. Accounting for environmental effects accurately in quantum chemical calculations, however, is often necessary for computational predictions of chemical systems to have any relevance to experiments carried out in the condensed phases. I will discuss a quantum mechanics/molecular mechanics (QM/MM) based method to account for solid-state effects on geometries and molecular properties in molecular crystals. The method in its recent black-box implementation in Chemshell can satisfactorily describe the crystal packing effects on local geometries in molecular crystals and account for the electrostatic effects that affect certain molecular properties such as transition metal NMR chemical shifts, electric field gradients, Mössbauer and other spectroscopic properties. Ragnar Bjornsson 437 A Quaternion Method for Removing Translational and Rotational Degrees of Freedom from Transition State Search Methods [abstract]Abstract: In finite systems, such as nanoparticles and gas-phase molecules, calculations of minimum energy paths connecting initial and final states of transitions as well as searches for saddle points are complicated by the presence of external degrees of freedom, such as overall translation and rotation. A method based on quaternion algebra for removing the external degrees of freedom is presented and applied in calculations using two commonly used methods: the nudged elastic band (NEB) method for finding minimum energy paths (MEPs) and DIMER for minimum-mode following to find transition states. With the quaternion approach, fewer images in the NEB are needed to represent MEPs accurately. In both the NEB and DIMER calculations, the number of iterations required to reach convergence is significantly reduced. 
Marko Melander 438 Drag Assisted Simulated Annealing Method for Geometry Optimization of Molecules [abstract]Abstract: One of the methods to find the global minimum of a potential energy surface of a molecular system is simulated annealing. The main idea of simulated annealing is to start your system at a high temperature and then slowly cool it down so that there is a chance for the atoms in the system to explore the different degrees of freedom and ultimately find the global minimum. Simulated annealing is traditionally used in classical Monte Carlo or in classical molecular dynamics. In molecular dynamics, one of the traditional methods was first implemented by Woodcock in 1971. In this method the velocities are scaled down after a given number of molecular dynamics steps, the system is allowed to explore the potential energy surface, and the velocities are scaled down again until a minimum is found. In this work we propose to use a viscous friction term, similar to the one used in Langevin dynamics, to slowly bring down the temperature of the system in a natural way. We use drag terms that depend linearly or quadratically on the velocity of the particles. These drag terms will naturally bring the temperature of the system down and, when the system reaches equilibrium, they will vanish, thus imposing a natural criterion to stop the simulation. We tested the method in Lennard-Jones clusters of up to 20 atoms. 
We started the system in different initial conditions and used different values for the temperature and the drag coefficients and found the global minima of every one of the clusters. This method proved to be conceptually very simple, but very robust, in finding the global minima. Bilguun Woods, Paulo Acioli 597 Modeling electrochemical reactions at the solid-liquid interface using density functional calculations [abstract]Abstract: Charged interfaces are physical phenomena found in various natural systems and artificial devices within the fields of biology, chemistry and physics. In electrochemistry, this is known as the electrochemical double layer, introduced by Helmholtz over 150 years ago. At this interface, between a solid surface and the electrolyte, chemical reactions can take place in a strong electric field. In this presentation, a new computational method is introduced for creating charged interfaces and for studying charge transfer reactions on the basis of periodic DFT calculations. The electrochemical double layer is taken as an example, in particular the hydrogen electrode as well as the O2, N2 and CO2 reductions. With this method the mechanism of forming hydrogen gas, water, ammonia and methane/methanol is studied. The method is quite general and could be applied to a wide variety of atomic scale transitions at charged interfaces. Egill Skúlason 601 Transition Metal Nitride Catalysts for Electrochemical Reduction of Nitrogen to Ammonia at Ambient Conditions [abstract]Abstract: Computational screening for catalysts that are stable, active and selective towards electrochemical reduction of nitrogen to ammonia at room temperature and ambient pressure is presented from a range of transition metal nitride surfaces. Density functional theory (DFT) calculations are used to study the thermochemistry of the cathode reaction so as to construct the free energy profile and to predict the required onset potential via the Mars-van Krevelen mechanism. 
Stability of the surface vacancy as well as the poisoning possibility of these catalysts under operating conditions are also investigated towards catalyst engineering for sustainable ammonia formation. The most promising candidates turned out to be the (100) facets of rocksalt structure of VN, CrN, NbN and ZrN that should be able to form ammonia at -0.51 V, -0.76 V, -0.65 V and -0.76 V vs. SHE, respectively. Another interesting result of the current work is that for the introduced nitride candidates hydrogen evolution is no longer the competing reaction; thus, high formation yield of ammonia is expected at low onset potentials. Younes Abghoui, Egill Skúlason
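
The drag idea from abstract 438 earlier in this session can be illustrated with a minimal Python sketch, assuming a linear drag term only, reduced Lennard-Jones units, unit masses, and a simple semi-implicit Euler integrator. The function names, the parameter values and the added force-based stopping check are choices made for this example, not the authors' implementation.

```python
def lj_energy_forces(pos):
    """Lennard-Jones energy and forces in reduced units (epsilon = sigma = 1)."""
    n = len(pos)
    energy = 0.0
    forces = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = [pos[i][a] - pos[j][a] for a in range(3)]
            r2 = sum(c * c for c in d)
            inv6 = 1.0 / r2 ** 3
            energy += 4.0 * (inv6 * inv6 - inv6)
            # (-dU/dr) / r, so the force on atom i is f_over_r * d
            f_over_r = 24.0 * (2.0 * inv6 * inv6 - inv6) / r2
            for a in range(3):
                forces[i][a] += f_over_r * d[a]
                forces[j][a] -= f_over_r * d[a]
    return energy, forces


def drag_anneal(pos, gamma=0.5, dt=0.005, ke_tol=1e-12, f_tol=1e-6,
                max_steps=200000):
    """Damped molecular dynamics with a linear drag force -gamma * v.

    The run stops once the kinetic energy (and hence the drag) has
    effectively vanished and the forces are negligible. gamma, dt and
    the tolerances are hypothetical values chosen for illustration.
    """
    pos = [list(p) for p in pos]
    vel = [[0.0, 0.0, 0.0] for _ in pos]
    steps_done = 0
    for step in range(max_steps):
        _, forces = lj_energy_forces(pos)
        ke = 0.0
        for i in range(len(pos)):
            for a in range(3):
                # velocity update with conservative force plus drag (mass = 1)
                vel[i][a] += dt * (forces[i][a] - gamma * vel[i][a])
                pos[i][a] += dt * vel[i][a]
                ke += 0.5 * vel[i][a] ** 2
        steps_done = step + 1
        fmax = max(abs(f) for row in forces for f in row)
        if ke < ke_tol and fmax < f_tol:
            break
    energy, _ = lj_energy_forces(pos)
    return pos, energy, steps_done


if __name__ == "__main__":
    # Distorted three-atom cluster, initially at rest.
    start = [(0.0, 0.0, 0.0), (1.3, 0.0, 0.0), (0.6, 1.1, 0.2)]
    final_pos, final_energy, n_steps = drag_anneal(start)
    print(n_steps, final_energy)
```

For three atoms the only bound minimum is the equilateral triangle with energy -3 in reduced units, so this toy run is a local relaxation. Finding global minima of larger clusters, as reported in the abstract, would additionally require starting at a high temperature and sampling several initial conditions.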

 529 An Empirical Evaluation of a Programming Model for Context-Dependent Real-time Streaming Applications [abstract]Abstract: We present a Programming Model for real-time streaming applications on high performance embedded multi- and many-core systems. Realistic streaming applications are highly dependent on the execution context (usually of physical world), past learned strategies, and often real-time constraints. The proposed Programming Model encompasses real-time requirements, determinism of execution and context dependency. It is an extension of the well-known Cyclo-Static Dataflow (CSDF), for its desirable properties (determinism and composability), with two new important data-flow filters: Select-duplicate, and Transaction which retain the main properties of CSDF graphs and also provide useful features to implement real-time computational embedded applications. We evaluate the performance of our programming model using several real-life case-studies and demonstrate that our approach overcomes a range of limitations that used to be associated with CSDF models. Xuan Khanh Do, Stephane Louise, Albert Cohen 617 A Case Study on Using a Proto-Application as a Proxy for Code Modernization [abstract]Abstract: The current HPC system architecture trend consists in the use of many-core and heterogeneous architectures. Programming and runtime approaches struggle to scale with the growing number of nodes and cores. In order to take advantage of both distributed and shared memory levels, flat MPI seems unsustainable. Hybrid parallelization strategies are required. In a previous work we have demonstrated the efficiency of the D&C approach for the hybrid parallelization of finite element method assembly on unstructured meshes. 
In this paper we introduce the concept of proto-application as a proxy between computer scientists and application developers. The D&C library has been entirely developed on a proto-application, extracted from an industrial application called DEFMESH, and then ported back and validated on the original application. In the meantime, we have ported the D&C library to AETHER, an industrial fluid dynamics code developed by Dassault Aviation. The results show that the speed-up validated on the proto-application can be reproduced on other full scale applications using similar computational patterns. Nevertheless, this experience draws attention to code modernization issues, such as data layout adaptation and memory management. As the D&C library uses a task based runtime, we also make a comparison between Intel® Cilk™ Plus and OpenMP. Nathalie Möller, Eric Petit, Loïc Thébault, Quang Dinh 422 A Methodology for Profiling and Partitioning Stream Programs on Many-core Architectures [abstract]Abstract: Maximizing the data throughput is a very common implementation objective for several streaming applications. Such a task is particularly challenging for implementations based on many-core and multi-core target platforms because, in general, it implies tackling several NP-complete combinatorial problems. Moreover, an efficient design space exploration requires an accurate evaluation on the basis of dataflow program execution profiling. The focus of the paper is on the methodology challenges for obtaining accurate profiling measures. Experimental results validate a many-core platform built by an array of Transport Triggered Architecture processors for exploring the partitioning search space based on the execution trace analysis. 
Malgorzata Michalska, Jani Boutellier, Marco Mattavelli 424 Execution Trace Graph Based Multi-Criteria Partitioning of Stream Programs [abstract]Abstract: One of the problems proven to be NP-hard in the field of many-core architectures is the partitioning of stream programs. In order to maximize the execution parallelism and obtain the maximal data throughput for a streaming application it is essential to find an appropriate actors assignment. The paper proposes a novel approach for finding a close-to-optimal partitioning configuration which is based on the execution trace graph of a dataflow network and its analysis. We present some aspects of dataflow programming that make the partitioning problem different in this paradigm and build the heuristic methodology on them. Our optimization criteria include: balancing the total processing workload with regards to data dependencies, actors idle time minimization and reduction of data exchanges between processing units. Finally, we validate our approach with experimental results for a video decoder design case and compare them with some state-of-the-art solutions. Malgorzata Michalska, Simone Casale-Brunet, Endri Bezati, Marco Mattavelli 365 A First Step to Performance Prediction for Heterogeneous Processing on Manycores [abstract]Abstract: In order to maintain the continuous growth of the performance of computers while keeping their energy consumption under control, the microelectronic industry develops architectures capable of processing more and more tasks concurrently. Thus, the next generations of microprocessors may count hundreds of independent cores that may differ in their functions and features. As an extensive knowledge of their internals cannot be a prerequisite to their programming and for the sake of portability, these forthcoming computers necessitate the compilation flow to evolve and cope with heterogeneity issues. 
In this paper, we lay a first step toward a possible solution to this challenge by exploring the results of SPMD type of parallelism and predicting performance of the compilation results so that our tools can guide a compiler to build an optimal partition of tasks automatically, even on heterogeneous targets. Experimental results show that our tools predict real-world performance with very good accuracy. Nicolas Benoit, Stephane Louise

 528 Towards an automatic co-generator for manycores’ architecture and runtime: STHORM case-study [abstract]Abstract: The increasing design complexity of manycore architectures at the hardware and software levels calls for powerful tools capable of validating every functional and non-functional property of the architecture. At the design phase, the chip architect needs to explore several parameters from the design space, and iterate on different instances of the architecture, in order to meet the defined requirements. Each new architectural instance requires the configuration and the generation of a new hardware model/simulator, its runtime, and the applications that will run on the platform, which is a very long and error-prone task. In this context, the IP-XACT standard has become widely used in the semiconductor industry to package IPs and provide a low-level SW stack to ease their integration. In this work, we present preliminary work on a methodology for automatically configuring and assembling an IP-XACT golden model and generating the corresponding manycore architecture HW model, low-level software runtime and applications. We use the STHORM manycore architecture and the HBDC application as a case study. Charly Bechara, Karim Ben Chehida, Farhat Thabet 249 Retargeting of the Open Community Runtime to Intel Xeon Phi [abstract]Abstract: The Open Community Runtime (OCR) is a recent effort in the search for a runtime for extreme scale parallel systems. OCR relies on the concept of a dynamically generated task graph to express the parallelism of a program. Rather than being directly used for application development, the main purpose of OCR is to become a low-level runtime for higher-level programming models and tools. Since manycore architectures like the Intel Xeon Phi are likely to play a major role in future high performance systems, we have implemented the OCR API for shared-memory machines, including the Xeon Phi. 
We have also implemented two benchmark applications and performed experiments to investigate the viability of the OCR as a runtime for manycores. Our experiments and a comparison with OpenMP indicate that OCR can be an efficient runtime system for current and emerging manycore systems. Jiri Dokulil, Siegfried Benkner 14 Prefetching Challenges in Distributed Memories for CMPs [abstract]Abstract: Prefetch engines working on distributed memory systems behave independently by analyzing the memory accesses that are addressed to the attached piece of cache. They potentially generate prefetching requests targeted at any other tile on the system that depends on the computed address. This distributed behavior involves several challenges that are not present when the cache is unified. In this paper, we identify, analyze, and quantify the effects of these challenges, thus paving the way to future research on how to implement prefetching mechanisms at all levels of this kind of system with shared distributed caches. Marti Torrents, Raul Martínez, Carlos Molina