Session 3: 16:40 - 18:20 on 6th June 2016

ICCS 2016 Main Track (MT) Session 3

Time and Date: 16:40 - 18:20 on 6th June 2016

Room: KonTiki Ballroom

Chair: Andrea Zonca

369 A Performance Characterization of Streaming Computing on Supercomputers [abstract]
Abstract: Streaming computing models allow for on-the-fly processing of large data sets. With the increased demand for processing large amounts of data in a reasonable period of time, streaming models are increasingly used on supercomputers to solve data-intensive problems. Because supercomputers have mainly been used for compute-intensive workloads, supercomputer performance metrics focus on the number of floating point operations per unit time and cannot fully characterize the performance of a streaming application on a supercomputer. We introduce the injection and processing rates as the main metrics to characterize the performance of streaming computing on supercomputers. We analyze the dynamics of these quantities in a modified STREAM benchmark developed atop an MPI streaming library in a series of different configurations. We show that after a brief transient the injection and processing rates converge to sustained rates. We also demonstrate that streaming computing performance strongly depends on the number of connections between data producers and consumers and on the processing task granularity.
Stefano Markidis, Ivy Bo Peng, Roman Iakymchuk, Erwin Laure, Gokcen Kestor, Roberto Gioiosa
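The injection and processing rates discussed in the preceding abstract can be illustrated with a minimal sketch. The windowing choice, function names, and producer/consumer structure below are illustrative assumptions, not the paper's benchmark code.

# Hypothetical sketch: estimating sustained injection and processing rates
# for a producer/consumer streaming run. All names and the trailing-window
# choice are illustrative assumptions, not the paper's STREAM-based benchmark.
import time
from collections import deque

def sustained_rate(event_times, window=1.0):
    """Return events per second over the trailing `window` seconds."""
    now = time.monotonic()
    recent = [t for t in event_times if now - t <= window]
    return len(recent) / window

injected = deque()   # timestamps of items handed to the stream by producers
processed = deque()  # timestamps of items consumers finished processing

# Inside the producer loop:  injected.append(time.monotonic())
# Inside the consumer loop:  processed.append(time.monotonic())
# After a brief transient, the two rates converge to sustained values:
# print(sustained_rate(injected), sustained_rate(processed))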
35 High-Performance Tensor Contractions for GPUs [abstract]
Abstract: We present a computational framework for high-performance tensor contractions on GPUs. High performance is difficult to obtain using existing libraries, especially for many independent contractions where each contraction is very small, e.g., sub-vector/warp in size. However, using our framework to batch contractions, together with application-specific knowledge, we demonstrate close-to-peak performance results. In particular, to accelerate large-scale tensor-formulated high-order finite element method (FEM) simulations, which are the main focus and motivation for this work, we represent contractions as tensor index reordering plus matrix-matrix multiplications (GEMMs). This is a key factor in achieving algorithmically many-fold acceleration (versus not using it) due to the possible reuse of data loaded in fast memory. In addition to using this context knowledge, we design tensor data structures, tensor algebra interfaces, and new tensor contraction algorithms and implementations to achieve 90+% of a theoretically derived peak on GPUs. On a K40c GPU, for contractions resulting in GEMMs on square matrices of size 8 for example, we are 2.8× faster than CUBLAS, and 8.5× faster than MKL on 16 cores of Intel Xeon E5-2670 (Sandy Bridge) 2.60GHz CPUs. Finally, we apply autotuning and code generation techniques to simplify tuning and provide an architecture-aware, user-friendly interface.
Ahmad Abdelfattah, Marc Baboulin, Veselin Dobrev, Jack Dongarra, Christopher Earl, Joel Falcou, Azzam Haidar, Ian Karlin, Tzanio Kolev, Ian Masliah, Stanimire Tomov
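The idea of casting a contraction as index reordering plus a GEMM, mentioned in the abstract above, can be sketched in a few lines. This NumPy example is only an illustration of the algebraic identity; it is not the paper's GPU implementation, and the contraction chosen here is an assumed example.

# Illustrative sketch (not the paper's GPU code): the contraction
# C[i,j,k] = sum_l A[i,l] * B[l,j,k] expressed as an index reordering
# (reshape) followed by a single matrix-matrix multiplication (GEMM).
import numpy as np

n = 8  # the abstract's example quotes GEMMs on square matrices of size 8
A = np.random.rand(n, n)
B = np.random.rand(n, n, n)

B_mat = B.reshape(n, n * n)   # flatten (l, j, k) -> (l, j*k); l leads the GEMM
C_mat = A @ B_mat             # one GEMM: (i, l) x (l, j*k)
C = C_mat.reshape(n, n, n)    # reorder back to (i, j, k)

# Check against a direct einsum contraction.
assert np.allclose(C, np.einsum('il,ljk->ijk', A, B))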
52 Performance Tuning and Optimization Techniques of Fixed and Variable Size Batched Cholesky Factorization on GPUs [abstract]
Abstract: Solving a large number of relatively small linear systems has recently drawn more attention in the HPC community, due to the importance of such computational workloads in many scientific applications, including sparse multifrontal solvers. Modern hardware accelerators and their architecture require a set of optimization techniques that are very different from the ones used in solving one relatively large matrix. In order to impose concurrency on such throughput-oriented architectures, a common practice is to batch the solution of these matrices as one task offloaded to the underlying hardware, rather than solving them individually. This paper presents a high performance batched Cholesky factorization on large sets of relatively small matrices using Graphics Processing Units (GPUs), and addresses both fixed and variable size batched problems. We investigate various algorithm designs and optimization techniques, and show that it is essential to combine kernel design with performance tuning in order to achieve the best possible performance. We compare our approaches against state-of-the-art CPU solutions as well as GPU-based solutions using existing libraries, and show that, on a K40c GPU for example, our kernels are more than 2x faster.
Ahmad Abdelfattah, Azzam Haidar, Stanimire Tomov, Jack Dongarra
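For reference, the workload described in the preceding abstract (factoring many small independent matrices, possibly of different sizes) can be sketched as below. This NumPy loop only illustrates the batched problem; the paper's contribution lies in fused GPU kernels and tuning, which are not reproduced here.

# Minimal reference sketch of a batched Cholesky factorization over a
# variable-size batch of small SPD matrices. Illustration of the workload
# only, not the paper's GPU kernels.
import numpy as np

def batched_cholesky(batch):
    """batch: iterable of small symmetric positive definite matrices."""
    return [np.linalg.cholesky(a) for a in batch]

rng = np.random.default_rng(0)
sizes = [4, 8, 16, 8]                      # variable-size batch
batch = []
for n in sizes:
    m = rng.standard_normal((n, n))
    batch.append(m @ m.T + n * np.eye(n))  # shift to make the matrix SPD
factors = batched_cholesky(batch)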
143 Performing Unstructured Grid Flux Calculations on GPUs [abstract]
Abstract: The Finite Volume Method (FVM) is a numerical approach for the approximate solution of Partial Differential Equations (PDE) on discretized volumetric fields. Accurate solutions of PDEs derived from continuum mechanics, especially of complex fields, require structured or unstructured meshes with an ever increasing number of computational volumes. Computing solutions with the Finite Volume Method, particularly solutions to time-dependent equations, can take thousands of supercomputer cores months to complete. With increased computational and memory throughput, Graphics Processing Units (GPU) have the potential to improve on current implementations, decreasing the time to solution of FVMs. Through the use of a model equation, we show that GPUs can improve the performance of an open source computational continuum mechanics toolbox, OpenFOAM. It is shown herein that an NVIDIA Tesla K20 achieves 3-10 times greater performance than all 10 cores of an Intel Xeon E5-2670 v2.
Matthew Conley, Christian Sarofeen, Hua Shan and Justin Williams
255 Adaptive Multi-level Blocking Optimization for Sparse Matrix Vector Multiplication on GPU [abstract]
Abstract: Sparse matrix vector multiplication (SpMV) is the dominant kernel in scientific simulations. Many-core processors such as GPUs accelerate SpMV computations with high parallelism and memory bandwidth compared to CPUs; however, even for many-core processors the performance of SpMV is still strongly limited by memory bandwidth, and the low locality of memory accesses to the input vector causes further performance degradation. We propose a new sparse matrix format called the Adaptive Multi-level Blocking (AMB) format, which aggressively reduces the memory traffic in SpMV computation to improve performance. Through several optimization techniques, such as division and blocking of the given matrix, the column indices are compressed and the reusability of input vector elements in the cache is greatly improved. An auto-tuning mechanism determines the best set of parameters for each matrix by estimating the memory traffic and predicting the performance of a given SpMV computation. For 32 matrix datasets taken from the University of Florida Sparse Matrix Collection, the AMB format achieves speedups of up to 2.92x compared to NVIDIA's cuSparse library and up to 1.40x compared to yaSpMV, which was recently proposed and has been the best known library to date for fast SpMV computation.
Yusuke Nagasaka, Akira Nukada, Satoshi Matsuoka
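For readers unfamiliar with the kernel named in the preceding abstract, a plain CSR-format SpMV is sketched below purely to fix notation. The AMB format proposed in the paper further blocks the matrix and compresses column indices to cut memory traffic, which this baseline sketch does not attempt.

# Baseline CSR sparse matrix-vector multiply (SpMV), for notation only;
# it does not implement the paper's AMB blocking or index compression.
import numpy as np

def spmv_csr(values, col_idx, row_ptr, x):
    y = np.zeros(len(row_ptr) - 1)
    for row in range(len(y)):
        for k in range(row_ptr[row], row_ptr[row + 1]):
            y[row] += values[k] * x[col_idx[k]]
    return y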

ICCS 2016 Main Track (MT) Session 10

Time and Date: 16:40 - 18:20 on 6th June 2016

Room: Toucan

Chair: Ian Foster

453 An Exploratory Sentiment and Facial Expressions Analysis of Data from Photo-sharing Social Media: The Case of Football Violence [abstract]
Abstract: In this article we explore the possibility of increasing the level of security during football matches through the analysis of data that visitors to these events post on social networks. We considered different ways of recognizing emotions from photographs and evaluating the sentiment of texts in order to trace how the level of emotions in the photos changes depending on whether the pictures were taken during a game with fights in the stands or during a normal game. We tested this assumption, and our hypothesis was partially confirmed. The emotion recognition software from Microsoft Oxford showed that the level of the emotion anger is almost 5 times higher in photographs taken during matches with fights. In addition, other curious results were obtained, including from an analysis of the comments that event visitors left on the photos.
Vasiliy Boychuk, Kirill Sukharev, Daniil Voloshin, Vladislav Karbovskii
86 Hybrid Computational Steering for Dynamic Data-Driven Application Systems [abstract]
Abstract: We consider steering of Dynamic Data-Driven Application Systems from two sources: firstly from dynamic data, and secondly via human intervention to change parameters of the system. We propose an architecture for such hybrid steering and identify a Time Manager as an essential component. We perform experiments on an actual realisation of such a system, modelling a water distribution network, to show how the parameters of the Time Manager can be determined.
Junyi Han, John Brooke
298 Error Function Impact in Dynamic Data-Driven Framework Applied to Forest Fire Spread Prediction [abstract]
Abstract: In order to use environmental models effectively for management and decision-making, it is vital to establish an appropriate level of confidence in their performance. There are different ways and different methodologies to establish the confidence of a model. For this reason, an adequate error formula is very important, because the results of the model can vary substantially with it. In this paper, we focus on forest fire spread prediction. Several models have been developed to determine forest fire propagation. Simulators implementing such models require diverse input parameters to deliver predictions about fire propagation. However, the data describing the actual scenario where the fire is taking place are usually subject to high levels of uncertainty. In order to minimize the impact of the input-data uncertainty, a two-stage methodology was developed that calibrates the input parameters in an adjustment stage, so that the calibrated parameters are used in the prediction stage to improve the quality of the predictions. It is in the adjustment stage that the error formula plays a crucial role, because different formulas imply different adjustments and, in consequence, different spread predictions. In this paper, different formulas are compared to show their impact on prediction quality in a DDDAS for forest fire spread prediction. These formulas have been tested using a real forest fire that took place in Arkadia (Greece) in 2011.
Carlos Carrillo, Tomàs Artés, Ana Cortes, Tomàs Margalef
456 Data-driven Forecasting Paradigms for Wildland Fires using the CAWFE® modeling system and Fire Detection Data [abstract]
Abstract: Large wildfires can cover hundreds of thousands of acres and continue for months, varying in intensity as they encounter different environmental conditions, which may vary dramatically in time and space during a single fire. They can produce extreme behaviors such as fire whirls, blow-ups, bursts of flame along the surface, and winds ten times stronger than ambient conditions, all of which result from the interactions between a fire and its atmospheric environment and are beyond the capabilities of current operational tools. Coupled weather-wildland fire models tie numerical weather prediction models to wildland fire behavior modules to simulate the impact of a fire on the atmosphere and the subsequent feedback of these fire-induced winds on fire behavior, i.e. how a fire "creates its own weather". The methodology uses one such coupled model, the Coupled Atmosphere-Wildland Fire Environment (CAWFE) Model, which contains two-way coupling between two components: (1) a numerical weather prediction model formulated for, and with numerical methods optimized for, simulating airflow at scales of 100s of m in very complex terrain, and (2) a wildland fire component that is based upon semi-empirical relationships for surface fire rate of spread, post-frontal heat release, and a canopy fire model. The fire behavior is coupled to the atmospheric model such that low-level winds drive the spread of the surface fire, which in turn releases sensible heat, latent heat, and smoke fluxes into the lower atmosphere, in turn feeding back to affect the winds directing the fire. CAWFE has been used to explain basic examples of fire behavior and, in retrospective simulations, to reproduce large wildland fire events. Over a wide range of conditions, model results show rough agreement in area, shape, and direction of spread at periods for which fire location data are available; additional events unique to each fire such as locations of sudden acceleration, flank runs up canyons, and bifurcations of a fire into two heads; and locations favorable to the formation of phenomena such as fire whirls and horizontal roll vortices. The duration of such events poses a prediction challenge, as meteorological models lose skill over time after initialization, firefighting may impact the fire, and processes such as spotting, in which burning embers are lofted ahead of the fire, are not readily represented with deterministic models. Moreover, validation data for such models are limited, and fire mapping and monitoring have been done piecemeal with infrared imaging sensors producing 12-hourly maps of active fires with nominal 1 km pixels, complemented by sub-hourly observations from geostationary satellites at coarser resolution and other valuable but non-routine tools such as airborne infrared mapping. Thus, in recent work, CAWFE has been integrated with spatially refined (375 m) satellite active fire data derived from the Visible Infrared Imaging Radiometer Suite (VIIRS), which is used for initializing a wildfire already in progress in the model and for evaluating its simulated progression at the time of the next pass. This work develops and applies Dynamic Data System techniques to create innovative approaches to wildfire growth forecasting based on a more symbiotic data-model system.
Janice L. Coen and Wilfrid Schroeder
151 D-STHARk: Evaluating Dynamic Scheduling of Tasks in Hybrid Simulated Architectures [abstract]
Abstract: The emergence of applications that need to handle efficiently growing amounts of data has stimulated the development of new computing architectures with several Processing Units (PUs), such as CPU cores, graphics processing units (GPUs) and the Intel Xeon Phi (MIC). Aiming to better exploit these architectures, recent works focus on proposing novel runtime environments that offer a variety of methods for scheduling tasks dynamically on different PUs. A main limitation of such proposals is the constrained system configurations usually adopted to tune and test them, since setting up more complete and diversified evaluation environments is costly. In this context, we present D-STHARk, a GUI tool for evaluating Dynamic Scheduling of Tasks in Hybrid Simulated ARchitectures. D-STHARk provides a complete simulated execution environment that allows evaluating dynamic scheduling strategies on simulated applications and hybrid architectures. We evaluate our tool by simulating the dynamic scheduling strategies presented in prior work [sbac2014], using the same architecture and application. D-STHARk was able to reach the same conclusions originally reported by the authors. Moreover, we performed an experiment varying the number of coprocessors, which had not previously been verified due to the lack of real architectures, showing that we may reduce energy consumption while keeping the same performance.
Leonardo Rocha, Fernando Mourão, Guilherme Andrade, Renato Ferreira, Srinivasan Parthasarathy, Danilo Melo, Sávyo Toledo, Aniket Chakrabarti

Agent-based simulations, adaptive algorithms and solvers (ABS-AAS) Session 3

Time and Date: 16:40 - 18:20 on 6th June 2016

Room: Macaw

Chair: Maciej Paszynski

60 Efficient Memetic Continuous Optimization in Agent-based Computing [abstract]
Abstract: This paper deals with a concept of memetic search in agent-based evolutionary computation. In the presented approach, local search is applied during mutation of an agent. Using memetic algorithms increases the demand on computing power as the number of fitness function calls increases; therefore, careful planning of the fitness computation (through the proposed local search mechanism based on caching parts of the fitness function) leads to a significant lowering of this demand. Moreover, applying local search with care can lead to gradual improvement of the whole population. In the paper, the results obtained for selected high-dimensional (5000 dimensions) benchmark functions are presented. Results obtained by the evolutionary and memetic multi-agent systems are compared with a classic evolutionary algorithm.
Wojciech Korczynski, Aleksander Byrski, Marek Kisiel-Dorohinicki
107 Reinforcement Learning with Multiple Shared Rewards [abstract]
Abstract: A major concern in multi-agent coordination is how to select algorithms that can lead agents to learn together to achieve certain goals. Much of the research on multi-agent learning relates to reinforcement learning (RL) techniques. One element of RL is the interaction model, which describes how agents should interact with each other and with the environment. Discrete, continuous and objective-oriented interaction models can improve convergence among agents. This paper proposes an approach based on the integration of multi-agent coordination models designed for reward-sharing policies. By taking the best features from each model, better agent coordination is achieved. Our experimental results show that this approach improves convergence among agents even in large state-spaces and yields better results than classical RL approaches.
Douglas M. Guisi, Richardson Ribeiro, Marcelo Teixeira, André Pinz Borges, Fabrício Enembreck
353 Why invasive Argentine ant supercolonies are a limited social transition [abstract]
Abstract: Around the world, invasive Argentine ants have formed "supercolonies": societies of societies where separate colonies share one identity and behave as one colony. But can they be maintained long enough to transform into the next major evolutionary transition? Here we argue no, and use an agent-based, spatially explicit model of Argentine ant supercolonies to demonstrate how the extreme supercolony structure of invasive Argentine ants will collapse over time. Our model describes supercolonies as ethnocentric collections of ant colonies, making social decisions based on cuticular hydrocarbon (CHC) markers demarcating in-group from out-group. When simulated, we observe that supercolonies are fragile assemblages, where the shared identity that constitutes supercolonies depends on consistent warfare and fades over time in invasive habitats where supercolonies face no competitors. With unicoloniality in ants being a rare and phylogenetically scattered trait, supercolonies appear to be only a temporary evolutionary phenomenon. Our model explains why: the consistency of "self" cannot be maintained at the scale of supercolonies, and therefore the next major evolutionary transition must be found elsewhere.
Brian Whyte, John Marken, Matthew Zeffermen, Nicole Rooks and Keenan Mack
294 Hybrid direct and iterative solver with library of multi-criteria optimal orderings for h adaptive finite element method computations [abstract]
Abstract: In this paper we present a multi-criteria optimization of element partition trees and the resulting orderings for multi-frontal solver algorithms executed for the two-dimensional h-adaptive finite element method. In particular, the problem of the optimal ordering of the elimination of rows in the sparse matrices resulting from adaptive finite element method computations is reduced to the problem of finding optimal element partition trees. Given a two-dimensional h-refined mesh, we find all optimal element partition trees by using a dynamic programming approach. An element partition tree defines a prescribed order of elimination of degrees of freedom over the mesh. We utilize three different metrics to estimate the quality of an element partition tree. As the first criterion we consider the number of floating point operations (FLOPs) performed by the multi-frontal solver. As the second criterion we consider the number of memory transfers (MEMOPS) performed by the multi-frontal solver algorithm. As the third criterion we consider the memory usage (NONZEROS) of the multi-frontal direct solver. We show the optimization results for FLOPs vs MEMOPS as well as for the execution time, estimated as FLOPs+100*MEMOPS, vs NONZEROS. We obtain Pareto fronts with multiple optimal trees for each mesh and for each refinement level. We generate a library of optimal elimination trees for small grids with local singularities. We also propose an algorithm that, for a given large mesh with multiple local singularities, finds separators that partition the mesh into multiple sub-grids, each one with a local singularity. We compute Schur complements over the local singularities using the optimal trees from the library, and later we submit the sequence of Schur complements to the iterative solver ILUPCG.
Hassan Aboueisha, Konrad Jopek, Bartłomiej Medygrał, Szymon Nosek, Mikhail Moshkov, Anna Paszynska, Maciej Paszynski, Keshav Pingali
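The multi-criteria selection described in the abstract above boils down to keeping the non-dominated element partition trees under two cost metrics. The sketch below shows only that Pareto-filtering step under assumed placeholder costs; computing the actual FLOPs and MEMOPS of a tree requires the mesh and is not reproduced here.

# Hedged sketch: selecting Pareto-optimal element partition trees under two
# cost metrics (FLOPs, MEMOPS). Costs are placeholders; the paper also ranks
# trees by an estimated execution time of the form FLOPs + 100*MEMOPS,
# alongside the NONZEROS (memory usage) metric.
def pareto_front(candidates):
    """candidates: list of (flops, memops, tree) tuples."""
    front = []
    for f, m, tree in candidates:
        dominated = any(f2 <= f and m2 <= m and (f2 < f or m2 < m)
                        for f2, m2, _ in candidates)
        if not dominated:
            front.append((f, m, tree))
    return front

example = [(100, 40, "t1"), (80, 60, "t2"), (120, 35, "t3"), (90, 90, "t4")]
print(pareto_front(example))  # t4 is dominated by t2 and dropped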

Advances in High-Performance Computational Earth Sciences: Applications and Frameworks (IHPCES) Session 3

Time and Date: 16:40 - 18:20 on 6th June 2016

Room: Cockatoo

Chair: Yifeng Cui

549 Inside the Pascal GPU Architecture and Benefits to Seismic Applications (Invited) [abstract]
Abstract: Stencil computations are one of the major computational patterns in seismic applications. In this talk I will first describe techniques to implement stencil computations efficiently on the GPU. Then I will introduce the Pascal architecture in NVIDIA's latest Tesla P100 GPU, focusing especially on new architectural features such as HBM2 and NVLINK. I will highlight how those features will enable significant performance improvements for seismic applications. Pascal also introduces GPU page faults, which enable Unified Virtual Memory on the GPU. I will illustrate how UVM will simplify GPU programming by removing the need to manage GPU data manually in the code while still getting good performance in most cases. Bio: Peng Wang is a senior engineer in the HPC developer technology group of NVIDIA, where he works on parallelizing and optimizing scientific applications on GPUs. One of his main focuses is on optimizing seismic algorithms on GPUs. He received his Ph.D. in computational astrophysics from Stanford University.
Peng Wang
433 High-productivity Framework for Large-scale GPU/CPU Stencil Applications [abstract]
Abstract: A high-productivity framework for multi-GPU and multi-CPU computation of stencil applications is proposed. Our framework is implemented in the C++ and CUDA languages. It automatically translates user-written stencil functions that update a grid point and generates both GPU and CPU code. Programmers write user code only in C++ and can execute the translated code on either multiple multicore CPUs or multiple GPUs with optimization. The user code can be executed on multiple GPUs with an auto-tuning mechanism and an overlapping method that hides communication cost behind computation. It can also be executed on multiple CPUs with OpenMP. A compressible flow code on GPUs exploiting the optimizations provided by the framework achieved a 2.7 times speedup over the non-optimized version.
Takashi Shimokawabe, Takayuki Aoki, Naoyuki Onodera
305 GPU acceleration of a non-hydrostatic ocean model with a multigrid Poisson/Helmholtz solver [abstract]
Abstract: To meet the demand for fast and detailed calculations in numerical ocean simulations, we implemented a non-hydrostatic ocean model on a graphics processing unit (GPU). We improved the model’s Poisson/Helmholtz solver by optimizing the memory access, using instruction-level parallelism, and applying a mixed precision calculation to the preconditioning of the Poisson/Helmholtz solver. The GPU-implemented model was 4.7 times faster than a comparable central processing unit execution. The output errors due to this implementation will not significantly influence oceanic studies.
Takateru Yamagishi, Yoshimasa Matsumura

Workshop on Computational and Algorithmic Finance (WCAF) Session 3

Time and Date: 16:40 - 18:20 on 6th June 2016

Room: Boardroom East

Chairs: A. Itkin and J. Toivanen

77 Global Optimization of nonconvex VaR measure using homotopy methods [abstract]
Abstract: Value at Risk (VaR) is defined as the maximum loss of a portfolio over a future time horizon at a high confidence level (typical values used are 95% or 99%). In our work we devise novel techniques to minimize the non-convex Value-at-Risk function. VaR has the following properties: 1. VaR is a non-coherent measure of risk; in particular, it is not sub-additive. 2. VaR is also non-convex (it has multiple local solutions). These properties make the search for a global minimum of VaR a very difficult, in fact NP-hard, problem. CVaR is a coherent and convex measure of risk, and we use homotopy methods to project CVaR-optimal solutions onto a VaR optimum. The results show that the optimal VaR found is within 1% of the global minimum when found, and the method is as efficient as solving a convex conditional-VaR minimization problem.
Arun Verma
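For reference, the standard definitions of the two risk measures contrasted in the abstract above are given below (these are the textbook forms, not taken from the paper), for a loss $L$ and confidence level $\alpha$ (e.g. 95% or 99%):

\[
  \mathrm{VaR}_{\alpha}(L) = \inf\{\, x \in \mathbb{R} : \Pr(L \le x) \ge \alpha \,\},
\qquad
  \mathrm{CVaR}_{\alpha}(L) = \min_{c \in \mathbb{R}}
  \Bigl\{\, c + \tfrac{1}{1-\alpha}\, \mathbb{E}\bigl[(L - c)^{+}\bigr] \Bigr\}.
\]

The Rockafellar-Uryasev form of CVaR on the right is convex in the portfolio weights, which is what makes CVaR minimization a natural convex starting point for a homotopy toward the non-convex VaR objective.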
502 Optimal Pairs Trading with Time-Varying Volatility [abstract]
Abstract: We propose a pairs trading model that incorporates a time-varying volatility of the Constant Elasticity of Variance type. Our approach is based on stochastic control techniques; given a fixed time horizon and a portfolio of two cointegrated assets, we define the trading strategies as the portfolio weights maximizing the expected power utility from terminal wealth. We compute the optimal pairs strategies by using a Finite Difference method. We then show some empirical tests on data of stocks that are dual listed in Shanghai and Hong Kong of China, with low frequency and high frequency.
Thomas Lee
239 Computational Approach to an Optimal Hedging Problem [abstract]
Abstract: Consider a hedging strategy g(s) for using short-term futures contracts to hedge a long-term exposure. Here the underlying commodity $S_t$ follows the stochastic differential equation $d S_t = \mu dt + \sigma dW_t$. It is known that full hedging is not a good choice in terms of risk. We establish a numerical approach for searching for a strategy g(s) that reduces the running risk of the hedge. The approach also leads to the numerical solution of the optimal strategy for such a hedging problem.
Chaoqun Ma, Zhijian Wu and Xinwei Zhao
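The dynamics stated in the abstract above, $d S_t = \mu dt + \sigma dW_t$ (arithmetic Brownian motion), can be simulated with a few lines of Euler-Maruyama code. The parameter values below are purely illustrative assumptions; the paper's numerical search for the hedging strategy g(s) is not reproduced.

# Minimal Euler-Maruyama simulation of dS_t = mu*dt + sigma*dW_t.
# Parameters are illustrative only; not the paper's hedging computation.
import numpy as np

def simulate_paths(s0=50.0, mu=0.1, sigma=0.3, T=1.0, steps=252,
                   n_paths=10000, seed=0):
    rng = np.random.default_rng(seed)
    dt = T / steps
    dW = rng.standard_normal((n_paths, steps)) * np.sqrt(dt)
    increments = mu * dt + sigma * dW
    return s0 + np.cumsum(increments, axis=1)  # paths of S_t on the time grid

paths = simulate_paths()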
382 Novel Heuristic Algorithm for Large-scale Complex Optimization [abstract]
Abstract: Research in finance and many other areas often encounters large-scale complex optimization problems for which solutions are hard to find. Classic heuristic algorithms often have limitations stemming from the objectives they try to mimic, leading to drawbacks such as poor memory efficiency, entrapment in local optima, and unstable performance. This work imitates market competition behavior (MCB) and develops a novel heuristic algorithm accordingly, which combines searching efficiency, memory efficiency, conflict avoidance, recombination, mutation and an elimination mechanism. In the search space, the MCB algorithm updates solution points according to inertia and gravity rules, avoids falling into local optima by introducing new enterprises while ruling out old enterprises at each iteration, and recombines velocity vectors to speed up the solution search. This algorithm is capable of solving large-scale complex optimization models of large input dimension, including Overlapping Generations Models, and can easily be applied to other complex financial models. As a sample case, the MCB algorithm is applied to a hybrid investment optimization model on R&D, riskless and risky assets over a continuous time period.
Honghao Qiu, Yehong Liu

International Workshop on Computational Flow and Transport: Modeling, Simulations and Algorithms (CFT) Session 3

Time and Date: 16:40 - 18:20 on 6th June 2016

Room: Boardroom West

Chair: Shuyu Sun

417 MHD Relaxation with Flow in a Sphere [abstract]
Abstract: The relaxation process of magnetohydrodynamics (MHD) inside a sphere is investigated with a newly developed spherical grid system, the Yin-Yang-Zhong grid. An MHD fluid with low dissipation rates is confined by a perfectly conducting, stress-free, and thermally insulating spherical boundary. The Reynolds number Re and the magnetic Reynolds number Rm are the same: Re=Rm=8600. Starting from a simple and symmetric state consisting of a ring-shaped magnetic flux with no flow, the dynamical relaxation of the magnetic energy is numerically integrated. The relaxed state has a characteristic structure of the magnetic field and a flow field with four vortices.
Kohei Yamamoto, Akira Kageyama
425 Numerical aspects related to the dynamic update of anisotropic permeability field during the transport of nanoparticles in the subsurface [abstract]
Abstract: Nanoparticles are particles between 1 and 100 nanometers in size. They present possible dangers to the environment due to their high surface-to-volume ratio, which can make the particles very reactive or catalytic. Furthermore, the rapid increase in the implementation of nanotechnologies has released a large amount of nanowaste into the environment. In the last two decades, the transport of nanoparticles in the subsurface and the potential hazard they impose on the environment have attracted the attention of researchers. In this work, we use numerical simulation to investigate the transport phenomena of nanoparticles in anisotropic porous media. We consider the case in which the permeability in the principal direction components varies with respect to time. The interesting aspect of this case is that the anisotropy could disappear with time. We investigate the effect of the degenerating anisotropy on various fields such as pressure, porosity, concentration and velocities.
Meng-Huo Chen, Amgad Salama, Mohamed Ei-Amin
455 Localized computation of Newton updates in fully-implicit two-phase flow simulation [abstract]
Abstract: Fully-Implicit (FI) methods are often employed in the numerical simulation of large-scale subsurface flows in porous media. At each implicit time step, a Newton-like method is used to solve the FI discrete nonlinear algebraic system. The linear solution process for the Newton updates is the computational workhorse of FI simulations. Empirical observations suggest that the computed Newton updates during FI simulations of multiphase flow are often sparse. Moreover, the level of sparsity observed can vary dramatically from one iteration to the next, and across time steps. In several large-scale applications, it was reported that the level of sparsity in the Newton update can be as large as 99%. This work develops a localization algorithm that conservatively predetermines the sparsity pattern of the Newton update. Subsequently, only the flagged nonzero components of the system need be solved. The localization algorithm is developed for general FI models of two-phase flow. Large-scale simulation results of benchmark reservoir models show a 10- to 100-fold reduction in computational cost for homogeneous problems, and a 4- to 10-fold reduction for strongly heterogeneous problems.
Soham Sheth, Rami Younis
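The payoff of a conservatively predetermined sparsity pattern, described in the abstract above, is that only a small sub-system of the Newton linear system needs to be solved. The schematic below illustrates that reduced solve only; it is a dense toy version under assumed inputs, not the paper's localization algorithm or reservoir-simulator data structures.

# Schematic illustration (not the paper's algorithm): given a conservative
# flag of which update components are nonzero, solve only the reduced
# sub-system of J * dx = -F restricted to the flagged unknowns.
import numpy as np

def localized_newton_update(J, F, flagged):
    """J: Jacobian (dense here for clarity), F: residual vector,
    flagged: boolean mask of components predicted to change."""
    dx = np.zeros_like(F)
    idx = np.flatnonzero(flagged)
    # Unflagged unknowns are treated as frozen (dx = 0 there).
    dx[idx] = np.linalg.solve(J[np.ix_(idx, idx)], -F[idx])
    return dx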
474 A Fully Coupled XFEM-EDFM Model for Multiphase Flow and Geomechanics in Fractured Tight Gas Reservoirs [abstract]
Abstract: Unconventional reservoirs are typically comprised of a multicontinuum stimulated formation, with complex fracture networks that have a wide range of length scales and geometries. A timely topic in the simulation of unconventional petroleum resources is coupling the geomechanics of the fractured media to multiphase fluid flow and transport. We propose an XFEM-EDFM method which couples geomechanics with multiphase flow in fractured tight gas reservoirs. A proppant model is developed to simulate propped hydraulic fractures. The method is verified against analytical solutions. A simulation example with a configuration of two multiple-fractured horizontal wells is investigated. The influence of stress-dependent fracture permeability on cumulative production is analyzed.
Guotong Ren, Jiamin Jiang, Rami Younis

Bridging the HPC Talent Gap with Computational Science Research Methods (BRIDGE) Session 1

Time and Date: 16:40 - 18:20 on 6th June 2016

Room: Rousseau West

Chair: Nia Alexandrov

418 Using Ontology Engineering Methods to Improve Computer Science and Data Science Skills [abstract]
Abstract: This paper focuses on the ontology construction process, the Computing Classification System, and a Data Science domain ontology, all used to help not only IT students but also IT specialists from industry and academia to tackle the problems of the Big Data and Data Science skills gap. We discuss some methodological aspects of the ontology design process and the enrichment of existing freely accessible ontologies, and show how the suggested methods and software tools help IT specialists, including master's students, to carry out their research work and participate in real-world projects. The role of visual data exploration tools for certain issues under discussion and some use cases are discussed.
Svetlana Chuprina, Vassil Alexandrov, Nia Alexandrov
412 The Bilingual Semantic Network of Computing Concepts [abstract]
Abstract: We describe the construction of a bilingual (English-Russian / Russian-English) semantic network covering basic concepts of computing. To construct the semantic network, we used the Computing Curricula series created during 2000-2015 under the aegis of ACM and IEEE and the current standards of IT specialist training in Russia, as well as some other English-language and Russian-language sources. The resulting network can be used as a basic component in an intelligent information system that processes bilingual search queries while considering their semantics, and it can help support and guide automated translation of academic texts from one language to the other. The network can also be useful for comparative analysis and integration of the programs and teaching materials for Computing and IT education in Russia and English-speaking countries. The network can support cross-lingual information retrieval, knowledge management and machine translation, which play an important role in e-learning personalization and retrieval in the computing domain, thus allowing learners to benefit from online educational resources that are available in both languages.
Evgeniy Khenner, Olfa Nasraoui
514 Biomedical Big Data Training Collaborative (BBDTC): An effort to bridge the talent gap in biomedical science and research [abstract]
Abstract: The BBDTC (https://biobigdata.ucsd.edu) is a community-oriented platform to encourage high-quality knowledge dissemination with the aim of growing a well-informed biomedical big data community through collaborative efforts on training and education. The BBDTC collaborative is an e-learning platform that supports the biomedical community to access, develop and deploy open training materials. The BBDTC supports Big Data skill training for biomedical scientists at all levels, and from varied backgrounds. The natural hierarchy of courses allows them to be broken into and handled as modules. Modules can be reused in the context of multiple courses and reshuffled, producing a new and different, dynamic course called a playlist. Users may create playlists to suit their learning requirements and share it with individual users or the wider public. BBDTC leverages the maturity and design of the HUBzero content-management platform for delivering educational content. To facilitate the migration of existing content, the BBDTC supports importing and exporting course material from the edX platform. Migration tools will be extended in the future to support other platforms. Hands-on training software packages, i.e., toolboxes, are supported through Amazon EC2 and Virtualbox virtualization technologies, and they are available as: (i) downloadable lightweight Virtualbox Images providing a standardized software tool environment with software packages and test data on their personal machines, and (ii) remotely accessible Amazon EC2 Virtual Machines for accessing biomedical big data tools and scalable big data experiments. At the moment, the BBDTC site contains three open Biomedical big data training courses with lecture contents, videos and hands-on training utilizing VM toolboxes, covering diverse topics. The courses have enhanced the hands-on learning environment by providing structured content that users can use at their own pace. A four course biomedical big data series is planned for development in early 2016.
Shweta Purawat, Charles Cowart, Rommie Amaro, Ilkay Altintas
516 Ontology Based Data Access Methods to Teach Students to Transform Traditional Information Systems and Simplify Decision Making Process [abstract]
Abstract: We describe a service-based approach that provides a natural language interface to legacy information systems, built on top of relational database management systems. The long term goal is to make data management and analysis accessible to a wider range of users for a diverse range of purposes and to simplify the decision making process. We present an ontology-driven web-service, named Reply, that transforms traditional information systems into intelligent systems, endowed with a natural language interface, so that they can be queried by any novice user much like modern day search engines. The principal mechanism of our approach is turning a natural language query into a SQL-query for structured data sources by using Ontology-Based Data Access methods. We also outline how the proposed approach allows semantic searching of large structured, unstructured, or semi-structured data within the database or outside sources, thus helping bridge the talent gap in the case of Big Data Analytics used by researchers and postgraduate students.
Svetlana Chuprina, Igor Postanogov, Olfa Nasraoui
342 The Impact of Learning Activities on the Final Grade in Engineering Education [abstract]
Abstract: A principal component analysis is carried out on the undergraduate-level "Stochastic Models" course. We determine that the first principal component has a positive correlation with the score of the final written cumulative exam. This could possibly mean that the final exam could be eliminated from engineering curricula, but the variability is significant as measured by the correlation R statistic. We gathered a much larger sample and found that the variability increased, indicating changes in the course and in the students' emphasis on learning activities. Therefore we concluded that the evidence presented does not justify eliminating written cumulative final exams.
Raul Ramirez-Velarde, Nia Alexandrov, Miguel Sanhueza-Olave, Raul Perez-Cazares
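The kind of analysis described in the preceding abstract, projecting per-activity scores onto a first principal component and correlating it with the final exam, can be sketched as below. The column layout and data here are hypothetical; this is not the authors' dataset or analysis script.

# Hedged sketch: PCA over per-activity scores and the correlation of the
# first principal component with the final-exam score. Data are synthetic.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
activities = rng.uniform(0, 100, size=(120, 6))   # e.g. homework, quizzes, labs
final_exam = activities.mean(axis=1) + rng.normal(0, 10, 120)

pc1 = PCA(n_components=1).fit_transform(activities).ravel()
r = np.corrcoef(pc1, final_exam)[0, 1]
# Note: the sign of a principal component is arbitrary, so |r| is the
# meaningful quantity here.
print(f"correlation between PC1 and final exam score: {r:.2f}")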

The Workshop on Computational Finance and Business Intelligence (CFBI) Session 1

Time and Date: 16:40 - 18:20 on 6th June 2016

Room: Rousseau East

Chair: Yong Shi

293 Some Experimental Issues in Financial Fraud Mining [abstract]
Abstract: Financial fraud detection is an important problem with a number of design aspects to consider. Issues such as problem representation, choice of detection technique, feature selection, and performance analysis will all affect the perceived ability of solutions, so for auditors and researchers to be able to sufficiently detect financial fraud it is necessary that these issues be thoroughly explored. In this paper we will analyse some of the relevant experimental issues of fraud detection with a focus on credit card fraud. Observations will be made on issues that have been explored by prior researchers for general data mining problems but not yet thoroughly explored in the context of financial fraud detection, including problem representation, feature selection, and performance metrics. We further investigated some of these issues with controlled simulations, concentrating on detection algorithms, feature selection, and performance metrics for credit card fraud.
Jarrod West, Maumita Bhattacharya
323 Ramp Loss Linear Programming Nonparallel Support Vector Machine [abstract]
Abstract: Motivated by the fact that the l1-penalty is piecewise linear, we propose a ramp loss linear programming nonparallel support vector machine (ramp-LPNPSVM) for binary classification, in which the l1-penalty is applied to the RNPSVM. Since the ramp loss is also piecewise linear, ramp-LPNPSVM is a piecewise linear minimization problem and a local minimum can be effectively found by the Concave-Convex Procedure. Experimental results on benchmark datasets confirm the effectiveness of the proposed algorithm. Moreover, the l1-penalty enhances sparsity.
Dalian Liu, Dandan Chen, Yong Shi, Yingjie Tian
432 The Combination of Topology and Nodes' States Dynamics as an Early-Warning Signal of Critical Transition in a Banking Network Model [abstract]
Abstract: Banking systems, modelled as networks, evolve over time, passing through critical points. Topology-oriented indicators of tipping points and early-warning signals of criticality in networks do not reflect the gradual movement of a system towards a tipping point. Many networks with SIR-like dynamics have a restricted number of node states. In the case of banking networks, the range space of node states is continuous, which allows an estimation of a single bank's remoteness from the insolvent state. Remoteness and velocity, reflecting the change in state per iteration, are considered in order to estimate the influence of node dynamics. Both node dynamics and topology are taken into account. We consider the positive and negative impact of interbank interactions (edge presence). Each edge is considered with weight and length parameters corresponding to the size of the interbank loan and the number of iterations remaining before it expires. It was shown that the presented indicator, referred to as the potential of interactions, dropping well below zero is a sign of a forthcoming tipping point. The introduced $\mathscr{T}$-Threatened set allows the detection of an approaching tipping point in terms of nodes' states.
Valentina Y. Guleva
484 High-order numerical method for generalized Black-Scholes model [abstract]
Abstract: This work presents a high-order numerical method for the solution of the generalized Black-Scholes model for a European call option. The numerical method is derived using a two-step backward differentiation formula for the temporal discretization and a High-Order Difference approximation with Identity Expansion (HODIE) scheme for the spatial discretization. The present scheme gives second-order accuracy in time and third-order accuracy in space. Numerical experiments are conducted in support of the theoretical results.
S Chandra Sekhara Rao, Manisha Manisha
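A commonly used form of the generalized Black-Scholes model referenced in the abstract above allows the volatility and interest rate to depend on the asset price and time. The equation below is that standard form (with strike $K$ and maturity $T$), given for orientation; the paper's exact formulation may differ.

\[
  \frac{\partial u}{\partial t}
  + \tfrac{1}{2}\,\sigma^{2}(s,t)\,s^{2}\,\frac{\partial^{2} u}{\partial s^{2}}
  + r(s,t)\,s\,\frac{\partial u}{\partial s}
  - r(s,t)\,u = 0,
  \qquad u(s,T) = \max(s - K,\, 0),
\]

where $u(s,t)$ is the value of the European call when the underlying asset price is $s$ at time $t$.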

ICCS 2016 Main Track (MT) Session 16

Time and Date: 16:40 - 18:20 on 6th June 2016

Room: Rousseau Center

Chair: Christopher Monterola

357 Distributed Multi-authority Attribute-based Encryption Scheme for Friend Discovery in Mobile Social Networks [abstract]
Abstract: In recent years, the rapid expansion of the capabilities of portable devices, cloud servers and cellular network technologies has been the wind beneath the wings of mobile social networks. Compared to traditional web-based online social networks, mobile social networks can help users easily discover and make new social interactions with others. A challenging task is to protect the privacy of the users' profiles and communications. Existing works are mainly based on traditional cryptographic methods, such as homomorphic encryption and group signatures, which are very computationally costly. In this paper, we propose a novel distributed multi-authority attribute-based encryption scheme to efficiently achieve privacy preservation without additional special signatures. In addition, the proposed scheme achieves fine-grained and flexible access control. Detailed analysis demonstrates the effectiveness and practicability of our scheme.
Wenbo Wang, Fang Qi, Xiaoqiang Wu, Zhe Tang
58 ADAMANT: tools to capture, analyze, and manage data movement [abstract]
Abstract: In the converging world of High Performance Computing and Big Data, moving data is becoming a critical aspect of performance and energy efficiency. In this paper we present the Advanced DAta Movement Analysis Toolkit (ADAMANT), a set of tools to capture and analyze data movement within an application, and to aid in understanding performance and efficiency on current and future systems. ADAMANT identifies all the data objects allocated by an application and uses instrumentation modules to monitor relevant events (e.g. cache misses). Finally, ADAMANT produces a per-object performance profile. In this paper we demonstrate the use of ADAMANT in analyzing three applications, BT, BFS, and Velvet, and evaluate the impact of different memory technologies. With the information produced by ADAMANT we were able to model and compare different memory configurations and object placement solutions. In BFS we devised a placement which outperforms caching, while in the other two cases we were able to point out which data objects may be problematic for the configurations explored and would require refactoring to improve performance.
Pietro Cicotti, Laura Carrington
67 Urgent Computing - A General Makespan Robustness Model for Ensembles of Forecasts [abstract]
Abstract: Urgent computing requires computations to commence in short order and complete within a stipulated deadline so as to support mitigation activities in preparation for, response to and recovery from an event that requires immediate attention. Missing an urgent deadline can lead to dire consequences in which avoidable human and financial losses ensue. Allocating resources such that the deadline can be met is thus crucial. Robustness is of great importance to ensure that small perturbations of the computing systems do not affect the makespan of the computations such that the deadline is missed. This work focuses on developing a general mathematical makespan model for urgent computing to enable a robust allocation of ensemble forecasts while minimising the makespan. Three different resource allocation patterns are investigated to illustrate the model. The result will aid in satisfying the most crucial requirement of urgent computing, the time criterion.
Siew Hoon Leong, Dieter Kranzlmüller