ICCS 2016 Main Track (MT) Session 1

Time and Date: 10:35 - 12:15 on 6th June 2016

Room: KonTiki Ballroom

Chair: David Abramson

19 Performance Analysis and Optimization of a Hybrid Seismic Imaging Application [abstract]
Abstract: Applications that process seismic data are computationally expensive and, therefore, employ scalable parallel systems to produce timely results. Here we describe our experiences of using performance analysis tools to gain insight into an MPI+OpenMP code developed by Shell that performs Reverse Time Migration on a cluster to produce models of the subsurface. Tuning MPI+OpenMP programs for modern platforms is difficult, and, therefore, assistance is required from performance analysis tools. These tools provided us with insights into the effectiveness of the domain decomposition strategy, the use of threaded parallelism, and functional unit utilization in individual cores. By applying insights obtained from Rice University's HPCToolkit and hardware performance counters, we were able to improve the performance of Shell's prototype distributed-memory Reverse Time Migration code by roughly 30 percent.
Sri Raj Paul, Mauricio Araya-Polo, John Mellor-Crummey, Detlef Hohl
33 Portable Application-level Checkpointing for Hybrid MPI-OpenMP Applications [abstract]
Abstract: As parallel machines increase their number of processors, so does the failure rate of the global system; thus, long-running applications will need to make use of fault tolerance techniques to ensure successful execution completion. Most current HPC systems are built as clusters of multicores. The hybrid MPI-OpenMP paradigm provides numerous benefits on these systems. This paper presents a checkpointing solution for hybrid MPI-OpenMP applications, in which checkpoint consistency is guaranteed by using an intra-node coordination protocol, while no inter-node coordination is needed. The proposal reduces network utilization and storage resources in order to optimize the I/O cost of fault tolerance, while minimizing the checkpointing overhead. Moreover, the portability of the solution and the dynamic parallelism provided by OpenMP enable the restart of the applications on machines with different architectures, operating systems and/or numbers of cores, adapting the number of running OpenMP threads for the best exploitation of the available resources. An extensive evaluation using hybrid MPI-OpenMP applications from the ASC Sequoia Benchmark Codes and NERSC-8/Trinity benchmarks is presented, showing the effectiveness and efficiency of the approach.
Nuria Losada, María J. Martín, Gabriel Rodríguez, Patricia González
38 Checkpointing of Parallel MPI Applications using MPI One-sided API with Support for Byte-addressable Non-volatile RAM [abstract]
Abstract: The increasing size of computational clusters results in an increasing probability of failures, which in turn requires application checkpointing in order to survive those failures. Traditional checkpointing requires data to be copied from application memory into a persistent storage medium, which increases application execution time as it is usually done in a separate step. In this paper we propose to use emerging byte-addressable non-volatile RAM (NVRAM) as a persistent storage medium, and we analyze various methods of making consistent checkpoints with the support of the MPI one-sided API in order to minimize checkpointing overhead. We test our solution on two applications: the HPCCG benchmark and the PageRank algorithm. Our experiments showed that NVRAM-based checkpointing performs much better than the traditional disk-based approach. We also simulated different possible latencies and bandwidths of future NVRAM, and our experiments showed that only bandwidth had a visible impact on application execution time.
Piotr Dorożyński, Pawel Czarnul, Artur Malinowski, Krzysztof Czuryło, Łukasz Dorau, Maciej Maciejewski, Paweł Skowron
57 Acceleration of Tear Film Map Definition on Multicore Systems [abstract]
Abstract: Dry eye syndrome is a public health problem, and one of the most common conditions seen by eye care specialists. Among the clinical tests for its diagnosis, the evaluation of the interference patterns observed in the tear film lipid layer is often employed. In this sense, tear film maps illustrate the spatial distribution of the patterns over the whole tear film and provide useful information to practitioners. However, the creation of a single map usually takes tens of minutes. Medical experts currently demand applications with lower response time in order to provide a faster diagnosis for their patients. In this work, we explore different parallel approaches to accelerate the definition of the tear film map by exploiting the power of today's ubiquitous multicore systems. They can be executed on any multicore system without special software or hardware requirements. The experimental evaluation determines the best approach (on-demand with dynamic seed distribution) and proves that it can significantly decrease the runtime. For instance, the average runtime of our experiments with 50 real-world images on a system with AMD Opteron processors is reduced from more than 20 minutes to one minute and 12 seconds.
Jorge González-Domínguez, Beatriz Remeseiro, María J. Martín
99 Modeling and Implementation of an Asynchronous Approach to Integrating HPC and Big Data Analysis [abstract]
Abstract: With the emergence of exascale computing and big data analytics, many important scientific applications require the integration of computationally intensive modeling and simulation with data-intensive analysis to accelerate scientific discovery. In this paper, we create an analytical model to steer the optimization of the end-to-end time-to-solution for the integrated computation and data analysis. We also design and develop an intelligent data broker to efficiently intertwine the computation stage and the analysis stage to practically achieve the optimal time-to-solution predicted by the analytical model. We perform experiments on both synthetic applications and real-world computational fluid dynamics (CFD) applications. The experiments show that the analytic model exhibits an average relative error of less than 10%, and the applications’ performance can be improved by up to 131% for the synthetic programs and by up to 78% for the real-world CFD application.
Yuankun Fu, Fengguang Song, Luoding Zhu

ICCS 2016 Main Track (MT) Session 2

Time and Date: 14:30 - 16:10 on 6th June 2016

Room: KonTiki Ballroom

Chair: Maria Indrawan

115 EMINENT: EMbarrassINgly parallEl mutatioN Testing [abstract]
Abstract: During the last decade, the fast evolution in communication networks has facilitated the development of complex applications that manage vast amounts of data, like Big Data applications. Unfortunately, the high complexity of these applications hampers the testing process. Moreover, generating adequate test suites to properly check these applications is a challenging task due to the elevated number of potential test cases. Mutation testing is a valuable technique to measure the quality of the selected test suite that can be used to overcome this difficulty. However, one of the main drawbacks of mutation testing lies in the high computational cost associated with this process. In this paper we propose a dynamic distributed algorithm for HPC systems, called EMINENT, which has been designed to address the performance problems of mutation testing techniques. EMINENT alleviates the computational cost associated with this technique since it exploits parallelism in cluster systems to reduce the final execution time. In addition, several experiments have been carried out on three applications in order to analyse the scalability and performance of EMINENT. The results show that EMINENT provides an increase in speed-up in most scenarios.
Pablo C. Cañizares, Mercedes G. Merayo, Alberto Núñez
386 CHiS: Compressed Hierarchical Schur Linear System Solver for 3D FDFD Photonic Device Analysis with Hardware Acceleration [abstract]
Abstract: Finite-difference frequency-domain (FDFD) analysis of wave optics and photonics requires a linear system solver for the discretized vector Helmholtz equation. The linear system can be ill-conditioned when the computation domain is large or perfectly-matched layers (PMLs) are used. Direct factorization of the linear systems for 3D photonic simulation may require a tremendous amount of computational resources. We propose the compressed hierarchical Schur method (CHiS) to save computation time and memory usage. The results show that the CHiS method takes 45% less factorization time and 35% less memory usage compared with the uncompressed hierarchical Schur method in selected tests. The computing procedure also involves many dense linear algebra operations, which can be efficiently executed on modern high-performance hardware such as graphics processing units (GPUs) and multicore/manycore processors. We investigate the GPU acceleration strategy and hardware tuning by rescheduling the factorization. The proposed CHiS is also tested on a dual-GPU server for performance analysis. These new techniques can efficiently utilize modern high-performance environments and greatly accelerate the future development of photonic devices and circuits.
Cheng-Han Du and Weichung Wang
424 Faster cloud Star Joins with reduced disk spill and network communication [abstract]
Abstract: Combining powerful parallel frameworks and on-demand commodity hardware, cloud computing has made both analytics and decision support systems canonical to enterprises of all sizes. Filtering and retrieving information from the unprecedented volumes of data amassed by such companies are pressing challenges. These data are often organized in star schemas, in which Star Joins are ubiquitous and expensive operations. In particular, excessive disk spill and network communication are tight bottlenecks for all current MapReduce or Spark solutions. Here, we propose two efficient solutions that drop the computation time by at least 60%: the Spark Bloom-Filtered Cascade Join (SBFCJ) and the Spark Broadcast Join (SBJ). Conversely, a direct Spark implementation of a sequence of joins renders poor performance, showcasing the importance of further filtering for minimal disk spill and network communication. Finally, while SBJ is twice as fast when memory per executor is large enough, SBFCJ is remarkably resilient to low-memory scenarios. Both algorithms pose very competitive solutions to Star Joins in the cloud.
Jaqueline Joice Brito, Thiago Mosqueiro, Ricardo Rodrigues Ciferri, Cristina Dutra De Aguiar Ciferri
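
As an aside for readers unfamiliar with the broadcast-join pattern that SBJ builds on, the following PySpark sketch illustrates the general idea: small dimension tables are broadcast to every executor so the large fact table is never shuffled. The table layout and column names are illustrative assumptions, not the paper's benchmark schema or its SBFCJ/SBJ implementation.

```python
# Minimal PySpark sketch of the broadcast-join idea behind star joins:
# dimension tables are shipped to every executor, so the large fact table
# avoids the shuffle (the disk spill / network traffic the paper targets).
from pyspark.sql import SparkSession
from pyspark.sql.functions import broadcast

spark = SparkSession.builder.appName("star-join-sketch").getOrCreate()

# Tiny in-memory tables standing in for a star schema (hypothetical columns).
fact = spark.createDataFrame(
    [(1, 10, 100.0), (2, 10, 40.0), (1, 20, 25.0)],
    ["customer_id", "date_id", "amount"])
customers = spark.createDataFrame([(1, "north"), (2, "south")],
                                  ["customer_id", "region"])
dates = spark.createDataFrame([(10, 2016), (20, 2016)], ["date_id", "year"])

# Broadcasting the small dimensions keeps the fact table partition-local.
result = (fact
          .join(broadcast(customers), "customer_id")
          .join(broadcast(dates), "date_id")
          .groupBy("region", "year")
          .sum("amount"))

result.show()
```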
438 Jupyter in High Performance Computing [abstract]
Abstract: High Performance Computing has traditionally been the natural habitat of highly specialized parallel programming experts running large batch jobs. With every field of Science becoming richer and richer in the amount of data available, many more scientists are transitioning to Supercomputers or cloud computing resources. In this paper I would like to review how the Jupyter project, a suite of scientific computing tools, can help to democratize access to Supercomputers by lowering the entry barrier for new scientific communities and provide a gradual path to harnessing more distributed computing capabilities. I will start from the interactive usage of the Jupyter Notebook, a widespread browser-based data exploration environment, on an HPC cluster, then explain how notebooks can be used as scripts directly or in a workflow environment, and finally how batch data processing like traditional MPI, Spark and XSEDE Gateways can benefit from inter-operating with a Jupyter Notebook environment.
Andrea Zonca
463 High Performance LDA through Collective Model Communication Optimization [abstract]
Abstract: LDA is a widely used machine learning technique for big data analysis. The application includes an inference algorithm that iteratively updates a model until it converges. A major challenge is the scaling issue in parallelization owing to the fact that the model size is huge and parallel workers need to communicate the model continually. We identify three important features of the model in parallel LDA computation: 1. The volume of model parameters required for local computation is high; 2. The time complexity of local computation is proportional to the required model size; 3. The model size shrinks as it converges. By investigating collective and asynchronous methods for model communication in different tools, we discover that optimized collective communication can improve the model update speed, thus allowing the model to converge faster. The performance improvement derives not only from accelerated communication but also from reduced iteration computation time as the model size shrinks during the model convergence. To foster faster model convergence, we design new collective communication abstractions and implement two Harp-LDA applications, "lgs" and "rtt". We compare our new approach with Yahoo! LDA and Petuum LDA, two leading implementations favoring asynchronous communication methods in the field, on a 100-node, 4000-thread Intel Haswell cluster. The experiments show that "lgs" can reach higher model likelihood with shorter or similar execution time compared with Yahoo! LDA, while "rtt" can run up to 3.9 times faster compared with Petuum LDA when achieving similar model likelihood.
Bingjing Zhang, Bo Peng, Judy Qiu

ICCS 2016 Main Track (MT) Session 3

Time and Date: 16:40 - 18:20 on 6th June 2016

Room: KonTiki Ballroom

Chair: Andrea Zonca

369 A Performance Characterization of Streaming Computing on Supercomputers [abstract]
Abstract: Streaming computing models allow for on-the-fly processing of large data sets. With the increased demand for processing large amounts of data in a reasonable period of time, streaming models are more and more used on supercomputers to solve data-intensive problems. Because supercomputers have mainly been used for compute-intensive workloads, supercomputer performance metrics focus on the number of floating-point operations per unit time and cannot fully characterize the performance of a streaming application on a supercomputer. We introduce the injection and processing rates as the main metrics to characterize the performance of streaming computing on supercomputers. We analyze the dynamics of these quantities in a modified STREAM benchmark developed atop an MPI streaming library in a series of different configurations. We show that after a brief transient the injection and processing rates converge to sustained rates. We also demonstrate that streaming computing performance strongly depends on the number of connections between data producers and consumers and on the processing task granularity.
Stefano Markidis, Ivy Bo Peng, Roman Iakymchuk, Erwin Laure, Gokcen Kestor, Roberto Gioiosa
35 High-Performance Tensor Contractions for GPUs [abstract]
Abstract: We present a computational framework for high-performance tensor contractions on GPUs. High performance is difficult to obtain using existing libraries, especially for many independent contractions where each contraction is very small, e.g., sub-vector/warp in size. However, using our framework to batch contractions plus application-specifics, we demonstrate close-to-peak performance results. In particular, to accelerate large scale tensor-formulated high-order finite element method (FEM) simulations, which is the main focus and motivation for this work, we represent contractions as tensor index reordering plus matrix-matrix multiplications (GEMMs). This is a key factor to achieve algorithmically many-fold acceleration (vs. not using it) due to possible reuse of data loaded in fast memory. In addition to using this context knowledge, we design tensor data-structures, tensor algebra interfaces, and new tensor contraction algorithms and implementations to achieve 90+% of a theoretically derived peak on GPUs. On a K40c GPU, for contractions resulting in GEMMs on square matrices of size 8 for example, we are 2.8× faster than CUBLAS, and 8.5× faster than MKL on 16 cores of Intel Xeon E5-2670 (Sandy Bridge) 2.60GHz CPUs. Finally, we apply autotuning and code generation techniques to simplify tuning and provide an architecture-aware, user-friendly interface.
Ahmad Abdelfattah, Marc Baboulin, Veselin Dobrev, Jack Dongarra, Christopher Earl, Joel Falcou, Azzam Haidar, Ian Karlin, Tzanio Kolev, Ian Masliah, Stanimire Tomov
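
To illustrate the reformulation described in the abstract above (a contraction expressed as index reordering plus a GEMM), here is a minimal NumPy sketch; it is not the MAGMA/GPU code of the paper, and the small FEM-like tensor sizes are arbitrary assumptions.

```python
# Sketch: the contraction C[i,j,k] = sum_l A[i,l] * B[l,j,k] becomes a single
# GEMM once the (j,k) indices of B are flattened into one dimension.
import numpy as np

I, L, J, K = 8, 8, 8, 8                      # small, FEM-like sizes
A = np.random.rand(I, L)
B = np.random.rand(L, J, K)

# Index reordering / reshaping: B is viewed as an L x (J*K) matrix.
B_mat = B.reshape(L, J * K)

# One GEMM replaces the nested contraction loops.
C = (A @ B_mat).reshape(I, J, K)

# Reference check against an explicit einsum contraction.
assert np.allclose(C, np.einsum("il,ljk->ijk", A, B))
```

In the batched setting described in the paper, many such small GEMMs are executed together so the GPU stays busy; the sketch only shows the single-contraction reformulation.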
52 Performance Tuning and Optimization Techniques of Fixed and Variable Size Batched Cholesky Factorization on GPUs [abstract]
Abstract: Solving a large number of relatively small linear systems has recently drawn more attention in the HPC community, due to the importance of such computational workloads in many scientific applications, including sparse multifrontal solvers. Modern hardware accelerators and their architecture require a set of optimization techniques that are very different from the ones used in solving one relatively large matrix. In order to impose concurrency on such throughput-oriented architectures, a common practice is to batch the solution of these matrices as one task offloaded to the underlying hardware, rather than solving them individually. This paper presents a high performance batched Cholesky factorization on large sets of relatively small matrices using Graphics Processing Units (GPUs), and addresses both fixed and variable size batched problems. We investigate various algorithm designs and optimization techniques, and show that it is essential to combine kernel design with performance tuning in order to achieve the best possible performance. We compare our approaches against state-of-the-art CPU solutions as well as GPU-based solutions using existing libraries, and show that, on a K40c GPU for example, our kernels are more than 2x faster.
Ahmad Abdelfattah, Azzam Haidar, Stanimire Tomov, Jack Dongarra
143 Performing Unstructured Grid Flux Calculations on GPUs [abstract]
Abstract: The Finite Volume Method (FVM) is a numerical approach for the approximate solution of Partial Differential Equations (PDE) on discretized volumetric fields. Accurate solutions of PDEs derived from continuum mechanics, especially of complex fields, require structured or unstructured meshes with an ever increasing number of computational volumes. Computing solutions, particularly solutions to time dependent equations, with the Finite Volume Method can take thousands of cores of a supercomputer months to complete. With increased computational and memory throughput, Graphics Processing Units (GPU) have the potential to improve on current implementations, providing a decrease in time to solution of FVMs. Through the use of a model equation, we show that GPUs can improve the performance of an open source computational continuum mechanics toolbox, OpenFOAM. It is shown herein that an NVIDIA Tesla K20 achieves 3-10 times greater performance than all 10 cores of an Intel Xeon E5-2670 v2.
Matthew Conley, Christian Sarofeen, Hua Shan and Justin Williams
255 Adaptive Multi-level Blocking Optimization for Sparse Matrix Vector Multiplication on GPU [abstract]
Abstract: Sparse matrix vector multiplication (SpMV) is the dominant kernel in scientific simulations. Many-core processors such as GPUs accelerate SpMV computations with high parallelism and memory bandwidth compared to CPUs; however, even for many-core processors the performance of SpMV is still strongly limited by memory bandwidth, and the low locality of memory accesses to the input vector causes further performance degradation. We propose a new sparse matrix format called the Adaptive Multi-level Blocking (AMB) format, which aggressively reduces the memory traffic in SpMV computation to improve performance. Through several optimization techniques such as division and blocking of the given matrix, the column indices are compressed and the reusability of input vector elements in the cache is greatly improved. An auto-tuning mechanism determines the best set of parameters for each matrix by estimating the memory traffic and predicting the performance of a given SpMV computation. For 32 matrix datasets taken from the University of Florida Sparse Matrix Collection, the AMB format achieves speedups of up to 2.92x compared to NVIDIA's cuSparse library and up to 1.40x compared to yaSpMV, which was recently proposed and has been the best known library to date for fast SpMV computation.
Yusuke Nagasaka, Akira Nukada, Satoshi Matsuoka
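
For context, a plain CSR SpMV baseline is sketched below; it makes visible the irregular, index-driven reads of the input vector that limit locality, which is the traffic the AMB format is designed to reduce. This is a reference sketch, not the proposed AMB kernel.

```python
# Baseline CSR sparse matrix-vector product y = A @ x.
import numpy as np

def spmv_csr(values, col_idx, row_ptr, x):
    n_rows = len(row_ptr) - 1
    y = np.zeros(n_rows)
    for i in range(n_rows):
        acc = 0.0
        for jj in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[jj] * x[col_idx[jj]]   # gather from x: low locality
        y[i] = acc
    return y

# Small 4x4 example matrix in CSR form.
values  = np.array([10.0, 2.0, 3.0, 4.0, 5.0, 1.0])
col_idx = np.array([0, 2, 1, 0, 3, 2])
row_ptr = np.array([0, 2, 3, 5, 6])
x = np.array([1.0, 2.0, 3.0, 4.0])
print(spmv_csr(values, col_idx, row_ptr, x))     # [16.  6. 24.  3.]
```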

ICCS 2016 Main Track (MT) Session 4

Time and Date: 10:15 - 11:55 on 7th June 2016

Room: KonTiki Ballroom

Chair: Alfredo Tirado-Ramos

287 Embedded real-time stereo estimation via Semi-Global Matching on the GPU [abstract]
Abstract: Dense, robust and real-time computation of depth information from stereo-camera systems is a computationally demanding requirement for robotics, advanced driver assistance systems (ADAS) and autonomous vehicles. Semi-Global Matching (SGM) is a widely used algorithm that propagates consistency constraints along several paths across the image. This work presents a real-time system producing reliable disparity estimation results on the new embedded energy-efficient GPU devices. Our design runs on a Tegra X1 at 42 frames per second (fps) for an image size of 640×480, 128 disparity levels, and using 4 path directions for the SGM method.
Daniel Hernández Juárez, Alejandro Chacón, Antonio Espinosa, David Vázquez, Juan Carlos Moure, Antonio M. López
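
For reference, the per-path cost aggregation at the core of SGM (in Hirschmüller's standard formulation; the authors' GPU-specific scheduling and memory layout are not shown here) is:

```latex
L_r(\mathbf{p}, d) = C(\mathbf{p}, d)
  + \min\Big( L_r(\mathbf{p}-\mathbf{r}, d),\;
              L_r(\mathbf{p}-\mathbf{r}, d-1) + P_1,\;
              L_r(\mathbf{p}-\mathbf{r}, d+1) + P_1,\;
              \min_k L_r(\mathbf{p}-\mathbf{r}, k) + P_2 \Big)
  - \min_k L_r(\mathbf{p}-\mathbf{r}, k)
```

where C(p, d) is the matching cost of pixel p at disparity d, r is one of the path directions, and P1, P2 are the small and large smoothness penalties; the final cost sums L_r over the chosen paths.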
314 Multivariate Polynomial Multiplication on GPU [abstract]
Abstract: Multivariate polynomial multiplication is a fundamental operation which is used in many scientific domains, for example in the optics code for particle accelerator design at CERN. We present a novel and efficient multivariate polynomial multiplication algorithm for GPUs using floating-point double precision coefficients implemented using the CUDA parallel programming platform. We obtain very good speedups over another multivariate polynomial multiplication library for GPUs (up to 548x), and over the implementation of our algorithm for multi-core machines using OpenMP (up to 7.46x).
Diana Andreea Popescu, Rogelio Tomas Garcia
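
A reference sketch of the operation being accelerated may help: multivariate polynomials can be stored as maps from exponent tuples to coefficients and multiplied term by term. This plain-Python sketch shows that baseline scheme, not the CUDA algorithm of the paper.

```python
# Term-by-term multivariate polynomial multiplication over exponent tuples.
from collections import defaultdict

def poly_mul(p, q):
    """Multiply two multivariate polynomials given as dicts mapping
    exponent tuples (e_x, e_y, ...) to double-precision coefficients."""
    result = defaultdict(float)
    for exp_p, coef_p in p.items():
        for exp_q, coef_q in q.items():
            exp = tuple(a + b for a, b in zip(exp_p, exp_q))
            result[exp] += coef_p * coef_q
    return dict(result)

# (x + 2y) * (3x + y) = 3x^2 + 7xy + 2y^2, exponents ordered as (x, y).
p = {(1, 0): 1.0, (0, 1): 2.0}
q = {(1, 0): 3.0, (0, 1): 1.0}
print(poly_mul(p, q))   # {(2, 0): 3.0, (1, 1): 7.0, (0, 2): 2.0}
```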
329 CUDA Optimization of Non-Local Means Extended to Wrapped Gaussian Distributions for Interferometric Phase Denoising [abstract]
Abstract: Interferometric Synthetic Aperture Radar (InSAR) captures hundreds of millions of phase measurements with a single image, which can be differenced with a subsequent matching image to measure the Earth's physical properties such as atmosphere, topography, and ground instability. Each pixel in an InSAR image lies somewhere between perfect information and complete noise; deriving useful measurements from InSAR is therefore predicated upon estimating the quality (coherence) of each pixel, while also enhancing the information-bearing pixels through filtering. Rejecting noisy pixels at the outset and filtering the available information without introducing artifacts is crucial for generating accurate and spatially dense measurements. A capable filtering strategy must accommodate the diversity of manmade and natural ground cover, exhibiting noise spawned by vegetation and water interwoven with useable signals echoed by infrastructure, rocks, and bare ground. Traditional filtering strategies assuming spatial homogeneity have lately been replaced by filters that honor discontinuities in ground cover, but two key improvements are needed: a) techniques must be adapted to enhance phase rather than amplitude, and b) runtime needs to be reduced to support deployment for operational land-information products. We present a new algorithm for wrapped phase filtering based on the non-local means (NLM) algorithm of Buades et al. (2005) and the non-local InSAR (NL-InSAR) algorithm of Deledalle et al. (2011). The new filter, wrapped-NLM (WNLM), extends NLM to wrapped phase data that is inherently lossy due to an unknown integer number of phase ambiguities per pixel. The filter is similar to NL-InSAR in that we adopt their procedure of iteratively improving the filtered phase estimates by updating the Bayesian prior based on the previously filtered data (2009). Our filter differs from NL-InSAR in that it does not assume the Goodman model (1963) nor that of speckle noise (Goodman J. W., 2007), which were found to suffer in some areas due to having too many degrees of freedom; instead we use a more general assumption that the phase noise distribution is additive wrapped Gaussian, making the filter more robust to a larger variety of input data. This also simplifies the algorithm, making it possible to implement an efficient parallel algorithm on the GPU using CUDA.
Aaron Zimmer, Parwant Ghuman
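
As a rough illustration of how non-local weights can be applied to wrapped phase (an assumption for exposition, not necessarily the exact WNLM estimator of the paper), patch-similarity weights can be combined through a weighted circular mean, which sidesteps the 2π ambiguity of wrapped data:

```latex
\hat{\varphi}(p) = \arg\!\Big( \sum_{q \in \Omega} w(p,q)\, e^{\,i\varphi(q)} \Big),
\qquad
w(p,q) \propto \exp\!\Big( -\tfrac{1}{h^2}\, d\big(N_p, N_q\big)^2 \Big),
```

where d(N_p, N_q) is a distance between the patches around pixels p and q and h controls the filtering strength, as in standard NLM.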
449 A Performance Prediction and Analysis Integrated Framework for SpMV on GPUs [abstract]
Abstract: This paper presents unique modeling algorithms of performance prediction for sparse matrix-vector multiplication on GPUs. Based on the algorithms, we develop a framework that is able to predict SpMV kernel performance and to analyze the reported prediction results. We make the following contributions: (1) We provide a theoretical basis for the generation of benchmark matrices according to the hardware features of a given specific GPU. (2) Given a sparse matrix, we propose a quantitative method to collect some features representing its matrix settings. (3) We propose four performance modeling algorithms to accurately predict kernel performance for SpMV computing using CSR, ELL, COO, and HYB SpMV kernels. We evaluate the accuracy of our framework with 8 widely-used sparse matrices (32 test cases in total) on an NVIDIA Tesla K80 GPU. In our experiments, the average performance differences between the predicted and measured SpMV kernel execution times for the CSR, ELL, COO, and HYB SpMV kernels are 5.1%, 5.3%, 1.7%, and 6.1%, respectively.
Ping Guo, Chung-Wei Lee
71 A Multi-GPU Fast Iterative Method for Eikonal Equations using On-the-fly Adaptive Domain Decomposition [abstract]
Abstract: Recent research on Eikonal solvers focuses on employing state-of-the-art parallel computing technology, such as GPUs. Even though previous work on GPU-based parallel Eikonal solvers exists, little research literature covers multi-GPU Eikonal solvers due to the complications of data and work management. In this paper, we propose a novel on-the-fly, adaptive domain decomposition method for efficient implementation of the Block-based Fast Iterative Method on a multi-GPU system. The proposed method is based on dynamic domain decomposition, so that the region to be processed by each GPU is determined on-the-fly while the solver is running. In addition, we propose an efficient domain assignment algorithm that minimizes communication overhead while maximizing load balancing between GPUs. The proposed method scales well, up to 6.17x for eight GPUs, and can handle large computing problems that do not fit into limited GPU memory. We assess the parallel efficiency and runtime performance of the proposed method on various distance computation examples using up to eight GPUs.
Sumin Hong, Won-Ki Jeong

ICCS 2016 Main Track (MT) Session 5

Time and Date: 14:10 - 15:50 on 7th June 2016

Room: KonTiki Ballroom

Chair: Sreya Gosh

146 Optimization of Parallel Legendre Transform using Graphics Processing Unit (GPU) for a Geodynamo Code [abstract]
Abstract: The convection and magnetic field of the Earth's outer core are expected to span a vast range of length scales. To resolve these flows at extremely high spatial resolution, high performance computing is required for geodynamo simulations. Calypso has been shown to scale well on computer clusters capable of computing on the order of 10⁵ cores using Message Passing Interface (MPI) and Open Multi-Processing (OpenMP) parallelization for CPUs. Calypso is designed to model magnetohydrodynamics (MHD) of a Boussinesq fluid in a rotating spherical shell, such as the outer core of Earth. Calypso employs a pseudo-spectral method that requires both a forward and a backward spherical harmonic transform (SHT) in each iteration of the time integration loop. The time complexity of the SHT is dominated by that of the Legendre transform, which is O(N^3). We investigate the time efficiency of three different algorithms for the SHT using graphics processing units (GPUs). The first is to precompute the Legendre polynomials on the CPU before executing the Legendre transform on the GPU. In the second approach, the Legendre polynomials are computed on the fly, as needed. In the third approach, we employ the sum-reduction intrinsics offered by the CUDA Unbound (CUB) library. We examine the trade-offs between space and time on Maverick, a Texas Advanced Computing Center (TACC) supercomputer, using a GPU-enabled Legendre transform. The first algorithm leads to a memory bottleneck that can be alleviated by utilizing the memory hierarchy of Nvidia GPUs; moreover, coalesced reads and prefetching can further reduce the memory bottleneck. The second algorithm is highly computationally intensive relative to memory bandwidth utilization as a result of repetitive computations. The third algorithm is well balanced between computation and memory utilization. We conclude that the most efficient approach is a hybrid model that involves a look-up table as well as on-the-fly computations on the GPU.
Harsha Lokavarapu and Hiroaki Matsui
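
For readers unfamiliar with the on-the-fly approach mentioned above, the associated Legendre functions can be generated during the transform from the standard three-term recurrences (shown here in unnormalized form; the normalization conventions actually used in Calypso are not reproduced):

```latex
P_m^m(x) = (-1)^m (2m-1)!!\,(1-x^2)^{m/2}, \qquad
P_{m+1}^m(x) = x\,(2m+1)\,P_m^m(x),
```
```latex
(l-m)\,P_l^m(x) = x\,(2l-1)\,P_{l-1}^m(x) - (l+m-1)\,P_{l-2}^m(x), \qquad l \ge m+2 .
```

Precomputing these values trades memory traffic for arithmetic, while evaluating them on the fly does the opposite, which is the space-time trade-off the abstract describes.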
49 A Case Study in Adjoint Sensitivity Analysis of Parameter Calibration Problems [abstract]
Abstract: Adjoint sensitivity computation for parameter estimation problems is a widely used technique in the field of computational science and engineering for efficiently retrieving derivatives of a cost functional with respect to parameters. Those derivatives can be used, e.g., for sensitivity analysis, optimization, or robustness analysis. Deriving and implementing adjoint code is an error-prone, non-trivial task which can be avoided by using Algorithmic Differentiation (AD) software. Generating adjoint code by AD software has the downside of usually requiring a huge amount of memory as well as a non-optimal run time. In this article, we couple two approaches to achieve both a robust and an efficient adjoint code: symbolically derived adjoint formulations are coupled with AD. Comparisons are carried out for a real-world case study originating from the remote atmospheric sensing simulation software JURASSIC developed at the Institute of Energy and Climate Research – Stratosphere, Research Center Jülich. We show that the coupled approach outperforms the fully algorithmic approach by AD in terms of run time and memory requirement, and argue that this can be achieved while still preserving the desirable feature of AD being automatic.
Johannes Lotz, Marc Schwalbach, Uwe Naumann
80 Tuning the Computation of the Interpolation Operator in an Algebraic Multigrid Solver [abstract]
Abstract: In this paper, we discuss strategies for computing subsets of eigenvectors of matrices corresponding to subdomains of finite element meshes, achieving a compromise between two contradictory goals. The subset of eigenvectors is required in the construction of coarse spaces used in algebraic multigrid (AMG) methods as well as in certain domain decomposition (DD) methods. The quality of the coarse spaces depends on the number of eigenvectors, which improves the approximation properties of the coarse space and impacts the overall performance and convergence of the associated AMG or DD algorithms. However, a large number of eigenvectors negatively affects the sparsity of the corresponding coarse matrices, which can become fairly dense. The sparsity of the coarse matrices can be controlled to a certain extent by the size of the subdomains (unions of finite elements) referred to as agglomerates. If the size of the agglomerates is too large, then the cost of the eigensolvers increases and eventually can become unacceptable for the purpose of constructing the AMG or DD solvers. This paper investigates strategies to optimize the solution of the partial eigenproblems of interest. In particular, we examine direct and iterative eigensolvers for computing those subsets. Our experiments with synthetic meshes and with a well-known model of an oil-reservoir simulation benchmark indicate that iterative eigensolvers can lead to significant improvements in the overall performance of an AMG solver that exploits such spectral construction of coarse spaces.
Osni Marques, Alex Druinsky, Xiaoye Li, Andrew Barker, Delyan Kalchev, Panayot Vassilevski
413 Induced Dimension Reduction method for solving linear matrix equations [abstract]
Abstract: This paper discusses the solution of large-scale linear matrix equations using the Induced Dimension Reduction method (IDR(s)). IDR(s) was originally presented to solve systems of linear equations, and is based on the IDR(s) theorem. We generalize the IDR(s) theorem to solve linear problems in any finite-dimensional space. This generalization allows us to develop IDR(s) algorithms to approximate the solution of linear matrix equations. The IDR(s) method presented here has two main advantages: first, it does not require the computation of the inverse of any matrix, and second, it allows the incorporation of preconditioners. Additionally, we present a simple preconditioner to solve the Sylvester equation based on a fixed-point iteration. Several numerical examples illustrate the performance of IDR(s) for solving linear matrix equations. We also present the software implementation.
Reinaldo Astudillo, Martin van Gijzen
134 A Cylindrical Basis Function for Solving Partial Differential Equations on Manifolds [abstract]
Abstract: The numerical solution of partial differential equations (PDEs) on manifolds continues to generate a lot of interest among scientists in the natural and applied sciences. On the other hand, recent developments in 3D scanning and computer vision technologies have produced a large number of 3D surface models represented as point clouds. Herein, we develop a simple and efficient method for solving PDEs on closed surfaces represented as point clouds. By projecting the radial vector of standard radial basis function (RBF) kernels onto the local tangent plane, we are able to produce a representation of functions that permits the replacement of surface differential operators with their Cartesian equivalents. We demonstrate, numerically, the efficiency of the method in discretizing the Laplace-Beltrami operator.
Emmanuel Asante-Asamani, Lei Wang, Zeyun Yu
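
One way to read the construction sketched in the abstract (an illustrative assumption, not the paper's exact operator) is an RBF expansion whose radial argument is the projection of the usual radial vector onto the local tangent plane:

```latex
s(\mathbf{x}) = \sum_{j} c_j\, \phi\big( \| P_j (\mathbf{x} - \mathbf{x}_j) \| \big),
\qquad
P_j = I - \mathbf{n}_j \mathbf{n}_j^{\mathsf{T}},
\qquad
\phi(r) = e^{-(\varepsilon r)^2},
```

where n_j is the surface normal estimated from the point cloud at the center x_j; because the projected argument lives in the tangent plane, Cartesian derivatives of the kernel can stand in for the surface differential operators.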

ICCS 2016 Main Track (MT) Session 6

Time and Date: 16:20 - 18:00 on 7th June 2016

Room: KonTiki Ballroom

Chair: Jianwu Wang

295 Finite Element Model for Brittle Fracture and Fragmentation [abstract]
Abstract: A new computational model for brittle fracture and fragmentation has been developed based on finite element analysis of non-linear elasticity equations. The proposed model propagates cracks by splitting the mesh nodes alongside the most over-strained edges based on the principal direction of the strain tensor. To prevent elements from overlapping and folding under large deformations, robust geometrical constraints using the method of Lagrange multipliers have been incorporated. The model has been applied to 2D simulations of the formation and propagation of cracks in brittle materials, and to the fracture and fragmentation of stretched and compressed materials.
Wei Li, Tristan Delaney, Xiangmin Jiao, Roman Samulyak, Cao Lu
350 Aggressive Pruning Strategy for Time Series Retrieval Using a Multi-Resolution Representation Based on Vector Quantization Coupled with Discrete Wavelet Transform [abstract]
Abstract: Time series representation methods are widely used to handle time series data by projecting them onto low-dimensional spaces where queries are processed. Multi-resolution representation methods speed up the similarity search process by using pre-computed distances, which are calculated and stored at the indexing stage and then used at the query stage together with filters in the form of exclusion conditions. In this paper we present a new multi-resolution representation method that combines the Haar wavelet-based multi-resolution method with vector quantization to maximize the pruning power of the similarity search algorithm. The new method is validated through extensive experiments on different datasets from several time series repositories. The results obtained prove the efficiency of the new method.
Muhammad Marwan Muhammad Fuad
392 Integration of Randomized Sketches for Singular Value Decomposition and Principal Component Analysis [abstract]
Abstract: Low-rank singular value decomposition (SVD) and principal component analysis (PCA) of large-scale matrices are key tools in modern data analytics and scientific computing. Rapidly growing matrix sizes further increase the need for efficient large-scale SVD algorithms and pose challenges for their development. Random sketching is a promising method to reduce the problem size before computing an approximate SVD. We generalize the one-time sketching to multiple random sketches and develop algorithms to integrate these random sketches, which contain various subspace information from different randomizations. Such an integration procedure can lead to an SVD with higher accuracy, and the multiple randomizations can be conducted on parallel computers simultaneously. We also reveal the insights and analyze the performance of the proposed algorithms from statistical and geometric viewpoints. Numerical results are presented and discussed to demonstrate the efficiency of the proposed algorithms. This is joint work with Ting-Li Chen and Su-Yun Huang at the Institute of Statistical Science, Academia Sinica, and David Chang, Hung Chen, and Chen-Yao Lin at the Institute of Applied Mathematical Sciences, National Taiwan University.
Weichung Wang
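
The "one-time sketching" baseline that the paper generalizes is the standard randomized SVD; a minimal NumPy sketch is shown below (the integration of multiple sketches proposed in the paper is not reproduced here).

```python
# Single-sketch randomized SVD: compress the column space with a random
# test matrix, orthonormalize, then take a small dense SVD on the projection.
import numpy as np

def randomized_svd(A, rank, oversample=10, seed=None):
    rng = np.random.default_rng(seed)
    m, n = A.shape
    Omega = rng.standard_normal((n, rank + oversample))  # random sketch
    Y = A @ Omega
    Q, _ = np.linalg.qr(Y)                               # basis of the sketch
    B = Q.T @ A                                          # small projected matrix
    Ub, s, Vt = np.linalg.svd(B, full_matrices=False)
    U = Q @ Ub
    return U[:, :rank], s[:rank], Vt[:rank, :]

# Low-rank test matrix: a rank-50 approximation should be near exact.
A = np.random.rand(2000, 40) @ np.random.rand(40, 1000)
U, s, Vt = randomized_svd(A, rank=50, seed=0)
print(np.linalg.norm(A - (U * s) @ Vt) / np.linalg.norm(A))  # tiny relative error
```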

ICCS 2016 Main Track (MT) Session 7

Time and Date: 10:15 - 11:55 on 8th June 2016

Room: KonTiki Ballroom

Chair: Vassil Alexandrov

162 On Parallel Induction of Nondeterministic Finite Automata [abstract]
Abstract: The paper discusses the problem of the induction of a minimal nondeterministic finite automaton (NFA) consistent with a given set of examples and counterexamples. The main contribution of the paper is the proposal of an efficient parallel algorithm transforming the NFA induction problem into a family of constraint satisfaction problems (CSP). Two original techniques for fast CSP evaluation are proposed and discussed. The efficacy of the parallel algorithm is evaluated experimentally on selected benchmarks. The proposed algorithm solves all analyzed benchmark examples, the so-called Tomita languages, in under half a minute, which should be considered an important achievement, as compared to our previous efforts which required minutes or hours to solve some of the aforementioned benchmarks.
Tomasz Jastrzab
168 On Optimizing Aluminum Extrusion Yield [abstract]
Abstract: The purpose of this research is to develop a decision-making tool to select aluminum billet size as well as to plan the extrusion operation to meet the required profile length with minimum waste. The decision-making scheme developed includes practical constraints and considerations in the extrusion operation which are not typically considered by the optimization community. To address the variations in aluminum density, extrusion experiments to investigate the behavior of Al 6063 are performed on various cross-sectional areas (products) with different process parameters as well as desired lengths. The range of aluminum density is determined, and the range of weight per length of the new aluminum profile is predicted from the curve-fitting scheme. The maximum density is used in the optimization tool to ensure that the extruded profile is always sufficient for the number of workpieces determined. The billet size can be reduced incrementally if the extruded profile is longer than expected. Even with modest problem sizes, solving the ILP formulation is time-consuming, and solutions of the non-linear integer program for the case of multiple billets per extrusion are not obtainable. The optimal solutions from the ILP are therefore compared with the solutions from the heuristic algorithms developed. For the online billet cutting case, it is found that the solutions from the ILP usually call for the use of more billet sizes than those obtained from the heuristic algorithm. However, the solutions from the ILP lead to better material utilization than those from the algorithm by 1-4%. The tradeoff between cutting and extruding more billets and the 4% difference in waste should be examined by the end user. In the standard billet size case, the heuristic algorithms perform well compared to solving the ILP: the solutions from both sources are almost identical. Solutions obtained from both the ILP and heuristic methods are equal to or better than the manual calculation employed by the company. The actual implementation of the program developed is now underway. It is expected that the solution methodology developed in this research can improve material utilization by up to 10% over the current approach (manual calculation) in some cases.
Jaramporn Hassamontr and Theera Leebhaicharoen
172 Acceleration and Parallelization of ZENO/Walk-on-Spheres [abstract]
Abstract: This paper describes our on-going work to accelerate ZENO, a software tool based on Monte Carlo methods (MCMs), used for computing material properties at the nanoscale. ZENO employs three main algorithms: (1) Walk on Spheres (WoS), (2) interior sampling, and (3) surface sampling. We have accelerated the first two algorithms. For the sake of brevity, the paper will discuss our work on the first one only as it is the most commonly used and the acceleration techniques were similar in both cases. WoS is a Brownian motion MCM for solving a class of partial differential equations (PDEs). It provides a stochastic solution to a PDE by estimating the probability that a random walk, which started at infinity, will hit the surface of the material under consideration. WoS is highly effective when the problem's geometry is additive, as this greatly reduces the number of walk steps needed to achieve accurate results. The walks start on the surface of an enclosing sphere and can make much larger jumps than in a direct simulation of Brownian motion. Our current implementation represents the molecular structure of nanomaterials as a union of possibly overlapping spheres. The core processing bottleneck in WoS is a Computational Geometry one, as the algorithm repeatedly determines the distance from a query point to the material surface in each step of the random walk. In this paper, we present results from benchmarking spatial data structures, including several open-source implementations of k-D trees, for accelerating WoS algorithmically. The paper also presents results from our multicore and cluster parallel implementation to show that it exhibits linear strong scaling with the number of cores and compute nodes; this implementation delivers up to 4 orders of magnitude speedup compared to the original FORTRAN code when run on 8 nodes (each with dual 6-core Intel Xeon CPUs) with 24 threads per node.
Derek Juba, Walid Keyrouz, Michael Mascagni, Mary Brady
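
A simplified Walk-on-Spheres sketch for a union-of-spheres geometry is shown below to make the inner loop concrete: each step computes the distance from the current point to the surface and jumps that far in a random direction. The escape handling (uniform replacement on the launch sphere with return probability R/r) is a simplification for illustration; it is not ZENO's production algorithm.

```python
# Walk-on-Spheres sketch: estimate the probability that a Brownian walker
# launched from a bounding sphere hits a union-of-spheres surface rather
# than escaping to infinity.
import numpy as np

rng = np.random.default_rng(0)

def dist_to_union(p, centers, radii):
    """Distance from point p to the surface of a union of spheres."""
    return np.min(np.linalg.norm(centers - p, axis=1) - radii)

def random_unit_vector():
    v = rng.standard_normal(3)
    return v / np.linalg.norm(v)

def hit_probability(centers, radii, n_walks=5000, eps=1e-4):
    launch_R = np.max(np.linalg.norm(centers, axis=1) + radii) * 1.5
    hits = 0
    for _ in range(n_walks):
        p = launch_R * random_unit_vector()          # start on the launch sphere
        while True:
            d = dist_to_union(p, centers, radii)
            if d < eps:                              # walker reached the surface
                hits += 1
                break
            r = np.linalg.norm(p)
            if r > launch_R:
                # Outside the launch sphere: return with probability R/r,
                # otherwise the walker escapes to infinity.  (Uniform
                # replacement is an approximation; the exact method uses
                # the harmonic measure on the launch sphere.)
                if rng.random() > launch_R / r:
                    break
                p = launch_R * random_unit_vector()
                continue
            p = p + d * random_unit_vector()         # jump on a sphere of radius d
    return hits / n_walks

centers = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])  # two overlapping spheres
radii = np.array([1.0, 1.0])
print(hit_probability(centers, radii))
```

The distance query inside the loop (here a brute-force minimum over all spheres) is exactly the computational-geometry bottleneck that the paper accelerates with spatial data structures such as k-D trees.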
358 Permutation-based Recombination Operator to Node-Depth Encoding [abstract]
Abstract: The node-depth encoding is a representation for evolutionary algorithms applied to tree problems. It represents trees by storing the nodes and their depths in a properly ordered list. The original formulation of the node-depth encoding has only mutation operators as the search mechanism. Although the representation has this restriction, it has obtained good results with low convergence. This work proposes a specific recombination operator to improve the convergence of the node-depth encoding representation. These operators are based on recombination for permutation representations. An investigation into the bias and heritability of the proposed recombination operator shows that it has a bias towards stars and low heritability. The performance of node-depth encoding with the proposed operator is investigated for the optimal communication spanning tree problem. The results are presented for benchmark instances in the literature. The use of the recombination operator results in a faster convergence than with only mutation operators.
Telma de Lima, Alexandre Delbem, Roney Lopes Lima, Gustavo Post Sabin, Marcos Antônio Almeida de Oliveira
403 Assessing metaheuristics by means of random benchmarks [abstract]
Abstract: Typically, the performance of swarm and evolutionary methods is assessed by comparing their results when applied to some known finite benchmarks. In general, these metaheuristics depend on many parameter values and many possible exchangeable sub-steps, which yields a huge number of possible algorithm configurations. In this paper we argue that this high setup versatility lets developers expressively tune the method, in an ad-hoc way, to the target inputs to be solved, and hence to those in the benchmark under consideration. However, this does not imply properly solving any other input not considered in the benchmark. Several subtle ways to support that tuning (which can be consciously noticed by the developer, but can also be unconscious) are presented and discussed in the paper. Moreover, as a possible alternative to using known finite benchmarks, we discuss the pros and cons of using random input generators, and we illustrate how to use such generators in a specific problem, MAX-3SAT. A general protocol to support the fair development of comparisons of metaheuristics based on random input generators is presented.
Pablo Rabanal, Ismael Rodriguez, Fernando Rubio
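
A minimal random MAX-3SAT instance generator of the kind the paper advocates (uniform random clauses over distinct variables; the authors' own generator and evaluation protocol may differ) can be sketched as follows.

```python
# Generate random MAX-3SAT instances and score an assignment against them.
import random

def random_max3sat(num_vars, num_clauses, seed=None):
    rng = random.Random(seed)
    instance = []
    for _ in range(num_clauses):
        vars_ = rng.sample(range(1, num_vars + 1), 3)   # three distinct variables
        clause = tuple(v if rng.random() < 0.5 else -v for v in vars_)
        instance.append(clause)
    return instance

def num_satisfied(instance, assignment):
    """Count satisfied clauses; assignment maps variable -> bool."""
    return sum(any((lit > 0) == assignment[abs(lit)] for lit in clause)
               for clause in instance)

inst = random_max3sat(num_vars=20, num_clauses=91, seed=42)
assignment = {v: random.random() < 0.5 for v in range(1, 21)}
print(num_satisfied(inst, assignment), "of", len(inst), "clauses satisfied")
```

Because fresh instances can be drawn for every comparison, a metaheuristic cannot be tuned to a fixed benchmark set, which is the fairness argument made in the abstract.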

ICCS 2016 Main Track (MT) Session 8

Time and Date: 10:35 - 12:15 on 6th June 2016

Room: Toucan

Chair: Jack Dongarra

12 Identifying the Sport Activity of GPS Tracks [abstract]
Abstract: The wide propagation of devices, such as mobile phones, that include a global positioning system (GPS) sensor has popularised the storing of geographic information for different kinds of activities, many of them recreational, such as sport. Extracting and learning knowledge from GPS data can provide useful geographic information that can be used for the design of novel applications. In this paper we address the problem of identifying the sport from a GPS track that is recorded during a sport session. For that purpose, we store 8500 GPS tracks from ten different kinds of sports. We extract twelve features that are able to represent the activity that was recorded in a GPS track. From these features several models are induced by diverse machine learning classification techniques. We study the problem from two different perspectives: flat classification, i.e., models classify the track into one of the ten possible sport types; and hierarchical classification, i.e., given the high number of classes and the structure of the problem, we induce a hierarchy over the classes and address the problem as a hierarchical classification problem. For this second framework, we analyse three different approaches. According to our results, multiclassifier systems based on decision trees obtain the best performance in both scenarios.
Cesar Ferri Ramírez
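
To make the feature-extraction step concrete, the sketch below computes a few plausible track-level features (distance, speed, climb) from raw GPS points; the paper's actual twelve features are not listed in the abstract, so these particular features are illustrative assumptions.

```python
# Simple track-level features from a time-ordered list of GPS points.
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points."""
    r = 6371000.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = math.radians(lat2 - lat1), math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def track_features(points):
    """points: list of (timestamp_s, lat, lon, elevation_m), time-ordered."""
    dists = [haversine_m(a[1], a[2], b[1], b[2]) for a, b in zip(points, points[1:])]
    dts = [b[0] - a[0] for a, b in zip(points, points[1:])]
    speeds = [d / dt for d, dt in zip(dists, dts) if dt > 0]
    climb = sum(max(b[3] - a[3], 0.0) for a, b in zip(points, points[1:]))
    return {
        "total_distance_m": sum(dists),
        "duration_s": points[-1][0] - points[0][0],
        "mean_speed_ms": sum(speeds) / len(speeds) if speeds else 0.0,
        "max_speed_ms": max(speeds) if speeds else 0.0,
        "total_climb_m": climb,
    }
```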
13 Wind-sensitive interpolation of urban air pollution forecasts [abstract]
Abstract: People living in urban areas are exposed to outdoor air pollution. Air contamination is linked to numerous premature and pre-natal deaths each year. Urban air pollution is estimated to cost approximately 2% of GDP in developed countries and 5% in developing countries. Some works reckon that vehicle emissions produce over 90% of air pollution in cities in these countries. This paper presents some results in predicting and interpolating real-time urban air pollution forecasts for the city of Valencia in Spain. Although many cities provide air quality data, in many cases this information is presented with significant delays (three hours for the city of Valencia) and it is limited to the area where the measurement stations are located. We compare several regression models able to predict the levels of four different pollutants (NO, NO2, SO2, O3) in six different locations of the city. Wind strength and direction are key features in the propagation of pollutants around the city; in this sense, we study different techniques to incorporate this factor into the regression models. Finally, we also analyse how to interpolate forecasts all around the city. Here, we propose an interpolation method that takes wind direction into account. We compare this proposal with respect to well-known interpolation methods. By using these contamination estimates, we are able to generate a real-time pollution map of the city of Valencia.
Lidia Contreras-Ochando, Cesar Ferri
66 Optimal Customer Targeting for Sustainable Demand Response in Smart Grids [abstract]
Abstract: Demand Response (DR) is a widely used technique to minimize the peak to average consumption ratio during high demand periods. We consider the DR problem of achieving a given curtailment target for a set of consumers equipped with a set of discrete curtailment strategies over a given duration. An effective DR scheduling algorithm should minimize the curtailment error - the difference between the targeted and achieved curtailment values - to minimize costs to the utility provider and maintain system reliability. The availability of smart meters with fine-grained customer control capability can be leveraged to offer customers a dynamic range of curtailment strategies that are feasible for small durations within the overall DR event. Both the availability and achievable curtailment values of these strategies can vary dynamically through the DR event and thus the problem of achieving a target curtailment over the entire DR interval can be modeled as a dynamic strategy selection problem over multiple discrete sub-intervals. We argue that DR curtailment error minimizing algorithms should not be oblivious to customer curtailment behavior during sub-intervals as (expensive) demand peaks can be concentrated in a few sub-intervals while consumption is heavily curtailed during others in order to achieve the given target, which makes such solutions expensive for the utility. Thus in this paper, we formally develop the notion of Sustainable DR (SDR) as a solution that attempts to distribute the curtailment evenly across sub-intervals in the DR event. We formulate the SDR problem as an Integer Linear Program and provide a very fast √2-factor approximation algorithm. We then propose a Polynomial Time Approximation Scheme (PTAS) for approximating the SDR curtailment error to within an arbitrarily small factor of the optimal. We then develop a novel ILP formulation that solves the SDR problem while explicitly accounting for customer strategy switching overhead as a constraint. We perform experiments using real data acquired from the University of Southern California’s smart grid and show that our sustainable DR model achieves results with a very low absolute error of 0.001-0.05 kWh range.
Sanmukh R. Kuppannagari, Rajgopal Kannan, Charalampos Chelmis, Arash S Tehrani, Viktor K Prasanna
366 Influence of Charging Behaviour given Charging Station Placement at Existing Petrol Stations and Residential Car Park Locations in Singapore [abstract]
Abstract: Electric Vehicles (EVs) are set to play a crucial role in making transportation systems more sustainable. However, charging infrastructure needs to be built up before EV adoption can increase. A crucial factor that is ignored in most existing studies of optimal charging station (CS) deployment is the role played by the charging behaviour of drivers. In this study, through an agent-based traffic simulation, we analyse the impact of different driver charging behaviours under the assumption that CSs are placed at existing petrol stations and residential car park locations in Singapore. Three models are implemented: a simple model with a charging threshold and two more sophisticated models where the driver takes the current trip distance and existing CS locations into account. We analyse the performance of these three charging behaviours with respect to a number of different measures. Results suggest that charging behaviours do indeed have a significant impact on the simulation outcome. We also discover that the sensitivity of the model parameters in each charging behaviour is an important factor to consider, as variations in model parameters can lead to significantly different results.
Ran Bi, Jiajian Xiao, Vaisagh Viswanathan, Alois Knoll
222 Crack Detection in Earth Dam and Levee Passive Seismic Data Using Support Vector Machines [abstract]
Abstract: We investigate techniques for earth dam and levee health monitoring and automatic detection of anomalous events in passive seismic data. We have developed a novel data-driven workflow that uses machine learning and geophysical data collected from sensors located on the surface of the levee to identify internal erosion events. In this paper, we describe our research experiments with binary and one-class Support Vector Machines (SVMs). We used experimental data from a laboratory earth embankment (80% normal and 20% anomalies) and extracted nine spectral features from decomposed segments of the time series data. The two-class SVM with 10-fold cross validation achieved over 97% accuracy. Experiments with the one-class SVM use the top two features selected by the ReliefF algorithm and our results show that we can successfully separate normal from anomalous data observations with over 83% accuracy.
Wendy Fisher, Tracy Camp, Valeria Krzhizhanovskaya
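
A minimal scikit-learn sketch of the one-class SVM setup described above follows; the synthetic feature matrices, kernel choice and nu value are illustrative assumptions, not the paper's laboratory data or tuned parameters.

```python
# One-class SVM anomaly detection: fit on normal observations only,
# then flag test observations that fall outside the learned region.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.svm import OneClassSVM

# X_normal: segments recorded under normal conditions (rows = segments,
# columns = two spectral features); X_test: unseen segments, some anomalous.
rng = np.random.default_rng(0)
X_normal = rng.normal(0.0, 1.0, size=(800, 2))
X_test = np.vstack([rng.normal(0.0, 1.0, size=(80, 2)),     # normal-like
                    rng.normal(4.0, 1.0, size=(20, 2))])    # anomalous-like

scaler = StandardScaler().fit(X_normal)
clf = OneClassSVM(kernel="rbf", nu=0.05, gamma="scale")
clf.fit(scaler.transform(X_normal))                # train on normal data only

pred = clf.predict(scaler.transform(X_test))       # +1 = normal, -1 = anomaly
print("flagged anomalies:", int((pred == -1).sum()), "of", len(X_test))
```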

ICCS 2016 Main Track (MT) Session 9

Time and Date: 14:30 - 16:10 on 6th June 2016

Room: Toucan

Chair: Ana Cortes

436 Identifying Venues for Female Commercial Sex Work Using Spatial Analysis of Geocoded Advertisements [abstract]
Abstract: Despite being widely visible on the web, Internet-promoted commercial sex work has so far attracted limited attention from researchers. Current studies outline the issues that new forms of sex work are associated with; however, very little is known to date about their spatial manifestation. In this research we follow the environmental perspective in the spatial analysis of crime and deviance, with the assumption that the location of venues for the provision of commercial sex work can be modeled via algorithms trained on the distribution of possible correlates in proximity to existing venues. Visualization of the acquired results is presented herein, along with the errors and score metrics for evaluating the applicability of specific machine learning methods. The paper concludes with an assessment of potential extensions and of peculiarities of the data used in the research.
Daniil Voloshin, Ivan Derevitskiy, Ksenia Mukhina, Vladislav Karbovskii
21 RTPMF: Leveraging User and Message Embeddings for Retweeting Behavior Prediction [abstract]
Abstract: Understanding the retweeting mechanism and predicting retweeting behavior is an important and valuable task in user behavior analysis. In this paper, aiming to provide a general method for improving retweeting behavior prediction performance, we propose a probabilistic matrix factorization model (RTPMF) incorporating user social network information and message semantic relationships. The contributions of this paper are three-fold: (1) We cast the problem of predicting user retweeting behavior as a probabilistic matrix factorization problem; (2) Following the intuition that user social network relationships will affect retweeting behavior, we extensively study how to model social information to improve the prediction performance; and (3) We also incorporate message semantic embeddings to constrain the objective function by making full use of additional content-based and structure-based features of the messages. The empirical results and analysis demonstrate that our method significantly outperforms the state-of-the-art approaches.
Jiguang Liang, Bo Jiang, Rongchao Yin, Chonghua Wang, Jianlong Tan, Shuo Bai
75 Leveraging Latent Sentiment Constraint in Probabilistic Matrix Factorization for Cross-domain Sentiment Classification [abstract]
Abstract: Sentiment analysis is concerned with classifying a subjective text as positive or negative according to the opinion expressed in it. The performance of traditional sentiment classification algorithms relies heavily on manually labeled training data. However, not every domain has labeled data, because the labeling work is time-consuming and expensive. In this paper, we propose a latent sentiment factorization (LSF) algorithm based on the probabilistic matrix factorization technique for cross-domain sentiment classification. LSF works in the setting where there are only labeled data in the source domain and unlabeled data in the target domain. It bridges the gap between domains by exploiting the sentiment correlations between domain-shared and domain-specific words in a two-dimensional sentiment space. Experimental results demonstrate the superiority of our method over the state-of-the-art approaches.
Jiguang Liang, Kai Zhang, Xiaofei Zhou, Yue Hu, Jianlong Tan, Shuo Bai
91 Identifying Users across Different Sites using Usernames [abstract]
Abstract: Identifying users across different sites means finding the accounts that belong to the same individual. The problem is fundamental and important, and its results can benefit many applications such as social recommendation. Observing that 1) usernames are essential elements for all sites, 2) most users have a limited number of usernames on the Internet, and 3) usernames carry information that reflects an individual's characteristics, habits, etc., this paper identifies users based on username similarity. Specifically, we introduce the self-information vector model to integrate our proposed content and pattern features extracted from usernames into vectors. We define the similarity of two usernames as the cosine similarity between their self-information vectors. We further propose an abbreviation detection method to discover the initialism phenomenon in usernames, which improves our user identification results. Experimental results on real-world username sets show that we achieve 86.19% precision, 68.53% recall and a 76.21% F1-measure on average, which is better than the state-of-the-art work.
Yubin Wang, Tingwen Liu, Qingfeng Tan, Jinqiao Shi, Li Guo
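A minimal, hypothetical illustration of the similarity computation described above follows: usernames are mapped to character-bigram vectors weighted by self-information and compared with cosine similarity. The actual content and pattern features, and the abbreviation detection step, are simplified away.

```python
# Cosine similarity of self-information weighted bigram vectors (illustrative only).
import math
from collections import Counter

def bigrams(name):
    return [name[i:i + 2] for i in range(len(name) - 1)]

def self_information_weights(corpus):
    counts = Counter(b for name in corpus for b in set(bigrams(name)))
    n = len(corpus)
    # I(f) = -log2 P(f): rarer bigrams carry more information.
    return {b: -math.log2(c / n) for b, c in counts.items()}

def vectorize(name, weights):
    return {b: weights.get(b, 0.0) for b in set(bigrams(name))}

def cosine(v1, v2):
    dot = sum(v1[b] * v2.get(b, 0.0) for b in v1)
    n1 = math.sqrt(sum(x * x for x in v1.values()))
    n2 = math.sqrt(sum(x * x for x in v2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0

corpus = ["alice_w", "alicew88", "bob_the_builder", "carol", "alice.w"]
w = self_information_weights(corpus)
print(cosine(vectorize("alice_w", w), vectorize("alicew88", w)))         # similar pair
print(cosine(vectorize("alice_w", w), vectorize("bob_the_builder", w)))  # dissimilar pair
```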
441 A Hybrid Human-Computer Approach to the Extraction of Scientific Facts from the Literature [abstract]
Abstract: A wealth of valuable data is locked within the millions of research articles published each year. Reading and extracting pertinent information from those articles has become an unmanageable task for scientists. This problem hinders scientific progress by making it hard to build on results buried in literature. Moreover, these data are loosely structured, encoded in manuscripts of various formats, embedded in different content types, and are, in general, not machine accessible. We present a hybrid human-computer solution for semi-automatically extracting scientific facts from literature. This solution combines an automated discovery, download, and extraction phase with a semi-expert crowd assembled from students to extract specific scientific facts. To evaluate our approach we apply it to a particularly challenging molecular engineering scenario, extraction of a polymer property: the Flory-Huggins interaction parameter. We demonstrate useful contributions to a comprehensive database of polymer properties.
Roselyne Tchoua, Kyle Chard, Debra Audus, Jian Qin, Juan de Pablo, Ian Foster

ICCS 2016 Main Track (MT) Session 10

Time and Date: 16:40 - 18:20 on 6th June 2016

Room: Toucan

Chair: Ian Foster

453 An Exploratory Sentiment and Facial Expressions Analysis of Data from Photo-sharing Social Media: The Case of Football Violence [abstract]
Abstract: In this article we explore the possibility of increasing the level of security during football matches through the analysis of data posted on social networks by visitors to these events. We consider different ways to recognize emotions from photographs and to evaluate the tone of texts, in order to trace how emotion levels in the photos change depending on whether the pictures were taken during a game with fights in the stands or during a normal game. We tested this assumption and our hypothesis is partially confirmed. The emotion-recognition software from Microsoft Oxford showed that the level of anger is almost 5 times higher in the photographs taken during the match with fights. In addition, other curious results were obtained, including from an analysis of the key words of the comments left on event visitors' photos.
Vasiliy Boychuk, Kirill Sukharev, Daniil Voloshin, Vladislav Karbovskii
86 Hybrid Computational Steering for Dynamic Data-Driven Application Systems [abstract]
Abstract: We consider steering of Dynamic Data-Driven Application Systems from two sources: firstly from dynamic data, and secondly via human intervention to change parameters of the system. We propose an architecture for such hybrid steering and identify a Time Manager as an essential component. We perform experiments on an actual realisation of such a system, modelling a water distribution network, to show how the parameters of the Time Manager can be determined.
Junyi Han, John Brooke
298 Error Function Impact in Dynamic Data-Driven Framework Applied to Forest Fire Spread Prediction [abstract]
Abstract: In order to use environmental models effectively for management and decision-making, it is vital to establish an appropriate level of confidence in their performance. There are different ways and different methodologies to establish the confidence of the models. For this reason, an adequate error formula is very important, because the results of the model can vary substantially. In this paper, we focus on forest fire spread prediction. Several models have been developed to determine forest fire propagation. Simulators implementing such models require diverse input parameters to deliver predictions about fire propagation. However, the data describing the actual scenario where the fire is taking place are usually subject to high levels of uncertainty. In order to minimize the impact of the input-data uncertainty, a two-stage methodology was developed that calibrates the input parameters in an adjustment stage, so that the calibrated parameters can be used in the prediction stage to improve the quality of the predictions. It is in the adjustment stage where the error formula plays a crucial role, because different formulas imply different adjustments and, in consequence, different spread predictions. In this paper, different formulas are compared to show their impact on prediction quality in a DDDAS for forest fire spread prediction. These formulas have been tested using a real forest fire that took place in Arkadia (Greece) in 2011.
Carlos Carrillo, Tomàs Artés, Ana Cortes, Tomàs Margalef
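As one concrete example of an error formula of the kind being compared above, the sketch below computes the normalized symmetric difference between a simulated and a real burned-area map on a raster of cells; this is illustrative only and is not claimed to be a formula adopted in the paper.

```python
# Normalized symmetric-difference error between simulated and real burned cells.
import numpy as np

def symmetric_difference_error(simulated, real):
    """simulated, real: boolean arrays marking burned cells."""
    union = np.logical_or(simulated, real).sum()
    intersection = np.logical_and(simulated, real).sum()
    return (union - intersection) / real.sum()

real = np.zeros((10, 10), dtype=bool); real[2:7, 2:7] = True   # reference burned area
sim = np.zeros((10, 10), dtype=bool); sim[3:8, 2:7] = True     # shifted simulated area
print("error:", symmetric_difference_error(sim, real))
```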
456 Data-driven Forecasting Paradigms for Wildland Fires using the CAWFE® modeling system and Fire Detection Data [abstract]
Abstract: Large wildfires can cover hundreds of thousands of acres and continue for months, varying in intensity as they encounter different environmental conditions, which may vary dramatically in time and space during a single fire. They can produce extreme behaviors such as fire whirls, blow-ups, bursts of flame along the surface, and winds ten times stronger than ambient conditions, all of which result from the interactions between a fire and its atmospheric environment and are beyond the capabilities of current operational tools. Coupled weather-wildland fire models tie numerical weather prediction models to wildland fire behavior modules to simulate the impact of a fire on the atmosphere and the subsequent feedback of these fire-induced winds on fire behavior, i.e. how a fire “creates its own weather”. The methodology uses one such coupled model, the Coupled Atmosphere-Wildland Fire Environment (CAWFE®) Model, which contains two-way coupling between two components: (1) a numerical weather prediction model formulated for, and with numerical methods optimized for, simulating airflow at scales of hundreds of meters in very complex terrain, and (2) a wildland fire component that is based upon semi-empirical relationships for surface fire rate of spread, post-frontal heat release, and a canopy fire model. The fire behavior is coupled to the atmospheric model such that low-level winds drive the spread of the surface fire, which in turn releases sensible heat, latent heat, and smoke fluxes into the lower atmosphere, in turn feeding back to affect the winds directing the fire. CAWFE has been used to explain basic examples of fire behavior and, in retrospective simulations, to reproduce large wildland fire events. Over a wide range of conditions, model results show rough agreement in area, shape, and direction of spread at periods for which fire location data are available; additional events unique to each fire such as locations of sudden acceleration, flank runs up canyons, and bifurcations of a fire into two heads; and locations favorable to the formation of phenomena such as fire whirls and horizontal roll vortices. The duration of such events poses a prediction challenge, as meteorological models lose skill over time after initialization, firefighting may impact the fire, and processes such as spotting, in which burning embers are lofted ahead of the fire, are not readily represented with deterministic models. Moreover, validation data for such models are limited, and fire mapping and monitoring have been done piecemeal with infrared imaging sensors producing 12-hourly maps of active fires with nominal 1 km pixels, complemented by sub-hourly observations from geostationary satellites at coarser resolution and other valuable but non-routine tools such as airborne infrared mapping. Thus, in recent work, CAWFE has been integrated with spatially refined (375 m) satellite active fire data derived from the Visible Infrared Imaging Radiometer Suite (VIIRS), which is used for initialization of a wildfire already in progress in the model and for evaluation of its simulated progression at the time of the next pass. This work develops and applies Dynamic Data System techniques to create innovative approaches to wildfire growth forecasting based on a more symbiotic data-model system.
Janice L. Coen and Wilfrid Schroeder
151 D-STHARk: Evaluating Dynamic Scheduling of Tasks in Hybrid Simulated Architectures [abstract]
Abstract: The emergence of applications that demand to handle efficiently growing amounts of data has stimulated the development of new computing architectures with several Processing Units (PUs), such as CPU cores, graphics processing units (GPUs) and Intel Xeon Phi (MIC). Aiming to better exploit these architectures, recent works focus on proposing novel runtime environments that offer a variety of methods for scheduling tasks dynamically on different PUs. A main limitation of such proposals is the constrained system configurations usually adopted to tune and test them, since setting up more complete and diversified evaluation environments is costly. In this context, we present D-STHARk, a GUI tool for evaluating Dynamic Scheduling of Tasks in Hybrid Simulated ARchitectures. D-STHARk provides a complete simulated execution environment that allows evaluating dynamic scheduling strategies on simulated applications and hybrid architectures. We evaluate our tool by simulating the dynamic scheduling strategies presented in [sbac2014], using the same architecture and application. D-STHARk was able to reach the same conclusions originally reported by the authors. Moreover, we performed an experiment varying the number of coprocessors, which was not previously verified due to the lack of real architectures, showing that we may reduce the energy consumption while keeping the same performance.
Leonardo Rocha, Fernando Mourão, Guilherme Andrade, Renato Ferreira, Srinivasan Parthasarathy, Danilo Melo, Sávyo Toledo, Aniket Chakrabarti

ICCS 2016 Main Track (MT) Session 11

Time and Date: 10:15 - 11:55 on 7th June 2016

Room: Toucan

Chair: Raymond de Callafon

43 An Evaluation of Data Stream Processing Systems for Data Driven Applications [abstract]
Abstract: Real-time data stream processing technologies play an important role in enabling time-critical decision making in many applications. This paper aims at evaluating the performance of platforms that are capable of processing streaming data. Candidate technologies include Storm, Samza, and Spark Streaming. To form the recommendation, a prototype pipeline is designed and implemented in each of the platforms using data collected from sensors used in monitoring heavy-haul railway systems. Through the testing and evaluation of each candidate platform, using both quantitative and qualitative metrics, the paper describes the findings.
Jonathan Samosir, Maria Indrawan-Santiago, Pari Delir Haghighi
122 Improving Multivariate Data Streams Clustering [abstract]
Abstract: Clustering data streams is an important task in data mining research. Recently, some algorithms have been proposed to cluster data streams as a whole, but just a few of them deal with multivariate data streams. Even so, these algorithms merely aggregate the attributes without touching upon the correlation among them. In order to overcome this issue, we propose a new framework to cluster multivariate data streams based on their evolving behavior over time, exploring the correlations among their attributes by computing the fractal dimension. Experimental results with climate data streams show that the clusters' quality and compactness can be improved compared to the competing method, supporting the view that attribute correlations cannot be put aside. In fact, the clusters' compactness is 7 to 25 times better using our method. Our framework also proves to be a useful tool to assist meteorologists in understanding the climate behavior over a period of time.
Christian Bones, Luciana Romani, Elaine de Sousa
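To make the fractal-dimension idea concrete, the sketch below estimates a box-counting dimension for a set of multivariate observations; strongly correlated attributes yield a dimension well below the number of attributes. The streaming machinery and the exact estimator used by the framework are not reproduced here.

```python
# Box-counting estimate of fractal dimension (illustrative only).
import numpy as np

def box_counting_dimension(points, scales=(2, 4, 8, 16, 32)):
    """points: (n, d) array; dimension is the slope of log N(boxes) vs log(scale)."""
    pts = (points - points.min(axis=0)) / np.ptp(points, axis=0)  # normalize to unit cube
    logs_n, logs_s = [], []
    for s in scales:
        cells = np.clip(np.floor(pts * s).astype(int), 0, s - 1)
        occupied = len({tuple(c) for c in cells})
        logs_n.append(np.log(occupied))
        logs_s.append(np.log(s))
    slope, _ = np.polyfit(logs_s, logs_n, 1)
    return slope

rng = np.random.default_rng(1)
t = rng.uniform(0, 1, 2000)
correlated = np.column_stack([t, t + 0.01 * rng.standard_normal(2000)])
print("estimated dimension of a highly correlated 2-attribute set:",
      box_counting_dimension(correlated))   # close to 1, far below 2
```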
465 Network Services and Their Compositions for Network Science Applications [abstract]
Abstract: Network science is moving more and more to computing dynamics on networks (so-called contagion processes), in addition to computing structural network features (e.g., key players and the like) and other parameters. Generalized contagion processes impose additional data storage and processing demands that include more generic and versatile manipulations of networked data that can be highly attributed. In this work, we describe a new network services and workflow system called MARS that supports structural network analyses and generalized network dynamics analyses. It is accessible through the internet and can serve multiple simultaneous users and software applications. In addition to managing various types of digital objects, MARS provides services that enable applications (and UIs) to add, interrogate, query, analyze, and process data. We focus on several network services and workflows of MARS in this paper. We also provide a case study using a web-based application that MARS supports, and several performance evaluations of scalability and work loads. We find that MARS efficiently processes networks of hundreds of millions of edges from many hundreds of simultaneous users.
Sherif Abdelhamid, Chris Kuhlman, Madhav Marathe, S. S. Ravi

ICCS 2016 Main Track (MT) Session 12

Time and Date: 14:10 - 15:50 on 7th June 2016

Room: Toucan

Chair: Ryan Milkovits

109 Competing Energy Lookup Algorithms in Monte Carlo Neutron Transport Calculations and Their Optimization on CPU and Intel MIC Architectures [abstract]
Abstract: The Monte Carlo method is a common and accurate way to model neutron transport with minimal approximations. However, such a method is rather time-consuming due to its slow convergence rate. More specifically, the energy lookup process for cross sections can take up to 80% of overall computing time and therefore becomes an important performance hotspot. Several optimization solutions have already been proposed: unionized grid, hashing and fractional cascading methods. In this paper we revisit those algorithms for both CPU and manycore (Intel MIC) architectures and introduce vectorized versions. Tests are performed with the PATMOS Monte Carlo prototype, and algorithms are evaluated and compared in terms of time performance and memory usage. Results show that significant speedup can be achieved over the conventional binary search on both CPU and Intel MIC. Further optimization with vectorization instructions has proved very efficient on the Intel MIC architecture due to its 512-bit Vector Processing Unit (VPU); on CPU this improvement is limited by the smaller VPU width.
Yunsong Wang, Emeric Brun, Fausto Malvagi, Christophe Calvin
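As a toy illustration of why the lookup strategy matters, the sketch below (not the PATMOS code) contrasts the conventional binary search over the full energy grid with a hashing-style lookup that first maps the energy into a coarse bin and then searches only inside that bin; the unionized-grid and fractional-cascading variants, and the vectorized versions, are not shown.

```python
# Binary-search vs hashed energy lookup on a sorted cross-section energy grid.
import bisect
import numpy as np

energy_grid = np.sort(np.random.default_rng(2).uniform(1e-5, 20.0, 10_000))

def lookup_binary(e):
    # Index of the grid point just below e (conventional binary search).
    return bisect.bisect_right(energy_grid, e) - 1

# Hashing: precompute, for each coarse bin, the range of grid indices it covers.
n_bins = 256
bin_edges = np.linspace(energy_grid[0], energy_grid[-1], n_bins + 1)
bin_start = np.searchsorted(energy_grid, bin_edges[:-1], side="left")
bin_end = np.searchsorted(energy_grid, bin_edges[1:], side="right")

def lookup_hashed(e):
    b = min(int((e - bin_edges[0]) / (bin_edges[-1] - bin_edges[0]) * n_bins), n_bins - 1)
    lo, hi = bin_start[b], bin_end[b]
    # Search only the short sub-array covered by bin b.
    return lo + bisect.bisect_right(energy_grid[lo:hi], e) - 1

e = 7.3
print(lookup_binary(e), lookup_hashed(e))   # both return the same grid index
```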
280 An Ensemble Approach to Weak-Constraint Four-Dimensional Variational Data Assimilation [abstract]
Abstract: This article presents a framework for performing ensemble and hybrid data assimilation in a weak-constraint four-dimensional variational data assimilation system (w4D-Var). A practical approach is considered that relies on an ensemble of w4D-Var systems solved by the incremental algorithm to obtain flow-dependent estimates to the model error statistics. A proof-of-concept is presented in an idealized context using the Lorenz multi-scale model. A comparative analysis is performed between the weak- and strong-constraint ensemble-based methods. The importance of the weight coefficients assigned to the static and ensemble-based components of the error covariances is also investigated. Our preliminary numerical experiments indicate that an ensemble-based model error covariance specification may significantly improve the quality of the analysis.
Jeremy Shaw, Dacian Daescu
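For orientation, a generic weak-constraint 4D-Var cost function is written out below in its standard textbook form; this is not the specific incremental, ensemble-based formulation of the paper. The observation operators H_k, model operators M_k, and covariances B, R_k, Q_k are the usual notation, assumed here rather than taken from the paper.

```latex
% Generic weak-constraint 4D-Var cost function (standard form, illustrative only):
% the third sum penalizes model error, which strong-constraint 4D-Var omits.
\begin{align*}
J(x_0,\ldots,x_N) ={}& \tfrac{1}{2}\,(x_0 - x_b)^{\mathsf T} B^{-1} (x_0 - x_b) \\
 &+ \tfrac{1}{2}\sum_{k=0}^{N} \bigl(H_k(x_k) - y_k\bigr)^{\mathsf T} R_k^{-1} \bigl(H_k(x_k) - y_k\bigr) \\
 &+ \tfrac{1}{2}\sum_{k=1}^{N} \bigl(x_k - M_k(x_{k-1})\bigr)^{\mathsf T} Q_k^{-1} \bigl(x_k - M_k(x_{k-1})\bigr)
\end{align*}
```

In a hybrid scheme, the model-error covariance can be taken as a weighted blend of a static and an ensemble-based estimate, e.g. $Q_k = \beta_s Q_{\mathrm{static}} + \beta_e Q_{\mathrm{ens}}$, which is where the weight coefficients mentioned in the abstract enter.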
389 Combining MSM and ABM for Micro-Level Population Dynamics [abstract]
Abstract: Population dynamics describes the changes in the size and age composition of populations. Modeling and simulation techniques have been used to model population dynamics, and the developed models are utilized to design and analyze public policies. One classic method to model population dynamics is microsimulation, which describes population dynamics at the individual level, with each individual acting according to a stochastic process. An emerging method is the agent-based model, which focuses instead on interactions among individuals and aims to reveal unexpected situations generated by those interactions. Their different focuses on individuals allow each to complement the other's weak points in the development of population dynamics models. From this perspective, this paper proposes a hybrid model structure combining microsimulation and agent-based modeling for population dynamics. In the proposed model, the microsimulation model depicts how an individual chooses its behavior based on a stochastic process parameterized by real data, while the agent-based model incorporates interactions among individuals considering their own states and rules. The case study introduces a Korean population dynamics model developed with the proposed approach, and its simulation results show the population changes triggered by variations in behavior and interaction factors.
Jang Won Bae, Euihyun Paik, Kiho Kim, Karandeep Singh, Mazhar Sajjad
451 Complex data-driven predictive modeling in personalized clinical decision support for acute coronary syndrome episodes [abstract]
Abstract: The paper presents the idea of a complex model of a clinical episode, based on a data-driven approach, for decision support in the treatment of ACS (Acute Coronary Syndrome). The idea aims to improve the predictive capability of a data-driven model by combining different models within a composite data-driven model, which can implement either hierarchical or alternative combinations of models. Three examples of data-driven models are described: a simple classifier, outcome prediction based on reanimation time, and a states-based prediction model, to be used as parts of the complex model of episodes. To implement the proposed approach, a generalized architecture of data-driven clinical decision support systems was developed. The solution is developed as a part of a complex clinical decision support system for cardiac diseases for the Federal Almazov North-West Medical Research Centre in Saint Petersburg, Russia.
Alexey V. Krikunov, Ekaterina V. Bolgova, Evgeniy Krotov, Tesfamariam M. Abuhay, Alexey N. Yakovlev, Sergey V. Kovalchuk
452 Agent-based Modelling Using Ensemble Approach with Spatial and Temporal Composition [abstract]
Abstract: Crowd behavior and movement have been actively studied during the last three decades. There are microscopic models used for realistic simulation of crowds in different conditions. Such models reproduce pedestrian movement quite well; however, their efficiency can vary depending on the conditions of the simulation. For instance, some models show realistic results at high pedestrian densities and others at low densities. This work describes an early study aimed at developing an approach to combine several microscopic models using an ensemble approach to overcome the individual weaknesses of the models. Possible ways to build hybrid models, as well as the main classes of ensembles, are described. A prior calibration procedure was implemented using an evolutionary approach to create an ensemble of the most suitable models, using dynamical macro-parameters such as density and speed as the optimization objectives. Several trial experiments and comparisons with single models were carried out for selected types of hybridization.
Andrey Kiselev, Vladislav Karbovskii, Sergey Kovalchuk

ICCS 2016 Main Track (MT) Session 13

Time and Date: 16:20 - 18:00 on 7th June 2016

Room: Toucan

Chair: Daniel Crawl

461 Success Rate of Creatures Crossing a Highway as a Function of Model Parameters [abstract]
Abstract: In modeling swarms of autonomous robots, individual robots may be identified as cognitive agents. We describe a model of a population of simple cognitive agents, naïve creatures, learning to safely cross a cellular automaton based highway. These creatures have the ability to learn from each other by evaluating whether creatures in the past were successful in crossing the highway in their current situation. The creatures use an “observational social learning” mechanism in their decision to cross the highway or not. The model parameters heavily influence the learning outcomes, which are examined through the collected simulation metrics. We study how these parameters, in particular the knowledge base, influence the creatures’ success rate in crossing the highway.
Anna T. Lawniczak, Leslie Ly, Fei Yu
10 Using Analytic Solution Methods on Unsaturated Seepage Flow Computations [abstract]
Abstract: This paper describes a change of variables applied to Richards’ equation for steady-state unsaturated seepage flow that makes the numerical representation of the new version of this highly nonlinear partial differential equation (PDE) much easier to solve, and the solution is significantly more accurate. The method is applied to two-dimensional unsaturated steady-state flow in a block of soil that is initially very dry until water is applied at the top. Both a quasi-linear version of relative hydraulic conductivity for which an analytic solution exists and a van Genuchten version of relative hydraulic conductivity are numerically solved using the original and new versions of the governing PDE. Finally, results of this research will be presented in this paper. It was found that for the test problem, the change-of-variables version of the governing PDE was significantly easier to solve and resulted in more accurate solutions than the original version of the PDE.
Fred Tracy
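For context, a standard form of steady-state Richards' equation and the quasi-linear (Gardner-type) relative conductivity for which analytic solutions exist are shown below; this is textbook notation, not necessarily the exact formulation or change of variables used in the paper.

```latex
% Steady-state Richards' equation (standard form) with pressure head h and
% elevation z; K_s is the saturated conductivity and k_r the relative conductivity.
\nabla \cdot \bigl[ K_s \, k_r(h) \, \nabla (h + z) \bigr] = 0,
\qquad
k_r(h) = e^{\alpha h} \quad (\text{quasi-linear Gardner model})
```

One classical change of variables in this setting is the Kirchhoff transformation, $\Phi(h) = \int_{-\infty}^{h} K_s\, k_r(s)\, ds$, which linearizes the conductivity term; whether this is the specific transformation used in the paper is not stated in the abstract.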
188 Predictor Discovery for Early-Late Indian Summer Monsoon Using Stacked Autoencoder [abstract]
Abstract: The Indian summer monsoon has distinct behaviors in its early and late phases, and the influencing climatic factors also differ. In this work we aim to predict the national rainfall in these phases. The predictors used by the forecast models are discovered using a stacked autoencoder deep neural network. A fitted regression tree is used as the forecast model. Accuracy superior to the state-of-the-art method is achieved. We also observe that late monsoon rainfall can be predicted with higher accuracy than early monsoon rainfall.
Moumita Saha, Pabitra Mitra, Ravi S. Nanjundiah

ICCS 2016 Main Track (MT) Session 14

Time and Date: 10:15 - 11:55 on 8th June 2016

Room: Toucan

Chair: Marcin Plociennik

408 Efficient Sphere Detector Algorithm for Massive MIMO using GPU Hardware Accelerator [abstract]
Abstract: To further enhance the capacity of next generation wireless communication systems, massive MIMO has recently appeared as a necessary enabling technology to achieve high performance signal processing for large-scale multiple antennas. However, massive MIMO systems inevitably generate signal processing overheads, which translate into an ever-increasing rate of complexity, and therefore, such a system may not maintain the inherent real-time requirement of wireless systems. We redesign the non-linear sphere decoder method to increase the performance of the system, cast most memory-bound computations into compute-bound operations to reduce the overall complexity, and maintain the real-time processing thanks to the GPU computational power. We show a comprehensive complexity and performance analysis on an unprecedented MIMO system scale, which can ease the design phase toward simulating future massive MIMO wireless systems.
Mohamed-Amine Arfaoui, Hatem Ltaief, Zouheir Rezki, Mohamed-Slim Alouini, David Keyes
126 In-situ Data Infrastructure for Scientific Unit Testing Platform [abstract]
Abstract: Testing is a significant software development process for the management of software systems and scientific code. However, many scientific codes have become much more complicated, which creates extra needs to check that changes positively impact dependent modules and additional needs to verify system constraints. The software complexity also impedes module developers and software engineers from rapidly developing and extending their code. Recently, we developed an automatic methodology and prototype platform to facilitate scientific verification of individual functions within complex scientific codes, so that science module builders are able to conveniently track variables in one module or track variable changes among different modules. In this paper, we present a procedure for automatic unit testing generation. In the interest of the general audience of this conference, we place emphasis on the technical details of integrating the In Situ Data Infrastructure into our platform. At the end of this paper, we include an implementation of unit testing for the ACME Land Model (ALM) to demonstrate the usefulness and correctness of the platform. We also use single-point and multi-point checks to demonstrate the easy variable tracking capability of this platform.
Zhuo Yao, Yulu Jia, Dali Wang, Chad Steed, Scott Atchley
213 Recovering the MSS-sequence via CA [abstract]
Abstract: A cryptographic sequence generator, the modified self-shrinking generator (MSSG), was recently designed as a novel version of the self-shrinking generator. Taking advantage of the cryptographic properties of the irregularly decimated generator class, the MSSG was mainly created to be used in stream cipher applications and hardware implementations. Nevertheless, in this work it is shown that the MSSG output sequence, the so-called modified self-shrunken sequence, is generated as one of the output sequences of a linear model based on Cellular Automata that use rule 60 for their computations. Thus, the linearity of these structures can be advantageously exploited to recover the complete modified self-shrunken sequence from a number of intercepted bits.
Sara D. Cardell, Amparo Fúster-Sabater
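For readers unfamiliar with rule 60, the sketch below evolves a one-dimensional cellular automaton in which every new cell is the XOR of a cell and its left neighbour; this linearity is exactly what the attack outlined above exploits. The MSSG construction and the recovery procedure themselves are not reproduced.

```python
# Rule 60 cellular automaton: new cell = left neighbour XOR cell (illustrative only).
def rule60_step(state):
    """state: list of bits; the left neighbour of cell 0 is taken as 0 (null boundary)."""
    return [(state[i - 1] if i > 0 else 0) ^ state[i] for i in range(len(state))]

def evolve(state, steps):
    rows = [state]
    for _ in range(steps):
        state = rule60_step(state)
        rows.append(state)
    return rows

for row in evolve([1, 0, 1, 1, 0, 0, 1, 0], steps=5):
    print("".join(map(str, row)))
```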
328 Accelerated graph-based non-linear denoising filters [abstract]
Abstract: Denoising filters, such as bilateral, guided, and total variation filters, applied to images on general graphs may require repeated application if the noise is not small enough. We formulate two techniques for accelerating the resulting iterations: the conjugate gradient method and Nesterov's acceleration. We numerically show the efficiency of the accelerated nonlinear filters for image denoising and demonstrate a 2-12 times speed-up, i.e., the acceleration techniques reduce the number of iterations required to reach a given peak signal-to-noise ratio (PSNR) by the above indicated factor of 2-12.
Andrew Knyazev, Alexander Malyshev
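To illustrate the acceleration idea in isolation, the sketch below applies Nesterov's momentum to gradient descent on a simple quadratic graph-denoising objective (data fidelity plus a Laplacian smoothness penalty on a path graph); the nonlinear bilateral/guided/total-variation filters and the conjugate gradient variant from the paper are not reproduced.

```python
# Nesterov-accelerated gradient descent on a quadratic graph-denoising objective.
import numpy as np

def objective_grad(x, noisy, L, lam):
    # Gradient of 1/2 ||x - noisy||^2 + lam/2 x^T L x.
    return (x - noisy) + lam * (L @ x)

def nesterov(noisy, L, lam=5.0, step=0.04, iters=200):
    x = noisy.copy()
    x_prev = noisy.copy()
    for k in range(1, iters + 1):
        z = x + (k - 1) / (k + 2) * (x - x_prev)   # momentum look-ahead point
        x_prev = x
        x = z - step * objective_grad(z, noisy, L, lam)
    return x

n = 100
rng = np.random.default_rng(3)
signal = np.sin(np.linspace(0, 4 * np.pi, n))
noisy = signal + 0.3 * rng.standard_normal(n)
# Graph Laplacian of a path graph: the simplest smoothness prior.
L = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
L[0, 0] = L[-1, -1] = 1
print("noisy MSE:   ", np.mean((noisy - signal) ** 2))
print("denoised MSE:", np.mean((nesterov(noisy, L) - signal) ** 2))
```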

ICCS 2016 Main Track (MT) Session 15

Time and Date: 13:25 - 15:05 on 8th June 2016

Room: KonTiki Ballroom

Chair: Craig Douglas

92 Detecting frog calling activity based on acoustic event detection and multi-label learning [abstract]
Abstract: Frog populations have been declining over the past decade due to habitat loss, invasive species, climate change, and other factors. Therefore, it is becoming ever more important to monitor the frog population. Recent advances in acoustic sensors make it possible to collect frog vocalizations over large spatio-temporal scales. Through the detection of frog calling activity in the collected acoustic data, the frog population can be predicted. In this paper we propose a novel method for detecting frog calling activity using acoustic event detection and multi-label learning. Here, frog calling activity consists of frog abundance and frog species richness, which denote the number of individual frog calls and the number of frog species, respectively. To be specific, each segmented recording is first transformed to a spectrogram. Then, acoustic event detection is used to calculate frog abundance; meanwhile, recordings without frog calls are filtered out. For frog species richness, three acoustic features are calculated: linear predictive coefficients, Mel-frequency cepstral coefficients and wavelet-based features. Then, multi-label learning is used to predict frog species richness. Lastly, statistical analysis is used to reflect the relationship between frog calling activity (frog abundance and frog species richness) and weather variables. Experimental results show that our proposed method can accurately detect frog calling activity and reflect its relationship with weather variables.
Jie Xie, Michael Towsey, Jinglan Zhang, Paul Roe
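A minimal multi-label learning sketch (binary relevance with one-vs-rest SVMs) is given below for the species-richness step; the feature vectors are synthetic placeholders rather than the LPC, MFCC, and wavelet features computed in the paper, and the acoustic event detection stage is omitted.

```python
# Multi-label prediction of which frog species are present in a segment (illustrative).
import numpy as np
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(4)
n_segments, n_features, n_species = 300, 12, 4
X = rng.standard_normal((n_segments, n_features))
# Each species' presence depends (noisily) on a different feature, so labels overlap.
Y = np.column_stack([(X[:, i] + 0.3 * rng.standard_normal(n_segments)) > 0
                     for i in range(n_species)]).astype(int)

X_tr, X_te, Y_tr, Y_te = train_test_split(X, Y, test_size=0.3, random_state=0)
clf = OneVsRestClassifier(SVC(kernel="rbf", gamma="scale")).fit(X_tr, Y_tr)
pred = clf.predict(X_te)
print("predicted species richness of first segments:", pred.sum(axis=1)[:10])
print("label-wise accuracy:", (pred == Y_te).mean())
```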
228 Genome-Wide Association Interaction Studies with MB-MDR and maxT multiple testing correction on FPGAs [abstract]
Abstract: In the past few years massive amounts of data have been generated for genetic analysis. Existing solutions to analyze this data concerning genome-wide gene interactions are either not powerful enough or can barely be managed with standard computers due to the tremendous amount of statistical tests to be performed. Also, common approaches using cluster or cloud technologies for parallel analysis are operating at the edge of what is currently possible. This work demonstrates how FPGAs are able to address this problem. We present a highly parallel, hardware oriented solution for genome-wide association interaction studies (GWAIS) with MB-MDR and the maxT multiple testing correction on an FPGA-based architecture. We achieve a more than 300-fold speedup over an AMD Opteron cluster with 160 cores on an FPGA system equipped with 128 Xilinx Spartan6 LX150 low-cost FPGAs when analyzing a WTCCC-like dataset with 500,000 markers and 5,000 samples. Furthermore, we are able to keep pace with a 256-core Intel Xeon cluster running MB-MDR 4.2.2 with an approximative version of maxT, while we achieve a 190-fold speedup over the sequential execution of this version on one Xeon core.
Sven Gundlach, Jan Christian Kässens, Lars Wienbrandt
540 Biological Systems Through the Informational Lens [abstract]
Abstract: Computation is often seen as information processing. Many biological systems may be investigated in terms of information storage, signaling, and data processing networks. Much of this data processing activity is embodied in structural transformations at spatial scales ranging from the molecular to cellular networks. The biomedical sciences make use of an increasingly powerful arsenal of tools and technologies for obtaining structural data as well as details of mass transport and the chemical and electrical signals that underlie these fundamental biological processes. For example, new staining techniques, combined with computer-based electron microscope tomography, permit the clear imaging of chromatin filaments in the cell nucleus and filament networks in the cytoplasmic and extracellular space via the electron microscope. The application of tomographic reconstruction software developed at the National Center for Microscopy and Imaging Research (NCMIR) enables detailed 3D reconstructions of the relevant biological structures and processes. In order to deal with fundamental issues related to information processing in biological systems, new data processing methods as well as advances in chemically sensitive probes and imaging technology must be applied across a wide range of spatial and temporal scales. One class of increasingly useful tools for modeling biological systems, evaluating imaging technologies and characterizing the fidelity of digital processing has its roots in theoretical investigations in statistical mechanics, which arise from the concepts of information and entropy. We review how concepts of information and entropy may give new perspectives on the flow of information within biological systems, as well as our instrumentation and computer processing.
Albert Lawrence, Tsvi Katchalski, Alex Perez, Mark Ellisman
127 A new Approach for Automatic Detection of Tactile Paving Surfaces in Sidewalks [abstract]
Abstract: In recent years, research interest has increased in the development of different approaches to support the mobility of the visually impaired. The automatic detection of tactile paving surfaces is one important topic of research, not only to help the mobility of visually impaired persons, but also for use in the displacement of autonomous robots, providing a safe route and warnings. In this paper we propose an approach for tactile paving surface detection in real time with the purpose of assisting visually impaired persons. It uses computer vision algorithms combined with a decision tree to eliminate some possible false alarms. We assume the visually impaired person holds a smartphone, which is used to obtain images, as well as to assist them through audio feedback to keep them on the tactile paving surface. This problem is very challenging, mainly due to illumination changes, occlusion, image noise and resolution, as well as the different possible colors of the tactile paving surfaces. Experimental results indicate that the proposed approach works well on low resolution images, effectively detecting the tactile paving surfaces in real test scenarios.
Marcelo C. Ghilardi, Rafael C. O. Macedo, Isabel H. Manssour
108 Particle Swarm Optimization Simulation via Optimal Halton Sequences [abstract]
Abstract: Inspired by the social behavior of bird flocking or fish schooling, particle swarm optimization (PSO) is a population-based stochastic optimization method developed by Eberhart and Kennedy in 1995. It has been used across a wide range of applications. Faure, Halton and Van der Corput sequences have been used for initializing the swarm in PSO. Quasirandom (or low-discrepancy) sequences such as Faure, Halton and Van der Corput are deterministic and suffer from correlations between radical inverse functions with different bases used for different dimensions. In this paper, we investigate the effect of initializing the swarm with a scrambled optimal Halton sequence, which is a randomized quasirandom sequence. This ensures that we still have the uniformity properties of quasirandom sequences while preserving the stochastic behavior of particles in the swarm. Numerical experiments are conducted with high-dimensional benchmark objective functions to verify the convergence and effectiveness of the proposed initialization of PSO.
Ganesha Weerasinghe, Hongmei Chi, Yanzhao Cao
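A minimal sketch of the initialization idea follows, using SciPy's scrambled Halton sampler to place the initial swarm and then running a bare-bones PSO loop on the sphere benchmark; the "optimal" Halton variant and the benchmark suite from the paper are not reproduced, and the PSO coefficients are common textbook values.

```python
# PSO initialized with a scrambled Halton sequence (illustrative only).
import numpy as np
from scipy.stats import qmc

def sphere(x):
    return np.sum(x ** 2, axis=-1)

dim, n_particles, iters = 10, 40, 200
lo, hi = -5.0, 5.0

sampler = qmc.Halton(d=dim, scramble=True, seed=0)       # scrambled (randomized) Halton
pos = qmc.scale(sampler.random(n_particles), [lo] * dim, [hi] * dim)
vel = np.zeros_like(pos)
pbest, pbest_val = pos.copy(), sphere(pos)
gbest = pbest[np.argmin(pbest_val)].copy()

rng = np.random.default_rng(0)
w, c1, c2 = 0.72, 1.49, 1.49                             # common PSO coefficients
for _ in range(iters):
    r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
    vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
    pos = np.clip(pos + vel, lo, hi)
    val = sphere(pos)
    improved = val < pbest_val
    pbest[improved], pbest_val[improved] = pos[improved], val[improved]
    gbest = pbest[np.argmin(pbest_val)].copy()

print("best value found:", pbest_val.min())
```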

ICCS 2016 Main Track (MT) Session 16

Time and Date: 16:40 - 18:20 on 6th June 2016

Room: Rousseau Center

Chair: Christopher Monterola

357 Distributed Multi-authority Attribute-based Encryption Scheme for Friend Discovery in Mobile Social Networks [abstract]
Abstract: In recent years, the rapid expansion of the capability of portable devices, cloud servers and cellular network technologies has been the wind beneath the wings of mobile social networks. Compared to traditional web-based online social networks, mobile social networks can assist users to easily discover and make new social interactions with others. A challenging task is to protect the privacy of the users' profiles and communications. Existing works are mainly based on traditional cryptographic methods, such as homomorphic and group signatures, which are very computationally costly. In this paper, we propose a novel distributed multi-authority attribute-based encryption scheme to efficiently achieve privacy preservation without additional special signatures. In addition, the proposed scheme can achieve fine-grained and flexible access control. Detailed analysis demonstrates the effectiveness and practicability of our scheme.
Wenbo Wang, Fang Qi, Xiaoqiang Wu, Zhe Tang
58 ADAMANT: tools to capture, analyze, and manage data movement [abstract]
Abstract: In the converging world of High Performance Computing and Big Data, moving data is becoming a critical aspect of performance and energy efficiency. In this paper we present the Advanced DAta Movement Analysis Toolkit (ADAMANT), a set of tools to capture and analyze data movement within an application, and to aid in understanding performance and efficiency in current and future systems. ADAMANT identifies all the data objects allocated by an application and uses instrumentation modules to monitor relevant events (e.g. cache misses). Finally, ADAMANT produces a per-object performance profile. In this paper we demonstrate the use of ADAMANT in analyzing three applications, BT, BFS, and Velvet, and evaluate the impact of different memory technology. With the information produced by ADAMANT we were able to model and compare different memory configurations and object placement solutions. In BFS we devised a placement which outperforms caching, while in the other two cases we were able to point out which data objects may be problematic for the configurations explored, and would require refactoring to improve performance.
Pietro Cicotti, Laura Carrington
67 Urgent Computing - A General Makespan Robustness Model for Ensembles of Forecasts [abstract]
Abstract: Urgent computing requires computations to commence in short order and complete within a stipulated deadline so as to support mitigation activities in preparation, response and recovery from an event that requires immediate attention. Missing an urgent deadline can lead to dire consequences where avoidable human and financial losses ensue. Allocation of resources such that the deadline can be met is thus crucial. Robustness is of great importance to ensure that small perturbations on the computing systems do not affect the makespan of computations such that the deadline is missed. This work focuses on developing a general mathematical makespan model for urgent computing to enable a robust allocation of ensemble forecasts while minimising the makespan. Three different patterns of resource allocation will be investigated to illustrate the model. The result will aid in satisfying the most crucial requirement, the time criterion, of urgent computing.
Siew Hoon Leong, Dieter Kranzlmüller