ICCS 2019 Main Track (MT) Session 1

Time and Date: 10:35 - 12:15 on 12th June 2019

Room: 1.5

Chair: Howard Stamato

67 Efficient Computation of Sparse Higher Derivative Tensors [abstract]
Abstract: The computation of higher derivative tensors is expensive even for adjoint algorithmic differentiation methods. In this work we introduce methods to exploit the symmetry and the sparsity structure of higher derivatives to considerably improve the efficiency of their computation. The proposed methods apply coloring algorithms to two-dimensional compressed slices of the derivative tensors. The presented work is a step towards the feasibility of higher-order methods, which might benefit numerical simulations in numerous applications of computational science and engineering.
Jens Deussen and Uwe Naumann
120 Being Rational about Approximating Scientific Data [abstract]
Abstract: Scientific datasets are becoming increasingly challenging to transfer, analyze, and store. There is a need for methods to transform these datasets into compact representations that facilitate their downstream management and analysis, and ideally model the underlying scientific phenomena with defined numerical fidelity. To address this need, we propose nonuniform rational B-splines (NURBS) for modeling discrete scientific datasets; not only to compress input data points, but also to enable further analysis directly on the continuous fitted model, without the need for decompression. First, we evaluate three different methods for NURBS fitting, and compare their performance relative to unweighted least squares approximation (B-splines). We then extend current state-of-the-art B-spline adaptive approximation to NURBS; that is, adaptively determining optimal rational basis functions and weighted control point locations that approximate given input data points to prespecified accuracy. Additionally, we present a novel local adaptive algorithm to iteratively approximate large data input domains. This method takes advantage of NURBS local support to refine regions of the approximated model, acting locally on both input and model subdomains, without affecting other regions of the global approximation. We evaluate our methods in terms of approximated model compactness, achieved accuracy, and computational cost on both synthetic smooth functions and real-world scientific data.
Youssef Nashed, Tom Peterka, Vijay Mahadevan and Iulian Grindeanu
336 Design of a High-Performance Tensor-Vector Multiplication with BLAS [abstract]
Abstract: Tensor contraction is an important mathematical operation for many scientific computing applications that use tensors to store massive multidimensional data. Based on the Loops-over-GEMMs (LOG) approach, this paper discusses the design of high-performance algorithms for the mode-q tensor-vector multiplication using efficient implementations of the matrix-vector multiplication (GEMV). Given dense tensors with any non-hierarchical storage format, tensor order and dimensions, the proposed algorithms either directly call GEMV with tensors or recursively apply GEMV on higher-order tensor slices multiple times. We analyze strategies for loop-fusion and parallel execution of slice-vector multiplications with higher-order tensor slices. Using OpenBLAS, our implementations attain up to 113% of the GEMV's peak performance. Our parallel version of the tensor-vector multiplication achieves speedups of up to 12.6x over other state-of-the-art approaches.
Cem Bassoy
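The LOG strategy described in the abstract above (paper 336) can be illustrated compactly: move the contracted mode to the front, unfold the remaining modes into one dimension, and issue a single matrix-vector product. A minimal NumPy sketch of this idea follows; it is an illustration only, not the paper's C++/OpenBLAS implementation, and all names are ours.

```python
import numpy as np

def mode_q_tvm(T, v, q):
    """Mode-q tensor-vector multiplication via a single GEMV-like call."""
    T_q = np.moveaxis(T, q, 0)            # bring mode q to the front
    rest_shape = T_q.shape[1:]            # shape of the remaining modes
    A = T_q.reshape(T.shape[q], -1)       # unfold the tensor into a matrix
    return (A.T @ v).reshape(rest_shape)  # one matrix-vector product

# Contract a 3x4x5 tensor with a length-4 vector along mode 1
T = np.random.rand(3, 4, 5)
v = np.random.rand(4)
assert np.allclose(mode_q_tvm(T, v, 1), np.einsum('ijk,j->ik', T, v))
```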
388 High Performance Partial Coherent X-ray Ptychography [abstract]
Abstract: During the last century, X-ray science has enabled breakthrough discoveries in fields as diverse as medicine, material science or electronics, and recently, ptychography has risen as a reference imaging technique in the field. It provides resolutions of a billionth of a meter, macroscopic field of view, or the capability to retrieve chemical or magnetic contrast, among other features. The goal of ptychography is to reconstruct a 2D visualization of a sample from a collection of diffraction patterns generated from the interaction of a light source with the sample. Reconstruction involves solving a nonlinear optimization problem employing a large amount of measured data —typically two orders of magnitude bigger than the reconstructed sample— so high performance solutions are normally required. A common problem in ptychography is that the majority of the flux from the light sources is often discarded to define the coherence of an illumination. Gradient Decomposition of the Probe (GDP) is a novel method devised to address this issue. It provides the capability to significantly improve the quality of the image when partial coherence effects take place, at the expense of a three-fold increase in memory requirements and computation. This downside, along with the fine-grained degree of parallelism of the operations involved in GDP, makes it an ideal target for GPU acceleration. In this paper we propose the first high performance implementation of GDP for partial coherence X-ray ptychography. The proposed solution exploits an efficient data layout and multi-GPU parallelism to achieve massive acceleration and efficient scaling. The experimental results demonstrate the enhanced reconstruction quality and performance of our solution, able to process up to 4 million input samples per second on a single high-end workstation, and compare its performance with a reference HPC ptychography pipeline.
Pablo Enfedaque, Stefano Marchesini, Huibin Chang, Bjoern Enders and David Shapiro
452 Monte Carlo Analysis of Local Cross-Correlation ST-TBD Algorithm [abstract]
Abstract: The Track-Before-Detect (TBD) algorithms allow the estimation of the state of an object, even if the signal is hidden in the background noise. The application of local cross-correlation for a modified Information Update formula improves this estimation for extended objects (tens of cells in the measurement space) compared to direct application of the Spatio-Temporal TBD (ST-TBD) algorithm. A Monte Carlo test was applied to evaluate the algorithms using a variable standard deviation of additive Gaussian noise. The proposed solution does not require prior knowledge of the size or measured values of the object.
Przemyslaw Mazurek and Robert Krupinski

ICCS 2019 Main Track (MT) Session 2

Time and Date: 14:40 - 16:20 on 12th June 2019

Room: 1.5

Chair: Pablo Enfedaque

453 Optimization of Demodulation for Air-Gap Data Transmission based on Backlight Modulation of Screen [abstract]
Abstract: The air gap is an efficient technique for improving computer security. The proposed technique uses backlight modulation of the monitor screen for data transmission from an infected computer. An optimization algorithm for the segmentation of the video stream is proposed to improve the robustness of the data transmission. This algorithm is tested using a Monte Carlo approach with full-frame analysis for different values of the standard deviation of additive Gaussian noise. The achieved results show an improvement of about ten times for the proposed selective image processing at low values of the standard deviation.
Dawid Bak, Przemyslaw Mazurek and Dorota Oszutowska-Mazurek
304 Reinsertion algorithm based on destroy and repair operators for dynamic dial a ride problems [abstract]
Abstract: The Dial-a-Ride Problem (DARP) consists in serving a set of customers who specify their pickup and drop-off locations using a fleet of vehicles. The aim of DARP is designing vehicle routes satisfying requests of customers and minimizing the total traveled distance. In this paper, we consider a real case of dynamic DARP service operated by Padam which offers a high quality transportation service in which customers ask for a service either in advance or in real time and get an immediate answer about whether their requests are accepted or rejected. A fleet of fixed number of vehicles is available during a working period of time to provide a transportation service. The goal is to maximize the number of accepted requests during the service. In this paper, we propose an original and novel online Reinsertion Algorithm based on destroy/repair operators to reinsert requests rejected by the online algorithm used by Padam. When the online algorithm fails to insert a new customer, the proposed algorithm intensively exploits the neighborhood of the current solution using destroy/repair operators to attempt to find a new solution, allowing the insertion of the new client while respecting the constraints of the problem. The proposed algorithm was implemented in the optimization engine of Padam and extensively tested on real hard instances up to 1011 requests and 14 vehicles. The results show that our method succeeds in improving the number of accepted requests while keeping similar transportation costs on almost all instances, despite the hardness of the real instances. In half of the cases, reduction of the number of vehicles is attained, which is a huge benefit for the company.
Sven Vallée, Ammar Oulamara and Wahiba Ramdane Cherif-Khettaf
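A schematic view of the destroy/repair reinsertion loop described in paper 304 above, written as a hedged Python sketch: the `destroy`, `repair` and `feasible` callbacks are hypothetical placeholders for Padam's operators, which are not public.

```python
def try_reinsert(solution, new_request, destroy, repair, feasible, max_iters=50):
    """Attempt to reinsert a request rejected by the greedy online insertion.

    Repeatedly remove a few planned requests (destroy), rebuild the routes
    with the new request included (repair), and return the first feasible
    solution found; otherwise the request stays rejected.
    """
    for _ in range(max_iters):
        removed = destroy(solution)                      # drop some planned requests
        candidate = repair(solution, removed + [new_request])
        if feasible(candidate):                          # all DARP constraints hold
            return candidate
    return None                                          # keep the original rejection
```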
399 Optimization heuristics for computing the Voronoi skeleton [abstract]
Abstract: A skeleton representation of geometrical objects is widely used in computer graphics, computer vision, image processing, and pattern recognition. Therefore, efficient algorithms for computing planar skeletons are of high relevance. In this paper, we focus on the algorithm for computing the Voronoi skeleton of a planar object represented by a set of polygons. The complexity of the considered Voronoi skeletonization algorithm is O(N log N), where N is the total number of polygon vertices. In order to improve the performance of the skeletonization algorithm, we propose theoretically justified shape optimization heuristics based on polygon simplification algorithms. We evaluated the efficiency of these heuristics using polygons extracted from the MPEG 7 CE-Shape-1 dataset and measured the execution time of the skeletonization algorithm, the computational overhead introduced by the heuristics, and the influence of the heuristics on the accuracy of the resulting skeleton. As a result, we established criteria allowing us to choose the optimal heuristic for the Voronoi skeleton construction algorithm depending on the system's critical requirements.
Dmytro Kotsur and Vasyl Tereschenko
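One of the simplification-based heuristics discussed in paper 399 can be sketched as follows: simplify the polygon boundary before skeletonization, then build the Voronoi diagram of the remaining vertices as the starting point for skeleton extraction. The tolerance value and the use of Douglas-Peucker via Shapely are our illustrative assumptions, not the paper's exact choices.

```python
import numpy as np
from shapely.geometry import Polygon
from scipy.spatial import Voronoi

def voronoi_of_simplified(polygon_coords, tolerance=1.0):
    """Simplify a polygon, then compute the Voronoi diagram of its vertices."""
    poly = Polygon(polygon_coords).simplify(tolerance)  # Douglas-Peucker simplification
    pts = np.asarray(poly.exterior.coords)[:-1]         # drop the repeated closing vertex
    return Voronoi(pts)   # edges interior to the polygon approximate the skeleton
```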
239 Transfer Learning for Leisure Centre Energy Consumption Prediction [abstract]
Abstract: Demand for energy is ever growing. Accurate prediction of the energy demand of large buildings becomes essential for property management to operate these facilities more efficiently and greener. Various temporal modelling approaches provide a reliable yet straightforward paradigm for short-term building energy prediction. However, newly constructed buildings, newly renovated buildings, or buildings that have energy monitoring systems newly installed do not have sufficient data to build energy demand prediction models. In contrast, established buildings often have vast amounts of data collected. The model learned from these data can be useful if transferred to buildings with little or no data. Two tree-based machine learning algorithms were introduced in this study on transfer learning. Datasets from two leisure centers in Melbourne were used. The results show that transfer learning is a promising technique for predicting accurately under a new scenario, as it can achieve similar or even better performance compared to learning on a full dataset.
Paul Banda, Muhammed Bhuiyan, Kevin Zhang and Andy Song
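The transfer setting of paper 239 can be pictured with a small scikit-learn sketch: a tree-based regressor fitted on a data-rich source building is applied directly to a target building with little history. The estimator choice and the placeholder arrays are our assumptions for illustration, not the paper's models.

```python
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_absolute_error

def transfer_baseline(X_source, y_source, X_target, y_target):
    """Fit on the established (source) building, predict for the new (target) one."""
    model = GradientBoostingRegressor(n_estimators=300, max_depth=4)
    model.fit(X_source, y_source)        # learn from the data-rich building
    preds = model.predict(X_target)      # reuse the model on the data-poor building
    return mean_absolute_error(y_target, preds)
```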

ICCS 2019 Main Track (MT) Session 3

Time and Date: 16:50 - 18:30 on 12th June 2019

Room: 1.5

Chair: Youssef Nashed

12 Forecasting Model for Network Throughput of Remote Data Access in Computing Grids [abstract]
Abstract: Computing grids are one of the key enablers of computational science. Researchers from many fields (High Energy Physics, Bioinformatics, Climatology, etc.) employ grids for execution of distributed computational jobs. Such computing workloads are typically data-intensive. The current state of the art approach for data access in grids is data placement: a job is scheduled to run at a specific data center, and its execution commences only when the complete input data has been transferred there. An alternative approach is remote data access: a job may stream the input data directly from storage elements. Remote data access brings two innovative benefits: (1) the jobs can be executed asynchronously with respect to the data transfer; (2) when combined with data placement on the policy level, it may help to optimize the network load grid-wide, since these two data access methodologies partially exhibit nonoverlapping bottlenecks. However, in order to employ such a technique systematically, the properties of its network throughput need to be studied carefully. This paper presents results of experimental identification of parameters influencing the throughput of remote data access, a statistically tested formalization of these parameters and a derived throughput forecasting model. The model is applicable to large computing workloads, robust with respect to arbitrary dynamic changes in the grid infrastructure and exhibits a long-term forecasting horizon. Its purpose is to assist various stakeholders of the grid in decision-making related to data access patterns. This work is based on measurements taken on the Worldwide LHC Computing Grid at CERN.
Volodimir Begy, Martin Barisits, Mario Lassnig and Erich Schikuta
408 Collaborative Simulation Development Accelerated by Cloud Based Computing and Software as a Service Model [abstract]
Abstract: Simulations are increasingly used in pharmaceutical development to deliver medicines to patients more quickly; more efficiently; and with better designs, safety, and effect. These simulations need high performance computing resources as well as a variety of software to model the processes and effects on the pharmaceutical product at various scales of scrutiny: from the atomic scale to the entire production process. The demand curve for these resources has many peaks and can shift in a time scale much faster than a typical procurement process. Both on-demand cloud based computing capability and software as a service models have been growing in use. This presentation describes the efforts of the Enabling Technology Consortium to apply these information technology models to pharmaceutical simulations, which have special needs of documentation and security. It is expected that the environment will have more benefits as the cloud can be configured for collaborative work among companies in the non-competitive space and all the work can be made available for use by contract service vendors or health authorities. The expected benefits of this computing environment include economies of scale for both providers and consumers, increased resources, and the availability of information to accelerate and improve the delivery of pharmaceutical products.
Howard Stamato
487 Accurately Simulating Energy Consumption of I/O-intensive Scientific Workflows [abstract]
Abstract: While distributed computing infrastructures can provide infrastructure-level techniques for managing energy consumption, application-level energy consumption models have also been developed to support energy-efficient scheduling and resource provisioning algorithms. In this work, we analyze the accuracy of application-level models that have been developed and used in the context of scientific workflow executions. To this end, we profile two production scientific workflows on a distributed platform instrumented with power meters. We then conduct an analysis of power and energy consumption measurements. This analysis shows that power consumption is not linearly related to CPU utilization and that I/O operations significantly impact power, and thus energy, consumption. We then propose a power consumption model that accounts for I/O operations, including the impact of waiting for these operations to complete, and for concurrent task executions on multi-socket, multi-core compute nodes. We implement our proposed model as part of a simulator that allows us to draw direct comparisons between real-world and modeled power and energy consumption. We find that our model has high accuracy when compared to real-world executions. Furthermore, our model improves accuracy by about two orders of magnitude when compared to the traditional models used in the energy-efficient workflow scheduling literature.
Rafael Ferreira Da Silva, Anne-Cécile Orgerie, Henri Casanova, Ryan Tanaka, Ewa Deelman and Frédéric Suter
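To make the modelling idea of paper 487 concrete, here is a toy node-power function with a non-linear CPU term and an additive I/O term. The functional form and all constants are illustrative placeholders, not the calibrated model from the paper.

```python
def node_power(cpu_util, io_active_fraction,
               p_idle=90.0, p_cpu_max=120.0, alpha=0.6, p_io=25.0):
    """Toy node power model in watts.

    cpu_util           -- CPU utilisation in [0, 1]
    io_active_fraction -- fraction of time spent performing or waiting on I/O
    """
    return (p_idle
            + p_cpu_max * cpu_util ** alpha   # power is non-linear in CPU utilisation
            + p_io * io_active_fraction)      # I/O adds power even when CPU is idle

# Energy (joules) of a 200-second task at 40% CPU spending 30% of its time in I/O
energy_joules = node_power(0.4, 0.3) * 200.0
```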
62 Exploratory Visual Analysis of Anomalous Runtime Behavior in Streaming High Performance Computing Applications [abstract]
Abstract: Online analysis of runtime behavior is essential for performance tuning in streaming scientific workflows. Integration of anomaly detection and visualization is necessary to support human-centered analysis, such as verification of candidate anomalies utilizing domain knowledge. In this work, we propose an efficient and scalable visual analytics system for online performance analysis of scientific workflows toward the exascale scenario. Our approach uses a call stack tree representation to encode the structural and temporal information of the function executions. Based on the call stack tree features (e.g., execution time of the root function or vector representation of the tree structure), we employ online anomaly detection approaches to identify candidate anomalous function executions. We also present a set of visualization tools for verification and exploration in a level-of-detail manner. General information, such as the distribution of execution times, is provided in an overview visualization. The detailed structure (e.g., function invocation relations) and the temporal information (e.g., message communication) of the execution call stack of interest are also visualized. The usability and efficiency of our methods are verified in the NWChem use case.
Cong Xie, Wonyong Jeong, Gyorgy Matyasfalvi, Hubertus Van Dam, Klaus Mueller, Shinjae Yoo and Wei Xu

ICCS 2019 Main Track (MT) Session 4

Time and Date: 10:15 - 11:55 on 13th June 2019

Room: 1.5

Chair: Jens Weismüller

15 Analysis of the construction of similarity matrices on multi-core and many-core platforms using different similarity metrics [abstract]
Abstract: Similarity matrices are 2D representations of the degree of similarity between points of a given dataset, which are employed in different fields such as data mining, genetics or machine learning. However, their calculation presents quadratic complexity and, thus, it is especially expensive for large datasets. MPICorMat is able to accelerate the construction of these matrices through the use of a hybrid parallelization strategy based on MPI and OpenMP. The previous version of this tool achieved high performance and scalability, but it only implemented a single similarity metric, Pearson's correlation. Therefore, it was suitable only for those problems where data are normally distributed and there is a linear relationship between variables. In this work, we present an extension to MPICorMat that incorporates eight additional similarity metrics so that users can choose the one that best adapts to their problem. The performance and energy consumption of each metric is measured on two platforms: a multi-core platform with two Intel Xeon Sandy-Bridge processors and a many-core Intel Xeon Phi KNL. Results show that MPICorMat executes faster and consumes less energy on the many-core architecture. The new version of MPICorMat is publicly available to download from its website: https://sourceforge.net/projects/mpicormat/
Uxía Casal, Jorge González-Domínguez and María J. Martín
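The core computation of paper 15, stripped of its MPI+OpenMP parallelization, is a pairwise loop over dataset rows with a pluggable metric. The serial NumPy/SciPy sketch below only illustrates what MPICorMat parallelizes; the Spearman example is one possible metric, not necessarily one of the eight added in the paper.

```python
import numpy as np
from scipy.stats import spearmanr

def similarity_matrix(data, metric):
    """Symmetric similarity matrix of an (n_variables, n_samples) array."""
    n = data.shape[0]
    S = np.empty((n, n))
    for i in range(n):
        for j in range(i, n):                       # exploit symmetry
            S[i, j] = S[j, i] = metric(data[i], data[j])
    return S

# Example: Spearman correlation as the similarity metric
X = np.random.rand(10, 200)
S = similarity_matrix(X, lambda a, b: spearmanr(a, b).correlation)
```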
16 High Performance Algorithms for Counting Collisions and Pairwise Interactions [abstract]
Abstract: The problem of counting collisions or interactions is common in areas such as computer graphics and scientific simulations. Since it is a major bottleneck in applications in these areas, a lot of research has been done on the subject, mainly focused on techniques that allow calculations to be performed within pruned sets of objects. This paper focuses on how interaction calculation (such as collisions) within these sets can be done more efficiently than with existing approaches. Two algorithms are proposed: a sequential algorithm that has linear complexity at the cost of high memory usage; and a parallel algorithm, mathematically proved to be correct, that manages to use GPU resources more efficiently than existing approaches. The proposed and existing algorithms were implemented, and experiments show a speedup of 21.7 for the sequential algorithm (on small problem sizes), and 1.12 for the parallel proposal (large problem sizes). By improving interaction calculation, this work contributes to research areas that promote interconnection in the modern world, such as computer graphics and robotics.
Matheus Saldanha and Paulo Souza
206 Comparing domain-decomposition methods for the parallelization of distributed land surface models [abstract]
Abstract: Current research challenges in hydrology require models with a high resolution on a global scale. These requirements stand in great contrast to the current capabilities of distributed land surface models. Hardly any literature noting efficient scalability past approximately 64 processors could be found. Porting these models to supercomputers is no simple task, because the greater part of the computational load stems from the evaluation of highly parametrized equations. Furthermore, the load is heterogeneous in both spatial and temporal dimension, and considerable load-imbalances are triggered by input data. We investigate different domain-decomposition methods for distributed land surface models and focus on their properties concerning load balancing and communication minimizing partitionings. Artificial strong scaling experiments from a single core to 8,192 cores show that graph-based methods can distribute the computational load of the application almost as efficiently as coordinate-based methods, while the partitionings found by the graph-based methods significantly reduce communication overhead.
Alexander von Ramm, Jens Weismüller, Wolfgang Kurtz and Tobias Neckel
228 Analysis and Detection on Abused Wildcard Domain Names Based on DNS Logs [abstract]
Abstract: A wildcard record is a type of resource record (RR) in DNS that allows any domain name in the same zone to map to a single record value. Previous works have made use of DNS zone file data and domain name blacklists to understand the usage of wildcard domain names. In this paper, we analyze wildcard domain names in real network DNS logs, and present some novel findings. By analyzing web contents, we found that the proportion of domain names related to pornography and online gambling contents (referred to as abused domain names in this work) in wildcard domain names is much higher than that in non-wildcard domain names. By analyzing behaviors of registration, resolution and maliciousness, we found that abused wildcard domain names have remarkably higher security risks than normal wildcard domain names. Then, based on the analysis, we propose the GSCS algorithm to detect abused wildcard domain names. GSCS is based on a domain graph, which gives insights on the similarities of abused wildcard domain names' resolution behaviors. By applying a spectral clustering algorithm and seed domains, GSCS can distinguish abused wildcard domain names from normal ones effectively. Experiments on real datasets indicate that GSCS can achieve about 86% detection rates with 5% seed domains, performing much better than the BP algorithm.
Guangxi Yu, Yan Zhang, Huajun Cui, Xinghua Yang, Yang Li and Huiran Yang
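A highly simplified sketch of the detection idea in paper 228: cluster wildcard domains by the similarity of their resolution behaviour with spectral clustering, then flag every cluster containing a known abused seed domain. This is our reading of the seeding step for illustration only, not the actual GSCS algorithm.

```python
import numpy as np
from sklearn.cluster import SpectralClustering

def flag_abused(similarity, seed_indices, n_clusters=8):
    """similarity: (n, n) precomputed affinity between wildcard domains."""
    labels = SpectralClustering(n_clusters=n_clusters,
                                affinity='precomputed').fit_predict(similarity)
    abused_clusters = {labels[i] for i in seed_indices}    # clusters touched by seeds
    return np.isin(labels, list(abused_clusters))          # flag members of those clusters
```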
275 XScan: An Integrated Tool for Understanding Open Source Community-based Scientific Code [abstract]
Abstract: Many scientific communities have adopted community-based models that integrate multiple components to simulate whole system dynamics. The software complexity of community-based models, which stems from the integration of multiple individual software components that were developed under different application requirements and for various machine architectures, has become a challenge for effective software system understanding and continuous software development. The paper presents an integrated software toolkit called X-ray Software Scanner (in abbreviation, XScan) for a better understanding of large-scale community-based scientific codes. Our software tool provides support to quickly summarize the overall information of scientific codes, including the number of lines of code, programming languages, external library dependencies, as well as architecture-dependent parallel software features. The XScan toolkit also realizes a static software analysis component to collect detailed structural information and provides an interactive visualization and analysis of the functions. We use a large-scale community-based Earth System Model to demonstrate the workflow, functions, and visualization of the toolkit. We also discuss using advanced graph analytics techniques to assist software modular design and component refactoring.
Weijian Zheng, Dali Wang and Fengguang Song

ICCS 2019 Main Track (MT) Session 5

Time and Date: 14:20 - 16:00 on 13th June 2019

Room: 1.5

Chair: Jorge González-Domínguez

367 An On-line Performance Introspection Framework for Task-based Runtime Systems [abstract]
Abstract: The expected high levels of parallelism together with the heterogeneity of new computing systems pose many challenges to current performance monitoring frameworks. Classical post-mortem approaches will not be sufficient for such dynamic, complex and highly concurrent environments. First, the amounts of data that can be generated from such systems will be impractical to handle. Second, access to real-time performance data to orchestrate program execution will be a necessity. In this paper, we present a lightweight monitoring infrastructure developed within the AllScale Runtime System, a task-based runtime system for extreme scale. This monitoring component provides on-line introspection capabilities that help the runtime scheduler in its decision-making process and adaptation, while introducing minimal overhead. In addition, the monitoring component provides several post-mortem reports as well as real-time data visualisation that can be of great help in the task of performance debugging.
Xavier Aguilar, Herbert Jordan, Thomas Heller, Alexander Hirsch, Thomas Fahringer and Erwin Laure
405 Productivity-aware Design and Implementation of Distributed Tree-based Search Algorithms [abstract]
Abstract: Parallel tree-based search algorithms are present in different areas, such as operations research, machine learning and artificial intelligence. This class of algorithms is highly compute-intensive, irregular and usually relies on context-specific data structures and hand-made code optimizations. Therefore, C and C++ are the languages often employed, due to their low-level features and performance. In this work, we investigate the use of Chapel high-productivity language for the design and implementation of distributed tree search algorithms for solving combinatorial problems. The experimental results show that Chapel is a suitable language for this purpose, both in terms of performance and productivity. Despite the use of high-level features, the distributed tree search in Chapel is on average 16% slower and reaches up to 85% of the scalability observed for its MPI+OpenMP counterpart.
Tiago Carneiro Pessoa and Nouredine Melab
462 Development of Element-by-Element Kernel Algorithms in Unstructured Implicit Low-Order Finite-Element Earthquake Simulation for Many-Core Wide-SIMD CPUs [abstract]
Abstract: Acceleration of the Element-by-Element (EBE) kernel in matrix-vector products is essential for high performance in unstructured implicit finite-element applications. However, it is not straightforward to attain high performance with the EBE kernel due to random data access with data recurrence. In this paper, we develop methods to circumvent these data races for high performance on many-core CPU architectures with wide SIMD units. The developed EBE kernel attains 16.3% and 20.9% of FP32 peak on the Intel Xeon Phi Knights Landing based Oakforest-PACS and an Intel Skylake Xeon Gold processor based system, respectively. This leads to a 2.88-fold speedup over the baseline kernel and a 2.03-fold speedup of the whole finite-element application on Oakforest-PACS. An example of urban earthquake simulation using the developed finite-element application is shown.
Kohei Fujita, Masashi Horikoshi, Tsuyoshi Ichimura, Larry Meadows, Kengo Nakajima, Muneo Hori and Lalith Maddegedara
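One common way to circumvent the data races mentioned in paper 462 is element coloring: elements of the same colour share no nodes, so their contributions can be accumulated concurrently. The plain-Python sketch below shows that idea for a scalar degree of freedom; it is an assumption-level illustration, not the paper's SIMD kernel.

```python
import numpy as np

def ebe_matvec(element_matrices, connectivity, x, colors):
    """Element-by-element y = A x using element coloring to avoid write conflicts."""
    y = np.zeros_like(x)
    for color in np.unique(colors):
        # all elements of one colour touch disjoint nodes -> parallelisable batch
        for e in np.where(colors == color)[0]:
            nodes = connectivity[e]                       # global node indices of element e
            y[nodes] += element_matrices[e] @ x[nodes]    # local multiply and scatter-add
    return y
```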
516 A High-productivity Framework for Adaptive Mesh Refinement on Multiple GPUs [abstract]
Abstract: Recently, grid-based physical simulations with multiple GPUs require effective methods to adapt grid resolution to certain sensitive regions of the simulations. In GPU computation, an adaptive mesh refinement (AMR) method is one of the effective methods to compute certain local regions that demand higher accuracy with higher resolution. However, AMR methods using multiple GPUs demand complicated implementation and require various optimizations suitable for GPU computation in order to obtain high performance. Our AMR framework provides a highly productive programming environment of a block-based AMR for grid-based applications. Programmers just write the stencil functions that update a grid point on a Cartesian grid, which are executed over a tree-based AMR data structure effectively by the framework. It also provides efficient GPU-suitable methods for halo exchange and mesh refinement with a dynamic load-balance technique. The framework-based application for compressible flow has reduced the computational time to less than 15% with 10% of the memory footprint in the best case compared to the equivalent computation running on the fine uniform grid. It also has demonstrated good weak scalability with 84% parallel efficiency on the TSUBAME3.0 supercomputer.
Takashi Shimokawabe and Naoyuki Onodera
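The programming model described in paper 516 boils down to the user supplying a per-point stencil function while the framework handles AMR traversal, halo exchange and GPU execution. A plain-Python stand-in for such a stencil function, applied naively over one block, is shown below; it only illustrates the interface idea and makes no claim about the framework's actual API.

```python
import numpy as np

def diffusion_stencil(u, i, j, c=0.1):
    """User-written update of a single Cartesian grid point (5-point diffusion)."""
    return u[i, j] + c * (u[i - 1, j] + u[i + 1, j]
                          + u[i, j - 1] + u[i, j + 1] - 4.0 * u[i, j])

# Naive application over the interior of one 16x16 block with a 1-cell halo
block = np.random.rand(18, 18)
new = block.copy()
for i in range(1, 17):
    for j in range(1, 17):
        new[i, j] = diffusion_stencil(block, i, j)
```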
197 Harmonizing Sequential and Random Access to Datasets in Organizationally Distributed Environments [abstract]
Abstract: Computational science is rapidly developing, which pushes the boundaries in data management concerning the size and structure of datasets, data processing patterns, geographical distribution of data and performance expectations. In this paper, we present a solution for harmonizing data access performance, i.e. finding a compromise between local and remote read/write efficiency that would fit those evolving requirements. It is based on variable-size logical data-chunks (in contrast to fixed-size blocks), direct storage access and several mechanisms improving remote data access performance. The solution is implemented in the Onedata system and suited to its multi-layer architecture, supporting organizationally distributed environments -- with limited trust between data providers. The solution is benchmarked and compared to XRootD + XCache, which offers similar functionalities. The results show that the performance of both systems is comparable, although overheads in local data access are visibly lower in Onedata.
Michał Wrzeszcz, Łukasz Opioła, Bartosz Kryza, Łukasz Dutka, Renata Słota and Jacek Kitowski

ICCS 2019 Main Track (MT) Session 6

Time and Date: 16:30 - 18:10 on 13th June 2019

Room: 1.5

Chair: Andrew Lewis

18 Towards Unknown Traffic Identification Using Deep Auto-Encoder and Constrained Clustering [abstract]
Abstract: Nowadays, network traffic identification, as a fundamental technique in the field of cybersecurity, suffers from a critical problem, namely "unknown traffic". Unknown traffic refers to network traffic generated by previously unknown applications (i.e., zero-day applications) in a pre-constructed traffic classification system. The ability to divide the mixed unknown traffic into multiple clusters, each of which contains traffic from only one application as far as possible, is the key to solving this problem. In this paper, we propose DePCK to improve the clustering purity. There are two main innovations in our framework: (i) it learns to extract bottleneck features via a deep auto-encoder from traffic statistical characteristics; (ii) it uses flow correlation to guide the process of pairwise constrained k-means. To verify the effectiveness of our framework, we conduct contrast experiments on two real-world datasets. The experimental results show that the clustering purity rate of DePCK can exceed 94.81% on the ISP-data and 91.48% on the WIDE-data, which outperforms the state-of-the-art methods RTC and k-means with log data.
Shuyuan Zhao, Yafei Sang and Yongzheng Zhang
41 How to compose product pages to enhance the new users’ interest in the item catalog? [abstract]
Abstract: Converting first-time users into recurring ones is key for the success of Web-based applications. This problem is known as Pure Cold-Start and it refers to the capability of Recommender Systems (RSs) to provide useful recommendations to users without historical data. Traditionally, RSs assume that non-personalized recommendation can mitigate this problem. However, many users are not interested in consuming only biased items, such as popular or best-rated items. We therefore introduce two new approaches inspired by user coverage maximization to deal with this problem. These coverage-based RSs reached a high number of distinct first-time users. Thus, we propose to compose the product page by mixing complementary non-personalized RSs. An online study, conducted with 204 real users, confirmed that we should diversify the RSs used to win over first-time users.
Nícollas Silva, Diego Carvalho, Fernando Mourão, Adriano Pereira and Leonardo Rocha
91 Rumor Detection on Social Media: A Multi-View Model using Self-Attention Mechanism [abstract]
Abstract: With the unprecedented prevalence of social media, rumor detection has become increasingly important since it can prevent misinformation from spreading in the public. Traditional approaches extract features from the source tweet, the replies, the user profiles as well as the propagation path of a rumor event. However, these approaches do not take the sentiment view of the users into account. The conflicting affirmative or denial stances of users can provide crucial clues for rumor detection. Besides, the existing work attaches the same importance to all the words in the source tweet, but actually these words are not equally informative. To address these problems, we propose a simple but effective multi-view deep learning model that is supposed to excavate stances of users and assign weights to different words. Experimental results on a social-media based dataset reveal that the multi-view model we proposed is useful, achieving state-of-the-art performance on the accuracy of automatic rumor detection. Our three-view model achieves 95.6% accuracy and our four-view model using BERT as a view also reaches an improvement of detection accuracy.
Yue Geng, Zheng Lin, Peng Fu, Weiping Wang and Dan Meng
156 EmoMix: Building An Emotion Lexicon for Compound Emotion Analysis [abstract]
Abstract: Building a high-quality emotion lexicon is regarded as the foundation of research on emotion analysis. Existing methods have focused on the study of primary categories (i.e., anger, disgust, fear, happiness, sadness, and surprise). However, there are many emotions expressed in texts that are difficult to map to primary emotions, which poses a great challenge for emotion annotation in big data analysis. For instance, "despair" is a combination of "fear" and "sadness," and thus it is difficult to assign to either of them. To address this problem, we propose an automatic method for building an emotion lexicon based on the psychological theory of compound emotion. This method maps emotional words into an emotion space and annotates different emotion classes through a cascade clustering algorithm. Our experimental results show that our method outperforms the state-of-the-art methods in both word- and sentence-level primary classification performance, and also offers some insights into compound emotion analysis.
Ran Li, Zheng Lin, Peng Fu, Weiping Wang and Gang Shi
183 Long Term Implications of Climate Change on Crop Planning [abstract]
Abstract: The effects of climate change have been much speculated on in the past few years. Consequently, there has been intense interest in one of its key issues, food security, into the future. This is particularly so given population increase, urban encroachment on arable land, and the degradation of the land itself. Recently, work has been done on predicting precipitation and temperature for the next few decades as well as developing optimisation models for crop planning. Combining these together, this paper examines the effects of climate change on a large food producing region in Australia, the Murrumbidgee Irrigation Area. For time periods between 1991 and 2071 for dry, average and wet years, an analysis is made of the way that crop mixes will need to change to adapt to the effects of climate change. It is found that sustainable crop choices will change into the future, particularly those that require large amounts of water, such as cotton.
Andrew Lewis, Marcus Randall, Sean Elliott and James Montgomery

ICCS 2019 Main Track (MT) Session 7

Time and Date: 10:15 - 11:55 on 14th June 2019

Room: 1.5

Chair: Pedro Silva

247 Representation Learning of Taxonomies for Taxonomy Matching [abstract]
Abstract: Taxonomy matching aims to discover category alignments between two taxonomies, which is an important knowledge-sharing operation that benefits many applications. Existing methods for taxonomy matching mostly depend on string lexical features and domain-specific information. In this paper, we consider the method of representation learning of taxonomies, which projects categories and relationships into low-dimensional vector spaces. We propose a method that takes advantage of category hierarchies and siblings, exploiting a low-dimensional semantic space to model category relations by translation operations in the semantic space. We model taxonomy matching as a maximum weight matching problem on bipartite graphs, which runs in polynomial time to generate optimal category alignments for two taxonomies in a global manner. Experimental results on OAEI benchmark datasets show that our method significantly outperforms the baseline methods in taxonomy matching.
Hailun Lin
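The global matching step of paper 247 reduces to the assignment problem: given a similarity matrix between the categories of the two taxonomies (for example, cosine similarities of the learned embeddings), a maximum-weight bipartite matching is computable in polynomial time. A small SciPy sketch of that reduction, with made-up toy scores:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_taxonomies(sim):
    """Globally optimal one-to-one category alignment from an (n, m) similarity matrix."""
    rows, cols = linear_sum_assignment(-sim)      # negate scores to maximise total weight
    return [(r, c, sim[r, c]) for r, c in zip(rows, cols)]

# Toy example with three categories on each side
sim = np.array([[0.9, 0.1, 0.2],
                [0.2, 0.8, 0.3],
                [0.1, 0.4, 0.7]])
print(match_taxonomies(sim))   # [(0, 0, 0.9), (1, 1, 0.8), (2, 2, 0.7)]
```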
311 Creating Training Data for Scientific Named Entity Recognition with Minimal Human Effort [abstract]
Abstract: Scientific Named Entity Referent Extraction is often more complicated than traditional Named Entity Recognition (NER). For example, in polymer science, chemical structure may be encoded in a variety of nonstandard naming conventions, and authors may refer to polymers with conventional names, commonly used names, labels (in lieu of longer names), synonyms, and acronyms. As a result, accurate scientific NER methods are often based on task-specific rules, which are difficult to develop and maintain, and are not easily generalized to other tasks and fields. Machine learning models require substantial expert-annotated data for training. Here we propose polyNER: a semi-automated system for efficient identification of scientific entities in text. PolyNER applies word embedding models to generate entity-rich corpora for productive expert labeling, and then uses the resulting labeled data to bootstrap a context-based word vector classifier. Evaluation on materials science publications shows that polyNER's combination of automated analysis with minimal expert input enables noticeably improved precision or recall relative to a state-of-the-art chemical entity extraction system. This remarkable result highlights the potential for human-computer partnership for constructing domain-specific scientific NER systems.
Roselyne Tchoua, Aswathy Ajith, Zhi Hong, Logan Ward, Kyle Chard, Alexander Belikov, Debra Audus, Shrayesh Patel, Juan de Pablo and Ian Foster
366 Evaluating the benefits of Key-Value databases for scientific applications [abstract]
Abstract: The convergence of Big Data applications with High-Performance Computing requires new methodologies to store, manage and process large amounts of information. Traditional storage solutions are unable to scale, and that results in complex coding strategies. For example, the brain atlas of the Human Brain Project faces the challenge of processing large amounts of high-resolution brain images. Given the computing needs, we study the effects of replacing a traditional storage system with a distributed key-value database on a cell segmentation application. The original code uses HDF5 files on GPFS through a complex interface and imposes synchronizations. On the other hand, by using Apache Cassandra or ScyllaDB through Hecuba, the application code is greatly simplified. Also, thanks to the key-value data model, the number of synchronizations is reduced and the time dedicated to I/O scales when increasing the number of nodes.
Pol Santamaria, Lena Oden, Yolanda Becerra, Eloy Gil, Raül Sirvent, Philipp Glock and Jordi Torres
427 Scaling the Training of Recurrent Neural Networks on Sunway TaihuLight Supercomputer [abstract]
Abstract: Recurrent neural network (RNN) models require longer training time with larger datasets and bigger numbers of parameters. Distributed training with a large mini-batch size is a potential solution to accelerate the whole training process. This paper proposes a framework for large-scale training of RNN/LSTM models on the Sunway TaihuLight (SW) supercomputer. We apply a series of architecture-oriented optimizations for the memory-intensive kernels in RNN models to improve the computing performance. A lazy communication scheme with improved communication implementation and a distributed training and testing scheme are proposed to achieve high scalability for distributed training. Furthermore, we explore the training algorithm with large mini-batch sizes, in order to improve convergence speed without losing accuracy. The framework supports training RNN models with a large number of parameters on at most 800 training nodes. The evaluation results show that, compared to training with a single computing node, training based on the proposed framework can achieve a 100-fold convergence rate with an 8,000 mini-batch size.
Ouyi Li, Wenlai Zhao, Xuancheng Huang, Yushu Chen, Lin Gan, Hongkun Yu, Jiacheng Zhang, Yang Liu, Haohuan Fu and Guangwen Yang
370 Future ramifications of age-dependent immunity levels for measles: explorations in an individual-based model [abstract]
Abstract: When a high population immunity already exists for a disease, heterogeneities become more important to understand the spread of this disease. Individual-based models are suited to investigate the effects of these heterogeneities. Measles is a disease for which, in many regions, high population immunity exists. However, different levels of immunity are observed for different age groups. For example, the generation born between 1985 and 1995 in Flanders is incompletely vaccinated, and thus has a higher level of susceptibility. As time progresses, this peak in susceptibility will shift to an older age category. Simultaneously, susceptibility will increase due to the waning of vaccine-induced immunity. Older generations, with a high degree of natural immunity, will, on the other hand, eventually disappear from the population. Using an individual-based model, we investigate the impact of changing age-dependent immunity levels (projected for Flanders, for years 2013 to 2040) on the risk for measles outbreaks. We find that, as time progresses, the risk for measles outbreaks increases, and outbreaks tend to be larger. As such, it is important to not only consider infants when designing strategies for measles elimination, but to also take other age categories into account.
Elise Kuylen, Lander Willem, Niel Hens and Jan Broeckhove

ICCS 2019 Main Track (MT) Session 8

Time and Date: 14:20 - 16:00 on 14th June 2019

Room: 1.5

Chair: Eisha Nathan

225 Immersed boundary method halo exchange in a hemodynamics application [abstract]
Abstract: In recent years, highly parallelized simulations of blood flow resolving individual blood cells have been demonstrated. Simulating such dense suspensions of deformable particles in flow often involves a partitioned fluid-structure interaction (FSI) algorithm, with separate solvers for Eulerian fluid and Lagrangian cell grids, plus a solver - e.g., immersed boundary method - for their interaction. Managing data motion in parallel FSI implementations is increasingly important, particularly for inhomogeneous systems like vascular geometries. In this study, we evaluate the influence of Eulerian and Lagrangian halo exchanges on efficiency and scalability of a partitioned FSI algorithm for blood flow. We describe an MPI+OpenMP implementation of the immersed boundary method coupled with lattice Boltzmann and finite element methods. We consider how communication and recomputation costs influence the optimization of halo exchanges with respect to three factors: immersed boundary interaction distance, cell suspension density, and relative fluid/cell solver costs.
John Gounley, Erik W. Draeger and Amanda Randles
386 Evolution of Hierarchical Structure & Reuse in iGEM Synthetic DNA Sequences [abstract]
Abstract: Many complex systems, both in technology and nature, exhibit hierarchical modularity: smaller modules, each of them providing a certain function, are used within larger modules that perform more complex functions. Previously, we have proposed a modeling framework, referred to as Evo-Lexis, that provides insight into some fundamental questions about evolving hierarchical systems. The predictions of the Evo-Lexis model should be tested using real data from evolving systems in which the outputs can be well represented by sequences. In this paper, we investigate the time series of iGEM synthetic DNA dataset sequences, and whether the resulting iGEM hierarchies exhibit the qualitative properties predicted by the Evo-Lexis framework. Contrary to Evo-Lexis, in iGEM the amount of reuse decreases during the timeline of the dataset. Although this results in the development of less cost-efficient and less deep Lexis-DAGs, the dataset exhibits a bias toward reusing specific nodes more often than others. This causes the Lexis-DAGs to take the shape of an hourglass with relatively high H-score values and a stable set of core nodes. Despite the reuse bias and stability of the core set, the dataset presents a high amount of diversity among the targets, which is in line with the Evo-Lexis model.
Payam Siyari, Bistra Dilkina and Constantine Dovrolis
475 Computational design of superhelices by local change of the intrinsic curvature [abstract]
Abstract: Helices appear in nature at many scales, ranging from molecules to tendrils in plants. Organisms take advantage of the helical shape to fold, propel and assemble. For this reason, several applications in micro and nanorobotics, drug delivery and soft-electronics have been suggested. On the other hand, biomolecules can form complex tertiary structures made with helices to accomplish many different functions. A particularly well-known case takes place during cell division when DNA, a double helix, is packaged into a super-helix – i.e., a helix made of helices – to prevent DNA entanglement. DNA super-helix formation requires auxiliary histone molecules, around which DNA is wrapped, in a "beads on a string" structure. The idea of creating superstructures from simple elastic filaments served as the inspiration for this work. Here we report a method to produce ribbons with complex shapes by periodically creating strains along the ribbons. Ribbons can gain helical shapes, and their helicity is ruled by the asymmetric contraction along the main axis. If the direction of the intrinsic curvature is locally changed, then a tertiary structure results, similar to the DNA wrapped structure. In this process, auxiliary structures are not required and therefore new methodologies to shape filaments, of interest to nanotechnology and biomolecular science, are proposed.
Pedro E. S. Silva, Maria Helena Godinho and Fernão Vístulo de Abreu
493 Spatial modeling of influenza outbreaks in Saint Petersburg using synthetic populations [abstract]
Abstract: In this paper, we model influenza propagation in the Russian setting using a spatially explicit model and a detailed human agent database as its input. The aim of the research is to assess the applicability of this modeling method using influenza incidence data for the 2010-2011 epidemic outbreak in Saint Petersburg and to compare the simulation results with the output of a compartmental SEIR model for the same outbreak. For this purpose, a synthetic population of Saint Petersburg was built and used for the simulation via the FRED open source modeling framework. The parameters related to the outbreak (background immunity level and effective contact rate) are assessed by calibrating the compartmental model to incidence data. We show that the current version of the synthetic population allows the agent-based model to reproduce real disease incidence.
Vasiliy Leonenko, Alexander Lobachev and Georgiy Bobashev
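For readers unfamiliar with the compartmental baseline used for calibration in paper 493, a discrete-time SEIR incidence curve can be written in a few lines. All parameter values below are illustrative defaults, not the calibrated values for the 2010-2011 Saint Petersburg outbreak.

```python
import numpy as np

def seir_incidence(beta, sigma=1/2.0, gamma=1/3.0, s0=0.6, e0=1e-5,
                   days=150, population=5_000_000):
    """Daily new-case curve of a simple discrete-time SEIR model (compartments as fractions)."""
    s, e, i, r = s0, e0, 0.0, 1.0 - s0 - e0
    incidence = []
    for _ in range(days):
        new_exposed = beta * s * i                       # transmission term
        s, e, i, r = (s - new_exposed,
                      e + new_exposed - sigma * e,
                      i + sigma * e - gamma * i,
                      r + gamma * i)
        incidence.append(sigma * e * population)         # approximate daily new cases
    return np.array(incidence)
```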

ICCS 2019 Main Track (MT) Session 9

Time and Date: 10:35 - 12:15 on 12th June 2019

Room: 1.3

Chair: Gabriela Schütz

30 A Deep Surrogate Model for Estimating Water Quality Parameters [abstract]
Abstract: For large-scale automated water quality monitoring, some physical or chemical parameters are unable to be measured directly due to financial or environmental limitations. As an example, excess nitrogen run-off can cause severe ecological damage to ecosystems. However, the cost of high accuracy measurement of nitrogen is prohibitive, and one can only measure nitrogen in creeks and rivers at selected locations. If nitrate concentrations are related to some other, more readily measured water parameters, it may be possible to use these parameters ("surrogates") to estimate nitrogen concentrations. Though one can estimate water quality parameters based on some different, but simultaneously monitored parameters, most surrogate models lack the consideration of spatial variation among monitoring stations. Those models are usually developed based on water quality data from a single station and applied to target stations in different locations for estimating water quality properties. In this case, the different weather, geophysical or biological conditions may reduce the effectiveness of the surrogate model's performance because the surrogate relationship may not be strong between the source and target stations. We propose a deep surrogate model (DSM) for indirect nitrogen measurement in large-scale water quality monitoring networks. The DSM applies a stacked denoising autoencoder to extract the features of the water quality surrogates. This strategy allows one to utilize all the sensory data across the monitoring network, which can significantly extend the size of the training data. For data-driven modeling, large amounts of training data collected from various monitoring stations can substantially improve the generalization of the DSM. Furthermore, instead of only learning the regression relationship between water quality surrogates and the nitrogen concentration in the source stations, the DSM is designed to capture the sensor data distribution differences between the source and target stations by calculating the Kullback-Leibler divergence. In this approach, the training of the DSM can be guided by acknowledging the information from the target station. Therefore, the performance of the DSM approach will be significantly higher than that of source station-based approaches. This is because the surrogate relationship learned by the DSM includes the diversity among monitoring stations. We evaluate the DSM by using real-world time series data from a wireless water quality monitoring network in Australia. Compared to models based on Support Vector Machine and Artificial Neural Network, the DSM achieves up to 49.0% and 42.4% improvements regarding the RMSE and MAE respectively. Hence, the DSM is an attractive strategy for generating the estimated nitrogen concentration for large-scale environmental monitoring projects.
Yifan Zhang, Peter Thorburn and Peter Fitch
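The distribution-difference term mentioned in paper 30 can be illustrated with a histogram-based Kullback-Leibler divergence between the surrogate readings of a source and a target station. This sketch only shows the kind of quantity involved; it is not the paper's exact formulation, and the synthetic data are placeholders.

```python
import numpy as np

def kl_divergence(p_samples, q_samples, bins=50):
    """Histogram estimate of D(P || Q) between two 1-D sample sets."""
    lo = min(p_samples.min(), q_samples.min())
    hi = max(p_samples.max(), q_samples.max())
    p, _ = np.histogram(p_samples, bins=bins, range=(lo, hi), density=True)
    q, _ = np.histogram(q_samples, bins=bins, range=(lo, hi), density=True)
    p, q = p + 1e-12, q + 1e-12          # avoid log(0) and division by zero
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(p * np.log(p / q)))

# Synthetic turbidity readings from a source and a target monitoring station
source = np.random.gamma(2.0, 1.0, 10_000)
target = np.random.gamma(2.5, 1.2, 10_000)
print(kl_divergence(source, target))
```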
103 Six Degrees of Freedom Numerical Simulation of Tilt-Rotor Plane [abstract]
Abstract: A six degrees of freedom coupled simulation is presented for a tilt-rotor plane represented by the V-22 Osprey. The Moving Computational Domain (MCD) method is used to compute the flow field around the aircraft and the movement of the body with high accuracy. This method makes it possible to move a plane through space without restriction of the computational range. It therefore differs from conventional methods that compute such flows around a static body placed in a uniform flow, as in a wind tunnel. To calculate with high accuracy, no simplification was used for simulating the propellers. Fluid flows are created only by the moving boundaries of the object. A tilt-rotor plane has a hovering function like a helicopter, achieved by turning the rotor axes toward the sky during takeoff or landing. In flight, on the other hand, it behaves as a reciprocating aircraft by turning the rotor axes forward. To perform these two flight modes in the simulation, a multi-axis sliding mesh approach was proposed, which is a computational technique that enables us to deal with multiple rotation axes of different directions. Moreover, used in combination with the MCD method, the approach has been applied to the simulation, which involves more complicated motions of the boundaries.
Ayato Takii, Masashi Yamakawa and Shinichi Asao
300 A Macroscopic Study on Dedicated Highway Lanes for Autonomous Vehicles [abstract]
Abstract: The introduction of AVs will have far-reaching effects on road traffic in cities and on highways. The implementation of AHS, possibly with a dedicated lane only for AVs, is believed to be a requirement to maximise the benefit from the advantages of AVs. We study the ramifications of an increasing percentage of AVs on the whole traffic system with and without the introduction of a dedicated highway AV lane. We conduct a macroscopic simulation of the city of Singapore under user equilibrium conditions with realistic traffic demand. We present findings regarding average travel time, throughput, road usage, and lane-access control. Our results show a reduction of average travel time as a result of increasing the portion of AVs in the system. We show that the introduction of an AV lane is not beneficial in terms of average commute time. Furthermore a notable shift of travel demand away from the highways towards major and small roads is noticed in early stages of AV penetration of the system. Finally, our findings show that after a certain threshold percentage of AVs the differences between AV and no AV lane scenarios become negligible.
Jordan Ivanchev, Alois Knoll, Daniel Zehe, Suraj Nair and David Eckhoff
355 An Agent-Based Model for Evaluating the Boarding and Alighting Efficiency of Public Transport Vehicles [abstract]
Abstract: A key metric in the design of interior layouts of public transport vehicles is the dwell time required to allow passengers to board and alight. Real-world experimentation using physical vehicle mock-ups and involving human participants can be performed to compare dwell times among vehicle designs. However, the associated costs limit such experiments to small numbers of trials. In this paper, we propose an agent-based simulation model of the behavior of passengers during boarding and alighting. High-level strategical behavior is modeled according to the Recognition-Primed Decision paradigm, while the low-level collision-avoidance behavior relies on an extended Social Force Model tailored to our scenario. To enable successful navigation within the confined space of the vehicle, we propose a mechanism to emulate passenger turning while avoiding complex geometric computations. We validate our model against real-world experiments from the literature, demonstrating deviations of less than 11%. In a case study, we evaluate the boarding and alighting times required by three autonomous vehicle interior layouts proposed by industrial designers.
Boyi Su, Philipp Andelfinger, David Eckhoff, Henriette Cornet, Goran Marinkovic, Wentong Cai and Alois Knoll
243 MLP-IA: Multi-Label User Profile Based on Implicit Association Labels [abstract]
Abstract: Multi-label user profiles are widely used and have made great contributions in fields such as recommendation systems and personalized search. Current research on multi-label user profiles either ignores the associations among labels or only considers the explicit associations among them, which is not sufficient to take full advantage of the internal associations. In this paper, a new insight is presented to mine the internal correlations among implicit association labels. To take advantage of this insight, a multi-label propagation method with implicit associations (MLP-IA) is proposed to obtain user profiles. A probability matrix is first designed to record the implicit associations, and is then combined with the multi-label propagation method to obtain more accurate user profiles. Finally, this method is proved to be convergent and faster than the traditional label propagation algorithm. Experiments on six real-world datasets from Weibo show that, compared with state-of-the-art methods, our approach accelerates convergence and its performance is significantly better than that of previous methods.
Lingwei Wei, Wei Zhou, Jie Wen, Jizhong Han and Songlin Hu

ICCS 2019 Main Track (MT) Session 10

Time and Date: 14:40 - 16:20 on 12th June 2019

Room: 1.3

Chair: Yifan Zhang

175 Estimating agriculture NIR images from aerial RGB data [abstract]
Abstract: Remote sensing in agriculture makes it possible to acquire large amounts of data without physical contact, providing diagnostic tools with important impacts on the costs and quality of production. Hyperspectral imaging sensors attached to airplanes or unmanned aerial vehicles (UAVs) can obtain spectral signatures, which makes it viable to assess vegetation indices and other characteristics of crops and soils. However, some of these imaging technologies are expensive and therefore less attractive to family and/or small producers. In this work, a method for estimating near-infrared (NIR) bands from a low-cost and well-known RGB camera is presented. The method is based on a weighted sum of NIR spectra previously acquired from pre-classified uniform areas using hyperspectral images. The weights (belonging degrees) for the NIR spectra were obtained from the outputs of a K-nearest-neighbor classification algorithm. The results show that the presented method has the potential to estimate the near-infrared band for agricultural areas using only RGB images, with an error of less than 9%.
Daniel Caio de Lima, Diego Saqui, Steve Ataky, Lúcio Jorge, Ednaldo José Ferreira and José Hiroki Saito
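The weighted-sum estimation described in the abstract above can be illustrated with a minimal sketch (not the authors' implementation). It assumes a hypothetical library of reference NIR values previously acquired from pre-classified uniform areas, and uses the class-membership probabilities of a K-nearest-neighbor classifier on RGB pixels as the belonging degrees:

    import numpy as np
    from sklearn.neighbors import KNeighborsClassifier

    def estimate_nir(rgb_ref, class_ref, nir_ref, rgb_new, k=5):
        # rgb_ref: (n_ref, 3) RGB samples from pre-classified uniform areas
        # class_ref: (n_ref,) area labels; nir_ref: (n_classes,) reference NIR
        # values, ordered consistently with knn.classes_ after fitting.
        knn = KNeighborsClassifier(n_neighbors=k).fit(rgb_ref, class_ref)
        degrees = knn.predict_proba(rgb_new)   # belonging degrees, (n_new, n_classes)
        return degrees @ nir_ref               # weighted sum of reference NIR values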
431 Simulation of Fluid Flow in Induced Fractures in Shale by the Lattice Boltzmann Method [abstract]
Abstract: With increasing interest in unconventional resources, understanding the flow in fractures, the gathering system for fluid production in these reservoirs, becomes an essential building block for developing effective stimulation treatment designs. Accurate determination of the stress-dependent permeability of fractures requires time-intensive physical experiments on fractured core samples. Unlike previous attempts to estimate permeability through experiments, we utilize 3D Lattice Boltzmann Method (LBM) simulations to increase understanding of how rock properties and generated fracture geometries influence the flow. Here, both real induced shale rock fractures and synthetic fractures are studied. Digital representations are characterized for descriptive topological parameters, then duplicated, with the upper plane translated to yield an aperture and a variable degree of throw. We present several results for steady LBM flow in characterized, unpropped fractures, demonstrating our methodology. Results with aperture variation in these complex, rough-walled geometries are described with a modification to the theoretical cubic law relation for flow in a smooth slit. Moreover, a series of simulations mimicking simple variation in proppant concentration, in both full and partial monolayers, is run to better understand their effects on the permeability of propped fractured systems.
Rahman Mustafayev and Randy Hazlett
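For reference, the smooth-slit relation that the abstract above modifies for rough-walled fractures is the standard parallel-plate ("cubic law") result: for a slit of aperture $h$ and width $w$, fluid viscosity $\mu$, and pressure drop $\Delta P$ over length $L$, the volumetric flow rate is

    $Q = \dfrac{w\,h^{3}}{12\,\mu}\,\dfrac{\Delta P}{L}$,

with the $h^{3}$ dependence giving the law its name; the paper's modification for aperture variation in rough-walled geometries is not reproduced here.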
433 Numeric computer simulations of the injection molding process of plastic parts [abstract]
Abstract: The plastics industry continuously demands plastic parts with higher surface quality and improved mechanical properties. The injection molding process is the most widely used and allows producing a huge variety of parts. Computer simulation can be a valuable tool for optimizing manufacturing processes, and over the last couple of decades software has been developed specifically to predict the outcome and optimize parameters in injection molding. However, several non-conventional injection molding processes still lack proper computational techniques and approaches. Such is the case of RHCM (Rapid Heating Cycle Molding), which involves both heating and cooling cycles, with the material injected into a mold with a hot mold surface (in contrast to the conventional approach, where the mold surface is cold). In this work, we explored the limits of state-of-the-art models for simulating this process, given the need to use it in a practical industrial application, and we propose a way to use homogenization theory to solve the heat transfer problem. The theory shows that, in a periodic medium, the solution of the convection-conduction energy equation is approximated by the solution of a heat equation in which a new term appears: a homogenized conductivity tensor that includes terms accounting for convection. This equation can be used for the first time to: (i) study in great detail a single periodic cell of the microstructure and use its results to characterize the performance of the macroscale domain; (ii) serve as a model for material engineering in heat transfer applications; and (iii) model problems in other fields that share the same physical and geometric nature. Our results are illustrated with analytical analyses and numerical simulations, showing that this model can accurately reconstruct the physics behind the heat transfer process. With this novel approach, we can better understand the process and improve industrial practice for RHCM in injection molding.
Ricardo Simoes, Luis Correia, Paulo Francisco, Carla L. Simões, Luís Faria and Jorge Laranjeira
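Schematically (in our notation, not necessarily the paper's), the homogenization step described above replaces the convection-conduction energy equation on the periodic microstructure with a pure heat equation on the macroscale whose conductivity tensor absorbs the convective effects:

    $\rho c_p\left(\dfrac{\partial T}{\partial t} + \mathbf{u}\cdot\nabla T\right) = \nabla\cdot(k\,\nabla T) \;\;\longrightarrow\;\; \overline{\rho c_p}\,\dfrac{\partial T}{\partial t} = \nabla\cdot\left(\mathbf{K}^{\mathrm{hom}}\,\nabla T\right)$,

where $\mathbf{K}^{\mathrm{hom}}$ is the homogenized conductivity tensor that includes the terms accounting for convection within each periodic cell.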
457 Incentive Mechanism for Cooperative Intrusion Response: A Dynamic Game Approach [abstract]
Abstract: Multi-hop D2D (Device-to-Device) communication may be exposed to many intrusions because of its inherent properties, such as openness and weak security protection. To mitigate intrusions in time, one significant approach is to establish a Cooperative Intrusion Response System (CIRS) to detect and respond to intrusion activities, i.e., during data transmission, User Equipments acting as relays (RUEs) cooperatively help destination nodes to detect and respond to intrusion events. However, a CIRS cannot work efficiently in multi-hop D2D communication because the RUEs are selfish and unwilling to spend extra resources on intrusion detection and response tasks. To address this problem, an incentive mechanism is required. In this paper, we formulate an incentive mechanism for CIRS in multi-hop D2D communication as a dynamic game and derive an optimal solution that helps RUEs decide whether to participate in detection. Theoretical analysis shows that a unique Nash equilibrium exists for the proposed game. Simulations demonstrate that our mechanism can efficiently motivate potential RUEs to participate in intrusion detection and response, and can also block intrusion propagation in time.
Yunchuan Guo, Xiao Wang, Liang Fang, Yongjun Li, Fenghua Li and Kui Geng
477 A k-Cover Model for Reliability-Aware Controller Placement in Software-Defined Networks [abstract]
Abstract: The main characteristics of Software-Defined Networks (SDNs) are the separation of the control and data planes, as well as a logically centralized control plane. This emerging network architecture simplifies data forwarding and allows managing the network in a flexible way. Controllers play a key role in SDNs since they manage the whole network. It is crucial to determine the minimum number of controllers and where they should be placed to provide low latencies between switches and their assigned controller. It is worth underlining that long propagation delays between controllers and switches degrade their ability to react quickly to network events, and hence reliability. Thus, the Reliability-Aware Controller Placement (RCP) problem in SDNs is a critical issue. In this work we propose a k-cover based model for the RCP problem in SDNs. It simultaneously optimizes the number and placement of controllers, as well as the latencies of primary and backup paths between switches and controllers, providing networks that are reliable against link, switch and controller failures. Although the RCP problem is NP-hard, the simulation results show that reliabilities greater than 97% were obtained while satisfying low latencies, and that the model can be used to find the optimal solution for different network topologies in negligible time.
Gabriela Schütz
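As a rough illustration of the coverage idea behind the k-cover formulation above (not the authors' exact model, which also optimizes backup paths and reliability against failures), a greedy heuristic repeatedly opens the candidate controller site that covers the most still-uncovered switches within a latency bound:

    def greedy_controller_placement(latency, bound):
        # latency[c][s]: propagation delay from candidate site c to switch s
        n_sites, n_switches = len(latency), len(latency[0])
        uncovered, chosen = set(range(n_switches)), []
        while uncovered:
            best = max(range(n_sites),
                       key=lambda c: sum(latency[c][s] <= bound for s in uncovered))
            newly = {s for s in uncovered if latency[best][s] <= bound}
            if not newly:
                break                      # remaining switches are not coverable
            chosen.append(best)
            uncovered -= newly
        return chosen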

ICCS 2019 Main Track (MT) Session 11

Time and Date: 16:50 - 18:30 on 12th June 2019

Room: 1.3

Chair: Ricardo Simões

331 Robust Ensemble-Based Evolutionary Calibration of the Numerical Wind Wave Model [abstract]
Abstract: The adaptation of numerical wind wave models to local time-spatial conditions is a problem that can be solved by using various calibration techniques. However, the obtained sets of physical parameters become over-tuned to specific events if there is a lack of observations. In this paper, we propose a robust evolutionary calibration approach that allows building a stochastic ensemble of perturbed models and using it to achieve a trade-off between the quality and robustness of the target model. The implemented robust ensemble-based evolutionary calibration (REBEC) approach was compared to the baseline SPEA2 algorithm in a set of experiments with the SWAN wind wave model configuration for the Kara Sea domain. The metrics provided for the set of scenarios confirm the effectiveness of the REBEC approach for the majority of calibration scenarios.
Pavel Vychuzhanin, Nikolay Nikitin and Anna Kalyuzhnaya
438 Approximate Repeated Administration Models for Pharmacometrics [abstract]
Abstract: Employing multiple processes in parallel is a common approach to reduce running-times in high-performance computing applications. However, improving performance through parallelization is only part of the story. At some point, all available parallelism is exploited and performance improvements need to be sought elsewhere. As part of drug development trials, a compound is periodically administered, and the interactions between it and the human body are modeled through pharmacokinetics and pharmacodynamics by a set of ordinary differential equations. Numeric integration of these equations is the most computationally intensive part of the fitting process. For this task, parallelism brings little benefit. This paper describes how to exploit the nearly periodic nature of repeated administration models by numeric application of the method of averaging on the one hand and reusing previous computational effort on the other hand. The presented method can be applied on top of any existing integrator while requiring only a single tunable threshold parameter. Performance improvements and approximation error are studied on two pharmacometrics models. In addition, automated tuning of the threshold parameter is demonstrated in two scenarios. Up to 1.7-fold and 70-fold improvements are measured with the presented method for the two models respectively.
Balazs Nemeth, Tom Haber, Jori Liesenborgs and Wim Lamotte
466 Evolutionary Optimization of Intruder Interception Plans for Mobile Robot Groups [abstract]
Abstract: The task of automated intruder detection and interception is often considered a suitable application for groups of mobile robots. Realistic versions of the problem involve representing uncertainty, which turns them into NP-hard optimization tasks. In this paper we define the problem of indoor intruder interception with a probabilistic intruder motion model and uncertainty of intruder detection. We define a model for representing the problem and propose an algorithm for optimizing plans for groups of mobile robots patrolling the building. The proposed evolutionary multi-agent algorithm uses a novel representation of solutions. The algorithm has been evaluated using different problem sizes and compared with other methods.
Wojciech Turek, Agata Kubiczek and Aleksander Byrski
434 Synthesizing quantum circuits via numerical optimization [abstract]
Abstract: We provide a simple framework for the synthesis of quantum circuits based on a numerical optimization algorithm. This algorithm is used in the context of trapped-ion technology. We derive theoretical lower bounds for the number of quantum gates required to implement any quantum algorithm. We then present numerical experiments with random quantum operators in which we compute the optimal parameters of the circuits and illustrate the correctness of the theoretical lower bounds. We finally discuss the scalability of the method with the number of qubits.
Timothée Goubault de Brugière, Marc Baboulin, Benoît Valiron and Cyril Allouche
455 Application of continuous time quantum walks to image segmentation [abstract]
Abstract: This paper presents a new algorithm that applies the concept of continuous-time quantum walks to the image segmentation problem. The work, inspired by results from its classical counterpart, presents and compares two versions of the solution regarding the calculation of pixel-segment association: one using the limiting distribution of the walk and one using the last-step distribution. The obtained results vary in terms of accuracy and in how readily they can be ported to a real quantum device. The described results were obtained by simulation on a classical computer, but the algorithms were designed so that they can run on a real quantum computer when one becomes available.
Michał Krok, Katarzyna Rycerz and Marian Bubak

ICCS 2019 Main Track (MT) Session 12

Time and Date: 10:15 - 11:55 on 13th June 2019

Room: 1.3

Chair: Katarzyna Rycerz

47 Synchronized Detection and Recovery of Steganographic Messages with Adversarial Learning [abstract]
Abstract: In this work, we mainly study the mechanism of learning a steganographic algorithm and combine the learning process with adversarial learning to learn a good steganographic algorithm. To handle the problem of embedding secret messages into a specific medium, we design novel adversarial modules to learn the steganographic algorithm, and simultaneously train three modules called the generator, discriminator and steganalyzer. Different from existing methods, the three modules are formalized as a game in which they communicate with each other. In the game, the generator and discriminator attempt to communicate with each other using secret messages hidden in an image, while the steganalyzer attempts to determine whether confidential information is being transmitted. We show that through unsupervised adversarial training, the adversarial model can produce robust steganographic solutions, which act like an encryption. Furthermore, we propose a supervised adversarial training method to train a robust steganalyzer, which is used to discriminate whether an image contains secret information. Numerous experiments are conducted on a publicly available dataset to demonstrate the effectiveness of the proposed method.
Haichao Shi, Xiao-Yu Zhang, Shupeng Wang, Ge Fu and Jianqi Tang
64 Multi-Source Manifold Outlier Detection [abstract]
Abstract: Outlier detection is an important task in data mining, with many practical applications ranging from fraud detection to public health. However, with the emergence of more and more multi-source data in many real-world scenarios, the task of outlier detection becomes even more challenging, as traditional mono-source outlier detection techniques are no longer suitable for multi-source heterogeneous data. In this paper, a general framework based on consistent representations is proposed to identify multi-source heterogeneous outliers. According to the information compatibility among different sources, manifold learning is incorporated in the proposed method to obtain a shared representation space, in which information-correlated representations are close along the manifold while semantic-complementary instances are close in Euclidean distance. Furthermore, the multi-source outliers can be effectively identified in an affine subspace learned through affine combinations of shared representations from different sources in the feature-homogeneous space. Comprehensive empirical investigations are presented that confirm the promise of our proposed framework.
Lei Zhang and Shupeng Wang
155 A Fast NN-based Approach for Time Sensitive Anomaly Detection over Data Streams [abstract]
Abstract: Anomaly detection is an important data mining method aiming to discover outliers that show significant deviation from their expected behavior. A widely used criterion for determining outliers is based on the number of their neighboring elements, referred to as Nearest Neighbors (NN). Existing NN-based Anomaly Detection (NN-AD) algorithms cannot detect streaming outliers, which present time-sensitive abnormal behavior characteristics in different time intervals. In this paper, we propose a fast NN-based approach for Time Sensitive Anomaly Detection (NN-TSAD), which can find outliers that present different behavior characteristics, including normal and abnormal characteristics, within different time intervals. The core idea of our proposal is to combine the sliding window model with Locality Sensitive Hashing (LSH) to monitor the distribution of streaming elements and the number of their nearest neighbors as time progresses. We use an ϵ-approximation scheme to implement the sliding window model and compute nearest neighbors on the fly. We conduct extensive experiments on time sensitive anomaly detection using three real-world datasets. The results show that our approach achieves significant improvements in recall and precision for anomaly detection within different time intervals. In particular, our approach achieves a two-orders-of-magnitude improvement in time consumption for streaming anomaly detection compared with traditional NN-based anomaly detection algorithms, such as exact-Storm, approx-Storm and MCOD, while using only 10 percent of the memory.
Guangjun Wu, Zhihui Zhao, Ge Fu and Haiping Wang
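A much simplified, brute-force sketch of the sliding-window idea above (without the LSH and ϵ-approximation machinery that make NN-TSAD fast) flags an element as an outlier in the current window if it has fewer than k neighbors within radius r; scalar elements are assumed for brevity:

    from collections import deque

    def stream_outliers(stream, window, r, k):
        win = deque(maxlen=window)          # sliding window of recent elements
        for i, x in enumerate(stream):
            win.append(x)
            neighbors = sum(abs(x - y) <= r for y in win) - 1   # exclude x itself
            yield i, neighbors < k          # True means "outlier in this window"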
199 Causal links between geological attributes of oil and gas reservoir analogues [abstract]
Abstract: Oil and gas reservoirs are distributed across the globe at different depths and geological ages. Although some petroleum deposits are situated spatially far from each other, they may share similar distributions of continuous attributes describing formation and fluid properties, as well as categorical attributes describing tectonic regimes and depositional environments. In that case they are called reservoir analogues. Information about thousands of reservoirs from around the world forms a solid basis for uncertainty evaluation and missing data imputation. Beyond these routine tasks in the industry, such a dataset allows probabilistic reasoning through frequency analysis. This work presents a graphical representation of causal links between geological attributes of reservoir analogues.
Nikita Bukhanov, Arthur Sabirov, Oksana Popova and Stanislav Slivkin
212 n-gram Cache Performance in Statistical Extraction of Relevant Terms in Large Corpora [abstract]
Abstract: Statistical extraction of relevant n-grams in natural language corpora is important for text indexing and classification since it can be language independent. We show how a theoretical model identifies the distribution properties of the distinct n-grams and singletons appearing in large corpora and how this knowledge contributes to understanding the performance of an n-gram cache system used for extraction of relevant terms. We show how this approach allowed us to evaluate the benefits from using Bloom filters for excluding singletons and from using static prefetching of nonsingletons in an n-gram cache. In the context of the distributed and parallel implementation of the LocalMaxs extraction method, we analyze the performance of the cache miss ratio and size, and the efficiency of n-gram cohesion calculation with LocalMaxs.
Carlos Goncalves, Joaquim Silva and Jose Cunha
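A toy sketch of the singleton-exclusion idea analyzed above: a small Bloom filter remembers n-grams seen once, and an n-gram is admitted to the frequency cache only on its second occurrence. The filter sizing and the static prefetching policy of the actual system are not modeled:

    import hashlib

    class BloomFilter:
        def __init__(self, n_bits=1 << 20, n_hashes=3):
            self.bits = bytearray(n_bits // 8)
            self.n_bits, self.n_hashes = n_bits, n_hashes
        def _positions(self, item):
            for i in range(self.n_hashes):
                h = hashlib.sha256(f"{i}:{item}".encode()).digest()
                yield int.from_bytes(h[:8], "big") % self.n_bits
        def add(self, item):
            for p in self._positions(item):
                self.bits[p // 8] |= 1 << (p % 8)
        def __contains__(self, item):
            return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))

    def count_nonsingletons(ngrams):
        seen_once, cache = BloomFilter(), {}
        for g in ngrams:
            if g in cache:
                cache[g] += 1
            elif g in seen_once:        # second occurrence (small false-positive rate)
                cache[g] = 2
            else:
                seen_once.add(g)        # first occurrence: keep out of the cache
        return cache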

ICCS 2019 Main Track (MT) Session 13

Time and Date: 14:20 - 16:00 on 13th June 2019

Room: 1.3

Chair: Carlos Gonçalves

71 Lung Nodule Diagnosis via Deep Learning and Swarm Intelligence [abstract]
Abstract: Cancer diagnosis is usually an arduous task for medicine, especially when it comes to pulmonary cancer, which is one of the most deadly and hardest to treat types of cancer. Early detection of pulmonary cancerous nodules drastically increases survival chances, but it also makes the problem even harder to solve, as it mostly depends on a visual inspection of tomography scans. To help improve detection and survival rates, engineers and scientists have been developing computer-aided diagnosis techniques, such as the one presented in this paper. Here, we use computational intelligence to propose a new approach to detecting pulmonary carcinogenic nodules in computerized tomography scans. The approach uses deep learning and swarm intelligence to develop a novel nodule detection and classification model. Seven different swarm intelligence algorithms and convolutional neural networks for biomedical image segmentation are used to detect and classify cancerous pulmonary nodules in the Lung Image Database Consortium and Image Database Resource Initiative (LIDC-IDRI). The aim of this work is to train convolutional neural networks using swarm intelligence techniques and to demonstrate that this approach is more efficient than classic training with back-propagation and gradient descent. It improves the average accuracy from 93% to 94%, precision from 92% to 94%, sensitivity from 91% to 93% and specificity from 97% to 98%, which constitutes a relevant improvement according to the statistical t-test.
Cesar Affonso De Pinho Pinheiro, Nadia Nedjah and Luiza de Macedo Mourelle
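A compact sketch of the general idea of training network weights with a swarm instead of back-propagation; this is a toy particle swarm optimizer over the weights of a single sigmoid neuron, not the authors' CNN pipeline:

    import numpy as np

    def pso_train(X, y, n_particles=30, iters=200, w=0.7, c1=1.5, c2=1.5, seed=0):
        dim = X.shape[1] + 1                                   # weights + bias
        rng = np.random.default_rng(seed)
        pos, vel = rng.normal(size=(n_particles, dim)), np.zeros((n_particles, dim))
        def loss(p):                                           # mean squared error
            pred = 1.0 / (1.0 + np.exp(-(X @ p[:-1] + p[-1])))
            return np.mean((pred - y) ** 2)
        pbest, pbest_val = pos.copy(), np.array([loss(p) for p in pos])
        gbest = pbest[np.argmin(pbest_val)].copy()
        for _ in range(iters):
            r1, r2 = rng.random((2, n_particles, dim))
            vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
            pos = pos + vel
            vals = np.array([loss(p) for p in pos])
            improved = vals < pbest_val
            pbest[improved], pbest_val[improved] = pos[improved], vals[improved]
            gbest = pbest[np.argmin(pbest_val)].copy()
        return gbest                                           # best weight vector found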
85 Marrying Graph Kernel with Deep Neural Network: A Case Study for Network Anomaly Detection [abstract]
Abstract: Network anomaly detection has caused widespread concern among researchers and in industry. Existing work mainly focuses on applying machine learning techniques to detect network anomalies. The ability to exploit the potential relationships of communication patterns in network traffic has been the focus of many existing studies. Graph kernels provide a powerful means of representing complex interactions between entities, while deep neural networks offer new capabilities because the data representation in the hidden layers is shaped by the specific task and is thus customized for network anomaly detection. However, deep neural networks cannot learn communication patterns among network traffic directly. At the same time, deep neural networks require a large amount of training data and are computationally expensive, especially when considering entire network flows. For these reasons, we employ a novel method that marries graph kernels to deep neural networks, which exploits the expressiveness of relationships among network flows, combines the ability of neural networks to mine hidden representations, and enhances learning effectiveness when only a limited number of training examples is available. We evaluate the proposed method on two real-world datasets which contain low-intensity network attacks, and the experimental results reveal that our model achieves significant improvements in accuracy over existing network anomaly detection approaches.
Yepeng Yao, Liya Su, Zhigang Lu and Baoxu Liu
114 Machine learning for performance enhancement of molecular dynamics simulations [abstract]
Abstract: We explore the idea of integrating machine learning with simulations to enhance the performance of the simulation and improve its usability for research and education. The idea is illustrated using hybrid openMP/MPI parallelized molecular dynamics simulations designed to extract the distribution of ions in nanoconfinement. We find that an artificial neural network based regression model successfully learns the desired features associated with the output ionic density profiles and rapidly generates predictions that are in excellent agreement with the results from explicit molecular dynamics simulations. The results demonstrate that the performance gains of parallel computing can be further enhanced by using machine learning.
Jcs Kadupitiya, Geoffrey Fox and Vikram Jadhao
210 2D-Convolution based Feature Fusion for Cross-Modal Correlation Learning [abstract]
Abstract: Cross-modal information retrieval (CMIR) enables users to search for semantically relevant data of various modalities from a given query of one modality. The predominant challenge is to alleviate the "heterogeneous gap" between different modalities. For text-image retrieval, the typical solution is to project text features and image features into a common semantic space and measure the cross-modal similarity. However, semantically relevant data from different modalities usually contain imbalanced information. Aligning all the modalities in the same space weakens modal-specific semantics and introduces unexpected noise. In this paper, we propose a novel CMIR framework based on multi-modal feature fusion. In this framework, the cross-modal similarity is measured by directly analyzing the fine-grained correlations between the text features and image features, without learning a common semantic space. Specifically, we first construct a cross-modal feature matrix to fuse the original visual and textual features. Then 2D-convolutional networks are proposed to reason about inner-group relationships among features across modalities, resulting in fine-grained text-image representations. The cross-modal similarity is measured by a multi-layer perceptron based on the fused feature representations. We conduct extensive experiments on two representative CMIR datasets, i.e. English Wikipedia and TVGraz. Experimental results indicate that our model outperforms state-of-the-art methods significantly. Meanwhile, the proposed cross-modal feature fusion approach is more effective in CMIR tasks than other feature fusion approaches.
Jingjing Guo, Jing Yu, Yuhang Lu, Yue Hu and Yanbing Liu
222 DunDi: Improving Robustness of Neural Networks using Distance Metric Learning [abstract]
Abstract: Deep neural networks (DNNs), although highly accurate, are vulnerable to adversarial attacks. A slight perturbation applied to a sample may lead to misprediction by the DNN, even if it is imperceptible to humans. This defect makes DNNs lack robustness to malicious perturbations and thus limits their usage in many safety-critical systems. To this end, we present DunDi, a metric learning based classification model, to provide the ability to defend against adversarial attacks. The key idea behind DunDi is a metric learning model which is able to pull samples of the same label together while pushing samples of different labels away. Consequently, the distance between samples and the model's boundary can be enlarged accordingly, so that significant perturbations are required to fool the model. Then, based on the distance comparison, we propose a two-step classification algorithm that performs efficiently for multi-class classification. DunDi can not only build and train a new customized model but also supports the incorporation of available pre-trained neural network models to take full advantage of their capabilities. The results show that DunDi is able to defend against 94.39% and 88.91% of adversarial samples generated by four state-of-the-art adversarial attacks on the MNIST dataset and the CIFAR-10 dataset, respectively, without hurting classification accuracy.
Lei Cui, Rongrong Xi and Zhiyu Hao

ICCS 2019 Main Track (MT) Session 14

Time and Date: 16:30 - 18:10 on 13th June 2019

Room: 1.3

Chair: Nadia Nedjah

278 Function and pattern extrapolation with product-unit networks [abstract]
Abstract: Neural networks are a popular method for function approximation and data classification and have recently drawn much attention because of the success of deep-learning strategies. Artificial neural networks are built from elementary units that generate a piecewise, often almost linear approximation of the function or pattern. To improve the extrapolation of nonlinear functions and patterns beyond the training domain, we propose to augment the fundamental algebraic structure of neural networks by a product unit that computes the product of its inputs raised to the power of their weights, namely $\prod_{i} x_i^{w_i}$. Linearly combining their outputs in a weighted sum allows representing most nonlinear functions known in calculus, including roots, fractions and approximations of power series. We train the network using gradient descent. The enhanced extrapolation capabilities of the network are demonstrated by comparing the results for a function and pattern extrapolation task with those obtained with the nonlinear support vector machine (SVM) and a standard neural network (standard NN). Convergence behavior of stochastic gradient descent is discussed and the feasibility of the approach is demonstrated in a real-world application in image segmentation.
Babette Dellen, Uwe Jaekel and Marcell Wolnitza
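A minimal sketch of the product unit described above, assuming strictly positive inputs so the product can be evaluated stably in the log domain; this is an illustrative toy, not the authors' trained network:

    import numpy as np

    def product_layer(x, W, a):
        # Weighted sum of product units: sum_j a_j * prod_i x_i ** W[j, i]
        return a @ np.exp(W @ np.log(x))

    # Example: one unit with exponents [0.5, 0.5] represents sqrt(x1 * x2).
    x = np.array([4.0, 9.0])
    print(product_layer(x, np.array([[0.5, 0.5]]), np.array([1.0])))   # 6.0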
333 Fast and Scalable Outlier Detection with Metric Access Methods [abstract]
Abstract: It is well-known that the existing theoretical models for outlier detection make assumptions that may not reflect the true nature of outliers in every real application. With that in mind, this paper describes an empirical study of unsupervised outlier detection using 8 state-of-the-art algorithms and 8 datasets covering a variety of high-impact real-world tasks, such as spotting cyberattacks, clinical pathologies and abnormalities in nature. We present a detailed account of the results obtained, pointing out the strengths and weaknesses of each technique from the application specialist's point of view, which is a shift from the designer-based point of view that is commonly considered. Interestingly, many of the techniques had unfeasibly high runtime requirements or failed to spot what the specialists consider as outliers in their own data. To tackle this issue, we propose MetricABOD: a novel ABOD-based algorithm that makes the analysis up to thousands of times faster, while being on average 12% more accurate than the most accurate related work. This improvement is essential to enable outlier detection in many real-world applications for which the existing methods lead to unexpected results or unfeasible runtime requirements. Finally, we studied two real collections of text data to show that our MetricABOD also works for non-dimensional, purely metric data.
Altamir Gomes Bispo Junior and Robson Leonardo Ferreira Cordeiro
384 Deep Learning Based LSTM and SeqToSeq Models to Detect Monsoon Spells of India [abstract]
Abstract: Monsoon spells are an important climatic phenomenon modulating the quality and quantity of the monsoon over a year. India being an agricultural country, identification of monsoon spells is extremely important for planning agricultural policies that follow the phases of the monsoon to attain maximum productivity. Detection of monsoon spells involves analyzing and predicting the monsoon at daily levels, which makes it more challenging, as daily variability is higher than that of the monsoon over a month or a year. In this article, deep-learning based long short-term memory and sequence-to-sequence models are utilized to classify monsoon days, which are finally assembled to detect the spells. Dry and wet days are classified with precisions of 0.95 and 0.87, respectively. Break spells are observed to be forecast with higher accuracy than active spells. Additionally, the sequence-to-sequence model is noted to perform better than the long short-term memory model. The proposed models also outperform traditional classification models for monsoon spell detection.
V. Saicharan, Moumita Saha, Pabitra Mitra and Ravi S. Nanjundiah
507 Data Analysis for Atomic Shapes in Nuclear Science [abstract]
Abstract: One of the overarching questions in the field of nuclear science is how simple phenomena emerge from complex systems. A nucleus is composed of both protons and neutrons, and while many assume the atomic nucleus adopts a spherical shape, the nuclear shape is, in fact, quite variable. Nuclear physicists seek to understand the shape of the atomic nucleus by probing specific transitions between nuclear energy states which occur at high energy on short timescales. This is achieved by detecting a unique experimental signature in the time-series data recorded in experiments conducted at the National Superconducting Cyclotron Laboratory. The current method involves fitting each sample in the dataset to a given parameterized model function. However, this procedure is computationally expensive due to the nature of the nonlinear curve fitting problem. Since the data is skewed towards non-unique signatures, we offer a way to filter the majority of the uninteresting samples out of the dataset by using machine learning methods. By doing so, we decrease the computational cost of detecting the unique experimental signatures in the time-series data. We also present a way to generate synthetic training data by estimating the distribution of the underlying parameters of the model function with Kernel Density Estimation. The new workflow, which leverages machine-learned classifiers trained on the synthetic data, is shown to significantly outperform the current procedures on actual datasets.
Mehmet Kaymak, Hasan Metin Aktulga, Fox Ron and Sean Liddick

ICCS 2019 Main Track (MT) Session 15

Time and Date: 10:15 - 11:55 on 14th June 2019

Room: 1.3

Chair: Koen van der Zwet

106 A Novel Partition Method for Busy Urban Area Based on Spatial-Temporal Information [abstract]
Abstract: Finding the regions where people appear plays a key role in many fields, such as user behavior analysis and urban planning. Therefore, as a first step, how to partition the world, and especially urban areas where people are crowded and active, into regions is crucial. In this paper, we propose a novel method called Restricted Spatial-Temporal DBSCAN (RST-DBSCAN). The key idea is to partition busy urban areas based on spatial-temporal information. Regions of arbitrary and separated shapes in urban areas can then be obtained. In addition, RST-DBSCAN identifies busier regions earlier. Experimental results show that our approach yields significant improvements over existing methods on a real-world dataset extracted from Gowalla, a location-based social network.
Ai Zhengyang, Zhang Kai, Shupeng Wang, Chao Li, Xiao-Yu Zhang and Shicong Li
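A heavily simplified sketch of clustering with a joint spatial-temporal neighborhood: a plain DBSCAN whose neighborhood requires both a spatial radius eps_s and a temporal radius eps_t. The restrictions and the ordering that let RST-DBSCAN return busier regions earlier are not reproduced:

    import math

    def st_dbscan(points, eps_s, eps_t, min_pts):
        # points: list of (x, y, t); returns one cluster label per point (-1 = noise)
        def neighbors(i):
            xi, yi, ti = points[i]
            return [j for j, (x, y, t) in enumerate(points)
                    if math.hypot(x - xi, y - yi) <= eps_s and abs(t - ti) <= eps_t]
        labels, cluster = [None] * len(points), -1
        for i in range(len(points)):
            if labels[i] is not None:
                continue
            seeds = neighbors(i)
            if len(seeds) < min_pts:
                labels[i] = -1                     # tentatively mark as noise
                continue
            cluster += 1
            labels[i] = cluster
            queue = [j for j in seeds if j != i]
            while queue:
                j = queue.pop()
                if labels[j] == -1:
                    labels[j] = cluster            # noise point becomes a border point
                if labels[j] is not None:
                    continue
                labels[j] = cluster
                nb = neighbors(j)
                if len(nb) >= min_pts:             # j is a core point: expand from it
                    queue.extend(k for k in nb if labels[k] is None)
        return labels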
116 Simple Spatial Scaling Rules behind Complex Cities [abstract]
Abstract: Although most wealth and innovation have been the result of human interaction and cooperation, we are not yet able to quantitatively predict the spatial distributions of three main elements of cities: population, roads, and socioeconomic interactions. With a simple model based mainly on spatial attraction and matching growth mechanisms, we reveal that the spatial scaling rules of these three elements fit in a consistent framework, which allows us to use any single observation to infer the others. All numerical and theoretical results are consistent with empirical data from ten representative cities. In addition, our model provides a general explanation of the origins of the universal super- and sub-linear aggregate scaling laws and accurately predicts kilometre-level socioeconomic activity. The theoretical analysis method is also original, being based on growth instead of mean-field assumptions. Another contribution is the proposed active population (AP) concept, a mixture of residential and working populations weighted by the duration of their activities in the region. AP is a more appropriate proxy than the residential population alone for estimating socioeconomic activities. The density distribution of AP is $\rho(r) \propto r^{-\beta}\,(R_t^{1+\beta} - r^{1+\beta}) \sim r^{-\beta}$, which also reconciles the conflict between area-size allometry and the exponential decay of population from city centre to urban fringe found in the literature. Our work opens a new avenue for uncovering the evolution of cities in terms of the interplay among urban elements, and it has a broad range of applications.
Ruiqi Li, Xinmiao Sun and Gene Stanley
144 Mention Recommendation with Context-aware Probabilistic Matrix Factorization [abstract]
Abstract: Mentions, as a key feature of social networks, can break through the effect of structural trapping and expand the visibility of a message. Although existing works usually use rank learning as the implementation strategy for mention recommendation, these approaches may interfere with the exploration of influencing factors and introduce biases. In this paper, we propose a novel Context-aware Mention recommendation model based on Probabilistic Matrix Factorization (CMPMF). This model considers four important mention contextual factors, namely topic relevance, mention affinity, user profile similarity and message semantic similarity, to measure the relevance score from the user and message dimensions. We fuse these mention contextual factors in latent spaces into the framework of probabilistic matrix factorization to improve the performance of mention recommendation. Through evaluation on a real-world dataset from Weibo, the empirical study demonstrates the effectiveness of the discovered mention contextual factors. We also observe that topic relevance and mention affinity play a particularly significant role in the mention recommendation task. The results demonstrate that our proposed method outperforms state-of-the-art algorithms.
Bo Jiang, Ning Li and Zhigang Lu
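For orientation, the probabilistic-matrix-factorization backbone of CMPMF can be sketched as a plain SGD factorization of an observed user-message relevance matrix; the four contextual factors, which are the paper's actual contribution, are not modeled here:

    import numpy as np

    def pmf_sgd(R, n_factors=16, epochs=50, lr=0.01, reg=0.05, seed=0):
        # R: dict mapping (user, item) -> observed relevance score
        rng = np.random.default_rng(seed)
        U = {u: rng.normal(scale=0.1, size=n_factors) for u, _ in R}
        V = {i: rng.normal(scale=0.1, size=n_factors) for _, i in R}
        for _ in range(epochs):
            for (u, i), r in R.items():
                err = r - U[u] @ V[i]              # prediction error on this entry
                U[u] += lr * (err * V[i] - reg * U[u])
                V[i] += lr * (err * U[u] - reg * V[i])
        return U, V                                # latent user and item factors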
167 Synchronization under control in complex networks for a panic model [abstract]
Abstract: After a sudden catastrophic event occurring in a population of individuals, panic can spread, persist and become more problematic than the catastrophe itself. In this paper, we explore through a computational approach the possibility of controlling the panic level in complex networks built with a recent behavioral model. After stating a rigorous theoretical framework, we carry out a numerical investigation, using randomly generated networks, to establish the effect of network topology on this control process, and we compare the panic level for two distinct topology sets on a given network.
Cantin Guillaume, Lanza Valentina and Verdière Nathalie
214 Personalized Ranking in Dynamic Graphs Using Nonbacktracking Walks [abstract]
Abstract: Centrality has long been studied as a method of identifying node importance in networks. In this paper we study a variant of several walk-based centrality metrics based on the notion of a nonbacktracking walk, where the pattern iji is forbidden in the walk. Specifically, we focus our analysis on dynamic graphs, where the underlying data stream from which the network is drawn is constantly changing. Efficient algorithms for calculating nonbacktracking walk centrality scores in static and dynamic graphs are provided, and experiments on graphs with several million vertices and edges are conducted. For the static algorithm, comparisons to a traditional linear algebraic method of calculating scores show that our algorithm produces scores of high accuracy within a theoretically guaranteed bound. Comparisons of our dynamic algorithm to the static one show speedups of several orders of magnitude as well as a significant reduction in the space required.
Eisha Nathan, Geoffrey Sanders and Van Henson
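One standard way to count nonbacktracking walks (an illustration of the notion, not necessarily the authors' algorithm) is to work on directed edges: a walk ending with edge (i, j) may only be extended by edges (j, k) with k ≠ i. The sketch below accumulates damped walk counts per node:

    import numpy as np

    def nonbacktracking_scores(edges, n, alpha=0.1, steps=10):
        # edges: undirected (i, j) pairs over nodes 0..n-1
        darts = [(i, j) for i, j in edges] + [(j, i) for i, j in edges]
        m = len(darts)
        B = np.zeros((m, m))                       # nonbacktracking (Hashimoto) matrix
        for a, (i, j) in enumerate(darts):
            for b, (j2, k) in enumerate(darts):
                if j2 == j and k != i:             # extend i->j by j->k, forbid k == i
                    B[a, b] = 1.0
        walks = np.ones(m)                         # one walk of length 1 per dart
        score = np.zeros(n)
        for step in range(1, steps + 1):
            for a, (i, j) in enumerate(darts):
                score[j] += (alpha ** step) * walks[a]
            walks = B.T @ walks                    # walks one edge longer
        return score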

ICCS 2019 Main Track (MT) Session 16

Time and Date: 14:20 - 16:00 on 14th June 2019

Room: 1.3

Chair: Cantin Guillaume

233 An Agent-Based Model for Emergent Opponent Behavior [abstract]
Abstract: Organized crime, insurgency and terrorist organizations have a large and undermining impact on societies. This highlights the urgency of better understanding the complex dynamics of these individuals and organizations in order to detect in time the critical social phase transitions that pose a risk to society. In this paper we introduce a new multi-level modelling approach that integrates insights from complex systems, criminology, psychology, and organizational studies with agent-based modelling. We use a bottom-up approach to model the active and adaptive reactions of individuals to society, the economic situation and law enforcement activity. This approach enables analyzing the behavioral transitions of individuals and the associated micro processes, as well as the emergent networks and organizations influenced by events at the meso- and macro-level. At the meso-level, it provides a platform for modelling and analysing the development of opponent organizations subject to the competitive characteristics of the environment and possible interventions by law enforcement. While our model is theoretically founded on findings in the literature and empirical validation is still work in progress, our current model already enables a better understanding of the mechanisms leading to social transitions at the macro-level. The potential of this approach is illustrated with computational results.
Koen van der Zwet, Ana Isabel Barros, Tom van Engers and Bob van der Vecht
352 Fine-Grained Label Learning via Siamese Network for Cross-modal Information Retrieval [abstract]
Abstract: Cross-modal information retrieval aims to search for semantically relevant data from various modalities when given a query from one modality. For text-image retrieval, a common solution is to map texts and images into a common semantic space and measure their similarity directly. Both positive and negative examples are used for common semantic space learning. Existing work treats positive/negative text-image pairs as equally positive/negative. However, we observe that many positive examples resemble the negative ones to some degree, and vice versa. These "hard examples" are challenging for existing models. In this paper, we aim to assign fine-grained labels to the examples to capture their degrees of "hardness", thus enhancing cross-modal correlation learning. Specifically, we propose a siamese network over both the positive and negative examples to obtain their semantic similarities. For each positive/negative example, we use the text description of the image in the example to calculate its similarity with the text in the example. Based on these similarities, we assign fine-grained labels to both the positives and the negatives and introduce these labels into a pairwise similarity loss function. The loss function benefits from the labels to increase the influence of hard examples on the similarity learning, while maximizing the similarity of relevant text-image pairs and minimizing the similarity of irrelevant pairs. We conduct extensive experiments on the English Wikipedia, Chinese Wikipedia, and TVGraz datasets. Compared with state-of-the-art models, our model achieves significant improvements in retrieval performance by incorporating the fine-grained labels.
Yiming Xu, Jing Yu, Jingjing Guo, Yue Hu and Jianlong Tan
354 MeshTrust: A CDN-centric Trust Model for Reputation Management on Video Traffic [abstract]
Abstract: Video applications today increasingly deploy content delivery networks (CDNs) for content delivery. However, by decoupling the owner of the content from the organization serving it, CDNs can be abused by attackers to commit network crimes. Traditional flow-level measurements for generating the reputation of IPs and domain names of video applications are insufficient. In this paper, we present MeshTrust, a novel approach that automatically assesses the reputation of service providers based on video traffic. We tackle the challenge from two aspects: the multi-tenancy structure representation and a CDN-centric trust model. First, by mining behavioral and semantic characteristics, a Mesh Graph consisting of video websites, CDN nodes and their relations is constructed. Second, we introduce a novel CDN-centric trust model which transforms the Mesh Graph into a Trust Graph based on extended network embedding methods. Based on the labeled nodes in the Trust Graph, a reputation score can be easily calculated and applied to real-time reputation management of video traffic. Our experiments show that MeshTrust can differentiate normal and illegal video websites with an accuracy of approximately 95% in a real cloud environment.
Xiang Tian, Yujia Zhu, Zhao Li, Chao Zheng, Qingyun Liu and Yong Sun
451 Optimizing spatial accessibility of company branches network with constraints [abstract]
Abstract: The ability to collect customer data in enterprise corporate information systems has led to the emergence of customer-centric algorithms and approaches. In this study, we consider the problem of choosing a candidate branch for closing based on the overall expected level of dissatisfaction of company customers with the location of the remaining branches. To measure the accessibility of branches for individuals, we extract points of interest from visit traces, using a clustering algorithm to find centers of interest. The following questions were further considered: (i) to what extent does spatial accessibility influence the customers' choice of company branches? (ii) which algorithm provides a better trade-off between accuracy and computational complexity? These questions were studied in application to a bank branch network. In particular, data and domain restrictions from our bank partner (one of the largest regional banks in Russia) were used. The results show that: (i) spatial accessibility significantly influences customers' choice (65%–75% of customers choose one of the top 5 branches by accessibility after a branch is closed); (ii) the proposed greedy algorithm provides an optimal solution in almost all cases; (iii) the output of the greedy algorithm may be further improved with a local search algorithm; (iv) a problem instance with several dozen branches and up to a million customers may be solved with near-optimal quality in tens of seconds.
Oleg Zaikin, Ivan Derevitskii, Klavdiya Bochenina and Janusz Holyst
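A stripped-down sketch of the greedy choice described above, using plain distance as the accessibility proxy (the study itself works with extracted centers of interest and bank-specific constraints): close the branch whose removal least increases customers' distance to their nearest remaining branch.

    def best_branch_to_close(dist):
        # dist[c][b]: distance from customer c to branch b (all branches open)
        n_branches = len(dist[0])
        best, best_cost = None, float("inf")
        for b in range(n_branches):
            cost = 0.0
            for row in dist:
                remaining = [d for j, d in enumerate(row) if j != b]
                cost += min(remaining) - min(row)   # extra distance if b closes
            if cost < best_cost:
                best, best_cost = b, cost
        return best, best_cost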