ICCS 2019 Main Track (MT) Session 4

Time and Date: 10:15 - 11:55 on 13th June 2019

Room: 1.5

Chair: Jens Weismüller

15	Analysis of the construction of similarity matrices on multi-core and many-core platforms using different similarity metrics Abstract: Similarity matrices are 2D representations of the degree of similarity between points of a given dataset and are employed in fields such as data mining, genetics and machine learning. However, their calculation has quadratic complexity and is therefore especially expensive for large datasets. MPICorMat accelerates the construction of these matrices through a hybrid parallelization strategy based on MPI and OpenMP. The previous version of this tool achieved high performance and scalability, but implemented only a single similarity metric, Pearson’s correlation. It was therefore suitable only for problems where the data are normally distributed and there is a linear relationship between variables. In this work, we present an extension to MPICorMat that incorporates eight additional similarity metrics so that users can choose the one that best suits their problem. The performance and energy consumption of each metric are measured on two platforms: a multi-core platform with two Intel Xeon Sandy Bridge processors and a many-core Intel Xeon Phi KNL. Results show that MPICorMat executes faster and consumes less energy on the many-core architecture. The new version of MPICorMat is publicly available to download from its website: https://sourceforge.net/projects/mpicormat/	Uxía Casal, Jorge González-Domínguez and María J. Martín
16	High Performance Algorithms for Counting Collisions and Pairwise Interactions Abstract: The problem of counting collisions or interactions is common in areas such as computer graphics and scientific simulations. Since it is a major bottleneck in applications in these areas, much research has been done on the subject, mainly focused on techniques that allow calculations to be performed within pruned sets of objects. This paper focuses on how interaction calculation (such as collisions) within these sets can be done more efficiently than with existing approaches. Two algorithms are proposed: a sequential algorithm that has linear complexity at the cost of high memory usage, and a parallel algorithm, mathematically proved to be correct, that uses GPU resources more efficiently than existing approaches. The proposed and existing algorithms were implemented, and experiments show a speedup of 21.7 for the sequential algorithm (on a small problem size) and 1.12 for the parallel proposal (on a large problem size). By improving interaction calculation, this work contributes to research areas that promote interconnection in the modern world, such as computer graphics and robotics.	Matheus Saldanha and Paulo Souza
206	Comparing domain-decomposition methods for the parallelization of distributed land surface models Abstract: Current research challenges in hydrology require models with a high resolution on a global scale. These requirements stand in great contrast to the current capabilities of distributed land surface models: we found hardly any literature reporting efficient scalability beyond approximately 64 processors. Porting these models to supercomputers is no simple task, because most of the computational load stems from the evaluation of highly parametrized equations. Furthermore, the load is heterogeneous in both the spatial and temporal dimensions, and considerable load imbalances are triggered by input data. We investigate different domain-decomposition methods for distributed land surface models and focus on their properties concerning load balancing and communication-minimizing partitionings. Artificial strong scaling experiments from a single core to 8,192 cores show that graph-based methods can distribute the computational load of the application almost as efficiently as coordinate-based methods, while the partitionings found by the graph-based methods significantly reduce communication overhead.	Alexander von Ramm, Jens Weismüller, Wolfgang Kurtz and Tobias Neckel
228	Analysis and Detection on Abused Wildcard Domain Names Based on DNS Logs Abstract: A wildcard record is a type of DNS resource record (RR) that allows any domain name in the same zone to map to a single record value. Previous work has used DNS zone file data and domain name blacklists to understand the usage of wildcard domain names. In this paper, we analyze wildcard domain names in real network DNS logs and present several novel findings. By analyzing web contents, we found that the proportion of domain names related to pornography and online gambling (referred to as abused domain names in this work) is much higher among wildcard domain names than among non-wildcard domain names. By analyzing registration, resolution and maliciousness behaviors, we found that abused wildcard domain names carry remarkably higher security risks than normal wildcard domain names. Based on this analysis, we propose the GSCS algorithm to detect abused wildcard domain names. GSCS is based on a domain graph, which gives insight into the similarities between the resolution behaviors of abused wildcard domain names. By applying a spectral clustering algorithm and seed domains, GSCS can effectively distinguish abused wildcard domain names from normal ones. Experiments on real datasets indicate that GSCS achieves a detection rate of about 86% with 5% seed domains, performing much better than the BP algorithm.	Guangxi Yu, Yan Zhang, Huajun Cui, Xinghua Yang, Yang Li and Huiran Yang
275	XScan: An Integrated Tool for Understanding Open Source Community-based Scientific Code Abstract: Many scientific communities have adopted community-based models that integrate multiple components to simulate whole-system dynamics. The software complexity of these models, which stems from integrating multiple individual components developed under different application requirements and for various machine architectures, has become a challenge for effective software system understanding and continuous software development. This paper presents an integrated software toolkit called X-ray Software Scanner (XScan) for better understanding large-scale community-based scientific codes. Our tool can quickly summarize the overall information of a scientific code, including the number of lines of code, programming languages, external library dependencies, and architecture-dependent parallel software features. The XScan toolkit also includes a static software analysis component that collects detailed structural information and provides interactive visualization and analysis of the functions. We use a large-scale community-based Earth System Model to demonstrate the workflow, functions, and visualization of the toolkit. We also discuss using advanced graph analytics techniques to assist software modular design and component refactoring.	Weijian Zheng, Dali Wang and Fengguang Song
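For paper 15, the quadratic cost of building a similarity matrix and the difference between a linear metric and a rank-based one can be illustrated with a small sequential sketch. It assumes plain Python lists as input and uses Spearman's rank correlation only as an example alternative to Pearson's correlation. It is not MPICorMat's code, it uses neither MPI nor OpenMP, and all function names are hypothetical. Constant rows are not handled, since their standard deviation is zero.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]


def pearson(x: Vector, y: Vector) -> float:
    """Pearson's correlation coefficient of two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)  # assumes neither vector is constant


def ranks(v: Vector) -> List[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(v)), key=lambda i: v[i])
    r = [0.0] * len(v)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and v[order[j + 1]] == v[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r


def spearman(x: Vector, y: Vector) -> float:
    """Spearman's rank correlation: Pearson's correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def similarity_matrix(
    rows: Sequence[Vector], metric: Callable[[Vector, Vector], float]
) -> List[List[float]]:
    """Hypothetical helper: symmetric n x n matrix of metric(rows[i], rows[j]).

    Only the upper triangle (including the diagonal) is evaluated and
    mirrored, which still costs n * (n + 1) / 2 metric calls: O(n^2).
    """
    n = len(rows)
    m = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            s = metric(rows[i], rows[j])
            m[i][j] = m[j][i] = s
    return m
```

For example, with `rows = [[1, 2, 3, 4], [1, 4, 9, 16]]`, Pearson's correlation between the two rows is about 0.984, while Spearman's is 1.0: the relationship is monotonic but not linear, which is the kind of case where a metric other than Pearson's may fit the data better.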