Session2 14:40 - 16:20 on 12th June 2018

ICCS 2019 Main Track (MT) Session 2

Time and Date: 14:40 - 16:20 on 12th June 2018

Room: 1.5

Chair: To be announced

453 Optimization of Demodulation for Air-Gap Data Transmission based on Backlight Modulation of Screen
Abstract: An air gap is an efficient technique for improving computer security. The proposed technique uses backlight modulation of the monitor screen to transmit data from an infected computer. An optimization algorithm for the segmentation of the video stream is proposed to improve the robustness of data transmission. The algorithm is tested using a Monte Carlo approach with full-frame analysis for different values of the standard deviation of additive Gaussian noise. The results show an improvement of about ten times for the proposed selective image processing at low values of the standard deviation.
Dawid Bak, Przemyslaw Mazurek and Dorota Oszutowska-Mazurek
304 Reinsertion algorithm based on destroy and repair operators for dynamic dial a ride problems
Abstract: The Dial-a-Ride Problem (DARP) consists in serving, with a fleet of vehicles, a set of customers who specify their pickup and drop-off locations. The aim of DARP is to design vehicle routes that satisfy customer requests and minimize the total traveled distance. In this paper, we consider a real case of a dynamic DARP service operated by Padam, which offers a high-quality transportation service in which customers ask for a service either in advance or in real time and get an immediate answer about whether their requests are accepted or rejected. A fleet with a fixed number of vehicles is available during a working period to provide the transportation service. The goal is to maximize the number of accepted requests during the service. We propose a novel online Reinsertion Algorithm based on destroy/repair operators to reinsert requests rejected by the online algorithm used by Padam. When the online algorithm fails to insert a new customer, the proposed algorithm intensively exploits the neighborhood of the current solution using destroy/repair operators to attempt to find a new solution that allows the insertion of the new client while respecting the constraints of the problem. The proposed algorithm was implemented in the optimization engine of Padam and extensively tested on hard real instances with up to 1011 requests and 14 vehicles. The results show that our method succeeds in improving the number of accepted requests while keeping similar transportation costs on almost all instances, despite the hardness of the real instances. In half of the cases, a reduction in the number of vehicles is attained, which is a huge benefit for the company.
Sven Vallée, Ammar Oulamara and Wahiba Ramdane Cherif-Khettaf
399 Optimization heuristics for computing the Voronoi skeleton
Abstract: A skeleton representation of geometrical objects is widely used in computer graphics, computer vision, image processing, and pattern recognition. Therefore, efficient algorithms for computing planar skeletons are of high relevance. In this paper, we focus on the algorithm for computing the Voronoi skeleton of a planar object represented by a set of polygons. The complexity of the considered Voronoi skeletonization algorithm is O(N log N), where N is the total number of polygon vertices. To improve the performance of the skeletonization algorithm, we propose theoretically justified shape optimization heuristics based on polygon simplification algorithms. We evaluated the efficiency of these heuristics using polygons extracted from the MPEG 7 CE-Shape-1 dataset, measuring the execution time of the skeletonization algorithm, the computational overhead of the introduced heuristics, and the influence of each heuristic on the accuracy of the resulting skeleton. As a result, we established criteria for choosing the optimal heuristic for the Voronoi skeleton construction algorithm depending on the system's critical requirements.
Dmytro Kotsur and Vasyl Tereschenko
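The abstract above does not say which polygon simplification algorithms the heuristics build on. As an illustration of the general idea only (fewer vertices N going into an O(N log N) skeletonization), here is a minimal sketch of the well-known Ramer-Douglas-Peucker simplification. The function names and the `tolerance` parameter are assumptions made for this example, not the authors' implementation.

```python
import math

def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the segment with endpoints a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp the projection to its endpoints.
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def simplify_polyline(points, tolerance):
    """Ramer-Douglas-Peucker: drop vertices closer than `tolerance` to the chord."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    index, max_dist = 0, -1.0
    for i in range(1, len(points) - 1):
        d = point_segment_distance(points[i], first, last)
        if d > max_dist:
            index, max_dist = i, d
    if max_dist <= tolerance:
        return [first, last]
    # Keep the farthest vertex and simplify both halves recursively.
    left = simplify_polyline(points[: index + 1], tolerance)
    right = simplify_polyline(points[index:], tolerance)
    return left[:-1] + right

outline = [(0, 0), (1, 0.01), (2, 0), (3, 5), (4, 0)]
simplified = simplify_polyline(outline, tolerance=0.1)
```

A closed polygon can be handled by splitting it at two fixed vertices and simplifying each chain separately. The tolerance trades skeleton accuracy against the vertex count, which is the trade-off the abstract evaluates.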
239 Transfer Learning for Leisure Centre Energy Consumption Prediction
Abstract: Demand for energy is ever growing. Accurate prediction of the energy demand of large buildings is becoming essential for property managers to operate these facilities more efficiently and in a greener way. Various temporal modelling approaches provide a reliable yet straightforward paradigm for short-term building energy prediction. However, newly constructed buildings, newly renovated buildings, or buildings with newly installed energy monitoring systems do not have sufficient data to build energy demand prediction models. In contrast, established buildings often have vast amounts of collected data. A model learned from these data can be useful if transferred to buildings with little or no data. Two tree-based machine learning algorithms were introduced in this study on transfer learning. Datasets from two leisure centers in Melbourne were used. The results show that transfer learning is a promising technique for predicting accurately under a new scenario, as it can achieve similar or even better performance compared to learning on a full dataset.
Paul Banda, Muhammed Bhuiyan, Kevin Zhang and Andy Song

ICCS 2019 Main Track (MT) Session 10

Time and Date: 14:40 - 16:20 on 12th June 2018

Room: 1.3

Chair: To be announced

175 Estimating agriculture NIR images from aerial RGB data
Abstract: Remote sensing in agriculture makes it possible to acquire large amounts of data without physical contact, providing diagnostic tools with important impacts on the costs and quality of production. Hyperspectral imaging sensors attached to airplanes or unmanned aerial vehicles (UAVs) can obtain spectral signatures, which makes it viable to assess vegetation indices and other characteristics of crops and soils. However, some of these imaging technologies are expensive and therefore less attractive to family and/or small producers. In this work, a method for estimating Near Infrared (NIR) bands from a low-cost and well-known RGB camera is presented. The method is based on a weighted sum of NIR spectra previously acquired from pre-classified uniform areas using hyperspectral images. The weights (membership degrees) for the NIR spectra were obtained from the outputs of a K-nearest neighbor classification algorithm. The results show that the presented method has the potential to estimate the near-infrared band for agricultural areas using only RGB images, with an error of less than 9%.
Daniel Caio de Lima, Diego Saqui, Steve Ataky, Lúcio Jorge, Ednaldo José Ferreira and José Hiroki Saito
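The weighted-sum idea in the abstract above can be sketched as follows. The abstract does not say how the weights are derived from the K-nearest-neighbor outputs, so this example assumes the weight of a class is the fraction of the k nearest labelled RGB pixels belonging to it. The function names, the sample pixels and the per-class NIR values are all hypothetical.

```python
import math
from collections import Counter

def knn_membership(rgb, labelled_pixels, k):
    """Fraction of the k nearest labelled RGB pixels that belong to each class."""
    nearest = sorted(labelled_pixels, key=lambda item: math.dist(rgb, item[0]))[:k]
    counts = Counter(label for _, label in nearest)
    return {label: n / k for label, n in counts.items()}

def estimate_nir(rgb, labelled_pixels, class_nir, k=3):
    """Weighted sum of per-class reference NIR values, weighted by k-NN membership."""
    weights = knn_membership(rgb, labelled_pixels, k)
    return sum(w * class_nir[label] for label, w in weights.items())

# Hypothetical training pixels from pre-classified uniform areas: (RGB, class).
labelled = [
    ((10, 200, 10), "crop"),
    ((12, 190, 15), "crop"),
    ((120, 100, 80), "soil"),
    ((130, 110, 90), "soil"),
]
# Hypothetical mean NIR reflectance per class, as if taken from hyperspectral images.
reference_nir = {"crop": 0.8, "soil": 0.3}

nir = estimate_nir((20, 180, 20), labelled, reference_nir, k=3)
```

In this toy call two of the three nearest neighbors are "crop" pixels, so the estimate is two thirds of the crop NIR value plus one third of the soil value.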
431 Simulation of Fluid Flow in Induced Fractures in Shale by the Lattice Boltzmann Method
Abstract: With increasing interest in unconventional resources, understanding the flow in fractures, the gathering system for fluid production in these reservoirs, becomes an essential building block for developing effective stimulation treatment designs. Accurate determination of stress-dependent permeability of fractures requires time-intensive physical experiments on fractured core samples. Unlike previous attempts to estimate permeability through experiments, we utilize 3D Lattice Boltzmann Method simulations for increased understanding of how rock properties and generated fracture geometries influence the flow. Here, both real induced shale rock fractures and synthetic fractures are studied. Digital representations are characterized for descriptive topological parameters, then duplicated, with the upper plane translated to yield an aperture and variable degree of throw. We present several results for steady LBM flow in characterized, unpropped fractures, demonstrating our methodology. Results with aperture variation in these complex, rough-walled geometries are described with a modification to the theoretical cubic law relation for flow in a smooth slit. Moreover, a series of simulations mimicking simple variation in proppant concentration, both in full and partial monolayers, are run to better understand their effects on the permeability of propped fractured systems.
Rahman Mustafayev and Randy Hazlett
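For reference, the classical cubic law that the abstract above refers to gives the volumetric flow rate of steady laminar flow between two smooth parallel plates. The symbols are chosen here for illustration: aperture $h$, fracture width $w$, length $L$, dynamic viscosity $\mu$ and pressure drop $\Delta p$. The modification proposed for rough-walled fractures is not stated in the abstract and is not reproduced here.

```latex
Q = \frac{w\,h^{3}}{12\,\mu}\,\frac{\Delta p}{L},
\qquad
k = \frac{h^{2}}{12}
```

The second relation is the equivalent permeability of the smooth slit, obtained by comparing the cubic law with Darcy's law over the cross-section $w h$.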
433 Numeric computer simulations of the injection molding process of plastic parts
Abstract: The plastics industry is continuously demanding plastic parts with higher surface quality and improved mechanical properties. The injection molding process is the most widely used, and allows a huge variety of parts to be produced. Computer simulation can be a valuable tool for optimizing manufacturing processes, and over the last couple of decades software has been developed specifically to predict the outcome and optimize the parameters of injection molding. However, several non-conventional injection molding processes still lack proper computational techniques and approaches. Such is the case of RHCM (Rapid Heating Cycle Molding), which involves both heating and cooling cycles, with the material injected into the mold while the mold surface is hot (in contrast to the conventional approach, where the mold surface is cold). In this work, we explored the limits of state-of-the-art models for simulating this process, given the necessity to use it in a practical industrial application, and we propose a way to use homogenization theory to solve the heat transfer problem. It shows that, in a periodic medium, the solution of the convection-conduction energy equation is approximated by the solution of a heat equation. In this equation, a new term appears: a homogenized conductivity tensor that includes terms that account for convection. This equation can be used for the first time to: (i) study in great detail one single periodic cell of the microstructure and use its results to characterize the performance of the macroscale domain; (ii) serve as a model for material engineering in heat transfer applications; (iii) model problems in other fields that possess the same physical and geometric nature. Our results are illustrated with analytical analyses and numerical simulations, showing that this model can accurately reconstruct the physics behind the heat transfer process. With this novel approach, we can better understand the process and improve industrial practice for RHCM in injection molding.
Ricardo Simoes, Luis Correia, Paulo Francisco, Carla L. Simões, Luís Faria and Jorge Laranjeira
457 Incentive Mechanism for Cooperative Intrusion Response: A Dynamic Game Approach
Abstract: Multi-hop D2D (Device-to-Device) communication may be exposed to many intrusions because of its inherent properties, such as openness and weak security protection. To mitigate intrusions in time, one significant approach is to establish a Cooperative Intrusion Response System (CIRS) to detect and respond to intrusion activities: during data transmission, User Equipments that act as relays (RUEs) cooperatively help destination nodes to detect and respond to intrusion events. However, the CIRS cannot work efficiently in multi-hop D2D communication because the RUEs are selfish and unwilling to spend extra resources on intrusion detection and response tasks. To address this problem, an incentive mechanism is required. In this paper, we formulate an incentive mechanism for CIRS in multi-hop D2D communication as a dynamic game and derive an optimal solution that helps RUEs decide whether or not to participate in detection. Theoretical analysis shows that only one Nash equilibrium exists for the proposed game. Simulations demonstrate that our mechanism can efficiently motivate potential RUEs to participate in intrusion detection and response, and can also block intrusion propagation in time.
Yunchuan Guo, Xiao Wang, Liang Fang, Yongjun Li, Fenghua Li and Kui Geng
477 A k-Cover Model for Reliability-Aware Controller Placement in Software-Defined Networks
Abstract: The main characteristics of Software-Defined Networks are the separation of the control and data planes, as well as a logically centralized control plane. This emerging network architecture simplifies data forwarding and allows the network to be managed in a flexible way. Controllers play a key role in SDNs since they manage the whole network. It is crucial to determine the minimum number of controllers and where they should be placed to provide low latencies between switches and their assigned controller. It is worth underlining that long propagation delays between controllers and switches affect their ability to react quickly to network events, degrading reliability. Thus, the Reliability-Aware Controller Placement (RCP) problem in Software-Defined Networks (SDNs) is a critical issue. In this work we propose a k-cover based model for the RCP problem in SDNs. It simultaneously optimizes the number and placement of controllers, as well as the latencies of primary and backup paths between switches and controllers, providing networks that are reliable against link, switch and controller failures. Although the RCP problem is NP-hard, the simulation results show that reliabilities greater than 97% were obtained while satisfying low latencies, and that the model can be used to find the optimum solution for different network topologies in negligible time.
Gabriela Schütz

Applications of Matrix Methods in Artificial Intelligence and Machine Learning (AMAIML) Session 1

Time and Date: 14:40 - 16:20 on 12th June 2018

Room: 0.3

Chair: Kourosh Modarresi

410 Biclustering via Mixtures of Regression Models
Abstract: Bi-clustering of observations and variables is of interest in many scientific disciplines. For a single data matrix, it is handled through the singular value decomposition (SVD). Here we deal with two sets of variables: response and predictor sets. We model the joint relationship via regression models and then apply the SVD to the coefficient matrix. The sparseness condition is introduced via Group Lasso. The approach discussed here is quite general and is illustrated with an example from finance.
Raja Velu, Zhaoque Zhou and Chyng Wen Tee
524 An Evaluation Metric for Content Providing Models, Recommender Systems and Online Campaigns
Abstract: Creating an optimal digital experience for users requires providing them with desirable content and delivering this content at the optimal time as the user’s experience and interaction take place. There are multiple metrics and variables that may determine the success of a “user digital experience”. These metrics may include accuracy, computational cost and other variables. Many of these variables may contradict one another (as explained later in this submission), and their importance may depend on the specific application that the digital experience optimization is pursuing. To deal with this intertwined, possibly contradictory and confusing set of metrics, this invention introduces a generalized index entailing all possible metrics and variables that may be significant in defining a successful “digital experience design model”. Besides its generalizability, as it may include any metric that marketers or scientists consider important, this new index allows marketers or scientists to give different weights to the corresponding metrics, since the significance of a specific metric may depend on the specific application. The index is very flexible and can be adjusted as the objective of “user digital experience optimization” changes. Here, we use “recommendation” as equivalent to “content providing” throughout the submission. One well-known use of “recommender systems” is in providing content such as products, ads, goods, network connections, services, and so on. Recommender systems have many other applications and, in general, many problems and applications in AI and machine learning can easily be converted to an equivalent “recommender system” problem. This feature increases the significance of recommender systems as an important application of AI and machine learning. The introduction of the internet has brought a new dimension to the ways businesses sell their products and interact with their customers.
The ubiquity of the web, and consequently of web applications, is soaring, and as a result much of commerce and customer experience is taking place online. Many companies offer their products exclusively or predominantly online. At the same time, many present and potential customers spend much time online, and thus businesses try to use efficient models to interact with online users and engage them in various desired initiatives. This interaction with online users is crucial for businesses that hope to see some desired outcome such as a purchase, conversions of any type, simple page views, longer time spent on the business pages and so on. A recommendation system is one of the main tools for achieving these outcomes. The basic idea of recommender systems is to estimate the probability of a desired action by a specific user. Knowing this probability, one can decide which initiatives to take to maximize the desirable outcomes of the online user’s actions. The types of initiatives could include promotional initiatives (sending coupons, cash, …) or communication with the customer using all available media venues such as mail, email, online ads, etc. The main goal of a recommendation or targeting model is to increase outcomes such as “conversion rate”, “length of stay on sites”, “number of views” and so on. There are many other direct or indirect metrics influenced by recommender systems. Examples include an increase in the sale of other products that were not the direct goal of the recommendations, an increase in the chance of the customer coming back to the site, an increase in brand awareness, and an increase in the chance of retargeting the same user at a later time. The Model: Overview. First, we demonstrate the problem we want to address, using many models, data sets and multiple metrics. Then, we propose our unified and generalized metric to address the problems we observe when using multiple separate metrics.
Thus, we use several models and multiple data sets to evaluate our approach. First, we use all these data sets to evaluate the performance of the different models using different state-of-the-art performance metrics. Then, we observe the difficulties of any evaluation using these performance metrics: because different performance metrics often lead to contradictory conclusions, it is hard to decide which model has the best performance (and so which model to use for the targeting campaign in mind). Therefore, we create our performance index, which produces a single, unifying performance metric for evaluating a targeting model.
Kourosh Modarresi and Jamie Diner
162 Tolerance Near Sets and tNM Application in City Images
Abstract: Tolerance Near Set theory is a formal basis for the observation, comparison and classification of objects, and the tolerance Nearness Measure (tNM) is a normalized value that indicates how similar two images are. This paper presents an application of an algorithm that compares images based on the value of tNM, so that the similarities between the images are verified with respect to their characteristics, such as gray levels and texture attributes extracted using the Gray Level Co-occurrence Matrix (GLCM). Images of the centers of selected cities around the world are compared using tNM and classified.
Deivid Silva, José Saito and Daniel Caio De Lima
363 Meta-Graph based Attention-aware Recommendation over Heterogeneous Information Networks
Abstract: Heterogeneous information networks (HINs), which involve diverse types of data, have been widely used in recommender systems. However, most existing HIN-based recommendation methods treat different latent features equally and simply model various feature interactions in the same way, so that the rich semantic information cannot be fully utilized. To comprehensively exploit the heterogeneous information for recommendation, in this paper we propose a Meta-Graph based Attention-aware Recommendation (MGAR) over HINs. First of all, MGAR utilizes rich meta-graph based latent features to guide the heterogeneous information fusion recommendation. Specifically, in order to discriminate the importance of latent features generated by different meta-graphs, we propose an attention-based feature enhancement model. The model enables useful and useless features to contribute differently to the prediction, thus improving the performance of the recommendation. Furthermore, to holistically exploit the different interrelations of features, we propose a hierarchical feature interaction method, which consists of three layers of second-order interaction, to mine the underlying correlations between users and items. Extensive experiments show that MGAR outperforms the state-of-the-art recommendation methods in terms of RMSE on Yelp and Amazon Electronics.
Feifei Dai, Xiaoyan Gu, Bo Li, Jinchao Zhang, Mingda Qian and Weiping Wang

Data Driven Computational Sciences 2019 (DDCS) Session 1

Time and Date: 14:40 - 16:20 on 12th June 2018

Room: 0.4

Chair: Craig Douglas

160 Nonparametric Signal Ensemble Analysis for the Search for Extraterrestrial Intelligence (SETI)
Abstract: It might be easier for intelligent extraterrestrial civilizations to be found when they mark their position with a bright laser beacon. Given the possible distances involved, however, it is likely that weak signal detection techniques would still be required to identify even the brightest SETI beacon. The Bootstrap Error-adjusted Single-sample Technique (BEST) is such a detection technique. The BEST has been shown to outperform the more traditional Mahalanobis distance metric in analysis of SETI data from a Project Argus near-infrared telescope. The BEST algorithm is used to identify unusual signals, and returns a distance in asymmetric nonparametric multidimensional central 68% confidence intervals (equivalent to standard deviations for 1-D data that are normally distributed, or Mahalanobis distance units for normally distributed data of d dimensions). Calculation of the Mahalanobis metric requires matrix factorization and is O(d^3). In contrast, calculation of the BEST metric does not require matrix factorization and is O(d). Furthermore, the accuracy and precision of the BEST metric are greater than the Mahalanobis metric in realistic data collection scenarios (many more wavelengths available than observations at those wavelengths).
Robert Lodder
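For context, the Mahalanobis distance that the abstract above uses as its baseline can be written as follows, where $\mu$ and $\Sigma$ are the sample mean and covariance of the $d$-dimensional training spectra (notation chosen here for illustration). The BEST metric itself is not specified in the abstract and is not reproduced.

```latex
D_M(x) = \sqrt{(x - \mu)^{\top}\,\Sigma^{-1}\,(x - \mu)},
\qquad
\Sigma = L L^{\top}
```

Evaluating $\Sigma^{-1}(x-\mu)$ is usually done through a factorization such as the Cholesky factor $L$, which is the $O(d^3)$ step mentioned in the abstract. When there are fewer observations than dimensions, the sample covariance $\Sigma$ is singular and this factorization is not available without regularization, which is the "many more wavelengths than observations" scenario noted above.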
93 Parallel Strongly Connected Components Detection with Multi-partition on GPUs
Abstract: Graph computing is often used to analyze complex relationships in the interconnected world, and strongly connected components (SCC) detection in digraphs is a basic problem in graph computing. As graph sizes increase, many GPU-based parallel algorithms have been proposed to detect SCCs. State-of-the-art parallel SCC detection algorithms achieve speedups on various graphs, but there is still room for improvement: (1) multiple traversals are time-consuming when processing real-world graphs; (2) pivot selection is either inaccurate or time-consuming. We propose an SCC detection method with multi-partition that optimizes the algorithm process and achieves high performance. Unlike existing parallel algorithms, we select a pivot and traverse forward from it, and then select a vice pivot and traverse backwards from the pivot and the vice pivot simultaneously. After updating the state of each vertex, we obtain multiple partitions in which SCCs can be detected in parallel. At different phases of our approach, we use either a vertex with the largest degree product or a random vertex as the pivot to balance selection accuracy and efficiency. We also implement WCC detection and 2-SCC to optimize our algorithm, and the vertices marked by the WCC partition are selected as pivots to reduce unnecessary operations. We conducted experiments on the NVIDIA K80 with real-world and synthetic graphs. The results show that the proposed algorithm achieves an average detection acceleration of 8.8x and 21x when compared with well-known algorithms such as Tarjan's algorithm and Barnat's algorithm.
Junteng Hou, Shupeng Wang, Guangjun Wu, Ge Fu and Siyu Jia
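The pivot-based approach that the abstract above extends rests on a simple fact: the SCC containing a pivot is the intersection of the vertices reachable from it going forward and going backward. The sketch below is a sequential version of the classic forward-backward scheme, using the largest degree product as the pivot rule mentioned in the abstract. It does not include the vice pivot, multi-partition, WCC or 2-SCC steps, and all function names are chosen for this example.

```python
from collections import deque

def reachable(adjacency, start, allowed):
    """Vertices of `allowed` reachable from `start` by breadth-first search."""
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for w in adjacency.get(u, ()):
            if w in allowed and w not in seen:
                seen.add(w)
                queue.append(w)
    return seen

def forward_backward_scc(graph):
    """graph: dict mapping each vertex to a list of successors."""
    vertices = set(graph)
    for successors in graph.values():
        vertices.update(successors)
    reverse = {v: [] for v in vertices}
    for u, successors in graph.items():
        for w in successors:
            reverse[w].append(u)

    def degree_product(v):
        return len(graph.get(v, ())) * len(reverse[v])

    components = []
    pending = [vertices]  # each entry is an independent partition
    while pending:
        part = pending.pop()
        if not part:
            continue
        pivot = max(part, key=degree_product)
        forward = reachable(graph, pivot, part)
        backward = reachable(reverse, pivot, part)
        scc = forward & backward
        components.append(frozenset(scc))
        # Every remaining SCC lies entirely inside one of these three sets.
        pending.extend([forward - scc, backward - scc, part - forward - backward])
    return components

example = {1: [2], 2: [3], 3: [1, 4], 4: [5], 5: [4], 6: []}
found = set(forward_backward_scc(example))
```

The three leftover sets share no SCC, so they can be processed independently. This independence is what parallel implementations exploit.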
122 Efficient Parallel Associative Classification based on Rules Memoization
Abstract: Associative classification refers to a class of algorithms that is very efficient in classification problems. In this domain, data are typically multidimensional, with each instance representing a point in a fixed-length attribute space, and are usually drawn from two very large sets: training and test datasets. Models, known as classifiers, are generated from class association rules mined in the training data and are applied with eager or lazy strategies to assign classes to the unlabeled instances of a test dataset. In such strategies, unlabeled data are typically evaluated independently by a series of sophisticated and costly computations, which may lead to significant overlap among classifiers that evaluate similar points in the attribute space. To overcome these drawbacks, we propose a parallel and high-performance associative classification method based on a lazy strategy, in which partial computations of similar classifiers are cached and shared efficiently. To this end, a PageRank-driven similarity metric is introduced to measure the affinity of computations among unlabeled data instances, memoizing the generated association rules. The experimental results show that our similarity-based metric maximizes the reuse of cached rules and consequently improves application performance, with gains of up to 60% in execution time and a 40% higher cache hit rate, mainly under limited cache space conditions.
Michel Pires, Leonardo Rocha, Renato Ferreira and Wagner Meira Jr.
407 Extreme Value Theory based Robust Anomaly Detection
Abstract: Most current clustering-based anomaly detection methods use a scoring schema and thresholds to classify anomalies. These methods are often tailored to target specific data sets with a "known" number of clusters. The paper provides a streaming extension to a generalized model that has limited data dependency and performs probabilistic anomaly detection and clustering simultaneously. This ensures that the cluster formation is not impacted by the presence of anomalous data, thereby leading to a more reliable definition of "normal vs abnormal" behaviour. (When anomaly detection is performed after clustering, the presence of anomalies gives a slightly skewed definition of traditional/normal behavior. To avoid this, simultaneous clustering and anomaly detection is performed.) The motivations behind developing the integrated CRP-EV model and the path that leads to the streaming model are discussed.
Sreelekha Guggilam, Abani Patra and Varun Chandola

Machine Learning and Data Assimilation for Dynamical Systems (MLDADS) Session 2

Time and Date: 14:40 - 16:20 on 12th June 2018

Room: 0.5

Chair: Rossella Arcucci

334 Data assimilation in a nonlinear time-delayed dynamical system with Lagrangian optimization
Abstract: When the heat released by a flame is sufficiently in phase with the acoustic pressure, a self-excited thermoacoustic oscillation can arise. These nonlinear oscillations are one of the biggest challenges faced in the design of safe and reliable gas turbines and rocket motors. In the worst-case scenario, uncontrolled thermoacoustic oscillations can shake an engine apart. Reduced-order thermoacoustic models, which are nonlinear and time-delayed, can only qualitatively predict thermoacoustic oscillations. To make reduced-order models quantitatively predictive, we develop a data assimilation framework for state estimation. We numerically estimate the most likely nonlinear state of a Galerkin-discretized time-delayed model of a prototypical combustor. Data assimilation is an optimal blending of observations with previous estimates of the system’s state (the background) to produce optimal initial conditions. A cost functional is defined to measure (i) the statistical distance between the model output and the measurements from experiments; and (ii) the distance between the model’s initial conditions and the background knowledge. Its minimum corresponds to the optimal state, which is computed by Lagrangian optimization with the aid of adjoint equations. We study the influence of the number of Galerkin modes, which are the natural acoustic modes of the duct, with which the model is discretized. We show that decomposing the measured pressure signal into a finite number of modes is an effective way to enhance the state estimation, especially when highly nonlinear modal interactions occur in the assimilation window. This work represents the first application of data assimilation to nonlinear thermoacoustics, which opens new possibilities for real-time calibration of reduced-order models with experimental measurements.
Tullio Traverso and Luca Magri
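The two-term cost functional described in the abstract above has the same structure as the standard variational data assimilation functional, which can be written generically as follows. The notation is an assumption made for illustration: $x_0$ is the initial state, $x_b$ the background, $y_i$ the measurements at times $t_i$, $\mathcal{M}$ the map from state to measured quantity, and $B$, $R$ the background and observation error covariances. The functional actually used in the paper may differ in detail.

```latex
J(x_0) = \tfrac{1}{2}\,(x_0 - x_b)^{\top} B^{-1} (x_0 - x_b)
       + \tfrac{1}{2}\sum_{i=1}^{N}
         \bigl(\mathcal{M}(x(t_i)) - y_i\bigr)^{\top} R^{-1}
         \bigl(\mathcal{M}(x(t_i)) - y_i\bigr)
```

The first term corresponds to item (ii) of the abstract and the second to item (i). The state $x(t_i)$ depends on $x_0$ through the model equations, which is why the minimization is carried out with a Lagrangian and adjoint equations.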
97 Machine learning to approximate solutions of ordinary differential equations: Neural networks vs. linear regressors
Abstract: We discuss surrogate models based on machine learning as an approximation to the solution of an ordinary differential equation. Neural networks and a multivariate linear regressor are assessed for this application. Both show satisfactory performance for the considered case study of a damped perturbed harmonic oscillator. The interface of the surrogate model is designed to work similarly to a solver of an ordinary differential equation or, respectively, a simulation unit. Computational demand and accuracy in terms of local and global error are discussed. Parameter studies are performed to discuss the sensitivity of the method and to tune the performance.
Georg Engel
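To make the "surrogate with a solver-like interface" idea concrete, the sketch below fits a linear one-step map to data produced by a reference Runge-Kutta solver and then exposes it through a `step` function of the same shape. It is a simplified illustration under stated assumptions: the oscillator is damped but unperturbed, the parameter values are arbitrary, and the regressor is a plain least-squares fit via normal equations. None of these details come from the paper.

```python
OMEGA, ZETA, DT = 2.0, 0.1, 0.05  # assumed frequency, damping ratio and step size

def rhs(state):
    """Damped oscillator x'' + 2*zeta*omega*x' + omega^2*x = 0 as a first-order system."""
    x, v = state
    return (v, -2.0 * ZETA * OMEGA * v - OMEGA ** 2 * x)

def rk4_step(state, dt=DT):
    """Reference solver: one classical fourth-order Runge-Kutta step."""
    def shifted(s, k, h):
        return (s[0] + h * k[0], s[1] + h * k[1])
    k1 = rhs(state)
    k2 = rhs(shifted(state, k1, dt / 2))
    k3 = rhs(shifted(state, k2, dt / 2))
    k4 = rhs(shifted(state, k3, dt))
    return tuple(state[i] + dt / 6 * (k1[i] + 2 * k2[i] + 2 * k3[i] + k4[i])
                 for i in range(2))

def fit_linear_surrogate(inputs, targets):
    """Least-squares 2x2 matrix M such that targets ~= M @ inputs."""
    # M = C G^{-1}, with G = sum(x x^T) and C = sum(y x^T).
    g = [[sum(x[i] * x[j] for x in inputs) for j in range(2)] for i in range(2)]
    c = [[sum(y[i] * x[j] for x, y in zip(inputs, targets)) for j in range(2)]
         for i in range(2)]
    det = g[0][0] * g[1][1] - g[0][1] * g[1][0]
    g_inv = [[g[1][1] / det, -g[0][1] / det], [-g[1][0] / det, g[0][0] / det]]
    return [[sum(c[i][k] * g_inv[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def surrogate_step(matrix, state):
    """Same interface as rk4_step, but evaluated with the fitted linear map."""
    return (matrix[0][0] * state[0] + matrix[0][1] * state[1],
            matrix[1][0] * state[0] + matrix[1][1] * state[1])

# Training data: consecutive states along one reference trajectory.
trajectory = [(1.0, 0.0)]
for _ in range(200):
    trajectory.append(rk4_step(trajectory[-1]))
model = fit_linear_surrogate(trajectory[:-1], trajectory[1:])

# Local (one-step) error on a state that was not on the training trajectory.
probe = (0.3, -0.7)
exact = rk4_step(probe)
approx = surrogate_step(model, probe)
local_error = max(abs(exact[0] - approx[0]), abs(exact[1] - approx[1]))
```

For this unperturbed linear system one Runge-Kutta step is itself a linear map, so the linear regressor can reproduce it almost exactly. With a perturbation or nonlinearity, as in the case study, the fit becomes an approximation and the local and global errors discussed in the abstract become the quantities of interest.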
130 Kernel Methods for Discrete-Time Linear Equations
Abstract: Methods from learning theory are used in the state space of linear dynamical and control systems in order to estimate the system matrices and some relevant quantities such as the topological entropy. The approach is illustrated via a series of numerical examples.
Boumediene Hamzi and Fritz Colonius
150 Data-driven inference of the ordinary differential equation representation of a chaotic dynamical model using data assimilation
Abstract: Recent progress in machine learning has shown how to forecast and, to some extent, learn the dynamics of a model from its output, resorting in particular to neural networks and deep learning techniques. We will show how the same goal can be directly achieved using data assimilation techniques without relying on machine learning software libraries, with a view to high-dimensional models. The dynamics of a model are learned from its observation, and an ordinary differential equation (ODE) representation of this model is inferred using a recursive nonlinear regression. Because the method is embedded in a Bayesian data assimilation framework, it can learn from partial and noisy observations of a state trajectory of the physical model. Moreover, a space-wise local representation of the ODE system is introduced and is key to dealing with high-dimensional models. The method is illustrated on several chaotic discrete and continuous models of various dimensions, with or without noisy observations, with the goal of identifying or improving the model dynamics, building a surrogate or reduced model, or producing forecasts from mere observations of the physical model. It has recently been suggested that neural network architectures could be interpreted as dynamical systems. Reciprocally, we show that our ODE representations are reminiscent of deep learning architectures. Furthermore, numerical analysis considerations on stability shed light on the assets and limitations of the method.
Marc Bocquet, Julien Brajard, Alberto Carrassi and Laurent Bertino

Classifier Learning from Difficult Data (CLDD) Session 2

Time and Date: 14:40 - 16:20 on 12th June 2018

Room: 0.6

Chair: Michal Wozniak

229 Missing Features Reconstruction and Its Impact on Classification Accuracy [abstract]
Abstract: In real-world applications, we can encounter situations when a well-trained model has to be used to predict from a damaged dataset. The damage caused by missing or corrupted values can be either on the level of individual instances or on the level of entire features. Both situations have a negative impact on the usability of the model on such a dataset. This paper focuses on the scenario where entire features are missing, which can be understood as a specific case of transfer learning. Our aim is to experimentally investigate the influence of various imputation methods on the performance of several classification models. The imputation impact is investigated for traditional methods such as k-NN, linear regression, and MICE, compared to modern imputation methods such as multi-layer perceptron (MLP) and gradient boosted trees (XGBT). For linear regression, MLP, and XGBT we also propose two approaches to using them for multiple-feature imputation. The experiments were performed on both real-world and artificial datasets with continuous features, where different numbers of features, varying from one feature to 50%, were missing. The results show that MICE and linear regression are generally good imputers regardless of the conditions. On the other hand, the performance of MLP and XGBT is strongly dataset dependent. Their performance is the best in some cases, but more often they perform worse than MICE or linear regression.
Magda Friedjungová, Daniel Vašata and Marcel Jiřina
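As a rough illustration of the whole-feature imputation scenario described in the abstract above, the following sketch fits a linear regression that predicts one feature from the remaining ones and uses it to fill in that feature when it is absent at prediction time. It is not the authors' implementation: the function names `fit_feature_imputer` and `impute_feature`, the use of ordinary least squares via NumPy, and the synthetic data are all assumptions made for this example.

```python
import numpy as np

def fit_feature_imputer(X_present, y_missing):
    """Fit least-squares weights (plus intercept) that predict the
    missing feature from the features that remain available.
    Hypothetical helper, not taken from the paper."""
    X_present = np.asarray(X_present, dtype=float)
    # Append a column of ones so the last coefficient acts as the intercept.
    A = np.column_stack([X_present, np.ones(len(X_present))])
    coef, *_ = np.linalg.lstsq(A, np.asarray(y_missing, dtype=float), rcond=None)
    return coef

def impute_feature(X_present, coef):
    """Reconstruct the missing feature for new rows. Hypothetical helper."""
    X_present = np.asarray(X_present, dtype=float)
    A = np.column_stack([X_present, np.ones(len(X_present))])
    return A @ coef

# Synthetic training data (assumption): the third feature depends
# linearly on the first two.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(50, 2))
f3_train = 2.0 * X_train[:, 0] - X_train[:, 1] + 3.0

coef = fit_feature_imputer(X_train, f3_train)

# At prediction time the third feature is missing entirely and is imputed.
X_new = rng.normal(size=(5, 2))
f3_imputed = impute_feature(X_new, coef)
X_restored = np.column_stack([X_new, f3_imputed])
```

The restored matrix can then be passed to a classifier that was trained on all three features. Real data will not be exactly linear, so the quality of the imputation depends on how well the remaining features explain the missing one.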
78 A Deep Malware Detection Method Based on General-Purpose Register Features [abstract]
Abstract: Existing detection methods based on low-level features at the micro-architecture level usually need a long sample length to detect malicious behaviours and can hardly identify non-signature malware, which inevitably affects detection efficiency and effectiveness. To solve these problems, we propose to use the General-Purpose Registers (GPRs) as our features and design a novel deep learning model for malware detection. Specifically, each register has specific functions, and changes of its content contain action information which can be used to detect illegal behaviours. We also design a deep detection model which jointly fuses spatial and temporal correlations of GPRs for malware detection and requires only a short sample length. The proposed deep detection model learns discriminative characteristics from GPRs between normal and abnormal processes well, and thus can also identify non-signature malware. Comprehensive experimental results show that our proposed method performs better than the state-of-the-art methods for malicious behaviour detection relying on low-level features.
Fang Li, Chao Yan, Ziyuan Zhu and Dan Meng
415 A Novel Distribution Analysis for SMOTE oversampling method in Handling Class Imbalance [abstract]
Abstract: Class imbalance problems are often encountered in many applications. Such problems occur whenever a class is under-represented, that is, has few data points compared to other classes. However, this minority class is usually a significant one. One approach for handling imbalance is to generate new minority class instances to balance the data distribution. The Synthetic Minority Oversampling TEchnique (SMOTE) is one of the dominant oversampling methods in the literature. SMOTE generates data using linear interpolation between a minority class data point and one of its $K$ nearest neighbors. In this paper, we present a theoretical and an experimental analysis of the SMOTE method. We explore how faithfully the SMOTE method emulates the underlying density. To our knowledge, this is the first mathematical analysis of the SMOTE method. Moreover, we study the impact of different factors on generation accuracy, such as the dimension of the data, the number of examples, and the considered number of neighbors $K$, on both artificial and real datasets.
Dina Elreedy and Amir Atiya
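The interpolation step that the SMOTE abstract above describes can be sketched as follows. This is a minimal illustration of the generic SMOTE idea, not code from the paper: the function name `smote_sample`, its parameters, and the brute-force neighbor search are assumptions chosen to keep the example short.

```python
import numpy as np

def smote_sample(minority, k=5, n_new=10, seed=0):
    """Generate n_new synthetic minority points. Each one lies on the
    segment between a randomly chosen minority point and one of its
    k nearest minority neighbors. Hypothetical helper for illustration."""
    rng = np.random.default_rng(seed)
    minority = np.asarray(minority, dtype=float)
    n = len(minority)
    if n < 2:
        raise ValueError("need at least two minority points")
    k = min(k, n - 1)

    # Brute-force pairwise squared distances between minority points.
    diff = minority[:, None, :] - minority[None, :, :]
    dist = np.einsum("ijk,ijk->ij", diff, diff)
    np.fill_diagonal(dist, np.inf)  # a point is not its own neighbor

    # Indices of the k nearest neighbors of every minority point.
    neighbors = np.argsort(dist, axis=1)[:, :k]

    base = rng.integers(0, n, size=n_new)        # which point to start from
    pick = rng.integers(0, k, size=n_new)        # which of its neighbors to use
    nbr = neighbors[base, pick]
    gap = rng.random((n_new, 1))                 # interpolation factor in [0, 1)

    return minority[base] + gap * (minority[nbr] - minority[base])

# Four minority points at the corners of the unit square (toy data).
X_min = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
synthetic = smote_sample(X_min, k=2, n_new=6)
```

Because every synthetic point is a convex combination of two existing minority points, the generated data never leaves the convex hull of the minority class, which is one reason the generated density can differ from the true one.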
494 Forecasting purchase categories by transactional data: a comparative study of classification methods [abstract]
Abstract: Forecasting purchase behavior of bank clients allows for development of new recommendation and personalization strategies and results in better Quality-of-Service and customer experience. In this study, we consider the problem of predicting purchase categories of a client for the next time period from historical transactional data. We study the predictability of expenses for different Merchant Category Codes (MCCs) and compare the efficiency of different classes of machine learning models including boosting algorithms, long short-term memory networks and convolutional networks. The experimental study is performed on a massive dataset with debit card transactions for 5 years and about 1.2 M clients provided by our bank-partner. The results show that: (i) there is a set of MCC categories which are highly predictable (the exact number of categories varies with thresholds for minimal precision and recall), (ii) for most of the considered cases, convolutional neural networks perform better, and thus may be recommended as a basic choice for tackling similar problems.
Klavdiya Bochenina and Egor Shikov
439 Recognizing Faults in Software Related Difficult Data [abstract]
Abstract: In this paper we investigate the use of numerous machine learning algorithms, with emphasis on multilayer artificial neural networks, in the domain of software source code fault prediction. The main contribution lies in enhancing the data pre-processing step as a partial solution for handling software related difficult data. Before we feed the data into an Artificial Neural Network, we apply PCA (Principal Component Analysis) and k-means clustering. The data clustering step improves the quality of the whole dataset. Using the presented approach we were able to obtain a 10% increase in the accuracy of fault detection. In order to ensure the most reliable results, we use 10-fold cross-validation during experiments.
Michal Choras, Marek Pawlicki and Rafal Kozik

Simulations of Flow and Transport: Modeling, Algorithms and Computation (SOFTMAC) Session 2

Time and Date: 14:40 - 16:20 on 12th June 2018

Room: 1.4

Chair: Shuyu Sun

449 Energy Stable Simulation of Two-Phase Equilibria with Interfaces at Given Volume, Temperature, and Moles [abstract]
Abstract: In this paper, we formulate a modeling theory and numerical algorithm for a multi-component two-phase fluid system together with the interface between phases and with gravity. We use a diffuse interface model based on the Peng-Robinson equation of state (EOS) for the modeling of the fluid. We show that gravity has a significant influence on the phase equilibrium behavior, which is an expected phenomenon but has not been numerically studied in the literature for a Peng-Robinson fluid modeled by a diffuse interface model.
Shuyu Sun
385 Effects of Numerical Integration on DLM/FD Method for Solving Interface Problems with Body-Unfitted Meshes [abstract]
Abstract: In this paper, the effects of different numerical integration schemes on the distributed Lagrange multiplier/fictitious domain (DLM/FD) method with body-unfitted meshes are studied for solving different types of interface problems: elliptic-, Stokes- and Stokes/elliptic-interface problems, for which the corresponding mixed finite element approximations are developed. Commonly-used numerical integration schemes, compound type formulas and a specific subgrid integration scheme are presented and the comparison between them is illustrated in numerical experiments, showing that different numerical integration schemes have significant effects on approximation errors of the DLM/FD finite element method for different types of interface problems, especially for Stokes- and Stokes/elliptic-interface problems, and that the subgrid integration scheme always results in numerical solutions with the best accuracy.
Cheng Wang, Pengtao Sun, Hao Shi, Rihui Lan and Fei Xu
111 Application of a Double Potential Method to Simulate Incompressible Viscous Flows [abstract]
Abstract: This paper discusses the application of the double potential method for modeling the flow of an incompressible fluid. The algorithm allows us to avoid a numerical calculation of pressure. This calculation is not easy in the case of incompressible fluid flow and may lead to solution instability when approximated by cell-centered grid methods. The double potential method also overcomes the problem of complex boundary conditions which arises when modelling with the Navier-Stokes equations in the vector potential-vortex formulation. The resulting system of equations is approximated using the finite volume method and the exponential transformation. As a verification problem, the establishment of Poiseuille flow in a three-dimensional cylindrical geometry was used.
Tatyana Kudryashova, Sergey Polyakov and Nikita Tarasov
161 A bubble formation in the two-phase system [abstract]
Abstract: The formation of bubbles in a liquid was examined numerically and the obtained results were successfully compared with experimental results. The study covered two different patterns defined by different Morton numbers or gas flow rates. The unsteady three-dimensional calculations were carried out in the OpenFOAM code with the volume of fluid approach. The numerical results were in good agreement with the experiments with respect to bubble shapes, diameters and Reynolds numbers. The agreement was better for the lower gas flow rate than for the higher one. The main reason may be that a higher gas flow rate creates a complex flow behavior between the gas bubbles and the surrounding liquid flow, which in turn worsens the accuracy of the calculations. The main output of the study was a comparison of the bubble diameters in time. Especially for higher gas flow rates, bubbles grow rapidly as they rise. Nevertheless, a satisfactory agreement was found between numerics and experiments.
Karel Frana, Shehab Attia and Jorg Stiller

Marine Computing in the Interconnected World for the Benefit of the Society (MarineComp) Session 2

Time and Date: 14:40 - 16:20 on 12th June 2018

Room: 2.26

Chair: Flávio Martins

554 Implementation of a 3-dimensional hydrodynamic model to a fish aquaculture area in Sines, Portugal - A down-scaling approach [abstract]
Abstract: Coastal zones have always been preferential areas for human settlement, mostly due to their natural resources. However, human occupation poses complex problems and requires proper management tools. Numerical models rank among those tools and offer a way to evaluate and anticipate the impact of human pressures on the environment. This work describes the preliminary implementation of a 3-dimensional computational model for the coastal zone in Sines, Portugal. This coastal area is under significant pressure from human activities, and the model implementation targets the location of a fish aquaculture. The model aims to reproduce the hydrodynamics of the system, as part of an ongoing project to simulate the dynamics of the aquaculture area. So far, the model application shows promising results.
Alexandre Correia, Lígia Pinto and Marcos Mateus
551 Numerical characterization of the Douro River plume [abstract]
Abstract: The Douro is one of the largest rivers of the Iberian Peninsula, representing the most important buoyancy source into the Atlantic Ocean on the northwestern Portuguese coast. The main goal of this study is to contribute to the knowledge of physical processes associated with the propagation of the Douro River plume. The general patterns of dispersion in the ocean and how the plume changes hydrography and coastal circulation were evaluated, considering the main drivers involved: river discharge and wind. Coastal models were implemented to characterize the propagation of the plume, its dynamics, and its impact on coastal circulation. Different numerical scenarios of wind and river discharge were analyzed. The estuarine outflow is sufficient to generate a northward coastal current without wind under moderate-to-high river discharge conditions. Under easterly winds, the propagation pattern is similar to the case without wind forcing, with an increase in the northward current speed. A southward coastal current is generated only by strong westerly winds. Under upwelling-favorable (northerly) winds, the plume extends offshore, tilting towards the southwest. Southerly winds increase the velocity of the northward current, making the merging of the Douro and Minho estuarine plumes a likely consequence.
Renato Mendes, Nuno Vaz, Magda C. Sousa, João G. Rodrigues, Maite Decastro and João M. Dias
550 The Impact of Sea Level Rise in the Guadiana Estuary [abstract]
Abstract: Understanding the impact of sea level rise on coastal areas is crucial as a large percentage of the population lives on the coast. This study uses computational tools to examine how two major consequences of sea level rise, salt intrusion and an increase in water volume, affect the hydrodynamics and flooding areas of a major estuary in the Iberian Peninsula. A 2D numerical model created with the software MOHID was used to simulate the Guadiana Estuary in different scenarios of sea level rise combined with different freshwater flow rates. An increase in salinity was found in response to an increase in mean sea level at low and intermediate freshwater flow rates. An increase in flooding areas around the estuary was also positively correlated with an increase in mean sea level.
Lara Mills, João Janeiro and Flávio Martins
562 Estuarine light attenuation modelling towards improved management of coastal fisheries [abstract]
Abstract: The ecosystem function of local fisheries holds great societal importance in the coastal zone of Cartagena, Colombia, where coastal communities depend on artisanal fishing for their livelihood and health. These fishing resources have declined sharply in recent decades partly due to issues of coastal water pollution. Mitigation strategies to reduce pollution can be better evaluated with the support of numerical hydrodynamic models. To model the processes of hydrodynamics and water quality in Cartagena Bay, significant consideration must be dedicated to the process of light attenuation, given its importance to the bay’s characteristics of strong vertical stratification, turbid surface water plumes, algal blooms and hypoxia. This study uses measurements of total suspended solids (TSS), turbidity, chlorophyll-a (Chl-a) and Secchi depth monitored in the bay monthly over a 2-year period to calculate and compare the short-wave light extinction coefficient (Kd) according to nine different equations. The MOHID-Water model was used to simulate the bay’s hydrodynamics and to compare the effect of three different Kd values on the model’s ability to reproduce temperature profiles observed in the field. Simulations using Kd values calculated by equations that included TSS as a variable produced better results than those of an equation that included Chl-a as a variable. Further research will focus on evaluating other Kd calculation methods and comparing these results with simulations of different seasons. This study contributes valuable knowledge for eutrophication modelling which would be beneficial to coastal zone management in Cartagena Bay.
Marko Tosic, Flávio Martins, Serguei Lonin, Alfredo Izquierdo and Juan Darío Restrepo