ICCS 2019 Main Track (MT) Session 12

Time and Date: 10:15 - 11:55 on 13th June 2019

Room: 1.3

Chair: Katarzyna Rycerz

47 Synchronized Detection and Recovery of Steganographic Messages with Adversarial Learning
Abstract: In this work, we study how a steganographic algorithm can be learned, and how combining that learning process with adversarial learning yields a good steganographic algorithm. To handle the problem of embedding secret messages into a specific medium, we design novel adversarial modules to learn the steganographic algorithm, and simultaneously train three modules called the generator, the discriminator and the steganalyzer. Unlike existing methods, the three modules are formalized as a game in which they communicate with each other. In the game, the generator and discriminator attempt to communicate with each other using secret messages hidden in an image, while the steganalyzer attempts to determine whether confidential information is being transmitted. We show that through unsupervised adversarial training, the adversarial model can produce robust steganographic solutions, which act like an encryption. Furthermore, we propose a supervised adversarial training method to train a robust steganalyzer, which is used to discriminate whether an image contains secret information. Numerous experiments are conducted on a publicly available dataset to demonstrate the effectiveness of the proposed method.
Haichao Shi, Xiao-Yu Zhang, Shupeng Wang, Ge Fu and Jianqi Tang
64 Multi-Source Manifold Outlier Detection
Abstract: Outlier detection is an important task in data mining, with many practical applications ranging from fraud detection to public health. However, with the emergence of more and more multi-source data in many real-world scenarios, the task of outlier detection becomes even more challenging, as traditional mono-source outlier detection techniques are no longer suitable for multi-source heterogeneous data. In this paper, a general framework based on consistent representations is proposed to identify multi-source heterogeneous outliers. According to the information compatibility among different sources, manifold learning is incorporated in the proposed method to obtain a shared representation space, in which the information-correlated representations are close along the manifold while the semantic-complementary instances are close in Euclidean distance. Furthermore, the multi-source outliers can be effectively identified in the affine subspace, which is learned through an affine combination of shared representations from different sources in the feature-homogeneous space. Comprehensive empirical investigations are presented that confirm the promise of our proposed framework.
Lei Zhang and Shupeng Wang
155 A Fast NN-based Approach for Time Sensitive Anomaly Detection over Data Streams
Abstract: Anomaly detection is an important data mining method aiming to discover outliers that show significant deviation from their expected behavior. A widely used criterion for determining outliers is based on the number of their neighboring elements, which are referred to as Nearest Neighbors (NN). Existing NN-based Anomaly Detection (NN-AD) algorithms cannot detect streaming outliers, which present time sensitive abnormal behavior characteristics in different time intervals. In this paper, we propose a fast NN-based approach for Time Sensitive Anomaly Detection (NN-TSAD), which can find outliers that present different behavior characteristics, including normal and abnormal characteristics, within different time intervals. The core idea of our proposal is to combine the sliding window model with Locality Sensitive Hashing (LSH) to monitor the distribution of streaming elements as well as the number of their Nearest Neighbors as time progresses. We use an ϵ-approximation scheme to implement the sliding window model and compute Nearest Neighbors on the fly. We conduct extensive experiments to examine our approach for time sensitive anomaly detection using three real-world data sets. The results show that our approach achieves significant improvement in recall and precision for anomaly detection within different time intervals. In particular, our approach achieves two orders of magnitude improvement in time consumption for streaming anomaly detection when compared with traditional NN-based anomaly detection algorithms, such as exact-Storm, approx-Storm and MCOD, while using only 10 percent of the memory.
Guangjun Wu, Zhihui Zhao, Ge Fu and Haiping Wang
199 Causal links between geological attributes of oil and gas reservoir analogues
Abstract: Oil and gas reservoirs are distributed across the globe at different depths and geological ages. Although some petroleum deposits are situated spatially far from each other, they may share similar distributions of continuous attributes describing formation and fluid properties, as well as categorical attributes describing tectonic regimes and depositional environments. In that case they are called reservoir analogues. Information about thousands of reservoirs from around the world forms a solid basis for uncertainty evaluation and missing data imputation. Besides these routine industry tasks, such a dataset enables probabilistic reasoning through frequency analysis. This work presents a graphical representation of causal links between geological attributes of reservoir analogues.
Nikita Bukhanov, Arthur Sabirov, Oksana Popova and Stanislav Slivkin
212 n-gram Cache Performance in Statistical Extraction of Relevant Terms in Large Corpora
Abstract: Statistical extraction of relevant n-grams in natural language corpora is important for text indexing and classification since it can be language independent. We show how a theoretical model identifies the distribution properties of the distinct n-grams and singletons appearing in large corpora and how this knowledge contributes to understanding the performance of an n-gram cache system used for extraction of relevant terms. We show how this approach allowed us to evaluate the benefits from using Bloom filters for excluding singletons and from using static prefetching of nonsingletons in an n-gram cache. In the context of the distributed and parallel implementation of the LocalMaxs extraction method, we analyze the performance of the cache miss ratio and size, and the efficiency of n-gram cohesion calculation with LocalMaxs.
Carlos Goncalves, Joaquim Silva and Jose Cunha