Workshop on Nonstationary Models of Pattern Recognition and Classifier Combinations (NMRPC) Session 1

Time and Date: 10:15 - 11:55 on 8th June 2016

Room: Boardroom East

Chair: Michal Wozniak

545 Workshop on Nonstationary Models of Pattern Recognition and Classifier Combinations
Abstract: Workshop on Nonstationary Models of Pattern Recognition and Classifier Combinations
Michal Wozniak, Bartosz Krawczyk
550 Keynote speech: Computational Aspects of Data Processing and Pattern Recognition with Tensor Methods
Abstract: We live in the era of massive data processing. Computational requirements on information processing and retrieval systems are therefore enormous: not only do huge amounts of data need to be processed and classified, but the systems also need to deal with data multidimensionality. However, only recently have data processing methods been extended to deal directly with multidimensional N-D patterns, without their prior vectorization, thanks to the application of tensors. This talk will focus on computational aspects of data processing and pattern recognition with tensors. We will present a systematic overview of tensor algebra and tensor decomposition methods, with special emphasis on their applications in data representation, analysis, and pattern recognition. In the talk we will especially emphasize practical aspects, as well as implementation issues, of the presented algorithms.
Prof. Cyganek's bio: Bogusław Cyganek received his M.Sc. degree in electronics in 1993, and then an M.Sc. in computer science in 1996, from the AGH University of Science and Technology, Krakow, Poland. He obtained his Ph.D. degree cum laude in 2001 with a thesis on correlation of stereo images, and his D.Sc. degree in 2011 with a thesis on methods and algorithms of object recognition in digital images.
During recent years, Dr. Bogusław Cyganek has cooperated with many scientific and industrial partners, such as Glasgow University (Scotland, UK), DLR (Germany), and Surrey University (UK), as well as Nisus Writer (USA), Compression Techniques (USA), Pandora Int. (UK), and The Polished Group (Poland). He is an associate professor at the Department of Electronics of the AGH University of Science and Technology, Poland, currently serving as a visiting professor at the Wroclaw Technical University in the ENGINE project. His research interests include computer vision, pattern recognition, data mining, and the development of embedded systems. He is an author or co-author of over a hundred conference and journal papers, as well as books, the latest being “Object Detection and Recognition in Digital Images: Theory and Practice”, published by Wiley in 2013. Dr. Cyganek is a senior member of the IEEE and a member of IAPR and SPIE.
Bogusław Cyganek
327 Anticipative Hybrid Extreme Rotation Forest
Abstract: This paper introduces an improvement on the recently published Hybrid Extreme Rotation Forest (HERF), consisting of the anticipative determination of the fraction of each classifier architecture included in the ensemble. We call it AHERF. Both HERF and AHERF are heterogeneous classifier ensembles, which aim to profit from the diverse problem domain specificities of each classifier architecture in order to achieve improved generalization over a larger spectrum of problem domains. In this paper, AHERF ensembles are built from a pool of Decision Trees (DT), Extreme Learning Machines (ELM), Support Vector Machines (SVM), k-Nearest Neighbors (k-NN), AdaBoost, Random Forests (RF), and Gaussian Naive Bayes (GNB) classifiers. Given a problem dataset, the anticipative determination of the ensemble composition proceeds as follows. First, we estimate the performance of each classifier architecture by independent pilot cross-validation experiments on a small subsample of the data. Next, the classifier architectures are ranked according to their accuracy. A probability distribution of the classifier architectures appearing in the ensemble is built from this ranking. Finally, the type of each individual classifier is decided by sampling this probability distribution. Computational experiments on a collection of benchmark classification problems show improvements over the original HERF and other state-of-the-art approaches.
Borja Ayerdi, Manuel Grana
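The composition step described in the AHERF abstract can be illustrated with a minimal Python sketch. It assumes the pilot cross-validation accuracies are already available, and it assumes a linear rank-based weighting (the best of K architectures gets weight K, the worst gets 1), because the abstract does not say how the ranking is turned into a probability distribution. The function names and any accuracy values used with them are hypothetical and are not taken from the paper.

```python
import random


def rank_distribution(pilot_accuracy):
    """Turn pilot accuracies into sampling probabilities via ranking.

    ASSUMPTION: linear rank weights; with K architectures the best-ranked one
    gets weight K, the next K-1, ..., the worst 1, normalised to sum to 1.
    """
    ranked = sorted(pilot_accuracy, key=pilot_accuracy.get, reverse=True)
    k = len(ranked)
    weights = {name: k - i for i, name in enumerate(ranked)}
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


def sample_ensemble_composition(pilot_accuracy, ensemble_size, seed=0):
    """Decide the architecture of each ensemble member by sampling the distribution."""
    dist = rank_distribution(pilot_accuracy)
    names = list(dist)
    rng = random.Random(seed)
    return rng.choices(names, weights=[dist[n] for n in names], k=ensemble_size)
```

For example, with seven architectures and hypothetical pilot accuracies in which SVM scores highest and GNB lowest, SVM receives probability 7/28 = 0.25 and GNB 1/28. A linear rank weighting keeps every architecture in play, which preserves heterogeneity; a steeper weighting would concentrate the ensemble on the pilot winners.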
398 Learning Decision Trees from Data Streams with Concept Drift
Abstract: This paper addresses the data mining task of classifying data streams with concept drift. The proposed algorithm, named Concept-adapting Evolutionary Algorithm For Decision Tree (CEVOT), does not require any knowledge of the environment in which it operates (e.g., the number and rate of drifts). The novelty of the approach lies in combining a tree learner with an evolutionary algorithm: the decision trees are learned incrementally, and all information (knowledge) is stored in the internal structure of the population of trees. The proposed algorithm is experimentally compared with state-of-the-art stream methods on several real-life and synthetic datasets. The results demonstrate its high performance in terms of accuracy and processing time.
Dariusz Jankowski, Konrad Jackowski, Bogusław Cyganek

Workshop on Nonstationary Models of Pattern Recognition and Classifier Combinations (NMRPC) Session 2

Time and Date: 13:25 - 15:05 on 8th June 2016

Room: Boardroom East

Chair: Michal Wozniak

332 GPU-Accelerated Extreme Learning Machines for Imbalanced Data Streams with Concept Drift
Abstract: Mining data streams is one of the most vital fields in the current era of big data. Continuously arriving data may pose various problems related to their volume, variety, or velocity. In this paper we focus on two important difficulties embedded in the nature of data streams: their non-stationary nature and skewed class distributions. Such a scenario requires a classifier that is able to adapt rapidly to concept drift and is robust to class imbalance. We propose to use an online version of the Extreme Learning Machine, enhanced with an efficient drift detector and a method to alleviate the bias towards the majority class. We investigate three approaches, based on undersampling, oversampling, and cost-sensitive adaptation. Additionally, to allow rapid updating of the proposed classifier, we show how to implement online Extreme Learning Machines using a GPU. The proposed approach allows highly efficient mining of high-speed, drifting, and imbalanced data streams, with significant acceleration offered by GPU processing.
Bartosz Krawczyk
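The online ELM update at the core of this approach can be sketched as weighted recursive least squares over the output weights, following the standard online sequential ELM (OS-ELM) formulation. This is a minimal NumPy sketch under stated assumptions: a random sigmoid hidden layer, a ridge term I/C in the initial batch, and per-sample weights as one possible cost-sensitive variant (for instance, larger weights for minority-class examples). The class name and parameters are hypothetical. The drift detector, the sampling-based variants, and the paper's GPU implementation are not reproduced; the dense matrix products below are simply the kind of work one would offload to a GPU.

```python
import numpy as np


class WeightedOSELM:
    """Hypothetical online sequential ELM with per-sample weights (weighted RLS)."""

    def __init__(self, n_inputs, n_hidden, C=1.0, seed=0):
        rng = np.random.default_rng(seed)
        # Random, fixed input weights and biases: the defining trait of an ELM.
        self.W = rng.standard_normal((n_inputs, n_hidden))
        self.b = rng.standard_normal(n_hidden)
        self.C = C          # ridge parameter (assumed regularisation)
        self.P = None       # inverse of the weighted, regularised Gram matrix
        self.beta = None    # output weights

    def _hidden(self, X):
        # Sigmoid hidden-layer activations H (n_samples x n_hidden).
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))

    def partial_fit(self, X, T, weights=None):
        H = self._hidden(X)
        w = np.ones(len(X)) if weights is None else np.asarray(weights, dtype=float)
        if self.P is None:
            # Initial batch: P0 = (H^T W H + I/C)^-1, beta0 = P0 H^T W T.
            HtW = H.T * w
            self.P = np.linalg.inv(HtW @ H + np.eye(H.shape[1]) / self.C)
            self.beta = self.P @ HtW @ T
        else:
            # Woodbury update: P <- P - P H^T (W^-1 + H P H^T)^-1 H P.
            PHt = self.P @ H.T
            S = np.diag(1.0 / w) + H @ PHt
            self.P = self.P - PHt @ np.linalg.solve(S, PHt.T)
            # beta <- beta + P H^T W (T - H beta).
            self.beta = self.beta + self.P @ (H.T * w) @ (T - H @ self.beta)
        return self

    def predict(self, X):
        return self._hidden(X) @ self.beta
```

Because each chunk update is an exact rewrite of the batch normal equations, after any sequence of chunks the output weights equal the ridge-regularized weighted least-squares solution on all data seen so far. The model is therefore cheap to update, but it never forgets on its own, which is presumably why an explicit mechanism such as a drift detector is needed under concept drift.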
397 Efficient Computation of the Tensor Chordal Kernel
Abstract: In this paper, new methods for the fast computation of chordal kernels are proposed. Two versions of the chordal kernels for tensor data are discussed. These are based on different projectors of the flattened matrices obtained from the input tensors. A direct transformation of multidimensional objects into the kernel feature space leads to better data separation, which can result in higher classification accuracy. Our approach to a more efficient computation of the chordal distances between tensors is based on an analysis of the tensor projectors, which exhibit different properties. This makes an efficient eigendecomposition possible, which is computed with a version of the fixed-point algorithm. Experimental results show that our method achieves significant speed-ups, depending mostly on the tensor dimensions.
Bogusław Cyganek, Michal Wozniak
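A baseline (non-accelerated) computation of a chordal kernel between two tensors of equal shape can be sketched as follows. The sketch assumes one common formulation: each tensor is flattened along every mode, the orthogonal projector onto the row space of each flattened matrix is built from its right singular vectors, and the kernel is a Gaussian of the summed squared Frobenius distances between corresponding projectors. This is only one possible projector choice, and it uses a plain SVD rather than the fixed-point eigendecomposition the paper proposes for speed; the function names and `sigma` are illustrative.

```python
import numpy as np


def unfold(tensor, mode):
    """Mode-n flattening: move axis `mode` to the front and reshape to a matrix."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)


def row_space_projector(matrix, tol=1e-10):
    """Orthogonal projector V V^T onto the row space of `matrix`."""
    _, s, vt = np.linalg.svd(matrix, full_matrices=False)
    rank = int(np.sum(s > tol * s[0]))   # drop numerically zero singular values
    v = vt[:rank].T                      # right singular vectors as columns
    return v @ v.T


def chordal_kernel(x, y, sigma=1.0):
    """Gaussian kernel of the summed squared chordal distances over all modes."""
    if x.shape != y.shape:
        raise ValueError("tensors must have the same shape")
    d2 = 0.0
    for mode in range(x.ndim):
        px = row_space_projector(unfold(x, mode))
        py = row_space_projector(unfold(y, mode))
        d2 += np.linalg.norm(px - py, "fro") ** 2
    return float(np.exp(-d2 / (2.0 * sigma ** 2)))
```

Since the projectors depend only on the subspaces, the kernel is invariant to rescaling a tensor, so `chordal_kernel(x, 2 * x)` is 1 up to rounding. The column order produced by the unfolding does not matter either, because the same permutation is applied to both tensors and the Frobenius norm is unchanged by it.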
375 A New Design Based-SVM of the CNN Classifier Architecture with Dropout for Offline Arabic Handwritten Recognition
Abstract: In this paper we explore a new model focused on integrating two classifiers, a Convolutional Neural Network (CNN) and a Support Vector Machine (SVM), for offline Arabic handwriting recognition (OAHR), to which the dropout technique was applied. The proposed system replaces the trainable classifier of the CNN with an SVM classifier: the convolutional network extracts feature information, and the SVM functions as the recognizer. This model both automatically extracts features from the raw images and performs classification. Additionally, dropout protects our model against overfitting. In this work, the recognition of handwritten Arabic characters was evaluated; the training and test sets were taken from the HACDB and IFN/ENIT databases. Simulation results showed that the new SVM-based design of the CNN classifier architecture with dropout performs significantly more efficiently than the SVM-based CNN model without dropout and the standard CNN classifier. The performance of our model was compared with character recognition accuracies reported for state-of-the-art Arabic Optical Character Recognition systems, with favorable results.
Mohamed Elleuch, Rania Maalej, Monji Kherallah
363 Active Learning Classification of Drifted Streaming Data
Abstract: Contemporary classification systems have to make decisions not only on the basis of static data but also on data in motion. Objects to be recognized may arrive at the classifier continuously in the form of a data stream. Usually, we would like to start using the classifier as soon as possible, so models that can improve themselves during exploitation are very desirable. Basically, we build the model on the basis of a few learning objects and then use and improve the classifier as new data arrive. This concept is still an active research area and may be used in a plethora of practical cases. When constructing such a system, we have to bear in mind that we have limited resources (such as memory and computational power) at our disposal. Moreover, during the exploitation of a classifier system, selected characteristics of the classification model may change over time. This phenomenon is called concept drift and may lead to a severe deterioration of classification performance. This work deals with data stream classification in the presence of concept drift. We propose a novel classifier training algorithm based on the sliding window approach, which allows us to implement a forgetting mechanism, i.e., old objects that came from an outdated model are not taken into consideration when the classifier is updated. On the other hand, we assume that not all arriving examples can be labeled, because the budget for labeling is limited. We employ the active learning paradigm to choose "interesting" objects to be labeled. The proposed approach was evaluated in computer experiments carried out on data streams. The obtained results confirmed the usability of the proposed method for classifying smoothly drifting data streams.
Michal Wozniak, Pawel Ksieniewicz, Bogusław Cyganek, Andrzej Kasprzak, Krzysztof Walkowiak
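The interplay of a sliding window (forgetting) and a labeling budget can be sketched in a few lines of Python. This is a hypothetical illustration, not the authors' algorithm: the base learner is a nearest-centroid classifier trained on the labeled objects currently in the window, an object counts as "interesting" when the margin between its two nearest class centroids falls below a fixed threshold, and a label is requested only while the fraction of queried objects stays within the budget. The class name, its parameters, and the `oracle` callable are assumptions.

```python
import math
from collections import deque


class WindowedActiveLearner:
    """Hypothetical sliding-window nearest-centroid learner with a labeling budget."""

    def __init__(self, window_size=100, budget=0.1, margin_threshold=0.5):
        # Fixed-length window: the oldest labeled objects are forgotten first.
        self.window = deque(maxlen=window_size)
        self.budget = budget                  # allowed fraction of labeled objects
        self.margin_threshold = margin_threshold
        self.seen = 0
        self.queried = 0

    def _centroids(self):
        sums, counts = {}, {}
        for x, y in self.window:
            acc = sums.setdefault(y, [0.0] * len(x))
            for i, v in enumerate(x):
                acc[i] += v
            counts[y] = counts.get(y, 0) + 1
        return {y: [v / counts[y] for v in s] for y, s in sums.items()}

    def predict_with_margin(self, x):
        """Return (label, margin); margin = gap between the two nearest centroids."""
        cents = self._centroids()
        if not cents:
            return None, 0.0
        dists = sorted((math.dist(x, c), y) for y, c in cents.items())
        if len(dists) == 1:
            return dists[0][1], 0.0
        return dists[0][1], dists[1][0] - dists[0][0]

    def process(self, x, oracle):
        """Predict x; ask the oracle for its label if uncertain and within budget."""
        self.seen += 1
        pred, margin = self.predict_with_margin(x)
        within_budget = self.queried < self.budget * self.seen
        if within_budget and (pred is None or margin < self.margin_threshold):
            self.queried += 1
            self.window.append((tuple(x), oracle()))
            return pred, True
        return pred, False
```

Because the window has a fixed length, labeled objects from an outdated concept are eventually evicted, so after a drift the centroids move toward the new concept as fresh labels are acquired, while the number of labels stays roughly proportional to the budget.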