
ICCS 2018 Main Track (MT) Session 2

Time and Date: 15:45 - 17:25 on 11th June 2018

Room: M1

Chair: Yan Liu

229 SDF-Net: Real-time Rigid Object Tracking Using a Deep Signed Distance Network [abstract]
Abstract: In this paper, a deep neural network is used to model the signed distance function (SDF) of a rigid object for real-time tracking using a single depth camera. By leveraging the generalization capability of the neural network, we can better represent the model of the object implicitly. With the training stage done off-line, our proposed methods are capable of real-time performance, running as fast as 1.29 ms per frame on one CPU core, which is suitable for applications with limited hardware capabilities. Furthermore, the memory footprint of our trained SDF-Net for an object is less than 10 kilobytes. A quantitative comparison on a public dataset shows that our approach is comparable with the state of the art. The methods are also tested on real depth recordings to evaluate their performance in real-life scenarios.
Prayook Jatesiktat, Ming Jeat Foo, Guan Ming Lim and Wei Tech Ang
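A minimal sketch of the core idea above: an MLP that implicitly represents an object's signed distance function. The layer sizes and framework are illustrative assumptions, not the authors' architecture.

```python
# Hypothetical SDF network: maps a 3D query point to a signed distance.
# Layer sizes are assumptions for illustration, not the paper's design.
import torch
import torch.nn as nn

class SDFNet(nn.Module):
    def __init__(self, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),  # signed distance to the object surface
        )

    def forward(self, points):  # points: (N, 3)
        return self.net(points).squeeze(-1)

model = SDFNet()
distances = model(torch.randn(8, 3))  # SDF values for 8 query points
```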
232 Insider Threat Detection with Deep Neural Network [abstract]
Abstract: Insider threat detection has attracted considerable interest from researchers and industry. Existing work has mainly focused on applying machine-learning techniques to detect insider threats; however, such techniques require feature engineering, which is difficult and time-consuming. Deep learning, by contrast, can learn powerful features automatically. In this paper, we present a novel insider-threat detection method with a Deep Neural Network (DNN) based on user behavior. Specifically, we use an LSTM-CNN framework to recognize a user's anomalous behavior. First, similar to natural language modeling, we use a Long Short-Term Memory (LSTM) network to learn the "language" of user behavior from user actions and extract abstracted temporal features. Second, the extracted features are converted to fixed-size feature matrices, and a Convolutional Neural Network (CNN) uses these matrices to detect insider threats. We conduct experiments on a public insider-threat dataset. Experimental results show that our method is indeed successful at detecting insider threats, obtaining AUC = 0.9449 in the best case.
Fangfang Yuan, Yanan Cao, Yanmin Shang, Yanbing Liu, Jianlong Tan and Binxing Fang
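A hedged sketch of the LSTM-to-CNN pipeline the abstract describes; the vocabulary size, sequence length and layer dimensions are illustrative assumptions.

```python
# Illustrative LSTM -> fixed-size feature matrix -> CNN pipeline.
import torch
import torch.nn as nn

vocab, emb, hid, T = 100, 32, 64, 50       # action vocabulary, dims, sequence length

embed = nn.Embedding(vocab, emb)
lstm = nn.LSTM(emb, hid, batch_first=True)
cnn = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d((4, 4)), nn.Flatten(),
    nn.Linear(8 * 4 * 4, 2),               # normal vs. anomalous
)

actions = torch.randint(0, vocab, (1, T))  # one user-action sequence
feats, _ = lstm(embed(actions))            # (1, T, hid) temporal features
logits = cnn(feats.unsqueeze(1))           # treat the T x hid matrix as an image
```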
80 Incentive Mechanism for Cooperative Intrusion Detection: an Evolutionary Game Approach [abstract]
Abstract: In Mobile Ad-Hoc Networks, cooperative intrusion detection is efficient and scales to massively parallel attacks. However, owing to concerns about privacy leakage and resource costs, most mobile nodes are selfish and uninterested in helping others detect an intrusion event unless given enough incentives, so an efficient incentive mechanism is required. In this paper, we formulate the incentive mechanism for cooperative intrusion detection as an evolutionary game and derive an optimal solution that helps nodes decide whether or not to participate in detection. Our proposed mechanism handles the situation in which cooperative nodes do not have complete knowledge about other nodes. We develop a game algorithm to maximize nodes' utility. Simulations demonstrate that our strategy efficiently incentivizes potential nodes to cooperate.
Yunchuan Guo, Han Zhang, Lingcui Zhang, Liang Fang and Fenghua Li
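For reference, a standard form of the replicator dynamics used in evolutionary-game analyses of this kind (generic notation, not necessarily the paper's): if $x$ is the fraction of nodes that cooperate in detection and $U_C(x)$, $U_D(x)$ are the expected utilities of cooperating and defecting,

$$\dot{x} = x\,(1 - x)\,\bigl[U_C(x) - U_D(x)\bigr],$$

so cooperation spreads exactly when the incentive mechanism makes its payoff exceed that of free-riding.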

ICCS 2018 Main Track (MT) Session 8

Time and Date: 15:45 - 17:25 on 11th June 2018

Room: M2

Chair: Stefano Casarin

68 Elucidation of Mechanism for Reducing Porosity in Electric Arc Spraying through CFD [abstract]
Abstract: We elucidated the mechanism for reducing porosity (a means of achieving smaller globules) in electric arc spraying through Computational Fluid Dynamics, focusing on the flow of compressed air. A simulation study revealed that a spray-gun nozzle with a flow-splitting plate located upstream of the arc point produces compression waves, so that the flow field formed in the nozzle differs substantially from that in a conventional, plate-less nozzle. Observation using a high-speed camera showed that smaller particles of molten metal (globules) were formed due to the plate, which means that the compression waves generated upstream of the arc point affect the formation of globules at the arc point.
Ryoji Tamaki and Masashi Yamakawa
159 nSharma: Numerical Simulation Heterogeneity Aware Runtime Manager for OpenFOAM [abstract]
Abstract: CFD simulations are a fundamental engineering application involving huge workloads, often with dynamic behaviour due to runtime mesh refinement. Parallel processing over heterogeneous distributed-memory clusters is often used to process such workloads. Executing dynamic workloads over a set of heterogeneous resources with a static uniform load distribution leads to load imbalances that severely impact execution time. This paper proposes applying dynamic, heterogeneity-aware load balancing techniques within CFD simulations. nSharma, a software package that fully integrates with OpenFOAM, is presented and assessed. Performance gains are demonstrated, achieved by reducing the standard deviation of busy times among resources, i.e. heterogeneous computing resources are kept busy with useful work due to an effective workload distribution. To the best of the authors' knowledge, nSharma is the first implementation and integration of heterogeneity-aware load balancing in OpenFOAM, and it will be made publicly available in order to foster its adoption by the large community of OpenFOAM users.
Roberto Ribeiro, Luís Paulo Santos and João Miguel Nóbrega
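An illustrative redistribution rule in the spirit of the abstract (not nSharma's actual algorithm): assign mesh cells in proportion to each rank's observed throughput, so that heterogeneous resources stay equally busy.

```python
# Toy heterogeneity-aware rebalancing: cells are redistributed in
# proportion to measured speed (cells processed per second of busy time).
def rebalance(total_cells, busy_time, cells_done):
    speed = [c / t for c, t in zip(cells_done, busy_time)]
    return [round(total_cells * s / sum(speed)) for s in speed]

# A rank that was twice as fast receives about twice the cells.
print(rebalance(1_000_000, busy_time=[10.0, 20.0], cells_done=[500_000, 500_000]))
# -> [666667, 333333]
```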
213 High Performance Computational Hydrodynamic Simulations: UPC Parallel Architecture as a Future Alternative [abstract]
Abstract: Developments in high-performance computing (HPC) have transformed the manner in which computational hydrodynamic (CHD) simulations are performed. To date, the message passing interface (MPI) remains the most common parallelism architecture and has been adopted widely in CHD simulations. However, a bottleneck remains for some large-scale cases: with an increasing number of processes, delays in message passing can make the total communication time exceed the total simulation runtime. In this study, we utilise an alternative parallelism architecture, known as PGAS-UPC, to develop our own UPC-CHD model with a two-step explicit scheme from the Lax-Wendroff family of predictor-correctors. The model is evaluated on three incompressible, adiabatic, viscous 2D flow cases with moderate flow velocities. Model validation is achieved by the good agreement between the predicted and the respective analytical values. We then compare the computational performance of UPC-CHD with that of MPI in its base design on an SGI UV-2000 server with 100 cores. The former achieves a near 1:1 speedup, which demonstrates its efficiency potential for very large-scale CHD simulations, while the latter experiences a bottleneck at some point. Extending UPC-CHD remains our main objective, through the following additions: (a) inclusion of other numerical schemes to accommodate varying flow conditions, and (b) coupling UPC-CHD with Amazon Web Services (AWS) to further exploit its parallelism efficiency as a viable alternative.
Alvin Wei Ze Chew, Tung Thanh Vu and Adrian Wing-Keung Law
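For orientation, one common two-step predictor-corrector of the Lax-Wendroff family (the Richtmyer form, written for a 1D conservation law $u_t + f(u)_x = 0$); the paper's exact scheme may differ:

$$u_{j+1/2}^{n+1/2} = \tfrac{1}{2}\bigl(u_j^n + u_{j+1}^n\bigr) - \frac{\Delta t}{2\Delta x}\bigl[f(u_{j+1}^n) - f(u_j^n)\bigr],$$

$$u_j^{n+1} = u_j^n - \frac{\Delta t}{\Delta x}\Bigl[f\bigl(u_{j+1/2}^{n+1/2}\bigr) - f\bigl(u_{j-1/2}^{n+1/2}\bigr)\Bigr].$$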
373 On Parametric Excitation for Exploration of Lava Tubes and Caves [abstract]
Abstract: Huge lava tubes with approximate diameters of 65-225 m were found on the surfaces of the Moon and Mars in the late 2000s. It has been argued that the interiors of these caves are spacious and suitable for building artificial bases, offering habitable features such as constant temperature as well as protection from both meteorites and harmful radiation. In line with the above, a number of studies on soft-landing mechanisms for the bottoms of lava tubes have been proposed. In this paper, aiming to extend the ability to explore arbitrary surface caves, we propose a mechanism that is able to reach the ceiling of lava tubes. The basic concept of our proposed mechanism consists of a rover connected to an oscillating sample-gatherer, wherein the rover adjusts the length of the rope parametrically, exploiting periodic changes at the pivot, to increase the deflection angle and thus ease the collection of samples by hitting against the ceiling of the cave. Simulations confirmed our theoretical observations, which predict an increase of the deflection angle when the rope is periodically wound and rewound according to the pivot's variations. We believe our proposed approach provides the building blocks for finer control of mechanisms exploring lava tubes and narrow environments.
Victor Parque, Masato Kumai, Satoshi Miura and Miyashita Tomoyuki
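For context, the standard equation of motion for a pendulum whose rope length $\ell(t)$ is varied at the pivot (a textbook parametric-excitation model; the paper's mechanism additionally accounts for pivot changes):

$$\ddot{\theta} + \frac{2\dot{\ell}}{\ell}\,\dot{\theta} + \frac{g}{\ell}\,\sin\theta = 0.$$

Shortening the rope near the lowest point of the swing and lengthening it near the turning points pumps energy into the oscillation, which is how periodic winding and rewinding can grow the deflection angle $\theta$.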

Advances in High-Performance Computational Earth Sciences: Applications and Frameworks (IHPCES) Session 2

Time and Date: 15:45 - 17:25 on 11th June 2018

Room: M3

Chair: Xing Cai

184 Enabling Adaptive Mesh Refinement for Single Components of ECHAM6 [abstract]
Abstract: Adaptive mesh refinement (AMR) can be used to improve climate simulations, since these exhibit features on multiple scales that would be too expensive to resolve using a uniform mesh. In particular, paleo-climate simulations, as done in the framework of the German PalMod project, only allow for low-resolution simulations. Instead of constructing a complex model like an earth system model (ESM) based on AMR, it is desirable to apply AMR to single components of an existing ESM. We explore the applicability of a forest-of-trees data structure to incorporate AMR into an existing model. The performance of the data structure is tested on an idealized test case using a numerical scheme for tracer transport in ECHAM6. The numerical results show that the data structure is compatible with the data structure of the original model and also demonstrate efficiency improvements over non-adaptive meshes.
Yumeng Chen, Konrad Simon and Jörn Behrens
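A toy illustration of the tree-based AMR bookkeeping the abstract refers to (a quadtree here; this is not the ECHAM6 or PalMod code): cells are refined recursively wherever an error indicator, e.g. a tracer gradient, exceeds a threshold.

```python
# Hypothetical quadtree refinement driven by an error indicator.
class Cell:
    def __init__(self, level, indicator):
        self.level, self.indicator, self.children = level, indicator, []

def refine(cell, threshold, max_level=4):
    if cell.indicator > threshold and cell.level < max_level:
        # Split into four children; assume the indicator spreads evenly.
        cell.children = [Cell(cell.level + 1, cell.indicator / 4) for _ in range(4)]
        for child in cell.children:
            refine(child, threshold, max_level)

root = Cell(0, 1.0)
refine(root, threshold=0.05)   # refines only where the indicator is large
```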
228 Efficient and accurate evaluation of Bezier tensor product surfaces [abstract]
Abstract: This article proposes a bivariate compensated Volk and Schumaker (CompVSTP) algorithm, which extends the compensated Volk and Schumaker (CompVS) algorithm, to evaluate Bezier tensor product surfaces with floating-point coefficients and coordinates. The CompVSTP algorithm is obtained by applying error-free transformations to improve the traditional Volk and Schumaker tensor product (VSTP) algorithm. We study in detail the forward error analysis of the VSTP, CompVS and CompVSTP algorithms. Our numerical experiments illustrate that the CompVSTP algorithm is much more accurate than the VSTP algorithm, relegating the influence of the condition numbers up to second order in the rounding unit of the computer.
Jing Lan, Hao Jiang and Peibing Du
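The error-free transformations that compensated algorithms such as CompVS and CompVSTP build on can be illustrated with Knuth's TwoSum, which recovers the rounding error of a floating-point addition exactly (a standard building block, shown as a Python sketch):

```python
# Knuth's TwoSum: a + b = s + e exactly, where s is the rounded sum
# and e is the rounding error, both representable in floating point.
def two_sum(a, b):
    s = a + b
    bb = s - a
    e = (a - (s - bb)) + (b - bb)
    return s, e

s, e = two_sum(1e16, 1.0)
print(s, e)   # 1e+16 1.0 -- the lost low-order part is recovered exactly
```

Compensated algorithms propagate these error terms alongside the main computation, which is what confines the condition-number influence to second order in the rounding unit.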

Computational Optimization, Modelling and Simulation (COMS) Session 2

Time and Date: 15:45 - 17:25 on 11th June 2018

Room: M4

Chair: Tiew On Ting

116 Optimizing Deep Learning by Hyper Heuristic Approach for Classifying Good Quality Images [abstract]
Abstract: The deep Convolutional Neural Network (CNN), one of the prominent deep learning methods, has shown remarkable success in a variety of computer vision tasks, especially image classification. However, tuning CNN hyper-parameters requires expert knowledge and a large amount of manual trial and error. In this work, we present the use of CNNs to classify good-quality versus bad-quality images without understanding the image content. Well-known datasets were used for performance evaluation. More importantly, we propose a hyper-heuristic approach for tuning CNN hyper-parameters. The proposed method consists of a high-level strategy and various low-level heuristics. The high-level strategy uses search performance to determine how to apply the low-level heuristics so as to automatically find an optimal set of CNN hyper-parameters. Our experiments show the effectiveness of this hyper-heuristic approach, which can achieve high accuracy even when the training set is significantly reduced and conventional CNNs can no longer perform well. In short, the proposed hyper-heuristic method does enhance CNN deep learning.
Muneeb Ul Hassan, Nasser R. Sabar and Andy Song
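A toy version of the high-level/low-level split described above (an illustration only, not the authors' method): low-level heuristics each perturb one hyper-parameter, and a high-level strategy selects among them, keeping changes that improve a score.

```python
# Hypothetical hyper-heuristic loop over CNN hyper-parameters.
import random

params = {"lr": 0.01, "batch": 64, "filters": 32}
heuristics = [                               # low-level heuristics
    lambda p: {**p, "lr": p["lr"] * random.choice([0.5, 2.0])},
    lambda p: {**p, "batch": max(8, p["batch"] // 2)},
    lambda p: {**p, "filters": p["filters"] + 16},
]

def evaluate(p):   # stand-in for validation accuracy of a trained CNN
    return -abs(p["lr"] - 0.005) - abs(p["filters"] - 64) / 100

best, best_score = params, evaluate(params)
for _ in range(50):
    cand = random.choice(heuristics)(best)   # high-level selection step
    if evaluate(cand) > best_score:          # accept only improvements
        best, best_score = cand, evaluate(cand)
print(best)
```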
150 An Agent-based Distributed Approach for Bike Sharing Systems [abstract]
Abstract: Shared bikes are widely welcomed and becoming increasingly popular around the world; as a result, quite a few bike-sharing systems have been built to provide services for bicycle users. However, current bike-sharing systems are not flexible and user-friendly enough, largely because of their fixed stations and limited consideration of the users. In this paper, an agent-based distributed approach for bike-sharing systems is proposed; this approach aims at helping users obtain a needed shared bike successfully and efficiently. We pay particular attention to users' preferences to improve satisfaction with the assigned shared bikes; meanwhile, agent trust and probabilities are considered to improve efficiency and the success rate. Finally, a practical example simulation and an analysis of its results are given to show the approach's efficiency.
Ningkui Wang, Hayfa Zgaya, Philippe Mathieu and Slim Hammadi
325 A fast vertex-swap operator for the prize-collecting Steiner tree problem [abstract]
Abstract: The prize-collecting Steiner tree problem (PCSTP) is one of the important topics in computational science and operations research. The vertex-swap operation, which involves removal and addition of a pair of vertices based on a given minimum spanning tree (MST), has been proven very effective for some particular PCSTP instances with uniform edge costs. This paper extends the vertex-swap operator to make it applicable for solving more general PCSTP instances with varied edge costs. Furthermore, we adopt multiple dynamic data structures (such as ST trees and logarithmic-time heaps), which guarantee that the total time complexity for evaluating all the O(n^2) possible vertex-swap moves is bounded by O(n) · O(m log n), where n and m denote the number of vertices and edges respectively (if we chose to run Kruskal's algorithm with a Fibonacci heap from scratch after swapping any pair of vertices, the total time complexity would reach O(n^2) · O(m + n log n)). We also prove that after applying the vertex-swap operation, the resulting solutions are necessarily MSTs (unless infeasible).
Yi-Fei Ming, Si-Bo Chen, Yong-Quan Chen and Zhang-Hua Fu
33 Solving CSS-Sprite Packing Problem Using a Transformation to the Probabilistic Non-Oriented Bin Packing Problem [abstract]
Abstract: CSS-sprite is a technique of regrouping the small images of a web page, called tiles, into images called sprites in order to reduce network transfer time. We treat CSS-sprite packing as an optimization problem and approach it as a probabilistic non-oriented two-dimensional bin packing problem (2PBPP|R). Our main contribution is to allow tile rotation while packing tiles into sprites. An experimental study evaluated our solution, which outperforms current solutions.
Soumaya Sassi Mahfoudh, Monia Bellalouna and Leila Horchani
71 Optimization of Resources Selection for Jobs Scheduling in Heterogeneous Distributed Computing Environments [abstract]
Abstract: In this work, we introduce slot selection and co-allocation algorithms for parallel jobs in distributed computing with non-dedicated and heterogeneous resources (clusters, CPU nodes equipped with multi-core processors, networks, etc.). A single slot is a time span that can be assigned to a task, which is a part of a parallel job. Launching a job requires the co-allocation of a specified number of slots starting and finishing synchronously. The challenge is that slots associated with different heterogeneous resources in distributed computing environments may have arbitrary start and finish points and different pricing policies. Some existing algorithms assign a job to the first set of slots matching the resource request without any optimization (first-fit), while others are based on an exhaustive search. In this paper, algorithms for effective slot selection are studied and compared with known approaches. The novelty of the proposed approach lies in a general algorithm that selects a set of slots that is efficient according to a specified criterion.
Victor V. Toporkov and Dmitry Yemelyanov

Applications of Matrix Methods in Artificial Intelligence and Machine Learning (AMAIML) Session 2

Time and Date: 15:45 - 17:25 on 11th June 2018

Room: M5

Chair: Kourosh Modarresi

380 An Efficient Deep Learning Model for Recommender Systems [abstract]
Abstract: Recommending the best and most relevant content to users is an essential part of digital space activities and online user interactions. For example, we would like to know which items should be sent to a user, which promotion is best for a user, which web design would fit a specific user, which ad a user would be more receptive to, or which Creative Cloud package is most suitable for a specific user. In this work, we use deep learning (autoencoders) to create a new model for this purpose. Prior art applies autoencoders to numerical features only; we extend their application to non-numerical features. Our approach to producing recommendations uses matrix completion, which is an efficient and direct way of finding and evaluating content recommendations.
Kourosh Modarresi and Jamie Diner
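A minimal sketch of an autoencoder-based recommender in the spirit of the abstract; the sizes and framework are illustrative assumptions, not the paper's model.

```python
# Hypothetical autoencoder over a user's item-interaction vector;
# reconstructed scores for unseen items can be used for ranking.
import torch
import torch.nn as nn

n_items, hidden = 1000, 32
autoenc = nn.Sequential(
    nn.Linear(n_items, hidden), nn.ReLU(),  # encode the interaction vector
    nn.Linear(hidden, n_items),             # decode -> predicted scores
)
user = torch.rand(1, n_items)
scores = autoenc(user)                      # rank unseen items by score
```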
381 Standardization of Featureless Variables for Machine Learning Models using Natural Language Processing (NLP) [abstract]
Abstract: AI and machine learning are mathematical modeling methods for learning from data and producing intelligent models based on this learning. The data these models need to deal with is normally of mixed type, containing both numerical (continuous) variables and categorical (non-numerical) ones. Most models in AI and machine learning accept only numerical data as their input, so standardization of mixed data into numerical data is a critical step when applying machine learning models. Getting data into the standard shape and format that models require is often a time-consuming, but very significant, step of the process.
Kourosh Modarresi and Abdurrahman Munir
382 Generalized Variable Conversion using K-means Clustering and Web Scraping [abstract]
Abstract: The world of AI and Machine Learning is the world of data and of learning from data so that the insights can be used for analysis and prediction. Almost all data sets are of mixed variable types, as variables may be quantitative (numerical) or qualitative (categorical). The problem arises from the fact that a long list of methods in Machine Learning, such as multiple regression, logistic regression, k-means clustering, and support vector machines, to name a few, are designed to deal with numerical data only. Yet the data that needs to be analyzed and learned from is almost always of mixed type, and thus a standardization step must be undertaken for all these data sets. The standardization process involves the conversion of qualitative (categorical) data into a numerical data type.
Kourosh Modarresi and Abdurrahman Munir
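As a point of reference for the two papers above, the common baseline for converting categorical variables to numerical ones is one-hot encoding; the NLP- and clustering-based conversions the papers propose go beyond this sketch.

```python
# Baseline categorical-to-numerical conversion with one-hot encoding.
import pandas as pd

df = pd.DataFrame({"age": [34, 29], "country": ["FR", "US"]})
numeric = pd.get_dummies(df, columns=["country"])
print(numeric)   # columns: age, country_FR, country_US -- all numeric
```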
372 Parallel Latent Dirichlet Allocation on GPUs [abstract]
Abstract: Latent Dirichlet Allocation (LDA) is a statistical technique for topic modeling. Since it is very computationally demanding, its parallelization has garnered considerable interest. In this paper, we systematically analyze the data access patterns for LDA and devise suitable algorithmic adaptations and parallelization strategies for GPUs. Experiments on large-scale datasets show the effectiveness of the new parallel implementation on GPUs.
Gordon Moon, Israt Nisa, Aravind Sukumaran-Rajam, Bortik Bandyopadhyay, Srinivasan Parthasarathy and P. Sadayappan
348 Improving Search through A3C Reinforcement Learning based Conversational Agent [abstract]
Abstract: We develop a reinforcement learning based search assistant which can guide users through a sequence of actions to enable them to realize their intent. Our approach caters to subjective search, where the user is seeking digital assets such as images; this is fundamentally different from tasks that have objective and limited search modalities. Labeled conversational data is generally not available in such search tasks; to counter this problem, we propose a stochastic virtual user which impersonates a real user, allowing us to train and bootstrap the agent. We develop an A3C-based context-preserving architecture to train the agent and evaluate performance by the average rewards obtained by the agent while interacting with the virtual user. We also evaluated our system with actual humans, who reported that it helped drive their search forward with appropriate actions without being repetitive, while being more engaging and easier to use than a conventional search interface.
Milan Aggarwal, Aarushi Arora, Shagun Sodhani and Balaji Krishnamurthy

Data Driven Computational Sciences (DDCS) Session 2

Time and Date: 15:45 - 17:25 on 11th June 2018

Room: M6

Chair: Craig Douglas

24 Bisections-weighted-by-element-size-and-order algorithm to optimize direct solver performance on 3D hp-adaptive grids. [abstract]
Abstract: The $hp$-adaptive Finite Element Method ($hp$-FEM) generates a sequence of adaptive grids with different polynomial orders of approximation and element sizes. The $hp$-FEM delivers exponential convergence of the numerical error with respect to the mesh size. In this paper, we propose a heuristic algorithm to construct element partition trees. The trees can be transformed directly into the orderings that control the execution of multi-frontal direct solvers during the $hp$-refined finite element method. In particular, the orderings determine the number of floating-point operations performed by the solver; thus, the quality of the orderings obtained from the element partition trees is important for good solver performance. Our heuristic algorithm has been implemented in three dimensions and tested on a sequence of $hp$-refined meshes generated during $hp$ finite element method computations. We compare the quality of the orderings found by the heuristic algorithm to those generated by alternative state-of-the-art algorithms, and show a 50 percent reduction in the number of flops and in execution time.
Hassan Aboueisha, Victor Calo, Konrad Jopek, Mikhail Moshkov, Anna Paszynska and Maciej Paszynski
41 Establishing EDI for a Clinical Trial of a Treatment for Chikungunya [abstract]
Abstract: Ellagic acid (EA) is a polyphenolic compound with antiviral activity against Chikungunya, a rapidly spreading tropical disease transmitted to humans by mosquitoes. The most common symptoms of chikungunya virus infection are fever and joint pain. Other manifestations of infection can include encephalitis and an arthritis-like joint swelling with pain that may persist for months or years after the initial infection. In 2014, there were 11 locally transmitted cases of Chikungunya virus in the U.S., all reported in Florida. There is no approved vaccine to prevent, or medicine to treat, Chikungunya virus infections. In this study, the Estimated Daily Intake (EDI) of EA from the food supply, established using the National Health and Nutrition Examination Survey (NHANES), is used to set a maximum dose of an EA formulation for the clinical trial.
Robert Lodder, Mark Ensor and Cynthia Dickerson
283 Deadlock Detection in MPI Programs Using Static Analysis and Symbolic Execution [abstract]
Abstract: Parallel computing using MPI has become ubiquitous on multi-node computing clusters. A common problem while developing parallel codes is determining whether or not a deadlock condition can exist. Ideally, we do not want to have to run a large number of examples to find deadlock conditions through trial-and-error procedures. In this paper, we describe a methodology that uses both static analysis and symbolic execution of an MPI program to make that determination when it is possible. We note that static analysis by itself is insufficient for realistic cases, while symbolic execution can create a nearly infinite number of logic branches to investigate; we provide a mechanism to limit the number of branches to something computable. We also provide examples and pointers to the software necessary to test MPI programs.
Craig C. Douglas and Krishanthan Krishnamoorthy
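The classic deadlock pattern such an analysis targets can be shown in a few lines (mpi4py is used here purely for illustration): both ranks issue a blocking send before their receive, so for large messages neither send can complete.

```python
# Run with: mpiexec -n 2 python deadlock.py
# Both ranks block in Send waiting for a matching Recv -> deadlock.
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
peer = 1 - rank                  # assumes exactly 2 ranks

buf = bytearray(1 << 20)         # large message: Send cannot buffer eagerly
comm.Send(buf, dest=peer)        # deadlock: neither rank reaches Recv
comm.Recv(buf, source=peer)

# Fixes: reorder send/recv on one rank, or use Sendrecv / Isend+Irecv.
```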

Biomedical and Bioinformatics Challenges for Computer Science (BBC) Session 2

Time and Date: 15:45 - 17:25 on 11th June 2018

Room: M7

Chair: Rodrigo Weber dos Santos

383 Development of Octree-Based High-Quality Mesh Generation Method for Biomedical Simulation [abstract]
Abstract: This paper proposes a robust high-quality finite element mesh generation method which is capable of modeling problems with complex geometries and multiple materials, suitable for use in biomedical simulation. The previous octree-based method can robustly generate a high-quality mesh with complex geometries and multiple materials, but at the cost of some geometric approximation. In this study, a robust mesh optimization method is developed that combines smoothing and topology optimization in order to correct geometries while guaranteeing element quality. Through performance measurements on a sphere mesh and application to an HTO tibia mesh, the validity of the developed mesh optimization method is demonstrated.
Keisuke Katsushima, Kohei Fujita, Tsuyoshi Ichimura, Muneo Hori and Lalith Maddegedara
258 1,000x Faster than PLINK: Genome-Wide Epistasis Detection with Logistic Regression Using Combined FPGA and GPU Accelerators [abstract]
Abstract: Logistic regression as implemented in PLINK is a powerful and commonly used framework for assessing gene-gene (GxG) interactions. However, fitting regression models for each pair of markers in a genome-wide dataset is a computationally intensive task. Performing billions of tests with PLINK takes days if not weeks, for which reason pre-filtering techniques and fast epistasis screenings are applied to reduce the computational burden. Here, we demonstrate that employing a combination of a Xilinx UltraScale KU115 FPGA and an Nvidia Tesla P100 GPU leads to runtimes of only minutes for logistic regression GxG tests on a genome-wide level. In particular, a dataset with 53,000 samples genotyped at 130,000 SNPs was analyzed in 8 minutes, resulting in a speedup of more than 1,000 when compared to PLINK v1.9 using 32 threads on a server-grade computing platform. Furthermore, on-the-fly calculation of test statistics, p-values and LD-scores in double precision make commonly used pre-filtering strategies obsolete.
Lars Wienbrandt, Jan Christian Kässens, Matthias Hübenthal and David Ellinghaus
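A single pairwise GxG test of the kind run billions of times genome-wide can be written as a logistic regression with an interaction term (statsmodels is used for illustration; PLINK's exact encoding and covariates may differ):

```python
# One gene-gene interaction test: logistic regression on two SNPs
# (genotypes coded 0/1/2) plus their product term.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 1000
g1, g2 = rng.integers(0, 3, n), rng.integers(0, 3, n)   # SNP genotypes
y = rng.binomial(1, 0.3, n)                             # case/control status

X = sm.add_constant(np.column_stack([g1, g2, g1 * g2]))
fit = sm.Logit(y, X).fit(disp=0)
print(fit.pvalues[3])   # p-value of the GxG interaction term
```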
280 Combining Molecular Dynamics Simulations and Informatics to Model Nucleosomes and Chromatin [abstract]
Abstract: Nucleosomes are the fundamental building blocks of chromatin, the biomaterial that houses the genome in all higher organisms. A nucleosome consists of 145-147 base pairs of double-stranded DNA wrapped approximately 1.7 times around eight histones. There are almost 100 atomic-resolution structures of the nucleosome available from the Protein Data Bank. Collectively they explore histone mutations, species variations, binding of drugs and ionic effects, but only three sequences of DNA. Given a four-letter code (A, C, G, T) for DNA, there are on the order of 4^147 ~ 10^88 possible sequences of DNA that can form a nucleosome, so exhaustive studies are not possible. Fortunately, next-generation sequencing enables researchers to identify a single nucleosome of interest, and today's supercomputing resources enable simulation ensembles representing different realizations of the nucleosome to be accumulated overnight as a means of investigating its structure and dynamics. Here we present a workflow that integrates molecular simulation and genome browsing to manage such efforts. The workflow is exploited to study nucleosome positioning in atomic detail and its relation to chromatin folding in coarse-grained detail. The exchange of data between physical and informatics models is bidirectional, which allows cross-validation of simulation and experiment and the discovery of structure-function relationships. All simulation and analysis data from the studies are available on the TMB-iBIOMES server: http://dna.engr.latech.edu/ibiomes.html.
Ran Sun, Zilong Li and Thomas Bishop
169 A Stochastic Model to Simulate the Spread of Leprosy in Juiz de Fora [abstract]
Abstract: Leprosy, also known as Hansen's disease, is an infectious disease whose main etiological agent is Mycobacterium leprae. The disease mainly affects the skin and peripheral nerves and can cause physical disabilities. For this reason, it represents a global public health concern, especially in Brazil, where more than twenty-five thousand new cases were reported in 2016. This work aims to simulate the spread of leprosy in a Brazilian city, Juiz de Fora, using the SIR model and considering some of the disease's pathological aspects. SIR models divide the studied population into compartments with respect to the disease, in which the S, I and R compartments refer to the groups of susceptible, infected and recovered individuals, respectively. The model was solved computationally by a stochastic approach using the Gillespie algorithm. The results obtained by the model were then validated against the public health records database of Juiz de Fora.
Vinícius Clemente Varella, Aline Mota Freitas Matos, Henrique Couto Teixeira, Angélica Da Conceição Oliveira Coelho, Rodrigo Santos and Marcelo Lobosco
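A minimal Gillespie simulation of an SIR model, the approach named in the abstract (the rates and population here are illustrative, not the paper's calibrated values for Juiz de Fora):

```python
# Stochastic SIR via the Gillespie algorithm: two reactions,
# infection (S,I -> I,I) and recovery (I -> R).
import math, random

S, I, R = 990, 10, 0
beta, gamma, N = 0.3, 0.1, 1000.0
t = 0.0
while I > 0 and t < 365.0:
    a_inf = beta * S * I / N                      # infection propensity
    a_rec = gamma * I                             # recovery propensity
    a0 = a_inf + a_rec
    t += -math.log(1.0 - random.random()) / a0    # exponential waiting time
    if random.random() * a0 < a_inf:
        S, I = S - 1, I + 1
    else:
        I, R = I - 1, R + 1
print(t, S, I, R)
```

Leprosy's long incubation and treatment periods would add compartments and much slower rates; the sketch only shows the simulation mechanics.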

Computational Finance and Business Intelligence (CFBI) Session 2

Time and Date: 15:45 - 17:25 on 11th June 2018

Room: M8

Chair: Yong Shi

305 Parallel Harris Corner Detection on Heterogeneous Architecture [abstract]
Abstract: Corner detection is a fundamental step for many image processing applications, including image enhancement, object detection and pattern recognition. In recent years, both the quality and the number of images have grown, and applications mainly process videos or image streams. With the popularity of embedded devices, real-time processing on limited computing resources is an essential problem in high-performance computing. In this paper, we study a parallel method for Harris corner detection and implement it on a heterogeneous architecture using OpenCL. We also adopt several optimization strategies on the many-core processor. Experimental results show that our parallelization and optimization methods greatly improve the performance of the Harris algorithm on limited computing resources.
Yiwei He, Yue Ma and Dalian Liu
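A serial NumPy reference for the Harris response (window smoothing of the structure tensor is omitted for brevity); the paper parallelizes this per-pixel pipeline with OpenCL.

```python
# Harris corner response R = det(M) - k * trace(M)^2, computed per pixel.
import numpy as np

def harris_response(img, k=0.04):
    Iy, Ix = np.gradient(img.astype(float))      # image derivatives
    Ixx, Iyy, Ixy = Ix * Ix, Iy * Iy, Ix * Iy    # structure-tensor terms
    det = Ixx * Iyy - Ixy ** 2
    trace = Ixx + Iyy
    return det - k * trace ** 2                  # large positive = corner

img = np.random.rand(64, 64)
corners = harris_response(img) > 0.01
```

Each pixel's response is independent, which is what makes the algorithm a good fit for many-core devices.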
307 A New Method for Structured Learning with Privileged Information [abstract]
Abstract: In this paper, we present a new method, JKSE+, for structured learning. Compared with classical methods such as SSVM and CRFs, the optimization problem in JKSE+ is a convex quadratic problem and can be easily solved because it is based on JKSE. By incorporating privileged information into JKSE, the performance of JKSE+ is improved. We apply JKSE+ to the problem of object detection, which is a typical structured-learning problem. Experimental results show that JKSE+ performs better than JKSE.
Shiding Sun and Chunhua Zhang
312 An Effective Model between Mobile Phone Usage and P2P Default Behavior [abstract]
Abstract: P2P online lending platforms have become increasingly developed. However, these platforms may suffer serious losses caused by the default behavior of borrowers. In this paper, we present an effective default-behavior prediction model to reduce default risk in P2P lending. The proposed model uses mobile phone usage data, which are generated by widely used mobile phones. We extract features from five aspects: consumption, social network, mobility, socioeconomic status, and individual attributes. Based on these features, we propose a joint decision model, which makes a default risk judgment by combining Random Forests with the Light Gradient Boosting Machine. Validated on a real-world dataset collected by a mobile carrier and a P2P lending company in China, the proposed model not only demonstrates satisfactory performance on the evaluation metrics but also outperforms the existing methods in this area. These results imply high feasibility and potential for adoption in real-world P2P online lending platforms.
Huan Liu, Lin Ma, Xi Zhao and Jianhua Zou
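A hedged sketch of a joint decision between Random Forests and the Light Gradient Boosting Machine (here a simple average of predicted probabilities; the paper's combination rule may differ):

```python
# Joint default-risk score from two models on the same features.
import lightgbm as lgb
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
gbm = lgb.LGBMClassifier(n_estimators=100, random_state=0).fit(X, y)

# Average the two default probabilities as the joint decision score.
p_default = (rf.predict_proba(X)[:, 1] + gbm.predict_proba(X)[:, 1]) / 2
```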
340 A Novel Data Mining Approach towards Human Resource Performance Appraisal [abstract]
Abstract: Performance appraisal has always been a very important research field in human resource management; a reasonable performance appraisal plan lays a solid foundation for the development of an enterprise. Traditional performance appraisal programs are mostly labor-based and have difficulty examining employee results fairly. Furthermore, as globalization and technology advance, enterprises face fast-changing strategic goals and increasingly cross-functional tasks, which raises new challenges for performance appraisal. Starting from the above angles, this paper sets up a data-mining-based performance appraisal framework to conduct a comprehensive assessment of employees' working ability and job competency. This framework has been successfully applied in practice, providing a reliable basis for human resource management.
Pei Quan, Ying Liu and Yong Shi
341 Word Similarity Fails in Multiple Sense Word Embedding [abstract]
Abstract: Word representation is a foundational research topic in natural language processing, one that is full of challenges compared to other fields such as image and speech processing. It embeds words into a dense low-dimensional vector space and is able to learn syntax and semantics at the same time. But this representation yields only a single vector per word, whether the word is polysemous or not. To address this problem, sense information is added in multiple-sense language models so as to learn alternative vectors for each word. However, word similarity, the most popular evaluation method for single-sense language models, does not perform as well in the multiple-sense setting, because word similarity based on cosine distance does not match the annotated similarity scores. In this paper, we analyze similarity algorithms and find an obvious gap between cosine distance and the benchmark datasets: the negative interval of the cosine space does not correspond to the space of manual scores, and cosine similarity does not cover the semantic relatedness contained in the datasets. Based on this, we propose a new similarity method based on mean squared error, and experiments show that the new evaluation algorithm provides a better method for word-vector similarity evaluation.
Yong Shi, Yuanchun Zheng, Kun Guo, Wei Li and Luyao Zhu
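To make the contrast concrete, here is cosine similarity next to one possible mean-squared-error based score (the paper's exact formulation is not reproduced; the MSE-to-similarity mapping below is an assumption):

```python
# Cosine similarity vs. an MSE-based similarity for word vectors.
import numpy as np

def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

def mse_similarity(u, v):
    return 1.0 / (1.0 + np.mean((u - v) ** 2))   # in (0, 1], higher = closer

u, v = np.random.rand(300), np.random.rand(300)
print(cosine(u, v), mse_similarity(u, v))
```

Unlike cosine, the MSE-based score never goes negative, avoiding the mismatch with manually annotated score ranges that the abstract points out.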