Applications of Matrix Computational Methods in the Analysis of Modern Data (MATRIX) Session 1

Time and Date: 10:35 - 12:15 on 6th June 2016

Room: Rousseau Center

Chair: Kourosh Modarresi

74 Fast and accurate finite-difference method solving multicomponent Smoluchowski coagulation equation with source and sink terms [abstract]
Abstract: In this work we present a novel numerical method for solving the multicomponent Smoluchowski coagulation equation. The new method applies fast linear-algebra algorithms and fast arithmetic in the tensor train (TT) format to accelerate the well-known, highly accurate second-order Runge-Kutta scheme. These algorithmic optimizations yield a dramatic speedup over the classical methodology without loss of accuracy. We test our solver on a problem with source and sink terms and find that the TT-ranks of the numerical solution do not grow substantially even when these physical effects are added to the basic Smoluchowski coagulation model.
Alexander Smirnov, Sergey Matveev, Dmitry Zheltkov, Eugene Tyrtyshnikov
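For reference, a sketch of the underlying model: in the one-component case (the multicomponent equation in the abstract generalizes the size variable to a vector), the Smoluchowski coagulation equation with a source term S(v) and a sink rate sigma(v) reads, in standard LaTeX notation,

    \frac{\partial n(v,t)}{\partial t} = \frac{1}{2}\int_0^v K(v-u,u)\,n(v-u,t)\,n(u,t)\,du - n(v,t)\int_0^\infty K(v,u)\,n(u,t)\,du + S(v) - \sigma(v)\,n(v,t)

where n(v,t) is the concentration of particles of size v and K is the coagulation kernel; the finite-difference method in the talk discretizes equations of this form.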
95 A Riemannian Limited-Memory BFGS Algorithm for Computing the Matrix Geometric Mean [abstract]
Abstract: Various optimization algorithms have been proposed to compute the Karcher mean (namely the Riemannian center of mass in the sense of the affine-invariant metric) of a collection of symmetric positive-definite matrices. Here we propose to handle this computational task with a recently developed limited-memory Riemannian BFGS method, using an implementation tailored to the symmetric positive-definite Karcher mean problem. We also demonstrate empirically that the method is best suited for large-scale problems in terms of computation time and robustness compared to existing state-of-the-art algorithms.
Xinru Yuan, Wen Huang, Pierre-Antoine Absil, Kyle Gallivan
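For context, the Karcher mean referenced above is the standard minimizer of the sum of squared affine-invariant distances (this formulation is common background, not a detail specific to the paper):

    X^{*} = \arg\min_{X \succ 0} \sum_{i=1}^{K} \delta^{2}(X, A_i), \qquad \delta(A, B) = \left\| \log\!\left(A^{-1/2} B A^{-1/2}\right) \right\|_{F}

where A_1, ..., A_K are the given symmetric positive-definite matrices; the Riemannian LBFGS method minimizes this cost on the manifold of SPD matrices.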
256 GPU optimization for data analysis of Mario Schenberg spherical detector [abstract]
Abstract: The gravitational wave (GW) detectors Advanced LIGO and Advanced Virgo are acquiring the potential to record unprecedented astronomical data on astrophysical events. The Mario Schenberg detector (MSD) is a smaller-scale experiment that could participate in this search. Previously, we developed a first data analysis pipeline (DAP) to transform the detector's signal into relevant GW information. This pipeline was heavily simplified so that it could execute at low latency. To improve the analysis methods while keeping execution time low, we propose three different parallel approaches using GPU/CUDA. We implemented the parallel models using cuBLAS library functions and enhanced them with asynchronous execution on CUDA streams. Our new model surpasses the serial implementation within the data analysis pipeline, running 21% faster than the traditional model. This first result is part of a more comprehensive effort in which all DAP modules that can be parallelized are being rewritten in GPGPU/CUDA, then tested and validated within the MSD context.
Eduardo C. Vasconcellos, Esteban W. G. Clua, Reinaldo R. Rosa, João G. F. M. Gazolla, Nuno César Da R. Ferreira, Victor Carlquist, Carlos F. Da Silva Costa
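A minimal sketch of the general pattern the abstract describes: independent cuBLAS-backed products overlapped on CUDA streams. This uses Python with CuPy (whose dense linear algebra dispatches to cuBLAS) rather than the authors' actual pipeline code; the array shapes and "filter bank" interpretation are illustrative assumptions.

    # Sketch: overlap independent cuBLAS-backed matrix-vector products
    # on separate CUDA streams (illustrative, not the MSD pipeline code).
    import cupy as cp

    n = 4096
    banks = cp.random.rand(3, n, n, dtype=cp.float32)   # hypothetical filter banks
    frame = cp.random.rand(n, dtype=cp.float32)         # hypothetical detector frame

    streams = [cp.cuda.Stream(non_blocking=True) for _ in range(3)]
    outputs = []
    for bank, stream in zip(banks, streams):
        with stream:                       # work launched here runs asynchronously
            outputs.append(bank @ frame)   # matrix-vector product via cuBLAS
    for stream in streams:
        stream.synchronize()               # wait for all three products to finish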

Applications of Matrix Computational Methods in the Analysis of Modern Data (MATRIX) Session 2

Time and Date: 14:30 - 16:10 on 6th June 2016

Room: Rousseau Center

Chair: Kourosh Modarresi

467 Algorithmic Approach for Learning a Comprehensive View of Online Users [abstract]
Abstract: Online users may use many different channels, devices and venues in any online experience. To make all services, such as web design, ads, web content, and shopping, personalized for every user, we need to be able to recognize users regardless of the device, channel or venue they are using. This, in turn, requires building a comprehensive view of the user that includes all of their behavioral characteristics, which are spread across these different venues. That is not possible without all of the user's behavior-related data, which in turn requires the capacity to connect the user across devices and channels so that all of their behavior falls under a single view. This work is a major attempt to do so using only users' behavioral data while protecting their privacy.
Kourosh Modarresi
473 Recommendation System Based on Complete Personalization [abstract]
Abstract: Current recommender systems are very inefficient. Many metrics are used to measure the effectiveness of recommender systems, often including "conversion rate" and "click-through rate". Recently, these rates have been in the low single digits (less than 10%). In other words, more than 90% of the time, the model underlying the targeting system produces noise. The premise of this work is that the main cause of such unsatisfactory outcomes is the modeling itself. Much of the modeling problem can be traced to treating users and items as members of clusters (segments). In this work, we consider full personalization of recommendation systems, aiming to personalize for users and items simultaneously.
Kourosh Modarresi
520 Learning Vector-Space Representations of Items for Recommendations using Word Embedding Models [abstract]
Abstract: We present a method of generating item recommendations by learning item feature vector embeddings. Our work is analogous to approaches like Word2Vec or GloVe that are used to learn good vector representations of words in a natural language corpus. We treat the items that a user interacted with as analogous to words, and the sequence of items interacted with in a session as a sentence. The embedding produces semantically related clusters, and the resulting item vectors can be used to compute item similarity, which can in turn drive product recommendations. Our method also allows the feature vectors to be used in other machine learning systems. We validate our method on the MovieLens dataset.
Balaji Krishnamurthy, Nikaash Puri
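A minimal sketch of the item2vec-style idea under stated assumptions: gensim's Word2Vec trained on sessions of item IDs, with the IDs, session data, and hyperparameters all illustrative rather than taken from the paper.

    # Sketch: learn item embeddings by treating user sessions as "sentences"
    # of item IDs (toy data, not the paper's MovieLens setup).
    from gensim.models import Word2Vec

    sessions = [                            # hypothetical interaction sessions
        ["item_1", "item_7", "item_3"],
        ["item_7", "item_3", "item_9"],
        ["item_1", "item_9", "item_7"],
    ]

    model = Word2Vec(sessions, vector_size=32, window=3,
                     min_count=1, sg=1, epochs=50)   # skip-gram, gensim >= 4.0

    # Cosine similarity between item vectors can drive recommendations.
    print(model.wv.most_similar("item_7", topn=2))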
530 Improved Mahout Decision Tree Builders [abstract]
Abstract: The default decision tree builder in Mahout 0.9 has severe implementation problems that produce small, weak decision trees, which limits its usefulness in production situations where the features are strictly numerical. In this talk I will describe a simple, more powerful decision tree builder that systematically produces regression models with much better AUCs without sacrificing performance. The new builder also creates models of relatively compact size (about 30-50 KB in the tested data sets), compared to the large models (500 KB to 2 MB) generated by a fixed version of the original decision tree builder. I will describe the problem with the Mahout decision tree builder and its simple replacement, explain how they work, and compare model size, build times, and AUC performance on several historical data sets from Adobe Target customers in different industries, showing that the improvement is very general.
John Kucera
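For context, a minimal sketch of what a sound numeric split search looks like in a regression tree builder (variance reduction over sorted feature values); this is a generic illustration in Python, not Mahout's code or the speaker's replacement.

    # Sketch: pick the threshold on a numeric feature that minimizes the
    # size-weighted variance of the two child nodes (generic illustration).
    import numpy as np

    def best_numeric_split(x, y):
        """Return (threshold, weighted child variance) for feature x, target y."""
        order = np.argsort(x)
        x, y = x[order], y[order]
        best = (None, np.inf)
        for i in range(1, len(x)):
            if x[i] == x[i - 1]:
                continue                   # no valid threshold between equal values
            left, right = y[:i], y[i:]
            score = len(left) * left.var() + len(right) * right.var()
            if score < best[1]:
                best = ((x[i] + x[i - 1]) / 2.0, score)
        return best

    rng = np.random.default_rng(0)
    x = rng.uniform(size=200)
    y = (x > 0.6).astype(float) + rng.normal(scale=0.1, size=200)
    print(best_numeric_split(x, y))        # threshold should land near 0.6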