Applications of Matrix Computational Methods in the Analysis of Modern Data (AMCMD) Session 1

Time and Date: 10:35 - 12:15 on 12th June 2017

Room: HG E 41

Chair: Raja Velu

625 The epsilon-algorithm in matrix computations [abstract]
Abstract: The epsilon-algorithm is designed to accelerate iterative algorithms in matrix computations. In this talk, the algorithm is explained in connection with Krylov subspace methods and the matrix completion problem.
Walter Gander
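Editorial note: the abstract above does not spell out the recursion, but the classical scalar form of Wynn's epsilon algorithm is short. The sketch below is a minimal plain-Python illustration of that scalar form (assuming a real-valued sequence with no exact ties between successive column entries); it is not the matrix-computation variants discussed in the talk.

    import math

    def wynn_epsilon(seq):
        """Wynn's epsilon algorithm (scalar form) for sequence acceleration.

        Builds the epsilon table column by column:
            eps[k][n] = eps[k-2][n+1] + 1 / (eps[k-1][n+1] - eps[k-1][n]),
        with eps[-1] = 0 and eps[0] the input sequence.  Even-numbered columns
        hold the accelerated estimates of the limit.  Assumes no two successive
        entries of a column are exactly equal (zero difference would divide by 0).
        """
        prev = [0.0] * (len(seq) + 1)   # column k-2 (starts as the k = -1 column)
        curr = [float(x) for x in seq]  # column k-1 (starts as k = 0, the sequence)
        best = curr[-1]
        for k in range(1, len(seq)):
            nxt = [prev[n + 1] + 1.0 / (curr[n + 1] - curr[n])
                   for n in range(len(curr) - 1)]
            prev, curr = curr, nxt
            if k % 2 == 0:              # even columns carry the accelerated limits
                best = curr[0]
        return best

    # Partial sums of the alternating series for ln 2 converge slowly;
    # the epsilon algorithm extrapolates them to a much better estimate.
    partial_sums, s = [], 0.0
    for i in range(1, 13):
        s += (-1) ** (i + 1) / i
        partial_sums.append(s)
    print(partial_sums[-1], wynn_epsilon(partial_sums), math.log(2))
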
39 Clustering Mixed-Attribute Data using Random Walk [abstract]
Abstract: Most clustering algorithms rely in some fundamental way on a measure of either similarity or distance, either between objects themselves, or between objects and cluster centroids. When the dataset contains mixed attributes, defining a suitable measure can be problematic. This paper presents a general graph-based method for clustering mixed-attribute datasets that does not require any explicit measure of similarity or distance. Empirical results on a range of well-known datasets using a range of evaluation measures show that the method achieves performance competitive with traditional clustering algorithms that require explicit calculation of distance or similarity, as well as with more recently proposed clustering algorithms based on matrix factorization.
Andrew Skabar
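Editorial note: the paper's specific construction is not reproduced here. The toy sketch below (Python with pandas/scikit-learn; the quantile binning, walk length, and final k-means step are illustrative assumptions, not the authors' algorithm) only shows the general idea of clustering mixed-attribute data through a random walk on an object/attribute-value graph instead of an explicit distance measure.

    import numpy as np
    import pandas as pd
    from sklearn.cluster import KMeans

    def random_walk_mixed_clusters(df, n_clusters, numeric_bins=5, steps=3, seed=0):
        """Toy graph-based clustering of mixed-attribute data (illustrative only).

        Objects and attribute values are nodes of a bipartite graph; numeric
        attributes are discretized into quantile bins.  A few object->value->object
        random-walk steps give each object a visit-probability profile, and objects
        with similar profiles are grouped (here with k-means).  Assumes no missing
        values.
        """
        coded = df.copy()
        for col in df.columns:
            if np.issubdtype(df[col].dtype, np.number):
                coded[col] = pd.qcut(df[col], numeric_bins, duplicates='drop').astype(str)
            else:
                coded[col] = df[col].astype(str)
        # Object x attribute-value incidence matrix
        B = pd.get_dummies(coded).to_numpy(dtype=float)
        P_ov = B / B.sum(axis=1, keepdims=True)     # object -> value transition
        P_vo = (B / B.sum(axis=0, keepdims=True)).T # value -> object transition
        P = P_ov @ P_vo                             # one object-to-object step
        profiles = np.linalg.matrix_power(P, steps)
        return KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit_predict(profiles)

The point of such a construction is that categorical and (binned) numeric attributes enter the graph in exactly the same way, so no attribute-type-specific similarity or distance has to be defined.
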
274 Regularized Computation of Oscillatory Integrals with Stationary Points [abstract]
Abstract: The ability to calculate integrals of rapidly oscillating functions is crucial for solving many problems in optics, electrodynamics, quantum mechanics, nuclear physics, and many other areas. This article considers a method for computing oscillatory integrals by transforming the problem into the numerical solution of a system of ordinary differential equations. Using Levin's collocation method, we reduce the problem to solving a system of linear algebraic equations. When the phase function has stationary points (i.e., its derivative vanishes on the interval of integration), solving the corresponding system becomes an ill-posed problem. The regularized algorithm presented in the article provides a stable method for integrating rapidly oscillating functions in the presence of stationary points. The performance and high accuracy of the algorithm are illustrated by various examples.
Konstantin P. Lovetskiy, Leonid A. Sevastianov and Nikolai Ed. Nikolaev
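Editorial note: as background, Levin's collocation idea is to find p with p'(x) + i g'(x) p(x) = f(x), so that the integral of f(x) exp(i g(x)) over [a, b] equals p(b) exp(i g(b)) - p(a) exp(i g(a)). The NumPy sketch below uses Chebyshev collocation and a truncated-SVD least-squares solve (lstsq's rcond cutoff) as a crude stand-in for the regularization scheme developed in the paper; all parameter names (f, g, gp, n, rcond) are mine.

    import numpy as np
    from numpy.polynomial import chebyshev as C

    def levin_integral(f, g, gp, a, b, n=32, rcond=1e-10):
        """Levin-type collocation for I = int_a^b f(x) * exp(1j*g(x)) dx.

        Solve p'(x) + 1j*g'(x)*p(x) = f(x) at Chebyshev points; then
        I = p(b)*exp(1j*g(b)) - p(a)*exp(1j*g(a)).  The rcond cutoff in the
        least-squares solve discards tiny singular values, keeping the system
        solvable when g' has zeros (stationary points).
        """
        t = np.cos(np.pi * np.arange(n) / (n - 1))        # Chebyshev points in [-1, 1]
        x = 0.5 * (b - a) * t + 0.5 * (b + a)
        V = C.chebvander(t, n - 1)                        # T_k(t) values, shape (n, n)
        dV = np.stack([C.chebval(t, C.chebder(np.eye(n)[k])) for k in range(n)], axis=1)
        dV *= 2.0 / (b - a)                               # d/dx = (2/(b-a)) d/dt
        A = dV + 1j * gp(x)[:, None] * V                  # collocation matrix
        coef, *_ = np.linalg.lstsq(A, f(x).astype(complex), rcond=rcond)
        p = lambda xx: C.chebval(2.0 * (xx - a) / (b - a) - 1.0, coef)
        return p(b) * np.exp(1j * g(b)) - p(a) * np.exp(1j * g(a))

    # Example with a stationary point of the phase at x = 0:
    # int_0^1 cos(x) * exp(1j * 40 * x**2) dx
    print(levin_integral(np.cos, lambda x: 40 * x**2, lambda x: 80 * x, 0.0, 1.0))
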
536 Optimizing the SVD Bidiagonalization Process for a Batch of Small Matrices [abstract]
Abstract: A challenging class of problems arising in many GPU applications, called batched problems, involves linear algebra operations on many small-sized matrices. We designed batched BLAS (Basic Linear Algebra Subroutines) routines, and in particular the Level-2 BLAS GEMV and the Level-3 BLAS GEMM routines, to solve them. We proposed device functions and big-tile settings in our batched BLAS design. We adopted auto-tuning to optimize different instances of GEMV routines. We illustrated our batched BLAS approach to optimize batched bi-diagonalization progressively on a K40c GPU. The optimization techniques in this paper are applicable to the other two-sided factorizations as well.
Tingxing Dong, Azzam Haidar, Stanimire Tomov and Jack Dongarra
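Editorial note: the kernels themselves are GPU device code, but the batched interface is easy to picture. The NumPy lines below are only a CPU-side analogue of batched GEMM/GEMV semantics (one call operating on many independent small problems), not the GPU implementation described in the paper.

    import numpy as np

    rng = np.random.default_rng(0)
    batch, m, k, n = 1000, 16, 16, 16            # many independent small problems
    A = rng.standard_normal((batch, m, k))
    B = rng.standard_normal((batch, k, n))
    C = np.matmul(A, B)                          # "batched GEMM": one call, whole batch
    x = rng.standard_normal((batch, k))
    y = np.einsum('bik,bk->bi', A, x)            # "batched GEMV"
    print(C.shape, y.shape)                      # (1000, 16, 16) (1000, 16)
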

Applications of Matrix Computational Methods in the Analysis of Modern Data (AMCMD) Session 2

Time and Date: 15:45 - 17:25 on 12th June 2017

Room: HG E 41

Chair: Kourosh Modarresi

569 Separable Covariance Matrices and Kronecker Approximation [abstract]
Abstract: When a model structure allows the error covariance matrix to be written as the Kronecker product of two positive definite covariance matrices, the estimation of the relevant parameters is intuitive and easy to carry out. In many time series models, however, the covariance matrix does not have a separable structure. Van Loan and Pitsanis (1993) provide an approximation with Kronecker products. In this paper, we apply their method to estimate the parameters of a multivariate regression model with autoregressive errors. An illustrative example is also provided.
Raja Velu and Krzysztof Herman
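Editorial note: the Van Loan-Pitsanis approximation cited in the abstract reduces the nearest-Kronecker-product problem min ||A - B kron C||_F to a best rank-one approximation of a rearranged matrix. The NumPy sketch below is a minimal illustration of that reduction (block sizes are assumed known, as they are when B and C are covariance matrices of fixed dimensions); it is not the estimation procedure of the paper itself.

    import numpy as np

    def nearest_kronecker(A, m1, n1, m2, n2):
        """Van Loan-Pitsanis nearest Kronecker product approximation.

        Find B (m1 x n1) and C (m2 x n2) minimizing ||A - kron(B, C)||_F,
        where A is (m1*m2) x (n1*n2).  The problem reduces to the best
        rank-one approximation of a rearrangement R of A.
        """
        assert A.shape == (m1 * m2, n1 * n2)
        R = np.empty((m1 * n1, m2 * n2), dtype=A.dtype)
        for i in range(m1):
            for j in range(n1):
                block = A[i * m2:(i + 1) * m2, j * n2:(j + 1) * n2]
                R[i * n1 + j] = block.reshape(-1, order='F')   # vec of block (i, j)
        U, s, Vt = np.linalg.svd(R, full_matrices=False)
        B = np.sqrt(s[0]) * U[:, 0].reshape(m1, n1)            # row index = (i, j), row-major
        C = np.sqrt(s[0]) * Vt[0].reshape(m2, n2, order='F')
        return B, C

For a separable covariance structure, A is the full error covariance matrix and the block sizes are the dimensions of the two factor covariances; the split of the leading singular value between B and C is arbitrary up to a scalar.
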
137 Distributed Bayesian Probabilistic Matrix Factorization [abstract]
Abstract: Matrix factorization is a widely used machine learning technique, most notably in recommender systems. Despite its high prediction accuracy and its ability to avoid over-fitting the data, the Bayesian Probabilistic Matrix Factorization (BPMF) algorithm has not been widely used on large-scale data because of its prohibitive cost. In this paper, we propose a distributed, high-performance parallel implementation of BPMF using Gibbs sampling on shared and distributed architectures. We show that by using efficient load balancing based on work stealing on a single node, and asynchronous communication in the distributed version, we outperform state-of-the-art implementations.
Tom Vander Aa, Imen Chakroun and Tom Haber
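Editorial note: for context, the core of a BPMF Gibbs sweep is that each user's latent factor has a closed-form Gaussian conditional given the item factors, and these updates are independent across users, which is exactly the parallelism a distributed implementation can exploit. The NumPy sketch below shows only that generic user-factor block in serial form (item factors and the Gaussian-Wishart hyperparameters are resampled analogously); parameter names are illustrative and this is not the authors' optimized implementation.

    import numpy as np

    def sample_user_factors(R_rows, V, mu_U, Lambda_U, alpha, rng):
        """One Gibbs step for the user factors in BPMF (serial sketch).

        R_rows[i] is a list of (item_index, rating) pairs for user i; V holds the
        current item factors.  Each user's factor has a Gaussian conditional
        posterior, independent of the other users' factors, so a distributed
        implementation can partition users across workers.
        """
        d = V.shape[1]
        U = np.empty((len(R_rows), d))
        for i, rated in enumerate(R_rows):
            prec = Lambda_U.copy()                # posterior precision
            lin = Lambda_U @ mu_U                 # posterior precision * mean
            for j, r in rated:
                prec += alpha * np.outer(V[j], V[j])
                lin += alpha * r * V[j]
            cov = np.linalg.inv(prec)
            U[i] = rng.multivariate_normal(cov @ lin, cov)
        return U
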
551 Finding the Winner of a Hypothesis Test via Sample Allocation Optimization [abstract]
Abstract: Hypothesis testing, one of the most important and practical tools for comparing different distributions, has recently drawn considerable interest among practitioners. In many real-life scenarios, one wants to identify the best among many options as quickly as possible. Statistical hypothesis tests were earlier approaches to this problem and have been shown to achieve satisfactory performance when a large number of samples is available. Multi-armed bandit methods, first introduced in the mid-twentieth century, have recently shown promising performance for this task as well. Although bandit methods work well for regret minimization, they are not well suited to finding the winner of a campaign with a statistical guarantee. In this paper, we introduce a new algorithm based on sample allocation optimization that finds the winner of the campaign with statistical guarantees in the shortest time.
Kourosh Modarresi and Khashayar Khosravi
575 Efficient Hybrid Algorithms for Computing Clusters Overlap [abstract]
Abstract: Every year, marketers target different segments of the population with multitudes of advertisements. However, millions of dollars are wasted targeting similar segments with different advertisements. Furthermore, it is extremely expensive to compute the similarity between the segments because they can be very large, on the order of millions or billions of members. In this project, we aim to develop a fast algorithm that determines the similarity between large segments with a higher degree of accuracy than other known methods.
Kourosh Modarresi, Aran Nayebi and Yi Liu
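Editorial note: the abstract does not state which hybrid algorithms are used. Purely as a standard baseline for the problem it describes (estimating overlap between segments with millions or billions of members), the MinHash sketch below estimates Jaccard similarity from fixed-size signatures rather than from the full segments; the hash family and signature length are illustrative choices, and this is not the paper's method.

    import numpy as np

    P = 2_147_483_647                               # Mersenne prime 2**31 - 1

    def minhash_signature(member_ids, num_hashes=128, seed=0):
        """MinHash signature of a segment (set of integer member ids).

        Two segments' Jaccard similarity is estimated by the fraction of
        positions where their signatures agree, so huge segments never have to
        be compared element by element (in practice the per-hash minimum would
        be taken in a streaming pass over the segment).
        """
        rng = np.random.default_rng(seed)
        a = rng.integers(1, P, size=num_hashes, dtype=np.int64)
        b = rng.integers(0, P, size=num_hashes, dtype=np.int64)
        ids = np.asarray(list(member_ids), dtype=np.int64) % P
        h = (a[:, None] * ids[None, :] + b[:, None]) % P   # (num_hashes, |segment|)
        return h.min(axis=1)

    def estimated_overlap(sig_a, sig_b):
        """Estimated Jaccard similarity of the two underlying segments."""
        return float(np.mean(sig_a == sig_b))

    seg1 = range(0, 100_000)
    seg2 = range(50_000, 150_000)                   # true Jaccard similarity = 1/3
    print(estimated_overlap(minhash_signature(seg1), minhash_signature(seg2)))
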
520 Optimizing Fiber to the Premises Build-Out Plans Using Geoqueries and Spectral Graph Partitioning [abstract]
Abstract: The build-out of fiber internet connectivity to the premises (FTTP) is a large-scale, capital-intensive endeavor. We developed a method that allows us to take advantage of economies of scale by finding regions of adjacent Distribution Areas with high similarity in cost structure. The algorithm employs geographic queries and spectral clustering on weighted, large-scale adjacency graphs to find these high-adjacency regions. We present the business context, the development of a similarity measure based on build-out cost structure, the application of spectral clustering to the graphs, and post-processing steps.
Susanne Halstead, Halim Damerdji and Andrew Domenico
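Editorial note: the geoquery construction of the similarity matrix and the post-processing are specific to the paper, but the spectral step it relies on is standard. The sketch below (NumPy/scikit-learn, assuming a dense symmetric non-negative similarity matrix W between Distribution Areas) is normalized spectral clustering in the Ng-Jordan-Weiss style, not the authors' pipeline.

    import numpy as np
    from sklearn.cluster import KMeans

    def spectral_partition(W, n_regions, seed=0):
        """Group nodes of a weighted adjacency/similarity graph into regions.

        Standard normalized spectral clustering: form the symmetric normalized
        Laplacian L = I - D^{-1/2} W D^{-1/2}, embed each node with the
        eigenvectors of its smallest eigenvalues, row-normalize, and run k-means.
        """
        d = W.sum(axis=1)
        d_inv_sqrt = 1.0 / np.sqrt(np.maximum(d, 1e-12))
        L = np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
        eigvals, eigvecs = np.linalg.eigh(L)            # eigenvalues in ascending order
        embedding = eigvecs[:, :n_regions]
        embedding /= np.linalg.norm(embedding, axis=1, keepdims=True) + 1e-12
        return KMeans(n_clusters=n_regions, n_init=10,
                      random_state=seed).fit_predict(embedding)

For large graphs, scikit-learn's SpectralClustering with affinity='precomputed' performs essentially the same embedding using sparse eigensolvers.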