Applications of Matrix Computational Methods in the Analysis of Modern Data (AMCMD) Session 2

Time and Date: 15:45 - 17:25 on 12th June 2017

Room: HG E 41

Chair: Kourosh Modarresi

569	Separable Covariance Matrices and Kronecker Approximation [abstract] Abstract: Abstract: When a model structure allows for the error covariance matrix to be written in the form of the Kronecker product of two positive definite covariance matrices, the estimation of the relevant parameters is intuitive and easy to carry out. In many time series models, the covariance matrix does not have a separable structure. Van Loan and Pitsanis (1993) provide an approximation with Kronecker products.In this paper, we apply their method to estimate the parameters of a multivariate regression model withautoregressive errors. An illustrative example is also provided	Raja Velu and Krzysztof Herman
137	Distributed Bayesian Probabilistic Matrix Factorization [abstract] Abstract: Using the matrix factorization technique in machine learning is very common mainly in areas like recommender systems. Despite its high prediction accuracy and its ability to avoid over-fitting of the data, the Bayesian Probabilistic Matrix Factorization algorithm (BPMF) has not been widely used on large scale data because of the prohibitive cost. In this paper, we propose a distributed high-performance parallel implementation of the BPMF using Gibbs sampling on shared and distributed architectures. We show by using efficient load balancing using work stealing on a single node, and by using asynchronous communication in the distributed version we beat state of the art implementations.	Tom Vander Aa, Imen Chakroun and Tom Haber
551	Finding the Winner of a Hypothesis Test via Sample Allocation Optimization [abstract] Abstract: Hypothesis testing, which is one of the most important and practical tools for comparing different distributions, has been recently drawn a huge interest among the practitioners. In many real-life scenarios, among many different options, one wants to find the best option in the quickest time. Statistical hypothesis tests were previous approaches for addressing this problem and had shown to achieve a satisfactory performance in the presence of large number of samples. Multi-armed bandit methods which were first introduced in the mid-twenty century, has been recently shown promising performances for this task as well. Although bandit methods work well for regret minimization, they are not well-designed for finding the winner of a campaign with some statistical guarantee. In this paper, we will introduce a new algorithm based on the sample allocation optimization, which finds the winner of the campaign with statistical guarantees, in the quickest time.	Kourosh Modarresi and Khashayar Khosravi
575	Efficient Hybrid Algorithms for Computing Clusters Overlap [abstract] Abstract: Every year, marketers target different segments of the population with multitudes of advertisements. However, millions of dollars are wasted targeting similar segments with different advertisements. Furthermore, it is extremely expensive to compute the similarity between the segments because these segments can be very large, on the order of millions and billions. In this project, we aim to come up with a fast algorithm that can determine the similarity between large segments with a higher degree of accuracy than other known methods.	Kourosh Modarresi, Aran Nayebi and Yi Liu
520	Optimizing Fiber to the Premises Build-Out Plans Using Geoqueries and Spectral Graph Paritioning [abstract] Abstract: The build-out of fiber internet connectivity to the premises (FTTP) is a large scale, capital intensive endeavor. We developed a method that allows us to take advantage of economies of scale by finding regions of adjacent Distribution Areas with high similarity in cost structure. The algorithm employs geographic queries and spectral clustering on weighted large scale adjacency graphs to find these high adjacency regions. We present the business context, development of a similarity measure based on build out cost structure, the application of spectral clustering on the graphs, and post processing steps.	Susanne Halstead, Halim Damerdji and Andrew Domenico