ICCS 2017 Main Track (MT) Session 2

Time and Date: 15:45 - 17:25 on 12th June 2017

Room: HG F 30

Chair: Anna-Lena Lamprecht

341 Resolving Entity Morphs based on Character-Word Embedding [abstract]
Abstract: Morph is a special type of fake alternative names. Internet users use morphs to achieve certain goals such as expressing special sentiment or avoiding censorship. For example, Chinese internet users often replace “马景涛” (Ma Jingtao) with “咆哮教主”(Roar Bishop). “咆哮教主”(Roar Bishop) is a morph and “马景涛” (Ma Jingtao) is the target entity of “咆哮教主”(Roar Bishop). This paper mainly focuses on morph resolution: given a morph, figure out the entity that it really refers to. After analysis the common characteristic of morphs and target entities from cross-source corpora, we exploit temporal and semantic constraints to collect target candidates. Next, we propose a character-word embeddings framework to rank target candidates. Our method does not need any human-annotated data. Experimental results demonstrate that our approach outperforms the state-of-the-art method. The results also show that the performance is better when morphs share any character with target entities.
Ying Sha, Zhenhui Shi, Rui Li, Qi Liang, Bin Wang and Li Guo
273 Graph Ranking Guarantees for Numerical Approximations to Katz Centrality [abstract]
Abstract: Graphs and networks are prevalent in modeling relational datasets from many fields of research. Using iterative solvers to approximate graph measures (specifically Katz Centrality) allows us to obtain a ranking vector, consisting of a number for each vertex in the graph identifying its relative importance. We use the residual to accurately estimate how much of the ranking from an approximate solution matches the ranking given by the exact solution. Using probabilistic matrix norms and applying numerical analysis to the computation of Katz Centrality, we obtain bounds on the accuracy of the approximation compared to the exact solution with respect to the highly ranked nodes. This relates the numerical accuracy of the linear solver to the data analysis accuracy of finding the correct ranking. In particular, we answer the question of which pairwise rankings are reliable given an approximate solution to the linear system. Experiments on many real-world networks up to several million vertices and several hundred million edges validate our theory and show that we are able to accurately estimate large portions of the approximation. By analyzing convergence error, we develop confidence in the ranking schemes of data mining.
Eisha Nathan, Geoffrey Sanders, James Fairbanks, Van Henson and David Bader
447 Simulating a Search Engine Service focusing on Network Performance [abstract]
Abstract: Large-scale computer systems like Search Engines provide services to thousands of users, and their user demand can change suddenly. This unstable demand impacts sensitively to the service components (like network and hosts). The system should be able to address unexpected scenarios; otherwise, users would be forced to leave the service. Creating tests scenarios is an alternative to deal with this variable workload before implementing new configuration in the system. However, the complexity and size of the system are a huge constraint to create physical models. Simulation can help to test promising models of search engines. In this paper we propose a method to model a Search Engine Service (SES) on small scale to analyze the impact of different configurations. We model the interaction of a typical search engine with three main components: a Front Service (FS), a Cache Service (CS) and an Index service (IS). The FS takes as input a query of user and search into a database with the support of a CS to improve the performance of the system. The proposed model processes a trace file from a real SES and based on the dependency relation among the messages, services and queries, it is modeled the full functionality of the SES. The output is, on one hand a simulated trace file to compare the model with the real system and on the other hand statistics about performance. The simulation allow us to test configurations of FS, CS, and IS, which can be unlikely in the real system.
Joe Carrión, Daniel Franco and Emilo Luque
245 Fully-Dynamic Graph Algorithms with Sublinear Time Inspired by Distributed Computing [abstract]
Abstract: We study dynamic graphs in the fully-dynamic centralized setting. In this setting the vertex set of size n of a graph G is fixed, and the edge set changes step-by-step, such that each step either adds or removes an edge. The goal in this setting is maintaining a solution to a certain problem (e.g., maximal matching, edge coloring) after each step, such that each step is executed efficiently. The running time of a step is called update-time. One can think of this setting as a dynamic network that is monitored by a central processor that is responsible for maintaining the solution. Currently, for several central problems, the best-known deterministic algorithms for general graphs are the naive ones which have update-time O(n). This is the case for maximal matching and proper O(Delta)-edge-coloring. The question of existence of sublinear in n update-time deterministic algorithms for dense graphs and general graphs remained wide open. In this paper we address this question by devising sublinear update-time deterministic algorithms for maximal matching in graphs with bounded neighborhood independence o(n/ log^2 n), and for proper O(Delta)-edge-coloring in general graphs. The family of graphs with bounded neighborhood independence is a very wide family of dense graphs. In particular, graphs with constant neighborhood independence include line-graphs, claw-free graphs, unit disk graphs, and many other graphs. Thus, these graphs represent very well various types of networks. For graphs with constant neighborhood independence, our maximal matching algorithm has ~O(\sqrt n) update-time. Our O(Delta)-edge-coloring algorithms has ~O(\sqrt Delta) update-time for general graphs. In order to obtain our results we employ a novel approach that adapts certain distributed algorithms of the LOCAL setting to the centralized fully-dynamic setting. This is achieved by optimizing the work each processors performs, and efficiently simulating a distributed algorithm in a centralized setting. The simulation is efficient thanks to a careful selection of the network parts that the algorithm is invoked on, and by deducing the solution from the additional information that is present in the centralized setting, but not in the distributed one. Our experiments on various network topologies and scenarios demonstrate that our algorithms are highly-efficient in practice. We believe that our approach is of independent interest and may be applicable to additional problems.
Leonid Barenboim and Tzalik Maimon