## Session 1: 11:20 - 13:00 on 10th June 2014

### Main Track (MT) Session 8

#### Chair: Michela Taufer

115 The influence of network topology on reverse-engineering of gene-regulatory networks
Abstract: Modeling and simulation of gene-regulatory networks (GRNs) has become an important aspect of modern computational biology investigations into gene regulation. A key challenge in this area is the automated inference (reverse-engineering) of dynamic, mechanistic GRN models from gene-expression time-course data. Common mathematical formalisms used to represent such models capture both the relative weight or strength of a regulator gene and the type of the regulator (activator, repressor) with a single model parameter. The goal of this study is to quantify the role this parameter plays in the computational performance of the reverse-engineering process and the predictive power of the inferred GRN models. We carried out three sets of computational experiments on a GRN system consisting of 22 genes. While more comprehensive studies of this kind are ultimately required, this computational study demonstrates that models with similar training (reverse-engineering) error, inferred under varying degrees of a priori known topology information, exhibit considerably different predictive performance. The study was performed with a newly developed multiscale modeling and simulation tool called MultiGrain/MAPPER.
Alexandru Mizeranschi, Noel Kennedy, Paul Thompson, Huiru Zheng, Werner Dubitzky

188 Maximizing the Cumulative Influence through a Social Network when Repeat Activation Exists
Abstract: We study the problem of employing social networks to propagate influence when repeat activation is involved. While influence maximization has been extensively studied as the fundamental solution, it neglects the reality that a user may purchase a product or service repeatedly, incurring cumulative sales of that product or service.
In this paper, we explore a new problem of cumulative influence maximization that brings influence maximization a step closer to real-world viral-marketing applications. In our setting, repeat activation exists and we aim to find a set of initial users through which the maximal cumulative influence can be stimulated in a given time period. To describe repeat-activation behavior, we adopt the voter model to reflect the variation of activations over time. Under the voter model, we formulate the maximization problem and present an effective algorithm. We test and compare our method with heuristic algorithms on real-world data sets. Experimental results demonstrate the utility of the proposed method.
Chuan Zhou, Peng Zhang, Wenyu Zang, Li Guo

320 Mining Large-scale Knowledge about Events from Web Text
Abstract: This paper addresses the problem of automatic acquisition of semantic relations between events. Since most previous research relies on annotated corpora, the main challenge is the need for more generic methods to identify related event pairs and to extract event arguments (particularly the predicate, subject and object). Motivated by this background, we develop a three-phase approach that acquires causality from the Web. First, we use explicit connective markers (such as "because") as linguistic cues to discover causally related events. Then, we extract the event arguments based on local dependency parse trees of the event expressions. In the final phase, we propose a statistical model to measure the potential causal relations. Results of our empirical evaluation on a large-scale Chinese Web corpus show that (a) the use of local dependency trees considerably improves both the accuracy and the recall of the event-argument extraction task, and (b) our measure, an improvement on pointwise mutual information (PMI), performs better.
Yanan Cao, Peng Zhang, Jing Guo, Li Guo

200 Discovering Multiple Diffusion Source Nodes in Social Networks
Abstract: Social networks have greatly amplified the spread of information across different communities. However, various kinds of malicious information, such as computer viruses and rumors, are also broadly spread via social networks. To restrict such malicious information, it is critical to develop effective methods for discovering the diffusion source nodes in social networks. Many pioneering works have explored the source-node identification problem, but they are all based on the ideal assumption that there is only a single source node, neglecting the fact that malicious information is often diffused from multiple sources to intentionally evade network audit. In this paper, we present a multi-source locating method based on a given snapshot of partially and sparsely observed infected nodes in the network. Specifically, we first present a reverse-propagation method to detect recovered and unobserved infected nodes in the network, and then we use community-clustering algorithms to turn the multi-source locating problem into a set of single-source locating problems. In the last step, we identify the nodes with the largest likelihood estimates as the source nodes of the infected clusters. Experiments on three different types of complex networks demonstrate the performance of the proposed method.
Wenyu Zang, Peng Zhang, Chuan Zhou, Li Guo

293 The Origin of Control in Complex Networks
Abstract: Recent work at the border of network science and control theory has begun to investigate the control of complex systems by studying their underlying network representations. A majority of the work in this nascent field has looked at the number of controls required to fully control a network.
In this talk I will present research that provides a ready breakdown of this number into categories that are both easy to observe in real-world networks and instructive in understanding the underlying functional reasons why the controls must exist. This breakdown sheds light on several observations made in the previous literature regarding the controllability of networks, and it yields a mechanism for clustering networks into classes that are consistent with their large-scale architecture and purpose. Finally, we observe that synthetic models of network formation generate networks with control breakdowns substantially different from those observed in real-world networks.
Justin Ruths
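The voter-model formulation of cumulative influence in entry 188 lends itself to a simple Monte-Carlo illustration. The sketch below is not the authors' algorithm; it only estimates the cumulative influence of a given seed set under the standard voter model (each node repeatedly copies the state of a uniformly random neighbor), and all function and variable names are hypothetical:

```python
import random

def voter_step(state, neighbors, rng):
    """One synchronous voter-model step: each node copies the state
    of a uniformly chosen neighbor (isolated nodes keep their state)."""
    return {node: (state[rng.choice(nbrs)] if nbrs else state[node])
            for node, nbrs in neighbors.items()}

def cumulative_influence(neighbors, seeds, horizon, trials=200, seed=0):
    """Monte-Carlo estimate of cumulative influence: the expected total
    number of active node-steps within `horizon` voter-model steps."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        state = {v: (v in seeds) for v in neighbors}
        for _ in range(horizon):
            state = voter_step(state, neighbors, rng)
            total += sum(state.values())  # count currently active nodes
    return total / trials

# Tiny example: a path graph 0-1-2, seeded at the middle node.
path = {0: [1], 1: [0, 2], 2: [1]}
print(cumulative_influence(path, {1}, horizon=3))
```

Maximizing this quantity over seed sets of a fixed size is then the combinatorial problem the paper addresses.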
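The causality measure of entry 320 is described as an improvement on PMI. As background, here is a minimal sketch of plain pointwise mutual information computed from event co-occurrence counts; the counts in the example are invented purely for illustration:

```python
from math import log

def pmi(pair_count, x_count, y_count, total):
    """Pointwise mutual information between events x and y, estimated
    from raw counts: PMI(x, y) = log( P(x, y) / (P(x) * P(y)) )."""
    p_xy = pair_count / total
    p_x = x_count / total
    p_y = y_count / total
    return log(p_xy / (p_x * p_y))

# Hypothetical corpus statistics: two events co-occur in 20 of 1000
# sentences, and each occurs in 50 sentences overall.
print(round(pmi(20, 50, 50, 1000), 3))  # → 2.079
```

A positive PMI indicates the event pair co-occurs more often than independence would predict, which is the signal the paper's refined measure builds on.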

### Agent Based Simulations, Adaptive Algorithms and Solvers (ABS-AA-S) Session 1

#### Chair: Maciej Paszynski

130 PetIGA: High-Performance Isogeometric Analysis of Phase-Field Models
Abstract: We have developed fast implementations of B-spline/NURBS-based finite element solvers, written using PETSc. PETSc is frequently used in software packages to leverage its optimized and parallel implementation of solvers; we also use PETSc data structures to assemble the linear systems. These structures (called DAs) were originally intended for the parallel assembly of linear systems resulting from finite differences. We have reworked this structure for linear systems resulting from isogeometric analysis based on tensor-product spline spaces. The result is the PetIGA framework for solving problems using isogeometric analysis, which is scalable and greatly simplified over previous solvers. Our infrastructure has also allowed us to develop scalable solvers for a variety of problems. We have chosen to pursue nonlinear time-dependent problems [1, 2], such as:

- Cahn-Hilliard
- Navier-Stokes-Korteweg
- Variational multiscale for Navier-Stokes
- Diffusive wave approximation to the shallow water equations
- The phase-field crystal (PFC) equation and its time integration
- Divergence-conforming B-spline modeling of nanoparticle suspensions

We also have solvers for an assortment of linear problems: Poisson, elasticity, Helmholtz, thin shells, advection-diffusion, and diffusion-reaction. All solvers are written to be inherently parallel and run on anything from a laptop to a supercomputer such as Shaheen, KAUST's IBM Blue Gene/P. In this presentation we will focus on new time-integration techniques for phase-field modeling which are energy stable and allow for stable linearizations of the underlying nonlinear model [3].

[1] N. Collier, L. Dalcin, and V.M. Calo, "PetIGA: High-Performance Isogeometric Analysis," submitted, 2013.
[2] L. Dalcin and N. Collier, "PetIGA: A framework for high performance Isogeometric Analysis," https://bitbucket.org/dalcinl/petiga/, 2013.
[3] P. Vignal, L. Dalcin, D.L. Brown, N. Collier, and V.M. Calo, "Energy-stable time-discretizations for the phase-field crystal equation," in preparation, 2014.

Victor Calo, Nathan Collier, Lisandro Dalcin, Philippe Vignal

44 Graph grammar based multi-thread multi-frontal direct solver with Galois scheduler
Abstract: In this paper, we present a multi-frontal solver algorithm for the adaptive finite element method, expressed by graph-grammar productions. The graph-grammar productions first construct the binary elimination tree and then process frontal matrices stored in a distributed manner in the nodes of the elimination tree. The solver is specialized for a class of one-, two- and three-dimensional h-refined meshes whose elimination trees have a regular structure. In particular, this class contains all one-dimensional grids, two- and three-dimensional grids refined towards point singularities, two-dimensional grids refined anisotropically towards an edge singularity, and three-dimensional grids refined anisotropically towards edge or face singularities. In all these cases, the structure of the elimination tree and the structure of the frontal matrices are similar. The solver is implemented within the Galois environment, which allows parallel execution of graph-grammar productions.
We also compare the performance of the Galois implementation of our graph-grammar-based solver with the MUMPS solver.
Damian Goik, Konrad Jopek, Maciej Paszynski, Andrew Lenharth, Donald Nguyen, Keshav Pingali

154 Automatically Adapted Perfectly Matched Layers for Problems with High Contrast Materials Properties
Abstract: For the simulation of wave-propagation problems, it is necessary to truncate the computational domain. Perfectly Matched Layers (PMLs) are often employed for that purpose, especially in high-contrast layered materials, where absorbing boundary conditions are difficult to design. Here, we define a Perfectly Matched Layer that automatically adjusts its parameters without any user interaction; the user only has to indicate the desired decay in the surrounding layer. With this Perfectly Matched Layer, we show that even in the most complex scenarios, where the material contrast is as high as sixteen orders of magnitude, we do not introduce numerical reflections when truncating the domain, and thus obtain accurate solutions.
Julen Alvarez-Aramberri, David Pardo, Helene Barucq, Elisabete Alberdi Celaya

127 A Linear Complexity Direct Solver for H-adaptive Grids With Point Singularities
Abstract: In this paper we present a theoretical proof of linear computational cost and complexity for a recently developed direct solver driven by hypergraph-grammar productions. The solver is specialized for computational meshes with point singularities in two and three dimensions; linear complexity is achieved by exploiting the special structure of such grids. We describe the algorithm and estimate the exact computational cost on the example of a two-dimensional mesh containing a point singularity, and then extend this reasoning to three-dimensional meshes. Numerical results fully support our theoretical estimates.
Piotr Gurgul

436 Towards a new software tool for conservation planning
Abstract: In a dynamic world, the process of prioritizing where to invest limited conservation resources is extremely complex. It needs to incorporate information on features (species or landforms), planning units, ongoing or predicted future threats, and the costs and effectiveness of potential conservation actions. Extensive research has been conducted on spatial and temporal conservation prioritization using software tools such as Marxan, C-Plan, and Zonation to aid managers in their decision-making. However, these tools are limited in various ways in addressing the full complexity of day-to-day management decisions. Some tools fail to consider variation in: land values in space and time; multiple threats and their spatio-temporal variation; multiple conservation actions applied to individual areas; the feasibility, effectiveness, and varying costs of actions; and the dynamic nature of biodiversity responses in space and time. Optimizing such a multi-dimensional system is a major challenge in complexity mathematics. What is needed is a new software tool that builds on current approaches but allows for the more realistic scenarios described above, developed and parameterised in close collaboration with managers. This includes the modification of existing tools and the creation of new algorithms. The new software will be trialled in conservation-planning exercises for islands in north-western Western Australia and the Great Barrier Reef. Current approaches mostly exploit simulated annealing, which has proven fast and sufficiently effective for problems that do not require an optimal solution. The new software, however, is intended to include sub-models of threats, costs, and the contribution of actions on individual islands. We are examining the option of using constraint programming to incorporate these sub-models into the decision process with the desired time resolution.
Jana Brotankova, Bob Pressey, Ian Craigie, Steve Hall, Amelia Wenger
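Entry 436 notes that current conservation-planning tools mostly rely on simulated annealing. Purely as an illustration of that baseline technique (not the proposed software or any tool's actual algorithm), here is a minimal annealing sketch over subsets of planning units, with toy data and hypothetical names throughout; the objective charges the cost of selected units plus a penalty for each unmet feature target:

```python
import math
import random

def objective(selected, costs, features, targets, penalty=100.0):
    """Total cost of selected planning units plus a penalty for each
    feature whose representation target is not yet met."""
    cost = sum(costs[u] for u in selected)
    shortfall = 0
    for f, target in targets.items():
        covered = sum(1 for u in selected if f in features[u])
        shortfall += max(0, target - covered)
    return cost + penalty * shortfall

def anneal(units, costs, features, targets, steps=5000, t0=10.0, seed=1):
    """Simulated annealing over unit subsets: toggle one unit at a time,
    always accept improvements, and accept worsening moves with
    probability exp(-delta / temperature) under a linear cooling schedule."""
    rng = random.Random(seed)
    current, best = set(), set()
    for step in range(steps):
        temp = t0 * (1 - step / steps) + 1e-9
        candidate = set(current)
        candidate.symmetric_difference_update({rng.choice(units)})
        delta = (objective(candidate, costs, features, targets)
                 - objective(current, costs, features, targets))
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current = candidate
        if (objective(current, costs, features, targets)
                < objective(best, costs, features, targets)):
            best = set(current)
    return best

# Toy instance: three islands, two species, each species needs one island.
units = ["A", "B", "C"]
costs = {"A": 1.0, "B": 5.0, "C": 1.5}
features = {"A": {"sp1"}, "B": {"sp1", "sp2"}, "C": {"sp2"}}
targets = {"sp1": 1, "sp2": 1}
print(sorted(anneal(units, costs, features, targets)))
```

The sub-models the abstract proposes (threats, costs, per-island action effects) would enter through a richer objective function than this toy one.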

### 7th Workshop on Biomedical and Bioinformatics Challenges for Computer Science (BBC) Session 1

#### Chair: Giuseppe A. Trunfio

294 Mining Association Rules from Gene Ontology and Protein Networks: Promises and Challenges
Abstract: The accumulation of raw experimental data about genes and proteins has been accompanied by the accumulation of functional information organized and stored in knowledge bases and ontologies. The assembly, organization and analysis of this data has given a considerable impulse to research. Biological knowledge is usually encoded using annotation terms, i.e. terms describing, for instance, the function or localization of genes and proteins. Such annotations are often organized into ontologies, which offer a formal framework for organizing biological knowledge in a systematic way. For instance, Gene Ontology (GO) provides a set of annotations (GO terms) of biological aspects, so that for each biological concept, i.e. gene or protein, a list of annotating terms is available. Each annotation may be derived using different methods, and an Evidence Code (EC) records this derivation process; for instance, electronically inferred annotations are distinguished from manual ones. Mining annotation data may thus extract biologically meaningful knowledge. For instance, analysing these annotated data with association rules may reveal co-occurring annotations, helping, for example, with the classification of proteins starting from their annotations. Nevertheless, frequent-itemset mining is less popular than other techniques, such as statistics-based methods or semantic similarities. Here we give a short survey of these methods, discussing possible future directions of research. In particular, we considered the impact of the nature of the annotation on association-rule performance by discussing two case studies, on protein complexes and protein families.
As this preliminary study shows, the presence of electronic annotations does not have a positive impact on the quality of the association rules, suggesting the opportunity to introduce novel algorithms that are aware of evidence codes.
Pietro Hiram Guzzi, Marianna Milano, Mario Cannataro

53 Automated Microalgae Image Classification
Abstract: In this paper we present a new method for automated recognition of the 12 microalgae most commonly found in the water resources of Thailand. To handle difficulties encountered in our problem, such as unclear algae boundaries and noisy backgrounds, we propose a new method for segmenting algae bodies from the image background and a new method for computing texture descriptors from a blurry textured object. A feature-combination approach is applied to handle the variation in algae shapes within the same genus. Sequential Minimal Optimization (SMO) is used as the classifier. An experimental result of 97.22% classification accuracy demonstrates the effectiveness of our proposed method.
Sansoen Promdaen, Pakaket Wattuya, Nuttha Sanevas

192 A Clustering Based Method Accelerating Gene Regulatory Network Reconstruction
Abstract: One important direction of systems biology is the inference of gene regulatory networks, and although many methods have been developed recently, they cannot be applied effectively to full-scale data. In this work we propose a framework based on clustering to handle the large dimensionality of the data, aiming to improve the accuracy of the inferred network while reducing time complexity. We explored the efficiency of this framework employing the newly proposed metric Maximal Information Coefficient (MIC), which showed superior performance in comparison to other well-established methods. Using both benchmark and real-life datasets, we show that our method delivers accurate results in a fraction of the time required by other state-of-the-art methods.
Our method outputs interactions among groups of highly correlated genes, which, in an application to an aging experiment, were able to reveal aging-related pathways.
Georgios Dimitrakopoulos, Ioannis Maraziotis, Kyriakos Sgarbas, Anastasios Bezerianos

208 Large Scale Read Classification for Next Generation Sequencing
Abstract: Next Generation Sequencing (NGS) has revolutionised molecular biology, resulting in an explosion of data sets and a pressing need for rapid identification as a prelude to annotation and further analysis. NGS data consist of a substantial number of short sequence reads, given context through downstream assembly and annotation, a process requiring reads consistent with the assumed species or species group. Highly accurate results have been obtained for restricted sets using SVM classifiers, but such methods are difficult to parallelise and their success depends on significant attention to feature selection. This work examines the problem at very large scale, using a mix of synthetic and real data, with a view to determining the overall structure of the problem and the effectiveness of parallel ensembles of simpler classifiers (principally random forests) in addressing the challenges of large-scale genomics.
James Hogan, Timothy Peut
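Entry 208 classifies short reads with ensembles of simple classifiers over selected features. A common alignment-free representation for such classifiers is the k-mer frequency vector; the sketch below shows only this generic feature-extraction step, not the authors' pipeline, and the read and function names are illustrative:

```python
from collections import Counter
from itertools import product

def kmer_features(read, k=3):
    """Normalised k-mer frequency vector for a single DNA read, in a
    fixed lexicographic order over all 4^k possible k-mers."""
    alphabet = "ACGT"
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    # Slide a window of width k across the read and count occurrences.
    counts = Counter(read[i:i + k] for i in range(len(read) - k + 1))
    total = max(1, len(read) - k + 1)
    return [counts[km] / total for km in kmers]

# A 6-base read yields 4 overlapping 3-mers, so the 64-entry vector sums to 1.
vec = kmer_features("ACGTAC", k=3)
print(len(vec), round(sum(vec), 2))  # → 64 1.0
```

Fixed-length vectors like these are what an ensemble such as a random forest would consume, one vector per read.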
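Entry 294 above surveys association-rule mining over GO annotations. As a minimal illustration of the support/confidence machinery it discusses (not the authors' method), here is a sketch that mines pairwise rules from per-protein annotation sets; the GO identifiers and protein data are hypothetical:

```python
from collections import Counter
from itertools import combinations

def association_rules(annotations, min_support=0.5, min_confidence=0.8):
    """Mine pairwise rules `head -> body` from annotation sets:
    support = fraction of items annotated with both terms,
    confidence = support(head, body) / support(head)."""
    n = len(annotations)
    single, pair = Counter(), Counter()
    for terms in annotations:
        single.update(terms)
        for a, b in combinations(sorted(terms), 2):
            pair[(a, b)] += 1
    rules = []
    for (a, b), c in pair.items():
        support = c / n
        if support < min_support:
            continue
        for head, body in ((a, b), (b, a)):
            confidence = c / single[head]
            if confidence >= min_confidence:
                rules.append((head, body, support, confidence))
    return rules

# Hypothetical annotation sets for four proteins (GO IDs are illustrative).
proteins = [
    {"GO:0005634", "GO:0003677"},  # e.g. nucleus, DNA binding
    {"GO:0005634", "GO:0003677"},
    {"GO:0005634", "GO:0003677"},
    {"GO:0005634"},
]
for head, body, s, c in association_rules(proteins):
    print(f"{head} -> {body}  support={s:.2f} confidence={c:.2f}")
```

An evidence-code-aware variant, as the abstract suggests, could weight or filter each annotation by its EC before counting.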