7th Workshop on Biomedical and Bioinformatics Challenges for Computer Science (BBC) Session 1

Time and Date: 11:20 - 13:00 on 10th June 2014

Room: Bluewater I

Chair: Giuseppe A. Trunfio

294 Mining Association Rules from Gene Ontology and Protein Networks: Promises and Challenges.
Abstract: The accumulation of raw experimental data about genes and proteins has been accompanied by the accumulation of functional information organized and stored in knowledge bases and ontologies. The assembly, organization and analysis of these data have given considerable impetus to research. Biological knowledge is usually encoded using annotation terms, i.e. terms describing, for instance, the function or localization of genes and proteins. Such annotations are often organized into ontologies, which offer a formal framework for organizing biological knowledge systematically. For instance, Gene Ontology (GO) provides a set of annotations (namely GO Terms) of biological aspects. Consequently, for each biological concept, i.e. gene or protein, a list of annotating terms is available. Each annotation may be derived using different methods, and an Evidence Code (EC) records how it was derived. For instance, electronically inferred annotations are distinguished from manual ones. Mining annotation data may thus extract biologically meaningful knowledge. For instance, analysing these annotated data with association rules may reveal co-occurring annotations, which can help, for example, to classify proteins starting from their annotations. Nevertheless, frequent itemset mining is less popular than other techniques, such as statistics-based methods or semantic similarities. Here we give a short survey of these methods and discuss possible future directions of research. We consider in particular the impact of the nature of annotations on association rule performance by discussing two case studies on protein complexes and protein families. As this preliminary study shows, the presence of electronic annotations does not have a positive impact on the quality of association rules, suggesting the possibility of introducing novel algorithms that are aware of evidence codes.
Pietro Hiram Guzzi, Marianna Milano, Mario Cannataro
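
The abstract above describes mining association rules over GO annotations and observes that electronically inferred annotations affect rule quality. A minimal sketch of that idea follows, assuming a hypothetical toy annotation table and simple pairwise rules filtered by support and confidence. The names `ANNOTATIONS`, `transactions` and `pair_rules` and the thresholds are illustrative assumptions, not the authors' algorithm. `IEA` is the GO evidence code for electronically inferred annotations.

```python
from itertools import combinations

# Hypothetical toy data: protein -> list of (GO term, evidence code).
ANNOTATIONS = {
    "P1": [("GO:0005634", "IDA"), ("GO:0003677", "IDA"), ("GO:0006355", "IEA")],
    "P2": [("GO:0005634", "IMP"), ("GO:0003677", "IEA")],
    "P3": [("GO:0005634", "IDA"), ("GO:0003677", "IPI"), ("GO:0006355", "IDA")],
    "P4": [("GO:0005737", "IEA"), ("GO:0006355", "IEA")],
}


def transactions(annotations, exclude_codes=()):
    """Turn each protein into a set of GO terms, dropping excluded evidence codes."""
    return [
        {term for term, code in terms if code not in exclude_codes}
        for terms in annotations.values()
    ]


def pair_rules(txns, min_support=0.5, min_confidence=0.8):
    """Return (antecedent, consequent, support, confidence) rules over term pairs."""
    n = len(txns)
    items = sorted(set().union(*txns))
    count = {i: sum(i in t for t in txns) for i in items}
    rules = []
    for a, b in combinations(items, 2):
        both = sum(a in t and b in t for t in txns)
        if both / n < min_support:
            continue  # the pair is not frequent enough
        for x, y in ((a, b), (b, a)):
            confidence = both / count[x]
            if confidence >= min_confidence:
                rules.append((x, y, both / n, confidence))
    return rules


# Rules mined from all annotations versus non-electronic annotations only.
all_rules = pair_rules(transactions(ANNOTATIONS))
manual_rules = pair_rules(transactions(ANNOTATIONS, exclude_codes=("IEA",)))
```

Comparing `all_rules` with `manual_rules` shows how the rule set changes once electronic annotations are excluded. An evidence-aware miner might instead weight annotations by code rather than drop them, which is a design choice this sketch does not explore.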
53 Automated Microalgae Image Classification
Abstract: In this paper we present a new method for automated recognition of the 12 microalgae most commonly found in the water resources of Thailand. To handle difficulties such as unclear algae boundaries and noisy backgrounds, we propose a new method for segmenting algae bodies from the image background and a new method for computing texture descriptors from a blurry textured object. A feature combination approach is applied to handle the variation in algae shapes within the same genus. Sequential Minimal Optimization (SMO) is used as the classifier. An experimental classification accuracy of 97.22% demonstrates the effectiveness of the proposed method.
Sansoen Promdaen, Pakaket Wattuya, Nuttha Sanevas
192 A Clustering Based Method Accelerating Gene Regulatory Network Reconstruction
Abstract: One important direction of Systems Biology is the inference of Gene Regulatory Networks. Many methods have been developed recently, but they cannot be applied effectively to full-scale data. In this work we propose a clustering-based framework to handle the high dimensionality of the data, aiming to improve the accuracy of the inferred network while reducing time complexity. We explored the efficiency of this framework using the recently proposed Maximal Information Coefficient (MIC) metric, which showed superior performance compared with other well-established methods. Using both benchmark and real-life datasets, we showed that our method delivers accurate results in a fraction of the time required by other state-of-the-art methods. The output of our method is a set of interactions among groups of highly correlated genes, which, in an application to an aging experiment, revealed aging-related pathways.
Georgios Dimitrakopoulos, Ioannis Maraziotis, Kyriakos Sgarbas, Anastasios Bezerianos
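
The cluster-then-infer idea in the abstract above can be sketched as follows. This is only an illustration under stated assumptions: absolute Pearson correlation stands in for MIC, the greedy grouping rule is a simplification, and the expression values in `EXPR` and both thresholds are hypothetical. It is not the authors' framework.

```python
from math import sqrt

# Hypothetical toy expression profiles: gene -> values across five samples.
EXPR = {
    "g1": [1, 2, 3, 4, 5],
    "g2": [2, 4, 6, 8, 10.5],
    "g3": [5, 3, 4, 1, 2],
    "g4": [10, 6, 8.5, 2, 4],
}


def pearson(x, y):
    """Pearson correlation; used here as a stand-in for MIC (assumption)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def cluster_genes(expr, threshold=0.9):
    """Greedy grouping: a gene joins the first cluster whose seed gene it matches."""
    clusters = []  # each cluster is a list of gene names; element 0 is the seed
    for gene, profile in expr.items():
        for members in clusters:
            if abs(pearson(profile, expr[members[0]])) >= threshold:
                members.append(gene)
                break
        else:
            clusters.append([gene])
    return clusters


def cluster_profiles(expr, clusters):
    """Average the member profiles of each cluster, sample by sample."""
    return [
        [sum(vals) / len(vals) for vals in zip(*(expr[g] for g in members))]
        for members in clusters
    ]


def cluster_edges(profiles, threshold=0.7):
    """Score every pair of clusters and keep pairs above the threshold."""
    edges = []
    for i in range(len(profiles)):
        for j in range(i + 1, len(profiles)):
            score = pearson(profiles[i], profiles[j])
            if abs(score) >= threshold:
                edges.append((i, j, score))
    return edges


clusters = cluster_genes(EXPR)
edges = cluster_edges(cluster_profiles(EXPR, clusters))
```

Scoring pairs of clusters rather than pairs of genes reduces the number of comparisons from quadratic in the number of genes to quadratic in the number of clusters, which is where the time saving described in the abstract would come from.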
208 Large Scale Read Classification for Next Generation Sequencing
Abstract: Next Generation Sequencing (NGS) has revolutionised molecular biology, resulting in an explosion of data sets and a pressing need for rapid identification as a prelude to annotation and further analysis. NGS data consists of a substantial number of short sequence reads, given context through downstream assembly and annotation, a process requiring reads consistent with the assumed species or species group. Highly accurate results have been obtained for restricted sets using SVM classifiers, but such methods are difficult to parallelise and success depends on significant attention to feature selection. This work examines the problem at very large scale, using a mix of synthetic and real data with a view to determining the overall structure of the problem and the effectiveness of parallel ensembles of simpler classifiers (principally random forests) in addressing the challenges of large scale genomics.
James Hogan, Timothy Peut
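
As a rough illustration of the ensemble approach described in the abstract above, the sketch below classifies reads by k-mer frequency and combines independently trained base classifiers by majority vote. It rests on several assumptions: nearest-centroid classifiers stand in for the random forests the abstract mentions, the reads and labels in `SHARDS` are hypothetical, and `K = 2` is chosen only to keep the example small.

```python
from collections import Counter
from itertools import product

K = 2  # k-mer length; kept small for illustration (assumption)
KMERS = ["".join(p) for p in product("ACGT", repeat=K)]


def kmer_profile(read, k=K):
    """Normalised k-mer frequency vector of a read."""
    counts = Counter(read[i:i + k] for i in range(len(read) - k + 1))
    total = sum(counts[m] for m in KMERS) or 1
    return [counts[m] / total for m in KMERS]


def train_centroids(labelled_reads):
    """One simple base classifier: the mean k-mer profile per label."""
    sums, sizes = {}, Counter()
    for read, label in labelled_reads:
        acc = sums.setdefault(label, [0.0] * len(KMERS))
        for i, value in enumerate(kmer_profile(read)):
            acc[i] += value
        sizes[label] += 1
    return {lab: [v / sizes[lab] for v in acc] for lab, acc in sums.items()}


def predict(centroids, read):
    """Assign the label whose centroid is closest in squared Euclidean distance."""
    profile = kmer_profile(read)

    def distance(label):
        return sum((a - b) ** 2 for a, b in zip(profile, centroids[label]))

    return min(centroids, key=distance)


def ensemble_predict(models, read):
    """Majority vote over independently trained base classifiers."""
    votes = Counter(predict(model, read) for model in models)
    return votes.most_common(1)[0][0]


# Hypothetical shards; each could be trained on a separate worker.
SHARDS = [
    [("ATATATTAAT", "at_rich"), ("GCGCGGCCGC", "gc_rich")],
    [("TTAATATAAA", "at_rich"), ("CCGGCGCGGG", "gc_rich")],
    [("AATTATATAT", "at_rich"), ("GGCCGCGCCG", "gc_rich")],
]
models = [train_centroids(shard) for shard in SHARDS]
```

Because each base classifier sees only its own shard, training needs no communication between workers; only the final vote combines results. Calling `ensemble_predict(models, read)` on a new read returns the majority label.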