Workshop on Biomedical and Bioinformatics Challenges for Computer Science (BBC) Session 1

Time and Date: 10:35 - 12:15 on 1st June 2015

Room: V206

Chair: Mario Cannataro

759 8th Workshop on Biomedical and Bioinformatics Challenges for Computer Science - BBC2015
Abstract: This is the summary of the 8th Workshop on Biomedical and Bioinformatics Challenges for Computer Science - BBC2015
Stefano Beretta, Mario Cannataro, Riccardo Dondi
374 Robust Conclusions in Mass Spectrometry Analysis
Abstract: A central issue in biological data analysis is that uncertainty, resulting from different sources of variability, may change the effect of the events being investigated. Therefore, robustness is a fundamental aspect to be considered. Robustness refers to the ability of a process to cope well with uncertainties, but the different ways to model both the processes and the uncertainties lead to many alternative conclusions in the robustness analysis. In this paper we apply a framework that addresses such questions for mass spectrometry data. Specifically, we provide robust decisions when testing hypotheses over a case/control population of subject measurements (i.e., proteomic profiles). To this end, we formulate (i) a reference model for the observed data (i.e., graphs), (ii) a reference method to provide decisions (i.e., tests of hypotheses over graph properties) and (iii) a reference model of variability to account for sources of uncertainty (i.e., random graphs). We apply these models to a real case study, analyzing the mass spectrometry profiles of the most common type of Renal Cell Carcinoma: the Clear Cell variant.
Italo Zoppis, Riccardo Dondi, Massimiliano Borsani, Erica Gianazza, Clizia Chinello, Fulvio Magni, Giancarlo Mauri
612 Modeling of Imaging Mass Spectrometry Data and Testing by Permutation for Biomarkers Discovery in Tissues
Abstract: Exploration of tissue sections by imaging mass spectrometry reveals the abundance of different biomolecular ions in different sample spots, allowing region-specific features to be found. In this paper we present computational and statistical methods for the investigation of protein biomarkers, i.e., biological features related to the presence of different pathological states. The proposed complete processing pipeline includes data pre-processing, detection and quantification of peaks using Gaussian mixture modeling, and identification of features specific to different tissue regions by performing permutation tests. Application of the methodology detects, with high confidence, proteins/peptides with concentration levels specific to tumor area, normal epithelium, muscle or salivary gland regions.
Michal Marczyk, Grzegorz Drazek, Monika Pietrowska, Piotr Widlak, Joanna Polanska, Andrzej Polanski
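The permutation testing step mentioned in the abstract above can be illustrated with a minimal sketch. This is not the authors' pipeline: it assumes a single peak intensity measured in two tissue regions and uses the difference of means as the test statistic. The function name `permutation_p_value` and its parameters are hypothetical.

```python
import random


def permutation_p_value(group_a, group_b, n_permutations=10000, seed=0):
    """Two-sided permutation test on the difference of group means.

    Illustrative only: the statistic and the number of permutations are
    assumptions, not values taken from the paper.
    """
    rng = random.Random(seed)
    n_a = len(group_a)
    observed = abs(sum(group_a) / n_a - sum(group_b) / len(group_b))

    pooled = list(group_a) + list(group_b)
    n_b = len(pooled) - n_a
    hits = 0
    for _ in range(n_permutations):
        # Randomly reassign region labels by shuffling the pooled values.
        rng.shuffle(pooled)
        diff = abs(sum(pooled[:n_a]) / n_a - sum(pooled[n_a:]) / n_b)
        if diff >= observed:
            hits += 1

    # Add-one correction keeps the estimated p-value strictly positive.
    return (hits + 1) / (n_permutations + 1)
```

Calling `permutation_p_value(tumor_intensities, normal_intensities)` on two lists of intensities returns an estimated p-value; a small value suggests the peak abundance differs between the two regions. When many peaks are tested, a multiple-testing correction would normally follow.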
336 Fuzzy indication of reliability in metagenomics NGS data analysis
Abstract: NGS data processing in metagenomics studies has to deal with noisy data that can contain a large number of read errors which are difficult to detect and account for. This work introduces a fuzzy indicator of reliability technique to facilitate solutions to this problem. It includes modified Hamming and Levenshtein distance functions that are intended as drop-in replacements in NGS analysis procedures which rely on distances, such as phylogenetic tree construction. The distances utilise fuzzy sets of reliable bases or an equivalent fuzzy logic, potentially aggregating multiple sources of base reliability.
Milko Krachunov, Dimitar Vassilev, Maria Nisheva, Ognyan Kulev, Valeriya Simeonova, Vladimir Dimitrov
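To make the idea of a reliability-aware distance concrete, here is a minimal sketch of a weighted Hamming distance. The weighting rule is an assumption made for illustration and is not the formula from the paper: each base carries a membership value in [0, 1], and a mismatch counts only as much as the less reliable of the two bases (a fuzzy AND via `min`). The name `fuzzy_hamming` is hypothetical.

```python
def fuzzy_hamming(seq_a, seq_b, rel_a, rel_b):
    """Hamming distance where each mismatch is weighted by base reliability.

    rel_a and rel_b hold per-base reliability memberships in [0, 1].
    Assumption: a mismatch contributes min(rel_a[i], rel_b[i]).
    """
    if not (len(seq_a) == len(seq_b) == len(rel_a) == len(rel_b)):
        raise ValueError("sequences and reliability lists must share one length")

    distance = 0.0
    for base_a, base_b, r_a, r_b in zip(seq_a, seq_b, rel_a, rel_b):
        if base_a != base_b:
            # An unreliable base on either side discounts the mismatch.
            distance += min(r_a, r_b)
    return distance
```

With all reliabilities set to 1 the function reduces to the ordinary Hamming distance, which is what lets a function of this kind act as a drop-in replacement in distance-based procedures.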
559 Pairwise genome comparison workflow in the Cloud using Galaxy
Abstract: Workflows are becoming the new paradigm in bioinformatics. In general, bioinformatics problems are solved by interconnecting several small software pieces to perform complex analyses. This demands a minimum of expertise to create, enact and monitor such tool compositions. In addition, bioinformatics is immersed in big-data territory and faces major difficulties in analysing such amounts of data. We have addressed these problems by integrating a tools management platform (Galaxy) and a Cloud infrastructure, which avoids moving big datasets between different locations and allows dynamic scaling of the computing resources depending on user needs. The result is a user-friendly platform that facilitates the work of end-users while they perform their experiments, installed in a Cloud environment that includes authentication, security and big-data transfer mechanisms. To demonstrate the suitability of our approach we have integrated into the infrastructure an existing pairwise and multiple genome comparison tool, which involves managing huge datasets and high computational demands.
Óscar Torreño Tirado, Michael T. Krieger, Paul Heinzlreiter, Oswaldo Trelles
645 Iterative Reconstruction from Few-View Projections
Abstract: In the medical imaging field, iterative methods have become a hot topic of research due to their capacity to solve the reconstruction problem from a limited number of projections. This offers a way to reduce patients' radiation exposure during data acquisition. However, due to the complexity of the data, the reconstruction process is still time consuming, especially for 3D cases, even when implemented on modern computer architectures. Reconstruction time and the high radiation dose imposed on patients are two major drawbacks in computed tomography. With the aim of resolving them effectively, we adapted the Least Squares QR (LSQR) method with a soft threshold filtering technique for few-view image reconstruction and present its numerical validation. The method is implemented using the CUDA programming model and compared to the standard SART algorithm. The numerical simulations and qualitative analysis of the reconstructed images show the reliability of the presented method.
Liubov A. Flores, Vicent Vidal, Gumersindo Verdú
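The soft threshold filtering mentioned in the abstract above is commonly defined as the operator S_t(x) = sign(x) * max(|x| - t, 0). The sketch below shows only that generic operator in plain Python; how the authors combine it with LSQR and CUDA is not described here, and the function name `soft_threshold` is hypothetical.

```python
def soft_threshold(values, threshold):
    """Apply the soft-thresholding operator element-wise.

    Entries with magnitude at or below the threshold become zero;
    larger entries are shrunk toward zero by the threshold.
    """
    if threshold < 0:
        raise ValueError("threshold must be non-negative")

    result = []
    for value in values:
        magnitude = abs(value) - threshold
        if magnitude <= 0:
            result.append(0.0)          # small coefficients are suppressed
        elif value > 0:
            result.append(magnitude)    # positive entries shrink downward
        else:
            result.append(-magnitude)   # negative entries shrink upward
    return result
```

In sparsity-promoting reconstruction schemes an operator of this kind is typically applied between iterations to the image or to its transform coefficients, which is one plausible reading of how it pairs with an iterative solver.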

Workshop on Biomedical and Bioinformatics Challenges for Computer Science (BBC) Session 2

Time and Date: 14:30 - 16:10 on 1st June 2015

Room: V206

Chair: Riccardo Dondi

319 GoD: An R-Package based on Ontologies for Prioritization of Genes with respect to Diseases.
Abstract: Omics sciences are widely used to analyze diseases at a molecular level. Usually, the results of omics experiments are a large list of candidate genes, proteins or other molecules. The interpretation of results and the filtering of candidate genes or proteins selected in an experiment is a challenge in some scenarios. This problem is particularly evident in clinical scenarios in which researchers are interested in the behaviour of a few molecules related to some specific disease. The filtering requires the use of domain-specific knowledge that is often encoded into ontologies. To support this interpretation, we implemented GoD (Gene ranking based On Diseases), an algorithm that ranks a given set of genes based on ontology annotations. The algorithm orders genes by the semantic similarity computed between the annotations of each gene and those describing the selected disease. As a proof of principle, we tested our software with the Human Phenotype Ontology (HPO), Gene Ontology (GO) and Disease Ontology (DO) using semantic similarity measures. A dedicated website is available.
Mario Cannataro, Pietro Hiram Guzzi and Marianna Milano
693 Large Scale Comparative Visualisation of Regulatory Networks with TRNDiff
Abstract: The advent of Next Generation Sequencing technologies has seen explosive growth in genomic datasets, and dense coverage of related organisms, supporting study of subtle, strain-specific variations as a determinant of function. Such data collections present fresh and complex challenges for bioinformatics, those of comparing models of complex relationships across hundreds and even thousands of sequences. Transcriptional Regulatory Network (TRN) structures document the influence of regulatory proteins called Transcription Factors (TFs) on associated Target Genes (TGs). TRNs are routinely inferred from model systems or iterative search, and analysis at these scales requires simultaneous displays of multiple networks well beyond those of existing network visualisation tools [1]. In this paper we describe TRNDiff, an open source tool supporting the comparative analysis and visualization of TRNs (and similarly structured data) from many genomes, allowing rapid identification of functional variations within species. The approach is demonstrated through a small scale multiple TRN analysis of the Fur iron-uptake system of Yersinia, suggesting a number of candidate virulence factors; and through a far larger study based on integration with the RegPrecise database, a collection of hundreds of manually curated and predicted transcription factor regulons drawn from across the entire spectrum of prokaryotic organisms. The tool is presently available in stand-alone and integrated form. Information may be found at the dedicated site, which includes example data, a short tutorial and links to a working version of the stand-alone system. The integrated regulon browser is currently available at the demonstration site. Source code is freely available under a non-restrictive Apache 2.0 licence from the authors’ repository.
Xin-Yi Chua, Lawrence Buckingham, James Hogan
30 Epistatic Analysis of Clarkson Disease
Abstract: Genome Wide Association Studies (GWAS) have predominantly focused on the association between single SNPs and disease. It is probable, however, that complex diseases are due to combined effects of multiple genetic variations, as opposed to single variations. Multi-SNP interactions, known as epistatic interactions, can potentially provide information about causes of complex diseases, and build on previous GWAS looking at associations between single SNPs and phenotypes. By applying epistatic analysis methods to GWAS datasets, it is possible to identify significant epistatic interactions, and map SNPs identified to genes allowing the construction of a gene network. A large number of studies have applied graph theory techniques to analyse gene networks from microarray data sets, using graph theory metrics to identify important hub genes in these networks. In this work, we present a graph theory study of SNP and gene interaction networks constructed for a Clarkson disease GWAS, as a result of applying epistatic interaction methods to identify significant epistatic interactions. This study identifies a number of genes and SNPs with potential roles for Clarkson disease that could not be found using traditional single SNP analysis, including a number located on chromosome 5q previously identified as being of interest for capillary malformation.
Alex Upton, Oswaldo Trelles, James Perkins
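The hub-identification step described in the abstract above can be sketched with the simplest graph metric, node degree. This is an illustration under stated assumptions rather than the authors' analysis: the network is given as a list of gene pairs, one pair per significant epistatic interaction, and the gene identifiers in the usage note are hypothetical placeholders.

```python
from collections import defaultdict


def top_hubs(edges, k=3):
    """Return the k genes with the most distinct interaction partners.

    edges is an iterable of (gene_a, gene_b) pairs. Ties are broken
    alphabetically so the ranking is deterministic.
    """
    neighbours = defaultdict(set)
    for gene_a, gene_b in edges:
        if gene_a == gene_b:
            continue  # ignore self-interactions
        neighbours[gene_a].add(gene_b)
        neighbours[gene_b].add(gene_a)

    ranked = sorted(neighbours, key=lambda gene: (-len(neighbours[gene]), gene))
    return [(gene, len(neighbours[gene])) for gene in ranked[:k]]
```

For example, `top_hubs([("G1", "G2"), ("G1", "G3"), ("G1", "G4"), ("G2", "G3")], k=2)` ranks `G1` first with three partners. Other centrality measures, such as betweenness, are often used alongside degree for the same purpose.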
527 Multiple structural clustering of bromodomains of the bromo and extra terminal (BET) proteins highlights subtle differences in their structural dynamics and acetylated leucine binding pocket
Abstract: BET proteins are epigenetic readers whose deregulation results in cancer and inflammation. We show that BET proteins (BRD2, BRD3, BRD4 and BRDT) are globally similar with subtle differences in the sequences and structures of their N-terminal bromodomain. Principal component analysis and non-negative matrix factorization reveal distinct structural clusters associated with specific BET family members, experimental methods, and source organisms. Subtle variations in structural dynamics are evident in the acetylated lysine (Kac) binding pocket of BET bromodomains. Using multiple structural clustering methods, we have also identified representative structures of BET proteins, which are potentially useful for developing therapeutic agents.
Suryani Lukman, Zeyar Aung, Kelvin Sim
633 Parallel Tools for Simulating the Depolarization Block on a Neural Model
Abstract: The prototyping and development of computational codes for biological models as reliable, efficient and portable building blocks make it possible to simulate real cerebral behaviours and to validate theories and experiments. A critical issue is the tuning of a model by means of several numerical simulations with the aim of reproducing real scenarios. This requires a huge amount of computational resources to assess the impact of parameters that influence the neuronal response. In this paper, we describe how parallel tools are adopted to simulate the so-called depolarization block of a CA1 pyramidal cell of the hippocampus. Here, high performance computing techniques are adopted in order to achieve a more efficient model simulation. Finally, we analyse the performance of this neural model, investigating the scalability and benefits on multi-core and on parallel and distributed architectures.
Salvatore Cuomo, Pasquale De Michele, Ardelio Galletti, Giovanni Ponti

Workshop on Biomedical and Bioinformatics Challenges for Computer Science (BBC) Session 3

Time and Date: 16:40 - 18:20 on 1st June 2015

Room: V206

Chair: Mauro Castelli

423 Using visual analytics to support the integration of expert knowledge in the design of medical models and simulations
Abstract: Visual analytics (VA) provides an interactive way to explore vast amounts of data and find interesting patterns. This has already benefited the development of computational models, as the patterns found using VA can then become essential elements of the model. Similarly, recent advances in the use of VA for the data cleaning stage are relevant to computational modelling given the importance of having reliable data to populate and check models. In this paper, we demonstrate via case studies of medical models that VA can be very valuable at the conceptual stage, to both examine the fit of a conceptual model with the underlying data and assess possible gaps in the model. The case studies were realized using different modelling tools (e.g., system dynamics or network modelling), which emphasizes that the relevance of VA to medical modelling cuts across techniques. Finally, we discuss how the interdisciplinary nature of modelling for medical applications requires an increased support for collaboration, and we suggest several areas of research to improve the intake and experience of VA for collaborative modelling in medicine.
Philippe Giabbanelli, Piper Jackson
409 Mining Mobile Datasets to Enable the Fine-Grained Stochastic Simulation of Ebola Diffusion
Abstract: The emergence of Ebola in West Africa is of worldwide public health concern. Successful mitigation of epidemics requires coordinated, well-planned intervention strategies that are specific to the pathogen, transmission modality, population, and available resources. Modeling and simulation in the field of computational epidemiology provides predictions of expected outcomes that are used by public policy planners in setting response strategies. Developing up-to-date models of population structures, daily activities, and movement has proven challenging for developing countries due to limited governmental resources. Recent collaborations (in 2012 and 2014) with telecom providers have given public health researchers access to Big Data needed to build high-fidelity models. Researchers now have access to billions of anonymized, detailed call data records (CDR) of mobile devices for several West African countries. In addition to official census records, these CDR datasets provide insights into the actual population locations, densities, movement, travel patterns, and migration in hard-to-reach areas. These datasets allow for the construction of population, activity, and movement models. For the first time, these models provide computational support of health related decision making in these developing areas (via simulation-based studies). New models, datasets, and simulation software were produced to assist in mitigating the continuing outbreak of Ebola. Existing models of disease characteristics, propagation, and progression were updated for the current circulating strain of Ebola. The simulation process required the interactions of multi-scale models, including viral loads (at the cellular level), disease progression (at the individual person level), disease propagation (at the workplace and family level), societal changes in migration and travel movements (at the population level), and mitigating interventions (at the abstract governmental policy level). The predictive results from this system were validated against results from the CDC's high-level predictions.
Nicholas Vogel, Christopher Theisen, Jonathan Leidig, Jerry Scripps, Douglas Graham, Greg Wolffe
383 A Novel O(n) Numerical Scheme for ECG Signal Denoising
Abstract: High quality Electrocardiogram (ECG) data is very important because this signal is generally used for the analysis of heart diseases. Wearable sensors are widely adopted for physical activity monitoring and for the provision of healthcare services, but noise always degrades the quality of these signals. The paper describes a new algorithm for ECG signal denoising, applicable in the context of real-time health monitoring using mobile devices, where signal processing efficiency is a strict requirement. The proposed algorithm is computationally cheap because it belongs to the class of Infinite Impulse Response (IIR) noise reduction algorithms. The main contribution of the proposed scheme is that it removes the noise frequencies without implementing the Fast Fourier Transform, which would require the use of specially optimized libraries. It consists of only a few lines of code and hence can easily be implemented on mobile computing devices. Moreover, the scheme allows local denoising and hence real-time visualization of the denoised signal. Experiments on real datasets have been carried out in order to test the algorithm in terms of accuracy and computational cost.
Raffaele Farina, Salvatore Cuomo, Ardelio Galletti
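To show why an IIR filter runs in O(n) without an FFT, here is a minimal sketch of a generic first-order recursive low-pass filter, y[n] = alpha * x[n] + (1 - alpha) * y[n-1]. It is not the scheme proposed in the paper; the function name `iir_smooth` and the default `alpha` are assumptions chosen for illustration.

```python
def iir_smooth(samples, alpha=0.2):
    """First-order IIR low-pass filter (exponential smoothing).

    Each output needs only the current input and the previous output,
    so the cost is one pass over the signal and constant extra memory.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")

    filtered = []
    previous = None
    for sample in samples:
        if previous is None:
            previous = sample  # initialise the recursion with the first sample
        else:
            previous = alpha * sample + (1.0 - alpha) * previous
        filtered.append(previous)
    return filtered
```

Because each output depends only on values already seen, a filter of this form can process samples as they arrive, which is what makes local, real-time display of the filtered signal possible on a mobile device.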
549 Syncytial Basis for Diversity in Spike Shapes and their Propagation in Detrusor Smooth Muscle
Abstract: Syncytial tissues, such as the smooth muscle of the urinary bladder wall, are known to produce action potentials (spikes) with marked differences in their shapes and sizes. The need for this diversity is currently unknown, nor is its origin understood. The small size of the cells, their syncytial arrangement, and the complex nature of innervation pose significant challenges for the experimental investigation of such tissues. To obtain better insight, we present here a three-dimensional electrical model of smooth muscle syncytium, developed using the compartmental modeling technique, with each cell possessing active channel mechanisms capable of producing an action potential. This enables investigation of the syncytial effect on action potential shapes and their propagation. We show how a single spike shape could undergo modulation, resulting in diverse shapes, owing to the syncytial nature of the tissue. Differences in the action potential features could impact their capacity to propagate through a syncytium. This is illustrated through comparison of two distinct action potential mechanisms. A better understanding of the origin of the various spike shapes would have significant implications in pathology, assisting in evaluating the underlying cause and directing treatment.
Shailesh Appukuttan, Keith Brain, Rohit Manchanda
200 The Potential of Machine Learning for Epileptic Seizures Prediction
Abstract: Epilepsy is one of the most common neurological diseases, affecting about 1% of the world population, of all ages, genders and origins. About one third of epileptic patients cannot be treated by medication or surgery: they suffer from refractory epilepsy and must live with their seizures throughout their lives. A seizure can happen anytime, anywhere, imposing severe constraints on the professional and social lives of these patients. The development of transportable and comfortable devices, able to capture a sufficient number of EEG scalp channels, to digitally process the signal, to extract appropriate features from the raw EEG signals, and to pass these features to machine learning classifiers, is an important objective that a large research community is pursuing worldwide. The classifiers must detect the pre-ictal time (some minutes before the seizure). In this presentation the problem is presented, solutions are proposed, and results are discussed. The problem is formulated as a classification of high-dimensional datasets with four unbalanced classes. Preprocessing of raw data and classification using Artificial Neural Networks and Support Vector Machines, applied to the 275 patients of the European Epilepsy Database, show that computer science, in this case machine learning, will have an important role in the problem. For about 30% of the patients we found results with clinical relevance. Real-time experiments made with some patients, in a clinical environment and at home, will be shown (including video) and discussed. The problem is still challenging the computer science community researching medical applications. New research directions will be pointed out in the presentation.
Antonio Dourado, Cesar Teixeira and Francisco Sales