Advances in the Kepler Scientific Workflow System and Its Applications (Kepler) Session 2

Time and Date: 14:10 - 15:50 on 7th June 2016

Room: Boardroom West

Chair: Marcin Plociennik

507	Kepler + CometCloud: Dynamic Scientific Workflow Execution on Federated Cloud Resources Abstract: As more and more public and private Cloud resources become available, it is common for a user to have access to multiple Cloud resources at the same time. Cloud federation dynamically aggregates multiple Cloud resources into a single federated one. This paper explores how to build and run scientific workflows on top of a federated Cloud by integrating the Kepler scientific workflow platform with the CometCloud platform. Our integration leverages the capabilities of the two platforms: 1) dynamic resource federation, provisioning and allocation from CometCloud; 2) easy workflow composition from Kepler; 3) dynamic workflow scheduling and execution from the integration. We apply our integration to a bioinformatics workflow running on three Cloud resources to evaluate its capabilities. We also discuss possible future directions for the integration.	Jianwu Wang, Moustafa Abdelbaky, Javier Diaz-Montes, Shweta Purawat, Manish Parashar, Ilkay Altintas
509	Natural Language Processing using Kepler Workflow System: First Steps Abstract: The scientific community across many disciplines is exploring new ways to extract knowledge from all available sources. Historically, written manuscripts have been the medium of choice for recording experimental findings. Many disciplines, such as social science and medical science, are exploring ways to automate knowledge discovery from the vast repository of published scientific work. This work attempts to accelerate the process of information extraction by extending Kepler, a graphical workflow management tool. Kepler provides a simple way of designing and executing complex workflows in the form of directed graphs. This work presents a scalable approach to converting published research papers in PDF format into indexable XML documents using Kepler. This conversion is a critical step in the Natural Language Processing pipeline. Kepler's distributed data processing capability enables scientists to scale this critical computation simply by adding more computing resources in the cloud.	Ankit Goyal, Alok Singh, Shitij Bhargava, Daniel Crawl, Ilkay Altintas, Chun-Nan Hsu
498	Two-level dynamic workflow orchestration in the INDIGO DataCloud for large-scale, climate change data analytics experiments Abstract: In this paper we present the approach proposed by the EU H2020 INDIGO-DataCloud project to orchestrate dynamic workflows over a cloud environment. The main focus of the project is the development of open source Platform as a Service (PaaS) solutions targeted at scientific communities, deployable on multiple hardware platforms, and provisioned over hybrid e-Infrastructures. The project addresses many challenging gaps in current cloud solutions, responding to specific requirements from scientific communities including Life Sciences, Physical Sciences and Astronomy, Social Sciences and Humanities, and Environmental Sciences. We present ongoing work on implementing the whole software chain at the Infrastructure as a Service (IaaS), PaaS and Software as a Service (SaaS) layers, focusing on scenarios involving scientific workflows and big data analytics frameworks. We introduce an INDIGO module for the Kepler workflow system, along with the underlying INDIGO services exploited by the workflow components. We also describe a use case from a climate change data analytics experiment: a precipitation trend analysis on CMIP5 data that makes use of Kepler and big data analytics services.	Marcin Plociennik, Sandro Fiore, Giacinto Donvito, Michal Owsiak, Marco Fargetta, Roberto Barbera, Riccardo Bruno, Emidio Giorgio, Dean N. Williams, Giovanni Aloisio