Workshop on Advances in the Kepler Scientific Workflow System and Its Applications (KEPLER) Session 1

Time and Date: 16:20 - 18:00 on 11th June 2014

Room: Bluewater I

Chair: Ilkay Altintas

260 Design and Implementation of Kepler Workflows for BioEarth
Abstract: BioEarth is an ongoing research initiative for the development of a regional-scale Earth System Model (EaSM) for the U.S. Pacific Northwest. Our project seeks to couple and integrate multiple stand-alone EaSMs developed through independent efforts for capturing natural and human processes in various realms of the biosphere: atmosphere (weather and air quality), terrestrial biota (crop, rangeland, and forest agro-ecosystems) and aquatic (river flows, water quality, and reservoirs); hydrology links all these realms. Because numerous complex simulations must be managed, automated workflows were essential. In this paper, we present a case study of workflow design for the BioEarth project using the Kepler system to manage applications of the Regional Hydro-Ecologic Simulation System (RHESSys) model. In particular, we report on the design of Kepler workflows to support: 1) standalone executions of the RHESSys model in serial and parallel modes, and 2) a more complex case of performing calibration runs involving multiple preprocessing modules, iterative exploration of parameters and parallel RHESSys executions. We exploited various Kepler features including a user-friendly design interface and support for parallel execution on a cluster. Our experiments show a speedup of 7–12x on 16 cores of a Linux cluster, and demonstrate the general effectiveness of our Kepler workflows in managing RHESSys runs. This study shows the potential of Kepler to serve as the primary integration platform for the BioEarth project, with implications for other data- and compute-intensive Earth systems modeling projects.
Tristan Mullis, Mingliang Liu, Ananth Kalyanaraman, Joseph Vaughan, Christina Tague, Jennifer Adam
327 Tools, methods and services enhancing the usage of the Kepler-based scientific workflow framework
Abstract: Scientific workflow systems are designed to compose and execute a series of computational or data manipulation steps, or workflows, in a scientific application. They are usually part of a larger eScience environment. Using workflow systems, while very beneficial, is often not trivial for scientists. There are many requirements for additional functionality around scientific workflow systems that need to be taken into account, such as the ability to share workflows and the provision of user-friendly GUI tools for automating some tasks or for submission to distributed computing infrastructures. In this paper we present tools developed in response to the requirements of three different scientific communities. These tools simplify and empower their work with the Kepler scientific workflow system. The usage of these tools and services is presented through example scenarios from nanotechnology, astronomy and fusion.
Marcin Płóciennik, Szymon Winczewski, Paweł Ciecieląg, Frederic Imbeaux, Bernard Guillerminet, Philippe Huynh, Michał Owsiak, Piotr Spyra, Thierry Aniel, Bartek Palak, Tomasz Żok, Wojciech Pych, Jarosław Rybicki
371 Progress towards automated Kepler scientific workflows for computer-aided drug discovery and molecular simulations
Abstract: We describe the development of automated workflows that support computer-aided drug discovery (CADD) and molecular dynamics (MD) simulations and are included as part of the National Biomedical Computational Resource (NBCR). The main workflow components include: file-management tasks, ligand force field parameterization, receptor-ligand MD simulations, job submission and monitoring on relevant high-performance computing (HPC) resources, receptor structural clustering, virtual screening (VS), and statistical analyses of the VS results. The workflows aim to standardize simulation and analysis and promote best practices within the molecular simulation and CADD communities. Each component is developed as a stand-alone workflow, which allows easy integration into larger frameworks built to suit user needs, while remaining intuitive and easy to extend.
Pek U. Ieong, Jesper Sørensen, Prasantha L. Vemu, Celia W. Wong, Özlem Demir, Nadya P. Williams, Jianwu Wang, Daniel Crawl, Robert V. Swift, Robert D. Malmstrom, Ilkay Altintas, Rommie E. Amaro
341 Flexible approach to astronomical data reduction workflows in Kepler
Abstract: The growing scale and complexity of cataloguing and analyzing astronomical data force scientists to look for new technologies and tools. Workflow environments appear best suited to their needs, but in practice they prove to be too complicated for most users. Before such environments come into common use, they have to be properly adapted to domain-specific needs. To that end, we have created a universal solution based on the Kepler workflow environment. It consists of a library of domain modules, ready-to-use workflows and additional services for sharing and running workflows. There are three access levels depending on the needs and skills of the user: 1) desktop application, 2) web application, and 3) on-demand Virtual Research Environment. Everything is set up in the context of the Polish grid infrastructure, enabling access to its resources. For flexibility, our solution includes interoperability mechanisms with domain-specific applications and services (including the astronomical Virtual Observatory) as well as with other domain grid services.
Paweł Ciecieląg, Marcin Płóciennik, Piotr Spyra, Michał Urbaniak, Tomasz Żok, Wojciech Pych
282 Identifying Information Requirement for Scheduling Kepler Workflow in the Cloud
Abstract: The Kepler scientific workflow system has been used to help scientists automatically perform experiments in various domains on distributed computing systems. The execution of a workflow in Kepler is controlled by a director assigned in the workflow. However, users still need to specify the compute resources on which the tasks in the workflow are executed. To further reduce the technical effort required of scientists, a workflow scheduler that can assign workflow tasks to resources for execution is necessary. To this end, we review several cloud workflow scheduling techniques and identify the information that should be made available for a scheduler to schedule Kepler workflows in the cloud computing context. To justify its usefulness, we discuss each type of information regarding workflow tasks, cloud resources, and cloud providers based on its benefit to workflow scheduling.
Sucha Smanchat, Kanchana Viriyapant