Data-Driven Computational Sciences (DDCS) Session 2

Time and Date: 16:20 - 18:00 on 13th June 2017

Room: HG D 7.2

Chair: Craig Douglas

242 Human Identification and Localization by Robots in Collaborative Environments
Abstract: Environments in which mobile robots and humans must coexist tend to be quite dangerous to the humans. Many employers have resorted to separating the two groups, since the robots move quickly and do not maneuver around humans easily, resulting in human injuries. In this paper we provide a roadmap towards integrating the two worker groups (humans and robots) to increase both efficiency and safety. Improved human-to-robot communication and collaboration has implications in multiple applications. For example: (1) Robots that manage all aspects of dispensing items (e.g., drugs in pharmacies or supplies and tools in a remote workplace), reducing human errors. (2) Robots capable of operating in dangerous locations that triage injured subjects using remote sensing of vital signs. (3) 'Smart' crash carts that move themselves to a required location in a hospital or in the field, help dispense drugs and tools, save time and money, and prevent accidents.
Craig C. Douglas and Robert A. Lodder
257 Data-driven design of an Ebola therapeutic
Abstract: Data-driven computational science has found many applications in drug design. Molecular data are commonly used to design new drug molecules. Engineering process simulations guide the development of the Chemistry, Manufacturing, and Controls (CMC) section of Investigational New Drug (IND) applications filed with the FDA. Computer simulations can also guide the design of human clinical trials. Formulation is very important in drug delivery: the wrong formulation can render a drug product useless. The amount of preclinical (animal and in vitro) work that must be done before a new drug candidate can be tested in humans can be a problem. The cost of these cGxP studies is typically $3 to $5 million, and if the wrong drug product formulation is tested, new iterations of the formulation must be tested at an additional cost of $3 to $5 million each. Data-driven computational science can help reduce this cost. In the absence of existing human exposure, a battery of tests involving acute and chronic toxicology, cardiovascular, central nervous system, and respiratory safety pharmacology must be performed in at least two species before the FDA will permit testing in humans. However, for many drugs (such as those beginning with natural products) there is a history of human exposure. In these cases, computer modeling of a population to determine human exposure may be adequate to permit phase 1 studies with a candidate formulation in humans. The CDC’s National Health and Nutrition Examination Survey (NHANES) is a program of studies designed to assess the health and nutritional status of adults and children in the United States; the survey is unique in that it combines interviews and physical examinations. The NHANES database can be mined to determine the average and 90th percentile exposures to a food additive, and early human formulation testing can then be conducted at levels beneath those to which the US population is ordinarily exposed through food. These data can be combined with data mined from international chemical shipments to validate an exposure model. This paper describes the data-driven formulation testing process using a new candidate Ebola treatment that, unlike vaccines, can be used after a person has contracted the disease. This drug candidate’s mechanism of action potentially permits it to be used against all strains of the virus, a characteristic that vaccines might not share.
Robert Lodder
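As a rough sketch of the exposure-mining step mentioned in the abstract above, the Python code below computes a survey-weighted mean and 90th-percentile daily intake. The synthetic data, column names, and weighting are illustrative assumptions, not the authors' actual NHANES pipeline, which would use the official dietary-recall files and their sampling weights.

import numpy as np
import pandas as pd

def weighted_percentile(values, weights, q):
    """q-th percentile of `values` under survey sample weights."""
    order = np.argsort(values)
    v = np.asarray(values, dtype=float)[order]
    w = np.asarray(weights, dtype=float)[order]
    cdf = np.cumsum(w) / np.sum(w)
    return float(np.interp(q / 100.0, cdf, v))

# Illustrative stand-in for per-respondent daily intake estimates that would
# be derived from NHANES dietary-recall files joined with food-composition
# data; the column names are hypothetical.
rng = np.random.default_rng(1)
df = pd.DataFrame({
    "intake_mg_per_day": rng.lognormal(mean=1.0, sigma=0.6, size=5000),
    "sample_weight": rng.uniform(0.5, 2.0, size=5000),
})

mean_exposure = np.average(df["intake_mg_per_day"], weights=df["sample_weight"])
p90_exposure = weighted_percentile(df["intake_mg_per_day"], df["sample_weight"], 90)
print(f"weighted mean exposure:   {mean_exposure:.2f} mg/day")
print(f"weighted 90th percentile: {p90_exposure:.2f} mg/day")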
383 Transforming a Local Medical Image Analysis for Running on a Hadoop Cluster
Abstract: There is progressive digitization in many medical fields, such as digital microscopy, which leads to an increase in data volume and processing demands on the underlying computing infrastructure. This paper explores the scaling behaviour of a Ki-67 analysis application, which processes medical image tiles originating from a WSI (Whole Slide Image) file format. Furthermore, it describes how the software is transformed from a Windows PC application into one running in a distributed Linux cluster environment. A test for platform independence revealed non-deterministic behaviour of the application, which has been fixed successfully. The speedup of the application is determined. The slope of the increase is quite close to 1, i.e. there is almost no loss due to parallelization overhead. Beyond the cluster's hardware limit (72 cores, 144 threads, 216 GB RAM) the speedup saturates at a value of around 64. This is a strong improvement over the original software, whose speedup was limited to two.
Marco Strutz, Hermann Heßling and Achim Streit
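A minimal sketch of how a speedup curve like the one reported above can be derived from measured run times, with S(n) = T(1)/T(n) and the slope estimated over the region below the hardware limit. The timing values are placeholders chosen to resemble the described behaviour, not the measurements from the paper.

import numpy as np

# Hypothetical wall-clock run times (seconds) for increasing thread counts.
threads = np.array([1, 2, 4, 8, 16, 32, 64, 128, 144])
runtime_s = np.array([7200, 3620, 1815, 910, 458, 231, 118, 113, 112])

speedup = runtime_s[0] / runtime_s

# Slope of speedup vs. thread count below the hardware limit; a value close
# to 1 indicates negligible parallelization overhead.
linear_region = threads <= 64
slope = np.polyfit(threads[linear_region], speedup[linear_region], 1)[0]

for n, s in zip(threads, speedup):
    print(f"{n:4d} threads -> speedup {s:6.1f}")
print(f"slope in the linear region: {slope:.2f}")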
208 Decentralized Dynamic Data-Driven Monitoring of Dispersion Processes on Partitioned Domains
Abstract: The application of mobile sensor-carrying vehicles for online estimating dynamic dispersion processes is extremely beneficial. Based on current estimates that rely on past measurements and forecasts obtained from a discretized PDE-model, the movement of the vehicles can be adapted resulting in measurements at more informative locations. In this work, a novel decentralized monitoring approach based on a partitioning of the spatial domain into several subdomains is proposed. Each sensor is assigned to the subdomain it is located in and is only required to maintain a process and multi-vehicle model related to its subdomain. In this way, vast communication requirements of related centralized approaches and costly full model simulations are avoided making the presented approach more scalable with respect to a larger number of sensor-carrying vehicles and a larger problem domain. The approach consists of a new prediction and update method based on a domain decomposition method and a partitioned variant of the Ensemble Square Root Filter getting along with a minimum exchange of data between sensors on neighboring subdomains. Furthermore, a cooperative vehicle controller is applied in such a way that a dynamic adaption of the sensor distribution becomes possible.
Tobias Ritter, Stefan Ulbrich and Oskar von Stryk
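For readers unfamiliar with the filter underlying this approach, the sketch below shows a standard serial Ensemble Square Root Filter update applied to a single subdomain's ensemble. The PDE forecast step, the domain-decomposition coupling, and the boundary data exchange described in the abstract are omitted, and all sizes and values are illustrative.

import numpy as np

def ensrf_update(X, obs_idx, y, r):
    """Assimilate one scalar observation y (error variance r) taken at state
    index obs_idx into the ensemble X of shape (n_state, n_members)."""
    n_members = X.shape[1]
    x_mean = X.mean(axis=1, keepdims=True)
    Xp = X - x_mean                                # ensemble perturbations
    hx = X[obs_idx, :]                             # observed state across members
    hxp = hx - hx.mean()
    p_hh = hxp @ hxp / (n_members - 1)             # observation-space variance
    p_xh = Xp @ hxp / (n_members - 1)              # state-observation covariance
    gain = p_xh / (p_hh + r)                       # Kalman gain (vector)
    x_mean = x_mean + gain[:, None] * (y - hx.mean())
    alpha = 1.0 / (1.0 + np.sqrt(r / (p_hh + r)))  # square-root correction factor
    Xp = Xp - alpha * np.outer(gain, hxp)
    return x_mean + Xp

# Each sensor would hold an ensemble only for its own subdomain and
# assimilate only the observations located there.
rng = np.random.default_rng(0)
X_sub = rng.normal(size=(50, 20))                  # 50 local states, 20 members
X_sub = ensrf_update(X_sub, obs_idx=12, y=0.8, r=0.1)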
265 A Framework for Direct and Transparent Data Exchange of Filter-stream Applications in Multi-GPUs Architectures
Abstract: Massive data generation has been pushing for significant advances in computing architectures, resulting in heterogeneous architectures composed of different types of processing units. The filter-stream paradigm is typically used to exploit the parallel processing power of these new architectures. The efficiency of applications in this paradigm is achieved by exploiting a set of interconnected computers (a cluster) using filters and communication between them in a coordinated way. In this work we propose, implement and test a generic abstraction for direct and transparent data exchange of filter-stream applications in heterogeneous clusters with multi-GPU (Graphics Processing Units) architectures. This abstraction hides from the programmers all the low-level implementation details related to GPU communication and the control related to the location of filters. Further, we consolidate this abstraction into a framework. Empirical assessments using a real application show that the proposed abstraction layer eases the implementation of filter-stream applications without compromising overall application performance.
Leonardo Rocha, Gabriel Ramons, Guilherme Andrade, Rafael Sachetto, Daniel Madeira, Renan Carvalho, Renato Ferreira and Fernando Mourão
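A minimal sketch of the filter-stream idea referenced above: filters connected by streams, where the stream abstraction hides the location of the consuming filter from the producer. The in-process queues below stand in for the GPU and network transfers such a framework would manage; class and method names are illustrative, not the framework's API.

import queue
import threading

class Stream:
    """Transparent channel between two filters; the producer does not need to
    know whether the consumer runs on the same device, another GPU, or
    another cluster node."""
    def __init__(self):
        self._q = queue.Queue()
    def send(self, item):
        self._q.put(item)
    def receive(self):
        return self._q.get()

class Filter(threading.Thread):
    """A processing stage that reads from an input stream, applies a function,
    and writes to an output stream until it sees the end-of-stream marker."""
    def __init__(self, func, inp, out):
        super().__init__()
        self.func, self.inp, self.out = func, inp, out
    def run(self):
        while (item := self.inp.receive()) is not None:
            self.out.send(self.func(item))
        self.out.send(None)              # propagate end of stream downstream

# Pipeline: source -> square -> negate -> sink
s1, s2, s3 = Stream(), Stream(), Stream()
Filter(lambda x: x * x, s1, s2).start()
Filter(lambda x: -x, s2, s3).start()
for v in [1, 2, 3]:
    s1.send(v)
s1.send(None)
while (result := s3.receive()) is not None:
    print(result)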