ICCS 2016 Main Track (MT) Session 6

Time and Date: 16:20 - 18:00 on 7th June 2016

Room: KonTiki Ballroom

Chair: Jianwu Wang

295 Finite Element Model for Brittle Fracture and Fragmentation [abstract]
Abstract: A new computational model for brittle fracture and fragmentation has been developed based on finite element analysis of non-linear elasticity equations. The proposed model propagates cracks by splitting mesh nodes along the most over-strained edges, based on the principal direction of the strain tensor. To prevent elements from overlapping and folding under large deformations, robust geometrical constraints using the method of Lagrange multipliers have been incorporated. The model has been applied to 2D simulations of the formation and propagation of cracks in brittle materials, and to the fracture and fragmentation of stretched and compressed materials.
Wei Li, Tristan Delaney, Xiangmin Jiao, Roman Samulyak, Cao Lu
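
The abstract does not give formulas; as a minimal sketch of the splitting criterion it describes, the principal strain direction can be obtained from an eigendecomposition of the small-strain tensor. The displacement-gradient input and the helper name below are illustrative, not the paper's code.

```python
import numpy as np

def principal_strain_direction(grad_u):
    """Largest principal strain and its direction for a 2D displacement
    gradient (hypothetical helper; the paper's exact criterion may differ)."""
    eps = 0.5 * (grad_u + grad_u.T)      # symmetric small-strain tensor
    vals, vecs = np.linalg.eigh(eps)     # eigenpairs in ascending order
    return vals[-1], vecs[:, -1]

# Example: uniaxial stretch along x; edges aligned with [1, 0] would be
# the candidates for node splitting.
grad_u = np.array([[0.02, 0.0],
                   [0.0, -0.006]])
print(principal_strain_direction(grad_u))   # (0.02, array([1., 0.]))
```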
350 Aggressive Pruning Strategy for Time Series Retrieval Using a Multi-Resolution Representation Based on Vector Quantization Coupled with Discrete Wavelet Transform [abstract]
Abstract: Time series representation methods are widely used to handle time series data by projecting them onto low-dimensional spaces where queries are processed. Multi-resolution representation methods speed up the similarity search process by using pre-computed distances, which are calculated and stored at the indexing stage and then used at the query stage together with filters in the form of exclusion conditions. In this paper we present a new multi-resolution representation method that combines the Haar wavelet-based multi-resolution method with vector quantization to maximize the pruning power of the similarity search algorithm. The new method is validated through extensive experiments on different datasets from several time series repositories. The results obtained prove the efficiency of the new method.
Muhammad Marwan Muhammad Fuad
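
As a sketch of the pruning idea (the vector-quantization stage is omitted), an orthonormal Haar transform makes the distance on the first few coefficients a lower bound on the full Euclidean distance, so candidates can be excluded early. Function names and the resolution parameter are illustrative.

```python
import numpy as np

def haar(x):
    """Orthonormal Haar transform (length must be a power of two)."""
    x, out = x.astype(float), []
    while len(x) > 1:
        a = (x[0::2] + x[1::2]) / np.sqrt(2)   # pairwise approximations
        d = (x[0::2] - x[1::2]) / np.sqrt(2)   # pairwise details
        out.insert(0, d)                       # coarsest details end up first
        x = a
    out.insert(0, x)
    return np.concatenate(out)

def prune(query, candidates, radius, m=4):
    """Exclusion condition: since the transform is orthonormal, the distance
    on the first m coefficients never exceeds the true distance."""
    q = haar(query)[:m]
    return [c for c in candidates if np.linalg.norm(q - haar(c)[:m]) <= radius]
```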
392 Integration of Randomized Sketches for Singular Value Decomposition and Principal Component Analysis [abstract]
Abstract: Low-rank singular value decomposition (SVD) and principal component analysis (PCA) of large-scale matrices are key tools in modern data analytics and scientific computing. Rapidly growing matrix sizes further increase the need for efficient large-scale SVD algorithms and pose challenges to their development. Random sketching is a promising method to reduce the problem size before computing an approximate SVD. We generalize one-time sketching to multiple random sketches and develop algorithms to integrate these random sketches, which contain varied subspace information from the different randomizations. Such an integration procedure can lead to an SVD with higher accuracy, and the multiple randomizations can be conducted simultaneously on parallel computers. We also reveal insights into, and analyze the performance of, the proposed algorithms from statistical and geometric viewpoints. Numerical results are presented and discussed to demonstrate the efficiency of the proposed algorithms. This is joint work with Ting-Li Chen and Su-Yun Huang at the Institute of Statistical Science, Academia Sinica, and David Chang, Hung Chen, and Chen-Yao Lin at the Institute of Applied Mathematical Sciences, National Taiwan University.
Weichung Wang
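
A hedged sketch of the sketching step (after Halko et al.) plus one simple way to integrate several sketches, by orthonormalizing the union of their bases; the paper's actual integration procedure is more refined than this.

```python
import numpy as np

def sketch_basis(A, k, rng):
    """One random sketch: orthonormal basis for an approximate range of A."""
    Y = A @ rng.standard_normal((A.shape[1], k + 5))   # slight oversampling
    return np.linalg.qr(Y)[0]

def integrated_svd(A, k, n_sketches=4, seed=0):
    """Merge independent sketches (computable in parallel) into one subspace,
    then recover a rank-k SVD from the small projected matrix."""
    rng = np.random.default_rng(seed)
    Qs = [sketch_basis(A, k, rng) for _ in range(n_sketches)]
    Q = np.linalg.qr(np.hstack(Qs))[0]       # union of sketched subspaces
    U, s, Vt = np.linalg.svd(Q.T @ A, full_matrices=False)
    return (Q @ U)[:, :k], s[:k], Vt[:k]
```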

ICCS 2016 Main Track (MT) Session 13

Time and Date: 16:20 - 18:00 on 7th June 2016

Room: Toucan

Chair: Daniel Crawl

461 Success Rate of Creatures Crossing a Highway as a Function of Model Parameters [abstract]
Abstract: In modeling swarms of autonomous robots, individual robots may be identified as cognitive agents. We describe a model of a population of simple cognitive agents, naïve creatures, learning to safely cross a cellular-automaton-based highway. These creatures have the ability to learn from each other by evaluating whether creatures in the same situation were successful in crossing the highway in the past. The creatures use an “observational social learning” mechanism in their decision to cross the highway or not. The model parameters heavily influence the learning outcomes, which we examine through the collected simulation metrics. We study how these parameters, in particular the knowledge base, influence the creatures’ success rate of crossing the highway.
Anna T. Lawniczak, Leslie Ly, Fei Yu
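
A toy of the "observational social learning" rule described above; the situation encoding, exploration rule, and threshold are hypothetical.

```python
from collections import defaultdict
import random

kb = defaultdict(lambda: [0, 0])      # situation -> [successes, failures]

def decide_to_cross(situation):
    """Cross if creatures observed in this situation mostly succeeded;
    explore randomly when the knowledge base has no entry yet."""
    s, f = kb[situation]
    if s + f == 0:
        return random.random() < 0.5
    return s >= f

def record(situation, survived):
    kb[situation][0 if survived else 1] += 1

record((3, 2), True)                  # e.g. (car distance, car speed)
print(decide_to_cross((3, 2)))        # True
```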
10 Using Analytic Solution Methods on Unsaturated Seepage Flow Computations [abstract]
Abstract: This paper describes a change of variables applied to Richards’ equation for steady-state unsaturated seepage flow that makes the numerical representation of this highly nonlinear partial differential equation (PDE) much easier to solve and yields a significantly more accurate solution. The method is applied to two-dimensional unsaturated steady-state flow in a block of soil that is initially very dry until water is applied at the top. Both a quasi-linear version of relative hydraulic conductivity, for which an analytic solution exists, and a van Genuchten version of relative hydraulic conductivity are numerically solved using the original and new versions of the governing PDE. For the test problem, the change-of-variables version of the governing PDE was significantly easier to solve and resulted in more accurate solutions than the original version.
Fred Tracy
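
The abstract does not state the change of variables; for the quasi-linear conductivity it mentions (the Gardner model), the classical Kirchhoff transformation illustrates how such a substitution can linearize the equation. This is shown for intuition only, not as the paper's exact method.

```latex
% Steady Richards' equation with K(h) = K_s e^{\alpha h} (Gardner model):
\[
\nabla\cdot\big(K(h)\,\nabla(h+z)\big)=0, \qquad
\Phi(h)=\int_{-\infty}^{h}K(s)\,ds=\frac{K_s}{\alpha}\,e^{\alpha h}
\;\Rightarrow\; \nabla\Phi = K(h)\,\nabla h,
\]
\[
\text{so the transformed PDE,}\quad
\nabla^{2}\Phi+\alpha\,\frac{\partial\Phi}{\partial z}=0,
\quad\text{is linear in }\Phi .
\]
```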
188 Predictor Discovery for Early-Late Indian Summer Monsoon Using Stacked Autoencoder [abstract]
Abstract: The Indian summer monsoon has distinct behaviors in its early and late phases, and the influencing climatic factors also differ. In this work we aim to predict the national rainfall in these phases. The predictors used by the forecast models are discovered using a stacked autoencoder deep neural network. A fitted regression tree is used as the forecast model. Accuracy superior to the state-of-the-art method is achieved. We also observe that late monsoon rainfall can be predicted with higher accuracy than early monsoon rainfall.
Moumita Saha, Pabitra Mitra, Ravi S. Nanjundiah
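
A compact stand-in for the pipeline described above: a greedily stacked, tied-weight autoencoder trained by gradient descent, whose codes feed a regression tree. All sizes, data, and hyperparameters are synthetic placeholders.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def train_autoencoder_layer(X, hidden, lr=0.1, epochs=500, seed=0):
    """One tied-weight autoencoder layer minimizing squared
    reconstruction error (a minimal sketch, not the paper's network)."""
    rng = np.random.default_rng(seed)
    W = 0.1 * rng.standard_normal((X.shape[1], hidden))
    for _ in range(epochs):
        H = np.tanh(X @ W)               # encode
        E = H @ W.T - X                  # reconstruction error (decode - input)
        dH = (E @ W) * (1 - H**2)        # backprop through tanh
        W -= lr * (X.T @ dH + E.T @ H) / len(X)   # shared-weight gradient
    return W

rng = np.random.default_rng(1)
X = rng.standard_normal((200, 16))               # stand-in climate predictors
y = 2 * X[:, 0] + np.sin(X[:, 1])                # stand-in rainfall target
W1 = train_autoencoder_layer(X, 8)               # greedy layer 1
H1 = np.tanh(X @ W1)
W2 = train_autoencoder_layer(H1, 4)              # greedy layer 2
codes = np.tanh(H1 @ W2)                         # discovered predictors
model = DecisionTreeRegressor(max_depth=4).fit(codes, y)
```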

Tools for Program Development and Analysis in Computational Science (TOOLS) Session 2

Time and Date: 16:20 - 18:00 on 7th June 2016

Room: Macaw

Chair: Jie Tao

447 Online MPI Trace Compression using Event Flow Graphs and Wavelets [abstract]
Abstract: Performance analysis of scientific parallel applications is essential to use High Performance Computing (HPC) infrastructures efficiently. Nevertheless, collecting detailed data from large-scale parallel programs and long-running applications is infeasible due to the huge amount of performance information generated. Even though there are no technological constraints on storing terabytes of performance data, the constant flushing of such data to disk introduces a massive overhead into the application that makes the performance measurements worthless. This paper explores the use of event flow graphs together with wavelet analysis and EZW encoding to provide MPI event traces that are orders of magnitude smaller while preserving accurate information on timestamped events. Our mechanism compresses the performance data online while the application runs, thus reducing the pressure put on the I/O system by buffer flushing. As a result, we achieve lower application perturbation, reduced performance data output, and the possibility to monitor longer application runs.
Xavier Aguilar, Karl Fuerlinger, Erwin Laure
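
EZW encoding itself is involved; as a sketch of the wavelet stage alone, inter-event times can be Haar-transformed and all but the largest coefficients dropped before encoding. PyWavelets is assumed available; the retention fraction is arbitrary.

```python
import numpy as np
import pywt   # PyWavelets, assumed available

def compress_timestamps(ts, keep=0.1):
    """Keep only the largest Haar coefficients of the inter-event times,
    then rebuild approximate timestamps (the EZW stage is omitted)."""
    deltas = np.diff(ts)
    coeffs = pywt.wavedec(deltas, 'haar')
    thresh = np.quantile(np.abs(np.concatenate(coeffs)), 1 - keep)
    coeffs = [np.where(np.abs(c) >= thresh, c, 0.0) for c in coeffs]
    approx = pywt.waverec(coeffs, 'haar')[:len(deltas)]
    return np.concatenate(([ts[0]], ts[0] + np.cumsum(approx)))

ts = np.cumsum(np.random.default_rng(0).exponential(1.0, 1024))
print(np.max(np.abs(compress_timestamps(ts) - ts)))   # timestamp error
```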
194 WOWMON: A Machine Learning-based Profiler for Self-adaptive Instrumentation of Scientific Workflows [abstract]
Abstract: Performance debugging using program profiling and tracing for scientific workflows can be extremely difficult for two reasons. 1) Existing performance tools lack the ability to automatically produce global performance data based on local information from coupled scientific applications, particularly at runtime. 2) Profiling/tracing with static instrumentation may incur high overhead and significantly slow down science-critical tasks. To gain more insight into workflows we introduce a lightweight workflow monitoring infrastructure, WOWMON (WOrkfloW MONitor), which gives users access not only to cross-application performance data, such as end-to-end latency and execution time of individual workflow components at runtime, but also to customized performance events. To reduce profiling overhead, WOWMON uses adaptive selection of performance metrics based on machine learning algorithms to guide profilers to collect only the metrics that have the most impact on workflow performance. Through the study of real scientific workflows (e.g., LAMMPS) with the help of WOWMON, we found that the performance of workflows can be significantly affected by both software and hardware factors, such as the process-mapping policy and the hardware configuration of clusters. Moreover, we experimentally show that WOWMON can reduce data movement for profiling by up to 54% without missing key metrics for performance debugging.
Xuechen Zhang, Hasan Abbasi, Kevin Huck, Allen Malony
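
The abstract does not name the ML model; a hedged illustration of adaptive metric selection uses feature importances from a random forest to decide which few metrics keep being collected. The metric matrix and latency target below are synthetic.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
metrics = rng.standard_normal((300, 20))        # runs x candidate metrics
latency = 3 * metrics[:, 2] - 2 * metrics[:, 7] \
          + 0.1 * rng.standard_normal(300)      # end-to-end latency

# Rank metrics by impact on latency and keep only the top few for
# live profiling -- the adaptive-selection idea in miniature.
forest = RandomForestRegressor(n_estimators=100, random_state=0)
forest.fit(metrics, latency)
top = np.argsort(forest.feature_importances_)[::-1][:3]
print("metrics worth collecting:", top)         # expect columns 2 and 7 first
```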
334 A DSL based toolchain for design space exploration in structured parallel programming [abstract]
Abstract: We introduce a DSL-based toolchain supporting the design of parallel applications where parallelism is structured as compositions of parallel design patterns. The DSL provides the possibility to write high-level parallel design pattern expressions representing the structure of parallel applications, to refactor the pattern expressions, to evaluate their non-functional properties (e.g., ideal performance, total parallelism degree, etc.), and finally to generate parallel code ready to be compiled and run on different target architectures. We discuss a proof-of-concept prototype implementation of the proposed toolchain generating FastFlow code and show some preliminary results achieved using the prototype implementation.
Marco Danelutto, Massimo Torquati, Peter Kilpatrick
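
A toy of what such a DSL's pattern expressions and non-functional evaluation might look like; the node names and cost rules below are illustrative, not the toolchain's actual syntax.

```python
from dataclasses import dataclass

@dataclass
class Seq:                 # sequential stage with a known service time
    t: float
    def service_time(self): return self.t

@dataclass
class Pipe:                # pipeline: limited by its slowest stage
    stages: list
    def service_time(self): return max(s.service_time() for s in self.stages)

@dataclass
class Farm:                # task farm: n workers share the load
    worker: object
    n: int
    def service_time(self): return self.worker.service_time() / self.n

# Refactoring Seq(8.0) into Farm(Seq(8.0), 4) moves the bottleneck:
expr = Pipe([Seq(2.0), Farm(Seq(8.0), n=4), Seq(1.0)])
print(expr.service_time())   # 2.0 -- ideal service time of the composition
```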

Workshop on Computational Optimization, Modelling & Simulation (COMS) Session 3

Time and Date: 16:20 - 18:00 on 7th June 2016

Room: Cockatoo

Chair: Leifur Leifsson

189 Asynchronous Two-Level Checkpointing Scheme for Large-Scale Adjoints in the Spectral-Element Solver Nek5000 [abstract]
Abstract: Adjoints are an important computational tool for large-scale sensitivity evaluation, uncertainty quantification, and derivative-based optimization. An essential component of their performance is the storage/recomputation balance in which efficient adjoint checkpointing strategies play a key role. We introduce a novel asynchronous two-level adjoint checkpointing scheme for numerical time discretizations targeted at large-scale numerical simulations. The checkpointing scheme combines bandwidth-limited disk checkpointing and binomial memory checkpointing. Based on assumptions about the target petascale systems, which we later demonstrate to be realistic on the IBM Blue Gene/Q system Mira, we create a model of the predicted performance of the adjoint computation and validate it using the highly scalable Navier-Stokes spectral-element solver Nek5000 on small to moderate subsystems of the Mira supercomputer.
Michel Schanen, Oana Marin, Hong Zhang, Mihai Anitescu
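
The binomial memory-checkpointing component follows a classical bound (Griewank's revolve): with s stored checkpoints and at most r recomputations of any step, adjoints of up to C(s+r, s) steps can be reversed. A one-line check, for orientation:

```python
from math import comb

def max_reversible_steps(snapshots, recomputations):
    """Binomial checkpointing bound: C(s + r, s) time steps are
    reversible with s checkpoints and r recomputations per step."""
    return comb(snapshots + recomputations, snapshots)

print(max_reversible_steps(10, 3))   # 286 steps from only 10 checkpoints
```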
260 AGORAS: A Fast Algorithm for Estimating Medoids in Large Datasets [abstract]
Abstract: The k-medoids methods for modeling clustered data have many desirable properties, such as robustness to noise and the ability to use non-numerical values; however, they are typically not applied to large datasets due to their associated computational complexity. In this paper, we present AGORAS, a novel heuristic algorithm for the k-medoids problem in which the algorithmic complexity is driven by k, the number of clusters, rather than n, the number of data points. Our algorithm attempts to isolate a sample from each individual cluster within a sequence of uniformly drawn samples taken from the complete data. As a result, computing the k-medoids solution using our method only involves solving k trivial sub-problems of centrality. This allows our algorithm to run in comparable time for arbitrarily large datasets with the same underlying density distribution. We evaluate AGORAS experimentally against PAM and CLARANS -- two of the best-known existing algorithms for the k-medoids problem -- across a variety of published and synthetic datasets. We find that AGORAS outperforms PAM by up to four orders of magnitude for datasets with fewer than 10,000 points, and it outperforms CLARANS by two orders of magnitude on a dataset of just 64,000 points. Moreover, we find that in some cases AGORAS also outperforms both in terms of cluster quality.
Esteban Rangel, Wei-Keng Liao, Ankit Agrawal, Alok Choudhary, William Hendrix
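
The "trivial sub-problems of centrality" the abstract reduces to are plain medoid computations on small samples; a minimal version:

```python
import numpy as np

def medoid(points):
    """Member of the sample minimizing total distance to the others --
    the per-cluster sub-problem AGORAS solves k times."""
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    return points[np.argmin(d.sum(axis=1))]

sample = np.array([[0.0, 0.0], [0.2, 0.1], [5.0, 5.0]])
print(medoid(sample))   # [0.2, 0.1] -- central member of the sample
```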
271 Impact of boundary conditions on shaping frequency response of a vibrating plate - modelling, optimization, and simulation [abstract]
Abstract: The aim of this paper is to further develop the original method proposed by the authors in their previous publications, and submitted as a patent, to shape the frequency response of a vibrating plate according to precisely defined demands. The method is based on modeling the plate together with additional masses and ribs, and applying a sophisticated optimization algorithm that determines the arrangement of the masses and ribs. It has very high practical potential. It can be used to improve the acoustic radiation of the plate for required frequencies or to enhance the acoustic isolation of noise barriers and device casings. It can be utilized for both passive and active control. For the latter case it also allows actuators and sensors to be optimally arranged. The paper presents, compares, and discusses simulation results of the method for a plate with different boundary conditions: simply supported, fully clamped, and elastically restrained against rotation (corresponding to a mounting in a real device casing). The proposed optimization criteria follow from practical scenarios where precise modification of a vibrating plate's frequency response is desired. The application of the proposed method to active control is also shown. An important additional outcome of the paper is a set of guidelines on designing device casings in terms of rigidity in order to obtain their required vibration and noise isolation features.
Marek Pawelczyk, Stanislaw Wrona
288 Simulations of One Tap Update LMS Algorithm in Application to Active Noise Control [abstract]
Abstract: Partial Update LMS (PU LMS) algorithms have started to play an important role in adaptive processing of sound. By reducing computational power demands, these algorithms allow the use of longer adaptive filters and therefore achieve better results in adaptive filtering applications, e.g., system identification, adaptive line enhancement (ALE), and active noise control (ANC). There are two main groups of PU algorithms: data-independent and data-dependent algorithms. While applying an algorithm from the first group almost always degrades performance, applying a data-dependent PU algorithm may even increase performance compared with a full parameter update. However, the latter group of algorithms requires sorting. The number of updated parameters is the factor that decides how much performance should be sacrificed to obtain computational power savings. In the extreme case, only one filter tap out of possibly hundreds is updated during each sampling period. The goal of this paper is to show, through extensive simulations, that careful selection of this one tap results in a useful and well-performing algorithm, even in the demanding application of active noise control. As a final step, the simulations are confirmed in laboratory experiments.
Dariusz Bismor
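
A sketch of the extreme case the paper studies, here using an MMax-style selection (update the single tap whose input sample has the largest magnitude); the paper's exact selection rule may differ, and the step size and filter length are placeholders.

```python
import numpy as np

def one_tap_lms(x, d, n_taps=32, mu=0.01):
    """LMS in which only one tap -- the one seeing the largest input
    magnitude -- is updated per sample (data-dependent partial update)."""
    w, y = np.zeros(n_taps), np.zeros(len(x))
    for n in range(n_taps - 1, len(x)):
        u = x[n - n_taps + 1:n + 1][::-1]   # tap-input vector
        y[n] = w @ u
        e = d[n] - y[n]
        i = np.argmax(np.abs(u))            # careful selection of the one tap
        w[i] += mu * e * u[i]
    return w, y
```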
299 Formal analysis of an energy-aware collision resolution protocol for wireless sensor networks [abstract]
Abstract: This paper provides a comprehensive and rigorous study of a novel collision resolution algorithm for wireless sensor networks: 2CS-WSN. It is specifically designed to be used during the contention phase of IEEE 802.15.4. The algorithm has been modelled in terms of discrete time Markov chains (DTMCs) and, using the probabilistic symbolic model checker PRISM, its correctness properties and different operation modes have been studied. Moreover, different model abstractions have been used to identify any inconsistencies or ambiguities, and to prove interesting properties for non-trivial, practical, and relevant scenarios. Finally, since collisions are the biggest source of energy waste, this paper conducts a broad study of energy savings in this algorithm.
M.Carmen Ruiz, Hermenegilda Macia, Jose Antonio Mateo, Francisco Javier Calleja
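
For orientation, the kind of DTMC transient question PRISM answers symbolically can be phrased with a transition matrix and matrix powers; the 3-state chain and probabilities below are purely illustrative, not the 2CS-WSN model.

```python
import numpy as np

P = np.array([[0.0, 1.0, 0.0],     # idle: start contending
              [0.0, 0.3, 0.7],     # contending: collide again w.p. 0.3
              [0.0, 0.0, 1.0]])    # resolved: absorbing
state = np.array([1.0, 0.0, 0.0])  # initially idle

for step in range(1, 6):
    state = state @ P              # one slot of the chain
    print(step, "P(resolved) =", round(state[2], 4))
```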

Data-Driven Computational Sciences - DDCS 2016 (DDCS) Session 2

Time and Date: 16:20 - 18:00 on 7th June 2016

Room: Boardroom East

Chair: Craig Douglas

28 On Solving Ill Conditioned Linear Systems [abstract]
Abstract: This paper presents the first results of combining two theoretically sound methods (spectral projection and multigrid methods) to attack ill-conditioned linear systems. Our preliminary results show that the proposed algorithm, applied to a Krylov subspace method, requires far fewer iterations for solving an ill-conditioned problem downloaded from a popular online sparse matrix collection.
Craig C. Douglas, Long Lee, Man-Chung Yeung
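
A hedged sketch of how spectral projection can help a Krylov method (the paper's multigrid component is omitted): deflate the few smallest eigenpairs exactly, then run CG on the well-conditioned remainder. The test matrix is synthetic; in exact arithmetic the Krylov iterates stay orthogonal to the deflated invariant subspace.

```python
import numpy as np
from scipy.sparse import diags
from scipy.sparse.linalg import cg, eigsh

n = 500                                   # ill-conditioned SPD test matrix
A = diags(np.concatenate(([1e-8, 1e-7], np.linspace(1, 2, n - 2)))).tocsc()
b = np.ones(n)

vals, W = eigsh(A, k=2, sigma=0, which='LM')   # two smallest eigenpairs
x0 = W @ ((W.T @ b) / vals)               # exact solve on the bad subspace
x, info = cg(A, b, x0=x0)                 # CG sees only the benign spectrum
print(info, np.linalg.norm(A @ x - b))
```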
206 Abstract Framework for Decoupling Coupled PDE Models in Multi-Physics Applications: Algorithm, Analysis, and Software [abstract]
Abstract: We discuss decoupling issues in multi-physics and complex-system computation. A general framework is presented for decoupling coupled PDE models in multi-physics applications. Examples of decoupled numerical algorithms and theory are illustrated for two-grid/multi-grid methods, preconditioning methods, and mixed implicit/explicit marching methods for coupled fluid/porous-media flows, fluid-solid interaction, superconductivity, etc.
Mo Mu
178 Hierarchical Density-Based Clustering based on GPU Accelerated Data Indexing Strategy [abstract]
Abstract: Due to the recent increase in the volume of generated data, organizing these data has become one of the biggest problems in Computer Science. Among the different strategies proposed to deal with this efficiently and effectively, we highlight those related to clustering, more specifically density-based clustering strategies, which stand out for their ability to define clusters of arbitrary shape and their robustness to data noise, such as DBSCAN and OPTICS. However, these algorithms remain a computational challenge since they are distance-based proposals. In this work we present a new approach to making OPTICS feasible, based on a data indexing strategy. Despite the simplicity with which the data are indexed, using graphs, it allows the exploration of various parallelization opportunities, which we exploit using a graphics processing unit (GPU). Based on this structure, the complexity of OPTICS is reduced to O(E*log V) in the worst case, making it very fast. In our evaluation we show that our proposal can be over 200x faster than its sequential CPU version.
Leonardo Rocha, Danilo Melo, Sávyo Toledo, Guilherme Andrade, Renato Ferreira, Fernando Mourão, Srinivasan Parthasarathy, Rafael Sachetto

Workshop on Large Scale Computational Physics (LSCP) Session 1

Time and Date: 16:20 - 18:00 on 7th June 2016

Room: Boardroom West

Chair: E. de Doncker

548 Workshop on Large Scale Computational Physics - LSCP 2016 [abstract]
Abstract: The LSCP workshop focuses on symbolic and numerical methods and simulations, algorithms and tools (software and hardware) for developing and running large-scale computations in physical sciences. Special attention goes to parallelism, scalability and high numerical precision. System architectures are also of interest as long as they are supporting physics-related calculations, such as: massively parallel systems, GPUs, many-integrated-cores, distributed (cluster, grid/cloud) computing, and hybrid systems. Topics this year are from theoretical physics (high energy physics and lattice gauge theory/QCD). The effects of transformations in obtaining numerical results for Feynman loop integrals, and the deployment of a novel architecture to achieve large computational power with low electric power consumption are presented.
Omofolakunmi Olagbemi, Elise de Doncker, Fukuko Yuasa
307 First application of lattice QCD to Pezy-SC processor [abstract]
Abstract: The Pezy-SC processor is a novel architecture developed by Pezy Computing K. K. that achieves large computational power with low electric power consumption. It works as an accelerator device, similarly to GPGPUs. A programming environment that resembles OpenCL is provided. Using the hybrid parallel system ``Suiren'' installed at KEK, we port and tune a simulation code for lattice QCD, a branch of computational elementary particle physics based on the Monte Carlo method. We offload an iterative solver of a linear equation for a fermion matrix, which is in general the most time-consuming part of lattice QCD simulations. On single and multiple Pezy-SC devices, the sustained performance is measured for the matrix multiplications and a BiCGStab solver. We examine how the data layout affects the performance. The results demonstrate that Pezy-SC processors provide a feasible environment for numerical lattice QCD simulations.
Tatsumi Aoyama, Ken-Ichi Ishikawa, Yasuyuki Kimura, Hideo Matsufuru, Atsushi Sato, Tomohiro Suzuki, Sunao Torii
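
For reference, the solver being offloaded is standard BiCGStab; a plain real-valued NumPy version is sketched below (the fermion matrix, complex arithmetic, and Pezy-SC offload are not shown).

```python
import numpy as np

def bicgstab(A, b, tol=1e-8, max_iter=1000):
    """Textbook BiCGStab for a real matrix A (reference sketch only)."""
    x = np.zeros_like(b)
    r = b - A @ x
    r_hat, rho, alpha, omega = r.copy(), 1.0, 1.0, 1.0
    v, p = np.zeros_like(b), np.zeros_like(b)
    for _ in range(max_iter):
        rho_new = r_hat @ r
        beta = (rho_new / rho) * (alpha / omega)
        p = r + beta * (p - omega * v)
        v = A @ p
        alpha = rho_new / (r_hat @ v)
        s = r - alpha * v
        t = A @ s
        omega = (t @ s) / (t @ t)
        x += alpha * p + omega * s
        r, rho = s - omega * t, rho_new
        if np.linalg.norm(r) < tol * np.linalg.norm(b):
            break
    return x
```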
529 Adaptive Integration and Singular Boundary Transformations [abstract]
Abstract: We apply and compare the results of transformations used to annihilate boundary singularities for multivariate integration over hyper-rectangular and simplicial domains. While classically these transformations are applied with a product trapezoidal rule, we use adaptive methods in the ParInt software package, based on rules of higher polynomial degree for the integration over subdomains. ParInt is layered over the MPI environment (Message Passing Interface) and deploys advanced parallel computation techniques such as load balancing among processes distributed over a network of nodes. The message passing is performed in a non-blocking and asynchronous manner, and permits overlapping of computation and communication. Comparisons of computation times using long double vs. double precision confirm that the extended format does not considerably increase the computation time. We further apply the proposed methods to problems arising from self-energy Feynman loop diagrams with massless internal lines, in particular where the corresponding integrand has singularities on the boundaries of the integration domain.
Elise de Doncker, Fukuko Yuasa, Tadashi Ishikawa, John Kapenga, Omofolakunmi Olagbemi
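
A one-variable illustration of annihilating a boundary singularity by substitution (the paper works with multivariate transformations and adaptive rules): for an x^(-1/2) singularity, x = u^2 makes the integrand constant.

```python
import numpy as np

def midpoint(f, n=1000):
    """Composite midpoint rule on (0, 1); avoids evaluating endpoints."""
    u = (np.arange(n) + 0.5) / n
    return f(u).sum() / n

f = lambda x: 1.0 / np.sqrt(x)        # integral over (0,1) equals 2
g = lambda u: f(u**2) * 2 * u         # after x = u**2: g(u) == 2, smooth

print(abs(midpoint(f) - 2))   # visible error from the singularity
print(abs(midpoint(g) - 2))   # essentially exact after transformation
```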

Mathematical Methods and Algorithms for Extreme Scale (MMAES) Session 1

Time and Date: 16:20 - 18:00 on 7th June 2016

Room: Rousseau West

Chair: Vassil Alexandrov

482 Reducing Communication in Distributed Asynchronous Iterative Methods [abstract]
Abstract: Communication costs have become an important factor in evaluating the performance of massively parallel algorithms. Asynchronous iterative methods have the potential to reduce these communication costs compared to their synchronous counterparts for solving systems of equations. The goal of this paper is to develop a communication-avoiding iterative method using an asynchronous implementation. Implemented using passive one-sided remote memory access (RMA) MPI functions, the method presented is a variation of the asynchronous Gauss-Seidel method. The variation is based on the Southwell method, where rows are relaxed greedily, instead of sequentially, by choosing the row with the maximum residual value. By comparing its residual value to its neighbors', a process decides to relax if it holds the maximum residual modulus. Additionally, a parameter is experimentally determined that dictates how long a process will wait after it successfully relaxes, in order to let its update propagate to its neighbors. Experimental results show that this method reduces communication costs compared to several other asynchronous iterative methods and the classic synchronous Jacobi method.
Jordi Wolfson-Pou, Edmond Chow
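
The greedy rule inherited from the Southwell method is easy to state in a few lines; the sequential version below relaxes the row with the largest residual modulus (in the paper, each process applies this comparison asynchronously against its neighbors only).

```python
import numpy as np

def southwell(A, b, tol=1e-10, max_sweeps=10000):
    """Sequential Southwell iteration: greedily relax the row whose
    residual modulus is currently the largest."""
    x = np.zeros_like(b)
    for _ in range(max_sweeps):
        r = b - A @ x
        i = np.argmax(np.abs(r))
        if abs(r[i]) < tol:
            break
        x[i] += r[i] / A[i, i]        # Gauss-Seidel-style update of row i
    return x

A = np.array([[4.0, 1.0], [1.0, 3.0]])
print(southwell(A, np.array([1.0, 2.0])))   # ~[0.0909, 0.6364]
```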
321 A Robust Technique to Make a 2D Advection Solver Tolerant to Soft Faults [abstract]
Abstract: We present a general technique for solving Partial Differential Equations, called robust stencils, which makes the solvers tolerant to soft faults, i.e., bit flips arising in memory or CPU calculations. We show how it can be applied to a two-dimensional Lax-Wendroff solver. The resulting 2D robust stencils are derived through an orthogonal application of their 1D counterparts. Combinations of 3 to 5 base stencils can then be created. We describe how these are implemented in a parallel advection solver. Various robust stencil combinations are explored, representing tradeoffs between performance and robustness. The results indicate that the 3-stencil robust combinations are slightly faster on large parallel workloads than Triple Modular Redundancy (TMR), with one third of the memory footprint. We expect the improvement to be significant if suitable optimizations are performed. Because faults are avoided each time new points are computed, the proposed stencils are also comparable in robustness to TMR for a large range of error rates. The technique can be generalized to 3D (or higher dimensions) with similar benefits.
Peter Strazdins, Brian Lee, Brendan Harding, Jackson Mayo, Jaideep Ray, Robert Armstrong
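
For context, the TMR baseline against which the robust stencils are compared can be sketched on the same 1D Lax-Wendroff update (the robust stencil combinations themselves are the paper's contribution and are not reproduced here).

```python
import numpy as np

def lax_wendroff_step(u, c):
    """One periodic Lax-Wendroff step for u_t + a u_x = 0, c = a*dt/dx."""
    up, um = np.roll(u, -1), np.roll(u, 1)
    return u - 0.5 * c * (up - um) + 0.5 * c**2 * (up - 2 * u + um)

def tmr_step(u, c):
    """Triple Modular Redundancy: three replicas, elementwise majority
    vote (median), so a single corrupted replica is outvoted."""
    return np.median([lax_wendroff_step(u.copy(), c) for _ in range(3)], axis=0)

x = np.linspace(0.0, 1.0, 200, endpoint=False)
u = tmr_step(np.sin(2 * np.pi * x), c=0.5)
```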

Multiscale Modelling and Simulation, 13th International Workshop (MSCALE) Session 2

Time and Date: 16:20 - 18:00 on 7th June 2016

Room: Rousseau East

Chair: D. Groen

180 Variance-reduced HMM for Stochastic Slow-Fast Systems [abstract]
Abstract: We propose a novel variance reduction strategy based on control variables for simulating the averaged equation of a stochastic slow-fast system. In this system, we assume that the fast equation is ergodic for every fixed value of the slow variable, implying the existence of an invariant measure. The right-hand side of the averaged equation contains an integral with respect to this unknown invariant measure, which is approximated by the heterogeneous multiscale method (HMM). The HMM method corresponds to a Markov chain Monte Carlo method in which samples are generated by simulating the fast equation. As a consequence, the variance of the HMM estimator decays slowly. Therefore, we introduce a variance-reduced HMM estimator based on control variables: from the current-time HMM estimate, we subtract a second HMM estimate at the previous time step computed with the exact same seed as the current-time estimate. To avoid introducing a bias, we add the previously calculated variance-reduced estimator. We analyze convergence of the proposed estimator and apply it to a linear and a nonlinear model problem.
Ward Melis, Giovanni Samaey
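
The recursion in the abstract can be demonstrated on a toy in which the "HMM estimator" is a seeded Monte Carlo average; reusing the seed makes the same-seed difference nearly deterministic when the slow variable changes slowly. Everything below is a stand-in, not the paper's slow-fast system.

```python
import numpy as np

def hmm_estimate(mu, seed, n=100):
    """Seeded Monte Carlo average standing in for the HMM estimator
    at slow-variable value mu."""
    rng = np.random.default_rng(seed)
    return (mu + rng.standard_normal(n)).mean()

mus = np.linspace(0.0, 1.0, 50)           # slowly varying slow variable
vr = hmm_estimate(mus[0], seed=0)
for t in range(1, len(mus)):
    # current-time estimate minus same-seed previous-time estimate,
    # plus the previously calculated variance-reduced value
    vr = hmm_estimate(mus[t], seed=t) - hmm_estimate(mus[t - 1], seed=t) + vr
print(vr, "vs true value", mus[-1])
```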
78 Coupled lattice Boltzmann and link-flux simulations of electrokinetic transport in porous media [abstract]
Abstract: Porous materials are instrumental in a wide range of micro- and nanoscale engineering applications, and porous media research continues to spawn innovations, e.g., in the biomedical and energy domains. For instance, oil recovery from reservoir rocks relies on multiphase flows that are governed by a complicated interplay of capillary pressures, permeability, and wettability of the porous formation. When a porous medium is filled with an electrolyte, dissociation or adsorption of ionic groups leads to a net surface charge which is compensated by an excess distribution of counterions in the bulk fluid. Under an applied electric field the charged ions accelerate the fluid, resulting in electro-osmotic flow. Simple analytical theories can explain the basic effects but fail to quantitatively predict more complex electrokinetic transport phenomena affected by multiscale effects arising from the coupling of hydrodynamic, electrokinetic, and diffusive transport. Mesoscopic simulation techniques have proven successful in solving numerically the coupled partial differential equations describing such systems, in particular when complex boundary conditions have to be taken into account. We present pore-scale simulations of electro-osmotic flow through a charged porous geometry using coupled lattice Boltzmann and link-flux methods implemented in our LB3D code. We investigate the dependence of the macroscopic fluxes on bulk ionic concentration, salt concentration, and applied potential gradient. Moreover, we apply the moment propagation method to calculate diffusion coefficients for neutral, cationic and anionic tracer particles in a simple model pore. The results reveal a crossover of the effective diffusion coefficient for charged tracers, and a non-monotonic dispersion coefficient depending on the salt concentration and Debye length. These findings are relevant for potential upscaling strategies between microscopic and macroscopic transport coefficients. Future extensions of our simulation approach include multiphase flows with charged amphiphilic surfactants, and more generally charged fluids for novel functional materials.
Ulf D. Schiller and Peter V. Coveney
278 Multiscale Modeling and Simulation of Rolling Contact Fatigue [abstract]
Abstract: A multiscale model is developed to study rolling contact fatigue and predict fatigue lives. At the nanoscale, molecular dynamics simulations of confined n-alkanes are performed to calculate the friction coefficient of the contact surface in the presence of lubrication. Then, the finite element method is used to conduct fatigue analysis of roller contact elements at the macroscale. The fatigue crack initiation life and the position of the initial crack can be estimated. This work can be viewed as a framework for studying mechanical systems subject to cyclic loads with the consideration of lubrication effects.
Shaoping Xiao, Ali Ghaffari and Yan Zhang
121 High-fidelity multiscale/multiphysics simulation of laser propagation in optically-active semiconductors [abstract]
Abstract: We compute the interaction between the light field and the non-linear polarization of an optically active semiconductor medium. This coupling strongly influences laser light behavior. For the macroscale calculations we have adapted the streamline-upwind/Petrov-Galerkin finite element method (FEM) to compute the laser propagation within the paraxial approximation. On the microscale, we compute the medium polarization local to the FEM Gauss points, using both a simple model (optical Kerr) and a sophisticated model (Semiconductor-Maxwell-Bloch equations within Monte Carlo simulation). The coupled medium-lightfield response is handled using the Hierarchical Multiscale Method. This approach enables large-scale, high-fidelity calculations of both the laser propagation and the material response on high-performance computers.
Brent Kraczek and Jaroslaw Knap
142 An MPMD approach to discrete modeling of small scale plasticity [abstract]
Abstract: The strength of crystalline materials is controlled, to a large extent, by the motion of dislocations. In materials with a high density of microstructural features, the motion of dislocations is restricted, resulting in increased strength. The small-scale plasticity occurring in the vicinity of microstructure is a fundamentally multiscale phenomenon. Bridging the characterization of individual dislocations at the atomistic scale with the macroscopic plastic response from cooperative dislocation motion at the continuum scale remains an open challenge. A method well suited to small-scale plasticity is discrete dislocation dynamics (DDD), where plasticity is explicitly captured by the motion of dislocations. However, the computational expense of DDD grows immensely with the introduction of microstructure. Furthermore, parallel scalability is limited by the inherent differences in domain decomposition and load balancing when modeling both dislocations and microstructure. To address these issues, we have developed a multiple program multiple data (MPMD) approach to incorporating the effects of microstructure on plasticity. In this method, we couple DDD with a finite element (FE) solver to account for microstructural effects. Each application is executed separately to provide optimal domain decomposition, load balancing, and concurrency. Communication between applications is performed in parallel using distributed shared memory (DSM). In the present work, we analyze the performance of this algorithm and demonstrate the ability to model small-scale plasticity in previously intractable systems.
Joshua Crone, Kenneth Leiter, Lynn Munday, James Ramsey and Jaroslaw Knap