Tuesday, Nov. 13
Pete Beckman, Argonne National Laboratory
“The Tortoise and the Hare: Is There Still Time for HPC to Catch Up to the Cloud in the Performance Race?”
Speed and scale define supercomputing. By many metrics, our supercomputers are the fastest, most capable systems on the planet. We have succeeded in deploying extreme-scale systems with high reliability, extended uptime, and large user communities. Computational science at extreme scale is leading to scientific breakthroughs. Over the past twenty years, however, the HPC community has grown overconfident in its designs for system software and networking, while the cloud computing community has steadily added new software features and intelligent networking. From containers and virtual machines to software-defined networking and FPGAs in the fabric, the hyperscalers have been steadily building advanced systems. Has the cloud computing community already won the race? Can HPC regain leadership in the design and architecture of flexible system software and leverage containers, advanced operating systems, reconfigurable fabrics, and software-defined networking? Come learn about Argo, an operating system project for the Exascale Computing Project; how “Fluid HPC” could make large-scale systems more flexible; and how the HPC community might leverage these new technologies.
“Fermilab’s Quantum Computing Program”
Fermilab’s Panagiotis Spentzouris will discuss the goals and strategy of the Fermilab Quantum Science Program, which includes simulation of quantum field theories, development of algorithms for high-energy physics computational problems, teleportation experiments, and the application of qubit technologies to quantum sensors in high-energy physics experiments.
Sriram Krishnamoorthy, Pacific Northwest National Laboratory
“Intense National Focus on QIS”
PNNL scientist Sriram Krishnamoorthy invites you to learn how the scientific grand challenge of quantum chemistry will benefit from quantum computers. PNNL, with its depth of experience in computational chemistry, is currently exploring and designing the quantum chemistry problems that can benefit most from quantum computers. In addition, PNNL’s computer scientists and computational chemists are working closely with industry partners to jointly design the first quantum computing-based quantum chemistry calculations that surpass the limits of classical supercomputers. In this talk, Krishnamoorthy will describe these efforts and collaborations as well as other ongoing quantum computing-related activities at PNNL.
“Introducing NERSC-9, Berkeley Lab’s Next-Generation Pre-Exascale Supercomputer”
The NERSC-9 pre-exascale system, to be deployed in 2020, will support the broad Office of Science user community. The system is designed to meet the needs of simulation and modeling as well as the analysis of data from DOE’s experimental facilities. This talk will announce and describe the NERSC-9 system for the SC18 community, including architectural features and plans for transitioning NERSC’s 7,000-member user community.
Inder Monga, Lawrence Berkeley National Laboratory
“ESnet6: Design of the Next-Generation Science Network”
Because of the dramatically increasing size of datasets and the need to make scientific data broadly accessible, ESnet is designing ESnet6, its next-generation network. The network will offer higher bandwidth, more growth capability, advanced features tailored for modern science, and the resilience necessary to support DOE’s core research mission. The talk will discuss the conceptual ESnet6 architecture, which comprises a programmable, scalable, and resilient hollow core coupled with a flexible, dynamic, and programmable services edge. ESnet6 will feature services that monitor and measure the network to make sure it is operating at peak performance. These services will also facilitate advanced cybersecurity capabilities, providing the control and management needed to protect the network.
“The Ristra Project: Preparing for Multi-Physics Simulation at Exascale”
Two key challenges on the path to efficient multi-physics simulation on exascale-class computing platforms are (a) abstracting exascale hardware from multi-physics code development, and (b) solving integral problems at multiple physical scales. Ristra, a four-year-old Los Alamos project under the Advanced Technology Development and Mitigation (ATDM) sub-program of the DOE ASC program, is developing a toolkit for multi-physics code development based around a computer science interface (FleCSI) that limits the impact of disruptive computer technology on physics developers. FleCSI enables the adoption of novel programming models and data management methods to address the challenges and diversity of new technology. Simultaneously, Ristra is exploring the use of multi-scale numerical methods that offer improved physics fidelity and computing efficiency. The Ristra software architecture and progress to date will be presented, together with early results of simulations in solid mechanics and multi-scale radiation hydrodynamics.
“Machine Learning and Predictive Simulation: HPC and the U.S. Cancer Moonshot on Sierra”
The marriage of experimental science with simulation has been a fruitful one: the fusion of HPC-based simulation and experimentation moves science forward faster than either discipline alone, rapidly testing hypotheses and identifying promising directions for future research. The emergence of machine learning at scale promises to bring a new type of thinking into the mix, incorporating data analytics techniques alongside traditional HPC to accompany experiment. I will discuss the convergence of machine learning, predictive simulation, and experiment in the context of one element of the U.S. Cancer Moonshot: a multi-scale investigation of Ras biology in realistic membranes.
“The BigPanDA Project: A Workflow and Workload Management System for High Energy and Nuclear Physics and for Extreme-Scale Scientific Applications”
The PanDA software is used for workload management on distributed grid resources by the ATLAS experiment at the LHC. With funding from the US Department of Energy (DOE-ASCR), an effort called BigPanDA was launched to extend PanDA to access HPC resources. Through this successful effort, ATLAS today uses over 25 million hours monthly on the Titan supercomputer at Oak Ridge National Laboratory. Many challenges were met and overcome in using HPCs for ATLAS simulations. ATLAS uses two different operational modes at Titan. The traditional mode uses allocations, which require software innovations to fit the low-latency requirements of experimental science; new techniques were implemented to shape large jobs using allocations on a leadership-class machine. In the second mode, work is constantly sent to Titan to backfill idle nodes around high-priority leadership-class jobs. This has resulted in impressive gains in overall utilization of Titan while benefiting the physics objectives of ATLAS. For both modes, BigPanDA has integrated traditional grid computing with HPC architecture.
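As a rough illustration of the backfill mode described above, the sketch below shapes a pilot job to fit whatever nodes the scheduler currently reports idle; the function name, thresholds, and walltime policy are hypothetical and are not PanDA's or Titan's actual interfaces.

```python
# Illustrative sketch only: shaping a backfill pilot to fit idle nodes around
# leadership-class jobs. All names and limits below are assumptions.

MAX_NODES_PER_PILOT = 300    # assumed cap on a single backfill pilot
MIN_NODES_WORTH_USING = 16   # assumed threshold below which submission isn't worthwhile

def shape_pilot(idle_nodes: int, max_walltime_min: int = 120):
    """Return (nodes, walltime_min) for a pilot that fits the current gap, or None."""
    if idle_nodes < MIN_NODES_WORTH_USING:
        return None
    nodes = min(idle_nodes, MAX_NODES_PER_PILOT)
    # Keep walltime short so the pilot drains before the next large job starts.
    walltime = min(max_walltime_min, 90)
    return nodes, walltime

# Example: the scheduler reports 220 idle nodes, so a 220-node, 90-minute pilot
# of simulation tasks would be submitted to soak up the gap.
print(shape_pilot(220))   # -> (220, 90)
print(shape_pilot(8))     # -> None (gap too small to be useful)
```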
Wednesday, Nov. 14
Kerstin Kleese van Dam, Brookhaven National Laboratory
“Real Time Performance Analysis of Applications and Workflows”
As part of the ECP CODAR project, Brookhaven National Laboratory, in collaboration with the University of Oregon's TAU team, has developed unique capabilities to analyze, reduce, and visualize single-application and complete-workflow performance data in situ. The resulting tool enables researchers to examine and explore their workflow's performance as it is being executed.
Arthur “Buddy” Bland, Oak Ridge National Laboratory
“An Overview of ORNL’s Summit Supercomputer”
In June 2018, the U.S. Department of Energy’s Oak Ridge National Laboratory unveiled Summit as the world’s most powerful and smartest scientific supercomputer. Summit has a peak performance of 200 petaflops and, for certain scientific applications, is also capable of more than three billion billion mixed-precision calculations per second, or 3.3 exaops. Summit will provide unprecedented computing power for research in energy, advanced materials, and artificial intelligence (AI), among other domains, enabling scientific discoveries that were previously impractical or impossible.
Mike Sprague, National Renewable Energy Laboratory
“ExaWind: Towards Predictive Wind Farm Simulations on Exascale Platforms”
This talk will describe the ExaWind Exascale Computing Project, which is in pursuit of predictive wind turbine and wind plant simulations. Predictive, physics-based high-fidelity computational models, validated with targeted experiments, provide the most efficacious path to understanding wind plant physics and reducing wind plant losses. Predictive simulations will require blade-resolved moving meshes, high-resolution grids to resolve the flow structures, hybrid-RANS/LES turbulence modeling, fluid-structure interaction, and coupling to meso-scale flows. The modeling and algorithmic pathways of ExaWind include unstructured-grid finite volume spatial discretization and pressure-projection methods for incompressible flow. The ExaWind code is Nalu-Wind, which is built on Trilinos/STK and employs the Kokkos abstraction layer for performance portability. Results will be shown for turbine simulations with the Hypre and Trilinos linear-system solver stacks with particular focus on strong scaling performance on NERSC Cori and NREL Peregrine and the underlying algebraic multigrid (AMG) preconditioners. We also describe new Hypre results on SummitDev at OLCF, and recent MW-scale single-turbine simulations under turbulent inflow.
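For reference, the pressure-projection approach named above typically splits each time step into a velocity prediction, a pressure Poisson solve, and a divergence-free correction. The generic (Chorin-type) form is sketched below; this is the textbook method, not necessarily the exact discretization used in Nalu-Wind, and the elliptic pressure solve is where AMG preconditioners such as those in hypre and Trilinos are usually applied.

```latex
% Generic incompressible pressure-projection step (Chorin-type), shown for
% reference only; not the specific Nalu-Wind formulation.
\begin{align*}
  \frac{\mathbf{u}^{*} - \mathbf{u}^{n}}{\Delta t}
    &= -(\mathbf{u}^{n}\cdot\nabla)\,\mathbf{u}^{n} + \nu\,\nabla^{2}\mathbf{u}^{n}
    && \text{(predict a provisional velocity } \mathbf{u}^{*}\text{)} \\
  \nabla^{2} p^{\,n+1}
    &= \frac{\rho}{\Delta t}\,\nabla\cdot\mathbf{u}^{*}
    && \text{(pressure Poisson equation, the AMG-preconditioned solve)} \\
  \mathbf{u}^{n+1}
    &= \mathbf{u}^{*} - \frac{\Delta t}{\rho}\,\nabla p^{\,n+1}
    && \text{(project onto divergence-free velocities)}
\end{align*}
```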
Yee Ting Li, SLAC National Accelerator Laboratory
“Hyperscale (Petabyte, Exabyte and Beyond) Data Distribution for Delivery of LCLS-II Free Electron Laser Data to Supercomputers”
The next-generation Linac Coherent Light Source (LCLS-II) at SLAC is planned to achieve first light in 2020. The potential data rates are 1,000 times greater than those of the existing LCLS. By 2025, experimenters will need to stream data from the detectors at SLAC to DOE supercomputers at rates substantially exceeding a terabit per second. Since 2014, we have been working to create an effective solution for hyperscale data distribution. Using 5-rack-unit co-located clusters and links with 80 Gbit/s of capacity over a 5,000-mile path, we recently transferred a petabyte of encrypted data in a world-leading 29 hours. Our next steps are to transport data from SLAC to NERSC over a 100 Gbps ESnet link, compare software solutions, and evaluate Intel Optane SSDs.
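A quick back-of-the-envelope check of the quoted transfer, assuming a decimal petabyte, shows how close the sustained rate ran to the 80 Gbit/s link capacity:

```python
# Back-of-the-envelope check of the transfer described above: 1 PB in 29 hours
# versus an 80 Gbit/s link. The decimal-petabyte assumption is ours.
petabyte_bits = 1e15 * 8          # 1 PB = 8e15 bits
seconds = 29 * 3600               # 29 hours
avg_gbits_per_sec = petabyte_bits / seconds / 1e9
print(f"average throughput ~ {avg_gbits_per_sec:.1f} Gbit/s")   # ~ 76.6 Gbit/s
print(f"link utilization   ~ {avg_gbits_per_sec / 80:.0%}")     # ~ 96% of 80 Gbit/s
```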
Doug Kothe, Oak Ridge National Laboratory
“Exascale Computing Project Update”
An update on the U.S. Department of Energy’s Exascale Computing Project, a multi-lab, seven-year collaborative effort focused on accelerating the delivery of a capable exascale computing ecosystem by 2021. The goal of the ECP is to enable breakthrough solutions that can address our most critical challenges in scientific discovery, energy assurance, economic competitiveness, and national security. The project is a joint effort of two U.S. Department of Energy (DOE) organizations: the Office of Science and the National Nuclear Security Administration (NNSA).
Jim Laros, Sandia National Laboratories
“Vanguard-Astra: NNSA Advanced Architecture Prototype Platform”
Jim Brandt, Sandia National Laboratories
“Platform Independent Run Time HPC Monitoring, Analysis, and Feedback at Any-Scale”
Large-scale HPC simulation applications may execute across thousands to millions of processor threads. Contention for network and/or file system resources, and mismatches in processor, memory, and network resources, can have a significant impact on application performance. Such effects can stem from a variety of sources, from manufacturing variation to resource allocation to power and cooling variation, and more. This talk presents a suite of scalable tools, developed by Sandia, to gain insight into per-instance causes of application performance degradation. We present background, architectural details, and actual use-case examples of monitoring sources, data, and run-time analyses of that data. We also present how the output can directly inform application users and operations staff about application and system performance characteristics, as well as be used to provide feedback to applications and system software components. The tools are not only useful for the insights they provide but are also fun to use and can provide hours of enjoyment for users, operations staff, and researchers trying to identify ways to architect more efficient systems and applications.
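As a rough illustration of the kind of per-node, run-time check such monitoring data enables, the sketch below flags nodes whose measured metric falls well below the job-wide median; the metric, node names, and threshold are hypothetical and are not the Sandia tools' actual interfaces.

```python
# Illustrative sketch only: a generic robust-outlier test over per-node
# monitoring data. All names and the threshold below are assumptions.
from statistics import median

def flag_slow_nodes(node_metrics, rel_threshold=0.8):
    """Return nodes whose metric (e.g., measured memory bandwidth in GB/s)
    falls below rel_threshold times the median across the job's nodes."""
    med = median(node_metrics.values())
    return [node for node, value in node_metrics.items()
            if value < rel_threshold * med]

# Example: one node delivers far less memory bandwidth than its peers, a
# possible sign of manufacturing variation, contention, or a cooling problem.
sample = {"nid0001": 118.0, "nid0002": 121.5, "nid0003": 119.8, "nid0004": 62.3}
print(flag_slow_nodes(sample))   # -> ['nid0004']
```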
Graham Heyes, Thomas Jefferson National Accelerator Facility
“Streaming Data for Nuclear Physics Experiments”
The computing workflow model for most nuclear physics experiments has remained relatively unchanged for over thirty years. Data is read from detectors, heavily filtered to reduce the data rate, and stored. At a later date, the data is retrieved and processed using thousands of individual jobs on a batch system. The final, compute-intensive processing has been performed locally because network bandwidth limited offsite data access. The whole process is slow, with weeks or months between steps, and forces scientists to make choices in advance of data taking that affect data quality. Advances in all aspects of computing are beginning to make possible a model, new to nuclear physics, in which filtering is relaxed and data is streamed in parallel through various stages of online and near-line processing. This results in rich, multi-dimensional datasets that can be made accessible for processing using grid, cloud, or leadership-class computing facilities. This is a much more responsive workflow with minimal filtering of the raw data, which leaves decisions affecting science quality as late as possible. At Jefferson Lab, several aspects of this computing model are being investigated. It is expected that, on the five- to ten-year timescale, streaming data readout and processing will become the norm.
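A minimal sketch of the staged streaming model described here is shown below, with only a loose noise cut applied early and heavier processing deferred to online and near-line stages; the stage names, event fields, and trigger condition are hypothetical, not Jefferson Lab's actual readout software.

```python
# Illustrative sketch only: a staged streaming pipeline with relaxed early
# filtering. Stage names, fields, and thresholds are assumptions.

def readout(detector_events):
    """Stream events straight off the detector readout."""
    yield from detector_events

def light_filter(events):
    """Relaxed filtering: drop only events that are clearly noise, leaving
    science-quality decisions to later stages."""
    for ev in events:
        if ev.get("adc_sum", 0) > 10:     # loose noise threshold (assumed)
            yield ev

def online_processing(events):
    """First-pass reconstruction while data streams; downstream grid, cloud,
    or leadership-class stages can refine these quick estimates."""
    for ev in events:
        ev["energy_estimate"] = 0.01 * ev["adc_sum"]
        yield ev

# Example: three raw events flow through the pipeline; only the obvious noise
# event is dropped, and everything else is kept for near-line and offline work.
raw = [{"adc_sum": 2}, {"adc_sum": 150}, {"adc_sum": 85}]
for ev in online_processing(light_filter(readout(raw))):
    print(ev)
```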