All times are CST
Tuesday, Nov. 15
10:45 a.m. (Virtual Zoom Meeting Link)
Bogdan Nicolae, Argonne National Laboratory
“Perspectives on the Versatility of a Searchable Lineage for Scalable HPC Data Management”
Checkpointing is the most widely used approach to provide resilience for HPC applications by enabling restart in case of failures. However, coupled with a searchable lineage that records the evolution of intermediate data and metadata during runtime, it can become a powerful technique in a wide range of scenarios at scale, such as: verifying and understanding results more thoroughly by sharing and analyzing intermediate results (which facilitates provenance, reproducibility, and explainability); supporting new algorithms and ideas that frequently reuse and revisit intermediate and historical data (either fully or partially); and manipulating application states (job pre-emption using suspend-resume, debugging). This talk advocates a new data model and associated tools (DataStates, VELOC) that facilitate such scenarios. Rather than using a data service directly to read and write datasets, users tag datasets at runtime with properties that express hints, constraints, and persistency semantics. Doing so automatically generates a searchable record of intermediate data checkpoints, or data states, optimized for I/O. Such an approach brings new capabilities and enables high performance, scalability, and FAIR-ness through a range of transparent optimizations. The talk will introduce DataStates and VELOC, highlight several vital technical details, and conclude with several examples of where they were successfully applied.
11:30 a.m. (Virtual Zoom Meeting Link)
Andrew Tasman Powis, Princeton Plasma Physics Laboratory
“Beyond Fusion – Plasma Simulation for the Semiconductor Industry”
The Princeton Plasma Physics Laboratory (PPPL) has pursued and delivered excellence in scientific high-performance computing and algorithm design for many decades. This includes development of the gyrokinetic algorithm and delivery of code bases such as XGC, GTS, TRANSP, M3D, and Gkeyll, which are widely utilized within the burning plasma and heliophysics communities. This foundation of skills in plasma physics, applied math, and computer science also lends itself readily to a more diverse set of applications, and the laboratory is growing its efforts to facilitate computational modelling of low-temperature plasma (LTP) phenomena and plasma chemistry. LTPs are widely applied in industry, most notably within the semiconductor manufacturing sector, which is predicted to double in value to over $1 trillion by the end of this decade. Combining its computational heritage and expertise in low-temperature plasma theory and experiments, the lab is developing a new open-source Low-Temperature Plasma Particle-in-Cell code to support this thrust. The software has been tested on NERSC’s Perlmutter, and code validation is being performed in collaboration with experimentalists and industry partners around the globe. Additionally, we are leveraging codes such as LAMMPS and Gaussian to capture plasma surface chemical processes relevant to this domain of plasma physics. This talk will focus on PPPL’s legacy in high-performance computing and explain how the lab is leveraging that experience to tackle some of the greatest challenges facing our world today using advanced supercomputers.
1:00 p.m. (Virtual Zoom Meeting Link)
Sunita Chandrasekaran and Johannes Doerfert, Brookhaven National Laboratory and Lawrence Livermore National Laboratory
“ECP SOLLVE and its race to Frontier”
OpenMP is a popular tool for on-node programming that is supported by a strong community of vendors, national labs, and academic groups. Several Exascale Computing Project (ECP) applications include OpenMP as part of their strategy for reaching exascale levels of performance. This talk presents the ECP SOLLVE project, in which we continue to work with application partners and members of the OpenMP language committee to extend the OpenMP feature set to meet ECP application needs, especially with regard to accelerator support. The talk will present the latest updates on the LLVM/Clang implementations and enhancements and their applicability to ECP applications and beyond. We will also present the current status of OpenMP offloading compiler implementations on pre-exascale and exascale systems, assessing their maturity and stability using our validation and verification test suite.
1:45 p.m. (Virtual Zoom Meeting Link)
James A. Ang, Pacific Northwest National Laboratory
“New Horizons for HPC”
High Performance Computing is entering an era that will require significant adaptations; fundamental technologies are changing, new models of computing are emerging, and traditional ecosystems are being disrupted. The speaker describes an open innovation model, guided by HPC as a lead user and enabled by the CHIPS and Science Act, that can be an organizing principle for future computing research, bridge the valley of death with new public-private partnership models, and address the critical role of workforce development.
2:30 p.m. (Virtual Zoom Meeting Link)
Dominic Manno, Los Alamos National Laboratory
“GUFI: The Grand Unified File Index: Performant, Secure, Accessible, and Extensible, Pick Any Four”
Modern data centers routinely store massive data sets resulting in millions of directories and billions of files to support thousands of simultaneous users. While existing file systems store metadata that makes it possible to query the location of specific data sets or determine which data sets are responsible for the most capacity use per user, such queries typically do not perform well at the scale of modern data center file counts. In this paper we describe the Grand Unified File Index (GUFI), which enables both data center users and data center administrators to rapidly and securely search and sift through billions of file entries to locate and characterize data sets of interest. The hierarchical indexing used by GUFI preserves access permissions so that the index can be directly and securely accessed by users, and it also enables advanced analysis of storage system use at a large-scale data center. Further, the indexing method used in GUFI is extremely extensible, making data center customization trivial. Compared to the existing state-of-the-art index for file system metadata, GUFI is able to provide speedups of 1.5x to 230x for queries executed by administrators using a real file system namespace. Queries executed by users, which typically cannot rely on data center wide indexing services, see even greater speedups using GUFI.
3:15 p.m. (Virtual Zoom Meeting Link)
Inder Monga, Lawrence Berkeley National Laboratory
“ESnet6: How ESnet’s Next-generation Infrastructure Will Enable Integrated Research Initiative Workflows”
This talk will discuss the newly completed upgrade of the ESnet6 infrastructure, including the complexities of completing the project during the pandemic. ESnet Executive Director Inder Monga will provide a brief overview of the architecture of the new facility, the bandwidth deployed, the automation software stack, and the services it enables. The focus will be on recent demonstrations with laboratories that illustrate the support for the upcoming Integrated Research Initiative and how the features of ESnet6 enable that vision.
4:00 p.m. (Virtual Zoom Meeting Link)
Shantenu Jha, Brookhaven National Laboratory
“ZettaWorks: Taking ExaWorks to the next frontier”
High-performance workflows are necessary for scientific discovery. We outline how ExaWorks is enabling workflows at extreme scales, and a vision for ExaWorks beyond exascale.