ACM Gordon Bell Prize

2024 Gordon Bell Prize Finalists

MProt-DPO: Breaking the ExaFLOPS Barrier for Multimodal Protein Design Workflows with Direct Preference Optimization

Gautham Dharuman, Kyle Hippe, Alexander Brace, Sam Foreman, Väinö Hatanpää, Varuni K. Sastry, Huihuo Zheng, Logan Ward, Servesh Muralidharan, Archit Vasan, Bharat Kale, Carla M. Mann, Heng Ma, Yun-Hsuan Cheng, Yuliana Zamora, Shengchao Liu, Chaowei Xiao, Murali Emani, Tom Gibbs, Mahidhar Tatineni, Deepak Canchi, Jerome Mitchell, Koichi Yamada, Maria Garzaran, Michael E. Papka, Ian Foster, Rick Stevens, Anima Anandkumar, Venkatram Vishwanath, Arvind Ramanathan

This novel work presents a scalable, multimodal workflow for protein design that trains an LLM to generate protein sequences, computationally evaluates the generated sequences, and then uses the evaluations to fine-tune the model. Direct Preference Optimization (DPO) steers the LLM toward generating preferred sequences, and enhanced workflow technology enables efficient execution. A 3.5B and a 7B model demonstrate the scalability and exceptional mixed-precision performance of the full workflow on Alps, Aurora, Frontier, Leonardo, and PDX.
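
For readers unfamiliar with the method, the DPO objective itself is compact. Below is a minimal sketch of the preference loss in PyTorch, with hypothetical tensor names; it is not the authors' MProt-DPO code, just the standard DPO formulation applied to scored protein sequences.

```python
import torch
import torch.nn.functional as F

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Standard DPO loss over a batch of preference pairs.

    Inputs are summed per-sequence log-probabilities under the policy
    being fine-tuned and under a frozen reference model; 'chosen'
    sequences scored well in the computational evaluation, 'rejected'
    sequences did not. (Hypothetical names, not the paper's code.)
    """
    chosen_ratio = logp_chosen - ref_logp_chosen        # log pi/pi_ref, preferred
    rejected_ratio = logp_rejected - ref_logp_rejected  # log pi/pi_ref, dispreferred
    # Push the preferred ratio above the dispreferred one, scaled by beta
    return -F.logsigmoid(beta * (chosen_ratio - rejected_ratio)).mean()

# Toy usage with random per-sequence log-probabilities
lp_c, lp_r, ref_c, ref_r = (torch.randn(8) for _ in range(4))
print(dpo_loss(lp_c, lp_r, ref_c, ref_r))
```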

Affiliations: Argonne National Laboratory, California Institute of Technology, CINECA, NVIDIA Corp., University of California at Berkeley

Toward Capturing Genetic Epistasis From Multivariate Genome-Wide Association Studies Using Mixed-Precision Kernel Ridge Regression

Hatem Ltaief, Rabab Alomairy, Qinglei Cao, Jie Ren, Lotfi Slim, Thorsten Kurth, Benedikt Dorschner, Salim Bougouffa, Rached Abdelkhalak, David E. Keyes

This team developed an output-accuracy-preserving method for exploiting low-precision data types in matrix computations and used it to create a highly efficient Cholesky-based solver. Their tile-centric adaptive-precision matrix operations and task-based execution enabled the largest-ever Genome-Wide Association Study (GWAS), covering 305K patients from a real data set and using a multivariate approach to identify genetic risk factors. Its outstanding scaling and very high mixed-precision performance were demonstrated on 8,100 GPUs of Alps, 36,100 GPUs of Frontier, 4,096 GPUs of Leonardo, and 18,432 GPUs of Summit.
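
The core idea of such solvers, factor in low precision and recover full accuracy afterward, can be illustrated with a generic mixed-precision refinement loop. This is a minimal NumPy sketch of a kernel ridge regression solve, not the team's tile-based, task-parallel implementation.

```python
import numpy as np

def krr_solve_mixed(K, y, lam=1e-3, refine_steps=3):
    """Solve (K + lam*I) alpha = y for kernel ridge regression.

    The Cholesky factorization runs in float32; iterative refinement in
    float64 recovers full accuracy -- a generic stand-in for the paper's
    tile-centric adaptive-precision scheme.
    """
    A = K + lam * np.eye(K.shape[0])
    L = np.linalg.cholesky(A.astype(np.float32))  # low-precision factorization

    def lowprec_solve(b):
        z = np.linalg.solve(L, b.astype(np.float32))       # forward substitution
        return np.linalg.solve(L.T, z).astype(np.float64)  # backward substitution

    alpha = lowprec_solve(y)
    for _ in range(refine_steps):      # correct the residual in full precision
        r = y - A @ alpha
        alpha += lowprec_solve(r)
    return alpha

rng = np.random.default_rng(0)
X = 0.3 * rng.standard_normal((500, 10))
K = np.exp(-np.sum((X[:, None] - X[None]) ** 2, axis=-1))  # Gaussian kernel
alpha = krr_solve_mixed(K, rng.standard_normal(500))
```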

Affiliations: KAUST, Massachusetts Institute of Technology, Saint Louis University, NVIDIA Corp.

Breaking the Molecular Dynamics Timescale Barrier Using a Wafer-Scale System

Kylee Santos, Stan Moore, Tomas Oppelstrup, Amirali Sharifian, Ilya Sharapov, Aidan Thompson, Delyan Z. Kalchev, Danny Perez, Robert Schreiber, Scott Pakin, Edgar A. Leon, James H. Laros III, Michael James, Sivasankaran Rajamanickam

This team created an Embedded Atom Method (EAM) molecular dynamics code that exploits the ultra-fast communication and high memory bandwidth afforded by the 850,000-core Cerebras Wafer-Scale Engine. It attains perfect weak scaling across the full system for grain-boundary problems involving copper, tungsten, and tantalum atoms, and can extend to multiple wafers. For problems up to 800,000 atoms, it computes significantly more timesteps per second than EAM in LAMMPS on Quartz and Frontier, directly benefiting the modeling of phenomena that emerge at long timescales.
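
For reference, the per-atom energy the code evaluates at every timestep has the standard EAM form (a textbook statement, not a detail specific to this paper), where F is the embedding function, rho a neighbor's electron-density contribution, and phi a pair potential:

```latex
E_i = F_\alpha\!\Bigl(\sum_{j \neq i} \rho_\beta(r_{ij})\Bigr)
      + \frac{1}{2} \sum_{j \neq i} \phi_{\alpha\beta}(r_{ij})
```

Both sums run only over nearby neighbors, which is what lets the method map well onto the wafer's fine-grained mesh of cores and fast local communication.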

Affiliations: Cerebras Systems, Sandia National Laboratories, Lawrence Livermore National Laboratory, Los Alamos National Laboratory

Democratizing AI: Open-source Scalable LLM Training on GPU-based Supercomputers

Siddharth Singh, Prajwal Singhania, Aditya Ranjan, John Kirchenbauer, Jonas Geiping, Yuxin Wen, Neel Jain, Abhimanyu Hans, Manli Shu, Aditya Tomar, Tom Goldstein, Abhinav Bhatele

This work presents AxoNN, a scalable, portable, open-source framework for training and fine-tuning large language models (LLMs), and describes the team's optimizations that enable it to handle LLMs with hundreds of billions to trillions of parameters. The authors use AxoNN on Frontier to study the potential for memorization of training data by models with up to 405 billion parameters. Their evaluations show the exceptional scaling and performance attained when training GPT-style transformer models with up to 640 billion parameters on Alps, Frontier, and Perlmutter.
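
As an illustration of the kind of parallelism such frameworks build on, the sketch below shows a generic column-parallel linear layer using torch.distributed. It is not AxoNN's actual API, just a minimal example of sharding one weight matrix across GPUs.

```python
import torch
import torch.distributed as dist

class ColumnParallelLinear(torch.nn.Module):
    """Each rank holds out_features // world_size columns of the weight;
    an all-gather reassembles the full activation. Generic sketch only --
    run under torchrun with a process group already initialized, and note
    that training would need an autograd-aware all_gather."""

    def __init__(self, in_features, out_features):
        super().__init__()
        world = dist.get_world_size()
        assert out_features % world == 0
        self.local = torch.nn.Linear(in_features, out_features // world)

    def forward(self, x):
        y_local = self.local(x)  # each rank computes its shard of the output
        shards = [torch.empty_like(y_local) for _ in range(dist.get_world_size())]
        dist.all_gather(shards, y_local)   # collect every rank's shard
        return torch.cat(shards, dim=-1)   # full activation on every rank
```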

Affiliations: University of Maryland, Max Planck Institute for Intelligent Systems, University of California, Berkeley

Breaking the Million-Electron and 1 EFLOP/s Barriers: Biomolecular-Scale Ab Initio Molecular Dynamics Using MP2 Potentials

Ryan Stocks, Jorge L. Galvez Vallejo, Fiona C. Y. Yu, Calum Snowdon, Elise Palethorpe, Jakub Kurzak, Dmytro Bykov, Giuseppe M. J. Barca

This work describes a novel approach for accurately simulating complex biochemical phenomena via biomolecular-scale ab initio molecular dynamics simulations at the quantum molecular wave function level. Multiple algorithmic innovations overcome the computational challenges posed by second-order Møller-Plesset (MP2) perturbation theory. Evaluated on biomolecules with up to 2,043,328 electrons, the code exhibits high parallel efficiency on the full Perlmutter and Frontier systems and sustained an unprecedented 59% of FP64 peak performance on Frontier.
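
The quantity driving the cost is the standard MP2 correlation energy, a sum over occupied spin orbitals i, j and virtual spin orbitals a, b:

```latex
E^{(2)} = \frac{1}{4} \sum_{ij}^{\text{occ}} \sum_{ab}^{\text{virt}}
          \frac{\lvert \langle ij \| ab \rangle \rvert^{2}}
               {\varepsilon_i + \varepsilon_j - \varepsilon_a - \varepsilon_b}
```

The formally O(N^5) growth of this sum with system size is exactly the barrier that the paper's algorithmic innovations address.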

Affiliations: University of Melbourne, Australian National University, AMD Inc., Oak Ridge National Laboratory

2024 ACM Gordon Bell Prize for Climate Modeling Finalists

Boosting Earth System Model Outputs and Saving PetaBytes in Their Storage Using Exascale Climate Emulators

Sameh Abdulah, Allison H. Baker, George Bosilca, Qinglei Cao, Stefano Castruccio, Marc G. Genton, David E. Keyes, Zubair Khalid, Hatem Ltaief, Yan Song, Georgiy L. Stenchikov, and Ying Sun

Advancing our understanding of global climate change at the local scale poses both massive computational and data-storage challenges. This work leverages state-of-the-art high-performance computing to develop a scalable exascale climate emulator, illustrating the potential of supercomputers to address the challenges of climate modeling at ultra-high resolution. The authors also demonstrate significant savings in the petabytes of storage required to store and share climate simulations. Their approach uses the Spherical Harmonic Transform (SHT) to model spatio-temporal climate data and can accommodate climate data sourced at various spatial resolutions. The emulator holds significant potential for the climate community, advancing climate research and policymaking by making high-resolution climate data more readily available to address the threat of global climate change.
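
Schematically, the emulator represents a climate field on the sphere by its spherical harmonic expansion,

```latex
f(\theta, \varphi) \approx \sum_{\ell=0}^{L} \sum_{m=-\ell}^{\ell}
    a_{\ell m}\, Y_{\ell m}(\theta, \varphi)
```

and modeling the coefficients a_lm, rather than full gridded output, underlies both the resolution flexibility and the storage savings described above.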

Affiliations: Extreme Computing & Statistics & Earth Science, King Abdullah University of Science and Technology, KSA; Computational and Information Sciences Lab, NSF National Center for Atmospheric Research, USA; NVIDIA, USA; Department of Computer Science, Saint Louis University, USA; Department of Applied and Computational Mathematics and Statistics, University of Notre Dame, USA

ORBIT: Oak Ridge Base Foundation Model for Earth System Predictability

Xiao Wang, Siyan Liu, Aristeidis Tsaris, Jong-Youl Choi, Ashwin M. Aji, Ming Fan, Wei Zhang, Junqi Yin, Moetasim Ashfaq, Dan Lu, Prasanna Balaprakash

The combination of rapidly advancing AI techniques and abundant simulation data from multi-model ensemble projects such as CMIP6 forms the basis for ORBIT, the Oak Ridge Base Foundation Model for Earth System Predictability.

The team designed ORBIT to scale up to 113 billion parameters, making it the largest dense vision transformer to date, and incorporated a record-setting 91 channels of climate variables. ORBIT was pre-trained on 10 different CMIP6 datasets comprising 1.2 million observation data points. During training it delivered up to 1.6 exaFLOPS for the 10-billion-parameter model and 684 petaFLOPS for the 113-billion-parameter model on 49,152 Frontier AMD GPUs, using mixed single/BFLOAT16 precision and a novel Hybrid Sharded Tensor-Data Orthogonal Parallelism (Hybrid-STOP) scheme.
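
The "orthogonal" in Hybrid-STOP refers to combining sharded tensor parallelism and data parallelism along independent axes of a process grid. The sketch below builds such orthogonal groups with torch.distributed; it is a generic illustration of the idea, not the ORBIT code.

```python
import torch.distributed as dist

def build_2d_groups(tp_size):
    """Arrange all ranks in a (dp_size x tp_size) grid: rows share model
    shards (tensor parallelism), columns average gradients (data
    parallelism). Generic sketch; every rank must call this identically."""
    world, rank = dist.get_world_size(), dist.get_rank()
    dp_size = world // tp_size
    tp_group = dp_group = None
    for row in range(dp_size):                 # ranks holding shards of one replica
        ranks = list(range(row * tp_size, (row + 1) * tp_size))
        g = dist.new_group(ranks)
        if rank in ranks:
            tp_group = g
    for col in range(tp_size):                 # ranks holding the same shard
        ranks = list(range(col, world, tp_size))
        g = dist.new_group(ranks)
        if rank in ranks:
            dp_group = g
    return tp_group, dp_group
```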

Importantly, this approach does not rely on specialized HPC architectures, making it applicable to a broad range of HPC systems. ORBIT thus represents a significant step forward in advancing Earth system modeling capabilities.

Affiliations: Oak Ridge National Laboratory, Oak Ridge, United States; AMD Research and Advanced Development, Santa Clara, United States

2023 Gordon Bell Prize Finalists

Large-scale Materials Modeling at Quantum Accuracy: Ab Initio Simulations of Quasicrystals and Interacting Extended Defects in Metallic Alloys

Sambit Das, Bikash Kanungo, Vishal Subramanian, and others (eight authors total) as part of a team that includes the University of Michigan, Indian Institute of Science, and Oak Ridge National Laboratory

In this work, the team developed a mixed method that combines density functional theory (DFT) with the quantum many-body (QMB) problem using a machine learning technique. The effort achieves high calculation accuracy and affords large-scale modeling through an inverse-DFT approach that links the QMB method to DFT. The team performed ground-state energy calculations at accuracy commensurate with QMB methods, using more than 60% of the Frontier supercomputer housed within the Oak Ridge Leadership Computing Facility.

Exascale Multiphysics Nuclear Reactor Simulations for Advanced Designs

Elia Merzari, Steven Hamilton, Thomas Evans, and others (12 authors total), featuring a team from Pennsylvania State University, Oak Ridge National Laboratory, Argonne National Laboratory, and the University of Illinois at Urbana-Champaign

The team simulated an advanced nuclear reactor system, coupling radiation transport with heat and fluid simulation via the high-fidelity, high-resolution Monte Carlo code Shift and the computational fluid dynamics code NekRS. Nek5000/RS ran on ORNL's Frontier system with 1 billion spectral elements and 350 billion degrees of freedom, while Shift achieved very high weak scaling on 8,192 nodes. As a result, the team calculated six reactions in 214,896 fuel pin regions with below 1% statistical error, a first-of-a-kind resolution for a Monte Carlo transport application.
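
Multiphysics couplings of this kind are typically organized as an alternating fixed-point (Picard) iteration between the two solvers. The loop below is a schematic with hypothetical callables standing in for the Shift and NekRS interfaces.

```python
import numpy as np

def coupled_solve(transport_solve, cfd_solve, T0, max_iters=10, tol=1e-4):
    """Picard iteration between radiation transport and thermal fluids.

    transport_solve maps a temperature field to a heat-deposition (power)
    field; cfd_solve maps power back to temperature. Both callables are
    hypothetical stand-ins, not the actual Shift/NekRS APIs.
    """
    T = np.asarray(T0, dtype=float)
    for _ in range(max_iters):
        q = transport_solve(T)       # Monte Carlo tally of heat deposition
        T_new = cfd_solve(q)         # conjugate heat-transfer solve
        if np.max(np.abs(T_new - T)) < tol:  # fields mutually consistent
            return T_new
        T = T_new
    return T
```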

Scaling the Leading Accuracy of Deep Equivariant Models to Biomolecular Simulations of Realistic Size

Albert Musaelian, Anders Johansson, Simon Batzner, and Boris Kozinsky as part of a team from the Harvard John A. Paulson School of Engineering and Applied Sciences

The group developed the Allegro architecture to bridge the accuracy-speed tradeoff of atomistic simulations and enable the description of dynamics in structures of unprecedented complexity at quantum fidelity. This is achieved through a combination of innovative model architecture, massive parallelization, and model implementations optimized for efficient GPU utilization. Allegro's scalability is illustrated by a nanoseconds-long stable simulation of protein dynamics and by a 44-million-atom structure of a complete, all-atom, explicitly solvated HIV capsid on the Perlmutter system at the National Energy Research Scientific Computing Center. The team achieved strong scaling up to 100 million atoms.

See the full list of 2023 Gordon Bell finalists [HERE].

2023 Gordon Bell Prize for Climate Modeling

The Simple Cloud-Resolving E3SM Atmosphere Model Running on the Frontier Exascale System

Authors: Mark Taylor, Peter M. Caldwell, Luca Bertagna, Conrad Clevenger, Aaron S. Donahue, James G. Foucar, Oksana Guba, Benjamin R. Hillman, Noel Keen, Jayesh Krishna, Matthew R. Norman, Sarat Sreepathi, Christopher R. Terai, James B. White III, Danqing Wu, Andrew G. Salinger, Renata B. McCoy, L. Ruby Leung, and David C. Bader

This work introduces an efficient, performance-portable implementation of the Simple Cloud-Resolving E3SM Atmosphere Model (SCREAM), a full-featured cloud-resolving atmospheric global circulation model. A significant advancement is that SCREAM was developed anew in C++ and incorporates the Kokkos library to abstract the on-node execution model for both CPUs and GPUs; to date, only a few global atmosphere models have been ported to GPUs. SCREAM ran on both AMD and NVIDIA GPUs and on nearly an entire exascale system (Frontier), where it achieved groundbreaking performance: 1.26 simulated years per day for a practical cloud-resolving simulation. This constitutes a pivotal stride in climate modeling, offering the enhanced, urgently needed predictions of the potential outcomes of future climate change.

See the full list of 2023 Gordon Bell Prize for Climate Modeling finalists [HERE].
