Cerebras Systems, the pioneer in accelerating generative AI, announced a 130x speedup over Nvidia A100 GPUs on a key nuclear energy HPC simulation kernel developed by researchers at Argonne National Laboratory. The result demonstrates the performance and versatility of the Cerebras Wafer-Scale Engine (WSE-2) and helps ensure that the U.S. remains the global leader in supercomputing for energy and defense applications.
Monte Carlo particle transport is a major focus in the field of HPC because it provides high-fidelity simulation of radiation transport and is vital to fission and fusion reactor design. In this research collaboration, a Cerebras CS-2 system dramatically outperformed a highly optimized GPU implementation on the most demanding portion of the Monte Carlo neutron particle transport algorithm: the macroscopic cross-section lookup kernel. This kernel is the most computationally intensive portion of the full simulation, accounting for up to 85% of total runtime in many nuclear energy applications. The work further validates Argonne’s ALCF AI Testbed program, which aims to bring AI accelerators to the forefront of U.S. supercomputing infrastructure, exploring capabilities beyond what is achievable with GPUs.
“I’ve implemented this kernel in a half dozen different programming models and have run it on just about every HPC architecture over the last decade,” said John R. Tramm, Assistant Computational Scientist, Argonne National Laboratory. “The performance numbers we were able to get out of the Cerebras machine impressed our team, a clear advancement over what has been possible on CPU or GPU architectures to date. Our team’s work adds to growing evidence that AI accelerators have serious potential to disrupt GPU dominance in the field of HPC simulation.”
Monte Carlo neutron particle transport provides high-fidelity simulation of radiation transport, a critical component of fission and fusion reactor design. Within this algorithm, the macroscopic cross-section lookup kernel assembles the statistical distribution data used to generate random samples of a particle’s behavior as it moves through a simulated geometry and interacts with various materials. ANL scientists implemented an optimized version of the macroscopic cross-section lookup kernel using the Cerebras SDK and the CSL programming language. The implementation took advantage of the Cerebras CS-2’s wafer-scale architecture, with 850,000 cores and 40GB of on-chip SRAM, which provides a combination of extreme bandwidth and low latency: an ideal match for Monte Carlo particle simulations. This research also validates the ability of external researchers to develop their own HPC applications for the Cerebras architecture, unlocking new levels of performance on a wide variety of computational problems.
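For readers unfamiliar with the kernel, its core operation can be sketched in a few lines. This is a minimal, illustrative Python sketch in the spirit of the macroscopic cross-section lookup described above (modeled loosely on the logic of public Monte Carlo proxy apps, not on Argonne’s actual CSL implementation); all names and data below are hypothetical:

```python
import bisect

def micro_xs(energy_grid, xs_values, energy):
    """Linearly interpolate a nuclide's microscopic cross section at `energy`."""
    # Find the energy-grid interval containing `energy` via binary search.
    i = bisect.bisect_right(energy_grid, energy) - 1
    i = max(0, min(i, len(energy_grid) - 2))  # clamp to a valid interval
    e0, e1 = energy_grid[i], energy_grid[i + 1]
    f = (energy - e0) / (e1 - e0)
    return xs_values[i] + f * (xs_values[i + 1] - xs_values[i])

def macro_xs(material, energy):
    """Sum density-weighted microscopic cross sections over a material's nuclides."""
    total = 0.0
    for nuclide, density in material:  # density in atoms per barn-cm
        total += density * micro_xs(nuclide["grid"], nuclide["xs"], energy)
    return total

# Toy data: two nuclides with tiny illustrative energy grids (MeV).
h1 = {"grid": [1e-5, 1.0, 20.0], "xs": [30.0, 20.0, 1.0]}
o16 = {"grid": [1e-5, 1.0, 20.0], "xs": [4.0, 3.8, 1.5]}
water = [(h1, 0.066), (o16, 0.033)]

print(macro_xs(water, 1.0))
```

In production codes the grids hold tens of thousands of points per nuclide, and this lookup runs billions of times per simulation, which is why its memory-latency profile dominates runtime and why on-chip SRAM helps so much.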
“These published results highlight not only the incredible performance of the CS-2, but also its architectural efficiency,” said Andrew Feldman, CEO and co-founder of Cerebras Systems. “The Cerebras CS-2 system, powered by the WSE-2 processor, has 48x more transistors than the A100 but achieved a 130x speedup, showing a 2.7x gain in architectural efficiency for a problem that is widely optimized for GPUs.”
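The efficiency figure in the quote follows directly from the two ratios given:

```python
# Sanity check of the quoted figures: a 130x speedup achieved with 48x
# the transistor count implies roughly 130 / 48, about a 2.7x gain in
# per-transistor (architectural) efficiency.
speedup = 130
transistor_ratio = 48
print(round(speedup / transistor_ratio, 1))  # 2.7
```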
Moreover, the Cerebras CS-2 performed well on both small- and large-scale simulations. The researchers noted that smaller simulations cannot be efficiently divided across many GPUs, so in that regime no number of GPUs working in parallel could match the performance of a single CS-2.
The Cerebras CS-2, powered by the WSE-2, is purpose-built for generative AI and scientific applications. It has delivered remarkable results, often characterized as “100x” improvements in scientific computing. Notably, in a multi-dimensional seismic processing project conducted by the King Abdullah University of Science and Technology (KAUST), a cluster of 48 CS-2s achieved performance comparable to the world’s fastest supercomputer. Similarly, researchers at the National Energy Technology Laboratory used the CS-2 to run computational fluid dynamics simulations 470 times faster than NETL’s Joule supercomputer. Additionally, at TotalEnergies, the CS-2 accelerated stencil computations by 228 times compared to a GPU-based solution.
SOURCE : BusinessWire