Distributed GNNs
Dec 2024 – Present
Plexus tames billion-edge graphs with a 3D parallel approach to full-graph GNN training, distributing both graph data and computation across thousands of GPUs.
- Proposed a novel 3D parallel algorithm that addresses the memory, communication, and load-balancing challenges of large-scale GNN training.
- Designed a performance model to automatically select optimal 3D virtual GPU grid configurations, and a double-permutation scheme to achieve near-perfect load balancing for sparse graph data.
- Scaled full-graph GNN training to 2048 GPUs on the Frontier and Perlmutter supercomputers, delivering up to a 54.2x speedup over state-of-the-art frameworks.
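
The two ideas in the second bullet can be illustrated with a small sketch. This is not the Plexus implementation: the function names, the synthetic skewed graph, and the reading of "double permutation" as independent random row and column permutations of the adjacency matrix are all assumptions made for illustration. `grid_configs` only enumerates the candidate 3D grids that a performance model would rank, and the other two functions measure how evenly edge entries fall across a 2D block grid before and after permuting.

```python
import numpy as np


def grid_configs(num_gpus):
    """Return every ordered (x, y, z) with x * y * z == num_gpus.

    Hypothetical helper: a performance model would score each of these
    candidate grids; no scoring is attempted here.
    """
    configs = []
    for x in range(1, num_gpus + 1):
        if num_gpus % x:
            continue
        rest = num_gpus // x
        for y in range(1, rest + 1):
            if rest % y == 0:
                configs.append((x, y, rest // y))
    return configs


def block_imbalance(rows, cols, n, px, py):
    """Max-over-mean count of edge entries across a px-by-py block grid.

    A value of 1.0 means every block holds the same number of entries.
    """
    # Map each edge entry to the block that owns its (row, col) position.
    block_row = rows * px // n
    block_col = cols * py // n
    counts = np.bincount(block_row * py + block_col, minlength=px * py)
    return counts.max() / counts.mean()


def permuted_imbalance(rows, cols, n, px, py, seed=0):
    """Imbalance after relabeling rows and columns with two independent
    random permutations (an assumed stand-in for a double permutation)."""
    rng = np.random.default_rng(seed)
    row_perm = rng.permutation(n)
    col_perm = rng.permutation(n)
    return block_imbalance(row_perm[rows], col_perm[cols], n, px, py)


def skewed_edges(n, num_edges, seed=0):
    """Synthetic edge list whose endpoints cluster at low vertex ids.

    Duplicate edges are kept and counted, which is fine for this demo.
    """
    rng = np.random.default_rng(seed)
    rows = (rng.random(num_edges) ** 3 * n).astype(np.int64)
    cols = (rng.random(num_edges) ** 3 * n).astype(np.int64)
    return rows, cols
```

Calling `block_imbalance` and `permuted_imbalance` on the output of `skewed_edges(1024, 200_000)` with a 4-by-4 block grid shows the intent: the skewed layout piles most entries into one corner block, while the permuted layout spreads them far more evenly.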
Published at SC 2025.