collectives
2025 – Present
Designing scalable and performant collective communication for distributed deep learning on GPU supercomputers, targeting the skewed and dynamic traffic patterns of modern workloads such as Mixture-of-Experts (MoE).
SABRE — Skew-aware Adaptive All-to-allv · ICS 2026 · PDF
- Proposed skew-aware adaptive all-to-allv algorithms for dynamic deep-learning workloads that adapt the communication strategy to highly imbalanced, runtime-varying message sizes.
The Big Send-off — Scalable and Performant Collectives for Deep Learning · IPDPS 2026 · PDF
- Built high-performance collective communication primitives for deep learning on GPU-based supercomputers, designed to scale across large GPU counts.