collectives

2025 – Present

Designing scalable and performant collective communication for distributed deep learning on GPU supercomputers, targeting the skewed and dynamic traffic patterns of modern workloads such as Mixture-of-Experts (MoE).

SABRE — Skew-aware Adaptive All-to-allv · ICS 2026 · PDF

Proposed skew-aware adaptive all-to-allv algorithms that adapt the communication strategy to the highly imbalanced, runtime-varying message sizes of dynamic deep-learning workloads.

The Big Send-off — Scalable and Performant Collectives for Deep Learning · IPDPS 2026 · PDF

Built high-performance collective communication primitives for deep learning on GPU-based supercomputers, designed to scale to large GPU counts.