Cunyang Wei

Ph.D. Student at the Parallel Software and Systems Group, University of Maryland, College Park.

Department of Computer Science

University of Maryland

College Park, MD, US

cunyang [at] umd.edu

I am a Ph.D. student at the University of Maryland, College Park, advised by Prof. Abhinav Bhatele in the Parallel Software and Systems Group (PSSG). My current research focuses on parallel optimization for large-scale graph neural networks (GNNs), performance variability, and collective communication.

Before joining UMD, I completed my Master’s degree in 2023 at the Institute of Computing Technology (ICT), University of Chinese Academy of Sciences, Beijing, China, where I was advised by Prof. Haipeng Jia. My work at ICT focused on irregular matrix multiplication and led to first-author publications in TPDS, ICPP, and HPCC.

My research interests include high-performance computing, parallel algorithms, and optimizing AI workloads for modern hardware. I have been honored with awards such as the National Scholarship of China and Outstanding Graduate of Beijing.

For additional details, please refer to my full CV.

selected publications

  1. ICS
    Skew-aware Adaptive All-to-allv Algorithms for Dynamic Deep Learning Workloads
    Cunyang Wei and Abhinav Bhatele
    In ACM International Conference on Supercomputing (ICS), 2026
  2. IPDPS
    The Case of the Elusive Application Performance on Production GPU Supercomputers
    Cunyang Wei, Keshav Pradeep, and Abhinav Bhatele
    In Proceedings of the IEEE International Parallel & Distributed Processing Symposium (IPDPS), 2026
  3. SC
    Plexus: Taming Billion-edge Graphs with 3D Parallel GNN Training
    Aditya K. Ranjan, Siddharth Singh, Cunyang Wei, and 1 more author
    In Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis (SC), 2025
  4. TPDS
    IrGEMM: An Input-Aware Tuning Framework for Irregular GEMM on ARM and X86 CPUs
    Cunyang Wei, Haipeng Jia, Yunquan Zhang, and 3 more authors
    IEEE Transactions on Parallel and Distributed Systems (TPDS), 2024
  5. ICPP
    IATF: An Input-Aware Tuning Framework for Compact BLAS Based on ARMv8 CPUs
    Cunyang Wei, Haipeng Jia, Yunquan Zhang, and 2 more authors
    In International Conference on Parallel Processing (ICPP), 2022