publications

publications by year in reverse chronological order.

2026

  1. ICS
    Skew-aware Adaptive All-to-allv Algorithms for Dynamic Deep Learning Workloads
    Cunyang Wei and Abhinav Bhatele
    In ACM International Conference on Supercomputing (ICS), 2026
  2. IPDPS
    The Case of the Elusive Application Performance on Production GPU Supercomputers
    Cunyang Wei, Keshav Pradeep, and Abhinav Bhatele
    In Proceedings of the IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2026
  3. IPDPS
    The Big Send-off: Scalable and Performant Collectives for Deep Learning
    Siddharth Singh, Keshav Pradeep, Mahua Singh, and 2 more authors
    In Proceedings of the IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2026

2025

  1. SC
    Plexus: Taming Billion-edge Graphs with 3D Parallel GNN Training
    Aditya K. Ranjan, Siddharth Singh, Cunyang Wei, and 1 more author
    In Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis (SC), 2025
  2. Poster
    Unmasking Performance Variability in GPU Codes on Production Supercomputers
    Cunyang Wei, Rishi Keshav Pradeep, and Abhinav Bhatele
    2025

2024

  1. TPDS
    IrGEMM: An Input-Aware Tuning Framework for Irregular GEMM on ARM and X86 CPUs
    Cunyang Wei, Haipeng Jia, Yunquan Zhang, and 3 more authors
    IEEE Transactions on Parallel and Distributed Systems (TPDS), 2024
  2. IPDPS
    VNEC: A Vectorized Non-Empty Column Format for SpMV on CPUs
    Luhan Wang, Haipeng Jia, Lei Xu, and 4 more authors
    In Proceedings of the IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2024

2023

  1. ICPADS
    SA_TRSM: A Shape-Aware Auto-Tuning Framework for Small-Scale Irregular-Shaped TRSM
    Rongyuan Guo, Haipeng Jia, Yunquan Zhang, and 3 more authors
    In IEEE International Conference on Parallel and Distributed Systems (ICPADS), 2023

2022

  1. ICPP
    IATF: An Input-Aware Tuning Framework for Compact BLAS Based on ARMv8 CPUs
    Cunyang Wei, Haipeng Jia, Yunquan Zhang, and 2 more authors
    In International Conference on Parallel Processing (ICPP), 2022
  2. HPCC
    LBBGEMM: A Load-Balanced Batch GEMM Framework on ARM CPUs
    Cunyang Wei, Haipeng Jia, Yunquan Zhang, and 2 more authors
    In IEEE International Conference on High Performance Computing & Communications (HPCC), 2022
  3. HPCC
    EgpuIP: An Embedded GPU Accelerated Library for Image Processing
    Luhan Wang, Haipeng Jia, Yunquan Zhang, and 2 more authors
    In IEEE International Conference on High Performance Computing & Communications (HPCC), 2022