Image Processing
Oct 2020 – 2022
High-performance image-processing libraries on CPUs and embedded GPUs.
High-performance Image Processing on ARMv8 · 2020 – 2021
- Categorized image-processing algorithms into three classes (data-independent, data-sharing, and irregular-memory-access) and tailored an optimization strategy to each.
- Built the library using ARM NEON intrinsics for the low-level kernels and OpenMP for multi-threaded performance.
- Optimized at the algorithm, memory-access, SIMD, and assembly-instruction levels to raise throughput.
- Achieved speedups of 1.2x (cvtColor), 2x (Resize), and 2x (Filter) over the OpenCV library.
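As a rough illustration of the first class, where every output pixel depends only on its own input pixel, the sketch below converts interleaved BGR to 8-bit gray and splits the rows across threads with OpenMP. It is not the library's code: the function name `bgr_to_gray`, the fixed-point weights (77, 150, 29)/256, and the rounding are assumptions chosen for the example, and the scalar inner loop stands in for where a NEON kernel would go.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper for illustration only; not part of the library above.
// Converts interleaved BGR to 8-bit gray with fixed-point weights
// (77, 150, 29) / 256, which approximate 0.299 R + 0.587 G + 0.114 B.
std::vector<std::uint8_t> bgr_to_gray(const std::vector<std::uint8_t>& bgr,
                                      int width, int height) {
    std::vector<std::uint8_t> gray(static_cast<std::size_t>(width) * height);

    // Each row is independent of every other row, so rows can be divided
    // among threads. The pragma is ignored if OpenMP is not enabled.
    #pragma omp parallel for
    for (int y = 0; y < height; ++y) {
        const std::uint8_t* src =
            bgr.data() + static_cast<std::size_t>(y) * width * 3;
        std::uint8_t* dst = gray.data() + static_cast<std::size_t>(y) * width;

        // Scalar inner loop; a SIMD version would process several pixels
        // per iteration here.
        for (int x = 0; x < width; ++x) {
            const unsigned b = src[3 * x];
            const unsigned g = src[3 * x + 1];
            const unsigned r = src[3 * x + 2];
            // +128 rounds to nearest before the divide-by-256 shift.
            dst[x] = static_cast<std::uint8_t>(
                (77 * r + 150 * g + 29 * b + 128) >> 8);
        }
    }
    return gray;
}
```

Because no pixel reads its neighbours, this class is typically limited by memory bandwidth, which is why the thread split is over contiguous rows.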
EgpuIP: Embedded GPU Accelerated Image Processing · HPCC 2022
An embedded-GPU image-processing library built on OpenGL ES (GLES), addressing the lack of a general optimization guide for image processing on mobile GPUs.
- Proposed performance optimization chains that classify image-processing algorithms into three modes — data-independent, data-sharing, and data-related — by their memory-access and computation characteristics.
- Derived four optimization directions: (1) optimize off-chip memory access for memory-bound data-independent algorithms; (2) exploit data locality via shared memory and cache for data-sharing algorithms; (3) redesign algorithms to share computational results across threads for data-related algorithms; and (4) fully utilize compute resources across all three.
- Built the EgpuIP library and validated it on histogram equalization, Gaussian pyramid, and integral filter, achieving up to 19x, 88x, and 3x speedups over OpenCV, respectively.
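To make the idea of sharing computed results concrete, here is a minimal CPU reference for an integral image (summed-area table), on the assumption that the integral filter builds on one. It is not the EgpuIP GLES implementation: the function name `integral_image` and the two-pass layout are illustrative choices. The point it shows is that each output reuses a previously computed partial sum instead of re-adding every pixel above and to the left.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical CPU reference for illustration; not the EgpuIP GLES code.
// Returns sat where sat[y * width + x] is the sum of img over the
// rectangle from (0, 0) to (x, y), inclusive.
std::vector<std::uint32_t> integral_image(const std::vector<std::uint8_t>& img,
                                          int width, int height) {
    const std::size_t w = static_cast<std::size_t>(width);
    const std::size_t h = static_cast<std::size_t>(height);
    std::vector<std::uint32_t> sat(w * h);

    // Pass 1: running sum along each row. Rows do not depend on each other.
    for (std::size_t y = 0; y < h; ++y) {
        std::uint32_t run = 0;
        for (std::size_t x = 0; x < w; ++x) {
            run += img[y * w + x];
            sat[y * w + x] = run;
        }
    }

    // Pass 2: running sum down each column, reusing the row sums from
    // pass 1. Columns do not depend on each other.
    for (std::size_t y = 1; y < h; ++y) {
        for (std::size_t x = 0; x < w; ++x) {
            sat[y * w + x] += sat[(y - 1) * w + x];
        }
    }
    return sat;
}
```

Splitting the work into a row pass and a column pass keeps each pass parallel across rows or columns, while the dependency between outputs is carried by the stored partial sums.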