Swift Unfolding of Communities: GPU-Accelerated Louvain Algorithm
Published in PPoPP 25, 2025
Recommended citation: Wang, Zhibin, Xi Lin, Xue Li, Pinhuan Wang, Ziheng Meng, Hang Liu, Chen Tian, and Sheng Zhong. "Swift Unfolding of Communities: GPU-Accelerated Louvain Algorithm." In Proceedings of the 30th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, pp. 441-454. 2025. https://dl.acm.org/doi/abs/10.1145/3710848.3710884
The Louvain algorithm is one of the most popular algorithms for community detection. Observing that existing implementations suffer from inaccurate pruning and inefficient intermediate state management, we introduce GALA, GPU-Accelerated Louvain Algorithm, which incorporates two key innovations. The first innovation is a novel modularity gain-based pruning strategy, supported by rigorous theoretical guarantees of optimality and able to reduce up to 76% of vertices as well as their corresponding computations. To take advantage of the memory hierarchy and parallelism of GPUs, the second innovation is workload-aware kernels, featuring a shuffle-based kernel founded on the warp-level primitives for exchange states and a hash-based kernel that prioritizes shared memory in hashtable design. GALA further scales to multiple GPUs by minimizing the synchronization overhead between GPUs through a dense-sparse synchronization strategy. We evaluate the performance of GALA through theoretical analysis and practical experiments on various real-world graphs. The experimental results confirm that GALA significantly improves the performance of the parallel Louvain algorithm on GPUs, surpassing state-of-the-art solutions by 6x on average.
Recommended citation: Wang, Zhibin, Xi Lin, Xue Li, Pinhuan Wang, Ziheng Meng, Hang Liu, Chen Tian, and Sheng Zhong. “Swift Unfolding of Communities: GPU-Accelerated Louvain Algorithm.” In Proceedings of the 30th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, pp. 441-454. 2025.