A compressible flow solver based on GPU-CUDA point-granularity heterogeneous parallelism
Abstract: Using graphics processing unit (GPU) many-core technology to improve the performance of computational fluid dynamics (CFD) software is a development trend in high-performance computing. One key issue is achieving computational acceleration and efficient simulation of CFD software on small and medium-sized workstations and even personal computers (PCs). Based on NVIDIA CUDA (compute unified device architecture), this article adopts the data-parallel lower-upper relaxation (DPLUR) implicit time-marching scheme, a grid-block-granularity parallel scheme built on Jacobi iteration and the Lax-Friedrichs flux scheme, and establishes a CPU/GPU heterogeneous parallel Reynolds-averaged Navier-Stokes (RANS) solver for three-dimensional compressible flows. The key factors that affect the solver's acceleration, and the strategies for dealing with them, are studied. Parallel efficiency is improved by allocating the number of threads per thread block appropriately, reducing data communication between host and device, making full use of shared memory, and solving the Navier-Stokes equations efficiently. On a PC, the solver achieves a speedup of more than 20 times over a traditional central processing unit (CPU) solver. Test results show that, in three-dimensional cases, the DPLUR-based solver improves convergence speed by up to 40 times compared with the lower-upper symmetric Gauss-Seidel (LUSGS) iteration used in traditional solvers.

Keywords:
- computational fluid dynamics /
- graphics processing unit /
- heterogeneous parallelism /
- data-parallel lower-upper relaxation implicit time-marching scheme /
- lower-upper symmetric Gauss-Seidel iteration scheme
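The abstract contrasts a Jacobi-based relaxation (DPLUR) with the symmetric Gauss-Seidel sweeps of LUSGS. As a rough illustration of why the former maps well onto many threads, here is a minimal Python sketch on a hypothetical scalar tridiagonal model system. It is not the article's RANS solver; the coefficients `DIAG` and `OFF`, the system size, and all function names are assumptions chosen for illustration.

```python
# Model system (assumed): DIAG*x[i] + OFF*x[i-1] + OFF*x[i+1] = rhs[i],
# with missing neighbours at the ends treated as zero.
DIAG, OFF = 4.0, -1.0


def neighbours(x, i):
    """Return the left and right neighbour values of entry i (zero outside)."""
    left = x[i - 1] if i > 0 else 0.0
    right = x[i + 1] if i < len(x) - 1 else 0.0
    return left, right


def jacobi_relaxation(rhs, sweeps):
    """Jacobi-type relaxation: each entry reads only the previous sweep,
    so all entries of one sweep could be updated concurrently."""
    x = [r / DIAG for r in rhs]
    for _ in range(sweeps):
        old = x
        x = [(rhs[i] - OFF * sum(neighbours(old, i))) / DIAG
             for i in range(len(rhs))]
    return x


def symmetric_gauss_seidel(rhs, sweeps):
    """Forward then backward sweep: each entry uses freshly updated
    neighbours, which creates a sequential dependency along the sweep."""
    n = len(rhs)
    x = [0.0] * n
    for _ in range(sweeps):
        for order in (range(n), range(n - 1, -1, -1)):
            for i in order:
                x[i] = (rhs[i] - OFF * sum(neighbours(x, i))) / DIAG
    return x


def max_residual(rhs, x):
    """Largest absolute residual of the model system."""
    return max(abs(rhs[i] - DIAG * x[i] - OFF * sum(neighbours(x, i)))
               for i in range(len(rhs)))


rhs = [1.0] * 64
res_jacobi = max_residual(rhs, jacobi_relaxation(rhs, 20))
res_sgs = max_residual(rhs, symmetric_gauss_seidel(rhs, 20))
```

On this model problem the Gauss-Seidel sweeps reduce the residual further per sweep, while the Jacobi update has no dependency between entries within a sweep. That independence is the property a data-parallel scheme relies on when each grid point is assigned to its own GPU thread.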
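Table 1. Hardware characteristics of CPU and GPU

| Hardware | Characteristics |
| --- | --- |
| CPU | Many storage and control units, few compute units; suited to memory allocation and logic-control computation. |
| GPU | Few storage and control units, many compute units; suited to compute-intensive and highly concurrent computation. |

Table 2. Test setup for parallel topology mode

| Group | Grid Dim | Block Dim |
| --- | --- | --- |
| 1 | (i, j, 1) | (k, 1, 1) |
| 2 | (i, k, 1) | (j, 1, 1) |
| 3 | (j, k, 1) | (i, 1, 1) |
| 4 | (1, j, k) | (i, 1, 1) |

Table 3. Hardware configuration

| Hardware | Model | Cores | Max. memory / video memory (GB) |
| --- | --- | --- | --- |
| CPU | I9-13900K | 24 | 128 |
| GPU | RTX3090 | 10496 | 24 |

Table 4. Test cases with different grid sizes

| Group | Grid size | Total grid points |
| --- | --- | --- |
| 1 | 273×193×3 | 324576 |
| 2 | 137×97×3 | 84000 |
| 3 | 69×49×3 | 30429 |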
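The launch configurations in Table 2 can be compared with a small Python sketch. It assumes that each triple is a CUDA `dim3`, that every entry other than i, j, k equals 1, that the letters stand for the number of points along that grid direction, and that the flow arrays are stored with i as the fastest-varying index. None of these assumptions is stated in the source. The extents are borrowed from the first grid size in Table 4, and the function name `launch_summary` is hypothetical.

```python
from math import prod

# Table 2 configurations: (Grid Dim, Block Dim), unit entries written as 1.
CONFIGS = {
    1: (("i", "j", 1), ("k", 1, 1)),
    2: (("i", "k", 1), ("j", 1, 1)),
    3: (("j", "k", 1), ("i", 1, 1)),
    4: ((1, "j", "k"), ("i", 1, 1)),
}


def launch_summary(extent, grid_dim, block_dim):
    """Return (blocks, threads per block, memory stride between adjacent threads).

    The stride assumes the linear index i + ni*(j + nj*k), i.e. i varies fastest.
    """
    def size(dims):
        return prod(extent[d] if isinstance(d, str) else d for d in dims)

    strides = {"i": 1, "j": extent["i"], "k": extent["i"] * extent["j"]}
    thread_axis = block_dim[0]  # direction spanned by threadIdx.x
    return size(grid_dim), size(block_dim), strides[thread_axis]


extent = {"i": 273, "j": 193, "k": 3}  # assumed mapping of 273×193×3 to (i, j, k)
summary = {group: launch_summary(extent, grid, block)
           for group, (grid, block) in CONFIGS.items()}

for group, (blocks, threads, stride) in summary.items():
    print(f"group {group}: {blocks} blocks x {threads} threads, stride {stride}")
```

Under these assumptions, groups 3 and 4 place adjacent threads on adjacent memory locations (stride 1), the access pattern that GPU memory coalescing favours, whereas group 1 launches many blocks of only three threads each. The source does not state which configuration performed best, so this is only a way of reading the table.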
