A compressible stream solver based on GPU-CUDA point granular heterogeneous parallel

WANG Qingchi (王清池), LAN Xudong (兰旭东), LI Haoyu (李昊昱), DUAN Yiqin (段依沁)

Citation: WANG Qingchi, LAN Xudong, LI Haoyu, et al. A compressible stream solver based on GPU-CUDA point granular heterogeneous parallel[J]. Journal of Aerospace Power, 2026, 41(5):20240435 doi: 10.13224/j.cnki.jasp.20240435

doi: 10.13224/j.cnki.jasp.20240435
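Details
    About the author:

    WANG Qingchi (born 2001), female, master's degree; research interests: computational fluid dynamics and fluid mechanics. E-mail: wqc22@mails.tsinghua.edu.cn

    Corresponding author:

    LAN Xudong (born 1981), male, professor, Ph.D.; research interests: aerodynamics, heat transfer, engines, and complete-vehicle development of unmanned helicopters. E-mail: lanxd@mail.tsinghua.edu.cn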

  • CLC number: V211.4; TB126

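  • Abstract:

    Using GPU many-core technology to improve the performance of CFD software is a development trend in high-performance computing. One of the key problems is achieving computational acceleration and efficient simulation for CFD software on small and medium-sized workstations and even personal computers (PCs). Based on NVIDIA CUDA (compute unified device architecture) and the data-parallel lower-upper relaxation (DPLUR) implicit time-marching scheme, this paper establishes a CPU/GPU heterogeneous parallel solver for the three-dimensional compressible RANS (Reynolds-averaged Navier-Stokes) equations, and studies the key factors that affect the solver's acceleration together with strategies for addressing them. By allocating the number of threads per thread block appropriately, reducing data communication between the host and the device, making full use of shared memory, and solving the Navier-Stokes equations efficiently, the parallel efficiency of the solver is improved, and a speedup of more than 20 times over a traditional CPU solver is obtained on a PC. Test results show that, in three-dimensional cases, the DPLUR-based CPU/GPU heterogeneous parallel RANS solver converges up to 40 times faster than the LUSGS iteration method of a traditional solver.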
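    The data dependence that separates the two schemes named in the abstract can be illustrated on a scalar model problem. The sketch below is not the authors' solver: it assumes a 1D tridiagonal system (D + L + U) du = rhs with hypothetical coefficient lists `diag`, `lower`, `upper`, and the function names are invented for illustration. It shows that every point update in a DPLUR-style sub-iteration reads only values from the previous sub-iteration, so all points could be updated concurrently, whereas each LU-SGS sweep needs the neighbour already updated in the same sweep.

```python
def dplur_sweeps(diag, lower, upper, rhs, kmax=4):
    """DPLUR-style relaxation: off-diagonal terms use the previous sub-iteration."""
    n = len(rhs)
    du = [rhs[i] / diag[i] for i in range(n)]  # sub-iteration 0: D du = rhs
    for _ in range(kmax):
        prev = du  # every point below reads only 'prev', so points are independent
        du = [
            (rhs[i]
             - (lower[i] * prev[i - 1] if i > 0 else 0.0)
             - (upper[i] * prev[i + 1] if i < n - 1 else 0.0)) / diag[i]
            for i in range(n)
        ]
    return du


def lusgs_sweeps(diag, lower, upper, rhs):
    """One LU-SGS pass: solves (D + L) D^-1 (D + U) du = rhs with two ordered sweeps."""
    n = len(rhs)
    star = [0.0] * n
    for i in range(n):  # forward sweep: point i needs the updated point i - 1
        left = lower[i] * star[i - 1] if i > 0 else 0.0
        star[i] = (rhs[i] - left) / diag[i]
    du = [0.0] * n
    for i in reversed(range(n)):  # backward sweep: point i needs the updated point i + 1
        right = upper[i] * du[i + 1] if i < n - 1 else 0.0
        du[i] = star[i] - right / diag[i]
    return du


def full_residual(diag, lower, upper, rhs, x):
    """Largest absolute residual of the unfactored system (D + L + U) x = rhs."""
    n = len(rhs)
    worst = 0.0
    for i in range(n):
        ax = diag[i] * x[i]
        if i > 0:
            ax += lower[i] * x[i - 1]
        if i < n - 1:
            ax += upper[i] * x[i + 1]
        worst = max(worst, abs(rhs[i] - ax))
    return worst


if __name__ == "__main__":
    n = 6
    diag, lower, upper, rhs = [4.0] * n, [-1.0] * n, [-1.0] * n, [1.0] * n
    for k in (1, 4, 16):
        x = dplur_sweeps(diag, lower, upper, rhs, kmax=k)
        print("DPLUR kmax =", k, "residual =", full_residual(diag, lower, upper, rhs, x))
    x = lusgs_sweeps(diag, lower, upper, rhs)
    print("LU-SGS one pass, residual =", full_residual(diag, lower, upper, rhs, x))
```

    In this model the list comprehension inside `dplur_sweeps` plays the role of a GPU kernel launch with one thread per point, while the two loops in `lusgs_sweeps` must run in order. The number of sub-iterations `kmax` is a free parameter here; the value used in the paper is not stated in the abstract.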

     

  • Figure 1.  Advancement of LUSGS

    Figure 2.  Advancement of DPLUR

    Figure 3.  Heterogeneous parallel iterative program architecture

    Figure 4.  Time consumption of each calculation step

    Figure 5.  Comparison of time consumption for different parallel topology forms

    Figure 6.  Comparison of array-structure memory access patterns

    Figure 7.  Comparison of time consumption for different struct storage modes

    Figure 8.  Boundary conditions and grid for 3DB

    Figure 9.  Validation of dimensionless turbulence kinetic energy k and turbulence frequency ω

    Figure 10.  Comparison of convergence speed between DPLUR and LUSGS

    Figure 11.  Comparison of speedup ratios between DPLUR and LUSGS

    Figure 12.  Performance comparison of the two schemes in CPU/GPU heterogeneous systems

    Figure 13.  Boundary conditions of 3DZP

    Figure 14.  Validation of dimensionless turbulence kinetic energy k and turbulence frequency ω

    Figure 15.  Comparison of dimensionless turbulence coefficient Mu and turbulence frequency ω for three grid sizes

    Figure 16.  Comparison of speedup ratios for three grid sizes

    Figure 17.  Comparison of convergence speed between DPLUR and LUSGS

    Figure 18.  Comparison of speedup ratios between DPLUR and LUSGS

    Figure 19.  Boundary conditions and grids of 3DANW

    Figure 20.  Validation of dimensionless turbulence kinetic energy k and turbulence frequency ω

    Figure 21.  Comparison of convergence speed between DPLUR and LUSGS

    Figure 22.  Comparison of speedup ratios between DPLUR and LUSGS

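    Table 1.  Hardware characteristics of CPU and GPU

    Hardware  Characteristics
    CPU  Many storage and control units, few compute units; suited to memory allocation and logic-control computation.
    GPU  Few storage and control units, many compute units; suited to compute-intensive and highly concurrent computation.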

    Table 2.  Test setup for parallel topology mode

    Group  Grid Dim  Block Dim
    1  i, j, 1  k, 1, 1
    2  i, k, 1  j, 1, 1
    3  j, k, 1  i, 1, 1
    4  1, j, k  i, 1, 1
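    As a reading aid for Table 2, the following sketch emulates in plain Python how each group's launch configuration would map CUDA block and thread indices onto grid points (i, j, k). It is an illustration under stated assumptions, not the paper's code: the helper names are hypothetical, and `thread_stride` additionally assumes a flat array in which i is the fastest-varying index, a layout the page does not specify.

```python
# Table 2 groups: which grid direction each launch dimension carries.
# "1" marks a launch dimension of extent 1 that carries no direction.
GROUPS = {
    1: {"grid": ("i", "j", "1"), "block": ("k", "1", "1")},
    2: {"grid": ("i", "k", "1"), "block": ("j", "1", "1")},
    3: {"grid": ("j", "k", "1"), "block": ("i", "1", "1")},
    4: {"grid": ("1", "j", "k"), "block": ("i", "1", "1")},
}


def launch_dims(group, ni, nj, nk):
    """Return (gridDim, blockDim) as tuples for an ni x nj x nk point grid."""
    size = {"i": ni, "j": nj, "k": nk, "1": 1}
    cfg = GROUPS[group]
    grid = tuple(size[axis] for axis in cfg["grid"])
    block = tuple(size[axis] for axis in cfg["block"])
    return grid, block


def point_of(group, block_idx, thread_idx):
    """Map a (blockIdx, threadIdx) pair to the grid point (i, j, k) it would handle."""
    cfg = GROUPS[group]
    point = {}
    for axis, idx in zip(cfg["grid"], block_idx):
        if axis != "1":
            point[axis] = idx
    for axis, idx in zip(cfg["block"], thread_idx):
        if axis != "1":
            point[axis] = idx
    return point["i"], point["j"], point["k"]


def thread_stride(group, ni, nj, nk):
    """Element distance between neighbouring threads of a block.

    Assumption: data is stored as a flat array with i varying fastest,
    i.e. offset = i + ni * (j + nj * k).
    """
    strides = {"i": 1, "j": ni, "k": ni * nj}
    return strides[GROUPS[group]["block"][0]]


if __name__ == "__main__":
    ni, nj, nk = 4, 3, 2  # small hypothetical grid
    for group in GROUPS:
        grid, block = launch_dims(group, ni, nj, nk)
        print("group", group, "gridDim", grid, "blockDim", block,
              "stride", thread_stride(group, ni, nj, nk))
```

    Under the assumed layout, the groups that place i on the block dimension give neighbouring threads a stride of one element, which is the access pattern usually associated with coalesced global-memory reads. Whether that matches the timings reported in Figure 5 cannot be determined from this page.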

    Table 3.  Hardware configuration

    Hardware  Model  Cores  Maximum memory/video memory (GB)
    CPU  I9-13900K  24  128
    GPU  RTX3090  10496  24

    Table 4.  Test cases with different grid sizes

    Group  Grid size  Total grid points
    1  273×193×3  324576
    2  137×97×3  84000
    3  69×49×3  30429
Publication history
  • Received:  2024-07-01
  • Published online:  2026-02-13
