Block-asynchronous and Jacobi smoothers for a multigrid solver on GPU-accelerated HPC clusters