GPU-accelerated sparse matrix-vector product for a hybridizable discontinuous Galerkin method

Abstract

The iterative solution of the large systems of equations that result from discontinuous Galerkin (DG) discretizations requires fast matrix-vector products. DG matrices have a sparse block structure with a constant number of non-zero, equal-sized, non-overlapping blocks per row. General-purpose sparse matrix-vector product algorithms are not designed to exploit the specific structure of DG matrices and, as a consequence, deliver sub-optimal performance. To address this issue, we propose a sparse matrix-vector product for DG discretizations based on a dense tensor contraction. A GPU implementation of the proposed algorithm for a hybridizable discontinuous Galerkin (HDG) method is tested on the NVIDIA GeForce GTX 285. The results show that the tensor contraction performs at about 20 to 25 GFLOP/s in double precision with a sustained efficiency of more than 40 percent (60 GBytes/s) of the peak memory bandwidth (160 GBytes/s). Moreover, for HDG matrices in double precision, the proposed method is twice as fast as the general sparse matrix-vector products provided by the GPU library CUSPARSE and about 30 times faster than MATLAB running on a CPU.
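The idea of replacing a general sparse matrix-vector product with a dense tensor contraction can be sketched on the CPU with NumPy. This is a minimal illustration under stated assumptions, not the authors' GPU implementation: the storage layout (a `blocks` array of shape `(n_rows, nnzb, m, m)` holding the `nnzb` dense `m`-by-`m` blocks of each block row, plus a `cols` array of block-column indices) and the function names `block_spmv` and `to_dense` are hypothetical, chosen only to mirror the structure described above, namely a constant number of equal-sized blocks per row.

```python
import numpy as np


def block_spmv(blocks, cols, x):
    """Block-sparse matrix-vector product written as a dense tensor contraction.

    Hypothetical storage layout (an assumption, not the paper's format):
      blocks: (n_rows, nnzb, m, m)  dense blocks of each block row
      cols:   (n_rows, nnzb)        block-column index of each block
      x:      (n_cols, m)           input vector split into length-m segments
    Returns y with shape (n_rows, m).
    """
    # Gather the input segment that each block multiplies:
    # xg[i, k, :] = x[cols[i, k], :]
    xg = x[cols]
    # Dense contraction: y[i, a] = sum_k sum_b blocks[i, k, a, b] * xg[i, k, b]
    return np.einsum("ikab,ikb->ia", blocks, xg)


def to_dense(blocks, cols, n_cols):
    """Assemble the equivalent dense matrix (used only to check the result)."""
    n_rows, nnzb, m, _ = blocks.shape
    A = np.zeros((n_rows * m, n_cols * m))
    for i in range(n_rows):
        for k in range(nnzb):
            j = cols[i, k]
            A[i * m:(i + 1) * m, j * m:(j + 1) * m] += blocks[i, k]
    return A


rng = np.random.default_rng(0)
n_blocks, nnzb, m = 6, 3, 4
blocks = rng.standard_normal((n_blocks, nnzb, m, m))
# Each block row couples to nnzb distinct block columns.
cols = np.array([rng.choice(n_blocks, size=nnzb, replace=False) for _ in range(n_blocks)])
x = rng.standard_normal((n_blocks, m))

y = block_spmv(blocks, cols, x)
y_ref = to_dense(blocks, cols, n_blocks) @ x.reshape(-1)
print(np.allclose(y.reshape(-1), y_ref))  # True
```

Because every block row has the same number of blocks, no per-row length array is needed (unlike a general CSR format), and each `(i, k)` block product is an independent small dense operation. That regularity is what makes a batched GPU kernel attractive; the details of the paper's actual kernel may differ from this sketch.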

Publication
49th AIAA Aerospace Sciences Meeting including the New Horizons Forum and Aerospace Exposition
Ngoc Cuong Nguyen
Principal Research Scientist

My research interests include computational mechanics, molecular mechanics, nanophotonics, scientific computing, and machine learning.