Lattice Boltzmann Method on GPU: Sailfish

Since 2009 we have been developing, under the codename "Sailfish", an open source fluid simulation package that implements the lattice Boltzmann method (LBM) on modern Graphics Processing Units (GPUs) using CUDA/OpenCL.

We take a novel approach to GPU code implementation: run-time code generation techniques combined with a high-level programming language (Python) achieve state-of-the-art performance while allowing easy experimentation with different LBM models and tuning for various types of hardware.
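To illustrate what run-time code generation means here, the following is a minimal sketch that builds the source text of a CUDA device function from a Python description of a lattice. It is not Sailfish's actual generator: the function name `generate_equilibrium_kernel`, the template text, and the use of the standard library's `string.Template` are all assumptions made for illustration. The velocities, weights, and second-order equilibrium are the standard D2Q9 ones.

```python
from string import Template

# Standard D2Q9 discrete velocities and weights (weights written as CUDA float literals).
D2Q9_VELOCITIES = [(0, 0), (1, 0), (0, 1), (-1, 0), (0, -1),
                   (1, 1), (-1, 1), (-1, -1), (1, -1)]
D2Q9_WEIGHTS = ["4.0f/9.0f"] + ["1.0f/9.0f"] * 4 + ["1.0f/36.0f"] * 4

# Hypothetical kernel skeleton; $body is filled in per lattice at run time.
KERNEL_TEMPLATE = Template("""\
// Generated at run time for the $name lattice ($q velocities).
__device__ void equilibrium(float rho, float ux, float uy, float *feq)
{
    const float usq = ux * ux + uy * uy;
    float cu;
$body}
""")


def generate_equilibrium_kernel(name, velocities, weights):
    """Return CUDA source with one unrolled equilibrium expression per velocity."""
    lines = []
    for i, ((cx, cy), w) in enumerate(zip(velocities, weights)):
        lines.append(f"    cu = {cx}.0f * ux + {cy}.0f * uy;\n")
        lines.append(
            f"    feq[{i}] = {w} * rho * "
            f"(1.0f + 3.0f * cu + 4.5f * cu * cu - 1.5f * usq);\n"
        )
    return KERNEL_TEMPLATE.substitute(
        name=name, q=len(velocities), body="".join(lines)
    )


source = generate_equilibrium_kernel("D2Q9", D2Q9_VELOCITIES, D2Q9_WEIGHTS)
```

Because the lattice is plain Python data, switching to another velocity set only changes the inputs, and the generated source is fully unrolled for the compiler. In a real setup the resulting string would be handed to a GPU compiler at run time; that step is omitted here.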


Sailfish implements:

Feature type             Supported variants
-----------------------  ------------------------------------------------------------
lattices                 D2Q9
                         D3Q13, D3Q15, D3Q19, D3Q27
body forces              Guo's method
                         Exact Difference Method
relaxation dynamics      LBGK
                         ELBM (entropic)
multicomponent models    Shan-Chen
                         free energy [PRE78]
turbulence               Smagorinsky LES
other models             single-phase Shan-Chen
                         shallow water
                         incompressible LBGK
other features           round-off minimization model
distributed simulations  ad-hoc (SSH)
precision                single
output formats           numpy
computational backends   CUDA
on-GPU statistics        1D profiles of the first 4 moments of velocity and density
                         1D profiles of correlations of velocity components and density
                         total kinetic energy and enstrophy
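As a concrete reading of the "D2Q9" and "LBGK" entries in the table above, the sketch below performs one LBGK collision at a single lattice node in plain Python. It uses the textbook D2Q9 velocities, weights, and second-order equilibrium; the function names are illustrative and are not taken from the Sailfish code base, which generates GPU kernels instead.

```python
# Standard D2Q9 velocity set and weights.
C = [(0, 0), (1, 0), (0, 1), (-1, 0), (0, -1),
     (1, 1), (-1, 1), (-1, -1), (1, -1)]
W = [4 / 9] + [1 / 9] * 4 + [1 / 36] * 4


def moments(f):
    """Density and velocity as the zeroth and first moments of f."""
    rho = sum(f)
    ux = sum(fi * cx for fi, (cx, _) in zip(f, C)) / rho
    uy = sum(fi * cy for fi, (_, cy) in zip(f, C)) / rho
    return rho, ux, uy


def equilibrium(rho, ux, uy):
    """Second-order equilibrium distribution in lattice units (c_s^2 = 1/3)."""
    usq = ux * ux + uy * uy
    feq = []
    for (cx, cy), w in zip(C, W):
        cu = cx * ux + cy * uy
        feq.append(w * rho * (1 + 3 * cu + 4.5 * cu * cu - 1.5 * usq))
    return feq


def lbgk_collide(f, tau):
    """Relax every distribution toward equilibrium with a single time tau."""
    rho, ux, uy = moments(f)
    feq = equilibrium(rho, ux, uy)
    return [fi - (fi - fe) / tau for fi, fe in zip(f, feq)]


# Usage: collide an arbitrary (non-equilibrium) node state.
f_in = [0.40, 0.12, 0.10, 0.11, 0.10, 0.03, 0.02, 0.03, 0.03]
f_out = lbgk_collide(f_in, tau=0.8)
```

The collision changes the individual distributions but leaves density and momentum at the node unchanged, which is the property the relaxation step relies on.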



High-level architecture of a distributed Sailfish simulation divided into 4 subdomains. The controller process uses the execnet Python library to start machine master processes on two computational nodes, which then spawn child processes to handle individual subdomains. The master and subdomain handlers use the ZeroMQ library to communicate.
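The layout described in this caption (4 subdomains spread over two nodes) can be sketched as a small planning step that a controller might perform before starting any processes. Everything here is an assumption for illustration: the slab-wise split along one axis, the function names, and the node names are hypothetical, and no execnet or ZeroMQ calls are shown.

```python
def split_domain(nx, parts):
    """Split nx lattice columns into `parts` contiguous slabs of (start, stop)."""
    base, extra = divmod(nx, parts)
    slabs, start = [], 0
    for i in range(parts):
        # The first `extra` slabs take one additional column each.
        stop = start + base + (1 if i < extra else 0)
        slabs.append((start, stop))
        start = stop
    return slabs


def assign_to_nodes(slabs, nodes):
    """Give each node a contiguous block of subdomains, keyed by node name."""
    per_node = -(-len(slabs) // len(nodes))  # ceiling division
    plan = {node: [] for node in nodes}
    for sid, slab in enumerate(slabs):
        plan[nodes[sid // per_node]].append((sid, slab))
    return plan


# Usage: 4 subdomains on two hypothetical nodes, as in the figure.
slabs = split_domain(256, 4)
plan = assign_to_nodes(slabs, ["node-a", "node-b"])
```

Keeping neighbouring subdomains on the same node, as this assignment does, means that only one subdomain boundary per node has to cross the network in this one-dimensional layout.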



Performance comparison of different models using the D3Q19 lattice and the AB memory access pattern. Acronyms used: ELBM: entropic LBM, SC: Shan–Chen, FE: free energy. Both logical GPUs were used for the K10 tests. The SM clock of the K40 card was set to 875 MHz. Left panel: single precision. Right panel: double precision.


  1. Sailfish Reference Manual — Sailfish 2013.1 documentation
  2. sailfish-team/sailfish
  3. arXiv:1311.2404