On a GPU, shared memory is organized into a number of banks (e.g. 16 or 32). Consecutive locations in shared memory (typically successive 32-bit words) fall into different banks.
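
The mapping from an address to a bank can be illustrated with a small host-side model. This is only a sketch, not GPU code: the constants `NUM_BANKS = 32` and `BANK_WIDTH_BYTES = 4` and the helper `bank_of` are assumptions chosen for illustration, and the real values depend on the GPU architecture.

```python
# Host-side model of the address-to-bank mapping (not GPU code).
# Assumptions: 32 banks, each 4 bytes wide; real hardware may differ.
NUM_BANKS = 32
BANK_WIDTH_BYTES = 4


def bank_of(byte_address: int) -> int:
    """Return the bank holding the word at the given shared-memory byte offset."""
    return (byte_address // BANK_WIDTH_BYTES) % NUM_BANKS


# Consecutive 4-byte words land in consecutive banks and wrap around after 32 words.
banks = [bank_of(BANK_WIDTH_BYTES * word) for word in range(34)]
print(banks)
```

Under these assumptions, words 0 to 31 occupy banks 0 to 31, and word 32 wraps back to bank 0.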

Accessing different banks at the same time is not a problem, and multiple accesses to the same element in a bank can also be served in parallel. However, accesses to different elements in the same bank are performed sequentially.
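
These rules can be turned into a rough estimate of how many sequential passes one warp-wide access needs. The sketch below is a simplified host-side model, assuming 32 banks, a warp of 32 threads, and one word per bank entry; `conflict_degree` is a hypothetical helper, not part of any GPU API.

```python
from collections import defaultdict

# Assumed values for illustration; real hardware may differ.
NUM_BANKS = 32
WARP_SIZE = 32


def conflict_degree(word_indices):
    """Estimate the number of serialized passes for one warp-wide access.

    word_indices[i] is the word index that thread i reads. Threads that
    read the same word share one access, so only distinct words within
    a bank add extra passes.
    """
    words_per_bank = defaultdict(set)
    for word in word_indices:
        words_per_bank[word % NUM_BANKS].add(word)
    return max(len(words) for words in words_per_bank.values())


stride_1 = [t for t in range(WARP_SIZE)]        # every thread hits its own bank
broadcast = [7 for _ in range(WARP_SIZE)]       # every thread reads the same word
stride_2 = [2 * t for t in range(WARP_SIZE)]    # two distinct words per used bank
stride_32 = [32 * t for t in range(WARP_SIZE)]  # 32 distinct words, all in bank 0

for name, pattern in [("stride 1", stride_1), ("broadcast", broadcast),
                      ("stride 2", stride_2), ("stride 32", stride_32)]:
    print(name, conflict_degree(pattern))
```

In this model the stride-1 and broadcast patterns need a single pass, a stride of 2 needs two passes, and a stride of 32 is fully serialized into 32 passes.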

Bank conflicts can often be avoided by changing the allocated shared memory size, for example by padding an array so that the conflicting elements map to different banks, even if this leaves artificial unused gaps.
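
A common instance of this idea is padding each row of a 2D tile by one extra word. The sketch below models it on the host, assuming 32 banks and a 32-row tile stored in row-major order; `column_conflict_degree` is a hypothetical helper. In CUDA, the padded layout would correspond to declaring the tile as `[32][33]` instead of `[32][32]`.

```python
# Assumed values for illustration; real hardware may differ.
NUM_BANKS = 32
TILE = 32


def column_conflict_degree(row_width: int, column: int = 0) -> int:
    """Serialized passes when thread t reads tile[t][column].

    The tile is row-major and each row holds row_width words, so the
    word index of tile[t][column] is t * row_width + column.
    """
    hits_per_bank = [0] * NUM_BANKS
    for row in range(TILE):
        word_index = row * row_width + column
        hits_per_bank[word_index % NUM_BANKS] += 1
    return max(hits_per_bank)


unpadded = column_conflict_degree(row_width=32)  # every row starts in the same bank
padded = column_conflict_degree(row_width=33)    # each row is shifted by one bank
print(unpadded, padded)
```

With a row width of 32 words, all 32 threads reading one column hit the same bank. With one word of padding per row, the same column is spread over 32 different banks, at the cost of 32 unused words.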

Fast path 1: all threads in a warp access different banks. [Figure: bank conflict 1.webp]

Fast path 2: multiple threads in a warp access the same address (broadcast). [Figure: bank conflict 2.webp]

Slow path: multiple threads in a warp access different addresses in the same bank, so the accesses are serialized. [Figure: bank conflict 3.webp]